Commit 5432363f authored by Olivier Delalleau

Merged

Trunk since last release
------------------------
* The Sparse type is now supported by the shape op, and the ShapeFeature optimizer works correctly with it.
* Fuse GpuElemwise more often (in the case where there are so many inputs that fusing all of them would exceed the 256-byte limit on parameters to a GPU function).
* Speed up gemv by working around scipy's gemv slowness when the matrix is in C order (the default).

Theano 0.3 (2010-11-23)
-----------------------
......
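For context on the gemv item above: numpy arrays are C-ordered by default, and the transpose of a C-ordered matrix is a zero-copy Fortran-ordered view, which is the layout property such a workaround can exploit. A sketch of the layout facts only, not Theano's actual code:

```python
import numpy as np

# numpy allocates C-ordered (row-major) arrays by default.
A = np.ones((3, 4), dtype='float32')
assert A.flags['C_CONTIGUOUS']

# The transpose is a zero-copy view that BLAS sees as Fortran-ordered,
# so a gemv on A can instead be expressed as a gemv on A.T with the
# transpose flag set, avoiding the slow C-order path.
assert A.T.flags['F_CONTIGUOUS']
assert not A.T.flags['C_CONTIGUOUS']
```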
@@ -19,7 +19,8 @@ instructions below for detailed installation steps):
Linux, Mac OS X or Windows operating system
We develop mainly on 64-bit Linux machines. 32-bit architectures are
not well-tested. Note that GPU computing does not work yet under
Windows.
Python_ >= 2.4
    The development package (``python-dev`` or ``python-devel``
@@ -330,7 +331,7 @@ Mac
.. code-block:: bash

    $ sudo port install gcc44 py26-scipy mercurial python_select
This will install all the required Theano dependencies. Note that
compiling gcc takes significant time (hours)! SciPy depends on ATLAS (a
@@ -344,13 +345,13 @@ Mac
packages are updated quite frequently.

- In order to use the MacPorts version of python, you might
  need to explicitly select it with ``sudo python_select python26``. The
  reason this is necessary is because you might have an Apple-provided python
  (via, for example, an Xcode installation). After performing this step, you
  should check that the symbolic link provided by ``which python`` points to
  the MacPorts python. For instance, on Snow Leopard with the latest MacPorts,
  the output of ``which python`` is ``/opt/local/bin/python`` and this symbolic
  link points to ``/opt/local/bin/python2.6``. When executing ``sudo
  python_select python26-apple`` (which you should **not** do), the link
  points to ``/usr/bin/python2.6``.
@@ -364,7 +365,7 @@ Mac
- Please follow the same procedure with ``numpy``.

- Put ``export PYTHONPATH=/opt/local/lib/python2.6/site-packages:$PYTHONPATH``
  in your ``.bashrc`` in order to include your MacPorts Python packages
  (NumPy, SciPy) in Python's path.
@@ -469,7 +470,7 @@ components as in Python(x,y) that are required by Theano, follow these steps:
sub-directory are in your system path. This may be done by
modifying the global ``PATH`` Windows environment variables, or by creating
a ``.profile`` file in your MinGW home, containing a line like
``export PATH=$PATH:/c/Python26:/c/Python26/Scripts`` (note that the latter
will work only when you run Theano from a MinGW shell).
- In order to run Theano's test-suite, you will need `nose
@@ -661,10 +662,14 @@ follows:
Using the GPU
~~~~~~~~~~~~~
At this point, GPU computing does not work under Windows. The current main
issue is that the compilation commands used under Linux / MacOS to create
and use a CUDA-based shared library with the nvcc compiler do not work with
Windows DLLs. If anyone can figure out the proper compilation steps for
Windows, please let us know on the `theano-dev`_ mailing list.

The instructions below should at least get you started, so you can reproduce
the above-mentioned issue.
These are instructions for the 32-bit version of Python (the one that comes
with Python(x,y) is 32-bit).
@@ -679,44 +684,47 @@ use a compilation directory located somewhere else:
[global]
base_compiledir=path_to_a_directory_without_such_characters

Then

1) Install the CUDA driver (32-bit on 32-bit Windows, idem for 64-bit).
2) Install the CUDA toolkit 32-bit (even if your computer is 64-bit; it
   must match the Python installation version).
3) Install the CUDA SDK 32-bit.
4) Test some pre-compiled examples from the SDK.
5) Download Visual Studio 2008 Express (free; VS2010 is not supported by
   nvcc 3.1, and VS2005 is not available for download but is supported by
   nvcc; the non-free version should work too).
6) Follow the instructions in the GettingStartedWindows.pdf file from the
   CUDA web site to compile CUDA code with VS2008. If that does not work,
   you will not be able to compile GPU code with Theano.
7) Edit your Theano configuration file to add lines like the following
   (make sure these paths match your own specific installation):

   .. code-block:: cfg

       [cuda]
       nvccflags=-LC:\Python26\libs

       [nvcc]
       compiler_bindir=C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin

8) In Python do: ``import theano.sandbox.cuda``. This will compile the
   first CUDA file, and no error should occur.
9) Then run the Theano CUDA test files with nosetests from the
   ``theano/sandbox/cuda/tests`` subdirectory. In the current version of
   Theano, this should fail with an error like:

   .. code-block:: bash

       NVCC: nvcc fatal: Don't know what to do with
       'C:/CUDA/compile/tmpmkgqx6/../cuda_ndarray/cuda_ndarray.pyd'
Generating the documentation
----------------------------
@@ -739,3 +747,4 @@ The PDF of the documentation is ``html/theano.pdf``.
.. _theano-users: http://groups.google.com/group/theano-users?pli=1
.. _theano-dev: http://groups.google.com/group/theano-dev?pli=1
@@ -11,6 +11,7 @@ If you're feeling ambitious, go fix some `pylint
.. toctree::
    :maxdepth: 2

    release
    dev_start_guide
    lisa_labo
    mammouth
......
@@ -20,6 +20,7 @@ Types and Ops that you can use to build and compile expression graphs.
    scalar/index
    gof/index
    scan
    sandbox/index

There are also some top-level imports that you might find more convenient:
......
.. _libdoc_sandbox_cuda:

===========================================
:mod:`sandbox.cuda` -- The CUDA GPU backend
===========================================

.. module:: sandbox.cuda
   :platform: Unix, Windows
   :synopsis: Code for GPU programming
.. moduleauthor:: LISA

.. toctree::
    :maxdepth: 1

    var
    type

.. ../../../../theano/sandbox/cuda/type.py
.. ../../../../theano/sandbox/cuda/var.py
.. ../../../../theano/sandbox/cuda/
.. _libdoc_cuda_type:

======================================================================
:mod:`sandbox.cuda.type` -- The Type object for CUDA-allocated arrays
======================================================================

.. module:: sandbox.cuda.type
   :platform: Unix, Windows
   :synopsis: The Type object for CUDA-allocated arrays
.. moduleauthor:: LISA

API
===

.. ../../../../theano/sandbox/cuda/type.py
.. ../../../../theano/sandbox/cuda/var.py
.. ../../../../theano/sandbox/cuda/
.. _libdoc_cuda_var:

===================================================================
:mod:`sandbox.cuda.var` -- The Variables for CUDA-allocated arrays
===================================================================

.. module:: sandbox.cuda.var
   :platform: Unix, Windows
   :synopsis: The Variable objects for CUDA-allocated arrays
.. moduleauthor:: LISA

API
===

.. autoclass:: theano.sandbox.cuda.var.CudaNdarraySharedVariable
    :members: get_value, set_value
.. _libdoc_sandbox:

==============================================================
:mod:`sandbox` -- Experimental Code
==============================================================

.. module:: sandbox
   :platform: Unix, Windows
   :synopsis: Experimental code
.. moduleauthor:: LISA

.. toctree::
    :maxdepth: 1

    cuda/index
@@ -142,7 +142,7 @@ transparent. But when you are using a GPU (or in future perhaps a remote machin
is not the internal representation of your data.

If you really want Theano to return its internal representation *and never copy it*
then you should use the ``return_internal_type=True`` argument to
``get_value``. It will never cast the internal object (always return in
constant time), but might return various datatypes depending on contextual
factors (e.g. the compute device, the dtype of the numpy array).
@@ -154,6 +154,12 @@ It is possible to use ``borrow=False`` in conjunction with
``return_internal_type=True``, which will return a deep copy of the internal object.
This is primarily for internal debugging, not for typical use.
To keep the various optimizations Theano can apply transparent, the policy is
that ``get_value()`` by default always returns the same object type it was
given when the shared variable was created. So if you manually created data on
the GPU and created a shared variable on the GPU from this data, ``get_value``
will always return GPU data, even when ``return_internal_type=False``.
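The policy just described can be sketched with a toy stand-in (hypothetical class, not Theano's implementation): the shared container remembers the type it was constructed with and converts back to it on ``get_value``:

```python
import numpy as np

class ToySharedVariable:
    """Toy model of the get_value policy: return the type you put in."""
    def __init__(self, value):
        self._created_with = type(value)
        self._internal = np.asarray(value)  # stand-in for the device copy

    def get_value(self, return_internal_type=False):
        if return_internal_type:
            return self._internal
        # Convert back to the type used at creation time.
        return self._created_with(self._internal)

s = ToySharedVariable([1.0, 2.0])
assert isinstance(s.get_value(), list)  # created from a list, returns a list
assert isinstance(s.get_value(return_internal_type=True), np.ndarray)
```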
*Take home message:*

It is safe (and sometimes much faster) to use ``get_value(borrow=True)`` when
@@ -182,6 +188,30 @@ This pattern works regardless of the compute device, and when the compute device
makes it possible to expose Theano's internal variables without a copy, then it
goes as fast as an in-place update.
When shared variables are allocated on the GPU, the transfers to and from GPU device memory can
be costly. Here are a few tips to ensure fast and efficient use of GPU memory and bandwidth:
* Prior to Theano 0.3.1, set_value did not work in-place on the GPU. This meant that sometimes,
GPU memory for the new value would be allocated before the old memory was released. If you're
running near the limits of GPU memory, this could cause you to run out of GPU memory
unnecessarily. *Solution*: update to a newer version of Theano.
* If you are going to swap several chunks of data in and out of a shared variable repeatedly,
you will want to reuse the memory that you allocated the first time if possible - it is both
faster and more memory efficient.
*Solution*: upgrade to a recent version of Theano (>0.3.0) and consider padding your source
data to make sure that every chunk is the same size.
* It is also worth mentioning that the current GPU copying routines support only contiguous memory.
So Theano must make the ``value`` you provide ``c_contiguous`` prior to copying it.
This can require an extra copy of the data on the host. *Solution*: make sure that the value
you assign to a CudaNdarraySharedVariable is *already* ``c_contiguous``.
(Further remarks on the current implementation of the GPU version of set_value() can be found
here: :ref:`libdoc_cuda_var`)
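The contiguity tip can be checked on the host side with plain numpy before assigning; a minimal sketch (``np.ascontiguousarray`` is a no-op when the array is already C-contiguous):

```python
import numpy as np

x = np.ones((3, 4), dtype='float32').T   # a transposed view: not C-contiguous
assert not x.flags['C_CONTIGUOUS']

x_c = np.ascontiguousarray(x)            # forces one host-side copy here
assert x_c.flags['C_CONTIGUOUS']

# If the value is already C-contiguous, no extra copy is made:
assert np.ascontiguousarray(x_c) is x_c
```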
Retrieving and assigning via the .value property
------------------------------------------------
......
@@ -21,7 +21,7 @@ Toolkit installs a folder on your computer with subfolders *bin*, *lib*,
*include*, and some more too. (Sanity check: The *bin* subfolder should contain an *nvcc*
program which is the compiler for GPU code.) This folder is called the *cuda
root* directory.

On Linux or OS X >= 10.4, you must add the 'lib' subdirectory (and/or 'lib64' subdirectory if you have a 64-bit Linux
computer) to your ``$LD_LIBRARY_PATH`` environment variable.
@@ -274,3 +274,10 @@ Tips for improving performance on GPU
that can tell you if not enough of your graph is on the GPU, or if there
is too much memory transfer.
Changing the value of shared variables
--------------------------------------
To change the value of a shared variable, e.g. to provide new data to process,
use ``shared_variable.set_value(new_value)``. For a lot more detail about this,
see :ref:`aliasing`.
\ No newline at end of file
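A sketch of the related padding tip from the aliasing notes, with a plain numpy list standing in for the data chunks (hypothetical shapes, not a Theano API): padding every chunk to a common shape means each ``set_value`` call can reuse a buffer of the same size:

```python
import numpy as np

chunks = [np.ones((5, 3), dtype='float32'),
          np.ones((8, 3), dtype='float32')]

# Pad each chunk to the largest first dimension so all values share one shape.
max_rows = max(c.shape[0] for c in chunks)
padded = []
for c in chunks:
    p = np.zeros((max_rows, c.shape[1]), dtype=c.dtype)
    p[:c.shape[0]] = c
    padded.append(p)

assert all(p.shape == (8, 3) for p in padded)
```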
@@ -138,7 +138,9 @@ class SharedVariable(Variable):
    def filter_update(self, update):
        """
        When this shared variable is updated by a pfunc, the update value will be run through this function.

        This is a good spot to cast or convert the update expression as necessary.

        Default behaviour is to return `update` unmodified if it is a Variable, otherwise to create a SharedVariable for it by calling ``shared(update)``.
......
@@ -7,10 +7,12 @@ import re
from theano.configparser import config, AddConfigVar, StrParam

def default_compiledirname():
    platform_id = '-'.join([
        platform.platform(),
        platform.processor(),
        platform.python_version()])
    platform_id = re.sub("[\(\)\s]+", "_", platform_id)
    return 'compiledir_' + platform_id
def is_valid_compiledir(path):
    if not os.access(path, os.R_OK | os.W_OK):
......
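The refactored helper can be exercised in isolation; this sketch repeats its logic to show that the resulting directory name contains no spaces or parentheses:

```python
import platform
import re

# Same construction as default_compiledirname(): join the platform facts...
platform_id = '-'.join([platform.platform(),
                        platform.processor(),
                        platform.python_version()])
# ...then replace characters that are awkward in directory names.
platform_id = re.sub(r"[\(\)\s]+", "_", platform_id)
dirname = 'compiledir_' + platform_id

assert ' ' not in dirname and '(' not in dirname
assert dirname.startswith('compiledir_')
```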
@@ -163,19 +163,16 @@ class Container(object):
            if value is None:
                self.storage[0] = None
                return

            kwargs = {}
            if self.strict:
                kwargs['strict'] = True
            if self.allow_downcast is not None:
                kwargs['allow_downcast'] = self.allow_downcast
            if hasattr(self.type, 'filter_inplace'):
                self.storage[0] = self.type.filter_inplace(value, self.storage[0], **kwargs)
            else:
                self.storage[0] = self.type.filter(value, **kwargs)
        except Exception, e:
            e.args = e.args + (('Container name "%s"' % self.name),)
......
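The ``hasattr`` dispatch above can be sketched with toy types (hypothetical classes, only to show the control flow): types that implement ``filter_inplace`` get the chance to reuse the old storage, others fall back to ``filter``:

```python
class PlainType:
    """A type that can only produce a fresh value."""
    def filter(self, value, **kwargs):
        return list(value)

class InplaceType(PlainType):
    """A type that can overwrite previously stored data."""
    def filter_inplace(self, value, old_data, **kwargs):
        if old_data is not None and len(old_data) == len(value):
            old_data[:] = value      # reuse the existing buffer
            return old_data
        return list(value)

def store(the_type, storage, value):
    # Same dispatch shape as the Container code above.
    if hasattr(the_type, 'filter_inplace'):
        storage[0] = the_type.filter_inplace(value, storage[0])
    else:
        storage[0] = the_type.filter(value)

storage = [None]
store(InplaceType(), storage, [1, 2, 3])
first = storage[0]
store(InplaceType(), storage, [4, 5, 6])
assert storage[0] is first        # the buffer was reused in place
```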
@@ -89,7 +89,7 @@ if __name__ == "__main__":
* numpy with ATLAS from the distribution (FC9) package (1 thread)
* manually compiled numpy and ATLAS with 2 threads
* goto with 1, 2, 4 and 8 threads.

                    Xeon    Xeon    Xeon   Core2     i7
lib/nb threads     E5345   E5430   E5450   E8500    930
numpy_FC9_atlas/1  39.2s   35.0s   30.7s   29.6s  21.5s
......
@@ -33,8 +33,8 @@ else:
    except NotImplementedError:
        b_sparse = False

    a_cuda = False
    b_cuda = False
    if a.__class__.__name__ == "CudaNdarray":
        a_cuda = True
    if b.__class__.__name__ == "CudaNdarray":
......
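The ``__class__.__name__`` comparison is a way to detect GPU arrays without importing the CUDA module, which may be unavailable. A minimal sketch with a hypothetical stand-in class:

```python
def is_cuda_ndarray(x):
    # Compare by class name so the check works even when the real
    # cuda_ndarray module cannot be imported.
    return x.__class__.__name__ == "CudaNdarray"

class CudaNdarray:          # hypothetical stand-in for the real GPU type
    pass

assert is_cuda_ndarray(CudaNdarray())
assert not is_cuda_ndarray([1, 2, 3])
```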
@@ -37,17 +37,18 @@ def test_may_share_memory():
    #test that it raise error when needed.
    for a_,b_,rep in [(a,(0,),False),(a,1,False),(a,None,False),]:
        assert may_share_memory(a_,b_,False)==rep
        assert may_share_memory(b_,a_,False)==rep
        try:
            may_share_memory(a_,b_)
            raise Exception("An error was expected")
        except TypeError:
            pass
        try:
            may_share_memory(b_,a_)
            raise Exception("An error was expected")
        except TypeError:
            pass
if scipy_imported:
    def test_may_share_memory_scipy():
@@ -64,14 +65,18 @@ if scipy_imported:
            assert may_share_memory(a_,b_)==rep
            assert may_share_memory(b_,a_)==rep

        #test that it raise error when needed.
        for a_,b_,rep in [(a,(0,),False),(a,1,False),(a,None,False)]:
            assert may_share_memory(a_,b_,False)==rep
            assert may_share_memory(b_,a_,False)==rep
            try:
                may_share_memory(a_,b_)
                raise Exception("An error was expected")
            except TypeError:
                pass
            try:
                may_share_memory(b_,a_)
                raise Exception("An error was expected")
            except TypeError:
                pass
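The contract these tests exercise can be sketched with a simplified stand-in for ``theano.misc.may_share_memory`` (an illustration of the raise-vs-return-False behaviour, not the real implementation):

```python
import numpy as np

def may_share_memory(a, b, raise_other_type=True):
    """Simplified sketch: ndarrays are compared; for other types either
    raise TypeError or, when raise_other_type=False, return False."""
    if isinstance(a, np.ndarray) and isinstance(b, np.ndarray):
        return np.may_share_memory(a, b)
    if raise_other_type:
        raise TypeError("may_share_memory supports only ndarrays here")
    return False

a = np.zeros(4)
assert may_share_memory(a, a[1:])             # a view shares memory
assert not may_share_memory(a, np.zeros(4))   # fresh allocation does not
assert may_share_memory(a, None, False) is False
try:
    may_share_memory(a, None)
    raise Exception("An error was expected")
except TypeError:
    pass
```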
@@ -51,12 +51,11 @@ def set_cuda_disabled():
#cuda_ndarray compile and import
cuda_path = os.path.abspath(os.path.split(__file__)[0])
cuda_files = ('cuda_ndarray.cu', 'cuda_ndarray.cuh',
              'conv_full_kernel.cu', 'conv_kernel.cu')
stat_times = [os.stat(os.path.join(cuda_path, cuda_file))[stat.ST_MTIME]
              for cuda_file in cuda_files]
date = max(stat_times)

cuda_ndarray_loc = os.path.join(config.compiledir, 'cuda_ndarray')
cuda_ndarray_so = os.path.join(cuda_ndarray_loc,
                               'cuda_ndarray.' + get_lib_extension())

compile_cuda_ndarray = True
@@ -87,7 +86,7 @@ try:
        'cuda_ndarray',
        code,
        location=cuda_ndarray_loc,
        include_dirs=[cuda_path], libs=['cublas'])
    from cuda_ndarray.cuda_ndarray import *
except Exception, e:
@@ -105,17 +104,19 @@ if cuda_available:
    cuda_available = False
    cuda_initialization_error_message = e.message

# We must do these imports to be able to build the full documentation even
# when nvcc is not available.
from theano.sandbox.cuda.var import (CudaNdarrayVariable,
                                     CudaNdarrayConstant,
                                     CudaNdarraySharedVariable,
                                     float32_shared_constructor)
from theano.sandbox.cuda.type import CudaNdarrayType

if cuda_available:
    # Check if there is an old cuda_ndarray that was loaded instead of the one we compiled!
    import cuda_ndarray.cuda_ndarray
    if cuda_ndarray_so != cuda_ndarray.cuda_ndarray.__file__:
        warning("WARNING: cuda_ndarray was loaded from", cuda_ndarray.cuda_ndarray.__file__,
                "This is not expected as theano should compile it automatically for you. "
                "Do you have a directory called cuda_ndarray in your LD_LIBRARY_PATH "
                "environment variable? If so, please remove it as it is outdated!")

    shared_constructor = float32_shared_constructor

    import basic_ops
......
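The modification-time logic can be checked in isolation; this sketch applies the same ``max`` over ``os.stat`` pattern to temporary files:

```python
import os
import stat
import tempfile

tmpdir = tempfile.mkdtemp()
files = ('a.cu', 'b.cuh')
for name in files:
    with open(os.path.join(tmpdir, name), 'w') as f:
        f.write('// dummy kernel source\n')

# Same pattern as above: take the newest mtime over the whole file list.
stat_times = [os.stat(os.path.join(tmpdir, name))[stat.ST_MTIME]
              for name in files]
date = max(stat_times)

assert all(date >= t for t in stat_times)
```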
@@ -1701,20 +1701,6 @@ class GpuSubtensor(tensor.Subtensor):
            cdata = cdata[0]

        out[0] = x.__getitem__(cdata)


class GpuIncSubtensor(tensor.IncSubtensor):
    def make_node(self, x, y, *inputs):
        assert isinstance(x.type, CudaNdarrayType)
......
@@ -819,20 +819,46 @@ def test_shared_float32():
    # Unregister
    del theano.shared.constructors[-1]

def test_shared_cudandarray():
    '''Test that we can create a CudaNdarraySharedVariable from a CudaNdarray'''
    a = cuda.shared_constructor(cuda.CudaNdarray.zeros((2,3)))
    assert isinstance(a.type, tcn.CudaNdarrayType)

import theano.tensor.tests.test_sharedvar

# This tests the case where the shared constructor views a CudaNdarray as input.
test_shared_options = theano.tensor.tests.test_sharedvar.makeSharedTester(
    shared_constructor_ = tcn.shared_constructor,
    dtype_ = 'float32',
    get_value_borrow_true_alias_ = True,
    shared_borrow_true_alias_ = True, # True when the original value is already a CudaNdarray!
    set_value_borrow_true_alias_ = True,
    set_value_inplace_ = True,
    set_casted_value_inplace_ = False,
    shared_constructor_accept_ndarray_ = True,
    internal_type_ = cuda_ndarray.CudaNdarray,
    test_internal_type_ = lambda a: isinstance(a, cuda_ndarray.CudaNdarray),
    theano_fct_ = theano.tensor.exp,
    ref_fct_ = numpy.exp,
    cast_value_ = cuda_ndarray.CudaNdarray,
    op_by_matrix_ = True)

# This tests the case where the shared constructor views an ndarray as input.
test_shared_options2 = theano.tensor.tests.test_sharedvar.makeSharedTester(
    shared_constructor_ = tcn.shared_constructor,
    dtype_ = 'float32',
    get_value_borrow_true_alias_ = False,
    shared_borrow_true_alias_ = False,
    set_value_borrow_true_alias_ = False,
    set_value_inplace_ = True,
    set_casted_value_inplace_ = True,
    shared_constructor_accept_ndarray_ = True,
    internal_type_ = cuda_ndarray.CudaNdarray,
    test_internal_type_ = lambda a: isinstance(a, cuda_ndarray.CudaNdarray),
    theano_fct_ = theano.tensor.exp,
    ref_fct_ = numpy.exp,
    cast_value_ = numpy.asarray,
    op_by_matrix_ = True)

if __name__ == '__main__':
    test_many_arg_elemwise()
......
@@ -65,6 +65,48 @@ def test_softmax_optimizations():
    assert env.outputs[0].owner.inputs[0].owner.op == cuda.host_from_gpu
    assert env.outputs[0].owner.inputs[0].owner.inputs[0].owner.op == cuda.nnet.gpu_crossentropy_softmax_argmax_1hot_with_bias
def test_may_share_memory_cuda():
    from theano.misc.may_share_memory import may_share_memory
    a = cuda.CudaNdarray(numpy.zeros((3,4), dtype='float32'))
    b = cuda.CudaNdarray(numpy.zeros((3,4), dtype='float32'))
    na = numpy.zeros((3,4))
    nb = numpy.zeros((3,4))
    va = a.view()
    vb = b.view()
    ra = a.reshape((4,3))
    rb = b.reshape((4,3))

    # Can't test the transpose as ta._strides = ... is not implemented.
    # Manual transpose of a:
    #ta = a.reshape((4,3))
    #ta._strides = (ta._strides[1], ta._strides[0])  # not implemented
    #elem_size = numpy.zeros(0, dtype=a.dtype).dtype.itemsize
    #ta.gpudata += ta.size*elem_size

    for a_,b_,rep in [(a,a,True),(b,b,True),(a,b,False),
                      (a,na,False),(b,nb,False),(na,b,False),(nb,a,False),
                      (a,va,True),(b,vb,True),(va,b,False),(a,vb,False),
                      (a,ra,True),(b,rb,True),(ra,b,False),(a,rb,False),
                      ]:
        assert may_share_memory(a_,b_)==rep
        assert may_share_memory(b_,a_)==rep

    #test that it raise error when needed.
    for a_,b_,rep in [(a,(0,),False),(a,1,False),(a,None,False)]:
        assert may_share_memory(a_,b_,False)==rep
        assert may_share_memory(b_,a_,False)==rep
        try:
            may_share_memory(a_,b_)
            raise Exception("An error was expected")
        except TypeError:
            pass
        try:
            may_share_memory(b_,a_)
            raise Exception("An error was expected")
        except TypeError:
            pass
def test_grad_sqrt_sum():
    """
    This triggered a bug in the past.
......
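The CPU-side analogue of the sharing relationships this test asserts can be checked with numpy directly: a view and a reshape of the same contiguous array share memory, while distinct allocations do not:

```python
import numpy as np

a = np.zeros((3, 4), dtype='float32')
b = np.zeros((3, 4), dtype='float32')

va = a.view()           # same buffer, new array object
ra = a.reshape((4, 3))  # reshape of a contiguous array is also a view

assert np.may_share_memory(a, va)
assert np.may_share_memory(a, ra)
assert not np.may_share_memory(a, b)
```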
@@ -8,10 +8,13 @@ from theano import Op, Type, Apply, Variable, Constant
from theano import tensor, config from theano import tensor, config
from theano import scalar as scal from theano import scalar as scal
import cuda_ndarray.cuda_ndarray as cuda try:
import cuda_ndarray # We must do these imports to be able to create the full doc when nvcc
import cuda_ndarray.cuda_ndarray as cuda
from theano.sandbox.cuda.nvcc_compiler import nvcc_module_compile_str from theano.sandbox.cuda.nvcc_compiler import nvcc_module_compile_str
import cuda_ndarray
except ImportError:
pass
class CudaNdarrayType(Type): class CudaNdarrayType(Type):
...@@ -53,14 +56,18 @@ class CudaNdarrayType(Type): ...@@ -53,14 +56,18 @@ class CudaNdarrayType(Type):
self.dtype_specs() # error checking is done there self.dtype_specs() # error checking is done there
def filter(self, data, strict=False, allow_downcast=None): def filter(self, data, strict=False, allow_downcast=None):
return self.filter_inplace(data, None, strict=strict, allow_downcast=allow_downcast)
def filter_inplace(self, data, old_data, strict=False, allow_downcast=None):
if strict or allow_downcast or isinstance(data, cuda.CudaNdarray): if strict or allow_downcast or isinstance(data, cuda.CudaNdarray):
return cuda.filter(data, self.broadcastable, strict, None) return cuda.filter(data, self.broadcastable, strict, old_data)
else: # (not strict) and (not allow_downcast) else: # (not strict) and (not allow_downcast)
# Check if data.dtype can be accurately casted to self.dtype # Check if data.dtype can be accurately casted to self.dtype
if isinstance(data, numpy.ndarray): if isinstance(data, numpy.ndarray):
up_dtype = scal.upcast(self.dtype, data.dtype) up_dtype = scal.upcast(self.dtype, data.dtype)
if up_dtype == self.dtype: if up_dtype == self.dtype:
return cuda.filter(data, self.broadcastable, strict, None) return cuda.filter(data, self.broadcastable, strict, old_data)
else: else:
raise TypeError( raise TypeError(
'%s, with dtype %s, cannot store a value of ' '%s, with dtype %s, cannot store a value of '
...@@ -75,10 +82,10 @@ class CudaNdarrayType(Type): ...@@ -75,10 +82,10 @@ class CudaNdarrayType(Type):
type(data) is float and type(data) is float and
self.dtype==theano.config.floatX): self.dtype==theano.config.floatX):
return cuda.filter(converted_data, self.broadcastable, return cuda.filter(converted_data, self.broadcastable,
strict, None) strict, old_data)
elif numpy.all(data == converted_data): elif numpy.all(data == converted_data):
return cuda.filter(converted_data, self.broadcastable, return cuda.filter(converted_data, self.broadcastable,
strict, None) strict, old_data)
else: else:
raise TypeError( raise TypeError(
'%s, with dtype %s, cannot store accurately value %s, ' '%s, with dtype %s, cannot store accurately value %s, '
...@@ -87,6 +94,7 @@ class CudaNdarrayType(Type): ...@@ -87,6 +94,7 @@ class CudaNdarrayType(Type):
% (self, self.dtype, data, converted_data, self.dtype), % (self, self.dtype, data, converted_data, self.dtype),
data) data)
@staticmethod @staticmethod
def bound(a): def bound(a):
high = a.gpudata high = a.gpudata
...@@ -112,10 +120,11 @@ class CudaNdarrayType(Type): ...@@ -112,10 +120,11 @@ class CudaNdarrayType(Type):
if a.__class__ is b.__class__: if a.__class__ is b.__class__:
a_l, a_h = CudaNdarrayType.bound(a) a_l, a_h = CudaNdarrayType.bound(a)
b_l, b_h = CudaNdarrayType.bound(b) b_l, b_h = CudaNdarrayType.bound(b)
if b_l>=a_h or a_l >= b_h: if b_l >= a_h or a_l >= b_h:
return False return False
return True return True
else: return False else:
return False
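The bound-based check above reduces to a half-open interval overlap test on device address ranges; a plain-Python sketch of the same logic:

```python
def ranges_overlap(a_l, a_h, b_l, b_h):
    # two address ranges [a_l, a_h) and [b_l, b_h) can share memory
    # unless one ends before the other begins
    return not (b_l >= a_h or a_l >= b_h)

assert ranges_overlap(0, 100, 50, 150)       # overlapping buffers
assert not ranges_overlap(0, 100, 100, 200)  # adjacent, disjoint buffers
assert not ranges_overlap(200, 300, 0, 100)
```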
@staticmethod @staticmethod
def values_eq(a, b): def values_eq(a, b):
...@@ -352,4 +361,8 @@ copy_reg.constructor(CudaNdarray_unpickler) ...@@ -352,4 +361,8 @@ copy_reg.constructor(CudaNdarray_unpickler)
def CudaNdarray_pickler(cnda): def CudaNdarray_pickler(cnda):
return (CudaNdarray_unpickler, (numpy.asarray(cnda),)) return (CudaNdarray_unpickler, (numpy.asarray(cnda),))
copy_reg.pickle(cuda.CudaNdarray, CudaNdarray_pickler, CudaNdarray_unpickler) try:
# In case cuda is not imported.
copy_reg.pickle(cuda.CudaNdarray, CudaNdarray_pickler, CudaNdarray_unpickler)
except NameError:
pass
...@@ -8,15 +8,18 @@ from theano import tensor ...@@ -8,15 +8,18 @@ from theano import tensor
from theano.compile import SharedVariable from theano.compile import SharedVariable
from theano.sandbox.cuda.type import CudaNdarrayType from theano.sandbox.cuda.type import CudaNdarrayType
from theano.sandbox.cuda import filter as type_support_filter try:
# We must do these imports to be able to create the full doc when nvcc is not available
from theano.sandbox.cuda.basic_ops import HostFromGpu, GpuFromHost from theano.sandbox.cuda import filter as type_support_filter
from theano.sandbox.cuda.basic_ops import HostFromGpu, GpuFromHost
except ImportError:
pass
class _operators(tensor.basic._tensor_py_operators): class _operators(tensor.basic._tensor_py_operators):
"""Define a few properties and conversion methods for CudaNdarray Variables. """Define a few properties and conversion methods for CudaNdarray Variables.
The default implementation of arithmetic operators is to build graphs of TensorType The default implementation of arithmetic operators is to build graphs of TensorType
variables. variables.
The optimization pass (specialization) will insert pure GPU implementations. The optimization pass (specialization) will insert pure GPU implementations.
This approach relieves the Cuda-Ops of having to deal with input argument checking and This approach relieves the Cuda-Ops of having to deal with input argument checking and
...@@ -49,9 +52,34 @@ class CudaNdarrayConstant(Constant, _operators): ...@@ -49,9 +52,34 @@ class CudaNdarrayConstant(Constant, _operators):
CudaNdarrayType.Constant = CudaNdarrayConstant CudaNdarrayType.Constant = CudaNdarrayConstant
class CudaNdarraySharedVariable(SharedVariable, _operators): class CudaNdarraySharedVariable(SharedVariable, _operators):
"""
Shared Variable interface to CUDA-allocated arrays
"""
get_value_return_ndarray = True
def get_value(self, borrow=False, return_internal_type=False): def get_value(self, borrow=False, return_internal_type=False):
if return_internal_type: # return a cuda_ndarray """
Return the value of this SharedVariable's internal array.
:param borrow:
permit the return of internal storage, when used in conjunction with
``return_internal_type=True``
:param return_internal_type:
True to return the internal ``cuda_ndarray`` instance rather than a ``numpy.ndarray``
(Default False)
By default ``get_value()`` copies from the GPU to a ``numpy.ndarray`` and returns that
host-allocated array.
``get_value(False,True)`` will return a GPU-allocated copy of the original GPU array.
``get_value(True,True)`` will return the original GPU-allocated array without any
copying.
"""
if return_internal_type or not self.get_value_return_ndarray:
# return a cuda_ndarray
if borrow: if borrow:
return self.container.value return self.container.value
else: else:
...@@ -60,6 +88,37 @@ class CudaNdarraySharedVariable(SharedVariable, _operators): ...@@ -60,6 +88,37 @@ class CudaNdarraySharedVariable(SharedVariable, _operators):
return numpy.asarray(self.container.value) return numpy.asarray(self.container.value)
def set_value(self, value, borrow=False): def set_value(self, value, borrow=False):
"""
Assign `value` to the GPU-allocated array.
:param borrow: ``True`` permits reusing `value` itself, ``False`` requires that this function
copies `value` into internal storage.
:note:
Prior to Theano 0.3.1, set_value did not work in-place on the GPU. This meant that sometimes,
GPU memory for the new value would be allocated before the old memory was released. If you're
running near the limits of GPU memory, this could cause you to run out of GPU memory.
Beginning with Theano 0.3.1, set_value will work in-place on the GPU, if the following conditions
are met:
* The destination on the GPU must be c_contiguous.
* The source is on the CPU.
* The old value must have the same dtype as the new value (which is a given for now,
since only float32 is supported).
* The old and new value must have the same shape.
* The old value is being completely replaced by the new value (not partially modified,
e.g. by replacing some subtensor of it).
* You change the value of the shared variable via set_value, not via the .value
accessors. You should not use the .value accessors anyway, since they will soon be
deprecated and removed.
It is also worth mentioning that, for efficient transfer to the GPU, Theano will make the new data
``c_contiguous``. This can require an extra copy of the data on the host.
This works both when borrow=True and when borrow=False.
"""
if not borrow: if not borrow:
#TODO: check for cuda_ndarray type #TODO: check for cuda_ndarray type
if not isinstance(value, numpy.ndarray): if not isinstance(value, numpy.ndarray):
...@@ -84,11 +143,11 @@ CudaNdarrayType.SharedVariable = CudaNdarraySharedVariable ...@@ -84,11 +143,11 @@ CudaNdarrayType.SharedVariable = CudaNdarraySharedVariable
def cuda_shared_constructor(value, name=None, strict=False, def cuda_shared_constructor(value, name=None, strict=False,
allow_downcast=None, borrow=False, broadcastable=None): allow_downcast=None, borrow=False, broadcastable=None):
"""SharedVariable Constructor for TensorType""" """SharedVariable Constructor for CudaNdarrayType"""
# THIS CONSTRUCTOR TRIES TO CAST VALUE TO A FLOAT32, WHICH THEN GOES ONTO THE CARD # THIS CONSTRUCTOR TRIES TO CAST VALUE TO A FLOAT32, WHICH THEN GOES ONTO THE CARD
# SO INT shared vars, float64 shared vars, etc. all end up on the card. # SO INT shared vars, float64 shared vars, etc. all end up on the card.
# THIS IS NOT THE DEFAULT BEHAVIOUR THAT WE WANT. # THIS IS NOT THE DEFAULT BEHAVIOUR THAT WE WANT.
# SEE float32_shared_constructor # SEE float32_shared_constructor
#TODO: what should strict mean in this context, since we always have to make a copy? #TODO: what should strict mean in this context, since we always have to make a copy?
...@@ -115,22 +174,34 @@ def cuda_shared_constructor(value, name=None, strict=False, ...@@ -115,22 +174,34 @@ def cuda_shared_constructor(value, name=None, strict=False,
def float32_shared_constructor(value, name=None, strict=False, def float32_shared_constructor(value, name=None, strict=False,
allow_downcast=None, borrow=False, broadcastable=None): allow_downcast=None, borrow=False, broadcastable=None):
"""SharedVariable Constructor for TensorType""" """SharedVariable Constructor for CudaNdarrayType from numpy.ndarray or CudaNdarray"""
# if value isn't a float32 ndarray, then raise # if value isn't a float32 ndarray, or a CudaNdarray then raise
if not isinstance(value, numpy.ndarray):
raise TypeError('ndarray required') if not isinstance(value, (numpy.ndarray, theano.sandbox.cuda.CudaNdarray)):
if value.dtype.num != CudaNdarrayType.typenum: raise TypeError('ndarray or CudaNdarray required')
if isinstance(value, numpy.ndarray) and value.dtype.num != CudaNdarrayType.typenum:
raise TypeError('float32 ndarray required') raise TypeError('float32 ndarray required')
if broadcastable is None: if broadcastable is None:
broadcastable = (False,) * len(value.shape) broadcastable = (False,) * len(value.shape)
type = CudaNdarrayType(broadcastable=broadcastable) type = CudaNdarrayType(broadcastable=broadcastable)
deviceval = type_support_filter(value, broadcastable, False, None) get_value_return_ndarray = True
if isinstance(value, theano.sandbox.cuda.CudaNdarray):
get_value_return_ndarray = False
if borrow:
deviceval = value
else:
deviceval = value.copy()
else:
deviceval = type_support_filter(value, broadcastable, False, None)
try: try:
rval = CudaNdarraySharedVariable(type=type, value=deviceval, name=name, strict=strict) rval = CudaNdarraySharedVariable(type=type, value=deviceval, name=name, strict=strict)
except Exception, e: except Exception, e:
print "ERROR", e print "ERROR", e
raise raise
return rval
rval.get_value_return_ndarray = get_value_return_ndarray
return rval
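The `borrow` flag handled in the constructor above follows the usual copy-vs-alias convention; a minimal sketch (hypothetical helper, not Theano's API):

```python
import numpy as np

def to_shared_storage(value, borrow=False):
    # borrow=True lets the shared variable alias the caller's array;
    # borrow=False copies so later mutation of `value` cannot leak in
    return value if borrow else value.copy()

v = np.ones(3, dtype='float32')
assert to_shared_storage(v, borrow=True) is v
assert to_shared_storage(v, borrow=False) is not v
assert np.array_equal(to_shared_storage(v), v)
```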
...@@ -400,11 +400,16 @@ def test_neibs_grad_verify_grad_warp_centered(): ...@@ -400,11 +400,16 @@ def test_neibs_grad_verify_grad_warp_centered():
try: try:
unittest_tools.verify_grad(fn, [images_val], mode=mode_without_gpu) unittest_tools.verify_grad(fn, [images_val], mode=mode_without_gpu)
raise Exception("Expected an error") raise Exception("Expected an error")
if cuda.cuda_available:
unittest_tools.verify_grad(fn, [images_val], mode=mode_with_gpu)
except NotImplementedError: except NotImplementedError:
pass pass
if cuda.cuda_available:
try:
unittest_tools.verify_grad(fn, [images_val], mode=mode_with_gpu)
raise Exception("Expected an error")
except NotImplementedError:
pass
if __name__ == '__main__': if __name__ == '__main__':
#test_neibs_gpu() #test_neibs_gpu()
#test_neibs() #test_neibs()
......
...@@ -44,7 +44,7 @@ import tensor ...@@ -44,7 +44,7 @@ import tensor
import misc.safe_asarray as safe_asarray import misc.safe_asarray as safe_asarray
from tensor import opt, TensorType from tensor import opt, TensorType
import gof import gof
from gof import Optimizer, toolbox, Op, Apply from gof import Optimizer, toolbox, Op, Apply, Variable
from compile import optdb, SharedVariable, function, Param from compile import optdb, SharedVariable, function, Param
import compile import compile
import gradient import gradient
...@@ -1559,8 +1559,15 @@ class Scan(Op): ...@@ -1559,8 +1559,15 @@ class Scan(Op):
theano.config.floatX)) theano.config.floatX))
inner_gfn_ins = inner_g_outs + self.inputs inner_gfn_ins = inner_g_outs + self.inputs
g_args = [self.n_steps] + g_outs[:self.n_outs_not_shared] \
+ scan_outputs + args[1:] # Make sure you don't have numbers in here
if not isinstance(self.n_steps, Variable):
n_steps = tensor.as_tensor(self.n_steps)
else:
n_steps = self.n_steps
g_args = [n_steps] + g_outs[:self.n_outs_not_shared] \
+ scan_outputs + args[1:]
truncate_gradient = self.truncate_gradient truncate_gradient = self.truncate_gradient
for x in self.store_steps[:self.n_outs_not_shared]: for x in self.store_steps[:self.n_outs_not_shared]:
if x>0 : if x>0 :
...@@ -1571,8 +1578,11 @@ class Scan(Op): ...@@ -1571,8 +1578,11 @@ class Scan(Op):
self.n_seqs, self.n_outs, self.n_outs_not_shared, self.n_seqs, self.n_outs, self.n_outs_not_shared,
self.go_backwards, self.seqs_taps, self.outs_taps, self.go_backwards, self.seqs_taps, self.outs_taps,
truncate_gradient) truncate_gradient)
g_scan_outs = g_scan(g_args) g_scan_outs = g_scan(g_args)
# We need to add several None's fpr shared vars with updates if not type(g_scan_outs) in (list, tuple):
g_scan_outs = [ g_scan_outs ]
# We need to add several None's for shared vars with updates
gradients = [None] + g_scan_outs[:self.n_seqs+self.n_outs_not_shared] gradients = [None] + g_scan_outs[:self.n_seqs+self.n_outs_not_shared]
gradients += [None for i in xrange(self.n_outs-self.n_outs_not_shared)] gradients += [None for i in xrange(self.n_outs-self.n_outs_not_shared)]
gradients += g_scan_outs[self.n_seqs+self.n_outs_not_shared:] gradients += g_scan_outs[self.n_seqs+self.n_outs_not_shared:]
......
...@@ -15,7 +15,7 @@ def sparse_constructor(value, name=None, strict=False, allow_downcast=None, ...@@ -15,7 +15,7 @@ def sparse_constructor(value, name=None, strict=False, allow_downcast=None,
writeme writeme
""" """
if not isinstance(value, scipy.sparse.spmatrix): if not isinstance(value, scipy.sparse.spmatrix):
raise TypeError() raise TypeError("Expected a sparse matrix in the sparse shared variable constructor. Received: %s" % value.__class__)
if format is None: if format is None:
format = value.format format = value.format
...@@ -24,5 +24,3 @@ def sparse_constructor(value, name=None, strict=False, allow_downcast=None, ...@@ -24,5 +24,3 @@ def sparse_constructor(value, name=None, strict=False, allow_downcast=None,
value = copy.deepcopy(value) value = copy.deepcopy(value)
return SparseTensorSharedVariable(type=type, value=value, name=name, return SparseTensorSharedVariable(type=type, value=value, name=name,
strict=strict, allow_downcast=allow_downcast) strict=strict, allow_downcast=allow_downcast)
...@@ -468,7 +468,7 @@ def test_shape_i(): ...@@ -468,7 +468,7 @@ def test_shape_i():
a = SparseType('csr', dtype=sparse_dtype)() a = SparseType('csr', dtype=sparse_dtype)()
f = theano.function([a], a.shape[1], mode='FAST_RUN') f = theano.function([a], a.shape[1], mode='FAST_RUN')
assert f(sp.csr_matrix(random_lil((100,10), sparse_dtype, 3)))==(10) assert f(sp.csr_matrix(random_lil((100,10), sparse_dtype, 3))) == 10
def test_shape(): def test_shape():
# Test that getting the shape of a sparse variable # Test that getting the shape of a sparse variable
...@@ -501,11 +501,20 @@ def test_may_share_memory(): ...@@ -501,11 +501,20 @@ def test_may_share_memory():
import theano.tensor.tests.test_sharedvar import theano.tensor.tests.test_sharedvar
test_shared_options=theano.tensor.tests.test_sharedvar.makeSharedTester( test_shared_options=theano.tensor.tests.test_sharedvar.makeSharedTester(
theano.sparse.shared, 'float64', shared_constructor_ = theano.sparse.shared,
True, True, True, scipy.sparse.csc_matrix, scipy.sparse.issparse, dtype_ = 'float64',
lambda a: dense_from_sparse(a*2.), get_value_borrow_true_alias_ = True,
lambda a: numpy.asarray((a*2).todense()), shared_borrow_true_alias_ = True,
scipy.sparse.csr_matrix) set_value_borrow_true_alias_ = True,
set_value_inplace_ = False,
set_casted_value_inplace_ = False,
shared_constructor_accept_ndarray_ = False,
internal_type_ = scipy.sparse.csc_matrix,
test_internal_type_ = scipy.sparse.issparse,
theano_fct_ = lambda a: dense_from_sparse(a*2.),
ref_fct_ = lambda a: numpy.asarray((a*2).todense()),
cast_value_ = scipy.sparse.csr_matrix)
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()
...@@ -3538,8 +3538,16 @@ tilegrad = TileGrad() ...@@ -3538,8 +3538,16 @@ tilegrad = TileGrad()
class Tile(Op): class Tile(Op):
"""Tiles its input according to reps. Reps is of same dimension as x """
and contains the number of times to tile x in each dimension""" Construct an array by repeating the input x according to reps pattern.
Tiles its input according to reps. The length of reps is the number of
dimensions of x, and each entry gives the number of times to tile x in that dimension.
:see: `numpy.tile <http://docs.scipy.org/doc/numpy/reference/generated/numpy.tile.html>`_
"""
def __init__(self, ndim): def __init__(self, ndim):
self.ndim = ndim self.ndim = ndim
def __eq__(self, other): def __eq__(self, other):
...@@ -4273,7 +4281,7 @@ def grad(cost, wrt, g_cost=None, consider_constant=[], warn_type=False): ...@@ -4273,7 +4281,7 @@ def grad(cost, wrt, g_cost=None, consider_constant=[], warn_type=False):
each element of the list. If an element of `wrt` is not differentiable each element of the list. If an element of `wrt` is not differentiable
with respect to the output, then a zero variable is returned. with respect to the output, then a zero variable is returned.
This function is a wrapper around a the more general function This function is a wrapper around the more general function
``theano.gradient.grad_sources_inputs``. ``theano.gradient.grad_sources_inputs``.
""" """
......
...@@ -941,12 +941,16 @@ class CAReduce(Op): ...@@ -941,12 +941,16 @@ class CAReduce(Op):
# If it's a zero-size array, use scalar_op.identity if available # If it's a zero-size array, use scalar_op.identity if available
if variable.shape[dimension] == 0: if variable.shape[dimension] == 0:
if hasattr(self.scalar_op, 'identity'): if hasattr(self.scalar_op, 'identity'):
variable = self.scalar_op.identity variable = numpy.array(self.scalar_op.identity)
break break
else: else:
raise ValueError("Input (%s) has zero-size on axis %s, but self.scalar_op (%s) has no attribute 'identity'" % (variable, dimension, self.scalar_op)) raise ValueError("Input (%s) has zero-size on axis %s, but self.scalar_op (%s) has no attribute 'identity'" % (variable, dimension, self.scalar_op))
else: else:
variable = self.ufunc.reduce(variable, dimension) variable = self.ufunc.reduce(variable, dimension)
variable = numpy.asarray(variable)
if numpy.may_share_memory(variable, input):
# perhaps numpy is clever for reductions of size 1? We don't want this.
variable = variable.copy()
output[0] = theano._asarray(variable, dtype = node.outputs[0].type.dtype) output[0] = theano._asarray(variable, dtype = node.outputs[0].type.dtype)
else: else:
output[0] = numpy.copy(variable) output[0] = numpy.copy(variable)
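NumPy's ufunc reductions follow the same convention the code above relies on: reducing over an empty axis yields the operation's identity element.

```python
import numpy as np

empty = np.zeros((0, 3))
# the identity of addition is 0, of multiplication 1
assert np.array_equal(np.add.reduce(empty, axis=0), [0.0, 0.0, 0.0])
assert np.array_equal(np.multiply.reduce(empty, axis=0), [1.0, 1.0, 1.0])
```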
...@@ -1169,27 +1173,79 @@ class Prod(CAReduce): ...@@ -1169,27 +1173,79 @@ class Prod(CAReduce):
).get(idtype, idtype) ).get(idtype, idtype)
def grad(self, (prod_in, ), (gz, )): def grad(self, (prod_in, ), (gz, )):
'''
The gradient of this Op would be very easy, were it not for the case
where zeros are present in a given "group" (ie. elements reduced
together to form the product).
If no zeros are found in the elements of the product, then the
partial derivative of the product relative to one of the elements
(one of the inputs) is simply the product of the other elements.
That's easy to see from the chain rule.
Now the trick (with no zeros) is to take the overall product, then
for every original element, the partial derivative is given by
this product divided by the element itself (which equals the product
of the other terms). This is easy to do by broadcasting the original
product.
(Note that we also need to broadcast-multiply by the "incoming gradient",
ie. the gradient of the cost relative to the output/product).
-----
With zeros, things get more complicated. For a given group, we have 3
cases:
* No zeros in the group. Use previous trick.
* If only one zero is present, then the gradient for that element is
non-zero, but is zero for all others.
* If more than one zero is present, then all the derivatives are zero.
For the last two cases (with 1 or more zeros), we can't use the division
trick, as this gives divisions by 0.
Implementing that case-by-case logic is not as trivial, so a bunch of
hacks are piled down here to do it. Notably, for the "only one zero"
case, there's a special Op that computes the product of the elements
in the group, minus the zero (see ProdWithoutZeros). The trick is then
to use the division trick for groups with no zero, to use the
ProdWithoutZeros op where there's only one zero, and to output a
derivative of zero for any element part of a group with more than
one zero.
I do this by first counting the number of zeros in each group (see
the "T.eq()" bits), then taking this or that behavior (see T.switch)
based on the result of this count.
'''
if prod_in.dtype[0:3] in ('int','uin'): if prod_in.dtype[0:3] in ('int','uin'):
return [None] return [None]
# Prepare the broadcasting that is used everywhere to broadcast
# over the original groups (ie. broadcast over the elements of a given
# product)
gz = as_tensor_variable(gz) gz = as_tensor_variable(gz)
axis = self.axis axis = self.axis
if axis is None: if axis is None:
axis = range(prod_in.type.ndim) axis = range(prod_in.type.ndim)
if axis == (): if axis == ():
return gz, return gz,
new_dims = [] new_dims = []
i = 0 i = 0
for j, _ in enumerate(prod_in.type.broadcastable): for j, _ in enumerate(prod_in.type.broadcastable):
if j in axis: if j in axis:
new_dims.append('x') new_dims.append('x')
else: else:
new_dims.append(i) new_dims.append(i)
i += 1 i += 1
# result of the product, broadcastable over groups
prod_out = self(prod_in).dimshuffle(new_dims) prod_out = self(prod_in).dimshuffle(new_dims)
# incoming gradient, broadcastable over groups
gz = gz.dimshuffle(new_dims) gz = gz.dimshuffle(new_dims)
# division trick if we don't have zeros. This will contain
# NaNs to be eliminated in the T.switch if we do have zeros.
grad_case_without_zeros = (gz * prod_out / prod_in) grad_case_without_zeros = (gz * prod_out / prod_in)
if self.no_zeros_in_input: if self.no_zeros_in_input:
...@@ -1198,13 +1254,22 @@ class Prod(CAReduce): ...@@ -1198,13 +1254,22 @@ class Prod(CAReduce):
else: else:
T = theano.tensor T = theano.tensor
where_zeros = T.eq(prod_in, 0.0) where_zeros = T.eq(prod_in, 0.0)
sum_where_zeros = T.sum(where_zeros, axis=self.axis) sum_where_zeros = T.sum(where_zeros, axis=self.axis)
groups_with_single_zero = T.eq(sum_where_zeros, 1.0).dimshuffle(new_dims) groups_with_single_zero = T.eq(sum_where_zeros, 1).dimshuffle(new_dims)
# tensor with 0 everywhere except for those places where
# a 0 part of a group with a single zero was to be found
where_single_zero = groups_with_single_zero * where_zeros where_single_zero = groups_with_single_zero * where_zeros
where_gz_not_zero = T.neq(gz, 0.0) # further optimization to avoid computing ProdWithoutZeros
# if the incoming gradient is 0
where_gz_not_zero = T.neq(gz, 0.0)
# only take ProdWithoutZeros for the groups with single zeros
# with non-null incoming gradient
where_to_take_prod_without_zeros = \ where_to_take_prod_without_zeros = \
groups_with_single_zero * where_gz_not_zero groups_with_single_zero * where_gz_not_zero
# preprocess the original input so that we set 0 everywhere
# except for groups that contain a single zero, to avoid computing
# multiplications on other groups
prod_without_zeros_in = where_to_take_prod_without_zeros * prod_in prod_without_zeros_in = where_to_take_prod_without_zeros * prod_in
# TODO: put lazy switch here, if it'd work # TODO: put lazy switch here, if it'd work
# this is pretty efficient already (no multiplication if 0), but # this is pretty efficient already (no multiplication if 0), but
...@@ -1212,7 +1277,8 @@ class Prod(CAReduce): ...@@ -1212,7 +1277,8 @@ class Prod(CAReduce):
prod_without_zeros = ProdWithoutZeros(axis=self.axis)(prod_without_zeros_in) prod_without_zeros = ProdWithoutZeros(axis=self.axis)(prod_without_zeros_in)
prod_without_zeros = prod_without_zeros.dimshuffle(new_dims) prod_without_zeros = prod_without_zeros.dimshuffle(new_dims)
groups_without_zeros = T.eq(sum_where_zeros, 0.0).dimshuffle(new_dims) groups_without_zeros = T.eq(sum_where_zeros, 0).dimshuffle(new_dims)
final_grad = T.switch(groups_without_zeros, grad_case_without_zeros, final_grad = T.switch(groups_without_zeros, grad_case_without_zeros,
T.switch(where_single_zero, prod_without_zeros, 0.0) * gz) T.switch(where_single_zero, prod_without_zeros, 0.0) * gz)
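The three cases described in the docstring can be checked numerically with a small NumPy sketch (a stand-in for the symbolic T.switch logic above, not the actual Op):

```python
import numpy as np

def prod_grad(x, gz=1.0):
    """d(prod(x))/dx_i for a 1-d group, handling zeros case by case."""
    x = np.asarray(x, dtype=float)
    n_zeros = np.count_nonzero(x == 0)
    if n_zeros == 0:
        return gz * np.prod(x) / x         # division trick
    if n_zeros == 1:
        g = np.zeros_like(x)
        g[x == 0] = np.prod(x[x != 0])     # product of the other elements
        return gz * g
    return np.zeros_like(x)                # two or more zeros: all grads are 0

assert np.allclose(prod_grad([2.0, 3.0, 4.0]), [12.0, 8.0, 6.0])
assert np.allclose(prod_grad([0.0, 3.0, 4.0]), [12.0, 0.0, 0.0])
assert np.allclose(prod_grad([0.0, 0.0, 4.0]), [0.0, 0.0, 0.0])
```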
...@@ -1228,19 +1294,28 @@ class Prod(CAReduce): ...@@ -1228,19 +1294,28 @@ class Prod(CAReduce):
return () return ()
class MulWithoutZeros(scalar.BinaryScalarOp): class MulWithoutZeros(scalar.BinaryScalarOp):
identity = 1. # "identity" here is zero, as in Reduce we don't want to start
# with reducing (1, something_else): this leads to the erroneous
# case where a vector of zeros is reduced by binary reductions
# of (1, 0), which always ends up as 1 (ie. the result for
# the c version, for the product of [0,0,0], is 1.0)
identity = 0.
commutative = True commutative = True
associative = True associative = True
def impl(self, *inputs): def impl(self, x, y):
if inputs[0] == 0.: if x == 0:
return inputs[1] return y
if inputs[1] == 0.: if y == 0:
return inputs[0] return x
return inputs[1] * inputs[2] return x*y
def c_code(self, node, name, (x,y), (z, ), sub): def c_code(self, node, name, (x,y), (z, ), sub):
return ("%(z)s = ((%(x)s == 0) ? (%(y)s) : " + \ return ("%(z)s = ((%(x)s == 0) ? (%(y)s) : " + \
"((%(y)s == 0) ? (%(x)s) : ((%(y)s)*(%(x)s))) );") % locals() "((%(y)s == 0) ? (%(x)s) : ((%(y)s)*(%(x)s))) );") % locals()
def c_code_cache_version(self):
return (1,)
mul_without_zeros = MulWithoutZeros(scalar.upcast_out, name = 'mul_without_zeros') mul_without_zeros = MulWithoutZeros(scalar.upcast_out, name = 'mul_without_zeros')
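The comment about the identity element can be checked with a pure-Python fold: starting the reduction from 1 mis-handles an all-zero group, while starting from 0 does not.

```python
from functools import reduce

def mul_without_zeros(x, y):
    # treat 0 as "no value seen yet" and skip it
    if x == 0:
        return y
    if y == 0:
        return x
    return x * y

assert reduce(mul_without_zeros, [0.0, 0.0, 0.0], 1.0) == 1.0  # wrong with identity 1
assert reduce(mul_without_zeros, [0.0, 0.0, 0.0], 0.0) == 0.0  # correct with identity 0
assert reduce(mul_without_zeros, [0.0, 2.0, 3.0], 0.0) == 6.0  # product skipping the zero
```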
class ProdWithoutZeros(CAReduce): class ProdWithoutZeros(CAReduce):
...@@ -1263,4 +1338,3 @@ class ProdWithoutZeros(CAReduce): ...@@ -1263,4 +1338,3 @@ class ProdWithoutZeros(CAReduce):
return "ProdWithoutZeros" return "ProdWithoutZeros"
else: else:
return "ProdWithoutZeros{%s}" % ", ".join(map(str, self.axis)) return "ProdWithoutZeros{%s}" % ", ".join(map(str, self.axis))
...@@ -595,5 +595,5 @@ def computeH(V,W,b,d): ...@@ -595,5 +595,5 @@ def computeH(V,W,b,d):
return H return H
from . import ConvGrad3D import ConvGrad3D
from . import ConvTransp3D import ConvTransp3D
...@@ -90,6 +90,10 @@ def broadcast_like(value, template, env): ...@@ -90,6 +90,10 @@ def broadcast_like(value, template, env):
if template not in shape_of: if template not in shape_of:
raise NotImplementedError('broadcast_like currently requires the template Variable to be in the env already') raise NotImplementedError('broadcast_like currently requires the template Variable to be in the env already')
rval = T.alloc(T.cast(value, template.dtype), *shape_of[template]) rval = T.alloc(T.cast(value, template.dtype), *shape_of[template])
# the template may have 1s in its shape without being broadcastable
if rval.broadcastable != template.broadcastable:
rval = T.unbroadcast(rval, *[i for i in xrange(rval.ndim) if rval.broadcastable[i]
and not template.broadcastable[i]])
assert rval.type == template.type assert rval.type == template.type
return rval return rval
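A NumPy sketch of what broadcast_like produces (illustrative only; the real optimization builds a symbolic T.alloc over the template's shape):

```python
import numpy as np

def broadcast_like_np(value, template):
    # expand a scalar or array to the template's shape and dtype
    return np.broadcast_to(np.asarray(value, dtype=template.dtype),
                           template.shape).copy()

t = np.empty((3, 1, 4), dtype='float32')
r = broadcast_like_np(2.0, t)
assert r.shape == (3, 1, 4)
assert r.dtype == t.dtype
assert np.all(r == 2.0)
```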
...@@ -663,14 +667,20 @@ def local_fill_to_alloc(node): ...@@ -663,14 +667,20 @@ def local_fill_to_alloc(node):
elif v.type.broadcastable == node.outputs[0].type.broadcastable: elif v.type.broadcastable == node.outputs[0].type.broadcastable:
# this is a cast # this is a cast
rval = [T.cast(v, node.outputs[0].type.dtype)] rval = [T.cast(v, node.outputs[0].type.dtype)]
elif r.type.broadcastable == node.outputs[0].type.broadcastable:
# we are broadcasting v somehow, but not r
rval = [broadcast_like(v, r, node.env)]
else: else:
# we are broadcasting v somehow # we are broadcasting both v and r,
shape_of = node.env.shape_feature.shape_of # the output shape must be computed
#
# TODO: implement this case (including a test!)
#
# I think the strategy should be to extend the shorter shape vector
# with 1s (how?) and then take the elementwise max of the two.
# - how to flag an error of shape mismatch where broadcasting should be illegal?
return
# TODO: cut out un-necessary dimshuffles of v # TODO: cut out un-necessary dimshuffles of v
rval = [T.alloc(T.cast(v, node.outputs[0].dtype), *shape_of[node.outputs[0]])]
#if rval[0].type != node.outputs[0].type:
#print >> sys.stderr, theano.printing.debugprint(node.outputs[0], file='str')
assert rval[0].type == node.outputs[0].type, ('rval', rval[0].type, assert rval[0].type == node.outputs[0].type, ('rval', rval[0].type,
'orig', node.outputs[0].type, 'orig', node.outputs[0].type,
...@@ -764,10 +774,12 @@ def local_subtensor_make_vector(node): ...@@ -764,10 +774,12 @@ def local_subtensor_make_vector(node):
@gof.local_optimizer([T.Elemwise]) @gof.local_optimizer([T.Elemwise])
def local_useless_elemwise(node): def local_useless_elemwise(node):
""" """
eq(x,x) -> 1
neq(x,x) -> 0 eq(x,x) -> 1
mul(x) -> x neq(x,x) -> 0
add(x) -> x mul(x) -> x
add(x) -> x
identity(x) -> x
""" """
if isinstance(node.op, T.Elemwise): if isinstance(node.op, T.Elemwise):
...@@ -783,6 +795,8 @@ add(x) -> x ...@@ -783,6 +795,8 @@ add(x) -> x
return [node.inputs[0]] return [node.inputs[0]]
if node.op.scalar_op == theano.scalar.add and len(node.inputs)==1: if node.op.scalar_op == theano.scalar.add and len(node.inputs)==1:
return [node.inputs[0]] return [node.inputs[0]]
if node.op.scalar_op == theano.scalar.identity and len(node.inputs)==1:
return [node.inputs[0]]
@register_specialize @register_specialize
...@@ -2255,8 +2269,7 @@ def local_mul_specialize(node): ...@@ -2255,8 +2269,7 @@ def local_mul_specialize(node):
neg ^= True #toggles neg ^= True #toggles
elif N.all(y == 0.0): elif N.all(y == 0.0):
# if we find any zero, we just return right away # if we find any zero, we just return right away
return [T.alloc(numpy.asarray(0, dtype=node.outputs[0].dtype), return [broadcast_like(0, node.outputs[0], node.env)]
*node.env.shape_feature.shape_of[node.outputs[0]])]
else: else:
new_inputs.append(input) new_inputs.append(input)
...@@ -2273,21 +2286,14 @@ def local_mul_specialize(node): ...@@ -2273,21 +2286,14 @@ def local_mul_specialize(node):
            else:
                rval = T.mul(*new_inputs)
            return [broadcast_like(rval, node.outputs[0], node.env)]
        else:
            # there are no variable inputs to mul
            # N.B. this could have been constant-folded...
            if neg:
                return [broadcast_like(-1, node.outputs[0], node.env)]
            else:
                return [broadcast_like(1, node.outputs[0], node.env)]
register_specialize(local_mul_specialize)
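The `broadcast_like` helper that this commit switches to wraps the repeated `T.alloc(numpy.asarray(value, dtype=...), *shape_of[...])` pattern from the old code. A minimal pure-Python sketch of the idea (filling a constant into the shape of a reference output; the real helper additionally casts to the template's dtype and reads shapes from the env's shape feature):

```python
def broadcast_like_sketch(value, shape):
    """Build a nested-list 'tensor' of the given shape filled with value.

    A simplified stand-in for Theano's broadcast_like(value, template, env),
    which allocates `value` broadcast to the template output's inferred shape.
    A 0-d shape () just returns the scalar itself.
    """
    if not shape:
        return value
    return [broadcast_like_sketch(value, shape[1:]) for _ in range(shape[0])]

print(broadcast_like_sketch(0, (2, 3)))  # [[0, 0, 0], [0, 0, 0]]
```

Centralizing the pattern is why the three near-identical `T.alloc(...)` returns in `local_mul_specialize` collapse to one-liners in the diff.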
...
""" """
This file implement specialization optimization that break the canonicalization form This file implement specialization optimization that break the canonization form of the graph.
Currently their is problem with the order of optimization and the definition of definition of canonized graph.
Right now their is a canonization optimization phase that try to make all equivalent graph identical. This is not always the case, but it do many of the basic stuff canonical. We need to extend the definition of canonization to make this true more often.
The problem this file indent to fix in the future is that in the "Equilibrium" specialization optimization phase, there is optimization that request that the graph is canonical, some other request that this is not true, and some other that break the canonicalization for some optimization. As we can't control the order of those optimization, their is case that some optimization requesting a canonical graph won't be applied as optimization that break the canonicalization form of the graph executed before.
To fix this, we need to split the specialization phase into a phase where optimization can't break the canonicalization form and one where this is allowed. This is also needed for the stabilized optimization phase, but as it happen before the specialization phase, this cause less problem.
Also, we should make the env refuse optimization that break the canonization of the graph in the optimizations phases where the graph is supposed to be canonical.
""" """
# TODO: intelligent merge for mul/add
...@@ -30,7 +40,7 @@ from theano import scalar as scal
class MaxAndArgmaxOptimizer(Optimizer):
    """Replace MaxAndArgmax by CAReduce when the argmax is not used.

    This is faster, as MaxAndArgmax doesn't have C code and executes
    in two passes.
    """
...@@ -70,7 +80,7 @@ def local_max_to_min(node):
    This is tested in tensor/tests/test_basic.py:test_min_max

    :note: we don't need an opt that does the reverse, as by default
        the interface puts only MaxAndArgmax into the graph.
    """
    if node.op == T.neg and node.inputs[0].owner:
...@@ -81,5 +91,3 @@ def local_max_to_min(node):
            return [CAReduce(scal.minimum, max.owner.op.axis)(neg.owner.inputs[0])]
    return False
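The rewrite above relies on the identity neg(max(neg(x))) = min(x): negating, taking the maximum, and negating again yields the minimum. A plain-Python sanity sketch of that identity:

```python
def neg_max_neg(xs):
    """Compute -max(-x), which should equal min(x) for any values.

    This is the elementwise identity behind rewriting
    neg(MaxAndArgmax(neg(x))) as a CAReduce with scalar.minimum.
    """
    return -max(-v for v in xs)

vals = [3.5, -1.0, 2.0, -7.25]
print(neg_max_neg(vals) == min(vals))  # True
```

Because the identity holds pointwise, it holds along any reduction axis as well, which is why the optimizer can rewrite per-axis reductions the same way.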
...@@ -104,9 +104,9 @@ class test_Broadcast(unittest.TestCase):
        xv = numpy.asarray(numpy.random.rand(*xsh))
        yv = numpy.asarray(numpy.random.rand(*ysh))
        zv = xv + yv
        f(xv, yv)
        assert xv.shape == zv.shape
    def test_perform(self):
...@@ -217,11 +217,11 @@ class test_CAReduce(unittest.TestCase):
                f(xv)
            except ValueError:
                pass
            else:
                self.fail()
        else:
            self.failUnless((numpy.abs(f(xv) - zv) < 1e-10).all())
        #test CAReduce.infer_shape
        #the Shape op doesn't implement c_code!
...@@ -248,7 +248,7 @@ class test_CAReduce(unittest.TestCase):
        self.with_linker(gof.CLinker(), maximum)
        self.with_linker(gof.CLinker(), minimum)
        #need other dtype than real
        #no c_code for or_, and_
        #self.with_linker(gof.CLinker(), or_)
        #self.with_linker(gof.CLinker(), and_)
...@@ -258,23 +258,28 @@ class test_Prod(unittest.TestCase):
    def setUp(self):
        unittest_tools.seed_rng()
        # we want to allow nans in the matrices, so we disable this DEBUG_MODE check
        mode = theano.compile.mode.get_default_mode()
        mode = copy(mode)
        mode.check_isfinite = False
        self.mode = mode
    def test_verify_grad(self):
        # including zeros, as the case with zeros is important
        # (and special cases: 1 zero in the row, more than 1 zero in the row)
        x_val = numpy.asarray([[1,2,3],[4,5,6],[7,8,9]], dtype='float32')
        x = theano.tensor.dmatrix()
        # now with verify_grad
        unittest_tools.verify_grad(Prod(axis=1), [x_val], mode=self.mode)
        # second time, with some added complexity
        # verify_grad takes the sum of the matrices anyway
        def fn(x2):
            return theano.tensor.sqr(Prod(axis=1)(x2))
        unittest_tools.verify_grad(fn, [x_val], mode=self.mode)
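The zero cases matter because the gradient of a product along a row with respect to one entry is the product of the *other* entries, which the shortcut prod(row)/row[i] gets wrong when row[i] is 0. A pure-Python check of that identity (an illustration of what verify_grad exercises here, not Theano code):

```python
def prod_grad_row(row):
    """d(prod(row))/d(row[i]) = product of all entries except row[i].

    Computed directly, so it stays correct when row contains zeros,
    where the shortcut prod(row) / row[i] would divide by zero.
    """
    grads = []
    for i in range(len(row)):
        g = 1.0
        for j, v in enumerate(row):
            if j != i:
                g *= v
        grads.append(g)
    return grads

print(prod_grad_row([4.0, 5.0, 6.0]))  # [30.0, 24.0, 20.0]
print(prod_grad_row([0.0, 5.0, 6.0]))  # [30.0, 0.0, 0.0]
```

With one zero in the row, only that entry has a nonzero partial; with two or more zeros, every partial is zero, which matches the hand-computed gradients asserted further down in `test_other_grad_tests`.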
    def test_verify_grad_with_zeros(self):
...@@ -287,18 +292,18 @@ class test_Prod(unittest.TestCase):
        x2 = theano.tensor.dmatrix()
        p = Prod(axis=1)(x)
        p2 = Prod(axis=1)(x2)
        fn = theano.function([x,x2],[p-p2], mode=self.mode)
        #print "hand computed diff for each row"
        x2_val = numpy.asarray([[1., 2., 3.003], [0.003,5.,6], [0.,0.,9.01]])
        #print fn(x_val, x2_val)
        fn2 = theano.function([x],[theano.tensor.grad(p.sum(),x)], mode=self.mode)
        #print "real grad"
        #print fn2(x_val)
        fn3 = theano.function([x],[p], mode=self.mode)
        assert numpy.allclose(fn3(x_val), [6.,0.,0.])
        # now with verify_grad
        unittest_tools.verify_grad(Prod(axis=1), [x_val], mode=self.mode)
        # second time, with some added complexity
        # verify_grad takes the sum of the matrices anyway
...@@ -318,11 +323,11 @@ class test_Prod(unittest.TestCase):
        x = theano.tensor.dmatrix()
        x_val = numpy.array([[1,2,3],[0,5,6],[0,0,9]], dtype='float32')
        pwz = ProdWithoutZeros(axis=1)(x)
        fn = theano.function([x], pwz, mode=self.mode)
        assert numpy.allclose(fn(x_val), [6,30,9])
        pwz_a0 = ProdWithoutZeros(axis=0)(x)
        fn_a0 = theano.function([x], pwz_a0, mode=self.mode)
        assert numpy.allclose(fn_a0(x_val), [1, 10, 162])
    def test_other_grad_tests(self):
...@@ -333,24 +338,32 @@ class test_Prod(unittest.TestCase):
        p = Prod(axis=1)
        grad_p = theano.tensor.grad(p(x).sum(), x)
        grad_fn = theano.function([x], grad_p, mode=self.mode)
        assert numpy.allclose(grad_fn(x_val1), [[6.,3.,2.],[30.,0.,0.],[0.,0.,0.]])
        assert numpy.allclose(grad_fn(x_val2), [[0., 0., 2.], [30., 0., 0.], [72., 63., 56.], [0., 0., 90.]])
        p_axis0 = Prod(axis=0)
        grad_p_axis0 = theano.tensor.grad(p_axis0(x).sum(), x)
        grad_fn_axis0 = theano.function([x], grad_p_axis0, mode=self.mode)
        assert numpy.allclose(grad_fn_axis0(x_val2), [[0., 400., 0.],[63., 160., 0.], [0., 100., 0.], [0., 80., 0.]])
        tensor.verify_grad(p, [x_val1], rng=rng, mode=self.mode)
    def test_mul_without_zeros_zeros(self):
        a = numpy.zeros((3,3))
        x = theano.tensor.dmatrix()
        mul1 = ProdWithoutZeros(axis=0)(x)
        fn_debug = theano.function([x], mul1, mode=self.mode)
        fn_debug(a)
if __name__ == '__main__':
    #unittest.main()
    suite = unittest.TestSuite([test_Prod('test_mul_without_zeros_zeros')])
    #suite.addTest(test_Prod('test_verify_grad_with_zeros'))
    #suite.addTest(test_Prod('test_prod_without_zeros'))
    #suite.addTest(test_Prod('test_other_grad_tests'))
    unittest.TextTestRunner().run(suite)
...@@ -1039,6 +1039,34 @@ class T_Scan(unittest.TestCase):
        assert updates[b].type.ndim == b.type.ndim
    def test_scan_as_tensor_on_gradients(self):
        """
        Bug reported by cityhall on scan when computing the gradients
        """
        to_scan = theano.tensor.dvector('to_scan')
        seq = theano.tensor.dmatrix('seq')
        f1 = theano.tensor.dscalar('f1')

        def scanStep(prev, seq, f1):
            return prev + f1 * seq

        scanned, _ = theano.scan(fn=scanStep,
                                 sequences=[seq],
                                 outputs_info=[to_scan],
                                 non_sequences=[f1])
        f_scan = theano.function(inputs=[to_scan, seq, f1], outputs=scanned)
        f_scan([1,2,3], numpy.arange(12).reshape([4,3]), 1.)

        t_grad = theano.tensor.grad(scanned.sum(), wrt=[to_scan, f1],
                                    consider_constant=[seq])
        f_grad = theano.function(inputs=[to_scan, seq, f1], outputs=t_grad)
        f_scan([1,2,3], numpy.arange(12).reshape([4,3]), 1.)
        f_grad([1,2,3], numpy.arange(12).reshape([4,3]), 1.)
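The recurrence exercised by the regression test above can be sketched in plain Python to show what scan computes here (a simplified stand-in for `theano.scan`, not its actual implementation):

```python
def scan_step_sketch(to_scan, seq, f1):
    """Mimic theano.scan with scanStep: prev + f1 * seq_row, elementwise.

    Returns the list of intermediate outputs, like scan's `scanned`,
    with `to_scan` playing the role of outputs_info (the initial state).
    """
    outputs, prev = [], list(to_scan)
    for row in seq:
        prev = [p + f1 * s for p, s in zip(prev, row)]
        outputs.append(prev)
    return outputs

seq = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]  # numpy.arange(12).reshape([4, 3])
print(scan_step_sketch([1, 2, 3], seq, 1.0))
```

Each output row depends on the previous one, which is why taking the gradient through `scanned.sum()` (with `seq` held constant) was a good stress test for scan's gradient code.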
if __name__ == '__main__':
    unittest.main()
""" test code snipet in the Theano tutorials. """ test code snippet in the Theano tutorials.
""" """
import unittest import unittest
...