Commit a01aca11 authored by Ian Goodfellow

merged

===========================
Announcing Theano 0.3.1
===========================
This is a bug/crash fix and small feature release.
The upgrade is recommended for everybody.
For those using the bleeding edge version in the
mercurial repository, we encourage you to update to the `0.3.1` tag.
Deleting old cache
------------------
Since the default path of the cache directory for compiled objects
has changed, we encourage you to delete the previous one.
The easiest way to do that is to execute:
python -c 'import theano; print theano.config.base_compiledir'
and then call "rm -rf" on the returned result.
A new cache directory will then be created next time you import theano.
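The two steps above can be combined in one script. This is a sketch, not part of Theano: the subprocess call assumes a `python` with Theano importable on `PATH`, and the `~/.theano` fallback path is a hypothetical stand-in used only to keep the example self-contained.

```python
import os
import shutil
import subprocess

def compiledir():
    """Ask Theano for its cache location (assumes Theano is installed).

    Falls back to a hypothetical stand-in path when the query fails, so
    this sketch stays runnable without Theano.
    """
    try:
        out = subprocess.check_output(
            ["python", "-c",
             "import theano; print(theano.config.base_compiledir)"])
        return out.decode().strip()
    except Exception:
        return os.path.expanduser("~/.theano")  # stand-in, not authoritative

def delete_cache(path):
    # Equivalent of `rm -rf <path>`; Theano recreates a fresh cache
    # directory on the next `import theano`.
    if os.path.isdir(path):
        shutil.rmtree(path)
```

Calling `delete_cache(compiledir())` then performs the cleanup described above.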
What's New
----------
[Include the content of NEWS.txt here]
Download
--------
You can download Theano from http://pypi.python.org/pypi/Theano.
Description
-----------
Theano is a Python library that allows you to define, optimize, and
efficiently evaluate mathematical expressions involving
multi-dimensional arrays. It is built on top of NumPy. Theano
features:
* tight integration with NumPy: a similar interface to NumPy's.
numpy.ndarrays are also used internally in Theano-compiled functions.
* transparent use of a GPU: perform data-intensive computations up to
140x faster than on a CPU (support for float32 only).
* efficient symbolic differentiation: Theano can compute derivatives
for functions of one or many inputs.
* speed and stability optimizations: avoid nasty bugs when computing
expressions such as log(1+ exp(x) ) for large values of x.
* dynamic C code generation: evaluate expressions faster.
* extensive unit-testing and self-verification: includes tools for
detecting and diagnosing bugs and/or potential problems.
Theano has been powering large-scale computationally intensive
scientific research since 2007, but it is also approachable
enough to be used in the classroom (IFT6266 at the University of Montreal).
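The "speed and stability" bullet above can be made concrete with log(1 + exp(x)) itself. The function below is a plain-Python sketch of the stable rewrite, not Theano's implementation: for large x, exp(x) overflows, but log(1 + exp(x)) = x + log(1 + exp(-x)) stays finite.

```python
import math

def softplus(x):
    """Numerically stable log(1 + exp(x)) (a sketch of the rewrite)."""
    if x > 0:
        # exp(-x) <= 1, so this never overflows; for large x it is ~x.
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))
```

The naive form `math.log(1.0 + math.exp(1000.0))` raises `OverflowError`, while `softplus(1000.0)` returns 1000.0.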
Resources
---------
About Theano:
http://deeplearning.net/software/theano/
About NumPy:
http://numpy.scipy.org/
About Scipy:
http://www.scipy.org/
Machine Learning Tutorial with Theano on Deep Architectures:
http://deeplearning.net/tutorial/
Acknowledgments
---------------
I would like to thank all contributors of Theano. For this particular
release, the people who have helped resolve many outstanding issues:
(in alphabetical order) Frederic Bastien, Arnaud Bergeron, James
Bergstra, Josh Bleecher Snyder, Olivier Delalleau, Guillaume
Desjardins, Dumitru Erhan, Ian Goodfellow, Pascal Lamblin, Razvan
Pascanu, Francois Savard and David Warde-Farley.
Also, thank you to all NumPy and Scipy developers, as Theano builds on
their strengths.
All questions/comments are always welcome on the Theano
mailing-lists ( http://deeplearning.net/software/theano/ )
@@ -5,6 +5,30 @@
Release Notes
=============
Theano 0.3 (2010-11-23)
=======================
This is the first major release of Theano since 0.1. Version 0.2 development started internally but it was never advertised as a release.
There have been so many changes since 0.1 that we have lost track of many of them. Below is a *partial* list of changes since 0.1.
* GPU code using NVIDIA's CUDA framework is now generated for many Ops.
* Some interface changes since 0.1:
* A new "shared variable" system to allow reusing memory space between Theano functions.
* A new memory contract has been formally written for Theano, for people who want to minimize memory copies.
* The old module system has been deprecated.
* By default, inputs to a Theano function will not be silently downcasted (e.g. from float64 to float32).
* An error is now raised when using the result of a logical operation on a Theano variable in an 'if' (i.e. an implicit call to __nonzero__).
* An error is now raised when we receive a non-aligned ndarray as input to a function (this is not supported).
* An error is raised when the list of dimensions passed to dimshuffle() contains duplicates or is otherwise not sensible.
* Call NumPy BLAS bindings for gemv operations in addition to the already supported gemm.
* If gcc is unavailable at import time, Theano now falls back to a Python-based emulation mode after raising a warning.
* An error is now raised when tensor.grad is called on a non-scalar Theano variable (in the past we would implicitly do a sum on the tensor to make it a scalar).
* Added support for "erf" and "erfc" functions.
* The current default value of the parameter axis of theano.{max,min,argmax,argmin,max_and_argmax} is deprecated. We now use the default NumPy behavior of operating on the entire tensor.
* Theano is now available from PyPI and installable through "easy_install" or "pip".
Theano 0.1
==========
......
In trunk since 0.3.1 release
----------------------------
GPU:
* Move to the GPU fused elemwise that have dtypes other than float32 in them (except float64), if the inputs and outputs are float32.
* This allows moving elemwise comparisons to the GPU if we cast the result to float32 afterwards.
Theano 0.3.1 (2011-02-21)
----------------------------
Deprecation:
* The Theano shared variable attribute `value` is deprecated; use `get_value()` or `set_value()` instead.
See http://deeplearning.net/software/theano/tutorial/aliasing.html
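The migration is mechanical: replace reads of `value` with `get_value()` and writes with `set_value()`. The class below is a toy stand-in, not Theano's shared variable, sketching the deprecation pattern described above.

```python
import warnings

class SharedValueStub(object):
    """Toy stand-in (hypothetical, not Theano's class) showing how a
    deprecated `value` attribute can forward to get_value()/set_value()."""

    def __init__(self, value):
        self._value = value

    def get_value(self):
        return self._value

    def set_value(self, new_value):
        self._value = new_value

    @property
    def value(self):  # deprecated accessor
        warnings.warn("`value` is deprecated; use get_value()/set_value()",
                      DeprecationWarning)
        return self._value
```

Code that still reads `.value` keeps working but now emits a `DeprecationWarning`.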
Bugs fixed:
* The random number generator in theano/sandbox/rng_mrg.py did not always return the same sequence of numbers on the CPU and GPU.
* In some cases, there was a (possibly large) fraction of non-random garbage in the returned sequence.
* In Python mode (not the default mode), when the input of an elemwise operation was an empty ndarray, we were not returning an empty ndarray.
* Scan cached the number of steps. This caused no problem on its own, because each time you called Scan the number of steps would get refreshed.
The problem was when you called ScanGrad, which would use the cached number of steps without refreshing it.
To be affected by this bug, one would have to compile two graphs, one containing a Scan and the other the corresponding GradScan, then
call the first function to cache the number of steps, and then call the second function with a different number of steps.
* In GpuConv, errors in conv_patch_stack_reduce when the entire kernel doesn't fit into shared memory.
The error was not found before, as the impact was less than the relative tolerance of 1e-3. Now the relative tolerance is 1e-5.
Crash fixed:
* Fixed an exception that made Theano crash when taking the gradient of DimShuffle in a particular case.
* Compilation crash for GpuElemwise with tensors with a high number of dimensions (~6 or more).
* Disabled a C code generator that made gcc crash on complex types.
* Crash in optimization when an Op has no input.
* Output shape is now computed correctly for matrix-vector multiplication on GPU.
* In Scan, when using numbers as inputs instead of symbolic variables.
* In GradScan, when there is only 1 input in the Scan.
* In GpuSum, bug in calculation of n_blocks for the 10 pattern. (Sum on the row of a matrix)
* Some segfault at exit with GPU code.
Optimization:
* New SpecifyShape op that allows passing more shape info in the graph.
* Speed up gemv by working around scipy's gemv slowness when the matrix is in C order (the default).
* Remove join of only 1 element.
* During optimization, consider one more case in get_constant_value.
GPU:
* cuda_shared.value = X now works inplace!
* cuda_shared_var.set_value(new_ndarray) will overwrite the old value inplace in the most common case.
* Allow creating a CudaNdarraySharedVariable from a CudaNdarray.
* New init_gpu_device Theano flag.
* Fuse GpuElemwise more often (in the case where there are so many inputs that fusing them all would bust the 256 bytes limit of parameter to gpu function).
* CPU join of only 1 element that was not moved to the GPU.
New features:
* tensor.reshape now makes dimensions of length 1 broadcastable.
* tensor.prod now implements the gradient.
* DebugMode now warns if an Op declared itself as returning a view of the input but did not do so.
* This behaviour is a problem because it can block other Ops from being inplace on the same inputs, which can lower the reuse of memory.
* Sparse.structured_dot now works when both matrices are sparse
* Sparse type is now supported by the shape op, and the ShapeFeature optimizer works correctly with them.
* New 3D convolution ops, with CPU and GPU implementations.
* New colors in pydotprint.
Documentation:
* Documented lib.amdlibm and (new) init_gpu_device config variables.
* A new page on the memory aliasing contract of Theano (it was written for 0.3, but an error was hiding it on the web page).
* Revision to the Windows installation instructions.
* The cuda documentation is now generated on the web server.
* Better documentation of .theanorc and its sections.
Unit tests:
* Stop usage of deprecated functions or syntax in the unit tests.
* Better testing of GPU convolution nets.
* Make more tests able to use different random seeds.
* Tests of sparse now use default mode, not a hard-coded one.
* Remove some tests of unimplemented features.
Other:
* The name of the compiledir now includes the Python version, to make it easier for people with many Python versions.
* Added theano.tensor.std as a shortcut to sqrt(var(input=input, axis=axis)).
* Whitespace, tabulation and indentation clean-up in the code.
* Better detection of memory sharing between variables.
@@ -45,15 +45,15 @@ master_doc = 'index'

# General substitutions.
project = 'Theano'
copyright = '2008--2011, LISA lab'

# The default replacements for |version| and |release|, also used in various
# other places throughout the built documents.
#
# The short X.Y version.
version = '0.3.1'
# The full version, including alpha/beta/rc tags.
release = '0.3.1'

# There are two options for replacing |today|: either, you set today to some
# non-false value, then it is used:
......
...@@ -160,13 +160,17 @@ version that it produces in the code I gave above. ...@@ -160,13 +160,17 @@ version that it produces in the code I gave above.
raise TypeError('%s only works on doubles' % self.name) raise TypeError('%s only works on doubles' % self.name)
return gof.Apply(self, [x, y], [double()]) return gof.Apply(self, [x, y], [double()])
def perform(self, node, (x, y), (z, )): def perform(self, node, inp, out):
x, y = inp
z, = out
z[0] = self.fn(x, y) z[0] = self.fn(x, y)
def __str__(self): def __str__(self):
return self.name return self.name
def c_code(self, node, name, (x, y), (z, ), sub): def c_code(self, node, name, inp, out, sub):
x, y = inp
z, = out
return self.ccode % locals() return self.ccode % locals()
......
@@ -363,7 +363,9 @@ arithmetic operators:

        raise TypeError('%s only works on doubles' % self.name)
    return gof.Apply(self, [x, y], [double()])

def perform(self, node, inp, out):
    x, y = inp
    z, = out
    z[0] = self.fn(x, y)

def __str__(self):
......
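The change in both hunks above is mechanical: Python 2's tuple-parameter unpacking (removed in Python 3) is replaced by explicit unpacking inside the method body. A toy Op sketches the new shape; `AddOp` is hypothetical and not from Theano.

```python
class AddOp(object):
    """Hypothetical Op illustrating the new perform() signature."""

    # Old, no-longer-valid syntax:
    #   def perform(self, node, (x, y), (z,)):
    #       z[0] = x + y
    def perform(self, node, inp, out):
        x, y = inp   # unpack the inputs inside the body instead
        z, = out     # `out` holds one storage cell per output
        z[0] = x + y
```

The call site is unchanged: inputs arrive as a tuple and each output is written into its storage cell.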
@@ -240,7 +240,12 @@ documentation to know how to configure them differently.

.. note::

    The tests should be run with the ``THEANO_FLAGS`` ``device=cpu`` (default).
    Otherwise, it will generate false errors. If you have a GPU, it will
    automatically be used to run GPU-related tests.

    If you want the GPU-related tests to run on a specific GPU device, and not
    the default one, you should use :attr:`~config.init_gpu_device`, for
    instance ``THEANO_FLAGS=init_gpu_device=gpu1``.
All tests should pass except those marked as ``KnownFailureTest``. If some
test fails on your machine, you are encouraged to tell us what went wrong on
@@ -248,10 +253,10 @@ the ``theano-users@googlegroups.com`` mailing list.

.. note::

    ``warn.ignore_bug_before=all`` removes warnings that you don't need to see
    here. It is also recommended for a new user to set this flag to a
    different value in their ``.theanorc`` file. See
    :attr:`.config.warn.ignore_bug_before` for more details.
Troubleshooting: Make sure you have a BLAS library
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -533,8 +538,8 @@ used within a MinGW Shell (not available if you only installed Python(x,y)).

    C:\Users\login>echo %PYTHONPATH%
    C:\Users\login\Theano

- Create a new ``.theanorc`` text file (or ``.theanorc.txt``, whichever is easier
  for you to create under Windows) in your user profile directory (the directory
  you are in when you start a new command prompt with ``cmd``), containing the
  following two lines:
@@ -543,6 +548,16 @@ used within a MinGW Shell (not available if you only installed Python(x,y)).

      [blas]
      ldflags =

  You do not need to do the following now, because it is not usually needed, but if
  later on, when running Theano, you see an error message that looks like:

      *error: 'assert' was not declared in this scope*

  then you will have to add another section:

  .. code-block:: cfg

      [gcc]
      cxxflags = -IC:\MinGW\include
- You are now ready to run Theano.
  It will use NumPy for dot products, which is still pretty fast (see below for
  optional instructions on how to compile your own BLAS library).
......
@@ -4,39 +4,107 @@

How to make a release
==================================================

Get a fresh copy of the repository
==================================

Clone the code::

    hg clone http://hg.assembla.com/theano Theano-0.X

It does not have to be in your PYTHONPATH.

Update the version number
=========================

Edit ``setup.py`` to contain the newest version number::

    cd Theano-0.X
    vi setup.py # Edit the MAJOR, MINOR, MICRO and SUFFIX
The homepage must link to the download URL, for PyPI to correctly get the
code.
Edit ``doc/index.txt`` to contain a link to what will be the download URL::
vi doc/index.txt # Edit the link to downloads/Theano-0.X.tar.gz
``conf.py`` in the ``doc/`` directory should be updated in the following ways:

* Change the ``version`` and ``release`` variables to the new version number.
* Change the upper copyright year to the current year if necessary.

``NEWS.txt`` usually contains the name and date of the release; change them
too.
Tag the release
===============

You will need to commit the previous changes, tag the resulting version, and
push that into the original repository. The syntax is something like the
following::

    hg commit -m"modifications for 0.X release" setup.py doc/conf.py NEWS.txt
    hg tag 0.X
    hg push
The documentation will be automatically regenerated in the next few hours.
Generate and upload the package
===============================
On PyPI
-------
Now change ``ISRELEASED`` in setup.py to ``True``.

Finally, use setuptools to register and upload the release::

    python setup.py register sdist --formats=gztar,zip upload
This command registers and uploads the package to pypi.python.org. To be able
to do that, you must register on PyPI (you can create a new account, or use
OpenID), and be listed among the "Package Index Owners" of Theano.
On freshmeat
------------
Theano project page at freshmeat is `here <http://freshmeat.net/projects/theano>`__.
The package itself is not uploaded to freshmeat; the only things to update are
the description and tags.
You can request the rights to add a release from an admin (for instance Fred),
pointing them to `the "roles" page
<http://freshmeat.net/projects/theano/roles>`__. Then, create a new release from
`the "releases" page <http://freshmeat.net/projects/theano/releases>`__.
On mloss.org
------------
Project page is at http://mloss.org/software/view/241/.
Account jaberg is listed as submitter.
1. log in as jaberg to mloss
2. search for theano and click the logo
3. press 'update this project' on the left and change
- the version number
- the download link
- the description of what has changed
4. press save
Make sure the "what's changed" text isn't too long because it will show up on
the front page of mloss. You have to indent bullet lines by 4 spaces I think in
the description.
You can "update this project" and save lots of times to get the revision text
right. Just do not change the version number.
Finally
-------
Change ``ISRELEASED`` back to ``False``. Change ``ISRELEASED`` back to ``False``.
Announce the release
====================
Generate an e-mail from the template in ``EMAIL.txt``, including content
from ``NEWS.txt``, and send it to the following mailing lists:
* theano-users
* theano-announce
* numpy-discussion@scipy.org
* scipy-user@scipy.org
@@ -43,9 +43,8 @@ Environment Variables

.. envvar:: THEANO_FLAGS

    This is a list of comma-delimited key=value pairs that control
    Theano's behavior.

    For example, in bash, you can override your :envvar:`THEANORC` defaults
    for <myscript>.py by typing this:
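The key=value format can be sketched with a small parser. This is an illustration only, not Theano's actual flag-parsing code; the function name is hypothetical.

```python
def parse_theano_flags(flags):
    """Split a comma-delimited key=value string into a dict (a sketch,
    not Theano's parser)."""
    conf = {}
    for pair in flags.split(","):
        if not pair:
            continue  # tolerate stray commas
        key, _sep, value = pair.partition("=")
        conf[key.strip()] = value.strip()
    return conf
```

For example, `parse_theano_flags("floatX=float32,device=gpu0")` yields a dict with the two overrides.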
@@ -101,34 +100,41 @@ import theano and print the config variable, as in:

.. attribute:: device

    String value: either ``'cpu'``, ``'gpu'``, ``'gpu0'``, ``'gpu1'``,
    ``'gpu2'``, or ``'gpu3'``

    Default device for computations. If ``gpu*``, change the default to try
    to move computation to it, and to put shared variables of float32 on it.

    Choose the default compute device for Theano graphs. Setting this to a
    ``gpu*`` string will make Theano try by default to move computation to it,
    and put shared variables of float32 on it by default.
    ``'gpu'`` lets the driver select the GPU to use, while ``'gpu?'`` makes
    Theano try to use a specific device. If we are not able to use the GPU,
    either we fall back on the CPU, or an error is raised, depending on the
    :attr:`force_device` flag.
.. attribute:: force_device

    Bool value: either ``True`` or ``False``

    Default: ``False``

    If ``True``, we raise an error if we cannot use the specified
    :attr:`device`. If ``False``, we fall back to the CPU.
.. attribute:: init_gpu_device

    String value: either ``''``, ``'gpu'``, ``'gpu0'``, ``'gpu1'``, ``'gpu2'``,
    or ``'gpu3'``

    Initialize the GPU device to use.
    When its value is ``gpu*``, the Theano flag :attr:`device` must be ``"cpu"``.
    Unlike :attr:`device`, setting this flag to a specific GPU will not
    try to use this device by default; in particular it will **not** move
    computations, nor shared variables, to the specified GPU.

    This flag is useful to run GPU-specific tests on a particular GPU, instead
    of using the default one.
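For instance, a ``.theanorc`` sketch that keeps computation on the CPU while initializing ``gpu1`` for GPU-specific tests might look like the following (the section placement is an assumption; flag names are as documented above):

```cfg
[global]
device = cpu
init_gpu_device = gpu1
force_device = False
```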
.. attribute:: floatX
......
@@ -89,9 +89,13 @@ write an Op:**

    return x * numpy.log(x)
def impl(self, x):
    return XlogX.st_impl(x)

def grad(self, inp, grads):
    x, = inp
    gz, = grads
    return [gz * (1 + scalar.log(x))]

def c_code(self, node, name, inp, out, sub):
    x, = inp
    z, = out
    if node.inputs[0].type in [scalar.float32, scalar.float64]:
        return """%(z)s =
            %(x)s == 0.0
......
@@ -77,12 +77,13 @@ subsequently make to ``np_array`` have no effect on our shared variable.

    np_array += 1 # now it is an array of 2.0 s

    s_default.get_value() # -> array([1.0, 1.0])
    s_false.get_value() # -> array([1.0, 1.0])
    s_true.get_value() # -> array([2.0, 2.0])

If we are running this with the CPU as the device,
then changes we make to np_array *right away* will show up in
``s_true.get_value()``,
because numpy arrays are mutable, and ``s_true`` is using the ``np_array``
object as its internal buffer.
@@ -101,8 +102,8 @@ will terminate the aliasing).

It is safe practice (and a good idea) to use ``borrow=True`` in a shared
variable constructor when the shared variable stands for a large object (in
terms of memory footprint) and you do not want to create copies of it in
memory.

It is not a reliable technique to use ``borrow=True`` to modify shared variables
by side-effect, because with some devices (e.g. GPU devices) this technique will
......
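The borrow semantics in the hunks above can be sketched with a plain Python list standing in for the shared buffer. This toy class is not Theano's; it only illustrates copy-on-construct versus aliasing.

```python
class SharedStub(object):
    """Toy stand-in (not Theano) illustrating borrow semantics."""

    def __init__(self, value, borrow=False):
        # borrow=True aliases the caller's object; borrow=False copies it.
        self._value = value if borrow else list(value)

    def get_value(self, borrow=False):
        # Symmetrically, borrow=True exposes the internal buffer.
        return self._value if borrow else list(self._value)

data = [1.0, 1.0]
s_false = SharedStub(data, borrow=False)
s_true = SharedStub(data, borrow=True)
data[0] += 1.0  # mutates the buffer that s_true aliases
```

After the mutation, `s_false.get_value()` still reads `[1.0, 1.0]` while `s_true.get_value()` reads `[2.0, 1.0]`.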
@@ -256,10 +256,11 @@ internal state, and returns the old state value.

This code introduces a few new concepts. The ``shared`` function constructs
so-called :term:`shared variables`. These are hybrid symbolic and non-symbolic
variables. Shared variables can be used in symbolic expressions just like
the objects returned by ``dmatrices(...)``, but they also have an internal
value, which defines the value taken by this symbolic variable in *all* the
functions that use it. It is called a *shared* variable because its value is
shared between many functions. The value can be accessed and modified by the
``.get_value()`` and ``.set_value()`` methods. We will come back to this soon.

The other new thing in this code is the ``updates`` parameter of ``function``.
``updates`` is a list of pairs of the form (shared variable, new expression).
@@ -274,23 +275,23 @@ Anyway, let's try it out!

.. If you modify this code, also change :
.. theano/tests/test_tutorial.py:T_examples.test_examples_8

>>> state.get_value()
array(0)
>>> accumulator(1)
array(0)
>>> state.get_value()
array(1)
>>> accumulator(300)
array(1)
>>> state.get_value()
array(301)

It is possible to reset the state. Just use the ``.set_value()`` method:

>>> state.set_value(-1)
>>> accumulator(3)
array(-1)
>>> state.get_value()
array(2)
As we mentioned above, you can define more than one function to use the same
@@ -302,7 +303,7 @@ shared variable. These functions can both update the value.

>>> decrementor = function([inc], state, updates=[(state, state-inc)])
>>> decrementor(2)
array(2)
>>> state.get_value()
array(0)
You might be wondering why the updates mechanism exists. You can always
@@ -329,7 +330,7 @@ for the purpose of one particular function.

...        givens=[(state, foo)])
>>> skip_shared(1, 3) # we're using 3 for the state, not state.value
array(7)
>>> state.get_value() # old state still there, but we didn't use it
array(0)

The givens parameter can be used to replace any symbolic variable, not just a
...@@ -411,9 +412,11 @@ Seedings Streams ...@@ -411,9 +412,11 @@ Seedings Streams
Random variables can be seeded individually or collectively. Random variables can be seeded individually or collectively.
You can seed just one random variable by seeding or assigning to the You can seed just one random variable by seeding or assigning to the
``.rng.value`` attribute. ``.rng`` attribute, using ``.rng.set_value()``.
>>> rv_u.rng.value.seed(89234) # seeds the generator for rv_u >>> rng_val = rv_u.rng.get_value(borrow=True) # Get the rng for rv_u
>>> rng_val.seed(89234) # seeds the generator
>>> rv_u.rng.set_value(rng_val, borrow=True) # Assign back seeded rng
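Under the hood, each random variable's generator is an ordinary numpy `RandomState`. A sketch of the same get/seed/set round-trip, using a plain dict as a stand-in for the shared container (an assumption for illustration; numpy only, no Theano):

```python
import numpy

# A dict stands in for the shared-variable container behind rv_u.rng.
container = {"rng": numpy.random.RandomState()}

rng_val = container["rng"]    # like rv_u.rng.get_value(borrow=True)
rng_val.seed(89234)           # seeds the generator in place
container["rng"] = rng_val    # like rv_u.rng.set_value(rng_val, borrow=True)

a = container["rng"].uniform()
container["rng"].seed(89234)  # re-seeding reproduces the same draw
b = container["rng"].uniform()
assert a == b
```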
You can also seed *all* of the random variables allocated by a :class:`RandomStreams`
object by that object's ``seed`` method. This seed will be used to seed a
...@@ -431,10 +434,12 @@ update the state of the generators used in function ``f`` above.
For example:
>>> state_after_v0 = rv_u.rng.get_value().get_state()
>>> nearly_zeros() # this affects rv_u's generator
>>> v1 = f()
>>> rng = rv_u.rng.get_value(borrow=True)
>>> rng.set_state(state_after_v0)
>>> rv_u.rng.set_value(rng, borrow=True)
>>> v2 = f() # v2 != v1
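The snapshot/restore mechanism here is numpy's `get_state`/`set_state`. A self-contained sketch (numpy only; the tutorial's `f` and `nearly_zeros` are replaced by direct `uniform` draws): restoring a snapshot makes subsequent draws repeat, and in the tutorial `nearly_zeros()` advances the generator between the snapshot and `v1`, which is why `v2 != v1` there.

```python
import numpy

rng = numpy.random.RandomState(1234)
state_after_v0 = rng.get_state()  # snapshot, as rv_u.rng.get_value().get_state()
v1 = rng.uniform(size=3)          # drawing advances the generator
rng.set_state(state_after_v0)     # rewind to the snapshot
v2 = rng.uniform(size=3)          # identical draws after the rewind
assert (v1 == v2).all()
```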
...
...@@ -133,7 +133,8 @@ matrix ``W`` and a bias ``b``, you can define:
def __getstate__(self):
return (self.W, self.b)
def __setstate__(self, state):
W, b = state
self.W = W
self.b = b
...@@ -146,7 +147,8 @@ functions to reflect the change in name:
def __getstate__(self):
return (self.weights, self.bias)
def __setstate__(self, state):
W, b = state
self.weights = W
self.bias = b
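The signature change is needed because tuple parameter unpacking (`def __setstate__(self, (W, b))`) was removed from the language by PEP 3113; the new style unpacks inside the body. A minimal pickle round-trip showing the pattern (hypothetical `Layer` class, not from the codebase):

```python
import pickle

class Layer:
    def __init__(self, W, b):
        self.W = W
        self.b = b

    def __getstate__(self):
        return (self.W, self.b)

    def __setstate__(self, state):
        # Unpack inside the body instead of in the signature (PEP 3113).
        W, b = state
        self.W = W
        self.b = b

layer = pickle.loads(pickle.dumps(Layer([[1.0]], [0.5])))
assert layer.W == [[1.0]] and layer.b == [0.5]
```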
...
...@@ -47,7 +47,7 @@ AUTHOR_EMAIL = "theano-dev@googlegroups.com"
PLATFORMS = ["Windows", "Linux", "Solaris", "Mac OS-X", "Unix"]
MAJOR = 0
MINOR = 3
MICRO = 1
SUFFIX = "" # Should be blank except for rc's, betas, etc.
ISRELEASED = False
...
...@@ -69,8 +69,8 @@ FancyModule = Module
from printing import \
pprint, pp
import scan as scan_module
from scan import scan, map, reduce, foldl, foldr, Scan, ScanGrad
import tensor
import scalar
...
...@@ -381,6 +381,7 @@ class InvalidValueError(DebugModeError):
client_node = self.client_node
hint = self.hint
specific_hint = self.specific_hint
context = debugprint(r, prefix='  ', depth=12, file=StringIO()).getvalue()
return """InvalidValueError
type(variable) = %(type_r)s
variable = %(r)s
...@@ -394,6 +395,7 @@ class InvalidValueError(DebugModeError):
client_node = %(client_node)s
hint = %(hint)s
specific_hint = %(specific_hint)s
context = ...\n%(context)s
""" % locals()
########################
...@@ -403,8 +405,9 @@ class InvalidValueError(DebugModeError):
########################
def debugprint(r, prefix='', depth=-1, done=None, print_type=False,
file=sys.stdout, print_destroy_map=False, print_view_map=False,
order=[]):
"""Print the graph leading to `r` to given depth.
:param r: Variable instance
...@@ -415,6 +418,7 @@ def debugprint(r, prefix='', depth=-1, done=None, print_type=False, file=sys.std
:param file: file-like object to which to print
:param print_destroy_map: whether to print the op destroy_map after other info
:param print_view_map: whether to print the op view_map after other info
:param order: If not empty, print the node's index in the toposort.
"""
if depth==0:
return
...@@ -452,22 +456,28 @@ def debugprint(r, prefix='', depth=-1, done=None, print_type=False, file=sys.std
if view_map_str and view_map_str!='{}':
view_map_str='v='+view_map_str
o=''
if order:
o = str(order.index(r.owner))
if len(a.outputs) == 1:
print >> file, '%s%s [@%i]%s \'%s\' %s %s %s' % (prefix, a.op, id(r),
type_str, r_name,
destroy_map_str,
view_map_str,
o)
else:
print >> file, '%s%s.%i [@%i]%s \'%s\' %s %s %s' % (prefix, a.op,
a.outputs.index(r),
id(r), type_str,
r_name,
destroy_map_str,
view_map_str,
o)
if id(a) not in done:
done.add(id(a))
for i in a.inputs:
debugprint(i, prefix+' |', depth=depth-1, done=done,
print_type=print_type, file=file, order=order)
else:
#this is a variable
print >> file, '%s%s [@%i]%s' % (prefix, r, id(r), type_str)
...@@ -532,20 +542,16 @@ def _check_inputs(node, storage_map, r_vals, dr_vals, active_nodes, clobber_dr_v
for oo,ii in vmap.iteritems():
out_var = storage_map[node.outputs[oo]][0]
in_var = storage_map[node.inputs[ii[0]]][0]
# We don't try to optimize simple scalars and empty ndarrays,
# as this is not worth our time. This happens at least in
# Subtensor when the output is a scalar, but this depends on
# the version of numpy!
if getattr(out_var,'size',2)<=1:
continue
if isinstance(node.op, theano.compile.mode.OutputGuard):
# This class is not in the final graph.
continue
if not _may_share_memory(out_var, in_var):
#when a subtensor returns a tensor of ndim==0, numpy seems to return a copy.
#when we have an empty ndarray (happens with output guard) it is not the same. why?
if hasattr(out_var,'ndim') and (out_var.ndim>0 and out_var.size>0):
continue
opt_warning("input idx %d marked as viewed but new memory allocated by node '%s'"%(ii[0],str(node)))
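The aliasing check this code performs mirrors numpy's own `may_share_memory`. A small numpy-only sketch of what views do and do not share, and why size-1 outputs are skipped:

```python
import numpy

a = numpy.arange(10)
view = a[2:5]          # a slice is a view: it shares a's buffer
fresh = a[2:5].copy()  # an explicit copy allocates new memory

assert numpy.may_share_memory(a, view)
assert not numpy.may_share_memory(a, fresh)

# Scalar indexing yields a standalone value, one reason the check
# above skips outputs with size <= 1.
scalar = a[3]
assert not numpy.may_share_memory(a, numpy.array(scalar))
```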
for r_idx, r in enumerate(node.inputs):
...@@ -1678,6 +1684,9 @@ class DebugMode(Mode):
If any of these arguments (except optimizer) is not None, it overrides the class default.
The linker argument is not used. It is set there to allow Mode.requiring() and some other functions to work with DebugMode too.
"""
if linker is not None and not issubclass(linker, _Linker):
raise Exception("DebugMode can only use its own linker! Do not provide another one.", linker)
super(DebugMode, self).__init__(
optimizer=optimizer,
linker=_Linker)
...
...@@ -223,6 +223,8 @@ class DeepCopyOp(theano.gof.Op):
}
"""%locals()
else:
super(DeepCopyOp, self).c_code(node, name, inames, onames, sub)
deep_copy_op = DeepCopyOp()
...@@ -511,7 +513,7 @@ class Function(object):
# Set positional arguments
i = 0
for arg in args:
#TODO: provide a Param option for skipping the filter if we
# really want speed.
s = self.input_storage[i]
...@@ -523,7 +525,7 @@ class Function(object):
allow_downcast=s.allow_downcast)
except Exception, e:
e.args = tuple(list(e.args)+["Bad input argument at index %d" % i])
raise
s.provided += 1
i+=1
...@@ -868,9 +870,11 @@ class FunctionMaker(object):
optimizer, linker = mode.optimizer, copy.copy(mode.linker)
# optimize the env
start_optimizer = time.time()
optimizer(env)
end_optimizer = time.time()
mode.optimizer_time += end_optimizer - start_optimizer
_logger.debug('Optimizing took %f seconds' % (end_optimizer - start_optimizer))
# This loop was inserted to remove aliasing between outputs when they all
# evaluate to the same value. Originally it was OK for outputs to be aliased,
...@@ -978,9 +982,11 @@ class FunctionMaker(object):
# Get a function instance
start_linker = time.time()
_fn, _i, _o = self.linker.make_thunk(input_storage = input_storage_lists)
end_linker = time.time()
_logger.debug('Linker took %f seconds' % (end_linker - start_linker))
self.mode.linker_time += end_linker - start_linker
fn = self.function_builder(_fn, _i, _o, self.indices, self.outputs, defaults, self.unpack_single, self.return_none, self)
return fn
...@@ -1226,4 +1232,3 @@ def get_info_on_inputs(named_inputs, n_unnamed_inputs):
get_plural(n_unnamed_inputs),
get_plural(n_unnamed_inputs)))
return msg
...@@ -99,11 +99,15 @@ class OutputGuard(gof.Op):
return type(self) == type(other)
def __hash__(self):
return hash(type(self))
def perform(self, node, inp, out):
x, = inp
z, = out
z[0] = x
def __str__(self):
return '%s' % self.__class__.__name__
def c_code(self, node, nodename, inp, out, sub):
x, = inp
z, = out
return """
Py_XDECREF(%(z)s);
%(z)s = %(x)s;
...@@ -209,7 +213,8 @@ class Mode(object):
def __getstate__(self):
return (self.provided_linker, self.provided_optimizer)
def __setstate__(self, state):
linker, optimizer = state
self.provided_linker = linker
self.provided_optimizer = optimizer
if isinstance(linker, str) or linker is None:
...@@ -222,6 +227,8 @@ class Mode(object):
self._optimizer = optimizer
self.call_time = 0
self.fn_time = 0
self.optimizer_time = 0
self.linker_time = 0
def __str__(self):
return "Mode(linker = %s, optimizer = %s)" % (self.provided_linker, self.provided_optimizer)
...@@ -282,10 +289,14 @@ def get_mode(orig_string):
if string in ['Mode','ProfileMode','DebugMode']:
if instanciated_default_mode:
return instanciated_default_mode
if string == 'DebugMode':
#need to import later to break circular dependency.
from profilemode import ProfileMode,prof_mode_instance_to_print
from debugmode import DebugMode
#DebugMode uses its own linker.
ret = DebugMode(optimizer=config.optimizer)
else:
# The import is needed in case string is ProfileMode
from profilemode import ProfileMode
ret = eval(string+'(linker=config.linker, optimizer=config.optimizer)')
elif not predefined_modes.has_key(string):
...@@ -303,6 +314,8 @@ def get_mode(orig_string):
#must tell python to print the summary at the end.
if string == 'ProfileMode':
#need to import later to break circular dependency.
from profilemode import prof_mode_instance_to_print
prof_mode_instance_to_print.append(ret)
return ret
...@@ -318,4 +331,3 @@ def register_mode(name, mode):
if name in predefined_modes:
raise ValueError('Mode name already taken: %s' % name)
predefined_modes[name] = mode
import warnings
warnings.warn("theano.compile.sandbox no longer provides shared, shared_constructor, and pfunc. They have been moved to theano.compile.", DeprecationWarning)
from theano.compile.sharedvalue import shared, shared_constructor
from theano.compile.pfunc import pfunc
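Switching from a `print` to `warnings.warn(..., DeprecationWarning)` lets callers filter, record, or escalate the message instead of receiving unconditional stderr output. A generic sketch of that pattern (hypothetical `old_api`/`new_api` names):

```python
import warnings

def new_api():
    return 42

def old_api():
    # stacklevel=2 points the warning at the caller, not this frame.
    warnings.warn("old_api is deprecated; use new_api instead.",
                  DeprecationWarning, stacklevel=2)
    return new_api()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # DeprecationWarning is hidden by default
    result = old_api()

assert result == 42
assert caught[0].category is DeprecationWarning
```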
"""Provide a simple user friendly API to Theano-managed memory"""
__docformat__ = 'restructuredtext en'
# Standard imports
import copy
import logging
import traceback
import warnings
# Theano imports
from theano import config
from theano.configparser import (TheanoConfigParser, AddConfigVar, EnumStr,
StrParam, IntParam, FloatParam, BoolParam)
from theano.gof import Container, Variable, generic
import logging
_logger = logging.getLogger('theano.compile.sharedvalue')
_logger.setLevel(logging.DEBUG)
def debug(*msg): _logger.debug(' '.join(str(m) for m in msg))
...@@ -14,13 +21,11 @@ def warn(*msg): _logger.warn(' '.join(str(m) for m in msg))
def warning(*msg): _logger.warning(' '.join(str(m) for m in msg))
def error(*msg): _logger.error(' '.join(str(m) for m in msg))
from theano.configparser import TheanoConfigParser, AddConfigVar, EnumStr, StrParam, IntParam, FloatParam, BoolParam
from theano import config
AddConfigVar('shared.value_borrows',
("DEPRECATED. You should not use the 'value' property of shared"
" variables, but use the .get_value() and .set_value() methods."
" False: shared variables 'value' property is guaranteed to not"
" alias theano-managed memory. True: no guarantee, but faster."),
BoolParam(True))
class SharedVariable(Variable):
...@@ -123,8 +128,14 @@ class SharedVariable(Variable):
return cp
def _value_get(self):
warnings.warn(("The .value property of shared variables is deprecated."
" You should use the .get_value() method instead."),
stacklevel=2)
return self.get_value(borrow=config.shared.value_borrows, return_internal_type=False)
def _value_set(self, new_value):
warnings.warn(("The .value property of shared variables is deprecated."
" You should use the .set_value() method instead."),
stacklevel=2)
return self.set_value(new_value, borrow=config.shared.value_borrows)
#TODO: USE A CONFIG VARIABLE TO set these get/set methods to the non-borrowing versions
...@@ -132,9 +143,11 @@ class SharedVariable(Variable):
# default. The default support transparently (if slowly) when the 'raw' value is in a
# different memory space (e.g. GPU or other machine).
value = property(_value_get, _value_set,
doc=("DEPRECATED. Shortcut for self.get_value() and "
"self.set_value(). "
"The `borrow` argument to these methods is read from "
"`theano.config.shared.value_borrows`. "
"You should call get_value() and set_value() directly."))
def filter_update(self, update):
...@@ -194,4 +207,3 @@ def generic_constructor(value, name=None, strict=False, allow_downcast=None):
"""SharedVariable Constructor"""
return SharedVariable(type=generic, value=value, name=name, strict=strict,
allow_downcast=allow_downcast)
...@@ -31,14 +31,18 @@ class BROKEN_ON_PURPOSE_Add(gof.Op):
r = gof.Apply(self, [a, b], [a.type()])
return r
def perform(self, node, inp, out_):
a, b = inp
out, = out_
z = a+b
#ERROR TO ADD THIS CRAPPY OFFSET
if self.py_offset:
out[0] = z+0.5
else: out[0] = z
def c_code(self, node, name, inp, out, sub):
a, b = inp
z, = out
return """
if (%(a)s->nd != 1) {PyErr_SetString(PyExc_NotImplementedError, "rank(a) != 1"); %(fail)s;}
if (%(b)s->nd != 1) {PyErr_SetString(PyExc_NotImplementedError, "rank(b) != 1"); %(fail)s;}
...@@ -100,7 +104,9 @@ class WeirdBrokenOp(gof.Op):
r = gof.Apply(self, [a_], [a_.type()])
return r
def dontuse_perform(self, node, inp, out_):
a, = inp
out, = out_
if self.behaviour == 'times2':
out[0] = a * 2
elif self.behaviour == 'times2_inplace':
...@@ -113,7 +119,9 @@ class WeirdBrokenOp(gof.Op):
else:
raise ValueError(self.behaviour)
def c_code(self, node, name, inp, out, sub):
a, = inp
z, = out
if "inplace" in self.behaviour:
z_code = """
{Py_XDECREF(%(z)s);}
...@@ -253,7 +261,9 @@ def test_baddestroymap():
def make_node(self, a, b):
c = a.type()
return gof.Apply(self, [a,b], [c])
def perform(self, node, inp, out):
a, b = inp
c, = out
c[0] = a
c[0] += b
...@@ -283,14 +293,18 @@ class Test_ViewMap(unittest.TestCase):
def make_node(self, a, b):
c = b.type()
return gof.Apply(self, [a,b], [c])
def perform(self, node, inp, out):
a, b = inp
c, = out
c[0] = b
class BadAddSlice(gof.Op):
def make_node(self, a, b):
c = b.type()
return gof.Apply(self, [a,b], [c])
def perform(self, node, inp, out):
a, b = inp
c, = out
c[0] = b[1:3]
def test_badviewmap_ref(self):
...@@ -343,7 +357,9 @@ class Test_ViewMap(unittest.TestCase):
c = a.type()
d = a.type()
return gof.Apply(self, [a,b], [c,d])
def perform(self, node, inp, out):
a, b = inp
c, d = out
c[0] = a
d[0] = a[1:]
...@@ -364,7 +380,9 @@ class Test_ViewMap(unittest.TestCase):
c = a.type()
d = a.type()
return gof.Apply(self, [a,b], [c,d])
def perform(self, node, inp, out):
a, b = inp
c, d = out
r = a * 2
c[0] = r
d[0] = r[1:]
...@@ -387,7 +405,9 @@ class Test_ViewMap(unittest.TestCase):
c = a.type()
d = a.type()
return gof.Apply(self, [a,b], [c,d])
def perform(self, node, inp, out):
a, b = inp
c, d = out
r = a * 1
c[0] = r
d[0] = r[1:]
...@@ -409,7 +429,9 @@ class Test_ViewMap(unittest.TestCase):
c = a.type()
d = a.type()
return gof.Apply(self, [a,b], [c,d])
def perform(self, node, inp, out):
a, b = inp
c, d = out
r = a * 1
c[0] = r[:-1]
d[0] = r[1:]
...
...@@ -9,8 +9,9 @@ from theano.compile import function
from theano import tensor
from theano import tensor as T
import random, theano
import numpy as N
from numpy.testing.noseclasses import KnownFailureTest
PatternOptimizer = lambda p1, p2, ign=True: gof.OpKeyOptimizer(gof.PatternSub(p1, p2), ignore_newtrees=ign)
...@@ -33,7 +34,7 @@ class T_function(unittest.TestCase):
fn = function([], None) #ok
rval = fn()
if rval == []:
raise KnownFailureTest('See #254: Using None as function output leads to [] return value')
else:
assert rval is None
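`KnownFailureTest` comes from numpy's nose integration and marks the test as a known, tracked failure instead of letting it pass silently. A rough analog using only stdlib `unittest.SkipTest` (an assumption for illustration, not what this commit uses):

```python
import unittest

class TestFunctionNoneOutput(unittest.TestCase):
    def test_none_output(self):
        rval = []  # stand-in for fn() returning [] instead of None (ticket #254)
        if rval == []:
            # Record a known, tracked failure instead of passing silently.
            raise unittest.SkipTest("See #254: None as function output returns []")
        self.assertIsNone(rval)

result = unittest.TestResult()
unittest.TestLoader().loadTestsFromTestCase(TestFunctionNoneOutput).run(result)
assert len(result.skipped) == 1
assert not result.failures and not result.errors
```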
...@@ -598,4 +599,3 @@ if __name__ == '__main__':
assert b
t.failUnless = fu
t.test_deepcopy_shared_container()
...@@ -104,7 +104,8 @@ class TanhRnn(Op):
z = x.type() #make a new symbolic variable with the same type as x
return Apply(self, [x, z0, A], [z])
def perform(self, node, inp, out):
x, z0, A = inp
assert x is not None
assert z0 is not None
assert A is not None
...@@ -115,7 +116,9 @@ class TanhRnn(Op):
z[i+1] = N.tanh(N.dot(z[i], A) + x[i])
out[0][0] = z
def grad(self, inp, grads):
x, z0, A = inp
gz, = grads
z = tanh_rnn(x, z0, A)
gz_incl_rnn, gx = tanh_rnn_grad(A, z, gz)
return [gx, gz_incl_rnn[0], (T.dot(z[:-1].T, gx))]
...@@ -136,7 +139,8 @@ class TanhRnnGrad(Op):
def make_node(self, A, z, gz):
return Apply(self, [A,z,gz], (z.type(), gz.type()))
def perform(self, node, inp, out):
A, z, gz = inp
Tp1,M = z.shape
T = Tp1 - 1
gx = N.zeros((T, M))
...
...@@ -735,6 +735,8 @@ def test_pickle_aliased_memory():
sio = StringIO.StringIO()
handler = logging.StreamHandler(sio)
logging.getLogger('theano.compile.function_module').addHandler(handler)
# Silence original handler when intentionally generating warning messages
logging.getLogger('theano').removeHandler(theano.logging_default_handler)
try:
m.f.pickle_aliased_memory_strategy = 'warn'
m.g.pickle_aliased_memory_strategy = 'warn'
...@@ -742,6 +744,7 @@ def test_pickle_aliased_memory():
assert sio.getvalue().startswith('aliased relat')
finally:
logging.getLogger('theano.compile.function_module').removeHandler(handler)
logging.getLogger('theano').addHandler(theano.logging_default_handler)
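The capture-then-restore pattern used by this test needs only the stdlib: attach a `StreamHandler` writing to a `StringIO`, emit, and always detach in a `finally` block. A sketch (the log message here is hypothetical):

```python
import io
import logging

logger = logging.getLogger("theano.compile.function_module")  # name from the test above
logger.propagate = False  # keep the sketch from also writing to stderr
sio = io.StringIO()
handler = logging.StreamHandler(sio)
logger.addHandler(handler)
try:
    # Hypothetical message standing in for the pickling warning.
    logger.warning("aliased relationship detected while pickling")
finally:
    logger.removeHandler(handler)  # always restore, as the test's finally does

assert sio.getvalue().startswith("aliased relat")
```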
try:
m.f.pickle_aliased_memory_strategy = 'raise'
...@@ -830,4 +833,3 @@ if __name__ == '__main__':
# t.test_shared_members()
# tests = unittest.TestLoader().loadTestsFromModule("T_test_module")
# tests.debug()
...@@ -44,8 +44,8 @@ class Test_SharedVariable(unittest.TestCase):
u = shared('asdf', strict=False)
v = shared('asdf', strict=True)
u.set_value(88)
v.set_value(88)
def test_create_numpy_strict_false(self):
...@@ -96,14 +96,14 @@ class Test_SharedVariable(unittest.TestCase):
strict=False)
# check that assignments to value are casted properly
u.set_value([3,4])
assert type(u.get_value()) is numpy.ndarray
assert str(u.get_value(borrow=True).dtype) == 'float64'
assert numpy.all(u.get_value() == [3,4])
# check that assignments of nonsense fail
try:
u.set_value('adsf')
assert 0
except ValueError:
pass
...@@ -114,7 +114,8 @@ class Test_SharedVariable(unittest.TestCase):
assert u.get_value(borrow=True) is uval
def test_scalar_strict(self):
def f(var, val):
var.set_value(val)
b = shared(numpy.int64(7), strict=True)
assert b.type == theano.tensor.lscalar
...@@ -154,7 +155,8 @@ class Test_SharedVariable(unittest.TestCase):
def test_tensor_strict(self):
def f(var, val):
var.set_value(val)
b = shared(numpy.int64([7]), strict=True)
assert b.type == theano.tensor.lvector
...@@ -206,47 +208,48 @@ class Test_SharedVariable(unittest.TestCase):
# Since downcasting of a value now raises an Exception,
def f(var, val):
var.set_value(val)
b = shared(numpy.int64(7), allow_downcast=True)
assert b.type == theano.tensor.lscalar
f(b,8.23)
assert b.get_value()==8
b = shared(numpy.int32(7), allow_downcast=True)
assert b.type == theano.tensor.iscalar
f(b,8.23)
assert b.get_value()==8
b = shared(numpy.int16(7), allow_downcast=True)
assert b.type == theano.tensor.wscalar
f(b,8.23)
assert b.get_value()==8
b = shared(numpy.int8(7), allow_downcast=True)
assert b.type == theano.tensor.bscalar
f(b,8.23)
assert b.get_value()==8
b = shared(numpy.float64(7.234), allow_downcast=True)
assert b.type == theano.tensor.dscalar
f(b,8)
assert b.get_value()==8
b = shared(numpy.float32(7.234), allow_downcast=True)
assert b.type == theano.tensor.fscalar
f(b,8)
assert b.get_value()==8
b = shared(numpy.float(7.234), allow_downcast=True)
assert b.type == theano.tensor.dscalar
f(b,8)
assert b.get_value()==8
b = shared(7.234, allow_downcast=True)
assert b.type == theano.tensor.dscalar
f(b,8)
assert b.get_value()==8
c = shared(numpy.zeros((5,5), dtype='float32'), allow_downcast=True)
self.failUnlessRaises(TypeError, f, b, numpy.random.rand(5,5))
...@@ -254,37 +257,38 @@ class Test_SharedVariable(unittest.TestCase):
     def test_tensor_floatX(self):
-        def f(var, val): var.value = val
+        def f(var, val):
+            var.set_value(val)

         b = shared(numpy.int64([7]), allow_downcast=True)
         assert b.type == theano.tensor.lvector
         f(b,[8.23])
-        assert b.value == 8
+        assert b.get_value() == 8
         b = shared(numpy.int32([7]), allow_downcast=True)
         assert b.type == theano.tensor.ivector
         f(b,[8.23])
-        assert b.value == 8
+        assert b.get_value() == 8
         b = shared(numpy.int16([7]), allow_downcast=True)
         assert b.type == theano.tensor.wvector
         f(b,[8.23])
-        assert b.value == 8
+        assert b.get_value() == 8
         b = shared(numpy.int8([7]), allow_downcast=True)
         assert b.type == theano.tensor.bvector
         f(b,[8.23])
-        assert b.value == 8
+        assert b.get_value() == 8
         b = shared(numpy.float64([7.234]), allow_downcast=True)
         assert b.type == theano.tensor.dvector
         f(b,[8])
-        assert b.value == 8
+        assert b.get_value() == 8
         b = shared(numpy.float32([7.234]), allow_downcast=True)
         assert b.type == theano.tensor.fvector
         f(b,[8])
-        assert b.value == 8
+        assert b.get_value() == 8
         #numpy.float([7.234]) don't work
         # b = shared(numpy.float([7.234]))
...@@ -299,10 +303,7 @@ class Test_SharedVariable(unittest.TestCase):
         b = shared(numpy.asarray([7.234],dtype=theano.config.floatX), allow_downcast=True)
         assert b.dtype == theano.config.floatX
         f(b,[8])
-        assert b.value == 8
+        assert b.get_value() == 8
         c = shared(numpy.zeros((5,5), dtype='float32'), allow_downcast=True)
         self.failUnlessRaises(TypeError, f, b, numpy.random.rand(5,5))
...@@ -28,7 +28,7 @@ AddConfigVar('init_gpu_device',
         "Unlike 'device', setting this option will NOT move computations, "
         "nor shared variables, to the specified GPU. "
         "It can be used to run GPU-specific tests on a particular GPU."),
-        EnumStr('', 'gpu0', 'gpu1', 'gpu2', 'gpu3',
+        EnumStr('', 'gpu', 'gpu0', 'gpu1', 'gpu2', 'gpu3',
                 allow_override=False)
         )
...@@ -49,13 +49,12 @@ try:
     subprocess.Popen('gcc', stdout=subprocess.PIPE, stderr=subprocess.PIPE)
     # Keep the default linker the same as the one for the mode FAST_RUN
     AddConfigVar('linker',
-            "Default linker. If not None, will use this linker with the Mode "+
-            "object(not ProfileMode or DebugMode)",
+            "Default linker used if the theano flags mode is Mode or ProfileMode",
             EnumStr('c|py', 'py', 'c', 'c|py_nogc', 'c&py'))
 except OSError:
     # gcc is not present, linker should default to python only
     AddConfigVar('linker',
-            "Default linker. If not None, will use this linker with the Mode object(not ProfileMode or DebugMode)",
+            "Default linker used if the theano flags mode is Mode or ProfileMode",
             EnumStr('py', 'c|py', 'c', 'c|py_nogc', 'c&py'))
     warning('GCC not detected ! Theano will be unable to execute optimized '+
             'C-implementations (for both CPU and GPU) and will default to '+
...
-#For flag of bool type, we consider the string 'False','false' and '0' as False
+# For flags of bool type, we consider the strings 'False', 'false' and '0' as False,
-# and the string 'True', 'true', '1' as true.
+# and the strings 'True', 'true' and '1' as True.
-#We alsoaccept the bool type as its corresponding value!
+# We also accept the bool type as its corresponding value!
-#Normally numpy consider only the empty string as false, but this give
-# impression that it work when it do different people expected.

 import os, StringIO, sys
 import ConfigParser
 import logging
+import warnings

 _logger = logging.getLogger('theano.config')

+class TheanoConfigWarning(Warning):
+    def warn(cls, message, stacklevel=0):
+        warnings.warn(message, cls, stacklevel=stacklevel + 3)
+    warn = classmethod(warn)

+# Check for deprecated environment variables
 for key in os.environ:
     if key.startswith("THEANO"):
         if key not in ("THEANO_FLAGS", "THEANORC"):
-            print >> sys.stderr, "ERROR: Ignoring deprecated environment variable", key
+            TheanoConfigWarning.warn("Ignoring deprecated environment variable %s" % key)

-THEANO_FLAGS=os.getenv("THEANO_FLAGS","")
-# The THEANO_FLAGS environement variable should be a list of comma-separated
-# [section.]option[=value] entries. If the section part is omited, their should be only one
-# section with that contain the gived option.
+THEANO_FLAGS = os.getenv("THEANO_FLAGS", "")
+# The THEANO_FLAGS environment variable should be a list of comma-separated
+# [section.]option=value entries. If the section part is omitted, there should be only one
+# section that contains the given option.

+def parse_config_string(config_string, issue_warnings=True):
+    """
+    Parses a config string composed of comma-separated key=value components into a dict.
+    """
+    config_dict = {}
+    for kv_pair in config_string.split(','):
+        kv_pair = kv_pair.strip()
+        if not kv_pair:
+            continue
+        kv_tuple = kv_pair.split('=', 1)
+        if len(kv_tuple) == 1:
+            if issue_warnings:
+                TheanoConfigWarning.warn("Config key '%s' has no value, ignoring it" % kv_tuple[0], stacklevel=1)
+        else:
+            k, v = kv_tuple
+            # subsequent values for k will override earlier ones
+            config_dict[k] = v
+    return config_dict
+
+THEANO_FLAGS_DICT = parse_config_string(THEANO_FLAGS, issue_warnings=True)
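The `parse_config_string` helper added above can be exercised on its own; a rough Python 3 rendering of it (using the stdlib `warnings` module in place of `TheanoConfigWarning`, purely for illustration) behaves like this:

```python
import warnings

def parse_config_string(config_string, issue_warnings=True):
    """Parse 'key=value,key2=value2' into a dict; later keys override earlier ones."""
    config_dict = {}
    for kv_pair in config_string.split(','):
        kv_pair = kv_pair.strip()
        if not kv_pair:
            continue
        kv_tuple = kv_pair.split('=', 1)
        if len(kv_tuple) == 1:
            # a bare key with no value is warned about and dropped
            if issue_warnings:
                warnings.warn("Config key '%s' has no value, ignoring it" % kv_tuple[0])
        else:
            k, v = kv_tuple
            config_dict[k] = v  # subsequent values for k override earlier ones
    return config_dict

assert parse_config_string("floatX=float32,device=gpu0") == {"floatX": "float32", "device": "gpu0"}
assert parse_config_string("linker=py,linker=c")["linker"] == "c"  # last entry wins
assert parse_config_string("", issue_warnings=False) == {}
```

The "last entry wins" behavior is what lets a later `THEANO_FLAGS` component override an earlier one on the same command line.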
 # THEANORC can contain a colon-delimited list of config files, like
 # THEANORC=~lisa/.theanorc:~/.theanorc
...@@ -27,31 +53,17 @@ THEANO_FLAGS=os.getenv("THEANO_FLAGS","")
 # precedence over those in files on the left.
 def config_files_from_theanorc():
     rval = [os.path.expanduser(s) for s in os.getenv('THEANORC', '~/.theanorc').split(os.pathsep)]
-    if os.getenv('THEANORC') is None and sys.platform=="win32":
+    if os.getenv('THEANORC') is None and sys.platform == "win32":
-        #To don't need to change the filename and make it open easily
+        # use a .txt extension so the file keeps a usable name and opens easily on Windows
         rval.append(os.path.expanduser('~/.theanorc.txt'))
     return rval

 theano_cfg = ConfigParser.SafeConfigParser({'USER':os.getenv("USER", os.path.split(os.path.expanduser('~'))[-1])})
 theano_cfg.read(config_files_from_theanorc())
-def parse_env_flags(flags, name , default_value=None):
-    #The value in the env variable THEANO_FLAGS override the previous value
-    val = default_value
-    for flag in flags.split(','):
-        if not flag:
-            continue
-        sp=flag.split('=',1)
-        if sp[0]==name:
-            if len(sp)==1:
-                val=True
-            else:
-                val=sp[1]
-    val=str(val)
-    return val

 def fetch_val_for_key(key):
     """Return the overriding config value for a key.
-    A successful search returs a string value.
+    A successful search returns a string value.
     An unsuccessful search raises a KeyError

     The (decreasing) priority order is:
...@@ -61,23 +73,10 @@ def fetch_val_for_key(key):
     """
     # first try to find it in the FLAGS
-    rval = None
-    for name_val in THEANO_FLAGS.split(','):
-        if not name_val:
-            continue
-        name_val_tuple=name_val.split('=',1)
-        if len(name_val_tuple)==1:
-            name, val = name_val_tuple, str(True)
-        else:
-            name, val = name_val_tuple
-        if name == key:
-            # rval might be overriden by a later definition in THEANO_FLAGS
-            rval = val
-    # If an rval is found, it should be a string
-    if rval is not None:
-        return rval
+    try:
+        return THEANO_FLAGS_DICT[key]
+    except KeyError:
+        pass

     # next try to find it in the config file
...
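The rewritten `fetch_val_for_key` reduces the ad-hoc flag scan to a dict lookup with fall-through to the config file. A self-contained sketch of that priority order (the dict parameters here are illustrative stand-ins for the module-level state):

```python
def fetch_val_for_key(key, flags_dict, config_file_dict):
    """Return the overriding config value for a key.

    THEANO_FLAGS entries (flags_dict) take priority over the config file;
    an unsuccessful search raises KeyError.
    """
    try:
        return flags_dict[key]
    except KeyError:
        pass
    # next try to find it in the config file
    return config_file_dict[key]

flags = {"device": "gpu0"}
cfg = {"device": "cpu", "floatX": "float64"}
assert fetch_val_for_key("device", flags, cfg) == "gpu0"     # flags win
assert fetch_val_for_key("floatX", flags, cfg) == "float64"  # falls back to the file
```

A key absent from both sources still raises `KeyError`, matching the docstring's contract.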
...@@ -158,7 +158,7 @@ from opt import (Optimizer, optimizer, SeqOptimizer,
 from optdb import \
     DB, Query, \
-    EquilibriumDB, SequenceDB
+    EquilibriumDB, SequenceDB, ProxyDB
 from toolbox import \
     Bookkeeper, History, Validator, ReplaceValidate, NodeFinder, PrintListener
...
...@@ -961,7 +961,7 @@ class CLinker(link.Linker):
             preargs.remove('-DREPLACE_WITH_AMDLIBM')
             if 'amdlibm' in libs:
                 libs.remove('amdlibm')
+        try:
             module = c_compiler(
                     module_name=mod.name,
                     src_code = mod.code(),
...@@ -970,6 +970,9 @@ class CLinker(link.Linker):
                     lib_dirs=self.lib_dirs(),
                     libs=libs,
                     preargs=preargs)
+        except Exception, e:
+            e.args += (str(self.env),)
+            raise
         finally:
             release_lock()
...
...@@ -294,6 +294,8 @@ class ModuleCache(object):
         Also, remove malformed cache directories.
         """
+        too_old_to_use = []
         compilelock.get_lock()
         try:
             # add entries that are not in the entry_from_key dictionary
...@@ -316,11 +318,14 @@ class ModuleCache(object):
                 try:
                     entry = module_name_from_dir(root)
                 except ValueError: # there is a key but no dll!
+                    if not root.startswith("/tmp"):
+                        # Under /tmp, files are removed periodically by the OS,
+                        # so it is normal that this happens from time to time.
                         warning("ModuleCache.refresh() Found key without dll in cache, deleting it.", key_pkl)
                     info("Erasing broken cache directory", key_pkl)
                     shutil.rmtree(root)
                     continue
-                if (time_now - last_access_time(module_name_from_dir(root)))<self.age_thresh_use:
+                if (time_now - last_access_time(entry))<self.age_thresh_use:
                     debug('refresh adding', key_pkl)
                     try:
                         key = cPickle.load(open(key_pkl, 'rb'))
...@@ -347,6 +352,9 @@ class ModuleCache(object):
                     # assert that we haven't already got this entry somehow
                     assert entry not in self.module_from_name
                     self.loaded_key_pkl.add(key_pkl)
+                else:
+                    too_old_to_use.append(entry)

             # remove entries that are not in the filesystem
             items_copy = list(self.entry_from_key.iteritems())
...@@ -374,12 +382,17 @@ class ModuleCache(object):
             # printing a warning, removing evidence that we ever saw this mystery
             # key.
             pkl_file_to_remove = os.path.join(os.path.dirname(entry), 'key.pkl')
+            if not root.startswith("/tmp"):
+                # Under /tmp, files are removed periodically by the OS,
+                # so it is normal that this happens from time to time.
                 warning('Removing key file %s because the corresponding module is gone from the file system.' % pkl_file_to_remove)
             self.loaded_key_pkl.remove(pkl_file_to_remove)
         finally:
             compilelock.release_lock()
+        return too_old_to_use
 def module_from_key(self, key, fn=None):
     """
     :param fn: a callable object that will return a module for the key (it is called only if the key isn't in
...@@ -483,18 +496,22 @@ class ModuleCache(object):
         compilelock.get_lock()
         try:
             # update the age of modules that have been accessed by other processes,
-            self.refresh()
+            # and get all modules that are too old to use (not loaded in self.entry_from_key)
+            too_old_to_use = self.refresh()
+            too_old_to_use = [(None, entry) for entry in too_old_to_use]
             time_now = time.time()
             # the .items() is important here:
             # we need to get a copy of the whole list of keys and entries
             items_copy = list(self.entry_from_key.iteritems())
-            for key, entry in items_copy:
+            for key, entry in items_copy + too_old_to_use:
                 age = time_now - last_access_time(entry)
                 if age > age_thresh_del:
                     # TODO: we are assuming that modules that haven't been accessed in over
                     # age_thresh_del are not currently in use by other processes, but that could be
                     # false for long-running jobs...
                     assert entry not in self.module_from_name
+                    if key is not None:
                         del self.entry_from_key[key]
                     parent = os.path.dirname(entry)
                     assert parent.startswith(os.path.join(self.dirname, 'tmp'))
...@@ -747,4 +764,3 @@ def gcc_module_compile_str(module_name, src_code, location=None, include_dirs=[]
 def icc_module_compile_str(*args):
     raise NotImplementedError()
...@@ -505,4 +505,3 @@ class DestroyHandlerHelper2(toolbox.Bookkeeper):
         rval[app] = root_clients
     return rval
...@@ -523,10 +523,3 @@ class Env(utils.object2):
     for feature in self._features:
         e.extend(feature)
     return e, equiv
...@@ -417,8 +417,10 @@ def stack_search(start, expand, mode='bfs', build_inv = False):
         raise ValueError('mode should be bfs or dfs', mode)
     rval_set = set()
     rval_list = list()
-    if mode is 'bfs': start_pop = start.popleft
-    else: start_pop = start.pop
+    if mode == 'bfs':
+        start_pop = start.popleft
+    else:
+        start_pop = start.pop
     expand_inv = {}
     while start:
         l = start_pop()
...
...@@ -562,5 +562,3 @@ def WrapLinkerMany(linkers, wrappers):
         for f in wrappers:
             f(*args)
     return WrapLinker(linkers, wrapper)
...@@ -1098,7 +1098,10 @@ def _check_chain(r, chain):
         r = r.owner.inputs[chain.pop()]
     #print 'check_chain', _check_chain.n_calls
     #_check_chain.n_calls += 1
-    return r
+
+    # The return value will be used as a Boolean, but some Variables cannot
+    # be used as Booleans (the results of comparisons, for instance)
+    return (r is not None)
 #_check_chain.n_calls = 0

 def check_chain(r, *chain):
...@@ -1137,6 +1140,3 @@ class PureThenInplaceOptimizer(Optimizer):
         self.pure(env)
         env.extend(dh.DestroyHandler())
         self.inplace(env)
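The `return (r is not None)` change matters because callers use `_check_chain`'s result in a boolean context, and a symbolic Variable may refuse truth-testing. A toy illustration of the failure mode the comment describes (the `Variable` class here is hypothetical, not Theano's):

```python
class Variable:
    """Symbolic placeholder that refuses truth-testing,
    like the result of a symbolic comparison."""
    def __bool__(self):
        raise TypeError("cannot evaluate the truth value of a symbolic Variable")
    __nonzero__ = __bool__  # Python 2 spelling of the same hook

def check_chain(r):
    # returning r itself would raise inside `if check_chain(r): ...`;
    # returning a plain bool is always safe
    return r is not None

assert check_chain(Variable()) is True
assert check_chain(None) is False

# the guarded boolean context is now safe:
if check_chain(Variable()):
    found = True
assert found
```

Without the fix, `if _check_chain(...)` would call `bool()` on the Variable and crash for exactly the comparison-result cases the comment names.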
...@@ -32,12 +32,18 @@ class DB(object):
         # this is not always the case.
         if not isinstance(obj, (DB, opt.Optimizer, opt.LocalOptimizer)):
             raise TypeError('Object cannot be registered in OptDB', obj)
+        if name in self.__db__:
+            raise ValueError('The name of the object cannot be an existing tag or the name of another existing object.', obj, name)
+        # This restriction is there because in many places we suppose that
+        # something in the DB is there only once.
+        if getattr(obj, 'name', "") in self.__db__:
+            raise ValueError('''You can't register the same optimization
+multiple times in a DB. Tried to register "%s" again under the new name "%s".
+Use theano.gof.ProxyDB to work around that.''' % (obj.name, name))
         if self.name is not None:
             tags = tags + (self.name,)
         obj.name = name
-        if name in self.__db__:
-            raise ValueError('The name of the object cannot be an existing tag or the name of another existing object.', obj, name)
         self.__db__[name] = set([obj])
         self._names.add(name)
...@@ -223,3 +229,15 @@ class SequenceDB(DB):
         return sio.getvalue()

+class ProxyDB(DB):
+    """
+    This is needed because we can't register the same DB multiple times at
+    different positions in a SequenceDB.
+    """
+    def __init__(self, db):
+        assert isinstance(db, DB), ""
+        self.db = db
+    def query(self, *tags, **kwtags):
+        return self.db.query(*tags, **kwtags)
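ProxyDB exists so one optimizer DB can appear at several positions in a SequenceDB without tripping the new duplicate-registration check: each proxy is a distinct object that merely forwards queries. A toy sketch of the delegation pattern (simplified classes, not Theano's actual signatures):

```python
class DB:
    def __init__(self):
        self.__db__ = {}

    def register(self, name, obj):
        # mirrors the duplicate-name restriction from the diff
        if name in self.__db__:
            raise ValueError("name already taken", name)
        self.__db__[name] = obj

    def query(self, name):
        return self.__db__[name]

class ProxyDB(DB):
    """Forward every query to a wrapped DB, so the same DB can be
    registered under several names/positions."""
    def __init__(self, db):
        assert isinstance(db, DB)
        self.db = db

    def query(self, name):
        return self.db.query(name)

inner = DB()
inner.register("fast_opt", "some-optimizer")

outer = DB()
outer.register("pos1", ProxyDB(inner))
outer.register("pos2", ProxyDB(inner))  # two distinct proxies, one shared DB
assert outer.query("pos1").query("fast_opt") == outer.query("pos2").query("fast_opt")
```

Registering `inner` itself twice would raise; wrapping it in two proxies sidesteps the check while keeping a single source of truth.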
...@@ -84,7 +84,8 @@ class MyOp(Op):
     def __str__(self):
         return self.name

-    def perform(self, node, inputs, (out, )):
+    def perform(self, node, inputs, out_):
+        out, = out_
         out[0] = self.impl(*inputs)

     def c_code_cache_version(self):
         return ()
...@@ -100,28 +101,36 @@ class Binary(MyOp):
 class Add(Binary):
-    def c_code(self, node, name, (x, y), (z, ), sub):
+    def c_code(self, node, name, inp, out, sub):
+        x, y = inp
+        z, = out
         return "%(z)s = %(x)s + %(y)s;" % locals()
     def impl(self, x, y):
         return x + y
 add = Add()

 class Sub(Binary):
-    def c_code(self, node, name, (x, y), (z, ), sub):
+    def c_code(self, node, name, inp, out, sub):
+        x, y = inp
+        z, = out
         return "%(z)s = %(x)s - %(y)s;" % locals()
     def impl(self, x, y):
         return -10 # erroneous (most of the time)
 sub = Sub()

 class Mul(Binary):
-    def c_code(self, node, name, (x, y), (z, ), sub):
+    def c_code(self, node, name, inp, out, sub):
+        x, y = inp
+        z, = out
         return "%(z)s = %(x)s * %(y)s;" % locals()
     def impl(self, x, y):
         return x * y
 mul = Mul()

 class Div(Binary):
-    def c_code(self, node, name, (x, y), (z, ), sub):
+    def c_code(self, node, name, inp, out, sub):
+        x, y = inp
+        z, = out
         return "%(z)s = %(x)s / %(y)s;" % locals()
     def impl(self, x, y):
         return x / y
...@@ -256,7 +265,9 @@ def test_duallinker_mismatch():
 ################################

 class AddFail(Binary):
-    def c_code(self, node, name, (x, y), (z, ), sub):
+    def c_code(self, node, name, inp, out, sub):
+        x, y = inp
+        z, = out
         fail=sub['fail']
         return """%(z)s = %(x)s + %(y)s;
             PyErr_SetString(PyExc_RuntimeError, "failing here");
...
...@@ -45,7 +45,8 @@ class MyOp(Op):
     def __str__(self):
         return self.name

-    def perform(self, node, inputs, (out, )):
+    def perform(self, node, inputs, out_):
+        out, = out_
         out[0] = self.impl(*inputs)

 add = MyOp(2, 'Add', lambda x, y: x + y)
...
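Both test files above replace tuple parameters in `def` signatures (`(out, )`, `(x, y)`), a Python-2-only feature removed by PEP 3113, with an explicit unpack that works in Python 3 as well. A minimal runnable sketch of the new calling convention (toy `MyOp`, not the real Op class):

```python
class MyOp:
    """Toy op using the output-storage convention from the diff."""
    def impl(self, x, y):
        return x + y

    # old, Python 2 only:  def perform(self, node, inputs, (out, )):
    def perform(self, node, inputs, out_):
        out, = out_          # explicit unpacking, valid in Python 2 and 3
        out[0] = self.impl(*inputs)

op = MyOp()
storage = [None]
op.perform(None, (2, 3), (storage,))
assert storage[0] == 5
```

The trailing comma in `out, = out_` asserts that exactly one output container was passed, so a mismatched arity still fails loudly, just as the old tuple signature did.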
...@@ -3,6 +3,7 @@
 #C=a*C+dot(A,B)*b
 #A,B,C matrix
 #a,b scalar
+import os

 s="""
 result for shapes=(2000,2000) and iters=100
...@@ -32,6 +33,10 @@ def execute(execute=True, verbose=True):
     print '  blas.ldflags=',theano.config.blas.ldflags
     print '  compiledir=',theano.config.compiledir
     print '  floatX=',theano.config.floatX
+    print 'Some env flags:'
+    print '  MKL_NUM_THREADS=',os.getenv('MKL_NUM_THREADS')
+    print '  OMP_NUM_THREADS=',os.getenv('OMP_NUM_THREADS')
+    print '  GOTO_NUM_THREADS=',os.getenv('GOTO_NUM_THREADS')
     print
     print 'Numpy config: (used when the theano flag "blas.ldflags" is empty)'
     numpy.show_config();
...@@ -83,25 +88,37 @@ if __name__ == "__main__":
     print """
         Some results that you can compare against. They were 10 executions of gemm in float64 with matrices of shape 2000x2000 on FC9.
-        CPUs tested: Xeon E5345, Xeon E5430, Xeon E5450, Core 2 E8500, Core i7 930(hyper-threads enabled)
+        CPUs tested: Xeon E5345, Xeon E5430, Xeon E5450(3GHz), Xeon X5560(2.8GHz, hyper-threads enabled?),
+                     Core 2 E8500, Core i7 930(2.8GHz, hyper-threads enabled), Core i7 950(3.07GHz, hyper-threads enabled)
         Libs tested:
             * numpy with ATLAS from distribution(FC9) package (1 thread)
             * manually compiled numpy and ATLAS with 2 threads
-            * goto with 1, 2, 4 and 8 threads.
+            * goto 1.26 with 1, 2, 4 and 8 threads.

-                          Xeon  Xeon  Xeon  Core2 i7
-        lib/nb threads    E5345 E5430 E5450 E8500 930
+                          Xeon  Xeon  Xeon  Core2 i7    i7     Xeon
+        lib/nb threads    E5345 E5430 E5450 E8500 930   950    X5560
-        numpy_FC9_atlas/1 39.2s 35.0s 30.7s 29.6s 21.5s
+        numpy_FC9_atlas/1 39.2s 35.0s 30.7s 29.6s 21.5s 19.60s
-        goto/1            18.7s 16.1s 14.2s 13.7s 16.1s
+        goto/1            18.7s 16.1s 14.2s 13.7s 16.1s 14.67s
         numpy_MAN_atlas/2 12.0s 11.6s 10.2s  9.2s  9.0s
-        goto/2             9.5s  8.1s  7.1s  7.3s  8.1s
+        goto/2             9.5s  8.1s  7.1s  7.3s  8.1s  7.4s
-        goto/4             4.9s  4.4s  3.7s   -    4.1s
+        goto/4             4.9s  4.4s  3.7s   -    4.1s  3.8s
-        goto/8             2.7s  2.4s  2.0s   -    4.1s
+        goto/8             2.7s  2.4s  2.0s   -    4.1s  3.8s
+        openblas/1        14.04s
+        openblas/2         7.16s
+        openblas/4         3.71s
+        openblas/8         3.70s
+        mkl 10.2.2.025/1  13.7s
+        mkl 10.2.2.025/2   7.6s
+        mkl 10.2.2.025/4   4.0s
+        mkl 10.2.2.025/8   2.0s
+        mkl 11.0.083/1     7.97s

         Test time in float32 with cuda 3.0.14
         (cuda version 3.2RC and up are supposed to have faster gemm on the GTX4?? cards)

         cpu/cuda version
+        GTX580/3.2  0.20s
         GTX480/3.2  0.24s
         GTX480/3.0  0.27s
         GTX470/3.2  0.29s
...
...@@ -7,16 +7,15 @@ cat /proc/cpuinfo |grep processor
 free
 uname -a

-t0=`THEANO_FLAGS=blas.ldflags= OMP_NUM_THREADS=1 time python misc/check_blas.py --quiet`
-t1=`OMP_NUM_THREADS=1 time python misc/check_blas.py --quiet`
-t2=`OMP_NUM_THREADS=2 time python misc/check_blas.py --quiet`
-t4=`OMP_NUM_THREADS=4 time python misc/check_blas.py --quiet`
-t8=`OMP_NUM_THREADS=8 time python misc/check_blas.py --quiet`
-
-echo "numpy gemm took: $t0"
-echo "theano gemm 1 thread took: $t1"
-echo "theano gemm 2 thread took: $t2"
-echo "theano gemm 4 thread took: $t4"
-echo "theano gemm 8 thread took: $t8"
+TIME_PREFIX=time
+VAR=OMP_NUM_THREADS
+echo "numpy gemm took:"
+THEANO_FLAGS=blas.ldflags= $TIME_PREFIX python misc/check_blas.py --quiet
+for i in 1 2 4 8
+do
+    export $VAR=$i
+    x=`$TIME_PREFIX python misc/check_blas.py --quiet`
+    echo "theano gemm with $VAR=$i took: ${x}s"
+done

#Fred to test distro numpy at LISA: PYTHONPATH=/u/bastienf/repos:/usr/lib64/python2.5/site-packages THEANO_FLAGS=blas.ldflags= OMP_NUM_THREADS=8 time python misc/check_blas.py
\ No newline at end of file
#!/bin/bash
#we set the compiledir to the /Tmp dir to make the test faster by bypassing the nfs network.
date
ROOT_CWD=/Tmp/nightly_build
COMPILEDIR=/Tmp/lisa_theano_compile_dir_theano
NOSETESTS=/usr/bin/nosetests
echo "nb element in the compiledir:"
ls ${COMPILEDIR}|wc -l
FLAGS=warn.argmax_pushdown_bug=False,warn.gpusum_01_011_0111_bug=False,warn.sum_sum_bug=False,warn.sum_div_dimshuffle_bug=False,compiledir=${COMPILEDIR}
export PYTHONPATH=${ROOT_CWD}:$PYTHONPATH
cd ${ROOT_CWD}
echo "executing nosetests with mode=FAST_COMPILE"
THEANO_FLAGS=${FLAGS},mode=FAST_COMPILE ${NOSETESTS} Theano
echo "nb element in the compiledir:"
ls ${COMPILEDIR}|wc -l
echo "executing nosetests with mode=FAST_RUN"
THEANO_FLAGS=${FLAGS},mode=FAST_RUN ${NOSETESTS} --with-coverage --cover-package=theano Theano
echo "nb element in the compiledir:"
ls ${COMPILEDIR}|wc -l
echo "executing nosetests with mode=FAST_RUN,floatX=float32"
THEANO_FLAGS=${FLAGS},mode=FAST_RUN,floatX=float32 ${NOSETESTS} Theano
echo "nb element in the compiledir:"
ls ${COMPILEDIR}|wc -l
#we change the seed and record it every day to test different combinations. We record it to be able to reproduce bugs caused by a given seed. We don't want multiple tests in DEBUG_MODE each day as that takes too long.
seed=$RANDOM
echo "executing nosetests with mode=DEBUG_MODE with seed of the day $seed"
THEANO_FLAGS=${FLAGS},unittests.rseed=$seed,mode=DEBUG_MODE,DebugMode.check_strides=0,DebugMode.patience=3 ${NOSETESTS} Theano
echo "nb element in the compiledir:"
ls ${COMPILEDIR}|wc -l
date
\ No newline at end of file
#!/bin/env python
# Import smtplib for the actual sending function
import smtplib
import os.path
import sys
from theano.misc.buildbot_filter import filter_output
# me == the sender's email address
# family = the list of all recipients' email addresses
family=['theano-buildbot@googlegroups.com']
me='lisa@iro.umontreal.ca'
#These files contain the output of the do_nightly_build scripts.
files=["/tmp/do_nightly_build_theano", "/tmp/do_nightly_build_pylearn", "/tmp/do_nightly_build_deeplearning"]
print files
print sys.argv
if len(sys.argv)==2:
files=[x+sys.argv[1] for x in files]
print files
# Here are the email package modules we'll need
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
COMMASPACE = ', '
def mysend(subject, file):
# Create the container (outer) email message.
if not os.path.isfile(file):
print "Error: no file",file
return
msg = MIMEMultipart()
msg['From'] = me
msg['To'] = COMMASPACE.join(family)
msg.preamble = 'The output of the buildbot'
# Open the files in binary mode. Let the MIMEImage class automatically
# guess the specific image type.
fp = open(file, 'rb')
s=fp.read()
failures=0
errors=0
ran=False
nb_ran=0
skip=0
speed_failure=0
show_speed_failure=False
knownfail=0
gpu_time = None
float32_time = None
float64_time = None
for token in s.split():
token=token.strip('(,)')
if token.startswith("failures="):
failures+=int(token[9:])
elif token.startswith("errors="):
errors+=int(token[+7:])
elif token == "Ran":
ran=True
elif token.startswith("SKIP="):
skip+=int(token[5:])
elif token == "KnownFailureTest:":
knownfail+=1
elif token.startswith("speed_failure_"):
speed_failure+=int(token.split('=')[1])
show_speed_failure=True
elif ran:
ran=False
try:
nb_ran+=int(token)
except Exception, e:
print e
start = ""
for line in s.splitlines():
if gpu_time is None and line.startswith("gpu % expected/get"):
start=line
elif float32_time is None and line.startswith("float32 % expected/get"):
start=line
elif float64_time is None and line.startswith("float64 % expected/get"):
start=line
elif start:
start+=line
if start[-1]=="]":
if start.startswith("gpu % expected/get"):
gpu_time = start
start = ""
elif start.startswith("float32 % expected/get"):
float32_time = start
start = ""
elif start.startswith("float64 % expected/get"):
float64_time = start
start = ""
s="KnownFailure are removed from Error. \n Resume of the output:\n"+filter_output(open(file))+"Full output:\n"+s
img = MIMEText(s)
fp.close()
msg.attach(img)
errors-=knownfail
# Send the email via our own SMTP server.
if show_speed_failure:
msg['Subject'] = subject+" Fail="+str(failures)+" Err="+str(errors)+" Ran="+str(nb_ran)+" Skip="+str(skip)+" KnownFail="+str(knownfail)+ " SpeedFailure="+str(speed_failure)
else:
msg['Subject'] = subject+" Fail="+str(failures)+" Err="+str(errors)+" Ran="+str(nb_ran)+" Skip="+str(skip)+" KnownFail="+str(knownfail)
print msg['Subject']
s = smtplib.SMTP()
s.connect()
s.sendmail(me, family, msg.as_string())
s.close()
print "Finished sending email for",subject
mysend('Theano buildbot',files[0])
mysend('Pylearn buildbot',files[1])
mysend('Deep Learning Tutorial buildbot',files[2])
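The token loop above scans the raw nosetests log for markers such as `Ran 250` and `FAILED (failures=2, errors=1)`. The same idea in a minimal standalone sketch (the function name and returned dict are illustrative, not part of the script):

```python
def parse_nose_summary(log):
    """Extract test counts from a nosetests-style summary (simplified sketch)."""
    failures = errors = nb_ran = 0
    ran = False
    for token in log.split():
        token = token.strip('(,)')
        if token.startswith("failures="):
            failures += int(token[len("failures="):])
        elif token.startswith("errors="):
            errors += int(token[len("errors="):])
        elif token == "Ran":
            ran = True  # the next token is the number of tests run
        elif ran:
            ran = False
            nb_ran += int(token)
    return {"ran": nb_ran, "failures": failures, "errors": errors}
```

Using `len("failures=")` instead of a hand-counted slice index avoids the off-by-one class of bug that the `token[+7:]` spelling invites.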
@@ -255,13 +255,13 @@ class Reindenter:
         return line
     # Line-eater for tokenize.
-    def tokeneater(self, type, token, (sline, scol), end, line,
+    def tokeneater(self, type, token, pos, end, line,
                    INDENT=tokenize.INDENT,
                    DEDENT=tokenize.DEDENT,
                    NEWLINE=tokenize.NEWLINE,
                    COMMENT=tokenize.COMMENT,
                    NL=tokenize.NL):
+        sline, scol = pos
         if type == NEWLINE:
             # A program statement, or ENDMARKER, will eventually follow,
             # after some (possibly empty) run of tokens of the form
......
@@ -10,35 +10,29 @@ from theano.tensor.basic import TensorType
 try:
     import scipy.sparse
+    from theano.sparse.basic import SparseType
+
+    def _is_sparse(a):
+        return scipy.sparse.issparse(a)
 except ImportError:
-    #scipy not imported, their can be only ndarray
-    def may_share_memory(a, b, raise_other_type=True):
-        if not isinstance(a, numpy.ndarray) or not isinstance(b, numpy.ndarray):
-            if raise_other_type:
-                raise TypeError("may_share_memory support only ndarray when scipy is not available")
-            return False
-        return numpy.may_share_memory(a,b)
+    #scipy not imported, their can be only ndarray and cudandarray
+    def _is_sparse(a):
+        return False
+
+import theano.sandbox.cuda as cuda
+if cuda.cuda_available:
+    def _is_cuda(a):
+        return isinstance(a, cuda.CudaNdarray)
 else:
-    #scipy imported, their can be ndarray and sparse type
-    from theano.sparse.basic import _is_sparse, SparseType
-    def may_share_memory(a, b, raise_other_type=True):
-        a_ndarray = isinstance(a, numpy.ndarray)
-        b_ndarray = isinstance(b, numpy.ndarray)
-        try:
-            a_sparse = _is_sparse(a)
-        except NotImplementedError:
-            a_sparse = False
-        try:
-            b_sparse = _is_sparse(b)
-        except NotImplementedError:
-            b_sparse = False
-        a_cuda = False
-        b_cuda = False
-        if a.__class__.__name__ == "CudaNdarray":
-            a_cuda = True
-        if b.__class__.__name__ == "CudaNdarray":
-            b_cuda = True
-        if not(a_ndarray or a_sparse or a_cuda) or not(b_ndarray or b_sparse or b_cuda):
-            if raise_other_type:
+    def _is_cuda(a):
+        return False
+
+def may_share_memory(a, b, raise_other_type=True):
+    a_ndarray = isinstance(a, numpy.ndarray)
+    b_ndarray = isinstance(b, numpy.ndarray)
+    a_sparse = _is_sparse(a)
+    b_sparse = _is_sparse(b)
+    a_cuda = _is_cuda(a)
+    b_cuda = _is_cuda(b)
+    if not(a_ndarray or a_sparse or a_cuda) or not(b_ndarray or b_sparse or b_cuda):
+        if raise_other_type:
......
@@ -124,9 +124,10 @@ class PycudaElemwiseSourceModuleOp(Op):
         self.pycuda_fct = mod.get_function(fct_name)
         return out_node
-    def perform(self, node, inputs, (z,)):
+    def perform(self, node, inputs, out):
         #TODO support broadcast!
         #TODO assert all input have the same shape
+        z, = out
         if z[0] is None or z[0].shape!=inputs[0].shape:
             z[0] = theano.sandbox.cuda.CudaNdarray.zeros(inputs[0].shape)
         self.pycuda_fct(inputs[0],inputs[1],z[0], block=(inputs[0].shape[0],inputs[0].shape[1],1))
@@ -191,8 +192,9 @@ class PycudaElemwiseKernelOp(Op):
         #include <numpy/arrayobject.h>""")
         return out_node
-    def perform(self, node, inputs, (z,)):
+    def perform(self, node, inputs, out):
         #TODO assert all input have the same shape
+        z, = out
         if z[0] is None or z[0].shape!=inputs[0].shape:
             z[0] = theano.sandbox.cuda.CudaNdarray.zeros(inputs[0].shape)
         i = inputs + z
......
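These `perform` methods follow Theano's output-storage convention: the output cell `z[0]` is reused when its shape still matches, and reallocated otherwise, so repeated calls avoid fresh allocations. The convention can be sketched without Theano or CUDA (plain lists stand in for `CudaNdarray`, and elementwise addition for the compiled kernel):

```python
def perform(inputs, out):
    # out is a list of one-element storage cells, as in Theano's Op.perform
    z, = out
    if z[0] is None or len(z[0]) != len(inputs[0]):
        # (re)allocate the output buffer; a plain list stands in for
        # CudaNdarray.zeros(inputs[0].shape)
        z[0] = [0.0] * len(inputs[0])
    for i, (a, b) in enumerate(zip(inputs[0], inputs[1])):
        z[0][i] = a + b  # elementwise add as the stand-in computation
    return z[0]
```

On a second call with same-shaped inputs, the same buffer object is filled in place rather than replaced.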
@@ -56,8 +56,8 @@ def test_pycuda_elemwise_kernel():
     assert any([ isinstance(node.op, theano.sandbox.cuda.GpuElemwise) for node in f.maker.env.toposort()])
     assert any([ isinstance(node.op, PycudaElemwiseKernelOp) for node in f2.maker.env.toposort()])
-    val1 = numpy.random.rand(5,5)
-    val2 = numpy.random.rand(5,5)
+    val1 = numpy.asarray(numpy.random.rand(5,5), dtype='float32')
+    val2 = numpy.asarray(numpy.random.rand(5,5), dtype='float32')
     #val1 = numpy.ones((5,5))
     #val2 = numpy.arange(25).reshape(5,5)
     assert (f(val1,val2) == f2(val1,val2)).all()
......
@@ -47,18 +47,21 @@ def debugprint(obj, depth=-1, print_type=False, file=None):
         _file = file
     done = set()
     results_to_print = []
+    order = []
     if isinstance(obj, gof.Variable):
         results_to_print.append(obj)
     elif isinstance(obj, gof.Apply):
         results_to_print.extend(obj.outputs)
     elif isinstance(obj, Function):
         results_to_print.extend(obj.maker.env.outputs)
+        order = obj.maker.env.toposort()
     elif isinstance(obj, (list, tuple)):
         results_to_print.extend(obj)
     else:
         raise TypeError("debugprint cannot print an object of this type", obj)
     for r in results_to_print:
-        debugmode.debugprint(r, depth=depth, done=done, print_type=print_type, file=_file)
+        debugmode.debugprint(r, depth=depth, done=done, print_type=print_type,
+                             file=_file, order=order)
     if file is _file:
         return file
     elif file=='str':
@@ -370,16 +373,14 @@ pprint.assign(lambda pstate, r: hasattr(pstate, 'target') and pstate.target is n
 pp = pprint
-def pydotprint(fct, outfile=os.path.join(config.compiledir,'theano.pydotprint.png'),
-               compact=True, mode=None, format='png', with_ids=False):
+def pydotprint(fct, outfile=None,
+               compact=True, format='png', with_ids=False):
     """
     print to a file in png format the graph of op of a compile theano fct.
     :param fct: the theano fct returned by theano.function.
     :param outfile: the output file where to put the graph.
     :param compact: if True, will remove intermediate var that don't have name.
-    :param mode: if a ProfileMode, add to each Apply label (s in apply,% in apply in total op time, % in fct time)
-                 Otherwise ignore it
     :param format: the file format of the output.
     In the graph, box are an Apply Node(the execution of an op) and ellipse are variable.
@@ -388,12 +389,18 @@ def pydotprint(fct, outfile=os.path.join(config.compiledir,'theano.pydotprint.pn
     We print the op of the apply in the Apply box with a number that represent the toposort order of application of those Apply.
     If an Apply have more then 1 input, print add a label to the edge that in the index of the inputs.
-    green ellipse are input to the graph
-    blue ellipse are output of the graph
-    grey ellipse are var generated by the graph that are not output and are not used.
+    green ellipses are inputs to the graph
+    blue ellipses are outputs of the graph
+    grey ellipses are var generated by the graph that are not output and are not used.
+    red ellipses are transfer to/from the gpu.
+        op with those name GpuFromHost, HostFromGpu
     """
+    if outfile is None:
+        outfile = os.path.join(config.compiledir,'theano.pydotprint.' +
+                               config.device + '.' + format)
+    mode = fct.maker.mode
     if not isinstance(mode,ProfileMode) or not mode.fct_call.has_key(fct):
-        mode=None
+        mode = None
     try:
         import pydot as pd
     except:
@@ -445,7 +452,8 @@ def pydotprint(fct, outfile=os.path.join(config.compiledir,'theano.pydotprint.pn
             prof_str=' (%.3fs,%.3f%%,%.3f%%)'%(time,pt,pf)
         applystr = str(node.op).replace(':','_')
         if (applystr in all_strings) or with_ids:
-            applystr = applystr+' id='+str(topo.index(node))+prof_str
+            applystr = applystr+' id='+str(topo.index(node))
+        applystr += prof_str
         all_strings.add(applystr)
         apply_name_cache[node] = applystr
         return applystr
@@ -461,12 +469,18 @@ def pydotprint(fct, outfile=os.path.join(config.compiledir,'theano.pydotprint.pn
         var_shape='box'
     for node_idx,node in enumerate(topo):
         astr=apply_name(node)
-        g.add_node(pd.Node(astr,shape=apply_shape))
+        if node.op.__class__.__name__ in ('GpuFromHost','HostFromGpu'):
+            # highlight CPU-GPU transfers to simplify optimization
+            g.add_node(pd.Node(astr,color='red',shape=apply_shape))
+        else:
+            g.add_node(pd.Node(astr,shape=apply_shape))
         for id,var in enumerate(node.inputs):
             varstr=var_name(var)
-            label=''
+            label=str(var.type)
             if len(node.inputs)>1:
-                label=str(id)
+                label=str(id)+' '+label
             if var.owner is None:
                 g.add_node(pd.Node(varstr,color='green',shape=var_shape))
                 g.add_edge(pd.Edge(varstr,astr, label=label))
@@ -480,9 +494,9 @@ def pydotprint(fct, outfile=os.path.join(config.compiledir,'theano.pydotprint.pn
         for id,var in enumerate(node.outputs):
             varstr=var_name(var)
             out = any([x[0]=='output' for x in var.clients])
-            label=''
+            label=str(var.type)
             if len(node.outputs)>1:
-                label=str(id)
+                label=str(id)+' '+label
             if out:
                 g.add_edge(pd.Edge(astr, varstr, label=label))
                 g.add_node(pd.Node(varstr,color='blue',shape=var_shape))
@@ -581,8 +595,3 @@ def pydot_var(vars, outfile=os.path.join(config.compiledir,'theano.pydotprint.pn
     g.write_png(outfile, prog='dot')
     print 'The output file is available at',outfile
@@ -38,7 +38,8 @@ class GpuConv3D(theano.Op):
     def c_code_cache_version(self):
         return ()
-    def c_code(self, node, nodename, (V,W,b,d), outputs, sub):
+    def c_code(self, node, nodename, inputs, outputs, sub):
+        V, W, b, d = inputs
         fail = sub['fail']
         H = outputs[0]
@@ -87,7 +88,7 @@ PyErr_Format(PyExc_ValueError, "GpuConv3D: d must be a vector CudaNdarray");
 const int inputChannels = CudaNdarray_HOST_DIMS(%(V)s)[4];
 if (CudaNdarray_HOST_DIMS(%(W)s)[4] != inputChannels)
 {
-PyErr_Format(PyExc_ValueError, "Conv3D: W operates on a %%i channel image but the image has %%i channels",CudaNdarray_HOST_DIMS(%(W)s)[4],inputChannels);
+PyErr_Format(PyExc_ValueError, "GpuConv3D: W operates on a %%i channel image but the image has %%i channels",CudaNdarray_HOST_DIMS(%(W)s)[4],inputChannels);
 %(fail)s
 }
 { //extra scope so error handler jumps don't cause errors
@@ -99,19 +100,19 @@ PyErr_Format(PyExc_ValueError, "GpuConv3D: d must be a vector CudaNdarray");
 const int vidDur = CudaNdarray_HOST_DIMS(%(V)s)[3];
 if (vidHeight < filterHeight)
 {
-PyErr_Format(PyExc_ValueError, "W has a height of %%i but V is only %%i pixels tall",filterHeight,vidHeight);
+PyErr_Format(PyExc_ValueError, "GpuConv3D: W has a height of %%i but V is only %%i pixels tall",filterHeight,vidHeight);
 %(fail)s
 }
 { // extra scope so fail works
 if (vidWidth < filterWidth)
 {
-PyErr_Format(PyExc_ValueError, "W has a width of %%i but V is only %%i pixels wide",filterWidth,vidWidth);
+PyErr_Format(PyExc_ValueError, "GpuConv3D: W has a width of %%i but V is only %%i pixels wide",filterWidth,vidWidth);
 %(fail)s
 }
 { // extra scope so fail works
 if (vidDur < filterDur)
 {
-PyErr_Format(PyExc_ValueError, "W has a duration of %%i but V is only %%i pixels long",filterDur,vidDur);
+PyErr_Format(PyExc_ValueError, "GpuConv3D: W has a duration of %%i but V is only %%i pixels long",filterDur,vidDur);
 %(fail)s
 }
 { // extra scope so fail works
......
@@ -66,7 +66,8 @@ class GpuConvGrad3D(theano.Op):
         output_storage[0][0] = dCdW
-    def c_code(self, node, nodename, (V,d,WShape,dCdH), outputs, sub):
+    def c_code(self, node, nodename, inputs, outputs, sub):
+        V, d, WShape, dCdH = inputs
         fail = sub['fail']
         dCdW = outputs[0]
......
@@ -48,7 +48,8 @@ class GpuConvTransp3D(theano.Op):
     def c_code_cache_version(self):
         return ()
-    def c_code(self, node, nodename, (W, b, d, H, RShape), outputs, sub):
+    def c_code(self, node, nodename, inputs, outputs, sub):
+        W, b, d, H, RShape = inputs
         fail = sub['fail']
         R = outputs[0]
......
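All three conv ops receive the same mechanical change: Python 2's tuple parameters (`def f(self, (a, b)): ...`) were removed in Python 3 (PEP 3113), so the tuple is now accepted as a single argument and unpacked on the first line of the body. The pattern reduced to its core (a free function and dummy values here, purely for illustration):

```python
# Python 2 allowed:
#     def c_code(self, node, nodename, (V, W, b, d), outputs, sub): ...
# The portable spelling takes one parameter and unpacks it explicitly:
def c_code(node, nodename, inputs, outputs, sub):
    V, W, b, d = inputs  # same local names as before, one extra line
    return (V, W, b, d, outputs, sub['fail'])
```

The rest of each method body is untouched, which is why these hunks are a one-line removal plus a two-line addition.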
-import atexit, os, stat
+import atexit, os, stat, sys
 from theano.compile import optdb
 from theano import config
 from theano.gof.cmodule import get_lib_extension
@@ -8,13 +8,17 @@ _logger_name = 'theano.sandbox.cuda'
 _logger = logging.getLogger(_logger_name)
 _logger.setLevel(logging.WARNING)
 def error(*msg):
-    _logger.error('ERROR (%s): %s'% ( _logger_name, ' '.join(str(m) for m in msg)))
+    _logger.error('ERROR (%s): %s'% (
+        _logger_name, ' '.join(str(m) for m in msg)))
 def warning(*msg):
-    _logger.warning('WARNING (%s): %s'% ( _logger_name, ' '.join(str(m) for m in msg)))
+    _logger.warning('WARNING (%s): %s'% ( _logger_name,
+        ' '.join(str(m) for m in msg)))
 def info(*msg):
-    _logger.info('INFO (%s): %s'% ( _logger_name, ' '.join(str(m) for m in msg)))
+    _logger.info('INFO (%s): %s'% ( _logger_name,
+        ' '.join(str(m) for m in msg)))
 def debug(*msg):
-    _logger.debug('DEBUG (%s): %s'% ( _logger_name, ' '.join(str(m) for m in msg)))
+    _logger.debug('DEBUG (%s): %s'% ( _logger_name,
+        ' '.join(str(m) for m in msg)))
 # Compile cuda_ndarray.cu
@@ -29,7 +33,7 @@ cuda_available = True
 # Global variable to avoid displaying the same warning multiple times.
 cuda_warning_is_displayed = False
-#This variable is set to True when we enable the cuda.(i.e. when use() is called)
+#This variable is set to True when we enable cuda.(i.e. when use() is called)
 cuda_enabled = False
 # Code factorized within a function so that it may be called from multiple
@@ -51,8 +55,13 @@ def set_cuda_disabled():
 #cuda_ndarray compile and import
 cuda_path = os.path.abspath(os.path.split(__file__)[0])
-cuda_files = ('cuda_ndarray.cu', 'cuda_ndarray.cuh', 'conv_full_kernel.cu', 'conv_kernel.cu')
-stat_times = [os.stat(os.path.join(cuda_path, cuda_file))[stat.ST_MTIME] for cuda_file in cuda_files]
+cuda_files = (
+        'cuda_ndarray.cu',
+        'cuda_ndarray.cuh',
+        'conv_full_kernel.cu',
+        'conv_kernel.cu')
+stat_times = [os.stat(os.path.join(cuda_path, cuda_file))[stat.ST_MTIME]
+              for cuda_file in cuda_files]
 date = max(stat_times)
 cuda_ndarray_loc = os.path.join(config.compiledir, 'cuda_ndarray')
@@ -113,7 +122,8 @@ from theano.sandbox.cuda.var import (CudaNdarrayVariable,
 from theano.sandbox.cuda.type import CudaNdarrayType
 if cuda_available:
-    #check if their is an old cuda_ndarray that was loading instead of the one we compiled!
+    # check if their is an old cuda_ndarray that was loading instead of the one
+    # we compiled!
     import cuda_ndarray.cuda_ndarray
     if cuda_ndarray_so != cuda_ndarray.cuda_ndarray.__file__:
         warning("WARNING: cuda_ndarray was loaded from",
@@ -128,9 +138,12 @@ outdated!""")
     import basic_ops
     from basic_ops import (GpuFromHost, HostFromGpu, GpuElemwise,
                            GpuDimShuffle, GpuSum, GpuReshape, GpuContiguous,
-                           GpuSubtensor, GpuIncSubtensor, GpuFlatten, GpuShape, GpuAlloc,
-                           GpuJoin,fscalar, fscalar, fvector, fmatrix, frow, fcol, ftensor3, ftensor4
-                           , scalar, vector, matrix, row, col, tensor3, tensor4)
+                           GpuSubtensor, GpuIncSubtensor,
+                           GpuAdvancedSubtensor1, GpuAdvancedIncSubtensor1,
+                           GpuFlatten, GpuShape, GpuAlloc,
+                           GpuJoin, fscalar, fvector, fmatrix, frow, fcol,
+                           ftensor3, ftensor4, scalar, vector, matrix, row, col,
+                           tensor3, tensor4)
     from basic_ops import host_from_gpu, gpu_from_host
     import opt
     import cuda_ndarray
@@ -143,7 +156,11 @@ def use(device, force=False, default_to_move_computation_to_gpu = True,
             raise EnvironmentError("You forced use of device %s, but CUDA initialization failed "
                                    "with error:\n%s" % (device, cuda_initialization_error_message))
         if not cuda_available:
-            warning('CUDA is installed, but device %s is not available' % device)
+            if cuda_initialization_error_message:
+                error_addendum = " (error: %s)" % cuda_initialization_error_message
+            else:
+                error_addendum = ""
+            warning('CUDA is installed, but device %s is not available%s' % (device, error_addendum))
             return
         if device == 'gpu':
@@ -163,35 +180,41 @@ def use(device, force=False, default_to_move_computation_to_gpu = True,
             try:
                 if device !='gpu':
                     gpu_init(device)
+                else:
+                    #warning To let people see that the gpu will be used.
+                    _logger.warn("We let the driver select the gpu device to use")
                 if move_shared_float32_to_gpu:
                     handle_shared_float32(True)
                 use.device_number = device
                 cuda_enabled = True
+                print >> sys.stderr, "Using gpu device %d: %s" % (active_device_number(), active_device_name())
             except (EnvironmentError, ValueError), e:
-                _logger.error("ERROR: Not using GPU. Initialisation of device %i failed:\n%s" % (device, e))
+                _logger.error(("ERROR: Not using GPU."
+                    " Initialisation of device %i failed:\n%s") % (device, e))
                 cuda_enabled = False
                 if force:
-                    e.args+=("You asked to force this device and it failed. No fallback to the cpu or other gpu device.",)
+                    e.args+=(("You asked to force this device and it failed."
+                        " No fallback to the cpu or other gpu device."),)
                     raise
         elif use.device_number != device:
-            _logger.warning("WARNING: ignoring call to use(%s), GPU number %i is already in use." %(str(device), use.device_number))
+            _logger.warning(("WARNING: ignoring call to use(%s), GPU number %i "
+                "is already in use.") %(str(device), use.device_number))
         if default_to_move_computation_to_gpu:
-            optdb.add_tags('gpu',
+            optdb.add_tags('gpu_opt',
+                           'fast_run',
+                           'inplace')
+            optdb.add_tags('gpu_after_fusion',
                            'fast_run',
                            'inplace')
         if force:
             try:
-                #in case the device if just gpu, we check that the driver init it correctly.
+                #in case the device if just gpu,
+                # we check that the driver init it correctly.
                 cuda_ndarray.cuda_ndarray.CudaNdarray.zeros((5,5))
-            except (Exception, NameError), e:#NameError when no gpu present as cuda_ndarray is not loaded.
-                e.args+=("ERROR: GPU did not work and we told to don't use the cpu. ",)
+            except (Exception, NameError), e:
+                # NameError when no gpu present as cuda_ndarray is not loaded.
+                e.args+=("ERROR: GPU forced but failed. ",)
                 raise
@@ -212,7 +235,8 @@ def handle_shared_float32(tf):
 if config.device.startswith('gpu'):
     use(device=config.device, force=config.force_device)
 elif config.init_gpu_device:
-    assert config.device=="cpu", "We can use the Theano flag init_gpu_device only when the Theano flag device=='cpu'"
+    assert config.device=="cpu", ("We can use the Theano flag init_gpu_device"
+        " only when the Theano flag device=='cpu'")
     warning(("GPU device %s will be initialized, and used if a GPU is needed. "
             "However, no computation, nor shared variables, will be implicitly "
             "moved to that device. If you want that behavior, use the 'device' "
......
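The `error`/`warning`/`info`/`debug` helpers in this file are thin wrappers that join their positional arguments into one string before delegating to a module-level logger. A standalone sketch of the wrapper, with a capturing handler added only so its output can be inspected (the handler class is illustrative, not from the diff):

```python
import logging

_logger_name = 'theano.sandbox.cuda'
_logger = logging.getLogger(_logger_name)
_logger.setLevel(logging.WARNING)

def warning(*msg):
    # join all positional arguments, like the helpers in the diff
    _logger.warning('WARNING (%s): %s' % (
        _logger_name, ' '.join(str(m) for m in msg)))

def info(*msg):
    _logger.info('INFO (%s): %s' % (
        _logger_name, ' '.join(str(m) for m in msg)))

# demonstration: capture what the logger actually emits
records = []

class _Capture(logging.Handler):
    def emit(self, record):
        records.append(record.getMessage())

_logger.addHandler(_Capture())
warning("device gpu0 is", "not available")
info("this is filtered out")  # below the WARNING level, not emitted
```

Because the logger level is `WARNING`, the `info` call is filtered before it reaches any handler, which is why downgrading a message's severity in these helpers also silences it by default.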
@@ -363,14 +363,17 @@ class GpuConv(Op):
         return ['cuda_ndarray.cuh','<stdio.h>']
     def c_code_cache_version(self):
-        return (0,8)
+        return (0,13) # raise this whenever modifying any of the support_code_files
     def c_support_code_apply(self, node, nodename):
+        # REMEMBER TO RAISE c_code_cache_version when changing any of these files
         return open(os.path.join(os.path.split(__file__)[0],'conv_kernel.cu')).read()+\
                open(os.path.join(os.path.split(__file__)[0],'conv_full_kernel.cu')).read()+\
                open(os.path.join(os.path.split(__file__)[0],'conv.cu')).read()
-    def c_code(self, node, nodename, (img, kern), (out,), sub):
+    def c_code(self, node, nodename, inp, out_, sub):
+        img, kern = inp
+        out, = out_
         dx = self.subsample[0]
         dy = self.subsample[1]
         border_mode = self.border_mode
@@ -405,8 +408,7 @@ class GpuConv(Op):
 CudaNdarray * out2 = (CudaNdarray *)CudaNdarray_Conv(%(img)s, %(kern)s, %(out)s,
                                                      mode, dx, dy, version, verbose);
-if(%(out)s && %(out)s==out2)
-Py_DECREF(out2);//CudaNdarray_Conv incremented the count to out
+Py_XDECREF(%(out)s);
 %(out)s = out2;
 """%sub
@@ -435,7 +437,9 @@ class GpuDownsampleFactorMax(Op):
         #raise NotImplementedError('only C is implemented')
     def c_code_cache_version(self):
         return (1)
-    def c_code(self, node, nodename, (x,), (z,), sub):
+    def c_code(self, node, nodename, inp, out, sub):
+        x, = inp
+        z, = out
         fail = sub['fail']
         ds0, ds1 = self.ds
         ignore_border = int(self.ignore_border)
@@ -586,7 +590,9 @@ class GpuDownsampleFactorMaxGrad(Op):
         #return ()
         return (3,)
-    def c_code(self, node, nodename, (x, z, gz), (gx,), sub):
+    def c_code(self, node, nodename, inp, out, sub):
+        x, z, gz = inp
+        gx, = out
         fail = sub['fail']
         ds0, ds1 = self.ds
         ignore_border = int(self.ignore_border)
......
// REMEMBER TO RAISE c_code_cache_version when changing this file
//
//implement the valid convolution only //implement the valid convolution only
/* /*
...@@ -46,7 +48,8 @@ __device__ void load_to_shared(float * dst, const float * src, const int thread_ ...@@ -46,7 +48,8 @@ __device__ void load_to_shared(float * dst, const float * src, const int thread_
if (nb_thread < 64) if (nb_thread < 64)
{ {
if(flipped) if(flipped)
//TODO very slow on device before 1.3. make access to kern sequential and access to d_kern flipped. //TODO very slow on device before 1.3.
// make access to kern sequential and access to d_kern flipped.
for(int i=thread_id;i<N;i+=nb_thread) for(int i=thread_id;i<N;i+=nb_thread)
dst[i]=src[N - 1 - i]; dst[i]=src[N - 1 - i];
//dst[N-1-i]=src[i]; //dst[N-1-i]=src[i];
...@@ -88,10 +91,9 @@ __device__ void load_to_shared(float * dst, const float * src, const int thread_ ...@@ -88,10 +91,9 @@ __device__ void load_to_shared(float * dst, const float * src, const int thread_
const bool flipped=false, const bool c_contiguous=true){ const bool flipped=false, const bool c_contiguous=true){
if(flipped && ! c_contiguous){ if(flipped && ! c_contiguous){
for(int i=thread_id;i<nb_row*nb_col;i+=nb_thread) for(int i=thread_id;i<nb_row*nb_col;i+=nb_thread)
dst[nb_row*nb_col-1-i]=src[i/nb_col*stride_row+i%nb_col*stride_col]; dst[nb_row*nb_col-1-i]=src[(i/nb_col)*stride_row+(i%nb_col)*stride_col];
}else if(c_contiguous){ }else if(c_contiguous){
load_to_shared(dst, src, thread_id, nb_thread, nb_col*nb_row, flipped); load_to_shared(dst, src, thread_id, nb_thread, nb_col*nb_row, flipped);
}else if(flipped){//c_contiguous==true }else if(flipped){//c_contiguous==true
//TODO very slow on device before 1.3. make access to kern sequential and access to d_kern flipped. //TODO very slow on device before 1.3. make access to kern sequential and access to d_kern flipped.
int N=nb_col*nb_row; int N=nb_col*nb_row;
@@ -440,10 +442,12 @@ conv_patch_stack_reduce( float* img, float* kern, float* out,
                          int kern_stride_col, int kern_stride_row,
                          int kern_stride_stack, int kern_stride_nkern)
 {
-    int __shared__ out_len, out_wid, nb_thread_id;
-    out_len = img_len - kern_len + 1;
-    out_wid = img_wid - kern_wid + 1;
-    nb_thread_id = blockDim.z*blockDim.y*blockDim.x;
+    //int __shared__ out_len, out_wid, nb_thread_id;
+    //out_len = img_len - kern_len + 1;
+    //out_wid = img_wid - kern_wid + 1;
+    const int out_wid = blockDim.x;
+    const int out_len = blockDim.y;
+    const int nb_thread_id = blockDim.z*blockDim.y*blockDim.x;
     extern __shared__ float s_data[];
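This hunk stops computing the output size inside the kernel and instead trusts the launch configuration: the caller must set blockDim.x and blockDim.y to the "valid" convolution output size. A tiny host-side sketch (hypothetical helper name) of the size the launch must match:

```python
# Sketch: the output size a "valid" convolution produces; after this change
# the kernel assumes blockDim.x == out_wid and blockDim.y == out_len.
def valid_out_shape(img_len, img_wid, kern_len, kern_wid):
    return img_len - kern_len + 1, img_wid - kern_wid + 1

assert valid_out_shape(32, 32, 5, 5) == (28, 28)
```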
@@ -458,9 +462,16 @@ conv_patch_stack_reduce( float* img, float* kern, float* out,
     int out_row = ty;//output row
     const int thread_id = tz*blockDim.y*blockDim.x+ty*blockDim.x+tx;
-    float * d_img=&s_data[0];//size of [IMAGE_LEN * IMAGE_WID];
-    float * d_kern=&s_data[img_len * img_wid];//size of [(preload_full_kern?KERNEL_LEN:blockDim.z) * KERNEL_WID];
-    float * d_reduce=&s_data[img_len*img_wid+(preload_full_kern?kern_len:blockDim.z)*kern_wid];
+    //d_img size [IMAGE_LEN * IMAGE_WID];
+    float * d_img=&s_data[0];
+    //d_kern size [(preload_full_kern?KERNEL_LEN:blockDim.z) * KERNEL_WID]
+    float * d_kern=&s_data[img_len * img_wid];
+    //d_reduce size [n_threads]
+    //N.B. this overlaps with d_img and d_kern!
+    float * d_reduce=&s_data[0];
     float sum = 0.0f;
     kern+=kern_stride_nkern*blockIdx.y;//the good nkern
@@ -471,30 +482,31 @@ conv_patch_stack_reduce( float* img, float* kern, float* out,
     __syncthreads();
     load_to_shared(d_img, img, thread_id, nb_thread_id, img_wid, img_len,
                    img_stride_col, img_stride_row, false, c_contiguous);
-    if(!(split && ! preload_full_kern))
-        load_to_shared(d_kern, kern, thread_id, nb_thread_id, kern_wid, kern_len,
-                       kern_stride_col, kern_stride_row, flipped_kern, c_contiguous);
-    __syncthreads();
     if(split && ! preload_full_kern){
-        for(int first_row=0, row=tz;first_row<kern_len;row+=blockDim.z, first_row+=blockDim.z){
-            int idx3;
-            //TODO: test/check for flipped_kern
-            if(flipped_kern)
-                idx3=(kern_len-(first_row)-blockDim.z);//the current last row flipped
-            else
-                idx3=first_row;
+        for(int first_row=0;first_row<kern_len;first_row+=blockDim.z){
+            //N.B. - Jan 30, 2011 with CUDA 3.2 I found that without the explicit cast to
+            // (int)blockDim.z, idx3 would sometimes be negative. I'm rusty on my signed vs. unsigned
+            // details, but that seemed really weird. tricky bug to find too.
+            int idx3 = flipped_kern
+                ? max((kern_len - (int)blockDim.z - first_row),0)
+                : first_row;
+            int len3 = min(blockDim.z, kern_len - first_row);
             __syncthreads();
-            load_to_shared(d_kern, kern+idx3*kern_stride_row, thread_id, nb_thread_id, kern_wid, blockDim.z,
+            load_to_shared(d_kern, kern+idx3*kern_stride_row, thread_id, nb_thread_id, kern_wid, len3,
                            kern_stride_col, kern_stride_row, flipped_kern, c_contiguous);
             __syncthreads();
-            const float* idx_kern=&d_kern[tz*kern_stride_row];
-            const float* idx_in=&d_img[(row+out_row)*img_wid+out_col];
+            const float* idx_kern=&d_kern[tz*kern_wid];
+            const float* idx_in=&d_img[(first_row+tz+out_row)*img_wid+out_col];
             float sum2 = 0;
-            if(row<kern_len)
+            if(tz<len3)
                 convolutionRowNoFlip<KERN_WIDTH>(sum2,idx_in,idx_kern,kern_wid);
             sum+=sum2;
         }
     }else if(split){
+        load_to_shared(d_kern, kern, thread_id, nb_thread_id, kern_wid, kern_len,
+                       kern_stride_col, kern_stride_row, flipped_kern, c_contiguous);
+        __syncthreads();
         for(int row=tz;row<kern_len;row+=blockDim.z){
             const float* idx_kern=&d_kern[row*kern_wid];
             const float* idx_in=&d_img[(row+out_row)*img_wid+out_col];
@@ -504,18 +516,21 @@ conv_patch_stack_reduce( float* img, float* kern, float* out,
         int row = tz;//The row of the kernel.
         const float* idx_kern=&d_kern[row*kern_wid];
         const float* idx_in=&d_img[(row+out_row)*img_wid+out_col];
+        load_to_shared(d_kern, kern, thread_id, nb_thread_id, kern_wid, kern_len,
+                       kern_stride_col, kern_stride_row, flipped_kern, c_contiguous);
+        __syncthreads();
         convolutionRowNoFlip<KERN_WIDTH>(sum,idx_in,idx_kern,kern_wid);
     }
     __syncthreads(); // ensure calculations have completed before any thread starts changing the shared memory
-    //reduce
+    //reduce no sync because previous loop ends with sync
     d_reduce[thread_id]=sum;
     __syncthreads();
-    if(thread_id<out_len*out_wid){
-        sum=0;
-        for(int i=0;i<blockDim.z;i++){
-            sum+=d_reduce[thread_id+i*blockDim.x*blockDim.y];
+    if(thread_id<out_len*out_wid){ // blockDim.x==out_wid, blockDim.y==out_len
+        //sum=0;
+        for(int i=1;i<blockDim.z;i++){
+            sum+=d_reduce[thread_id+i*out_wid*out_len];
         }
         out[batch_id*out_wid*out_len*nkern+//the good batch
             out_wid*out_len*blockIdx.y+//the output image
@@ -403,6 +403,11 @@ int CudaNdarray_alloc_contiguous(CudaNdarray *self, const int nd, const inttype
             self->devdata = 0;
             return -1;
         }
+        if (0)
+            fprintf(stderr,
+                    "Allocated devdata %p (self=%p)\n",
+                    self->devdata,
+                    self);
         self->data_allocated = size;
     }
     return 0;
@@ -77,7 +77,9 @@ class GpuCrossentropySoftmaxArgmax1HotWithBias (Op):
     """
-    def c_code(self, node, nodename, (x, b, y_idx), (nll, sm, am), sub):
+    def c_code(self, node, nodename, inp, out, sub):
+        x, b, y_idx = inp
+        nll, sm, am = out
         classname=self.__class__.__name__
         fail = sub['fail']
         sio = StringIO.StringIO()
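This hunk (and the matching ones below) replaces Python 2 tuple parameters with explicit unpacking: `def c_code(self, node, nodename, (x, b, y_idx), ...)` is a syntax error in Python 3, so the inputs and outputs are now passed as plain sequences. A minimal sketch of the pattern (the function name here is made up for illustration):

```python
# Sketch of the signature change: tuple parameters become explicit
# sequence unpacking in the function body.
def c_code_style(inp, out):
    x, b, y_idx = inp   # was the tuple parameter (x, b, y_idx)
    nll, sm, am = out   # was the tuple parameter (nll, sm, am)
    return x, nll

assert c_code_style(('x', 'b', 'y'), ('nll', 'sm', 'am')) == ('x', 'nll')
```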
@@ -191,7 +193,9 @@ class GpuCrossentropySoftmax1HotWithBiasDx (Op):
     def c_code_cache_version(self):
         return (3,)
         #return ()
-    def c_code(self, node, nodename, (dnll, sm, y_idx), (dx,), sub):
+    def c_code(self, node, nodename, inp, out, sub):
+        dnll, sm, y_idx = inp
+        dx, = out
         fail = sub['fail']
         return """
         if ((%(dnll)s->nd != 1)
@@ -306,7 +310,9 @@ class GpuSoftmax (Op):
     def c_code_cache_version(self):
         #return ()
         return (2,) + inline_softmax.code_version
-    def c_code(self, node, nodename, (x,), (z,), sub):
+    def c_code(self, node, nodename, inp, out, sub):
+        x, = inp
+        z, = out
         fail = sub['fail']
         return """
         if (%(x)s->nd != 2)
@@ -394,7 +400,9 @@ class GpuSoftmaxWithBias (Op):
         #return ()
         return (2,) + inline_softmax.code_version
-    def c_code(self, node, nodename, (x,b), (z,), sub):
+    def c_code(self, node, nodename, inp, out, sub):
+        x, b = inp
+        z, = out
         fail = sub['fail']
         return """
         if (%(x)s->nd != 2)
@@ -181,6 +181,18 @@ def nvcc_module_compile_str(
         except ValueError, e:
             done = True
+    # Remove "-u Symbol" arguments, since they are usually not relevant
+    # for the new compilation, even if they were used for compiling python.
+    # If they are necessary, the nvcc syntax is "-U Symbol" with a capital U.
+    done = False
+    while not done:
+        try:
+            indexof = cmd.index('-u')
+            cmd.pop(indexof)  # Remove -u
+            cmd.pop(indexof)  # Remove argument to -u
+        except ValueError, e:
+            done = True
     #cmd.append("--ptxas-options=-v") #uncomment this to see register and shared-mem requirements
     debug('Running cmd', ' '.join(cmd))
     orig_dir = os.getcwd()
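The added loop strips every `-u Symbol` pair from the nvcc command line, relying on `list.index` raising ValueError when no `-u` remains. The same logic in Python 3 syntax (the diff uses Python 2's `except ValueError, e:`; the function name here is made up):

```python
# Sketch of the "-u Symbol" stripping loop, ported to Python 3 syntax.
def strip_lowercase_u(cmd):
    done = False
    while not done:
        try:
            indexof = cmd.index('-u')
            cmd.pop(indexof)  # remove -u
            cmd.pop(indexof)  # remove its argument
        except ValueError:
            done = True      # no -u left in the command line
    return cmd

cmd = ['nvcc', '-u', 'Symbol', '-O3', '-u', 'Other']
assert strip_lowercase_u(cmd) == ['nvcc', '-O3']
```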