Commit 45f07e93 authored by James Bergstra

merge

......@@ -101,14 +101,13 @@ case if ``borrow`` was True, the thunk would be allowed to reuse (or
Compiled libraries are stored within a specific compilation directory,
which by default is set to ``$HOME/.theano/compiledir_xxx``, where
``xxx`` identifies the platform. It may be manually set to a different
location either by setting the ``THEANO_COMPILEDIR`` environment variable,
the ``THEANO_BASE_COMPILEDIR`` environment variable
or by calling ``theano.gof.compiledir.set_compiledir(..)`` within your
Python script.
location either by setting :attr:`config.compiledir` or
:attr:`config.base_compiledir`, either within your Python script or by
using one of the configuration mechanisms described in :mod:`config`.
The compile cache is based upon the C++ code of the graph to be compiled.
So, if you change compilation environment variables, such as
``THEANO_BLAS_LDFLAGS``, you will need to manually remove your compile cache,
So, if you change compilation configuration variables, such as
:attr:`config.blas.ldflags`, you will need to manually remove your compile cache,
using ``Theano/bin/theano-compiledir clear``.
Theano also implements a lock mechanism that prevents
......
......@@ -260,21 +260,22 @@ Example:
>>> m = theano.Module()
>>> minstance = m.make(mode='DEBUG_MODE')
Whenever possible, unit tests should omit this parameter. Leaving-out
the mode will ensure that unit tests use the default mode (defined in
compile.mode.default_mode). This default_mode is set to the
THEANO_DEFAULT_MODE environment variable, if it is present. If not, it
defaults to 'FAST_RUN'.
This allows the user to easily switch the mode in which unittests are
Whenever possible, unit tests should omit this parameter. Leaving
out the mode will ensure that unit tests use the default mode
(defined in compile.mode.default_mode). This default_mode is set to
the configuration variable :attr:`config.mode`, which defaults to
'FAST_RUN', and can be set by various mechanisms (see :mod:`config`).
In particular, the environment variable :envvar:`THEANO_FLAGS`
allows the user to easily switch the mode in which unittests are
run. For example to run all tests in all modes from a BASH script,
type this:
.. code-block:: bash
THEANO_DEFAULT_MODE=FAST_COMPILE nosetests
THEANO_DEFAULT_MODE=FAST_RUN nosetests
THEANO_DEFAULT_MODE=DEBUG_MODE nosetests
THEANO_FLAGS='mode=FAST_COMPILE' nosetests
THEANO_FLAGS='mode=FAST_RUN' nosetests
THEANO_FLAGS='mode=DEBUG_MODE' nosetests
Using Random Values in Test Cases
---------------------------------
......@@ -299,14 +300,12 @@ do the following:
The behaviour of seed_rng is as follows:
* If an explicit seed is given, it will be used for seending numpy's rng.
* If not, it will try to get a seed from the THEANO_UNITTEST_SEED variable.
* If an explicit seed is given, it will be used for seeding numpy's rng.
* If THEANO_UNITTEST_SEED is set to "random", it will seed the
rng. with None, which is equivalent to seeding with a random seed.
* If not, it will use ``config.unittest.rseed`` (its default value is 666).
* If THEANO_UNITTEST_SEED is not defined, it will use a default seed of 666.
* If config.unittest.rseed is set to "random", it will seed the rng with
None, which is equivalent to seeding with a random seed.
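The precedence described in the list above can be sketched as follows (``choose_seed`` and its arguments are illustrative names, not Theano's API):

```python
# Hypothetical sketch of the seeding precedence described above:
# an explicit seed wins, otherwise config.unittest.rseed is used,
# and the special value 'random' maps to None (a random seed).
def choose_seed(explicit_seed=None, rseed=666):
    if explicit_seed is not None:
        return explicit_seed
    if rseed == 'random':
        return None  # seeding numpy's rng with None picks a random seed
    return rseed
```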
The main advantage of using unittest_tools.seed_rng is that it allows
......@@ -317,7 +316,8 @@ a higher confidence that the variables are correct), while still
making sure unittests are deterministic.
Users who prefer their unittests to be random (when run on their local
machine) can simply set THEANO_UNITTEST_SEED to 'random'.
machine) can simply set ``config.unittest.rseed`` to 'random' (see
:mod:`config`).
Similarly, to provide a seed to numpy.random.RandomState, simply use:
......
......@@ -40,7 +40,7 @@ Roughly in order of what you'll want to check out:
* :ref:`internal` -- How to maintain Theano, LISA-specific tips, and more...
* `API <api/>`_ -- The automatically-generated API
You can download the latest `PDF documentation <http://deeplearning.net/theanodoc/theano.pdf>`_, rather than reading it online.
You can download the latest `PDF documentation <http://deeplearning.net/software/theano/theano.pdf>`_, rather than reading it online.
Community
=========
......
......@@ -339,9 +339,9 @@ Generating the documentation
----------------------------
You can read the latest HTML documentation `here
<http://deeplearning.net/theanodoc>`__.
<http://deeplearning.net/software/theano>`__.
You can download the latest PDF documentation `here
<http://deeplearning.net/theanodoc/theano.pdf>`__.
<http://deeplearning.net/software/theano/theano.pdf>`__.
We recommend you look at the documentation on the website, since it
will be more current than the documentation included with the package.
......
......@@ -8,8 +8,12 @@ LISA Labo specific instructions
Tips for running at LISA
------------------------
Use the fast BLAS library that Fred installed, by setting
`THEANO_BLAS_LDFLAGS=-lgoto`.
Shell configuration files ``/opt/lisa/os/.local.{bash,csh}rc`` should define
:envvar:`THEANORC` to include ``/opt/lisa/os/.local.theanorc`` as a
configuration file.
``/opt/lisa/os/.local.theanorc`` should include the right default values for
the lab, in particular, ``blas.ldflags`` should contain '-lgoto'.
Tips for running on a cluster
-----------------------------
......
......@@ -14,7 +14,8 @@ To run Theano on the Mammouth cluster, follow these simple steps:
Perhaps even put this in your ``.bashrc``
* ``set THEANO_BLAS_LDFLAGS='-lmkl -lguide -fopenmp'``
* set ``config.blas.ldflags`` to ``'-lmkl -lguide -fopenmp'``
(see :mod:`config` to know how)
Note: the -lguide flag works; however, this fix should probably be considered temporary.
Intel has deprecated libguide.so in favor of the newer library libiomp5.so. However,
......
......@@ -145,7 +145,7 @@ Then it executes something like
.. code-block:: bash
THEANO_UNITTEST_SEED=<SEED> THEANO_DEFAULT_MODE=DEBUG_MODE /usr/bin/nosetests --with-coverage --cover-package=theano --cover-package=pylearn
THEANO_FLAGS='unittests.rseed=<SEED>,mode=DEBUG_MODE' /usr/bin/nosetests --with-coverage --cover-package=theano --cover-package=pylearn
in the updated ``theano`` directory.
The output is emailed automatically to one of the developers.
......
......@@ -130,7 +130,7 @@ Getting started
A PDF version of the online documentation may be found `here
<http://deeplearning.net/theanodoc/theano.pdf>`_.
<http://deeplearning.net/software/theano/theano.pdf>`_.
Contact us
......
......@@ -35,7 +35,7 @@ DebugMode can be used as follows:
f(0)
f(7)
It can also be used by setting an environment variable ``THEANO_DEFAULT_MODE=DEBUG_MODE``.
It can also be used by setting the configuration variable :attr:`config.mode`.
It can also be used by passing a DebugMode instance as the mode, as in
>>> f = theano.function([x], 10*x, mode=DebugMode(check_c_code=False))
......@@ -78,47 +78,47 @@ Reference
Each of these exceptions inherits from the more generic `DebugModeError`.
If there are no internal errors, this mode behaves like FAST_RUN or FAST_COMPILE, but takes
a little longer and uses more memory.
a little longer and uses more memory.
If there are internal errors, this mode will raise a `DebugModeError` exception.
.. attribute:: stability_patience = config.THEANO_DEBUGMODE_PATIENCE
.. attribute:: stability_patience = config.DebugMode.patience
When checking for the stability of optimization, recompile the graph this many times.
Default 10.
.. attribute:: check_c_code = config.THEANO_DEBUGMODE_CHECK_C
.. attribute:: check_c_code = config.DebugMode.check_c
Should we evaluate (and check) the `c_code` implementations?
``True`` -> yes, ``False`` -> no.
Default yes.
.. attribute:: check_py_code = config.THEANO_DEBUGMODE_CHECK_PY
.. attribute:: check_py_code = config.DebugMode.check_py
Should we evaluate (and check) the `perform` implementations?
``True`` -> yes, ``False`` -> no.
Default yes.
.. attribute:: check_isfinite = config.THEANO_DEBUGMODE_CHECK_FINITE
.. attribute:: check_isfinite = config.DebugMode.check_finite
Should we check for (and complain about) ``NaN``/``Inf`` ndarray elements?
``True`` -> yes, ``False`` -> no.
Default yes.
.. attribute:: require_matching_strides = config.THEANO_DEBUGMODE_CHECK_STRIDES
.. attribute:: require_matching_strides = config.DebugMode.check_strides
Check for (and complain about) Ops whose python and C
outputs are ndarrays with different strides. (This can catch bugs, but
is generally overly strict.)
is generally overly strict.)
0 -> no check, 1 -> warn, 2 -> err.
Default warn.
.. method:: __init__(self, optimizer='fast_run', stability_patience=None, check_c_code=None, check_py_code=None, check_isfinite=None, require_matching_strides=None, linker=None)
......@@ -128,7 +128,7 @@ Reference
If any of these arguments (except optimizer) is not None, it overrides the class default.
The linker argument is not used. It is set there to allow Mode.requiring() and some other functions to work with DebugMode too.
The keyword version of DebugMode (which you get by using ``mode='DEBUG_MODE'``)
is quite strict, and can raise several different Exception types.
......
......@@ -134,7 +134,7 @@ Reference
about how output variables should be returned.
The default is typically 'FAST_RUN' but this can be changed in
:doc:`theano.config <../config>` or via :envvar:`THEANO_DEFAULT_MODE`. The mode
:doc:`theano.config <../config>`. The mode
argument controls the sort of optimizations that will be applied to the
graph, and the way the optimized graph will be evaluated.
......
......@@ -10,21 +10,21 @@
Guide
=====
The ``mode`` parameter to :func:`theano.function`` controls how the
The ``mode`` parameter to :func:`theano.function` controls how the
inputs-to-outputs graph is transformed into a callable object.
Theano defines the following modes by name:
- ``FAST_COMPILE``: Apply just a few optimizations, but use C op implementations where possible.
- ``FAST_RUN``: Apply all optimizations, and use C op implementations where possible.
- ``DEBUG_MODE``: Verify the correctness of all optimizations, and compare C and python
- ``'FAST_COMPILE'``: Apply just a few graph optimizations, but use C implementations where possible.
- ``'FAST_RUN'``: Apply all optimizations, and use C implementations where possible.
- ``'DEBUG_MODE'``: Verify the correctness of all optimizations, and compare C and python
implementations. This mode can take much longer than the other modes,
but can identify many kinds of problems.
The default mode is typically 'FAST_RUN', but it can be controlled via the
environment variable 'THEANO_DEFAULT_MODE', which can in turn be overridden by
The default mode is typically ``FAST_RUN``, but it can be controlled via the
configuration variable :attr:`config.mode`, which can in turn be overridden by
setting ``theano.compile.mode.default_mode`` directly, which can in turn be
overridden by passing the keyword argument to ``theano.function``.
overridden by passing the keyword argument to :func:`theano.function`.
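The override chain in the paragraph above can be sketched as a simple precedence function (``resolve_mode`` is an illustrative name, not Theano code):

```python
# Hypothetical sketch of the mode-resolution precedence: the keyword
# argument to theano.function wins, then compile.mode.default_mode,
# then config.mode, and finally the 'FAST_RUN' fallback.
def resolve_mode(kwarg_mode=None, default_mode=None, config_mode=None):
    for candidate in (kwarg_mode, default_mode, config_mode):
        if candidate is not None:
            return candidate
    return 'FAST_RUN'
```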
.. TODO::
......
.. _libdoc_floatX:
=======================================================================
:mod:`floatX` -- Switching Between 'float32' and 'float64'
=======================================================================
.. module:: floatX
:platform: Unix, Windows
:synopsis: easy switching between float32 and float64
.. moduleauthor:: LISA
Guide
=====
On the CPU, 'float32' computations are often twice as fast as 'float64'
and are half the size.
On GPUs the speed difference between 'float32' and 'float64' is much greater.
Often we develop our code using double-precision expressions, and then wonder if
we might get the same answer much more quickly with single-precision arithmetic.
If we have used ``tensor.dmatrix`` and ``tensor.dvector`` and so on throughout
our code, it could be tedious to switch to single-precision Variables. To make
switching precisions easier, Theano provides the ``floatX`` module.
>>> from theano.floatX import xmatrix, xvector, xtensor4
>>> import numpy
>>> a = xvector('a')
>>> b = xmatrix()
>>> c = xtensor4()
These calls are identical to ``dvector``, ``dmatrix``, and ``dtensor4`` by default, but a
single environment variable can switch them to ``fvector``, ``fmatrix`` and ``ftensor4``.
You can set the floatX precision via ``floatX`` in the :envvar:`THEANO_FLAGS`.
It defaults to ``'float64'``. To set it to ``'float32'`` in *bash* for example, type ``export THEANO_FLAGS=floatX=float32``.
To set it from within your program, call :func:`set_floatX`.
The current floatX precision is stored in ``theano.config.floatX`` as a string.
Its value is either 'float32' or 'float64'.
So it is easy to allocate a numpy vector of the floatX dtype.
>>> import theano.config as config
>>> print config.floatX # either 'float32' or 'float64'
>>> x = numpy.asarray([1,2,3], dtype=config.floatX)
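A minimal sketch of the alias-switching idea behind :func:`set_floatX`: a module-level table maps each generic constructor name to its float64 or float32 variant, and switching ``floatX`` re-points the aliases. Names below (``resolve_alias``, the table) are stand-ins, not Theano's actual implementation:

```python
# Hypothetical sketch: a table mapping generic constructor names to
# the concrete float64 ('d*') or float32 ('f*') variants.
_ALIASES = {
    'float64': {'xvector': 'dvector', 'xmatrix': 'dmatrix', 'xtensor4': 'dtensor4'},
    'float32': {'xvector': 'fvector', 'xmatrix': 'fmatrix', 'xtensor4': 'ftensor4'},
}

def resolve_alias(name, floatX='float64'):
    """Return the concrete constructor name for a generic alias."""
    return _ALIASES[floatX][name]
```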
Reference
==========
.. function:: xscalar(name=None)
Alias for either :func:`dscalar` or :func:`fscalar`
.. function:: xvector(name=None)
Alias for either :func:`dvector` or :func:`fvector`
.. function:: xmatrix(name=None)
Alias for either :func:`dmatrix` or :func:`fmatrix`
.. function:: xrow(name=None)
Alias for either :func:`drow` or :func:`frow`
.. function:: xcol(name=None)
Alias for either :func:`dcol` or :func:`fcol`
.. function:: xtensor3(name=None)
Alias for either :func:`dtensor3` or :func:`ftensor3`
.. function:: xtensor4(name=None)
Alias for either :func:`dtensor4` or :func:`ftensor4`
.. function:: set_floatX(dtype=config.floatX)
Reset the :func:`xscalar`, ... :func:`xtensor4` aliases to return Variables with given dtype.
This is called at import-time when setting floatX in :envvar:`THEANO_FLAGS`.
......@@ -263,7 +263,7 @@ them perfectly, but a dscalar otherwise.
.. note::
When config.floatX==float32 (see :module:`config`), then Python floats
When config.floatX==float32 (see :mod:`config`), then Python floats
are stored instead as single-precision floats.
For fine control of this rounding policy, see
......
......@@ -22,7 +22,7 @@ Theano defines the following modes by name:
but can identify many kinds of problems.
The default mode is typically ``FAST_RUN``, but it can be controlled via
the environment variable ``THEANO_DEFAULT_MODE``, which can in turn be
the configuration variable :attr:`config.mode`, which can in turn be
overridden by setting `theano.compile.mode.default_mode` directly,
which can in turn be overridden by passing the keyword argument to
:func:`theano.function <function.function>`.
......@@ -30,7 +30,6 @@ which can in turn be overridden by passing the keyword argument to
================= =============================================================== ===============================================================================
short name Full constructor What does it do?
================= =============================================================== ===============================================================================
(default) ``compile.mode.Mode(linker='py', optimizer=None)`` Python implementations with zero graph modifications.
FAST_COMPILE ``compile.mode.Mode(linker='c|py', optimizer='fast_compile')`` C implementations where available, quick and cheap graph transformations
FAST_RUN ``compile.mode.Mode(linker='c|py', optimizer='fast_run')`` C implementations where available, all available graph transformations.
DEBUG_MODE ``compile.debugmode.DebugMode()`` Both implementations where available, all available graph transformations.
......
......@@ -16,7 +16,7 @@ Setting up CUDA
The first thing you'll need for Theano to use your GPU is Nvidia's
GPU-programming toolchain. You should install at least the CUDA driver and the CUDA Toolkit, as
:ref:`described here <http://www.nvidia.com/object/cuda_get.html>`. The CUDA
`described here <http://www.nvidia.com/object/cuda_get.html>`_. The CUDA
Toolkit installs a folder on your computer with subfolders *bin*, *lib*,
*include*, and some more too. (Sanity check: The *bin* subfolder should contain an *nvcc*
program which is the compiler for GPU code.) This folder is called the *cuda
......
......@@ -221,12 +221,8 @@ predefined_modes = {'FAST_COMPILE': FAST_COMPILE,
'SANITY_CHECK': SANITY_CHECK}
##
# The default mode used by functions and modules is read from the environment
# variable THEANO_DEFAULT_MODE. Unit tests will run using this value. If the env. var.
# is not set, it will default to 'FAST_RUN'
# The default mode used by functions and modules is read from the configuration.
# keep default_mode.optimizer==default_optimizer and default_mode.linker==default_linker!
##
default_mode = config.mode
def get_mode(string):
......
......@@ -319,10 +319,10 @@ class ProfileMode(Mode):
register_mode('PROFILE_MODE',ProfileMode())
def atexit_print_default_profile_mode():
"""Print the summary of the predefied mode PROFILE_MODE if used.
"""Print the summary of the predefined mode PROFILE_MODE if used.
This all to have the summary printed at exit when we do
THEANO_DEFAULT_MODE=PROFILE_MODE
This allows the summary to be printed at exit when
config.mode=PROFILE_MODE
"""
prof_mode=predefined_modes["PROFILE_MODE"]
if prof_mode.local_time[0]>0:
......
......@@ -18,16 +18,19 @@ def debug(*msg):
# Compile cuda_ndarray.cu
# This requires that nvcc (part of cuda) is installed. If it is not, a warning is
# printed and this module will not be working properly (we set `enable_cuda`
# printed and this module will not be working properly (we set `cuda_available`
# to False).
# This variable is True by default, and set to False if something goes wrong
# when trying to initialize cuda.
enable_cuda = True
cuda_available = True
# Global variable to avoid displaying the same warning multiple times.
cuda_warning_is_displayed = False
# This variable is set to True when we enable cuda (i.e. when use() is called).
cuda_enabled = False
# Code factorized within a function so that it may be called from multiple
# places (which is not currently the case, but may be useful in the future).
def set_cuda_disabled():
......@@ -38,8 +41,8 @@ def set_cuda_disabled():
Note that there is no point calling this function from outside of
`cuda.__init__`, since it has no effect once the module is loaded.
"""
global enable_cuda, cuda_warning_is_displayed
enable_cuda = False
global cuda_available, cuda_warning_is_displayed
cuda_available = False
if not cuda_warning_is_displayed:
cuda_warning_is_displayed = True
warning('Cuda is disabled, cuda-based code will thus not be '
......@@ -70,7 +73,7 @@ try:
if not nvcc_compiler.is_nvcc_available():
set_cuda_disabled()
if enable_cuda:
if cuda_available:
code = open(os.path.join(cuda_path, "cuda_ndarray.cu")).read()
if not os.path.exists(cuda_ndarray_loc):
......@@ -84,7 +87,7 @@ except Exception, e:
error( "Failed to compile cuda_ndarray.cu: %s" % str(e))
set_cuda_disabled()
if enable_cuda:
if cuda_available:
#check if there is an old cuda_ndarray that was loaded instead of the one we compiled!
import cuda_ndarray.cuda_ndarray
if os.path.join(config.compiledir,'cuda_ndarray','cuda_ndarray.so')!=cuda_ndarray.cuda_ndarray.__file__:
......@@ -104,7 +107,8 @@ if enable_cuda:
import cuda_ndarray
def use(device=config.device):
def use(device):
global cuda_enabled
if device.startswith('gpu'):
device = int(device[3:])
elif device == 'cpu':
......@@ -122,8 +126,10 @@ def use(device=config.device):
gpu_init(device)
handle_shared_float32(True)
use.device_number = device
cuda_enabled = True
except RuntimeError, e:
_logger.warning("ERROR: Not using GPU. Initialisation of device %i failed. %s" %(device, e))
cuda_enabled = False
elif use.device_number != device:
logging.getLogger('theano.sandbox.cuda').warning("WARNING: ignoring call to use(%s), GPU number %i is already in use." %(str(device), use.device_number))
optdb.add_tags('gpu',
......@@ -144,5 +150,6 @@ def handle_shared_float32(tf):
else:
raise NotImplementedError('removing our handler')
if enable_cuda and config.device.startswith('gpu'):
use()
if cuda_available and config.device.startswith('gpu'):
use(config.device)
......@@ -381,11 +381,11 @@ def local_gpu_conv(node):
gpu_conv = GpuConvOp_from_ConvOp(node.op)
return [host_from_gpu(gpu_conv(gpu_from_host(img), gpu_from_host(kern)))]
import theano.sandbox.downsample
import theano.tensor.signal.downsample as downsample
@register_opt()
@local_optimizer([])
def local_gpu_downsample_factor_max(node):
if isinstance(node.op, theano.sandbox.downsample.DownsampleFactorMax):
if isinstance(node.op, downsample.DownsampleFactorMax):
x, = node.inputs
if (x.owner and x.owner.op == host_from_gpu):
gpu_ds = GpuDownsampleFactorMax(node.op.ds, node.op.ignore_border)
......@@ -394,7 +394,7 @@ def local_gpu_downsample_factor_max(node):
@register_opt()
@local_optimizer([])
def local_gpu_downsample_factor_max_grad(node):
if isinstance(node.op, theano.sandbox.downsample.DownsampleFactorMaxGrad):
if isinstance(node.op, downsample.DownsampleFactorMaxGrad):
x,z,gz = node.inputs
if (x.owner and x.owner.op == host_from_gpu):
gpu_ds_grad = GpuDownsampleFactorMaxGrad(node.op.ds, node.op.ignore_border)
......
......@@ -11,7 +11,7 @@ import theano.tensor as T
# Skip test if cuda_ndarray is not available.
from nose.plugins.skip import SkipTest
import theano.sandbox.cuda as cuda_ndarray
if cuda_ndarray.enable_cuda == False:
if cuda_ndarray.cuda_enabled == False:
raise SkipTest('Optional package cuda disabled')
import theano.sandbox.cuda as tcn
......
......@@ -270,7 +270,7 @@ def test_bench_elemwise(n_iter=1000, **kwargs):
# Skip test if cuda_ndarray is not available.
from nose.plugins.skip import SkipTest
import theano.sandbox.cuda as cuda_ndarray
if cuda_ndarray.enable_cuda == False:
if cuda_ndarray.cuda_enabled == False:
raise SkipTest('Optional package cuda disabled')
import theano.sandbox.cuda
theano.sandbox.cuda.use()
......
......@@ -8,12 +8,12 @@ import numpy
# Skip test if cuda_ndarray is not available.
from nose.plugins.skip import SkipTest
import theano.sandbox.cuda as cuda_ndarray
if cuda_ndarray.enable_cuda == False:
if cuda_ndarray.cuda_enabled == False:
raise SkipTest('Optional package cuda disabled')
import theano.sandbox.cuda as tcn
from theano.sandbox.downsample import DownsampleFactorMax
from theano.tensor.signal.downsample import DownsampleFactorMax
import theano.compile.mode
......
......@@ -5,7 +5,7 @@ import theano
# Skip test if cuda_ndarray is not available.
from nose.plugins.skip import SkipTest
import theano.sandbox.cuda as cuda_ndarray
if cuda_ndarray.enable_cuda == False:
if cuda_ndarray.cuda_enabled == False:
raise SkipTest('Optional package cuda disabled')
cuda_tensor4 = cuda_ndarray.CudaNdarrayType([False]*4)
......
......@@ -3,7 +3,7 @@ import theano
import theano.sandbox.cuda as cuda_ndarray
# Skip test if cuda_ndarray is not available.
from nose.plugins.skip import SkipTest
if cuda_ndarray.enable_cuda == False:
if cuda_ndarray.cuda_enabled == False:
raise SkipTest('Optional package cuda disabled')
import numpy
......
......@@ -7,14 +7,14 @@ from theano import tensor
import theano.tensor.nnet
import theano.sandbox.conv
import theano.sandbox.downsample
import theano.tensor.signal.downsample as downsample
import numpy
# Skip test if cuda_ndarray is not available.
from nose.plugins.skip import SkipTest
import theano.sandbox.cuda as cuda_ndarray
if cuda_ndarray.enable_cuda == False:
if cuda_ndarray.cuda_enabled == False:
raise SkipTest('Optional package cuda disabled')
import theano.sandbox.cuda as tcn
......@@ -307,7 +307,7 @@ def run_conv_nnet2_classif(use_gpu, isize, ksize, n_batch, n_iter,
conv_op.set_flops()
conv_op1.set_flops()
ds_op = theano.sandbox.downsample.DownsampleFactorMax((2,2), ignore_border=False)
ds_op = downsample.DownsampleFactorMax((2,2), ignore_border=False)
if downsample_ops:
hid = tensor.tanh(ds_op(conv_op(x, w0)+b0.dimshuffle((0,'x','x'))))
else:
......
......@@ -8,7 +8,7 @@ import numpy
# Skip test if cuda_ndarray is not available.
from nose.plugins.skip import SkipTest
import theano.sandbox.cuda as cuda_ndarray
if cuda_ndarray.enable_cuda == False:
if cuda_ndarray.cuda_available == False:
raise SkipTest('Optional package cuda disabled')
import theano.compile.mode
......
import unittest, sys, time
import numpy as N
import theano.tensor as T
from theano.tests import unittest_tools as utt
from theano.sandbox.downsample import DownsampleFactorMax
from theano import function, Mode
def max_pool(images=None, imshp=None, maxpoolshp=None, ignore_border=True):
"""Implements a max pooling layer
Uses the same API as sp.max_pool but uses the Downsample op instead.
Takes as input a 2D tensor of shape batch_size x img_size and performs max pooling.
Max pooling downsamples by taking the max value in a given area, here defined by
maxpoolshp. Outputs a 2D tensor of shape batch_size x output_size.
Parameters are keyword arguments in order to use func_to_mod.
@param images: 2D tensor containing images on which to apply convolution.
Assumed to be of shape batch_size x img_size
@param imshp: tuple containing image dimensions
@param maxpoolshp: tuple containing shape of area to max pool over
@output out1: symbolic result (2D tensor)
@output out2: logical shape of the output
"""
if len(imshp) == 2:
imshp = (1,) + imshp
elif len(imshp)!=3:
raise NotImplementedError("!")
# all these reshapes should happen in place
imrshp = T.stack(images.shape[0],
*[T.as_tensor(x) for x in imshp])
imtensor = T.reshape(images, imrshp)
maxpop = DownsampleFactorMax(maxpoolshp, ignore_border)
rval = maxpop(imtensor)
return T.flatten(rval,2), maxpop.out_shape(imshp, maxpoolshp, ignore_border)
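As a plain-Python point of comparison for the pooling checked in the tests below (assuming ignore_border behavior with stride equal to the pool shape; this mirrors what DownsampleFactorMax computes on a single 2D array, and is not Theano code):

```python
# Pure-Python reference max pooling over a 2D list-of-lists, with
# non-overlapping windows of shape poolshp and borders ignored.
def max_pool_2d_ref(img, poolshp):
    ph, pw = poolshp
    out_h = len(img) // ph
    out_w = len(img[0]) // pw
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            patch = [img[i * ph + r][j * pw + c]
                     for r in range(ph) for c in range(pw)]
            row.append(max(patch))
        out.append(row)
    return out
```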
class TestDownsampleFactorMax(unittest.TestCase):
def test_maxpool(self):
# generate flatted images
maxpoolshps = ((1,1),(2,2),(3,3),(2,3))
imval = N.random.rand(4,10,64,64)
images = T.dmatrix()
dmatrix4=T.TensorType('float64', (False, False, False, False))
images4=dmatrix4()
tctot, tpytot, ntot = [],[],[]
for maxpoolshp in maxpoolshps:
for border in [True,False]:
print 'maxpoolshp', maxpoolshp,'border', border
# numeric verification
xi=0
yi=0
if not border:
if imval.shape[-2] % maxpoolshp[0]:
xi += 1
if imval.shape[-1] % maxpoolshp[1]:
yi += 1
my_output_val = N.zeros((imval.shape[0], imval.shape[1],
imval.shape[2]/maxpoolshp[0]+xi,
imval.shape[3]/maxpoolshp[1]+yi))
time1=time.time()
for n in range(imval.shape[0]):
for k in range(imval.shape[1]):
for i in range(my_output_val.shape[2]):
ii = i*maxpoolshp[0]
for j in range(my_output_val.shape[3]):
jj = j*maxpoolshp[1]
patch = imval[n,k,ii:ii+maxpoolshp[0],jj:jj+maxpoolshp[1]]
my_output_val[n,k,i,j] = N.max(patch)
my_output_val = my_output_val.reshape(imval.shape[0],-1)
ntot+=[time.time()-time1]
# symbolic stuff
#### wrapper to DownsampleFactorMax op ####
output, outshp = max_pool(images, imval.shape[1:], maxpoolshp, border)
assert N.prod(my_output_val.shape[1:]) == N.prod(outshp)
assert N.prod(my_output_val.shape[1:]) == N.prod(outshp)
f = function([images,],[output,])
imval2=imval.reshape(imval.shape[0],-1)
output_val = f(imval2)
assert N.all(output_val == my_output_val)
#DownsampleFactorMax op
maxpool_op = DownsampleFactorMax(maxpoolshp, ignore_border=border)(images4)
f = function([images4],maxpool_op,mode=Mode(linker="c"))
f2 = function([images4],maxpool_op,mode=Mode(linker="py"))
f3 = function([images4],maxpool_op)#for when we want to use the debug mode
time1=time.time()
output_val = f(imval)
tctot+=[time.time()-time1]
assert (N.abs(my_output_val.flatten()-output_val.flatten())<1e-5).all()
time1=time.time()
output_val = f2(imval)
tpytot+=[time.time()-time1]
assert (N.abs(my_output_val.flatten()-output_val.flatten())<1e-5).all()
output_val = f3(imval)
print 'Numpy processing time: %.3fs'%sum(ntot),ntot
print 'c Theano(DownsampleFactorMax) processing time: %.3fs'%sum(tctot),tctot
print 'py Theano(DownsampleFactorMax) processing time: %.3fs'%sum(tpytot),tpytot
d=N.asarray(ntot)/tctot
print 'speed up c theano(DownsampleFactorMax) vs manual: %.3f'%d.mean(),d
d=N.asarray(ntot)/tpytot
print 'speed up py theano(DownsampleFactorMax) vs manual: %.3f'%d.mean(),d
def test_DownsampleFactorMax_grad(self):
# generate flatted images
maxpoolshps = ((1,1),(3,2),(2,3))
imval = N.random.rand(2,3,3,4) * 10.0 #more variance means numeric gradient will be more accurate
do_theano=True
for maxpoolshp in maxpoolshps:
for border in [True,False]:
print 'maxpoolshp', maxpoolshp, 'border', border
def mp(input):
return DownsampleFactorMax(maxpoolshp, ignore_border=border)(input)
utt.verify_grad(mp, [imval])
if __name__ == '__main__':
t = TestDownsampleFactorMax("test_maxpool").run()
#t.test_maxpool()
from theano.tests import main
# main("test_sp")
......@@ -28,5 +28,3 @@ import nnet # used for softmax, sigmoid, etc.
......@@ -261,11 +261,14 @@ def _wrap_tensor_into_member(x):
compile.module.register_wrapper(_obj_is_wrappable_as_tensor, _wrap_tensor_into_member)
if int(config.tensor.cmp_sloppy)>1:
# This environment variable is a quick-and-dirty way to get low-precision comparisons.
# For a more precise setting of these tolerances set them explicitly in your user code by
# assigning, for example, "theano.tensor.basic.float32_atol = ..."
#when THEANO_CMP_SLOPPY>1 we are even more sloppy. This is usefull to test the gpu as they don't use extended precision and this cause some difference bigger then the normal sloppy.
# This config variable is a quick-and-dirty way to get low-precision
# comparisons. For a more precise setting of these tolerances set
# them explicitly in your user code by assigning, for example,
# "theano.tensor.basic.float32_atol = ..."
    # When config.tensor.cmp_sloppy>1 we are even more sloppy. This is
    # useful for testing the GPU, as GPUs don't use extended precision and
    # this causes differences bigger than the normal sloppy tolerances.
float32_atol = 5e-4
float32_rtol = 1e-3
float64_rtol = 1e-4
......@@ -3597,8 +3600,10 @@ def verify_grad(op, pt, n_tests=2, rng=None, eps=None, tol=None, mode=None, cast
o_fn = function(tensor_pt, o_output)
o_fn_out = o_fn(*[p.copy() for p in pt])
random_projection = rng.rand(*o_fn_out.shape)
# random_projection should not have elements too small,
# otherwise too much precision is lost in numerical gradient
random_projection = rng.rand(*o_fn_out.shape) + 0.5
if cast_to_output_type:
random_projection = numpy.array(random_projection,
dtype=o_output.dtype)
......
......@@ -44,7 +44,7 @@ def ldflags(libs=True, flags=False):
"""Return a list of libraries against which an Op's object file should be
linked to benefit from a BLAS implementation.
Default: ['blas'], but environment variable THEANO_BLAS_LDFLAGS overrides this.
Default: ['blas'], but configuration variable config.blas.ldflags overrides this.
"""
rval = []
for t in config.blas.ldflags.split():
......@@ -52,9 +52,9 @@ def ldflags(libs=True, flags=False):
t0, t1, t2 = t[0:3]
assert t0 == '-'
except:
raise ValueError('invalid token in THEANO_BLAS_LDFLAGS', t)
raise ValueError('invalid token in config.blas.ldflags', t)
if t1 == 'L':
raise ValueError('library dir not allowed in THEANO_BLAS_LDFLAGS', t)
raise ValueError('library dir not allowed in config.blas.ldflags', t)
elif libs and t1=='l': # example -lmkl
rval.append(t[2:])
elif flags and t1!='l': # example -openmp
......
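The token handling in the ``ldflags`` hunk above can be sketched in isolation (``split_ldflags`` is an illustrative stand-in for the loop over ``config.blas.ldflags.split()``, not Theano's exact function):

```python
# Hypothetical sketch: '-l<name>' tokens contribute library names,
# '-L' dirs are rejected, and other '-<flag>' tokens are kept whole
# as compiler flags when requested.
def split_ldflags(ldflags, libs=True, flags=False):
    rval = []
    for t in ldflags.split():
        if len(t) < 2 or t[0] != '-':
            raise ValueError('invalid token', t)
        if t[1] == 'L':
            raise ValueError('library dir not allowed', t)
        if libs and t[1] == 'l':       # example: -lmkl -> 'mkl'
            rval.append(t[2:])
        elif flags and t[1] != 'l':    # example: -fopenmp kept whole
            rval.append(t)
    return rval
```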
......@@ -822,7 +822,14 @@ class CAReduce(Op):
to_reduce = reversed(sorted(axis))
if to_reduce:
for dimension in to_reduce:
variable = self.ufunc.reduce(variable, dimension)
# If it's a zero-size array, use scalar_op.identity if available
if variable.shape[dimension] == 0:
if hasattr(self.scalar_op, 'identity'):
variable = self.scalar_op.identity
else:
raise ValueError("Input (%s) has zero-size on axis %s, but self.scalar_op (%s) has no attribute 'identity'" % (variable, dimension, self.scalar_op))
else:
variable = self.ufunc.reduce(variable, dimension)
output[0] = theano._asarray(variable, dtype = node.outputs[0].type.dtype)
else:
output[0] = numpy.copy(variable)
......
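The zero-size guard added to ``CAReduce.perform`` above can be illustrated without NumPy (``reduce_with_identity`` is a pure-Python sketch of the idea, not the actual Op code): when the sequence being reduced is empty, fall back to the operation's identity element instead of reducing nothing.

```python
# Pure-Python sketch of reducing with a fallback identity element
# for zero-size inputs, mirroring the scalar_op.identity check.
def reduce_with_identity(op, seq, identity=None):
    if not seq:
        if identity is None:
            raise ValueError('zero-size reduction with no identity')
        return identity
    result = seq[0]
    for item in seq[1:]:
        result = op(result, item)
    return result
```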
Diff collapsed.
Diff collapsed.
Diff collapsed.
import sys, time, unittest
import numpy
from scipy import signal
import theano
import theano.tensor as T
from theano import function, Mode
from theano.tests import unittest_tools as utt
from theano.tensor.signal import conv
from theano.tensor.basic import _allclose
class TestConv2D(unittest.TestCase):
def setUp(self):
utt.seed_rng()
self.input = T.dtensor4('input')
self.filters = T.dtensor4('filters')
def validate(self, image_shape, filter_shape,
border_mode='valid', subsample=(1,1),
N_image_shape=None, N_filter_shape=None,
input=None, filters=None,
unroll_batch=0, unroll_kern=0, unroll_patch=True,
verify_grad=True):
if N_image_shape is None:
N_image_shape = image_shape
if N_filter_shape is None:
N_filter_shape = filter_shape
        if input is None:
            input = self.input
        if filters is None:
            filters = self.filters
############# THEANO IMPLEMENTATION ############
# we create a symbolic function so that verify_grad can work
def sym_conv2d(input, filters):
# define theano graph and function
return conv.conv2d(input, filters, image_shape, filter_shape,
border_mode, subsample, unroll_batch=unroll_batch,
unroll_kern=unroll_kern, unroll_patch=unroll_patch)
output = sym_conv2d(input, filters)
theano_conv = theano.function([input, filters], output)
# initialize input and compute result
image_data = numpy.random.random(N_image_shape)
filter_data = numpy.random.random(N_filter_shape)
theano_output = theano_conv(image_data, filter_data)
############# REFERENCE IMPLEMENTATION ############
        s = 1. if border_mode == 'full' else -1.
out_shape2d = numpy.array(N_image_shape[-2:]) +\
s*numpy.array(N_filter_shape[-2:]) - s
out_shape2d = numpy.ceil(out_shape2d / numpy.array(subsample))
out_shape = (N_image_shape[0],N_filter_shape[0]) + tuple(out_shape2d)
ref_output = numpy.zeros(out_shape)
# loop over output feature maps
for k in range(N_filter_shape[0]):
# loop over input feature maps
for l in range(N_filter_shape[1]):
filter2d = filter_data[k,l,:,:]
# loop over mini-batches
for b in range(N_image_shape[0]):
image2d = image_data[b,l,:,:]
output2d = signal.convolve2d(image2d, filter2d, border_mode)
ref_output[b,k,:,:] +=\
output2d[::subsample[0],::subsample[1]]
self.failUnless(_allclose(theano_output, ref_output))
############# TEST GRADIENT ############
if verify_grad:
utt.verify_grad(sym_conv2d, [image_data, filter_data])
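The reference implementation above encodes the output shape of both border modes with a single sign `s`: `'full'` grows the output to `img + filt - 1`, while `'valid'` keeps only fully-overlapping positions, `img - filt + 1`. A standalone sketch of that shape computation (the helper name is an assumption, not a Theano function):

```python
import numpy

def conv_output_shape(image_hw, filter_hw, border_mode, subsample=(1, 1)):
    # 'full' convolution pads the image, so output grows: img + filt - 1
    # (s = +1); 'valid' keeps only fully-overlapping positions:
    # img - filt + 1 (s = -1). Subsampling then divides, rounding up.
    s = 1.0 if border_mode == 'full' else -1.0
    out = numpy.array(image_hw) + s * numpy.array(filter_hw) - s
    out = numpy.ceil(out / numpy.array(subsample))
    return tuple(int(v) for v in out)

conv_output_shape((8, 8), (5, 5), 'valid')  # -> (4, 4)
conv_output_shape((8, 8), (5, 5), 'full')   # -> (12, 12)
```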
def test_basic(self):
"""
Tests that basic convolutions work for odd and even dimensions of image and filter
shapes, as well as rectangular images and filters.
"""
self.validate((3,2,8,8), (4,2,5,5), 'valid')
self.validate((3,2,7,5), (5,2,2,3), 'valid')
self.validate((3,2,7,5), (5,2,3,2), 'valid')
self.validate((3,2,8,8), (4,2,5,5), 'full')
self.validate((3,2,7,5), (5,2,2,3), 'full')
# test filter same size as input
self.validate((3,2,3,3), (4,2,3,3), 'valid')
def test_unroll_patch_false(self):
"""
unroll_patch is True by default. Test basic convs with False.
"""
self.validate((3,2,7,5), (5,2,2,3), 'valid', unroll_patch=False)
self.validate((3,2,7,5), (5,2,2,3), 'full', unroll_patch=False)
self.validate((3,2,3,3), (4,2,3,3), 'valid', unroll_patch=False)
def test_unroll_special(self):
"""
(unroll_kern, unroll_batch) in (0,1),(1,0) is special case.
"""
self.validate((6,2,3,3), (3,2,2,2), 'valid', unroll_batch=1)
def test_unroll_batch(self):
"""
Test mini-batch unrolling for various legal values.
"""
# mini-batch of size 6 is multiple of 2 and 3. Should work.
self.validate((6,2,3,3), (3,2,2,2), 'valid', unroll_batch=2, verify_grad=False)
self.validate((6,2,3,3), (3,2,2,2), 'valid', unroll_batch=3, verify_grad=False)
def test_unroll_kern(self):
"""
Test kernel unrolling for various legal values.
"""
# 6 filters is a multiple of 2 and 3. Should work.
self.validate((2,3,3,3), (6,3,2,2), 'valid', unroll_kern=2, verify_grad=False)
self.validate((2,3,3,3), (6,3,2,2), 'valid', unroll_kern=3, verify_grad=False)
def test_subsample(self):
"""
Tests convolution where subsampling != (1,1)
"""
self.validate((3,2,7,5), (5,2,2,3), 'valid', subsample=(2,2))
self.validate((3,2,7,5), (5,2,2,3), 'full', subsample=(2,2))
self.validate((3,2,7,5), (5,2,2,3), 'valid', subsample=(2,1))
def test_invalid_filter_shape(self):
"""
Tests scenario where filter_shape[1] != input_shape[1]
"""
def f():
self.validate((3,2,8,8), (4,3,5,5), 'valid')
self.failUnlessRaises(AssertionError, f)
def test_missing_info(self):
"""
Test convolutions for various pieces of missing info.
"""
self.validate(None, None,
N_image_shape=(3,2,8,8),
N_filter_shape=(4,2,5,5))
self.validate((3,2,None,None), None,
N_image_shape=(3,2,8,8),
N_filter_shape=(4,2,5,5))
self.validate((None,2,None,None), (None,2,5,5),
N_image_shape=(3,2,8,8),
N_filter_shape=(4,2,5,5))
def test_full_mode(self):
"""
Tests basic convolution in full mode and case where filter
is larger than the input image.
"""
self.validate((3,2,5,5), (4,2,8,8), 'full')
def f():
self.validate((3,2,5,5), (4,2,8,8), 'valid')
self.failUnlessRaises(Exception, f)
def test_wrong_input(self):
"""
Make sure errors are raised when image and kernel are not 4D tensors
"""
        # a bare try/except around self.fail() would swallow the
        # AssertionError it raises; assert the error directly instead
        self.failUnlessRaises(Exception, self.validate,
                (3,2,8,8), (4,2,5,5), 'valid', input = T.dmatrix())
        self.failUnlessRaises(Exception, self.validate,
                (3,2,8,8), (4,2,5,5), 'valid', filters = T.dvector())
        self.failUnlessRaises(Exception, self.validate,
                (3,2,8,8), (4,2,5,5), 'valid', input = T.dtensor3())
import unittest, sys, time
import numpy
import theano.tensor as tensor
from theano.tests import unittest_tools as utt
from theano.tensor.signal.downsample import DownsampleFactorMax, max_pool2D
from theano import function, Mode
class TestDownsampleFactorMax(unittest.TestCase):
def setUp(self):
utt.seed_rng()
@staticmethod
def numpy_max_pool2D(input, ds, ignore_border=False):
'''Helper function, implementing max_pool2D in pure numpy'''
if len(input.shape) < 2:
raise NotImplementedError('input should have at least 2 dim, shape is %s'\
% str(input.shape))
xi=0
yi=0
if not ignore_border:
if input.shape[-2] % ds[0]:
xi += 1
if input.shape[-1] % ds[1]:
yi += 1
out_shp = list(input.shape[:-2])
out_shp.append(input.shape[-2]/ds[0]+xi)
out_shp.append(input.shape[-1]/ds[1]+yi)
output_val = numpy.zeros(out_shp)
for k in numpy.ndindex(input.shape[:-2]):
for i in range(output_val.shape[-2]):
ii = i*ds[0]
for j in range(output_val.shape[-1]):
jj = j*ds[1]
patch = input[k][ii:ii+ds[0],jj:jj+ds[1]]
output_val[k][i,j] = numpy.max(patch)
return output_val
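The helper above slides a `ds`-shaped window with matching stride and takes the max of each patch. A tiny hard-coded 2x2 special case makes the behavior concrete (a sketch, not part of the test suite):

```python
import numpy

x = numpy.arange(16.0).reshape(4, 4)

def max_pool_2x2(a):
    # Minimal 2x2/stride-2 instance of the windowed max in
    # numpy_max_pool2D above; trailing rows/cols that do not fill a
    # full window are dropped (the ignore_border=True case).
    h, w = a.shape[0] // 2, a.shape[1] // 2
    out = numpy.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = a[2*i:2*i+2, 2*j:2*j+2].max()
    return out

pooled = max_pool_2x2(x)
# [[ 5.  7.]
#  [13. 15.]]
```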
def test_DownsampleFactorMax(self):
rng = numpy.random.RandomState(utt.fetch_seed())
# generate random images
maxpoolshps = ((1,1),(2,2),(3,3),(2,3))
imval = rng.rand(4,10,64,64)
images = tensor.dtensor4()
for maxpoolshp in maxpoolshps:
for ignore_border in [True,False]:
print 'maxpoolshp =', maxpoolshp
print 'ignore_border =', ignore_border
## Pure Numpy computation
numpy_output_val = self.numpy_max_pool2D(imval, maxpoolshp, ignore_border)
output = max_pool2D(images, maxpoolshp, ignore_border)
f = function([images,],[output,])
output_val = f(imval)
assert numpy.all(output_val == numpy_output_val)
#DownsampleFactorMax op
maxpool_op = DownsampleFactorMax(maxpoolshp, ignore_border=ignore_border)(images)
f = function([images], maxpool_op)
output_val = f(imval)
assert (numpy.abs(output_val - numpy_output_val) < 1e-5).all()
def test_DownsampleFactorMax_grad(self):
rng = numpy.random.RandomState(utt.fetch_seed())
maxpoolshps = ((1,1),(3,2),(2,3))
imval = rng.rand(2,3,3,4) * 10.0 #more variance means numeric gradient will be more accurate
for maxpoolshp in maxpoolshps:
for ignore_border in [True,False]:
print 'maxpoolshp =', maxpoolshp
print 'ignore_border =', ignore_border
def mp(input):
return DownsampleFactorMax(maxpoolshp, ignore_border=ignore_border)(input)
utt.verify_grad(mp, [imval], rng=rng)
def test_max_pool2D_2D(self):
rng = numpy.random.RandomState(utt.fetch_seed())
maxpoolshps = ((1,1),(3,2))
imval = rng.rand(4,7)
images = tensor.dmatrix()
for maxpoolshp in maxpoolshps:
for ignore_border in [True,False]:
print 'maxpoolshp =', maxpoolshp
print 'ignore_border =', ignore_border
numpy_output_val = self.numpy_max_pool2D(imval, maxpoolshp, ignore_border)
output = max_pool2D(images, maxpoolshp, ignore_border)
output_val = function([images], output)(imval)
assert numpy.all(output_val == numpy_output_val)
def mp(input):
return max_pool2D(input, maxpoolshp, ignore_border)
utt.verify_grad(mp, [imval], rng=rng)
def test_max_pool2D_3D(self):
rng = numpy.random.RandomState(utt.fetch_seed())
maxpoolshps = [(1,2)]
imval = rng.rand(2,3,4)
images = tensor.dtensor3()
for maxpoolshp in maxpoolshps:
for ignore_border in [True,False]:
print 'maxpoolshp =', maxpoolshp
print 'ignore_border =', ignore_border
numpy_output_val = self.numpy_max_pool2D(imval, maxpoolshp, ignore_border)
output = max_pool2D(images, maxpoolshp, ignore_border)
output_val = function([images], output)(imval)
assert numpy.all(output_val == numpy_output_val)
c = tensor.sum(output)
c_val = function([images], c)(imval)
g = tensor.grad(c, images)
            g_val = function([images],
                             [g.shape,
                              tensor.min(tensor.min(tensor.min(g))),
                              tensor.max(tensor.max(tensor.max(g)))])(imval)
def mp(input):
return max_pool2D(input, maxpoolshp, ignore_border)
utt.verify_grad(mp, [imval], rng=rng)
def test_max_pool2D_6D(self):
rng = numpy.random.RandomState(utt.fetch_seed())
maxpoolshps = [(3,2)]
imval = rng.rand(2,1,1,1,3,4)
images = tensor.TensorType('float64', [False]*6)()
for maxpoolshp in maxpoolshps:
for ignore_border in [True,False]:
print 'maxpoolshp =', maxpoolshp
print 'ignore_border =', ignore_border
numpy_output_val = self.numpy_max_pool2D(imval, maxpoolshp, ignore_border)
output = max_pool2D(images, maxpoolshp, ignore_border)
output_val = function([images], output)(imval)
assert numpy.all(output_val == numpy_output_val)
def mp(input):
return max_pool2D(input, maxpoolshp, ignore_border)
utt.verify_grad(mp, [imval], rng=rng)
if __name__ == '__main__':
unittest.main()
......@@ -133,6 +133,8 @@ class test_CAReduce(unittest.TestCase):
((5, 6), (1, )),
((5, 6), ()),
((2, 3, 4, 5), (0, 1, 3)),
((5, 0), (0, )),
((5, 0), (1, )),
((), ())]:
x = TensorType('float64', [(entry == 1) for entry in xsh])('x')
e = CAReduce(add, axis = tosum)(x)
......@@ -149,7 +151,7 @@ class test_CAReduce(unittest.TestCase):
def test_c(self):
self.with_linker(gof.CLinker())
if __name__ == '__main__':
unittest.main()
import unittest
import numpy
import theano.tensor as T
from theano.configparser import config, AddConfigVar, IntParam
from theano.configparser import config, AddConfigVar, StrParam
import os, sys
AddConfigVar('unittests.rseed',
"Seed to use for randomized unit tests",
IntParam(666))
"Seed to use for randomized unit tests. Special value 'random' means using a seed of None.",
StrParam(666))
def fetch_seed(pseed=None):
"""
Returns the seed to use for running the unit tests.
If an explicit seed is given, it will be used for seending numpy's rng.
If not, it will try to get a seed from the THEANO_UNITTEST_SEED variable.
If THEANO_UNITTEST_SEED is set to "random", it will seed the rng. with None,
If an explicit seed is given, it will be used for seeding numpy's rng.
If not, it will use config.unittest.rseed (its default value is 666).
If config.unittest.rseed is set to "random", it will seed the rng with None,
which is equivalent to seeding with a random seed.
If THEANO_UNITTEST_SEED is not defined, it will use a default seed of 666.
Useful for seeding RandomState objects.
>>> rng = numpy.random.RandomState(unittest_tools.fetch_seed())
......@@ -35,7 +34,7 @@ def fetch_seed(pseed=None):
#backport
#seed = int(seed) if seed else None
except ValueError:
print >> sys.stderr, 'Error: THEANO_UNITTEST_SEED contains '\
print >> sys.stderr, 'Error: config.unittests.rseed contains '\
'invalid seed, using None instead'
seed = None
......@@ -49,7 +48,7 @@ def seed_rng(pseed=None):
seed = fetch_seed(pseed)
if pseed and pseed!=seed:
print >> sys.stderr, 'Warning: using seed given by THEANO_UNITTEST_SEED=%i'\
print >> sys.stderr, 'Warning: using seed given by config.unittests.rseed=%i'\
            ' instead of seed %i given as parameter' % (seed, pseed)
numpy.random.seed(seed)
return seed
......
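The seed-resolution precedence implemented by fetch_seed above can be sketched as a small pure function (names and the string-typed config value are illustrative assumptions):

```python
def resolve_seed(pseed, config_rseed='666'):
    # Sketch of fetch_seed's precedence: an explicit argument wins,
    # otherwise config.unittests.rseed is used; the special value
    # 'random' maps to None, i.e. numpy seeds itself from OS entropy.
    seed = pseed if pseed is not None else config_rseed
    if seed == 'random':
        return None
    try:
        return int(seed)
    except ValueError:
        # invalid seed string: fall back to None, as the code above does
        return None
```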