Commit 7bfa5330 authored by Chienli Ma (马千里)

Merge pull request #1 from Theano/master

merge changes
...@@ -210,7 +210,7 @@ If you are a developer of Theano, then check out the :ref:`dev_start_guide`.
If you want the bleeding-edge without developing the code you can use pip for
this with the command line below. Note that it will also try to install Theano's dependencies
(like NumPy and SciPy), but not upgrade them. If you wish to upgrade them,
remove the ``--no-deps`` switch, but see the earlier warning before doing this.
.. code-block:: bash
...@@ -365,7 +365,7 @@ There are many ways to configure BLAS for Theano. This is done with the Theano
flags ``blas.ldflags`` (:ref:`libdoc_config`). The default is to use the BLAS
installation information in NumPy, accessible via
``numpy.distutils.__config__.show()``. You can tell Theano to use a different
version of BLAS, in case you did not compile NumPy with a fast BLAS or if NumPy
was compiled with a static library of BLAS (the latter is not supported in
Theano).
...@@ -412,7 +412,7 @@ that we use.
3) Install the ATLAS library. ATLAS is an open source optimized version of
BLAS. You can install a precompiled version on most OSes, but if you're willing
to invest the time, you can compile it to have a faster version (we have seen
speed-ups of up to 3x, especially on more recent computers, against the
precompiled one). On Fedora, ``sudo yum install atlas-devel``. Under Ubuntu,
``sudo apt-get install libatlas-base-dev libatlas-base`` or
``libatlas3gf-sse2`` if your CPU supports SSE2 instructions. Then set the
...@@ -544,7 +544,7 @@ If you are affiliated with a university (as student or employee), you can
download the installer for free.
The EPD installation includes in particular Python (and the development headers),
NumPy, SciPy, nose, sphinx, easy_install, pydot (but *not* `Graphviz`_, which is
necessary for it to work) and the MKL implementation of BLAS. The Mac OS and
Linux versions do not include g++.
...@@ -570,14 +570,14 @@ terminal execute this command:

   $ sudo pip install --upgrade --no-deps git+git://github.com/Theano/Theano.git

See the section `install_bleeding_edge`_ for more
information on the bleeding edge version.
Then you must install g++. You can do this by installing XCode. See the first bullet in the :ref:`macports` section.
.. note::

   If you use the trunk or version 0.6 or later of Theano, we try to
   automatically link with the EPD BLAS version. Due to Mac OS
   peculiarities, this requires user intervention. We detect
   whether you have made the modification and, if not, explain how to
   do it.
...@@ -593,7 +593,7 @@ will also be optimized as we will reuse the faster BLAS version
automatically.
The Anaconda installation includes in particular Python (and the development headers),
NumPy, SciPy, nose, sphinx, pip, and an acceptable BLAS version. The Mac OS and
Linux versions do not include g++.
After installing Anaconda, in a terminal execute this command to
...@@ -611,21 +611,21 @@ To install the missing Theano optional dependency (pydot):
If you want the bleeding edge version, `download
and install git <http://git-scm.com/downloads>`_. Then in a
terminal execute this command:
.. code-block:: bash

   $ sudo pip install --upgrade --no-deps git+git://github.com/Theano/Theano.git

See the section `install_bleeding_edge`_ for more
information on the bleeding edge version.
Then you must install g++. You can do this by installing XCode. See the first bullet in the :ref:`macports` section.
.. note::

   If you use the trunk or a version after 0.6rc3 of Theano, we try to
   automatically link with the Python library. Due to Mac OS
   peculiarities, this requires user intervention. We detect
   whether you have made the modification and, if not, explain how to
   do it.
...@@ -789,7 +789,7 @@ try the following.
command on ``.so`` files found under your ``~/.theano`` directory. This will
list shared library dependencies, and may help identify incompatibilities.
Please inform us if you have trouble installing and running Theano on your Mac.
We would be especially interested in dependencies that we missed listing,
alternate installation steps, GPU instructions, as well as tests that fail on
your platform (use the ``theano-users@googlegroups.com`` mailing list, but
...@@ -816,11 +816,11 @@ If you are affiliated with a university (as student or employee), you can
download the installer for free.
The EPD installation includes in particular Python (and the development headers),
NumPy, SciPy, nose, sphinx, easy_install, pydot (but *not* `Graphviz`_, which is
necessary for it to work), g++, and the MKL
implementation of BLAS.
If you want to use the IPython shell, you should first try to import NumPy
in it::
C:\Users\user>ipython C:\Users\user>ipython
...@@ -978,14 +978,14 @@ MinGW, but this has not been tested yet.
After unpacking its source code (you may use `7-zip
<http://www.7-zip.org/>`__), you can build and install it from within
its code directory by running the following command (either from a Windows
command prompt or an MSYS shell):
.. code-block:: bash

   python setup.py install

At this point, whether you installed Python(x,y) or individual components, you
should have MinGW, Python, NumPy, SciPy and nose installed.
Installing Theano
...@@ -1069,7 +1069,7 @@ Command lines listed below are assumed to be run in a Windows prompt
used within an MSYS Shell (not available if you only installed Python(x,y)).
- The first option is to navigate to the
  `Theano GitHub page <http://github.com/Theano/Theano>`__ and click the ``ZIP``
  button in the top-left corner to download a zip file with the latest
  development version. Unzip this file where you want Theano to be
  installed, then rename the unzipped folder to ``Theano``.
...@@ -1173,7 +1173,7 @@ Editing code in Visual Studio
You will find a Visual Studio solution file (``Theano.sln``) in the root of
the Theano repository. Note that this project file may not be kept up-to-date
and is not officially supported by the core Theano developers: it is provided
for convenience only.
Also, be aware that it will not make Theano use Visual Studio to compile C
files: it is only meant to provide an easy way to edit Theano code within
...@@ -1189,7 +1189,7 @@ MKL library included in EPD, so you should not need to compile your own BLAS.
The instructions below have not been tested in a Windows 64 bit environment.
If you want a faster and/or multi-threaded BLAS library, you can
compile OpenBLAS (ATLAS may work too, but was not tested, and is
usually reported to be slower and more difficult to compile -- especially
on Windows).
......
.. _install_ubuntu:

Easy Installation of an Optimized Theano on Current Ubuntu
==========================================================
For Ubuntu 11.10 through 14.04:

.. code-block:: bash

   sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose g++ libopenblas-dev git
   sudo pip install Theano

For Ubuntu 11.04:

.. code-block:: bash

   sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose g++ git libatlas3gf-base libatlas-dev
   sudo pip install Theano
.. note::

   If you get an error that contains "gfortran" in it, like this one:

      ImportError: ('/home/Nick/.theano/compiledir_Linux-2.6.35-31-generic-x86_64-with-Ubuntu-10.10-maverick--2.6.6/tmpIhWJaI/0c99c52c82f7ddc775109a06ca04b360.so: undefined symbol: _gfortran_st_write_done'

   the problem is probably that NumPy is linked with a different BLAS
   than the one currently available (probably ATLAS). There are two
   possible fixes:

   1) Uninstall ATLAS and install OpenBLAS.
   2) Use the Theano flag ``blas.ldflags="-lblas -lgfortran"``

   1) is better, as OpenBLAS is faster than ATLAS and NumPy is
   probably already linked with it, so you won't need any other
   change in Theano files or Theano configuration.
.. note::
...@@ -45,40 +63,33 @@ These instructions were written for Ubuntu 11.04, 11.10, 12.04, 12.10, 13.04,

   The development version of Theano supports Python 3.3 and
   probably supports Python 3.2, but we do not test on it.
Bleeding Edge Installs
----------------------

If you would instead like to install the bleeding edge Theano (from GitHub)
so that you can edit and contribute to Theano, replace the ``pip install Theano``
command with:

.. code-block:: bash

   git clone git://github.com/Theano/Theano.git
   cd Theano
   python setup.py develop --user
   cd ..

VirtualEnv
----------

If you would like to install Theano in a VirtualEnv, you will want to pass the
``--system-site-packages`` flag when creating the VirtualEnv so that it will pick up
the system-provided NumPy and SciPy.

.. code-block:: bash

   virtualenv --system-site-packages -p python2.7 theano-env
   source theano-env/bin/activate
   pip install Theano
Test the newly installed packages
...@@ -121,6 +132,15 @@ Theano should link to a parallel version of Blas and use all cores
when possible. By default it should use all cores. Set the environment
variable ``OMP_NUM_THREADS=N`` to specify that N threads should be used.
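As a sketch, this is how the variable is typically set from Python; note that it must be set before the BLAS library is first loaded, and ``4`` is just an example value:

```python
import os

# Ask OpenMP-based BLAS libraries to use 4 threads. This must happen
# before the BLAS library is loaded (i.e. before importing theano/numpy).
os.environ["OMP_NUM_THREADS"] = "4"
```

Setting it in the shell (``export OMP_NUM_THREADS=4``) before starting Python is equivalent.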
.. note::

   It is possible to have a faster installation of Theano than the one these
   instructions provide, but this will make the installation more
   complicated and/or may require that you buy software. This is a simple set
   of installation instructions that will leave you with a relatively
   well-optimized version that uses only free software. With more work or by
   investing money (i.e. buying a license to a proprietary BLAS
   implementation), it is possible to gain further performance.
Updating Theano
~~~~~~~~~~~~~~~
...@@ -141,16 +161,24 @@ system package, you can run this:

   sudo pip install --upgrade theano
Updating Bleeding Edge Installs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Change to the Theano directory and run:

.. code-block:: bash

   git pull
Manual OpenBLAS instructions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. comment::

   I believe this is outdated, my machine seems to be using 8 threads
   happily with the binary OpenBLAS...

The OpenBLAS included in Ubuntu is limited to 2 threads. If you want
to use more cores at the same time, you will need to compile it
yourself. Here is some code that will help you.
......
...@@ -85,7 +85,7 @@ is like a programming language in the sense that you have to
It is good to think of ``theano.function`` as the interface to a
compiler which builds a callable object from a purely symbolic graph.
One of Theano's most important features is that ``theano.function``
can optimize a graph and even compile some or all of it into native
machine instructions.
......
...@@ -673,6 +673,10 @@ class ModuleCache(object):
                continue
            if not os.path.isdir(root):
                continue
            # Some subdirectories we do not want the cache to mess
            # with; touching them can cause problems with multiple
            # processes.
            if os.path.split(root)[1] in ["lock_dir"]:
                continue
            files = os.listdir(root)
            if not files or 'delete.me' in files:
                rmtree(root, ignore_nocleanup=True,
......
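A self-contained sketch of the check added above; ``should_skip`` is a hypothetical helper name, the real code inlines the test in its directory-scanning loop:

```python
import os

def should_skip(root):
    # Subdirectories the cache must never delete, such as the lock
    # directory shared between processes.
    return os.path.split(root)[1] in ["lock_dir"]

print(should_skip("/home/user/.theano/compiledir/lock_dir"))  # → True
print(should_skip("/home/user/.theano/compiledir/mod_1234"))  # → False
```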
...@@ -64,8 +64,87 @@ compiledir_format_dict = {
    "gxx_version": gcc_version_str.replace(" ", "_"),
    "hostname": socket.gethostname(),
}
def short_platform(r=None, p=None):
    """Return a safe, shorter version of platform.platform().

    The old default Theano compiledir used platform.platform(), which
    includes platform.version() as a substring. That is too specific,
    as it contains the full kernel number and package version, so the
    compiledir changed after every Linux kernel update. This function
    removes the parts of the platform string that are too precise.

    If we get something other than the expected pattern, we do
    nothing, so this should be safe on other OSes.

    Some examples, if we use platform.platform() directly, on the same
    OS with only kernel updates:
compiledir_Linux-2.6.32-504.el6.x86_64-x86_64-with-redhat-6.6-Santiago-x86_64-2.6.6-64
compiledir_Linux-2.6.32-431.29.2.el6.x86_64-x86_64-with-redhat-6.5-Santiago-x86_64-2.6.6-64
compiledir_Linux-2.6.32-431.23.3.el6.x86_64-x86_64-with-redhat-6.5-Santiago-x86_64-2.6.6-64
compiledir_Linux-2.6.32-431.20.3.el6.x86_64-x86_64-with-redhat-6.5-Santiago-x86_64-2.6.6-64
compiledir_Linux-2.6.32-431.17.1.el6.x86_64-x86_64-with-redhat-6.5-Santiago-x86_64-2.6.6-64
compiledir_Linux-2.6.32-431.11.2.el6.x86_64-x86_64-with-redhat-6.5-Santiago-x86_64-2.6.6-64
compiledir_Linux-2.6.32-431.el6.x86_64-x86_64-with-redhat-6.5-Santiago-x86_64-2.6.6-64
compiledir_Linux-2.6.32-358.23.2.el6.x86_64-x86_64-with-redhat-6.4-Santiago-x86_64-2.6.6-64
compiledir_Linux-2.6.32-358.6.2.el6.x86_64-x86_64-with-redhat-6.4-Santiago-x86_64-2.6.6-64
compiledir_Linux-2.6.32-358.6.1.el6.x86_64-x86_64-with-redhat-6.4-Santiago-x86_64-2.6.6-64
compiledir_Linux-2.6.32-358.2.1.el6.x86_64-x86_64-with-redhat-6.4-Santiago-x86_64-2.6.6-64
compiledir_Linux-2.6.32-358.el6.x86_64-x86_64-with-redhat-6.4-Santiago-x86_64-2.6.6-64
compiledir_Linux-2.6.32-358.el6.x86_64-x86_64-with-redhat-6.4-Santiago-x86_64-2.6.6
compiledir_Linux-2.6.32-279.14.1.el6.x86_64-x86_64-with-redhat-6.4-Santiago-x86_64-2.6.6
compiledir_Linux-2.6.32-279.14.1.el6.x86_64-x86_64-with-redhat-6.3-Santiago-x86_64-2.6.6
compiledir_Linux-2.6.32-279.5.2.el6.x86_64-x86_64-with-redhat-6.3-Santiago-x86_64-2.6.6
compiledir_Linux-2.6.32-220.13.1.el6.x86_64-x86_64-with-redhat-6.3-Santiago-x86_64-2.6.6
compiledir_Linux-2.6.32-220.13.1.el6.x86_64-x86_64-with-redhat-6.2-Santiago-x86_64-2.6.6
compiledir_Linux-2.6.32-220.7.1.el6.x86_64-x86_64-with-redhat-6.2-Santiago-x86_64-2.6.6
compiledir_Linux-2.6.32-220.4.1.el6.x86_64-x86_64-with-redhat-6.2-Santiago-x86_64-2.6.6
    We assume the version looks like ``X.Y[.*]-(digit)*(anything)*``.
    We keep ``X.Y``, drop the less important digits in the part before
    the first ``-``, and remove the leading digits after it.

    If the information does not fit that pattern, we do not modify
    the platform string.
    """
    if r is None:
        r = platform.release()
    if p is None:
        p = platform.platform()
    sp = r.split('-')
    if len(sp) < 2:
        return p

    # In the part before the first '-', keep only the first two
    # version numbers (e.g. '2.6' from '2.6.32'):
    kernel_version = sp[0].split('.')
    if len(kernel_version) <= 2:
        # A kernel version should always have at least 3 numbers.
        # If not, it uses another scheme, so don't change it.
        return p
    sp[0] = '.'.join(kernel_version[:2])

    # In the part after the first '-', remove the leading digit-only
    # components:
    rest = sp[1].split('.')
    while len(rest):
        if rest[0].isdigit():
            del rest[0]
        else:
            break
    sp[1] = '.'.join(rest)

    # sp[2:] is left unchanged.
    sr = '-'.join(sp)
    p = p.replace(r, sr)
    return p
compiledir_format_dict['short_platform'] = short_platform()
compiledir_format_keys = ", ".join(sorted(compiledir_format_dict.keys()))
default_compiledir_format = ("compiledir_%(short_platform)s-%(processor)s-"
                             "%(python_version)s-%(python_bitwidth)s")
AddConfigVar("compiledir_format",
......
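The format string above is expanded with Python %-interpolation over the keys of ``compiledir_format_dict``. A minimal sketch with made-up values (the real dict is filled in from the running system):

```python
# Hypothetical values; the real entries come from platform, socket, etc.
fmt = ("compiledir_%(short_platform)s-%(processor)s-"
       "%(python_version)s-%(python_bitwidth)s")
d = {
    "short_platform": "Linux-3.2--generic-x86_64-with-debian-wheezy-sid",
    "processor": "x86_64",
    "python_version": "2.7.6",
    "python_bitwidth": 64,
}
print(fmt % d)
# → compiledir_Linux-3.2--generic-x86_64-with-debian-wheezy-sid-x86_64-2.7.6-64
```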
...@@ -24,6 +24,7 @@ AddConfigVar('compile.wait',
             IntParam(5, lambda i: i > 0, allow_override=False),
             in_c_key=False)


def _timeout_default():
    return config.compile.wait * 24
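The stale-lock timeout is derived from the refresh period: a lock that has not been refreshed for 24 ``compile.wait`` periods is considered abandoned. A sketch with the default value (the standalone function and its argument are illustrative, not Theano's API):

```python
def timeout_default(compile_wait=5):
    # A lock is treated as stale after 24 refresh periods have elapsed
    # without the owner touching the lock file: 5 s * 24 = 120 s.
    return compile_wait * 24

print(timeout_default())  # → 120
```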
...@@ -37,6 +38,8 @@ period for running processes.""",
             allow_override=False),
             in_c_key=False)
hostname = socket.gethostname()
def force_unlock():
    """
...@@ -129,6 +132,7 @@ def set_lock_status(use_lock):
# This is because None is a valid input for timeout
notset = object()


def lock(tmp_dir, timeout=notset, min_wait=None, max_wait=None, verbosity=1):
    """
    Obtain lock access by creating a given temporary directory (whose base will
...@@ -212,13 +216,14 @@ def lock(tmp_dir, timeout=notset, min_wait=None, max_wait=None, verbosity=1):
                    other_host = read_owner.split('_')[2]
                except IndexError:
                    other_host = ()  # make sure it isn't equal to any host
                if other_host == hostname:
                    try:
                        # Just check if the other process still exists.
                        os.kill(int(read_owner.split('_')[0]), 0)
                    except OSError:
                        other_dead = True
                    except AttributeError:
                        pass  # os.kill does not exist on Windows
            except Exception:
                read_owner = 'failure'
            if other_dead:
...@@ -305,7 +310,7 @@ def refresh_lock(lock_file):
    unique_id = '%s_%s_%s' % (
        os.getpid(),
        ''.join([str(random.randint(0, 9)) for i in range(10)]),
        hostname)
    lock_write = open(lock_file, 'w')
    lock_write.write(unique_id + '\n')
    lock_write.close()
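A self-contained sketch of how such a lock owner id is built (pid, ten random digits, hostname); ``make_unique_id`` is a hypothetical helper name, the real code builds the string inline:

```python
import os
import random
import socket

def make_unique_id():
    # pid + 10 random digits + hostname; the pid lets another process
    # on the same host check whether the lock owner is still alive.
    return '%s_%s_%s' % (
        os.getpid(),
        ''.join(str(random.randint(0, 9)) for _ in range(10)),
        socket.gethostname())

uid = make_unique_id()
```

Note that parsing the id back with ``split('_')`` treats the third field as the hostname, which is why the pid and random digits come first.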
......
from theano.gof.compiledir import short_platform
def test_short_platform():
    for r, p, a in [  # (release, platform, answer)
            ('3.2.0-70-generic',
             'Linux-3.2.0-70-generic-x86_64-with-debian-wheezy-sid',
             'Linux-3.2--generic-x86_64-with-debian-wheezy-sid'),
            ('3.2.0-70.1-generic',
             'Linux-3.2.0-70.1-generic-x86_64-with-debian-wheezy-sid',
             'Linux-3.2--generic-x86_64-with-debian-wheezy-sid'),
            ('3.2.0-70.1.2-generic',
             'Linux-3.2.0-70.1.2-generic-x86_64-with-debian-wheezy-sid',
             'Linux-3.2--generic-x86_64-with-debian-wheezy-sid'),
            ('2.6.35.14-106.fc14.x86_64',
             'Linux-2.6.35.14-106.fc14.x86_64-x86_64-with-fedora-14-Laughlin',
             'Linux-2.6-fc14.x86_64-x86_64-with-fedora-14-Laughlin'),
            ]:
        o = short_platform(r, p)
        assert o == a, (o, a)
...@@ -2731,13 +2731,13 @@ CudaNdarray_get_dev_data(CudaNdarray *self, void *closure)
{
    float * p = CudaNdarray_DEV_DATA(self);
    //printf("get_dev_data %p %li \n", p, (long int)p );
    return PyInt_FromSize_t((size_t) CudaNdarray_DEV_DATA(self));
}
static int
CudaNdarray_set_dev_data(CudaNdarray *self, PyObject *value, void *closure)
{
    Py_ssize_t newdevdata = PyInt_AsSsize_t(value);
    //printf("set_dev_data %p %li \n",(float*)newdevdata ,newdevdata);
    if (PyErr_Occurred())
    {
......
...@@ -149,6 +149,13 @@ class DnnVersion(GpuOp):
    def c_libraries(self):
        return ['cudnn']

    def c_support_code(self):
        return """
#if PY_MAJOR_VERSION >= 3
#define PyInt_FromLong PyLong_FromLong
#endif
"""

    def make_node(self):
        return Apply(self, [], [Generic()()])
...@@ -455,37 +462,11 @@ class GpuDnnConvGradW(DnnBase, COp):
                     [CudaNdarrayType(broadcastable)()])

    def infer_shape(self, node, shape):
        return [(
            shape[1][1],
            shape[0][1],
            node.inputs[3],
            node.inputs[4]
        )]
...@@ -543,35 +524,11 @@ class GpuDnnConvGradI(DnnBase, COp):
                     [CudaNdarrayType(broadcastable)()])

    def infer_shape(self, node, shape):
        return [(
            shape[1][0],
            shape[0][1],
            node.inputs[3],
            node.inputs[4]
        )]
......
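The same rationale applies to the gradient-with-respect-to-inputs op: the deleted inverse formula `(out - 1) * stride + k - 2 * pad` only reconstructs the image size exactly for unit strides, so the shape is now passed in explicitly. A one-dimensional sketch of where the formula breaks (toy functions, not Theano's API):

```python
def conv_out_len(in_len, k, stride=1, pad=0):
    # forward 'valid' convolution length with optional padding
    return (in_len + 2 * pad - k) // stride + 1

def grad_input_len(out_len, k, stride=1, pad=0):
    # the deleted inverse formula; exact only when stride == 1
    return (out_len - 1) * stride + k - 2 * pad

# round trip is exact for stride 1 ...
print(grad_input_len(conv_out_len(10, 3), 3))        # -> 10
# ... but loses a pixel for stride 2, since striding discards positions
print(grad_input_len(conv_out_len(10, 3, 2), 3, 2))  # -> 9
```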
@@ -2000,12 +2000,6 @@ def local_gpu_extract_diagonal(node):
                 gpu_from_host(diag_node.inputs[0]))]
     return False

-def typeConstructor(broadcastable, dtype):
-    if dtype == 'float32':
-        return CudaNdarrayType(broadcastable=broadcastable)
-    else:
-        return tensor.TensorType(broadcastable=broadcastable, dtype=dtype)
 @register_opt('scan')
 @local_optimizer([gpu_from_host, scan_op.Scan])
 def gpuScanOptimization(node):
@@ -2065,9 +2059,7 @@ def gpuScanOptimization(node):
             nw_op = scan_op.Scan(scan_ins,
                                  scan_outs,
-                                 info,
-                                 typeConstructor=typeConstructor).make_node(
-                                     *nw_ins)
+                                 info).make_node(*nw_ins)
             _outputs = nw_op.outputs
             return _outputs
@@ -2113,8 +2105,7 @@ def gpuScanOptimization(node):
             _outputs = scan_op.Scan(
                 scan_ins,
                 scan_outs,
-                info,
-                typeConstructor=typeConstructor).make_node(*nw_ins).outputs
+                info).make_node(*nw_ins).outputs
             outputs = []
             for x, y in zip(_outputs, node.outputs):
                 if isinstance(y.type, CudaNdarrayType):
@@ -2126,8 +2117,7 @@ def gpuScanOptimization(node):
 optdb.register('gpu_scanOp_make_inplace',
-               scan_opt.ScanInplaceOptimizer(typeConstructor=typeConstructor,
-                                             gpu_flag=True),
+               scan_opt.ScanInplaceOptimizer(gpu_flag=True),
                75,
                'gpu',
                'fast_run',
...
@@ -199,6 +199,7 @@ def test_dnn_tag():
 class TestDnnInferShapes(utt.InferShapeTester):
     def setUp(self):
         super(TestDnnInferShapes, self).setUp()
+        self.mode = mode_with_gpu

     def test_softmax(self):
         t = T.ftensor4('t')
...
@@ -716,13 +716,11 @@ def local_scan_to_gpua(node):
     _cmodule_key = gof.CLinker().cmodule_key_(local_fgraph, [])
     info['gpu_hash'] = hash(_cmodule_key)

-    nw_op = scan_op.Scan(scan_ins, scan_outs, info,
-                         typeConstructor=GpuArrayType).make_node(*nw_ins)
+    nw_op = scan_op.Scan(scan_ins, scan_outs, info).make_node(*nw_ins)
     return nw_op.outputs

 optdb.register('gpua_scanOp_make_inplace',
-               scan_opt.ScanInplaceOptimizer(typeConstructor=GpuArrayType,
-                                             gpua_flag=True),
+               scan_opt.ScanInplaceOptimizer(gpua_flag=True),
                75,
                'gpua',
                'fast_run',
...
@@ -15,6 +15,7 @@ from theano.sandbox.gpuarray.tests.test_basic_ops import mode_with_gpu
 class T_Scan(TestCase):
     def setUp(self):
         utt.seed_rng()
+        super(T_Scan, self).setUp()

     def test_one_sequence_one_output_weights_gpu1(self):
         def f_rnn(u_t, x_tm1, W_in, W):
...
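The one-line additions to the two test classes above follow the standard unittest rule: an overriding `setUp` must call the parent's `setUp`, otherwise the base fixture never runs. A minimal illustration (toy classes, not the Theano test suite):

```python
import unittest

class Base(unittest.TestCase):
    def setUp(self):
        self.calls = ['base']

class Child(Base):
    def setUp(self):
        super(Child, self).setUp()  # without this line, self.calls is never set
        self.calls.append('child')

    def test_order(self):
        self.assertEqual(self.calls, ['base', 'child'])

result = unittest.TestResult()
Child('test_order').run(result)
print(result.wasSuccessful())  # -> True
```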
@@ -594,7 +594,9 @@ def scan(fn,
         if init_out.get('taps', None) == [-1]:
             actual_arg = init_out['initial']
-            arg = safe_new(init_out['initial'])
+            if not isinstance(actual_arg, tensor.Variable):
+                actual_arg = tensor.as_tensor_variable(actual_arg)
+            arg = safe_new(actual_arg)
             if isinstance(arg, tensor.Constant):
                 # safe new returns a clone of the constants, but that is not
                 # what we need for initial states
...
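The guard added above coerces non-`Variable` initial states (e.g. plain Python or NumPy scalars) through `tensor.as_tensor_variable` before handing them to `safe_new`. The defensive pattern, sketched with stand-in classes rather than Theano's real ones:

```python
class Variable(object):
    """Stand-in for a Theano Variable."""
    def __init__(self, data):
        self.data = data

def as_variable(x):
    # analogue of tensor.as_tensor_variable: wrap raw data, pass Variables through
    return x if isinstance(x, Variable) else Variable(x)

def prepare_initial(initial):
    actual = initial
    if not isinstance(actual, Variable):
        actual = as_variable(actual)
    return actual  # now always safe for safe_new-style cloning helpers

v = Variable(1.0)
print(prepare_initial(v) is v)                      # -> True, passed through
print(isinstance(prepare_initial(0.0), Variable))   # -> True, wrapped
```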
@@ -40,8 +40,8 @@ _logger = logging.getLogger('theano.scan_module.scan_op')
 from theano.configparser import AddConfigVar, BoolParam

 AddConfigVar('scan.allow_gc',
-             "Allow/disallow gc inside of Scan (default: True)",
-             BoolParam(True))
+             "Allow/disallow gc inside of Scan (default: False)",
+             BoolParam(False))
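Flipping the default to `False` keeps Scan's intermediate buffers allocated between calls, trading memory for speed. Assuming a working Theano install, the old memory-friendly behaviour can still be restored per run via Theano's config system:

```python
# From the environment, before starting Python:
#   THEANO_FLAGS='scan.allow_gc=True' python my_script.py
# or programmatically, before any function is compiled:
import theano
theano.config.scan.allow_gc = True  # re-enable garbage collection inside Scan
```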
 class Scan(PureOp):
@@ -49,7 +49,6 @@ class Scan(PureOp):
                  inputs,
                  outputs,
                  info,
-                 typeConstructor=None,
                  ):
         """
         :param inputs: inputs of the inner function of scan
@@ -58,21 +57,6 @@ class Scan(PureOp):
                      the scan op (like number of different types of
                      arguments, name, mode, if it should run on GPU or
                      not, etc.)
-        :param typeConstructor: function that constructs an equivalent
-                                to Theano TensorType
-
-        Note: ``typeConstructor`` had been added to refactor how
-        Theano deals with the GPU. If it runs on the GPU, scan needs
-        to construct certain outputs (those who reside in the GPU
-        memory) as the GPU-specific type. However we can not import
-        gpu code in this file (as it is in sandbox, and not available
-        on each machine) so the workaround is that the GPU
-        optimization passes to the constructor of this class a
-        function that is able to construct a GPU type. This way the
-        class Scan does not need to be aware of the details for the
-        GPU, it just constructs any tensor using this function (which
-        by default constructs normal tensors).
         """
         if 'gpua' not in info:
             info['gpua'] = False
@@ -88,19 +72,13 @@ class Scan(PureOp):
         self.output_types = []
         idx = 0
         jdx = 0
-        tensorConstructor = lambda broadcastable, dtype: TensorType(
-            broadcastable=broadcastable, dtype=dtype)
-        if typeConstructor is None:
-            typeConstructor = tensorConstructor
         while idx < self.n_mit_mot_outs:
             # Not that for mit_mot there are several output slices per
             # output sequence
             o = outputs[idx]
             self.output_types.append(
-                typeConstructor(
-                    broadcastable=(False,) + o.type.broadcastable,
-                    dtype=o.type.dtype))
+                o.type.clone(broadcastable=(False,) + o.type.broadcastable))
             idx += len(self.mit_mot_out_slices[jdx])
             jdx += 1
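The `o.type.clone(...)` calls that replace `typeConstructor` rely on each Type knowing how to rebuild itself with selected fields overridden: a GPU type clones to a GPU type, a plain tensor type to a plain one, so Scan no longer needs a constructor injected by the GPU optimizations. A toy sketch of the protocol (illustrative classes, not Theano's):

```python
class ToyType(object):
    def __init__(self, broadcastable, dtype):
        self.broadcastable = broadcastable
        self.dtype = dtype

    def clone(self, broadcastable=None, dtype=None):
        # rebuild the *same* class with selected fields overridden; a GPU
        # subclass inherits this and therefore clones to a GPU type
        return type(self)(
            self.broadcastable if broadcastable is None else broadcastable,
            self.dtype if dtype is None else dtype)

class ToyGpuType(ToyType):
    pass

inner = ToyGpuType((False, True), 'float32')
# prepend the time dimension, as Scan does for its output types:
outer = inner.clone(broadcastable=(False,) + inner.broadcastable)
print(type(outer).__name__, outer.broadcastable)  # -> ToyGpuType (False, False, True)
```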
@@ -110,9 +88,7 @@ class Scan(PureOp):
         for o in outputs[idx:end]:
             self.output_types.append(
-                typeConstructor(
-                    broadcastable=(False,) + o.type.broadcastable,
-                    dtype=o.type.dtype))
+                o.type.clone(broadcastable=(False,) + o.type.broadcastable))

         # shared outputs + possibly the ending condition
         for o in outputs[end:]:
@@ -241,10 +217,9 @@ class Scan(PureOp):
             if rval.ndim == as_var.ndim:
                 rval = as_var.type.filter_variable(rval)
             else:
-                tmp = as_var.type.__class__(
-                    broadcastable=tuple(var.broadcastable[:1])+\
-                        tuple(as_var.broadcastable),
-                    dtype=as_var.dtype)
+                tmp = as_var.type.clone(
+                    broadcastable=(tuple(var.broadcastable[:1]) +
+                                   tuple(as_var.broadcastable)))
                 rval = tmp.filter_variable(rval)
             return rval
@@ -517,11 +492,11 @@ class Scan(PureOp):
         return aux_txt

     def __hash__(self):
-        return (hash(type(self)) ^
-                # and a hash representing the inner graph using the
-                # CLinker.cmodule_key_
-                self._hash_inner_graph ^
-                scan_utils.hash_listsDictsTuples(self.info))
+        return hash((type(self),
+                     # and a hash representing the inner graph using the
+                     # CLinker.cmodule_key_
+                     self._hash_inner_graph,
+                     scan_utils.hash_listsDictsTuples(self.info)))

     def make_thunk(self, node, storage_map, compute_map, no_recycling):
         """
...
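The `__hash__` rewrite above swaps XOR-combining for hashing a tuple. XOR is order-insensitive and self-cancelling, so distinct ops whose components happen to coincide or cancel could collide; tuple hashing mixes position into the result. A quick check of the difference:

```python
a, b = 12345, 67890

# XOR combination: symmetric and self-cancelling
print(a ^ b == b ^ a)   # -> True  (order is lost)
print(a ^ a)            # -> 0     (equal components cancel out)

# tuple hashing keeps order and repeated components distinct
print(hash((a, b)) == hash((b, a)))  # -> False (in CPython)
```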
@@ -916,9 +916,8 @@ class PushOutScanOutput(gof.Optimizer):
 class ScanInplaceOptimizer(Optimizer):
     """Graph optimizer for Scan(makes it run inplace)"""

-    def __init__(self, typeConstructor=None, gpu_flag=False, gpua_flag=False):
+    def __init__(self, gpu_flag=False, gpua_flag=False):
         Optimizer.__init__(self)
-        self.typeConstructor = typeConstructor
         self.gpu_flag = gpu_flag
         self.gpua_flag = gpua_flag
@@ -960,8 +959,7 @@ class ScanInplaceOptimizer(Optimizer):
                 inputs = ls_begin + ls + ls_end
                 new_op = scan_op.Scan(op.inputs,
                                       op.outputs,
-                                      info,
-                                      typeConstructor=self.typeConstructor)
+                                      info)
                 # Do not call make_node for test_value
                 new_outs = new_op(*inputs, **dict(return_list=True))
@@ -2087,8 +2085,7 @@ scan_eqopt2 = theano.gof.EquilibriumDB()
 optdb.register('scan_eqopt1', scan_eqopt1, .1, 'fast_run', 'scan')
 optdb.register('scan_eqopt2', scan_eqopt2, 1.6, 'fast_run', 'scan')
 optdb.register('scanOp_make_inplace',
-               ScanInplaceOptimizer(typeConstructor=None,
-                                    gpu_flag=False),
+               ScanInplaceOptimizer(),
                75,
                'fast_run',
                'inplace',
...
This source diff could not be displayed because it is too large. You can view the blob instead.
Diff collapsed (4 more files).