Commit 7bfa5330 authored by Chienli Ma (马千里)

Merge pull request #1 from Theano/master

merge changes
...@@ -210,7 +210,7 @@ If you are a developer of Theano, then check out the :ref:`dev_start_guide`.
If you want the bleeding-edge without developing the code you can use pip for
this with the command line below. Note that it will also try to install Theano's dependencies
(like NumPy and SciPy), but not upgrade them. If you wish to upgrade them,
remove the ``--no-deps`` switch, but see the earlier warning before doing this.
.. code-block:: bash
...@@ -365,7 +365,7 @@ There are many ways to configure BLAS for Theano. This is done with the Theano
flags ``blas.ldflags`` (:ref:`libdoc_config`). The default is to use the BLAS
installation information in NumPy, accessible via
``numpy.distutils.__config__.show()``. You can tell Theano to use a different
version of BLAS, in case you did not compile NumPy with a fast BLAS or if NumPy
was compiled with a static library of BLAS (the latter is not supported in
Theano).
...@@ -412,7 +412,7 @@ that we use.
3) Install the ATLAS library. ATLAS is an open source optimized version of
BLAS. You can install a precompiled version on most OSes, but if you're willing
to invest the time, you can compile it to have a faster version (we have seen
speed-ups of up to 3x, especially on more recent computers, against the
precompiled one). On Fedora, ``sudo yum install atlas-devel``. Under Ubuntu,
``sudo apt-get install libatlas-base-dev libatlas-base`` or
``libatlas3gf-sse2`` if your CPU supports SSE2 instructions. Then set the
...@@ -544,7 +544,7 @@ If you are affiliated with a university (as student or employee), you can
download the installer for free.
The EPD installation includes in particular Python (and the development headers),
NumPy, SciPy, nose, sphinx, easy_install, pydot (but *not* `Graphviz`_, which is
necessary for it to work) and the MKL implementation of BLAS. The Mac OS and
Linux versions do not include g++.
...@@ -570,14 +570,14 @@ terminal execute this command:

   $ sudo pip install --upgrade --no-deps git+git://github.com/Theano/Theano.git

See the section `install_bleeding_edge`_ for more
information on the bleeding edge version.
Then you must install g++. You can do this by installing XCode. See the first bullet in the :ref:`macports` section.
.. note::

   If you use the trunk or version 0.6 or later of Theano, we try to
   automatically link with the EPD BLAS version. Due to Mac OS
   peculiarities, this requires user intervention. We detect
   whether you have made the modification and, if not, explain how to
   do it.
...@@ -593,7 +593,7 @@ will also be optimized as we will reuse the faster BLAS version
automatically.
The Anaconda installation includes in particular Python (and the development headers),
NumPy, SciPy, nose, sphinx, pip, and an acceptable BLAS version. The Mac OS and
Linux versions do not include g++.
After installing Anaconda, in a terminal execute this command to
...@@ -611,21 +611,21 @@ To install the missing Theano optional dependency (pydot):
If you want the bleeding edge version, `download
and install git <http://git-scm.com/downloads>`_. Then in a
terminal execute this command:
.. code-block:: bash

   $ sudo pip install --upgrade --no-deps git+git://github.com/Theano/Theano.git

See the section `install_bleeding_edge`_ for more
information on the bleeding edge version.
Then you must install g++. You can do this by installing XCode. See the first bullet in the :ref:`macports` section.
.. note::

   If you use the trunk or a version after 0.6rc3 of Theano, we try to
   automatically link with the Python library. Due to Mac OS
   peculiarities, this requires user intervention. We detect
   whether you have made the modification and, if not, explain how to
   do it.
...@@ -789,7 +789,7 @@ try the following.
command on ``.so`` files found under your ``~/.theano`` directory. This will
list shared library dependencies, and may help identify incompatibilities.
Please inform us if you have trouble installing and running Theano on your Mac.
We would be especially interested in dependencies that we missed listing,
alternate installation steps, GPU instructions, as well as tests that fail on
your platform (use the ``theano-users@googlegroups.com`` mailing list, but
...@@ -816,11 +816,11 @@ If you are affiliated with a university (as student or employee), you can
download the installer for free.
The EPD installation includes in particular Python (and the development headers),
NumPy, SciPy, nose, sphinx, easy_install, pydot (but *not* `Graphviz`_, which is
necessary for it to work), g++, and the MKL
implementation of BLAS.
If you want to use the IPython shell, you should first try to import NumPy
in it::
C:\Users\user>ipython C:\Users\user>ipython
...@@ -978,14 +978,14 @@ MinGW, but this has not been tested yet.
After unpacking its source code (you may use `7-zip
<http://www.7-zip.org/>`__), you can build and install it from within
its code directory by running the following command (either from a Windows
command prompt or an MSYS shell):
.. code-block:: bash

   python setup.py install

At this point, whether you installed Python(x,y) or individual components, you
should have MinGW, Python, NumPy, SciPy and nose installed.
Installing Theano
...@@ -1069,7 +1069,7 @@ Command lines listed below are assumed to be run in a Windows prompt
used within an MSYS Shell (not available if you only installed Python(x,y)).
- The first option is to navigate to the
  `Theano GitHub page <http://github.com/Theano/Theano>`__ and click the ``ZIP``
  button in the top-left corner to download a zip file with the latest
  development version. Unzip this file where you want Theano to be
  installed, then rename the unzipped folder to ``Theano``.
...@@ -1173,7 +1173,7 @@ Editing code in Visual Studio
You will find a Visual Studio solution file (``Theano.sln``) in the root of
the Theano repository. Note that this project file may not be kept up-to-date
and is not officially supported by the core Theano developers: it is provided
for convenience only.
Also, be aware that it will not make Theano use Visual Studio to compile C
files: it is only meant to provide an easy way to edit Theano code within
...@@ -1189,7 +1189,7 @@ MKL library included in EPD, so you should not need to compile your own BLAS.
The instructions below have not been tested in a Windows 64 bit environment.
If you want a faster and/or multi-threaded BLAS library, you can
compile OpenBLAS (ATLAS may work too, but was not tested, and is
usually reported to be slower and more difficult to compile -- especially
on Windows).
......
.. _install_ubuntu:

Easy Installation of an Optimized Theano on Current Ubuntu
==========================================================
For Ubuntu 11.10 through 14.04:

.. code-block:: bash

   sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose g++ libopenblas-dev git
   sudo pip install Theano

For Ubuntu 11.04:

.. code-block:: bash

   sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose g++ git libatlas3gf-base libatlas-dev
   sudo pip install Theano
.. note::

   If you get an error that contains "gfortran" in it, like this one:

      ImportError: ('/home/Nick/.theano/compiledir_Linux-2.6.35-31-generic-x86_64-with-Ubuntu-10.10-maverick--2.6.6/tmpIhWJaI/0c99c52c82f7ddc775109a06ca04b360.so: undefined symbol: _gfortran_st_write_done'

   the problem is probably that NumPy is linked with a different BLAS
   than the one currently available (probably ATLAS). There are two
   possible fixes:

   1) Uninstall ATLAS and install OpenBLAS.
   2) Use the Theano flag ``blas.ldflags="-lblas -lgfortran"``

   1) is better, as OpenBLAS is faster than ATLAS and NumPy is
   probably already linked with it, so you won't need any other
   change in Theano files or Theano configuration.
.. note::
...@@ -45,40 +63,33 @@ These instructions were written for Ubuntu 11.04, 11.10, 12.04, 12.10, 13.04,

   The development version of Theano supports Python 3.3 and
   probably supports Python 3.2, but we do not test on it.
Bleeding Edge Installs
----------------------

If you would instead like to install the bleeding edge Theano (from GitHub)
so that you can edit and contribute to Theano, replace the ``pip install Theano``
command with:

.. code-block:: bash

   git clone git://github.com/Theano/Theano.git
   cd Theano
   python setup.py develop --user
   cd ..

VirtualEnv
----------

If you would like to install Theano in a VirtualEnv, you will want to pass the
``--system-site-packages`` flag when creating the VirtualEnv so that it will pick up
the system-provided NumPy and SciPy.

.. code-block:: bash

   virtualenv --system-site-packages -p python2.7 theano-env
   source theano-env/bin/activate
   pip install Theano
Test the newly installed packages
...@@ -121,6 +132,15 @@ Theano should link to a parallel version of Blas and use all cores
when possible. By default it should use all cores. Set the environment
variable ``OMP_NUM_THREADS=N`` to specify that N threads should be used.
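As a sketch, this is how the variable is typically set from Python; note that it must be set before the BLAS library is first loaded, and ``4`` is just an example value:

```python
import os

# Ask OpenMP-based BLAS libraries to use 4 threads. This must happen
# before the BLAS library is loaded (i.e. before importing theano/numpy).
os.environ["OMP_NUM_THREADS"] = "4"
```

Setting it in the shell (``export OMP_NUM_THREADS=4``) before starting Python is equivalent.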
.. note::

   It is possible to have a faster installation of Theano than the one these
   instructions provide, but this will make the installation more
   complicated and/or may require that you buy software. This is a simple set
   of installation instructions that will leave you with a relatively
   well-optimized version that uses only free software. With more work or by
   investing money (i.e. buying a license to a proprietary BLAS
   implementation), it is possible to gain further performance.
Updating Theano
~~~~~~~~~~~~~~~
...@@ -141,16 +161,24 @@ system package, you can run this:

   sudo pip install --upgrade theano
Updating Bleeding Edge Installs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Change to the Theano directory and run:

.. code-block:: bash

   git pull
Manual OpenBLAS instructions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. comment::

   I believe this is outdated, my machine seems to be using 8 threads
   happily with the binary OpenBLAS...

The OpenBLAS included in Ubuntu is limited to 2 threads. If you want
to use more cores at the same time, you will need to compile it
yourself. Here is some code that will help you.
......
...@@ -85,7 +85,7 @@ is like a programming language in the sense that you have to
It is good to think of ``theano.function`` as the interface to a
compiler which builds a callable object from a purely symbolic graph.
One of Theano's most important features is that ``theano.function``
can optimize a graph and even compile some or all of it into native
machine instructions.
......
...@@ -673,6 +673,10 @@ class ModuleCache(object):
                continue
            if not os.path.isdir(root):
                continue
            # Some subdirectories we do not want the cache to mess
            # with; touching them can cause problems with multiple
            # processes.
            if os.path.split(root)[1] in ["lock_dir"]:
                continue
            files = os.listdir(root)
            if not files or 'delete.me' in files:
                rmtree(root, ignore_nocleanup=True,
......
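A self-contained sketch of the check added above; ``should_skip`` is a hypothetical helper name, the real code inlines the test in its directory-scanning loop:

```python
import os

def should_skip(root):
    # Subdirectories the cache must never delete, such as the lock
    # directory shared between processes.
    return os.path.split(root)[1] in ["lock_dir"]

print(should_skip("/home/user/.theano/compiledir/lock_dir"))  # → True
print(should_skip("/home/user/.theano/compiledir/mod_1234"))  # → False
```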
...@@ -64,8 +64,87 @@ compiledir_format_dict = {
    "gxx_version": gcc_version_str.replace(" ", "_"),
    "hostname": socket.gethostname(),
}
def short_platform(r=None, p=None):
    """Return a safe, shorter version of platform.platform().

    The old default Theano compiledir used platform.platform(), which
    includes platform.version() as a substring. That is too specific,
    as it contains the full kernel number and package version, so the
    compiledir changed after every Linux kernel update. This function
    removes the parts of the platform string that are too precise.

    If we get something other than the expected pattern, we do
    nothing, so this should be safe on other OSes.

    Some examples, if we use platform.platform() directly, on the same
    OS with only kernel updates:
compiledir_Linux-2.6.32-504.el6.x86_64-x86_64-with-redhat-6.6-Santiago-x86_64-2.6.6-64
compiledir_Linux-2.6.32-431.29.2.el6.x86_64-x86_64-with-redhat-6.5-Santiago-x86_64-2.6.6-64
compiledir_Linux-2.6.32-431.23.3.el6.x86_64-x86_64-with-redhat-6.5-Santiago-x86_64-2.6.6-64
compiledir_Linux-2.6.32-431.20.3.el6.x86_64-x86_64-with-redhat-6.5-Santiago-x86_64-2.6.6-64
compiledir_Linux-2.6.32-431.17.1.el6.x86_64-x86_64-with-redhat-6.5-Santiago-x86_64-2.6.6-64
compiledir_Linux-2.6.32-431.11.2.el6.x86_64-x86_64-with-redhat-6.5-Santiago-x86_64-2.6.6-64
compiledir_Linux-2.6.32-431.el6.x86_64-x86_64-with-redhat-6.5-Santiago-x86_64-2.6.6-64
compiledir_Linux-2.6.32-358.23.2.el6.x86_64-x86_64-with-redhat-6.4-Santiago-x86_64-2.6.6-64
compiledir_Linux-2.6.32-358.6.2.el6.x86_64-x86_64-with-redhat-6.4-Santiago-x86_64-2.6.6-64
compiledir_Linux-2.6.32-358.6.1.el6.x86_64-x86_64-with-redhat-6.4-Santiago-x86_64-2.6.6-64
compiledir_Linux-2.6.32-358.2.1.el6.x86_64-x86_64-with-redhat-6.4-Santiago-x86_64-2.6.6-64
compiledir_Linux-2.6.32-358.el6.x86_64-x86_64-with-redhat-6.4-Santiago-x86_64-2.6.6-64
compiledir_Linux-2.6.32-358.el6.x86_64-x86_64-with-redhat-6.4-Santiago-x86_64-2.6.6
compiledir_Linux-2.6.32-279.14.1.el6.x86_64-x86_64-with-redhat-6.4-Santiago-x86_64-2.6.6
compiledir_Linux-2.6.32-279.14.1.el6.x86_64-x86_64-with-redhat-6.3-Santiago-x86_64-2.6.6
compiledir_Linux-2.6.32-279.5.2.el6.x86_64-x86_64-with-redhat-6.3-Santiago-x86_64-2.6.6
compiledir_Linux-2.6.32-220.13.1.el6.x86_64-x86_64-with-redhat-6.3-Santiago-x86_64-2.6.6
compiledir_Linux-2.6.32-220.13.1.el6.x86_64-x86_64-with-redhat-6.2-Santiago-x86_64-2.6.6
compiledir_Linux-2.6.32-220.7.1.el6.x86_64-x86_64-with-redhat-6.2-Santiago-x86_64-2.6.6
compiledir_Linux-2.6.32-220.4.1.el6.x86_64-x86_64-with-redhat-6.2-Santiago-x86_64-2.6.6
    We assume the version looks like ``X.Y[.*]-(digit)*(anything)*``.
    We keep ``X.Y``, drop the less important digits in the part before
    the first ``-``, and remove the leading digits after it.

    If the information does not fit that pattern, we do not modify
    the platform string.
    """
    if r is None:
        r = platform.release()
    if p is None:
        p = platform.platform()
    sp = r.split('-')
    if len(sp) < 2:
        return p

    # In the part before the first '-', keep only the first two
    # version numbers (e.g. '2.6' from '2.6.32'):
    kernel_version = sp[0].split('.')
    if len(kernel_version) <= 2:
        # A kernel version should always have at least 3 numbers.
        # If not, it uses another scheme, so don't change it.
        return p
    sp[0] = '.'.join(kernel_version[:2])

    # In the part after the first '-', remove the leading digit-only
    # components:
    rest = sp[1].split('.')
    while len(rest):
        if rest[0].isdigit():
            del rest[0]
        else:
            break
    sp[1] = '.'.join(rest)

    # sp[2:] is left unchanged.
    sr = '-'.join(sp)
    p = p.replace(r, sr)
    return p
compiledir_format_dict['short_platform'] = short_platform()
compiledir_format_keys = ", ".join(sorted(compiledir_format_dict.keys()))
default_compiledir_format = ("compiledir_%(short_platform)s-%(processor)s-"
                             "%(python_version)s-%(python_bitwidth)s")
AddConfigVar("compiledir_format",
......
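The format string above is expanded with Python %-interpolation over the keys of ``compiledir_format_dict``. A minimal sketch with made-up values (the real dict is filled in from the running system):

```python
# Hypothetical values; the real entries come from platform, socket, etc.
fmt = ("compiledir_%(short_platform)s-%(processor)s-"
       "%(python_version)s-%(python_bitwidth)s")
d = {
    "short_platform": "Linux-3.2--generic-x86_64-with-debian-wheezy-sid",
    "processor": "x86_64",
    "python_version": "2.7.6",
    "python_bitwidth": 64,
}
print(fmt % d)
# → compiledir_Linux-3.2--generic-x86_64-with-debian-wheezy-sid-x86_64-2.7.6-64
```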
...@@ -24,6 +24,7 @@ AddConfigVar('compile.wait',
             IntParam(5, lambda i: i > 0, allow_override=False),
             in_c_key=False)


def _timeout_default():
    return config.compile.wait * 24
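The stale-lock timeout is derived from the refresh period: a lock that has not been refreshed for 24 ``compile.wait`` periods is considered abandoned. A sketch with the default value (the standalone function and its argument are illustrative, not Theano's API):

```python
def timeout_default(compile_wait=5):
    # A lock is treated as stale after 24 refresh periods have elapsed
    # without the owner touching the lock file: 5 s * 24 = 120 s.
    return compile_wait * 24

print(timeout_default())  # → 120
```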
...@@ -37,6 +38,8 @@ period for running processes.""",
             allow_override=False),
             in_c_key=False)
hostname = socket.gethostname()
def force_unlock():
    """
...@@ -129,6 +132,7 @@ def set_lock_status(use_lock):
# This is because None is a valid input for timeout
notset = object()


def lock(tmp_dir, timeout=notset, min_wait=None, max_wait=None, verbosity=1):
    """
    Obtain lock access by creating a given temporary directory (whose base will
...@@ -212,13 +216,14 @@ def lock(tmp_dir, timeout=notset, min_wait=None, max_wait=None, verbosity=1):
                    other_host = read_owner.split('_')[2]
                except IndexError:
                    other_host = ()  # make sure it isn't equal to any host
                if other_host == hostname:
                    try:
                        # Just check if the other process still exists.
                        os.kill(int(read_owner.split('_')[0]), 0)
                    except OSError:
                        other_dead = True
                    except AttributeError:
                        pass  # os.kill does not exist on Windows
            except Exception:
                read_owner = 'failure'
            if other_dead:
...@@ -305,7 +310,7 @@ def refresh_lock(lock_file):
    unique_id = '%s_%s_%s' % (
        os.getpid(),
        ''.join([str(random.randint(0, 9)) for i in range(10)]),
        hostname)
    lock_write = open(lock_file, 'w')
    lock_write.write(unique_id + '\n')
    lock_write.close()
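A self-contained sketch of how such a lock owner id is built (pid, ten random digits, hostname); ``make_unique_id`` is a hypothetical helper name, the real code builds the string inline:

```python
import os
import random
import socket

def make_unique_id():
    # pid + 10 random digits + hostname; the pid lets another process
    # on the same host check whether the lock owner is still alive.
    return '%s_%s_%s' % (
        os.getpid(),
        ''.join(str(random.randint(0, 9)) for _ in range(10)),
        socket.gethostname())

uid = make_unique_id()
```

Note that parsing the id back with ``split('_')`` treats the third field as the hostname, which is why the pid and random digits come first.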
......
from theano.gof.compiledir import short_platform
def test_short_platform():
    for r, p, a in [  # (release, platform, answer)
            ('3.2.0-70-generic',
             'Linux-3.2.0-70-generic-x86_64-with-debian-wheezy-sid',
             'Linux-3.2--generic-x86_64-with-debian-wheezy-sid'),
            ('3.2.0-70.1-generic',
             'Linux-3.2.0-70.1-generic-x86_64-with-debian-wheezy-sid',
             'Linux-3.2--generic-x86_64-with-debian-wheezy-sid'),
            ('3.2.0-70.1.2-generic',
             'Linux-3.2.0-70.1.2-generic-x86_64-with-debian-wheezy-sid',
             'Linux-3.2--generic-x86_64-with-debian-wheezy-sid'),
            ('2.6.35.14-106.fc14.x86_64',
             'Linux-2.6.35.14-106.fc14.x86_64-x86_64-with-fedora-14-Laughlin',
             'Linux-2.6-fc14.x86_64-x86_64-with-fedora-14-Laughlin'),
            ]:
        o = short_platform(r, p)
        assert o == a, (o, a)
...@@ -2731,13 +2731,13 @@ CudaNdarray_get_dev_data(CudaNdarray *self, void *closure)
{
    float * p = CudaNdarray_DEV_DATA(self);
    //printf("get_dev_data %p %li \n", p, (long int)p );
    return PyInt_FromSize_t((size_t) CudaNdarray_DEV_DATA(self));
}
static int
CudaNdarray_set_dev_data(CudaNdarray *self, PyObject *value, void *closure)
{
    Py_ssize_t newdevdata = PyInt_AsSsize_t(value);
    //printf("set_dev_data %p %li \n",(float*)newdevdata ,newdevdata);
    if (PyErr_Occurred())
    {
......
...@@ -149,6 +149,13 @@ class DnnVersion(GpuOp):
    def c_libraries(self):
        return ['cudnn']

    def c_support_code(self):
        return """
#if PY_MAJOR_VERSION >= 3
#define PyInt_FromLong PyLong_FromLong
#endif
"""

    def make_node(self):
        return Apply(self, [], [Generic()()])
...@@ -455,37 +462,11 @@ class GpuDnnConvGradW(DnnBase, COp):
                     [CudaNdarrayType(broadcastable)()])

    def infer_shape(self, node, shape):
        return [(
            shape[1][1],
            shape[0][1],
            node.inputs[3],
            node.inputs[4]
        )]
...@@ -543,35 +524,11 @@ class GpuDnnConvGradI(DnnBase, COp):
                     [CudaNdarrayType(broadcastable)()])

    def infer_shape(self, node, shape):
        return [(
            shape[1][0],
            shape[0][1],
            node.inputs[3],
            node.inputs[4]
        )]
......
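The same rationale applies to the gradient-with-respect-to-inputs op: the deleted inverse formula `(out - 1) * stride + k - 2 * pad` only reconstructs the image size exactly for unit strides, so the shape is now passed in explicitly. A one-dimensional sketch of where the formula breaks (toy functions, not Theano's API):

```python
def conv_out_len(in_len, k, stride=1, pad=0):
    # forward 'valid' convolution length with optional padding
    return (in_len + 2 * pad - k) // stride + 1

def grad_input_len(out_len, k, stride=1, pad=0):
    # the deleted inverse formula; exact only when stride == 1
    return (out_len - 1) * stride + k - 2 * pad

# round trip is exact for stride 1 ...
print(grad_input_len(conv_out_len(10, 3), 3))        # -> 10
# ... but loses a pixel for stride 2, since striding discards positions
print(grad_input_len(conv_out_len(10, 3, 2), 3, 2))  # -> 9
```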
@@ -2000,12 +2000,6 @@ def local_gpu_extract_diagonal(node):
                 gpu_from_host(diag_node.inputs[0]))]
     return False

-def typeConstructor(broadcastable, dtype):
-    if dtype == 'float32':
-        return CudaNdarrayType(broadcastable=broadcastable)
-    else:
-        return tensor.TensorType(broadcastable=broadcastable, dtype=dtype)
 @register_opt('scan')
 @local_optimizer([gpu_from_host, scan_op.Scan])
 def gpuScanOptimization(node):
@@ -2065,9 +2059,7 @@ def gpuScanOptimization(node):
             nw_op = scan_op.Scan(scan_ins,
                                  scan_outs,
-                                 info,
-                                 typeConstructor=typeConstructor).make_node(
-                                     *nw_ins)
+                                 info).make_node(*nw_ins)
             _outputs = nw_op.outputs
             return _outputs
@@ -2113,8 +2105,7 @@ def gpuScanOptimization(node):
             _outputs = scan_op.Scan(
                 scan_ins,
                 scan_outs,
-                info,
-                typeConstructor=typeConstructor).make_node(*nw_ins).outputs
+                info).make_node(*nw_ins).outputs
             outputs = []
             for x, y in zip(_outputs, node.outputs):
                 if isinstance(y.type, CudaNdarrayType):
@@ -2126,8 +2117,7 @@ def gpuScanOptimization(node):
 optdb.register('gpu_scanOp_make_inplace',
-               scan_opt.ScanInplaceOptimizer(typeConstructor=typeConstructor,
-                                             gpu_flag=True),
+               scan_opt.ScanInplaceOptimizer(gpu_flag=True),
                75,
                'gpu',
                'fast_run',
...
@@ -199,6 +199,7 @@ def test_dnn_tag():
 class TestDnnInferShapes(utt.InferShapeTester):
     def setUp(self):
         super(TestDnnInferShapes, self).setUp()
+        self.mode = mode_with_gpu

     def test_softmax(self):
         t = T.ftensor4('t')
...
@@ -716,13 +716,11 @@ def local_scan_to_gpua(node):
     _cmodule_key = gof.CLinker().cmodule_key_(local_fgraph, [])
     info['gpu_hash'] = hash(_cmodule_key)

-    nw_op = scan_op.Scan(scan_ins, scan_outs, info,
-                         typeConstructor=GpuArrayType).make_node(*nw_ins)
+    nw_op = scan_op.Scan(scan_ins, scan_outs, info).make_node(*nw_ins)
     return nw_op.outputs

 optdb.register('gpua_scanOp_make_inplace',
-               scan_opt.ScanInplaceOptimizer(typeConstructor=GpuArrayType,
-                                             gpua_flag=True),
+               scan_opt.ScanInplaceOptimizer(gpua_flag=True),
                75,
                'gpua',
                'fast_run',
...
@@ -15,6 +15,7 @@ from theano.sandbox.gpuarray.tests.test_basic_ops import mode_with_gpu
 class T_Scan(TestCase):
     def setUp(self):
         utt.seed_rng()
+        super(T_Scan, self).setUp()

     def test_one_sequence_one_output_weights_gpu1(self):
         def f_rnn(u_t, x_tm1, W_in, W):
...
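The one-line additions to the two test classes above follow the standard unittest rule: an overriding `setUp` must call the parent's `setUp`, otherwise the base fixture never runs. A minimal illustration (toy classes, not the Theano test suite):

```python
import unittest

class Base(unittest.TestCase):
    def setUp(self):
        self.calls = ['base']

class Child(Base):
    def setUp(self):
        super(Child, self).setUp()  # without this line, self.calls is never set
        self.calls.append('child')

    def test_order(self):
        self.assertEqual(self.calls, ['base', 'child'])

result = unittest.TestResult()
Child('test_order').run(result)
print(result.wasSuccessful())  # -> True
```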
@@ -594,7 +594,9 @@ def scan(fn,
         if init_out.get('taps', None) == [-1]:
             actual_arg = init_out['initial']
-            arg = safe_new(init_out['initial'])
+            if not isinstance(actual_arg, tensor.Variable):
+                actual_arg = tensor.as_tensor_variable(actual_arg)
+            arg = safe_new(actual_arg)
             if isinstance(arg, tensor.Constant):
                 # safe new returns a clone of the constants, but that is not
                 # what we need for initial states
...
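The guard added above coerces non-`Variable` initial states (e.g. plain Python or NumPy scalars) through `tensor.as_tensor_variable` before handing them to `safe_new`. The defensive pattern, sketched with stand-in classes rather than Theano's real ones:

```python
class Variable(object):
    """Stand-in for a Theano Variable."""
    def __init__(self, data):
        self.data = data

def as_variable(x):
    # analogue of tensor.as_tensor_variable: wrap raw data, pass Variables through
    return x if isinstance(x, Variable) else Variable(x)

def prepare_initial(initial):
    actual = initial
    if not isinstance(actual, Variable):
        actual = as_variable(actual)
    return actual  # now always safe for safe_new-style cloning helpers

v = Variable(1.0)
print(prepare_initial(v) is v)                      # -> True, passed through
print(isinstance(prepare_initial(0.0), Variable))   # -> True, wrapped
```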
@@ -40,8 +40,8 @@ _logger = logging.getLogger('theano.scan_module.scan_op')
 from theano.configparser import AddConfigVar, BoolParam

 AddConfigVar('scan.allow_gc',
-             "Allow/disallow gc inside of Scan (default: True)",
-             BoolParam(True))
+             "Allow/disallow gc inside of Scan (default: False)",
+             BoolParam(False))
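Flipping the default to `False` keeps Scan's intermediate buffers allocated between calls, trading memory for speed. Assuming a working Theano install, the old memory-friendly behaviour can still be restored per run via Theano's config system:

```python
# From the environment, before starting Python:
#   THEANO_FLAGS='scan.allow_gc=True' python my_script.py
# or programmatically, before any function is compiled:
import theano
theano.config.scan.allow_gc = True  # re-enable garbage collection inside Scan
```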
 class Scan(PureOp):
@@ -49,7 +49,6 @@ class Scan(PureOp):
                  inputs,
                  outputs,
                  info,
-                 typeConstructor=None,
                  ):
         """
         :param inputs: inputs of the inner function of scan
@@ -58,21 +57,6 @@ class Scan(PureOp):
                      the scan op (like number of different types of
                      arguments, name, mode, if it should run on GPU or
                      not, etc.)
-        :param typeConstructor: function that constructs an equivalent
-                                to Theano TensorType
-
-        Note: ``typeConstructor`` had been added to refactor how
-        Theano deals with the GPU. If it runs on the GPU, scan needs
-        to construct certain outputs (those who reside in the GPU
-        memory) as the GPU-specific type. However we can not import
-        gpu code in this file (as it is in sandbox, and not available
-        on each machine) so the workaround is that the GPU
-        optimization passes to the constructor of this class a
-        function that is able to construct a GPU type. This way the
-        class Scan does not need to be aware of the details for the
-        GPU, it just constructs any tensor using this function (which
-        by default constructs normal tensors).
         """
         if 'gpua' not in info:
             info['gpua'] = False
@@ -88,19 +72,13 @@ class Scan(PureOp):
         self.output_types = []
         idx = 0
         jdx = 0
-        tensorConstructor = lambda broadcastable, dtype: TensorType(
-            broadcastable=broadcastable, dtype=dtype)
-        if typeConstructor is None:
-            typeConstructor = tensorConstructor
         while idx < self.n_mit_mot_outs:
             # Not that for mit_mot there are several output slices per
             # output sequence
             o = outputs[idx]
             self.output_types.append(
-                typeConstructor(
-                    broadcastable=(False,) + o.type.broadcastable,
-                    dtype=o.type.dtype))
+                o.type.clone(broadcastable=(False,) + o.type.broadcastable))
             idx += len(self.mit_mot_out_slices[jdx])
             jdx += 1
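The `o.type.clone(...)` calls that replace `typeConstructor` rely on each Type knowing how to rebuild itself with selected fields overridden: a GPU type clones to a GPU type, a plain tensor type to a plain one, so Scan no longer needs a constructor injected by the GPU optimizations. A toy sketch of the protocol (illustrative classes, not Theano's):

```python
class ToyType(object):
    def __init__(self, broadcastable, dtype):
        self.broadcastable = broadcastable
        self.dtype = dtype

    def clone(self, broadcastable=None, dtype=None):
        # rebuild the *same* class with selected fields overridden; a GPU
        # subclass inherits this and therefore clones to a GPU type
        return type(self)(
            self.broadcastable if broadcastable is None else broadcastable,
            self.dtype if dtype is None else dtype)

class ToyGpuType(ToyType):
    pass

inner = ToyGpuType((False, True), 'float32')
# prepend the time dimension, as Scan does for its output types:
outer = inner.clone(broadcastable=(False,) + inner.broadcastable)
print(type(outer).__name__, outer.broadcastable)  # -> ToyGpuType (False, False, True)
```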
@@ -110,9 +88,7 @@ class Scan(PureOp):
         for o in outputs[idx:end]:
             self.output_types.append(
-                typeConstructor(
-                    broadcastable=(False,) + o.type.broadcastable,
-                    dtype=o.type.dtype))
+                o.type.clone(broadcastable=(False,) + o.type.broadcastable))

         # shared outputs + possibly the ending condition
         for o in outputs[end:]:
@@ -241,10 +217,9 @@ class Scan(PureOp):
             if rval.ndim == as_var.ndim:
                 rval = as_var.type.filter_variable(rval)
             else:
-                tmp = as_var.type.__class__(
-                    broadcastable=tuple(var.broadcastable[:1])+\
-                        tuple(as_var.broadcastable),
-                    dtype=as_var.dtype)
+                tmp = as_var.type.clone(
+                    broadcastable=(tuple(var.broadcastable[:1]) +
+                                   tuple(as_var.broadcastable)))
                 rval = tmp.filter_variable(rval)
             return rval
@@ -517,11 +492,11 @@ class Scan(PureOp):
         return aux_txt

     def __hash__(self):
-        return (hash(type(self)) ^
-                # and a hash representing the inner graph using the
-                # CLinker.cmodule_key_
-                self._hash_inner_graph ^
-                scan_utils.hash_listsDictsTuples(self.info))
+        return hash((type(self),
+                     # and a hash representing the inner graph using the
+                     # CLinker.cmodule_key_
+                     self._hash_inner_graph,
+                     scan_utils.hash_listsDictsTuples(self.info)))

     def make_thunk(self, node, storage_map, compute_map, no_recycling):
         """
...
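The `__hash__` rewrite above swaps XOR-combining for hashing a tuple. XOR is order-insensitive and self-cancelling, so distinct ops whose components happen to coincide or cancel could collide; tuple hashing mixes position into the result. A quick check of the difference:

```python
a, b = 12345, 67890

# XOR combination: symmetric and self-cancelling
print(a ^ b == b ^ a)   # -> True  (order is lost)
print(a ^ a)            # -> 0     (equal components cancel out)

# tuple hashing keeps order and repeated components distinct
print(hash((a, b)) == hash((b, a)))  # -> False (in CPython)
```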
@@ -916,9 +916,8 @@ class PushOutScanOutput(gof.Optimizer):
 class ScanInplaceOptimizer(Optimizer):
     """Graph optimizer for Scan(makes it run inplace)"""

-    def __init__(self, typeConstructor=None, gpu_flag=False, gpua_flag=False):
+    def __init__(self, gpu_flag=False, gpua_flag=False):
         Optimizer.__init__(self)
-        self.typeConstructor = typeConstructor
         self.gpu_flag = gpu_flag
         self.gpua_flag = gpua_flag
@@ -960,8 +959,7 @@ class ScanInplaceOptimizer(Optimizer):
                 inputs = ls_begin + ls + ls_end
                 new_op = scan_op.Scan(op.inputs,
                                       op.outputs,
-                                      info,
-                                      typeConstructor=self.typeConstructor)
+                                      info)
                 # Do not call make_node for test_value
                 new_outs = new_op(*inputs, **dict(return_list=True))
@@ -2087,8 +2085,7 @@ scan_eqopt2 = theano.gof.EquilibriumDB()
 optdb.register('scan_eqopt1', scan_eqopt1, .1, 'fast_run', 'scan')
 optdb.register('scan_eqopt2', scan_eqopt2, 1.6, 'fast_run', 'scan')
 optdb.register('scanOp_make_inplace',
-               ScanInplaceOptimizer(typeConstructor=None,
-                                    gpu_flag=False),
+               ScanInplaceOptimizer(),
                75,
                'fast_run',
                'inplace',
...
This source diff could not be displayed because it is too large. You can view the blob instead.
Diff collapsed (4 more files).