Commit 134658d7 authored by James Bergstra

merge

@@ -37,8 +37,9 @@ Roughly in order of what you'll want to check out:
* :ref:`optimizations` -- Guide to Theano's graph optimizations.
* :ref:`extending` -- Learn to add a Type, Op, or graph optimization.
* :ref:`internal` -- How to maintain Theano, LISA-specific tips, and more...
* `API <api/>`_ -- The automatically-generated API
You can download the latest `PDF documentation <http://deeplearning.net/theanodoc/theano.pdf>`_, rather than reading it online.
Community
=========
@@ -47,7 +48,7 @@ Community
* Register and post to `theano-dev`_ if you want to talk to the developers.
* We try to stay organized with `Theano's Trac <http://trac-hg.assembla.com/theano/report/1>`__
* Come visit us in Montreal! Most of the developers are students in the LISA_ group at the `University of Montreal`_.
@@ -70,8 +71,6 @@ Community
LICENSE
.. _theano-dev: http://groups.google.com/group/theano-dev
.. _theano-users: http://groups.google.com/group/theano-users
.. _tickets: http://pylearn.org/theano/trac/query?status=accepted&status=assigned&status=new&status=reopened&group=milestone&max=200&col=id&col=summary&col=status&col=owner&col=type&col=priority&col=component&col=time&report=9&order=priority
...
@@ -20,7 +20,7 @@ to be installed:
We develop mainly on 64-bit Linux machines. 32-bit architectures are
not well-tested.
python >= 2.5 (2.4 should be supported as well)
`numpy <http://numpy.scipy.org/>`_ >= 1.2
Earlier versions have memory leaks.
@@ -30,6 +30,8 @@ to be installed:
is buggy in 0.6. (scipy.csc_matrix dot has a bug with singleton
dimensions. There may be more bugs.)
A BLAS installation (with Level 3 functionality)
The following libraries and software are optional:
g++, python-dev
@@ -42,41 +44,49 @@ The following libraries and software are optional:
`mercurial <http://www.selenic.com/mercurial/>`_
    To download the bleeding-edge version of Theano.
.. _install_bleeding_edge:
Getting the code
-----------------
If you are a developer of Theano, then check out the :ref:`dev_start_guide` guide.

The following are general instructions that will set you up with the bleeding-edge
version of Theano. First, get the code using `mercurial <http://www.selenic.com/mercurial/wiki/>`__:

.. code-block:: bash

    hg clone http://hg.assembla.com/theano Theano
Configuring PYTHONPATH
----------------------
The subdirectory Theano/theano has to be located in a path
mentioned in your PYTHONPATH. In order to do that, you can either
create a symbolic link to Theano/theano in a directory already
mentioned in your PYTHONPATH environment variable, or modify the
PYTHONPATH so that it mentions Theano.
To create a symbolic link:

.. code-block:: bash

    ln -s Theano/theano <someplace on your PYTHONPATH>/theano
To modify the environment variable PYTHONPATH in bash, you may do this:

.. code-block:: bash

    export PYTHONPATH=<path to Theano's parent dir>/Theano:$PYTHONPATH

In csh:

.. code-block:: csh

    setenv PYTHONPATH <path to Theano's parent dir>/Theano:$PYTHONPATH
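As a quick sanity check on the path setup above, the way the import system locates a package can be mimicked in a few lines. This is a hypothetical helper for illustration only, not part of Theano:

```python
import os
import sys

def find_package(name, search_paths=None):
    """Return the directory providing package `name`, searching the way the
    import system does: the first path entry containing name/__init__.py."""
    for entry in (search_paths if search_paths is not None else sys.path):
        candidate = os.path.join(entry or '.', name)
        if os.path.isfile(os.path.join(candidate, '__init__.py')):
            return candidate
    return None
```

After configuring PYTHONPATH as above, ``find_package('theano')`` should return the ``Theano/theano`` subdirectory; ``None`` means the interpreter will not find Theano either.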
Configuring Theano's environment variables
------------------------------------------
Two environment variables are used to control automatic code
generation. It is possible to use Theano in a way which avoids all
@@ -118,6 +128,33 @@ automatic code generation, but that way is much, much slower.
Omitting this variable defaults the mode to ``'FAST_RUN'``.
Testing your installation
---------------------------
Once you have completed these steps, you should run the Theano test suite like this:
.. code-block:: bash
    cd Theano
    nosetests  # execute all the tests
All tests should pass. If some test fails on your machine, you are
encouraged to tell us what went wrong on the ``theano-users@googlegroups.com``
mailing list.
Updating
-------------
To update your library to the latest revision, change directory (``cd``)
to your ``Theano`` folder and execute the following command:
.. code-block:: bash
    hg pull -u
You should update frequently; bugs are fixed on a very regular basis.
Mac
---
@@ -126,20 +163,21 @@ Mac

-

  .. code-block:: bash

      $ sudo port install gcc44 py25-zlib py25-numpy py25-scipy mercurial

Note that compiling gcc takes a significant time (hours) so it is probably
not the best solution if you are in a rush! It may happen that SciPy
fails to compile the first time and still compiles just fine on a second
try. Same thing with py25-zlib.
- scipy depends on ATLAS (a BLAS library), which will be installed by MacPorts.
- Set ``THEANO_BLAS_LDFLAGS`` to something which will link against said BLAS
  library. E.g., ``THEANO_BLAS_LDFLAGS='-lcblas -latlas -lgfortran'``.

These installation instructions have not been tested recently, so please inform us of your results!
We would be especially interested in dependencies that we missed listing, as well as tests
that fail on your platform (use the ``theano-users@googlegroups.com`` mailing list).
Windows
@@ -240,16 +278,19 @@ but this has not been tested yet.
``export PYTHONPATH=$PYTHONPATH:$HOME/Theano``.
- Please note that at this time, some tests (launched using ``nosetests``) are
  still failing under Windows: we are working on fixing them.
  It may also happen that many tests fail while running the test suite,
  due to insufficient memory resources: one workaround is to run nosetests
  multiple times under individual subdirectories.
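The per-subdirectory workaround can be scripted. A minimal sketch that only builds the command lines (it assumes ``nosetests`` is on your PATH and that you run the commands yourself, e.g. via ``subprocess``):

```python
import os

def per_dir_test_commands(root):
    """Build one `nosetests` invocation per immediate subdirectory of `root`,
    so each run happens in a fresh process and uses less memory."""
    return [['nosetests', os.path.join(root, d)]
            for d in sorted(os.listdir(root))
            if os.path.isdir(os.path.join(root, d))]
```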
Generating the documentation
----------------------------
You can read the latest HTML documentation `here
<http://deeplearning.net/theanodoc>`__.
You can download the latest PDF documentation `here
<http://deeplearning.net/theanodoc/theano.pdf>`__.
We recommend you look at the documentation on the website, since it
will be more current than the documentation included with the package.
...
@@ -21,11 +21,10 @@ Developer Start Guide
Accounts
========
To obtain developer access: register with `Assembla
<http://www.assembla.com/>`_ and add yourself as a watcher on the `Theano space
<http://www.assembla.com/spaces/theano>`_. Then send an email to an admin asking
to be promoted to a member of the project.
Theano code
@@ -34,10 +33,9 @@ Theano code
*To get the source via mercurial,* you must have `mercurial
<http://www.selenic.com/mercurial/wiki/>`__ installed.
The code that makes up Theano is in a `single repository
<http://www.assembla.com/spaces/theano/trac_mercurial_tool>`__. As a developer,
you should clone this repository like this:

.. code-block:: bash
@@ -121,9 +119,6 @@ to your ``Theano`` folder and execute the following command:

    hg pull -u
Nightly test
============
...
@@ -129,7 +129,8 @@ Getting started
the :ref:`tutorial` first though.
A PDF version of the online documentation may be found `here
<http://deeplearning.net/theanodoc/theano.pdf>`_.
Contact us
...
@@ -331,6 +331,8 @@ Indexing
Basic indexing.
Mirrors numpy's `basic indexing <http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html>`_. Read that page first.
Advanced indexing.
.. _libdoc_tensor_elementwise:
...
@@ -40,10 +40,10 @@ This is a sort of memo for developers and would-be developers.
.. _mercurial: http://www.selenic.com/mercurial/wiki/
.. _nosetests: http://somethingaboutorange.com/mrl/projects/nose/
.. _numpy: http://numpy.scipy.org/
.. _python: http://www.python.org
.. _scipy: http://scipy.org/
.. _autodiff: http://www.autodiff.org
.. _boost.python: http://www.boost.org/doc/libs/1_38_0/libs/python/doc/index.html
.. _cython: http://www.cython.org/
.. _liboil: http://liboil.freedesktop.org/wiki/
...
@@ -41,9 +41,10 @@ details about these building blocks see :ref:`variable`, :ref:`op`,

.. figure:: apply.png
   :align: center
   Arrows represent references to the Python objects pointed at. The blue
   box is an :ref:`apply` node. Red boxes are :ref:`variable` nodes. Green
   circles are :ref:`Ops <op>`. Purple boxes are :ref:`Types <type>`.
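The reference structure in the figure can be sketched with plain Python objects. This is a simplified model for intuition only, not Theano's actual classes:

```python
class Type(object):
    """Describes the kind of data a Variable may hold (purple boxes)."""
    def __init__(self, name):
        self.name = name

class Op(object):
    """The operation an Apply node performs (green circles)."""
    def __init__(self, name):
        self.name = name

class Variable(object):
    """A value node; `owner` is the Apply that computes it, or None
    for graph inputs (red boxes)."""
    def __init__(self, type, owner=None):
        self.type = type
        self.owner = owner

class Apply(object):
    """Application of an Op to input Variables (blue box); its outputs
    point back to it through their `owner` field."""
    def __init__(self, op, inputs):
        self.op = op
        self.inputs = inputs
        # simplifying assumption: one output, same Type as the first input
        self.outputs = [Variable(inputs[0].type, owner=self)]
```

Traversing from an output, following ``owner`` and then ``inputs`` repeatedly, walks the whole graph back to its inputs.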
The graph can be traversed starting from outputs (the result of some
@@ -104,7 +105,7 @@ how to compute the gradient of the node's outputs with respect to its
inputs. Note that if an :ref:`op` does not provide this information,
it is assumed that the gradient is not defined. Using the
`chain rule <http://en.wikipedia.org/wiki/Chain_rule>`_,
these gradients can be composed in order to obtain the expression of the
gradient of the graph's output with respect to the graph's inputs.
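A numeric illustration of that composition (nothing Theano-specific; f and g here are arbitrary example functions):

```python
import math

def g(x):  return x * x        # inner function, g'(x) = 2x
def dg(x): return 2.0 * x
def f(u):  return math.sin(u)  # outer function, f'(u) = cos(u)
def df(u): return math.cos(u)

def grad_composed(x):
    """d/dx f(g(x)) = f'(g(x)) * g'(x), by the chain rule."""
    return df(g(x)) * dg(x)
```

The result agrees with a central finite-difference estimate of d/dx sin(x^2), which is exactly the kind of check one uses to validate an Op's `grad` implementation.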
...
@@ -29,9 +29,10 @@ class ConvOp(Op):
#TODO: make the stacksize its own parameter, and make imshp a pair
def __init__(self, imshp=None, kshp=None, nkern=None, bsize=None, dx=None, dy=None, output_mode='valid',
             unroll_batch=0,
             unroll_kern=0,
             unroll_patch=False,
             imshp_logical=None,
             kshp_logical=None,
             kshp_logical_top_aligned=True,
@@ -47,6 +48,7 @@ class ConvOp(Op):
dx - patch stride rows
dy - patch stride cols
out_mode - 'valid', 'full'
unroll_patch - c code generation option
unroll_batch - c code generation option
unroll_kern - c code generation option
verbose - passed to GpuConv
@@ -60,6 +62,7 @@ class ConvOp(Op):
gradient on the filters.
unroll_patch. If True, will use a version that unrolls the patch loop, which is faster than leaving it rolled.
unroll_batch. If >0 will use a version that will unroll the batch loop by the value of the option. By default don't use this version of the code.
unroll_nkern. Same as unroll_batch, but unrolls the kernel loop.
@@ -95,6 +98,7 @@ class ConvOp(Op):
self.unroll_batch=unroll_batch
self.unroll_kern=unroll_kern
self.unroll_patch=unroll_patch
if self.unroll_batch>0 and self.bsize % self.unroll_batch!=0:
    if self.bsize<=self.unroll_batch:
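The check above enforces that ``bsize`` be a multiple of ``unroll_batch``. The fallback handling is cut off by the hunk, so the following is a hypothetical stand-alone sketch of one reasonable policy, not the actual code in ``ConvOp.__init__``:

```python
def resolve_unroll(size, unroll):
    """Pick a usable unroll factor: keep `unroll` when it divides `size`;
    otherwise unroll the whole loop if it is small enough, or give up.
    Hypothetical helper, for illustration only."""
    if unroll <= 0:
        return 0                 # unrolling disabled
    if size % unroll == 0:
        return unroll            # requested factor divides the loop count
    if size <= unroll:
        return size              # unroll the entire loop instead
    return 0                     # no clean factor: fall back to no unrolling
```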
@@ -407,6 +411,7 @@ using namespace std;
d["self_imshp0"]=self.imshp[0]
d["self_imshp1"]=self.imshp[1]
d["self_imshp2"]=self.imshp[2]
d["mode"]=self.out_mode.upper()
d["self_kshp0"]=self.kshp[0]
d["self_kshp1"]=self.kshp[1]
d["self_kshp_logical_r"] = self.kshp_logical[0]
@@ -439,8 +444,12 @@ using namespace std;
#print self.out_mode, d["self_imshp_logical_stride_r"]
if self.imshp != self.imshp_logical or self.kshp != self.kshp_logical:
    # print "return imshp!=imshp_logical or self.kshp != self.kshp_logical shape version"
    return _conv_op_code_a % d
if self.unroll_patch:
    # print "return unroll patch version",self.dx,self.dy
    return _conv_op_code_unroll_patch % d
if self.unroll_batch>0 or self.unroll_kern>0:
    if self.unroll_batch<=0: self.unroll_batch=1
    if self.unroll_kern<=0: self.unroll_kern=1
@@ -1212,3 +1221,295 @@ Py_XDECREF(img2d);
Py_XDECREF(filtersflipped);
"""
return ret
_conv_op_code_unroll_patch = """
const int mode=%(mode)s;
int typenum=0, typenum_f=0;
PyArrayObject *ain1=NULL, *ain2=NULL, *filtersflipped_arr=NULL, *img2d_arr=NULL;
const %(type)s fill_value = 0;
int type_im=PyArray_TYPE(%(img2d)s);
int type_ker=PyArray_TYPE(%(filtersflipped)s);
npy_intp dim_zz[2]={%(self_outshp0)s,%(self_outshp1)s};
npy_intp dim_im[2]={%(self_imshp1)s,%(self_imshp2)s};
npy_intp dim_ker[2]={%(self_kshp0)s,%(self_kshp1)s};
PyArray_Dims img2d_shape;
npy_intp img2d_dim[4]={1,1,0,0};
img2d_shape.ptr=img2d_dim;
img2d_shape.len=4;
PyArray_Dims kerns_shape;
npy_intp kerns_dim[4]={1,1,0,0};
kerns_shape.ptr=kerns_dim;
kerns_shape.len=4;
PyObject *img2d=NULL, *contig, *filtersflipped=NULL;
if(%(img2d)s->nd==2){
img2d_dim[3]=%(img2d)s->dimensions[1];
img2d_dim[2]=%(img2d)s->dimensions[0];
}else if(%(img2d)s->nd==3){
img2d_dim[3]=%(img2d)s->dimensions[2];
img2d_dim[2]=%(img2d)s->dimensions[1];
img2d_dim[0]=%(img2d)s->dimensions[0];
}else if(%(img2d)s->nd==4){
img2d_dim[3]=%(img2d)s->dimensions[3];
img2d_dim[2]=%(img2d)s->dimensions[2];
img2d_dim[1]=%(img2d)s->dimensions[1];
img2d_dim[0]=%(img2d)s->dimensions[0];
}else {
PyErr_SetString(PyExc_ValueError, "img doesn't have a good shape");
%(fail)s;
}
if(%(filtersflipped)s->nd==3){
kerns_dim[3]=%(filtersflipped)s->dimensions[2];
kerns_dim[2]=%(filtersflipped)s->dimensions[1];
kerns_dim[0]=%(filtersflipped)s->dimensions[0];
}else if(%(filtersflipped)s->nd==4){
kerns_dim[3]=%(filtersflipped)s->dimensions[3];
kerns_dim[2]=%(filtersflipped)s->dimensions[2];
kerns_dim[1]=%(filtersflipped)s->dimensions[1];
kerns_dim[0]=%(filtersflipped)s->dimensions[0];
}else{
std::stringstream temp;
temp << "nddim="<<%(filtersflipped)s->nd;
std::string param = temp.str();
PyErr_SetString(PyExc_ValueError,
  ("kernel doesn't have a good shape. " + param).c_str());
%(fail)s;
}
img2d = PyArray_Newshape(%(img2d)s,&img2d_shape, PyArray_CORDER);
img2d_arr = (PyArrayObject*)img2d;
if ((img2d_arr->strides[3] != sizeof(%(type)s))
|| (img2d_arr->strides[2] != img2d_arr->dimensions[3]*sizeof(%(type)s))){
contig = (PyObject*)(PyArray_GETCONTIGUOUS((PyArrayObject*)img2d));
Py_DECREF(img2d);
img2d = contig;
if (!PyArray_ISCONTIGUOUS(img2d)){
PyErr_SetString(PyExc_ValueError, "img2d isn't contiguous");
%(fail)s;
}
}
img2d_arr = (PyArrayObject*)img2d;
filtersflipped = PyArray_Newshape(%(filtersflipped)s,&kerns_shape, PyArray_CORDER);
filtersflipped_arr = (PyArrayObject*)filtersflipped;
if ((filtersflipped_arr->strides[3] != sizeof(%(type)s))
|| (filtersflipped_arr->strides[2] != filtersflipped_arr->dimensions[3]*sizeof(%(type)s))){
contig = (PyObject*)(PyArray_GETCONTIGUOUS((PyArrayObject*)filtersflipped));
Py_DECREF(filtersflipped);
filtersflipped = contig;
if (!PyArray_ISCONTIGUOUS(filtersflipped)){
PyErr_SetString(PyExc_ValueError, "filtersflipped isn't contiguous");
%(fail)s;
}
}
filtersflipped_arr = (PyArrayObject*)filtersflipped;
if(mode != VALID && mode != FULL){
PyErr_SetString(PyExc_ValueError, "invalid mode, only full and valid are supported"); %(fail)s;
}
typenum = PyArray_ObjectType((PyObject*)%(img2d)s, 0);
typenum_f = PyArray_ObjectType((PyObject*)%(filtersflipped)s, 0);
if (typenum < 0) {PyErr_SetString(PyExc_ValueError, "Invalid type"); %(fail)s;}
if (typenum != typenum_f) {PyErr_SetString(PyExc_ValueError, "Input types must match"); %(fail)s;}
if (!img2d) %(fail)s;
if (!filtersflipped) %(fail)s;
if ((!%(z)s)
|| %(z)s->nd != 4
||(%(z)s->dimensions[0] != %(self_bsize)s)
||(%(z)s->dimensions[1] != %(self_nkern)s)
||(%(z)s->dimensions[2] != dim_zz[0])
|| (%(z)s->dimensions[3] != dim_zz[1])
)
{
if (%(z)s) Py_DECREF(%(z)s);
npy_intp dims[4] = {0,0,0,0};
dims[0]=%(self_bsize)s;
dims[1]=%(self_nkern)s;
dims[2]=dim_zz[0];
dims[3]=dim_zz[1];
%(z)s = (PyArrayObject*) PyArray_ZEROS(4, dims, typenum,0);
}else{
//PyArray_FILLWBYTE((PyObject*)%(z)s,0);
}
int Os[2];
Os[0]=%(self_outshp0)s;
Os[1]=%(self_outshp1)s;
//I keep the formula to calculate Os in case we need it in the future.
//if (mode == FULL) {Os[0] = (int)ceil((dim_im[0]+dim_ker[0]-1)/float(%(self_dx)s)); Os[1] = ceil((dim_im[1]+dim_ker[1]-1)/float(%(self_dy)s));}
//else {Os[0] = (int)ceil((dim_im[0]-dim_ker[0]+1)/float(%(self_dx)s)); Os[1] = (int)ceil((dim_im[1]-dim_ker[1]+1)/float(%(self_dy)s));}
for(int b=0;b< %(self_bsize)s;b++){
for(int n_kern=0;n_kern<%(self_nkern)s;n_kern++){
//assertions
if (%(z)s->strides[0] != %(z)s->dimensions[1] *%(z)s->dimensions[2] *%(z)s->dimensions[3] * sizeof(%(type)s)) %(fail)s;
if (%(z)s->strides[1] != %(z)s->dimensions[2] * %(z)s->dimensions[3] * sizeof(%(type)s)) %(fail)s;
if (%(z)s->strides[2] != %(z)s->dimensions[3] * sizeof(%(type)s)) %(fail)s;
if (%(z)s->strides[3] != sizeof(%(type)s)) %(fail)s;
%(type)s * __restrict__ out=(%(type)s *)(PyArray_GETPTR2(%(z)s,b,n_kern));
for (int i = 0; i < dim_zz[0]*dim_zz[1]; ++i) out[i] = 0;
for(int stack_size=0;stack_size<%(self_imshp0)s;stack_size++){
const %(type)s * __restrict__ in=(%(type)s *)(PyArray_GETPTR2(img2d,b,stack_size));
const %(type)s * __restrict__ hvals=(%(type)s *)(PyArray_GETPTR2(filtersflipped,n_kern,stack_size));
int new_m;
for (int iter_m=0; iter_m < Os[0]; iter_m++) {
// Reposition index into input image based on requested output size
int pos_m = iter_m*%(self_dx)s;//The position of the patch in the image
if (mode == FULL) new_m = pos_m ;
else new_m = (pos_m+dim_ker[0]-1);
for (int iter_n=0; iter_n < Os[1]; iter_n++) { // loop over columns
int pos_n=iter_n*%(self_dy)s;
%(type)s sum=0;
%(type)s sum2=0;
%(type)s sum3=0;
%(type)s sum4=0;
int nb_sum=0;
// Sum over kernel, if index into image is out of bounds
// fill with the value
for (int j=0; j < dim_ker[0]; j++) {
int ind0 = (new_m-j);
if(mode==FULL){
const %(type)s * idx_hvals=&hvals[j*dim_ker[1]];
if(ind0 < 0 || ind0 >= dim_im[0]){
if(fill_value!=0)
for (int k=0; k < dim_ker[1]; k++) {
sum+= idx_hvals[k] * fill_value;
}
}else{
//do the part where kernel is to the right of the img
//TODO: implement unroll patch for fill_value!=0
int k=0,max_k=max((int)(pos_n-dim_im[1])+1,0);
if(fill_value!=0){
for(k=0;k<max_k;k++){
sum+= idx_hvals[k]*fill_value;
}
}else {k=max_k;}
//do the part where the kernel is on the img
max_k=min(pos_n+1,(int)dim_ker[1]);
const %(type)s * idx_in=&in[ind0*dim_im[1]];
if(iter_n + 4*%(self_dy)s < Os[1]
&& iter_n>dim_ker[1]-1+3
&& iter_n<dim_im[1]-dim_ker[1]+1-3){
nb_sum=4;
//cout<<4<<endl;
for (int ind1=pos_n-k; k<max_k; k++,ind1--) {
sum+=idx_hvals[k]*idx_in[ind1];
sum2+=idx_hvals[k]*idx_in[ind1+%(self_dy)s];
sum3+=idx_hvals[k]*idx_in[ind1+2*%(self_dy)s];
sum4+=idx_hvals[k]*idx_in[ind1+3*%(self_dy)s];
}
}else if(iter_n + 2*%(self_dy)s < Os[1]
&& iter_n>dim_ker[1]-1
&& iter_n<dim_im[1]-dim_ker[1]+1){
//cout<<2<<endl;
nb_sum=2;
// if(iter_n==dim_ker[1]-1){//k-1<min(pos_n+%(self_dy)s,(int)dim_ker[1])){
// sum2+=idx_hvals[k-1]*idx_in[pos_n-k-%(self_dy)s];
// }
for (int ind1=pos_n-k; k<max_k; k++,ind1--) {
sum+=idx_hvals[k]*idx_in[ind1];
sum2+=idx_hvals[k]*idx_in[ind1+%(self_dy)s];
}
// sum2+=idx_hvals[k]*idx_in[pos_n-k+%(self_dy)s];
// sum+=idx_hvals[k]*idx_in[pos_n-k];
// k++;
}else{
//cout<<1<<endl;
nb_sum=1;
/*
%(type)s sum_=0;
if((k-max_k) & 0x1 != 0){
sum+= idx_hvals[k] * idx_in[pos_n-k];
}
for (int ind1=pos_n-k; k<max_k; k+=2,ind1-=2) {
sum+= idx_hvals[k] * idx_in[ind1];
sum_+= idx_hvals[k+1] * idx_in[ind1-1];
}
sum+=sum_;
*/
for (int ind1=pos_n-k; k<max_k; k++,ind1--) {
sum+=idx_hvals[k]*idx_in[ind1];
}
}
//do the part to the left of the img
if(fill_value!=0)
for(;k<dim_ker[1];k++) sum+= idx_hvals[k]*fill_value;
}
}else{//valid mode
const %(type)s* idx_in=&in[ind0*dim_im[1]];
const %(type)s* idx_hvals=&hvals[j*dim_ker[1]];
if(iter_n + 4*%(self_dy)s < Os[1]){
nb_sum=4;
for (int k=dim_ker[1]-1,im_idx=pos_n; k >=0; k--,im_idx++) {
sum+=idx_hvals[k]*idx_in[im_idx];
sum2+=idx_hvals[k]*idx_in[im_idx+%(self_dy)s];
sum3+=idx_hvals[k]*idx_in[im_idx+2*%(self_dy)s];
sum4+=idx_hvals[k]*idx_in[im_idx+3*%(self_dy)s];
}
}else if(iter_n + 2*%(self_dy)s < Os[1]){
nb_sum=2;
for (int k=dim_ker[1]-1,im_idx=pos_n; k >=0; k--,im_idx++) {
sum+=idx_hvals[k]*idx_in[im_idx];
sum2+=idx_hvals[k]*idx_in[im_idx+%(self_dy)s];
}
}else{
nb_sum=1;
for (int k=dim_ker[1]-1,im_idx=pos_n; k >=0; k--,im_idx++) {
sum+=idx_hvals[k]*idx_in[im_idx];
}
}
}//else valid mode
}//for j
switch(nb_sum){
case 4: out[iter_m*dim_zz[1]+iter_n+3] %(affectation)s sum4;
case 3: out[iter_m*dim_zz[1]+iter_n+2] %(affectation)s sum3;
case 2: out[iter_m*dim_zz[1]+iter_n+1] %(affectation)s sum2;
case 1: out[iter_m*dim_zz[1]+iter_n] %(affectation)s sum;
}
iter_n+=nb_sum-1;
/*
out[iter_m*dim_zz[1]+iter_n] %(affectation)s sum;
if(nb_sum>=2){
iter_n++;
out[iter_m*dim_zz[1]+iter_n] %(affectation)s sum2;
}
if(nb_sum>=3){
iter_n++;
out[iter_m*dim_zz[1]+iter_n] %(affectation)s sum3;
}
if(nb_sum>=4){
iter_n++;
out[iter_m*dim_zz[1]+iter_n] %(affectation)s sum4;
}
*/
}//for iter_n
}//for iter_m
}//for stack_size
if (0 && (mode==FULL)){
for (int i = 0; i < dim_zz[0]*dim_zz[1]; ++i)
std::cout << " " << out[i];
std::cout << "\\n";
}
}//for n_kern
}//for b
Py_XDECREF(img2d);
Py_XDECREF(filtersflipped);
"""
import os, sys
from theano.gof.compiledir import get_compiledir
from theano.compile import optdb
import theano.config as config
import logging, copy
_logger_name = 'theano_cuda_ndarray'
@@ -15,8 +16,34 @@ def debug(*msg):
    _logger.debug(_logger_name+'DEBUG: '+' '.join(str(m) for m in msg))

# Compile type_support.cu
# This needs nvcc (part of CUDA) to be installed. If it is not, a warning is
# printed and this module will not be working properly (we set `enable_cuda`
# to False).
# This variable is True by default, and set to False if something goes wrong
# when trying to initialize cuda.
enable_cuda = True
# Global variable to avoid displaying the same warning multiple times.
cuda_warning_is_displayed = False
# Code factorized within a function so that it may be called from multiple
# places (which is not currently the case, but may be useful in the future).
def set_cuda_disabled():
    """Function used to disable cuda.

    A warning is displayed, so that the user is aware that cuda-based code is
    not going to work.

    Note that there is no point calling this function from outside of
    `cuda.__init__`, since it has no effect once the module is loaded.
    """
    global enable_cuda, cuda_warning_is_displayed
    enable_cuda = False
    if not cuda_warning_is_displayed:
        cuda_warning_is_displayed = True
        warning('Cuda is disabled, cuda-based code will thus not be '
                'working properly')
old_file = os.path.join(os.path.split(__file__)[0],'type_support.so')
if os.path.exists(old_file):
@@ -30,6 +57,10 @@ except ImportError:
import nvcc_compiler

if not nvcc_compiler.is_nvcc_available():
    set_cuda_disabled()

if enable_cuda:
    print __file__
    cuda_path=os.path.split(old_file)[0]
@@ -64,21 +95,20 @@ except ImportError:
from type_support.type_support import *

if enable_cuda:
    from theano.sandbox.cuda.type import CudaNdarrayType
    from theano.sandbox.cuda.var import (CudaNdarrayVariable,
                                         CudaNdarrayConstant,
                                         CudaNdarraySharedVariable,
                                         shared_constructor)
    import basic_ops
    from basic_ops import (GpuFromHost, HostFromGpu, GpuElemwise,
                           GpuDimShuffle, GpuSum, GpuReshape,
                           GpuSubtensor, GpuIncSubtensor, GpuFlatten, GpuShape)
    import opt
    import cuda_ndarray

def use(device=config.THEANO_GPU):
    if use.device_number is None:
...
@@ -19,6 +19,15 @@ def debug(*args):
    #sys.stderr.write('DEBUG:'+ ' '.join(str(a) for a in args)+'\n')
    _logger.debug("DEBUG: "+' '.join(str(a) for a in args))
def is_nvcc_available():
    """Return True iff the nvcc compiler is found."""
    try:
        subprocess.call(['nvcc', '--version'], stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE)
        return True
    except OSError:
        # nvcc is not on the PATH
        return False
def nvcc_module_compile_str(module_name, src_code, location=None, include_dirs=[], lib_dirs=[], libs=[],
                            preargs=[]):
    """
...
import numpy
import theano
import theano.sandbox.scan
# generator network, only one output , type scalar ; no sequence or
# non sequence arguments
def test_1():
def f_pow2(x_tm1):
return (2*x_tm1, {})
s = theano.tensor.dvector()
n_steps = theano.tensor.dscalar()
Y = theano.sandbox.scan.scan(f_pow2, [],s, [],n_steps = n_steps)
f1 = theano.function([s,n_steps], Y)
    assert( numpy.allclose(f1([1],3), [2,4,8]) )
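The recurrence test_1 exercises can be checked by hand; this is a plain-Python sketch of the same unrolled loop (no Theano involved), collecting every intermediate value the way scan does:

```python
def pow2_scan(x0, n_steps):
    # x_t = 2 * x_{t-1}, collecting every step
    out = []
    x = x0
    for _ in range(n_steps):
        x = 2 * x
        out.append(x)
    return out

print(pow2_scan(1, 3))  # [2, 4, 8]
```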
# simple rnn, one input, one state, weights for each; input/state are
# vectors, weights are scalars
def test_2():
def f_rnn(u_t,x_tm1,W_in, W):
return (u_t*W_in+x_tm1*W, {})
u = theano.tensor.dvector()
x0 = theano.tensor.dvector()
W_in = theano.tensor.dscalar()
W = theano.tensor.dscalar()
Y = theano.sandbox.scan.scan(f_rnn, u,x0,[W_in,W])
f2 = theano.function([u,x0,W_in,W], Y)
    assert( numpy.allclose(f2([1,2,3,4],[1],.1,1), numpy.array([1.1,1.3,1.6,2.])) )
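The expected values in test_2 come from unrolling x_t = u_t*W_in + x_{t-1}*W by hand. A plain-Python sketch of that recurrence, under the same inputs (no Theano involved):

```python
def rnn_scan(u, x0, w_in, w):
    # x_t = u_t * w_in + x_{t-1} * w, seeded with the last initial state
    out = []
    x = x0[-1]
    for u_t in u:
        x = u_t * w_in + x * w
        out.append(x)
    return out

result = rnn_scan([1, 2, 3, 4], [1], 0.1, 1.0)
# approximately [1.1, 1.3, 1.6, 2.0]
```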
# simple rnn, one input, one state, weights for each; input/state are
# vectors, weights are scalars; using shared variables
def test_3():
u = theano.tensor.dvector()
x0 = theano.tensor.dvector()
W_in = theano.shared(.1, name = 'w_in')
W = theano.shared(1., name ='w')
def f_rnn_shared(u_t,x_tm1):
return (u_t*W_in+x_tm1*W, {})
Y = theano.sandbox.scan.scan(f_rnn_shared, u,x0,[])
f3 = theano.function([u,x0], Y)
    assert( numpy.allclose(f3([1,2,3,4],[1]), numpy.array([1.1,1.3,1.6,2.])) )
# some rnn with multiple outputs and multiple inputs; other dimension
# instead of scalars/vectors
def test_4():
W_in2 = theano.shared(numpy.array([1.,2.]), name='win2')
W = theano.shared(numpy.array([[2.,1.],[1.,1.]]), name='w')
W_out = theano.shared(numpy.array([.5,1.]), name = 'wout')
W_in1 = theano.tensor.dmatrix('win')
u1 = theano.tensor.dmatrix('u1')
u2 = theano.tensor.dvector('u2')
x0 = theano.tensor.dmatrix('x0')
y0 = theano.tensor.dvector('y0')
    ## Why doesn't dot work with scalars!?
    ## Why doesn't * support SharedVariable and TensorVariable?
def f_rnn_cmpl(u1_t, u2_t, x_tm1, y_tm1, W_in1):
return ({}, [theano.dot(u1_t,W_in1) + u2_t* W_in2 + \
theano.dot(x_tm1, W), theano.dot(x_tm1, W_out)])
Y = theano.sandbox.scan.scan(f_rnn_cmpl,[u1,u2],[x0,y0],W_in1)
f4 = theano.function([u1,u2,x0,y0,W_in1], Y)
(x,y) = f4( numpy.array([[1,2],[1,2],[1,2]]), \
numpy.array([1,2,3]), \
numpy.array([[0,0]]), \
numpy.array([1]), \
numpy.array([[1,1],[1,1]]))
assert( numpy.all(x == numpy.array([[4.,5.],[18.,16.],[58.,43.]])))
assert( numpy.all(y == numpy.array([0.,7.,25.])))
# basic ESN using updates
def test_5():
W_in = theano.shared(numpy.array([1.,1.]), name='win')
W = theano.shared(numpy.array([[.1,0.],[.0,.1]]),name='w')
W_out= theano.shared(numpy.array([.5,1.]), name='wout')
u = theano.tensor.dvector('u')
x = theano.shared(numpy.array([0.,0.]),'x')
y0 = theano.tensor.dvector('y0')
def f_ESN(u_t):
return ( theano.dot(x,W_out), \
{ x: W_in*u_t + theano.dot(x,W) } )
Y = theano.sandbox.scan.scan(f_ESN,u,y0,[],outputs_taps={0:[]})
f5 = theano.function([u,y0],Y)
    assert( numpy.allclose(f5( numpy.array([1,2,3]), numpy.array([0])), \
            numpy.array([0.,1.5,3.15])) )
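The expected sequence here follows from the update semantics: the output is read from the state *before* the update x := W_in*u_t + dot(x, W) is applied, so the first output is 0 and the second is dot([1,1],[.5,1]) = 1.5 (the value the updated test later in this merge also uses). A plain-Python sketch of that ordering, with the same weights:

```python
def esn_scan(u, w_in, w, w_out):
    # Output is read from the current state, *then* the state update
    # x := w_in * u_t + dot(x, W) is applied, mirroring scan's updates.
    x = [0.0, 0.0]
    out = []
    for u_t in u:
        out.append(w_out[0] * x[0] + w_out[1] * x[1])
        x = [w_in[0] * u_t + x[0] * w[0][0] + x[1] * w[1][0],
             w_in[1] * u_t + x[0] * w[0][1] + x[1] * w[1][1]]
    return out

result = esn_scan([1, 2, 3], [1, 1], [[0.1, 0.0], [0.0, 0.1]], [0.5, 1.0])
# approximately [0.0, 1.5, 3.15]
```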
# basic ESN using updates ; moving backwards
def test_6():
W_in = theano.shared(numpy.array([1.,1.]), name='win')
W = theano.shared(numpy.array([[.1,0.],[.0,.1]]),name='w')
W_out= theano.shared(numpy.array([.5,1.]), name='wout')
u = theano.tensor.dvector('u')
x = theano.shared(numpy.array([0.,0.]),'x')
y0 = theano.tensor.dvector('y0')
def f_ESN(u_t):
return ( theano.dot(x,W_out), \
{ x: W_in*u_t + theano.dot(x,W) } )
Y = theano.sandbox.scan.scan(f_ESN,u,y0,[],outputs_taps={0:[]}, \
go_backwards = True)
f6 = theano.function([u,y0],Y)
    assert( numpy.allclose(f6( numpy.array([1,2,3]), numpy.array([0])), \
            numpy.array([0., 4.5, 3.45])) )
'''
TO TEST:
- test taps (for sequences and outputs )
- test gradient (one output)
- test gradient (multiple outputs)
   - test gradient (go_backwards)
- test gradient (multiple outputs / some uncomputable )
- test gradient (truncate_gradient)
- test gradient (force_gradient)
- test inplace map
'''
if __name__=='__main__':
test_1()
test_2()
test_3()
test_4()
test_5()
test_6()
...@@ -62,17 +62,6 @@ def scan(fn, sequences, initial_states, non_sequences, inplace_map={},
    # compute number of sequences and number of seqs
    n_seqs = len(seqs)
# see if there are outputs that do not feed anything back to the function
# applied recursively
#outs_tapkeys = outputs_taps.keys()
#outs_tapkeys.sort()
#for k in outs_tapkeys:
# if outputs_taps[k] == []:
# # add empty lists where you have outputs that do not have past
# # values
# init_outs = init_outs[:k] + [[]] + init_outs[k:]
    n_outs = len(init_outs)
...@@ -185,7 +174,8 @@ class Scan(theano.Op):
        self.destroy_map = {}
        if inplace:
            for i in inplace_map.keys():
self.destroy_map.update({i: [inplace_map[i]] } )
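The loop above only wraps each in-place target in a list, because Theano's destroy_map maps an output index to a *list* of input indices. The transformation, in isolation, with a hypothetical inplace_map:

```python
# inplace_map: one input index per output; destroy_map wants lists of them.
inplace_map = {0: 0, 2: 1}
destroy_map = {}
for i in inplace_map.keys():
    destroy_map.update({i: [inplace_map[i]]})

print(destroy_map)  # {0: [0], 2: [1]}
```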
        self.seqs_taps = seqs_taps
        self.outs_taps = outs_taps
...@@ -205,11 +195,23 @@ class Scan(theano.Op):
                updates = updates, mode = mode)
        g_y = [outputs[0].type()]
def compute_gradient(y, g_y):
gmap = theano.gradient.grad_sources_inputs( \
[(y,g_y)], theano.gof.graph.inputs([y]), False)
def zero(p):
return theano.tensor.TensorConstant(theano.tensor.TensorType(\
dtype=p.type.dtype, broadcastable=[]),
numpy.asarray(0,dtype = p.type.dtype))
return [gmap.get(p, zero(p)) for p in inputs]
g_args = compute_gradient( outputs[0], g_y[-1])
        # for all outputs compute gradients and then sum them up
        for y in outputs[1:]:
            g_y += [y.type()]
            g_args_y = compute_gradient(y, g_y[-1])
            for i in xrange(len(g_args)):
                g_args[i] += g_args_y[i]
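Reduced to plain numbers, the accumulation above does the following: each output contributes one gradient per input, and contributions are summed input-wise. A standalone sketch of that pattern:

```python
def sum_gradients(per_output_grads):
    # per_output_grads: one list of input-gradients per output,
    # all lists the same length; returns their elementwise sum.
    total = list(per_output_grads[0])
    for grads in per_output_grads[1:]:
        for i in range(len(total)):
            total[i] += grads[i]
    return total

print(sum_gradients([[1.0, 2.0], [0.5, -1.0]]))  # [1.5, 1.0]
```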
...@@ -256,6 +258,7 @@ class Scan(theano.Op):
                (self.n_args == other.n_args)
        return rval

    def __hash__(self):
        return hash(type(self)) ^ \
               hash(self.n_seqs) ^ \
...
...@@ -41,7 +41,7 @@ def flip(kern, kshp):
global_rng = N.random.RandomState(3423489)
dmatrix4=T.TensorType('float64', (False, False, False, False))

def exec_multilayer_conv_nnet(conv_mode, ss, bsize, imshp, kshps, nkerns, unroll_batch=0, unroll_kern=0, img=T.dmatrix(), validate=True, conv_op_py=False, do_convolve2=False, do_print=True, repeat=1, unroll_patch=0):
    # build actual input images
    imgval = global_rng.rand(bsize, imshp[0], imshp[1], imshp[2])
...@@ -121,7 +121,7 @@ def exec_multilayer_conv_nnet(conv_mode, ss, bsize, imshp, kshps, nkerns, unroll
    hidval1=outval.copy()

    # ConvOp
    conv_op = ConvOp(imshp, kshp, nkern, bsize, ss[0],ss[1], conv_mode, unroll_batch=unroll_batch, unroll_kern=unroll_kern, unroll_patch=unroll_patch)(inputs4, kerns4)
    l1shp=N.hstack((nkern,
                    getFilterOutShp(imshp, kshp, ss, conv_mode)))
    propup2 = function([inputs4, kerns4], conv_op)
...@@ -328,7 +328,7 @@ class TestConvOp(unittest.TestCase):
        ssizess = [[(1,1),(1,2)],[(1,1),(2,2)]]
        convmodes = ['valid','full']
        do_convolve2=True
        unroll = [(0,0,False),(0,0,True),(1,1,False),(2,2,False),(3,2,False)]#(batch,kern,patch)
        do_speed_test = False

        # TODO: this version shows a bug that was fixed
...@@ -338,6 +338,11 @@ class TestConvOp(unittest.TestCase):
#        nkerns = [2,2] # per output pixel
#        ssizes = [(1,1),(2,2)]#2,2)]
# bsizes = [1,1] # batch size
# imshp_starts = [(1,10,10),(1,5,6)]
# kshpss = ([[2,3],[3,2]],[[2,2],[2,2]])
# nkernss = [[1,1],[1,1]] # per output pixel
        N.set_printoptions(threshold=N.nan)

        # symbolic stuff
...@@ -356,8 +361,8 @@ class TestConvOp(unittest.TestCase):
        unroll_batch = [1,2,4,5,10,20]
        unroll_kern = [1,2,4,5,10,20]
        unroll_batch = [1,4,5]
        unroll_kern = [1,4,5]
        bsize = 20 # batch size
        imshp_start = (1,48,48) # non-square shape to test more corner cases
...@@ -374,9 +379,17 @@ class TestConvOp(unittest.TestCase):
        timing = N.zeros((len(unroll_batch),len(unroll_kern),3))
        t_b_k=[]

        #calculate the timing with unrolling
t_=[[ 7.60572791, 3.95069814, 3.74271464], [ 4.05631089, 2.90384555, 2.93613672], [ 3.90551591, 2.92595196, 3.00102282]]
best=[]
worst=[]
best=[0.52690219879150391, 2.4266397953033447]
worst=[0.92042708396911621, 6.8822150230407715]
t_=[]
        for unroll_b, n_b in zip(unroll_batch,range(len(unroll_batch))):
            for unroll_k, n_k in zip(unroll_kern,range(len(unroll_kern))):
                t_b_k.append(str(unroll_b)+"/"+str(unroll_k))
                if not t_:
                    tctot, tpytot, ntot=[],[],[]
                    for conv_mode, n_mode in zip(convmodes,range(len(convmodes))):
                        for ss, n_ss in zip(ssizes,range(len(ssizes))):
...@@ -384,36 +397,68 @@ class TestConvOp(unittest.TestCase):
                            tctot+=[tctot_]
                            tpytot+=[tpytot_]
                            ntot+=[ntot_]
if unroll_b==4 and unroll_k==4:
print "unroll 4/4",tctot
best=tctot
if unroll_b==1 and unroll_k==1:
print "unroll 1/1",tctot
worst=tctot
                timing[n_b,n_k]=[sum(tctot), sum(tpytot), sum(ntot)]
if not t_:
t=timing[:,:,0]#We select only the c timing.
else:
t=t_
t=N.asarray(t)
        #calculate the old timing
        tctot_=[0.52555489540100098, 6.6634182929992676]
#        tctot_=[]
        tctot,tpytot,ntot=[],[],[]
        if not tctot_:
            for conv_mode, n_mode in zip(convmodes,range(len(convmodes))):
                for ss, n_ss in zip(ssizes,range(len(ssizes))):
                    tctot_, tpytot_, ntot_ = exec_multilayer_conv_nnet(conv_mode, ss, bsize, imshp_start, kshps, nkerns, unroll_batch=0, unroll_kern=0, validate=validate)
                    tctot+=[tctot_]
                    tpytot+=[tpytot_]
                    ntot+=[ntot_]
        else: tctot=N.asarray(tctot_)
print "old code timing %.3fs"%sum(tctot),tctot
        best=N.asarray(best)
        worst=N.asarray(worst)
        print "timing for unrolled version"
        print t_b_k
        print t
        print "max %.3fs"%t.max(), "max param(batch unloop size/kernel unloop size)", t_b_k[t.argmax()]
        print "min %.3fs"%t.min(), "min param(batch unloop size/kernel unloop size)", t_b_k[t.argmin()]
        print "speedup vs (1/1)%.3fx, vs old %.3fx"% (t.max()/t.min(),sum(tctot)/t.min())
print worst/best,tctot/best
tctot_patch = []
for conv_mode, n_mode in zip(convmodes,range(len(convmodes))):
for ss, n_ss in zip(ssizes,range(len(ssizes))):
tctot_, tpytot_, ntot_ = exec_multilayer_conv_nnet(conv_mode, ss, bsize, imshp_start, kshps, nkerns, unroll_batch=0, unroll_kern=0, validate=validate,unroll_patch=2)
tctot_patch += [tctot_]
t_patch=sum(tctot_patch)
print "unroll_patch time", tctot_patch
print "speedup vs (1/1)%.3fx, vs old %.3fx"% (t.max()/t_patch,sum(tctot)/t_patch)
print best/tctot_patch, worst/tctot_patch
print best
print worst
print tctot
print tctot_patch
        return

        for i in range(len(kshpss)):
            for conv_mode, n_mode in zip(convmodes,range(len(convmodes))):
                for ss, n_ss in zip(ssizess[i],range(len(ssizess[i]))):
                    for un_b, un_k, un_p in unroll:
                        tctot_, tpytot_, ntot_ = exec_multilayer_conv_nnet(
                            conv_mode, ss, bsizes[i], imshp_starts[i],
                            kshpss[i], nkernss[i],
                            img=img, unroll_batch=un_b, unroll_kern=un_k,
                            unroll_patch=un_p,
                            validate=True)
                        tctot+=[tctot_]
                        tpytot+=[tpytot_]
...@@ -428,6 +473,11 @@ class TestConvOp(unittest.TestCase):
        d=N.asarray(ntot)/tpytot
        print 'speed up py theano(ConvOp) vs convolve2d: %.3fx'%d.mean(),d
    def init_data(self,shape):
        # deterministic data for reproducible runs; swap with the
        # commented line below to use random data instead
        return N.ones(shape)
        #return N.random.random(shape)
    def test_ConvOpGrad(self):
        """
        test the gradient in float and double
...@@ -442,7 +492,7 @@ class TestConvOp(unittest.TestCase):
        kshps = [(2,3)]
        imshps = [(2,3,4)]
        modes = ['valid', 'full']
        unroll = [(0,0,True),(1,1,False),(2,3,False),(1,1,False),(0,0,False)]#(batch,kern,patch)
        ssizes = [(1,1),(2,2)]

        for typ in types:
...@@ -457,12 +507,12 @@ class TestConvOp(unittest.TestCase):
                imgvals = N.array(N.random.random(N.hstack((bsize,imshp))),dtype=imgs.dtype)
                for kshp in kshps:
                    t=numpy.array([imshp[1]-kshp[0],imshp[2]-kshp[1]])
                    kernvals = N.array(self.init_data((nkern,visdim,kshp[0],
                                                       kshp[1])),dtype=kerns.dtype)
                    # 'full' mode should support kernels bigger than the input
                    if mode == 'valid' and (t<0).any():
                        continue
                    for un_b, un_k, un_p in unroll:
                        for ss in ssizes:
                            print 'test_ConvOpGrad'
                            print 'mode type:', mode, typ
...@@ -476,14 +526,14 @@ class TestConvOp(unittest.TestCase):
                            def test_i(imgs):
                                convop = ConvOp(imshp, kshp, nkern, bsize, ss[0], ss[1],
                                                output_mode=mode, unroll_batch=un_b, unroll_kern=un_k, unroll_patch=un_p)
                                return convop(imgs, kernvals)

                            def test_k(kerns):
                                convop = ConvOp(imshp, kshp, nkern, bsize, ss[0], ss[1],
                                                output_mode=mode, unroll_batch=un_b, unroll_kern=un_k, unroll_patch=un_p)
                                return convop(imgvals, kerns)

                            print mode, imshp, kshp, un_b, un_k, ss
                            #TODO the tolerance needed to pass is very high for float32(0.17). Is this acceptable? Expected?
                            tol = None
                            if typ=="float32":
...
from scan import Scan
import unittest
import theano
import theano.sandbox.scan
import random
import numpy.random
...@@ -74,6 +75,14 @@ def verify_grad(op, pt, n_tests=2, rng=None, eps = None, tol = None,
def compareArrays(a,b):
if type(a) in (list,tuple):
a = numpy.array(a)
if type(b) in (list, tuple):
b = numpy.array(b)
return numpy.all( abs(a-b) < 1e-5)
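compareArrays above is essentially an absolute-tolerance comparison (numpy.allclose covers similar ground). A standalone sketch of the same helper, outside the test class:

```python
import numpy

def compare_arrays(a, b, tol=1e-5):
    # coerce lists/tuples to arrays, then compare with an absolute tolerance
    return numpy.all(numpy.abs(numpy.asarray(a) - numpy.asarray(b)) < tol)

print(compare_arrays([1.0, 2.0], (1.0, 2.0 + 1e-7)))  # True
print(compare_arrays([1.0, 2.0], [1.0, 2.1]))         # False
```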
...@@ -82,10 +91,9 @@ class T_Scan(unittest.TestCase):
        utt.seed_rng()

    # generator network, only one output , type scalar ; no sequence or
    # non sequence arguments
    def test_1(self):
        def f_pow2(x_tm1):
            return (2*x_tm1, {})
...@@ -94,11 +102,12 @@ class T_Scan(unittest.TestCase):
        Y = theano.sandbox.scan.scan(f_pow2, [],s, [],n_steps = n_steps)
        f1 = theano.function([s,n_steps], Y)
        assert(compareArrays(f1([1],3), [2,4,8]))
    # simple rnn, one input, one state, weights for each; input/state are
    # vectors, weights are scalars
    def test_2(self):
        def f_rnn(u_t,x_tm1,W_in, W):
            return (u_t*W_in+x_tm1*W, {})
...@@ -110,13 +119,14 @@ class T_Scan(unittest.TestCase):
        Y = theano.sandbox.scan.scan(f_rnn, u,x0,[W_in,W])
        f2 = theano.function([u,x0,W_in,W], Y)
        v_u = numpy.array([1.,2.,3.,4.])
        v_x0 = numpy.array([1])
        v_out = numpy.array([1.1,1.3,1.6,2.])
        assert(compareArrays( f2(v_u,v_x0,.1,1), v_out ) )
    # simple rnn, one input, one state, weights for each; input/state are
    # vectors, weights are scalars; using shared variables
    def test_3(self):
        u = theano.tensor.dvector()
        x0 = theano.tensor.dvector()
...@@ -129,13 +139,15 @@ class T_Scan(unittest.TestCase):
        Y = theano.sandbox.scan.scan(f_rnn_shared, u,x0,[])
        f3 = theano.function([u,x0], Y)
        v_u = numpy.array([1.,2.,3.,4.])
        v_x0 = numpy.array([1.])
        v_out = numpy.array([1.1,1.3,1.6,2.])
        assert(compareArrays(f3(v_u,v_x0),v_out))
    # some rnn with multiple outputs and multiple inputs; other dimension
    # instead of scalars/vectors
    def test_4(self):
        W_in2 = theano.shared(numpy.array([1.,2.]), name='win2')
        W = theano.shared(numpy.array([[2.,1.],[1.,1.]]), name='w')
...@@ -153,19 +165,21 @@ class T_Scan(unittest.TestCase):
        Y = theano.sandbox.scan.scan(f_rnn_cmpl,[u1,u2],[x0,y0],W_in1)
        f4 = theano.function([u1,u2,x0,y0,W_in1], Y)

        v_u1 = numpy.array([[1.,2.],[1.,2.],[1.,2.]])
        v_u2 = numpy.array([1.,2.,3.])
        v_x0 = numpy.array([[0.,0.]])
        v_y0 = numpy.array([1])
        v_Win1 = numpy.array([[1.,1.],[1.,1.]])
        v_x = numpy.array([[4.,5.],[18.,16.],[58.,43.]])
        v_y = numpy.array([0.,7.,25.])
        (x,y) = f4( v_u1, v_u2, v_x0, v_y0, v_Win1)
        assert( compareArrays(x,v_x))
        assert( compareArrays(y,v_y))
    # basic ESN using updates
    def test_5(self):
        W_in = theano.shared(numpy.array([1.,1.]), name='win')
        W = theano.shared(numpy.array([[.1,0.],[.0,.1]]),name='w')
        W_out= theano.shared(numpy.array([.5,1.]), name='wout')
...@@ -181,11 +195,14 @@ class T_Scan(unittest.TestCase):
        Y = theano.sandbox.scan.scan(f_ESN,u,y0,[],outputs_taps={0:[]})
        f5 = theano.function([u,y0],Y)
        v_u = numpy.array([1.,2.,3.])
        v_y0 = numpy.array([0.])
        v_out = numpy.array([0.,1.5,3.15])
        out = f5( v_u, v_y0 )
        assert( compareArrays(v_out, out))
    # basic ESN using updates ; moving backwards
    def test_6(self):
        W_in = theano.shared(numpy.array([1.,1.]), name='win')
        W = theano.shared(numpy.array([[.1,0.],[.0,.1]]),name='w')
        W_out= theano.shared(numpy.array([.5,1.]), name='wout')
...@@ -202,20 +219,100 @@ class T_Scan(unittest.TestCase):
                                        go_backwards = True)
        f6 = theano.function([u,y0],Y)
        v_u = numpy.array([1.,2.,3.])
        v_y0 = numpy.array([0])
        v_out = numpy.array([0.,4.5,3.45])
        out = f6(v_u, v_y0)
        assert( compareArrays(out, v_out))
# simple rnn, one input, one state, weights for each; input/state are
# vectors, weights are scalars; using shared variables and past
# taps (sequences and outputs)
def test_7(self):
u = theano.tensor.dvector()
x0 = theano.tensor.dvector()
W_in = theano.shared(.1, name = 'w_in')
W = theano.shared(1., name ='w')
def f_rnn_shared(u_tm2, x_tm1, x_tm2):
return (u_tm2*W_in+x_tm1*W+x_tm2, {})
Y = theano.sandbox.scan.scan(f_rnn_shared, u,x0, [], \
sequences_taps = {0:[-2]}, outputs_taps = {0:[-1,-2]})
f7 = theano.function([u,x0], Y)
v_u = numpy.asarray([1.,2.,3.,4.])
v_x0 = numpy.asarray([1.,2.])
out = numpy.asarray([3.1,5.3])
assert (compareArrays( out, f7(v_u, v_x0)))
# simple rnn, one input, one state, weights for each; input/state are
# vectors, weights are scalars; using shared variables and past
# taps (sequences and outputs) and future taps for sequences
def test_8(self):
u = theano.tensor.dvector()
x0 = theano.tensor.dvector()
W_in = theano.shared(.1, name = 'w_in')
W = theano.shared(1., name ='w')
def f_rnn_shared(u_tm2,u_tp2, x_tm1, x_tm2):
return ((u_tm2+u_tp2)*W_in+x_tm1*W+x_tm2, {})
Y = theano.sandbox.scan.scan(f_rnn_shared, u,x0, [], \
sequences_taps = {0:[-2,2]}, outputs_taps = {0:[-1,-2]})
f8 = theano.function([u,x0], Y)
v_u = numpy.array([1.,2.,3.,4.,5.,6.])
v_x0 = numpy.array([1.,2.])
out = numpy.array([3.6, 6.4])
assert (compareArrays( out, f8(v_u, v_x0) ) )
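The expected values in test_8 can be reproduced by hand: with sequence taps {-2, +2}, only steps where both u[t-2] and u[t+2] exist are run, so a length-6 input yields two steps. A plain-Python sketch assuming those tap semantics (no Theano involved):

```python
def taps_scan(u, x0, w_in=0.1, w=1.0):
    # x_t = (u[t-2] + u[t+2]) * w_in + x[t-1] * w + x[t-2]
    x = list(x0)
    n_steps = len(u) - 4  # two steps lost at each end of u
    for t in range(n_steps):
        x.append((u[t] + u[t + 4]) * w_in + x[-1] * w + x[-2])
    return x[len(x0):]

result = taps_scan([1., 2., 3., 4., 5., 6.], [1., 2.])
# approximately [3.6, 6.4]
```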
'''
    NOTE : BROKEN .. inplace does not work due to a stochastic optimization
    TODO : talk to James
# simple rnn ; compute inplace
def test_9(self):
u = theano.tensor.dvector()
mu = theano.Param( u, mutable = True)
x0 = theano.tensor.dvector()
W_in = theano.shared(.1)
W = theano.shared(1.)
def f_rnn_shared(u_t, x_tm1):
return (u_t*W_in + x_tm1*W, {})
Y = theano.sandbox.scan.scan(f_rnn_shared, u, x0,[], \
inplace_map={0:0} )
f9 = theano.function([mu,x0], Y , #mode = 'FAST_RUN')
mode = 'DEBUG_MODE')
v_u = numpy.array([1.,2.,3.])
v_x0 = numpy.array([1.])
out = f9(v_u, v_x0)
v_out = numpy.array([1.1,1.3,1.6])
assert (compareArrays(out, v_out))
print v_u
assert (compareArrays(v_u, out))
'''
# test gradient simple network
def test_10(self):
pass
    '''
     TO TEST:
        - test gradient (one output)
        - test gradient (multiple outputs)
        - test gradient (go_backwards)
        - test gradient (multiple outputs / some uncomputable )
        - test gradient (truncate_gradient)
        - test gradient (force_gradient)
        - test gradient (taps past/future)
    '''
...
...@@ -1020,13 +1020,18 @@ def local_advanced_indexing_crossentropy_onehot_grad(node):
    #   / softmax(x)
    # which arises from the gradient of log(softmax(x))[arange(y.shape[0]), y]
    #
    # In some cases, in case 2., instead of "-1. like (AdvancedSubtensor...)",
    # we can have "-1. like ([-1] * AdvancedSubtensor...)". This case will be
    # recognized too, but other variants, even with the same shape, might not
    # (yet).
# The base cases are realized when the gradient of the
# cost wrt the output is equal to 1. When this gradient
# has another (scalar) value, it typically appears in the
# second argument of AdvancedIncSubtensor. In that case, we
# try to extract it, and feed it as the output gradient of
# crossentropy_softmax_1hot_with_bias_dx.
    #
    # N.B. Regarding clients -- This substitution is important for numerical stability, so we
    # perform the substitution even when intermediate values have multiple clients.
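What crossentropy_softmax_1hot_with_bias_dx ultimately computes is the well-known closed form dx = g * (softmax(x) - onehot(y)); the scalar g is what this optimization extracts as the output-gradient factor. A numeric sketch of that formula in plain Python (not the Theano op itself):

```python
import math

def softmax(z):
    # numerically stable softmax of a list of floats
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def ce_softmax_dx(out_grad, x, y):
    # gradient of out_grad * (-log softmax(x)[y]) with respect to x:
    # out_grad * (softmax(x) - onehot(y))
    p = softmax(x)
    return [out_grad * (p_i - (1.0 if i == y else 0.0))
            for i, p_i in enumerate(p)]

g = ce_softmax_dx(1.0, [0.0, 1.0, 2.0], 1)
# entries sum to ~0, and only the target index is negative
```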
...@@ -1052,43 +1057,60 @@ def local_advanced_indexing_crossentropy_onehot_grad(node):
        else:
            return

        # In the base case (output gradient = 1), incr is -1./sm[arange(len(y)), y]
        # Here, we are looking for the AdvancedSubtensor term (sm[arange(len(y)), y]);
        # the remainder of the expression will be used to compute outgrad_factor.
        # outgrad_factor is constructed in 3 steps, as follows:
# outgrad_factor = +/- 1 (initial sign)
# outgrad_factor *= numerator
# outgrad_factor /= denominator
adv_subtensor = None
outgrad_factor = 1.
# If there's a 'minus' sign before the whole expression, put it in
# outgrad_factor and iterate
if incr.owner and incr.owner.op == tensor.neg:
outgrad_factor = -1.
incr = incr.owner.inputs[0]
        if incr.owner and incr.owner.op == tensor.true_div:
            num, denom = incr.owner.inputs

            # set outgrad_factor according to the numerator,
# it may be divided later
if hasattr(num, 'data') and numpy.all(num.data == -1):
# Base case, num is -1
outgrad_factor *= 1.
elif numpy.all(num.broadcastable):
# Otherwise, it should be a scalar
outgrad_factor *= -num
else:
                return

            if not denom.owner:
                return

            if isinstance(denom.owner.op, tensor.AdvancedSubtensor):
# Base case
                adv_subtensor = denom
                outgrad_factor /= 1.
            elif denom.owner.op == tensor.mul:
                # Try to find the AdvancedSubtensor node mentioned above,
                # and a scalar that is equal to the output gradient
                for i, input in enumerate(denom.owner.inputs):
                    if input.owner and isinstance(input.owner.op, tensor.AdvancedSubtensor):
                        other_inputs = [in_ for (j, in_) in enumerate(denom.owner.inputs) if j!=i]
                        if len(other_inputs) == 1:
                            rest = other_inputs[0]
                        else:
                            rest = tensor.mul(*other_inputs)

                        # Check that rest is a scalar
                        if numpy.all(rest.broadcastable):
                            adv_subtensor = input
                            outgrad_factor /= rest
                            break
else:
# That subtensor was not right
adv_subtensor = None
            else:
                return
...@@ -1103,6 +1125,8 @@ def local_advanced_indexing_crossentropy_onehot_grad(node):
            #else: OK
        else:
            return
else:
return
    # Check that rows is arange(labels.shape[0])
    if not _check_rows_is_arange_len_labels(rows, labels):
...@@ -1147,7 +1171,7 @@ def local_advanced_indexing_crossentropy_onehot_grad(node):
    if incr.owner and incr.owner.op == tensor.fill:
        model, value = incr.owner.inputs
        adv_subtensor = None
        outgrad_factor = None
        if model.owner and isinstance(model.owner.op, tensor.AdvancedSubtensor):
            adv_subtensor = model
        else:
...@@ -1169,17 +1193,16 @@ def local_advanced_indexing_crossentropy_onehot_grad(node):
        if not (maybe_log_sm is log_sm and maybe_rows is rows and maybe_labels is labels):
            return
        #else: OK
    else:
        return

    # In the base case, value is the constant '-1'
    if hasattr(value, 'data') and numpy.all(value.data == -1):
        outgrad_factor = 1.
    # Otherwise, it should be a scalar, and the output gradient
    # would be -value
    elif numpy.all(value.broadcastable):
        outgrad_factor = -value
    else:
        return
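Both this hunk and the one rewriting the denominator decide "is this a scalar?" via `numpy.all(v.broadcastable)`: a Theano variable behaves as a scalar exactly when every dimension is broadcastable, and the 0-d case yields an empty pattern, which `numpy.all` treats as true. A plain-NumPy check of that convention (the tuples below stand in for hypothetical `broadcastable` patterns):

```python
import numpy

# Hypothetical broadcastable patterns, as they would appear on
# Theano variables of various shapes.
scalar_0d = ()              # 0-d tensor: empty pattern
one_by_one = (True, True)   # 1x1 matrix, broadcastable in both dims
row_vector = (False, True)  # first dim is not broadcastable

assert numpy.all(scalar_0d)        # vacuously true: acts as a scalar
assert numpy.all(one_by_one)       # also scalar-like
assert not numpy.all(row_vector)   # not a scalar; the rewrite bails out
```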
...@@ -1204,11 +1227,10 @@ def local_advanced_indexing_crossentropy_onehot_grad(node):
    # Dimension check before substitution
    if labels.ndim == 1 and x_var.ndim == 2:
        if outgrad_factor is not None:
            out_grad = tensor.fill(x_var[:,0], outgrad_factor)
            return [crossentropy_softmax_1hot_with_bias_dx(out_grad, sm, labels)]
        else:
            return
    else:
        return
...
...@@ -346,7 +346,7 @@ def local_IncSubtensor_serialize(node):
    #
    # add(x, incsubtensor(b, c), incsubtensor(b, d))
    # -> incsubtensor(incsubtensor(add(x,b,b), c), d)
    """
    def movable(i):
...@@ -354,7 +354,8 @@ def local_IncSubtensor_serialize(node):
        return i.owner \
                and isinstance(i.owner.op, T.IncSubtensor) \
                and i.type == o_type \
                and len(i.clients) == 1 \
                and not i.owner.op.set_instead_of_inc

    if node.op == T.add:
        o_type = node.outputs[0].type
...@@ -383,7 +384,8 @@ def local_IncSubtensor_serialize(node):
@gof.local_optimizer([None])
def local_inplace_setsubtensor(node):
    if isinstance(node.op, T.IncSubtensor) and not node.op.inplace:
        new_op = T.IncSubtensor(node.op.idx_list, inplace=True,
                set_instead_of_inc=node.op.set_instead_of_inc)
        new_node = new_op(*node.inputs)
        return [new_node]
    return False
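The rewrite rule in the `local_IncSubtensor_serialize` docstring can be sanity-checked numerically. Below, `inc` is a stand-in for `incsubtensor` (it increments one element of a copy); the function name, index choice, and values are illustrative only:

```python
import numpy as np

def inc(t, v):
    # Stand-in for incsubtensor: increment element 0 of a copy of t.
    out = t.copy()
    out[0] += v
    return out

x = np.array([1.0, 2.0])
b = np.array([10.0, 20.0])
c, d = 3.0, 4.0

# add(x, incsubtensor(b, c), incsubtensor(b, d))
lhs = x + inc(b, c) + inc(b, d)
# -> incsubtensor(incsubtensor(add(x, b, b), c), d)
rhs = inc(inc(x + b + b, c), d)
assert np.allclose(lhs, rhs)
```

Note that `b` appears twice in the serialized `add`, which is exactly why the docstring was corrected from `add(x,b)` to `add(x,b,b)`.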
...@@ -932,8 +934,11 @@ def local_neg_neg(node):
@register_specialize
@gof.local_optimizer([T.neg])
def local_neg_div_neg(node):
    """- (-a / b) -> a / b
    Also performs - (c / b) -> ((-c) / b) when c is a scalar constant.
    """
    if node.op == T.neg:
        if node.inputs[0].owner and node.inputs[0].owner.op == T.true_div:
            frac = node.inputs[0]
            num, denom = frac.owner.inputs
...@@ -942,6 +947,11 @@ def local_neg_div_neg(node):
                # No other clients of the original division
                new_num = num.owner.inputs[0]
                return [T.true_div(new_num, denom)]
            elif numpy.all(num.broadcastable) and isinstance(num, gof.Constant):
                if len(frac.clients) == 1:
                    new_num = -num.data
                    return [T.true_div(new_num, denom)]

@gof.local_optimizer([T.mul])
def local_mul_zero(node):
...
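The two rewrites documented in `local_neg_div_neg` are plain algebraic identities; a quick NumPy check (not Theano code, values chosen arbitrarily):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
c = -2.0  # scalar constant, as the second rewrite requires

# - (-a / b) -> a / b
assert np.allclose(-(-a / b), a / b)
# - (c / b) -> ((-c) / b)
assert np.allclose(-(c / b), (-c) / b)
```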
...@@ -265,7 +265,9 @@ def permutation_helper(random_state, n, shape):
    """
    # n should be a 0-dimension array
    assert n.shape == ()
    # Note that it is important to convert `n` into an integer, because if it
    # is a long, the numpy permutation function will crash on Windows.
    n = int(n.item())
    out_shape = list(shape)
    out_shape.append(n)
...
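The `int(...)` cast above matters because, on Python 2, `ndarray.item()` could hand back a `long` for some integer dtypes, which (per the comment) NumPy's permutation rejected on Windows. A sketch of the conversion path (the seed is arbitrary):

```python
import numpy as np

n = np.asarray(5)        # 0-d array, as permutation_helper receives it
assert n.shape == ()     # same precondition as the assert in the code
n_int = int(n.item())    # plain Python int, safe to pass everywhere
assert isinstance(n_int, int)

perm = np.random.RandomState(0).permutation(n_int)
assert sorted(perm.tolist()) == [0, 1, 2, 3, 4]
```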
...@@ -35,7 +35,7 @@ class ScalarSharedVariable(SharedVariable, _tensor_py_operators):
@shared_constructor
def scalar_constructor(value, name=None, strict=False, dtype=None):
    """SharedVariable constructor for scalar values. Default: int64 or float64.

    :note: We implement this using 0-d tensors for now.
...@@ -50,12 +50,14 @@ def scalar_constructor(value, name=None, strict=False, dtype=None):
    else:
        dtype = type(value).__name__

    tensor_type = TensorType(dtype=dtype, broadcastable=[])
    try:
        # Do not pass the dtype to asarray because we want this to fail if
        # strict is True and the types do not match.
        rval = ScalarSharedVariable(type=tensor_type,
                value=numpy.asarray(value),
                name=name, strict=strict)
        return rval
    except:
        traceback.print_exc()
...
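The `type` → `tensor_type` rename is more than style: binding `type` locally shadows the builtin, so any later `type(...)` call in the same scope would break (note the function itself calls `type(value).__name__` a few lines earlier). A minimal reproduction of the hazard (illustrative, not the patched function):

```python
def shadowed_builtin():
    type = "not a callable any more"   # same mistake as the old code
    return type(3).__name__            # raises TypeError

try:
    shadowed_builtin()
    raised = False
except TypeError:
    raised = True
assert raised
```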
...@@ -223,93 +223,13 @@ class T_CrossentropyCategorical1Hot(unittest.TestCase):
        assert not has_softmax
        assert not has_softmaxdx

    def test_get_rid_of_advanced_indexing_version_of_xent(self):
        verbose = 0
        # TODO: add the optimization in FAST_COMPILE?
        # In the mean time, run it as 'FAST_RUN' instead
        mode = theano.compile.mode.get_default_mode()
        if mode == 'FAST_COMPILE':
            mode = 'FAST_RUN'
        rng = numpy.random.RandomState(utt.fetch_seed())
...@@ -326,13 +246,15 @@ def test_get_rid_of_advanced_indexing_version_of_xent():
            print i, node
            # Last node should be the output
            print i, pprint(node.outputs[0])
            print

        ## Basic case
        expressions = [
                T.sum(-T.log(softmax(x)[T.arange(y.shape[0]), y])),
                -T.sum(T.log(softmax(x)[T.arange(y.shape[0]), y])),
                -T.sum(T.log(softmax(x))[T.arange(y.shape[0]), y]),
                T.sum(-T.log(softmax(x))[T.arange(y.shape[0]), y])
                ]

        for expr in expressions:
            # Verify the optimizer worked on the expressions
...@@ -401,6 +323,187 @@ def test_get_rid_of_advanced_indexing_version_of_xent():
            g(x_val, b_val, y_val)
    def test_scale_cost(self):
        # TODO: add the optimization in FAST_COMPILE?
        # In the mean time, run it as 'FAST_RUN' instead
        mode = theano.compile.mode.get_default_mode()
        if mode == 'FAST_COMPILE':
            mode = 'FAST_RUN'

        rng = numpy.random.RandomState(utt.fetch_seed())

        x_val = rng.randn(3,5)
        b_val = rng.randn(5)
        y_val = numpy.asarray([2,4,1])

        x = T.dmatrix('x')
        b = T.dvector('b')
        y = T.lvector('y')
        a = T.dscalar('a')

        def print_graph(func):
            for i, node in enumerate(func.maker.env.toposort()):
                print i, node
            # Last node should be the output
            print i, pprint(node.outputs[0])
            print

        def validate_fn_graph(func):
            # The graph of the function should not have softmax anymore
            has_cx1hot = False
            has_softmax = False
            for node in func.maker.env.toposort():
                if node.op == crossentropy_softmax_argmax_1hot_with_bias:
                    has_cx1hot = True
                if node.op == softmax:
                    has_softmax = True
            assert has_cx1hot
            assert not has_softmax

        def validate_grad_graph(func):
            # The graph of the gradient should not have softmaxgrad anymore
            has_cx1hotdx = False
            has_softmax = False
            has_softmaxdx = False
            for node in func.maker.env.toposort():
                if node.op == crossentropy_softmax_1hot_with_bias_dx:
                    has_cx1hotdx = True
                if node.op == softmax:
                    has_softmax = True
                if node.op == softmax_grad:
                    has_softmaxdx = True
            assert has_cx1hotdx
            assert has_softmax
            assert not has_softmaxdx

        ## Cases to test
        expressions = [
                a * T.sum(-T.log(softmax(x)[T.arange(y.shape[0]), y])),
                -a * T.sum(T.log(softmax(x)[T.arange(y.shape[0]), y])),
                a * (-T.sum(T.log(softmax(x)[T.arange(y.shape[0]), y]))),
                a * T.sum(T.log(softmax(x)[T.arange(y.shape[0]), y])),

                a * T.sum(-T.log(softmax(x))[T.arange(y.shape[0]), y]),
                -a * T.sum(T.log(softmax(x))[T.arange(y.shape[0]), y]),
                a * (-T.sum(T.log(softmax(x))[T.arange(y.shape[0]), y])),
                a * T.sum(T.log(softmax(x))[T.arange(y.shape[0]), y]),

                a * T.mean(-T.log(softmax(x)[T.arange(y.shape[0]), y])),
                -a * T.mean(T.log(softmax(x)[T.arange(y.shape[0]), y])),
                a * (-T.mean(T.log(softmax(x)[T.arange(y.shape[0]), y]))),
                a * T.mean(T.log(softmax(x)[T.arange(y.shape[0]), y])),

                a * T.mean(-T.log(softmax(x))[T.arange(y.shape[0]), y]),
                -a * T.mean(T.log(softmax(x))[T.arange(y.shape[0]), y]),
                a * (-T.mean(T.log(softmax(x))[T.arange(y.shape[0]), y])),
                a * T.mean(T.log(softmax(x))[T.arange(y.shape[0]), y]),
                ]

        for expr in expressions:
            # Verify the optimizer worked on the expressions
            f = theano.function([x,y,a], expr, mode=mode)
            assert 5 <= len(f.maker.env.toposort()) <= 10
            validate_fn_graph(f)
            f(x_val, y_val, 0.1)

            # Verify the gradient wrt x
            g = theano.function([x,y,a], T.grad(expr, x), mode=mode)
            assert 5 <= len(g.maker.env.toposort()) <= 12
            validate_grad_graph(g)
            g(x_val, y_val, 0.1)

            # Verify the gradient when providing output gradient
            h = theano.function([x,y,a], T.grad(expr, x, g_cost=a*x.sum()), mode=mode)
            assert 8 <= len(h.maker.env.toposort()) <= 17
            validate_grad_graph(h)
            h(x_val, y_val, 0.1)
def test_argmax_pushdown():
    x = tensor.dmatrix()

    env = gof.Env(
            [x],
            [tensor.max(softmax(tensor.exp(tensor.tanh(sigmoid(x)))))])

    theano.compile.mode.optdb.query(
            theano.compile.mode.OPT_FAST_RUN).optimize(env)

    #print 'AFTER'
    #for node in env.toposort():
    #    print node.op
    assert len(env.toposort()) == 2 # an output_guard is second
    assert env.toposort()[0].op == tensor._max_and_argmax

def test_argmax_pushdown_bias():
    x = tensor.dmatrix()
    b = tensor.dvector()

    env = gof.Env(
            [x,b],
            [tensor.max(softmax_with_bias(x, b))])

    theano.compile.mode.optdb.query(
            theano.compile.mode.OPT_FAST_RUN).optimize(env)

    print 'AFTER'
    for node in env.toposort():
        print node.op
    assert len(env.toposort()) == 4
    assert isinstance(env.toposort()[0].op, tensor.DimShuffle)
    assert isinstance(env.toposort()[1].op, tensor.Elemwise)
    assert isinstance(env.toposort()[2].op, tensor.MaxAndArgmax)
    assert str(env.toposort()[3].op) == 'OutputGuard'

def test_asymptotic_32():
    """
    This test makes sure that our functions behave sensibly when huge values are present
    """
    #TODO: consider adding the optimization of crossentropy into the current mode for the
    #      purpose of running this test
    for dtype in 'float32', 'float64':
        if dtype == 'float32':
            x = tensor.fmatrix()
            x2 = tensor.fvector()
        else:
            x = tensor.dmatrix()
            x2 = tensor.dvector()
        y = tensor.lvector()

        c = categorical_crossentropy(softmax(x+x2), y)
        f = theano.function([x,y,x2], [c.sum(), tensor.grad(c.sum(), x)], mode='FAST_RUN')
        if 0:
            for i, n in enumerate(f.maker.env.toposort()):
                print i, n

        xval = numpy.zeros((5, 5), dtype=dtype)
        x2val = numpy.zeros(5, dtype=xval.dtype)
        for i in xrange(100):
            cval, gxval = f(xval, numpy.arange(5), x2val)
            xval -= 100.3 * gxval
        #print cval, gxval
        assert cval == 0 # no problem going to zero error

        # what about when x gets really big?
        xval = numpy.zeros((5, 5), dtype=dtype)
        x2val = numpy.zeros(5, dtype=xval.dtype)
        for i in xrange(100):
            cval, gxval = f(xval, numpy.arange(5), x2val)
            xval += 100000.3 * gxval
        #print cval, gxval
        assert cval > 61750000
        assert gxval[0,0] == -1.0
        assert gxval[0,1] == 0.25
    # hint - call the argmax push-down optimization first too
...
...@@ -283,12 +283,12 @@ class T_RandomStreams(unittest.TestCase):
        assert numpy.all(fn_val1 == numpy_val1)

    def test_shuffle_row_elements(self):
        """Ensure RandomStreams.shuffle_row_elements generates right results"""
        # Check over two calls to see if the random state is correctly updated.

        # On matrices, for each row, the elements of that row should be
        # shuffled.
        # Note that this differs from numpy.random.shuffle, where all the
        # elements of the matrix are shuffled.
        mm = Module()
        mm.random = RandomStreams(234)
        m_input = tensor.dmatrix()
...
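For reference, the contrast the reworded comment draws can be sketched in plain NumPy: shuffling each row's elements independently, versus `numpy.random.shuffle` applied to a whole 2-d array, which permutes along the first axis (the seed and shapes are arbitrary):

```python
import numpy as np

rng = np.random.RandomState(234)
m = np.arange(12).reshape(3, 4)

# Per-row shuffle, as shuffle_row_elements does on matrices.
per_row = m.copy()
for row in per_row:
    rng.shuffle(row)          # shuffles this row's elements in place
for before, after in zip(m, per_row):
    # each row keeps the same elements, possibly reordered
    assert sorted(after.tolist()) == sorted(before.tolist())

# numpy.random.shuffle on the whole matrix reorders rows as units.
whole = m.copy()
rng.shuffle(whole)
assert sorted(whole.tolist()) == sorted(m.tolist())
```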