提交 45f07e93 authored 作者: James Bergstra's avatar James Bergstra

merge

......@@ -101,14 +101,13 @@ case if ``borrow`` was True, the thunk would be allowed to reuse (or
Compiled libraries are stored within a specific compilation directory,
which by default is set to ``$HOME/.theano/compiledir_xxx``, where
``xxx`` identifies the platform. It may be manually set to a different
location either by setting the ``THEANO_COMPILEDIR`` environment variable,
the ``THEANO_BASE_COMPILEDIR`` environment variable
or by calling ``theano.gof.compiledir.set_compiledir(..)`` within your
Python script.
location either by setting :attr:`config.compiledir` or
:attr:`config.base_compiledir`, either within your Python script or by
using one of the configuration mechanisms described in :mod:`config`.
The compile cache is based upon the C++ code of the graph to be compiled.
So, if you change compilation environment variables, such as
``THEANO_BLAS_LDFLAGS``, you will need to manually remove your compile cache,
So, if you change compilation configuration variables, such as
:attr:`config.blas.ldflags`, you will need to manually remove your compile cache,
using ``Theano/bin/theano-compiledir clear``
Theano also implements a lock mechanism that prevents
......
......@@ -260,21 +260,22 @@ Example:
>>> m = theano.Module()
>>> minstance = m.make(mode='DEBUG_MODE')
Whenever possible, unit tests should omit this parameter. Leaving-out
the mode will ensure that unit tests use the default mode (defined in
compile.mode.default_mode). This default_mode is set to the
THEANO_DEFAULT_MODE environment variable, if it is present. If not, it
defaults to 'FAST_RUN'.
This allows the user to easily switch the mode in which unittests are
Whenever possible, unit tests should omit this parameter. Leaving
out the mode will ensure that unit tests use the default mode
(defined in compile.mode.default_mode). This default_mode is set to
the configuration variable :attr:`config.mode`, which defaults to
'FAST_RUN', and can be set by various mechanisms (see :mod:`config`).
In particular, the enviromnment variable :envvar:`THEANO_FLAGS`
allows the user to easily switch the mode in which unittests are
run. For example to run all tests in all modes from a BASH script,
type this:
.. code-block:: bash
THEANO_DEFAULT_MODE=FAST_COMPILE nosetests
THEANO_DEFAULT_MODE=FAST_RUN nosetests
THEANO_DEFAULT_MODE=DEBUG_MODE nosetests
THEANO_FLAGS='mode=FAST_COMPILE' nosetests
THEANO_FLAGS='mode=FAST_RUN' nosetests
THEANO_FLAGS='mode=DEBUG_MODE' nosetests
Using Random Values in Test Cases
---------------------------------
......@@ -299,14 +300,12 @@ do the following:
The behaviour of seed_rng is as follows:
* If an explicit seed is given, it will be used for seending numpy's rng.
* If not, it will try to get a seed from the THEANO_UNITTEST_SEED variable.
* If an explicit seed is given, it will be used for seeding numpy's rng.
* If THEANO_UNITTEST_SEED is set to "random", it will seed the
rng. with None, which is equivalent to seeding with a random seed.
* If not, it will use ``config.unittest.rseed`` (its default value is 666).
* If THEANO_UNITTEST_SEED is not defined, it will use a default seed of 666.
* If config.unittest.rseed is set to "random", it will seed the rng with
None, which is equivalent to seeding with a random seed.
The main advantage of using unittest_tools.seed_rng is that it allows
......@@ -317,7 +316,8 @@ a higher confidence that the variables are correct), while still
making sure unittests are deterministic.
Users who prefer their unittests to be random (when run on their local
machine) can simply set THEANO_UNITTEST_SEED to 'random'.
machine) can simply set ``config.unittest.rseed`` to 'random' (see
:mod:`config`).
Similarly, to provide a seed to numpy.random.RandomState, simply use:
......
......@@ -40,7 +40,7 @@ Roughly in order of what you'll want to check out:
* :ref:`internal` -- How to maintaining Theano, LISA-specific tips, and more...
* `API <api/>`_ -- The automatically-generated API
You can download the latest `PDF documentation <http://deeplearning.net/theanodoc/theano.pdf>`_, rather than reading it online.
You can download the latest `PDF documentation <http://deeplearning.net/software/theano/theano.pdf>`_, rather than reading it online.
Community
=========
......
......@@ -339,9 +339,9 @@ Generating the documentation
----------------------------
You can read the latest HTML documentation `here
<http://deeplearning.net/theanodoc>`__.
<http://deeplearning.net/software/theano>`__.
You can download the latest PDF documentation `here
<http://deeplearning.net/theanodoc/theano.pdf>`__.
<http://deeplearning.net/software/theano/theano.pdf>`__.
We recommend you look at the documentation on the website, since it
will be more current than the documentation included with the package.
......
......@@ -8,8 +8,12 @@ LISA Labo specific instructions
Tips for running at LISA
------------------------
Use the fast BLAS library that Fred installed, by setting
`THEANO_BLAS_LDFLAGS=-lgoto`.
Shell configuration files ``/opt/lisa/os/.local.{bash,csh}rc`` should define
:envvar:`THEANORC` to include ``/opt/lisa/os/.local.theanorc`` as a
configuration file.
``/opt/lisa/os/.local.theanorc`` should include the right default values for
the lab, in particular, ``blas.ldflags`` should contain '-lgoto'.
Tips for running on a cluster
-----------------------------
......
......@@ -14,7 +14,8 @@ To run Theano on the Mammouth cluster, follow these simple steps:
Perhaps even put this in your ``.bashrc``
* ``set THEANO_BLAS_LDFLAGS='-lmkl -lguide -fopenmp'``
* set ``config.blas.ldflags`` to ``'-lmkl -lguide -fopenmp'``
(see :mod:`config` to know how)
Note: the -lguide flag works, however the fix should probably be considered temporary.
Intel has deprecated libguide.so in favor of the newer library libiomp5.so. However,
......
......@@ -145,7 +145,7 @@ Then it executes something like
.. code-block:: bash
THEANO_UNITTEST_SEED=<SEED> THEANO_DEFAULT_MODE=DEBUG_MODE /usr/bin/nosetests --with-coverage --cover-package=theano --cover-package=pylearn
THEANO_FLAGS='unittests.rseed=<SEED>,mode=DEBUG_MODE' /usr/bin/nosetests --with-coverage --cover-package=theano --cover-package=pylearn
in the updated ``theano`` directory.
The output is emailed automatically to one of the developers.
......
......@@ -130,7 +130,7 @@ Getting started
A PDF version of the online documentation may be found `here
<http://deeplearning.net/theanodoc/theano.pdf>`_.
<http://deeplearning.net/software/theano/theano.pdf>`_.
Contact us
......
......@@ -35,7 +35,7 @@ DebugMode can be used as follows:
f(0)
f(7)
It can also be used by setting an environment variable ``THEANO_DEFAULT_MODE=DEBUG_MODE``.
It can also be used by setting the configuration variable :attr:`config.mode`.
It can also be used by passing a DebugMode instance as the mode, as in
>>> f = theano.function([x], 10*x, mode=DebugMode(check_c_code=False))
......@@ -78,47 +78,47 @@ Reference
Each of these exceptions inherits from the more generic `DebugModeError`.
If there are no internal errors, this mode behaves like FAST_RUN or FAST_COMPILE, but takes
a little longer and uses more memory.
a little longer and uses more memory.
If there are internal errors, this mode will raise an `DebugModeError` exception.
.. attribute:: stability_patience = config.THEANO_DEBUGMODE_PATIENCE
.. attribute:: stability_patience = config.DebugMode.patience
When checking for the stability of optimization, recompile the graph this many times.
Default 10.
.. attribute:: check_c_code = config.THEANO_DEBUGMODE_CHECK_C
.. attribute:: check_c_code = config.DebugMode.check_c
Should we evaluate (and check) the `c_code` implementations?
``True`` -> yes, ``False`` -> no.
Default yes.
.. attribute:: check_py_code = config.THEANO_DEBUGMODE_CHECK_PY
.. attribute:: check_py_code = config.DebugMode.check_py
Should we evaluate (and check) the `perform` implementations?
``True`` -> yes, ``False`` -> no.
Default yes.
.. attribute:: check_isfinite = config.THEANO_DEBUGMODE_CHECK_FINITE
.. attribute:: check_isfinite = config.DebugMode.check_finite
Should we check for (and complain about) ``NaN``/``Inf`` ndarray elements?
``True`` -> yes, ``False`` -> no.
Default yes.
.. attribute:: require_matching_strides = config.THEANO_DEBUGMODE_CHECK_STRIDES
.. attribute:: require_matching_strides = config.DebugMode.check_strides
Check for (and complain about) Ops whose python and C
outputs are ndarrays with different strides. (This can catch bugs, but
is generally overly strict.)
is generally overly strict.)
0 -> no check, 1 -> warn, 2 -> err.
Default warn.
.. method:: __init__(self, optimizer='fast_run', stability_patience=None, check_c_code=None, check_py_code=None, check_isfinite=None, require_matching_strides=None, linker=None)
......@@ -128,7 +128,7 @@ Reference
If any of these arguments (except optimizer) is not None, it overrides the class default.
The linker arguments is not used. It is set their to allow Mode.requiring() and some other fct to work with DebugMode too.
The keyword version of DebugMode (which you get by using ``mode='DEBUG_MODE``)
is quite strict, and can raise several different Exception types.
......
......@@ -134,7 +134,7 @@ Reference
about how output variables should be returned.
The default is typically 'FAST_RUN' but this can be changed in
:doc:`theano.config <../config>` or via :envvar:`THEANO_DEFAULT_MODE`. The mode
:doc:`theano.config <../config>`. The mode
argument controls the sort of optimizations that will be applied to the
graph, and the way the optimized graph will be evaluated.
......
......@@ -10,21 +10,21 @@
Guide
=====
The ``mode`` parameter to :func:`theano.function`` controls how the
The ``mode`` parameter to :func:`theano.function` controls how the
inputs-to-outputs graph is transformed into a callable object.
Theano defines the following modes by name:
- ``FAST_COMPILE``: Apply just a few optimizations, but use C op implementations where possible.
- ``FAST_RUN``: Apply all optimizations, and use C op implementations where possible.
- ``DEBUG_MODE``: Verify the correctness of all optimizations, and compare C and python
- ``'FAST_COMPILE'``: Apply just a few graph optimizations, but use C implementations where possible.
- ``'FAST_RUN'``: Apply all optimizations, and use C implementations where possible.
- ``'DEBUG_MODE'``: Verify the correctness of all optimizations, and compare C and python
implementations. This mode can take much longer than the other modes,
but can identify many kinds of problems.
The default mode is typically 'FAST_RUN', but it can be controlled via the
environment variable 'THEANO_DEFAULT_MODE', which can in turn be overridden by
The default mode is typically ``FAST_RUN``, but it can be controlled via the
configuration variable :attr:`config.mode`, which can in turn be overridden by
setting ``theano.compile.mode.default_mode`` directly, which can in turn be
overridden by passing the keyword argument to ``theano.function``.
overridden by passing the keyword argument to :func:`theano.function`.
.. TODO::
......
.. _libdoc_floatX:
=======================================================================
:mod:`floatX` -- Switching Between 'float32' and 'float64'
=======================================================================
.. module:: floatX
:platform: Unix, Windows
:synopsis: easy switching between float32 and float64
.. moduleauthor:: LISA
Guide
=====
On the CPU, 'float32' computations are often twice as fast as 'float64'
and are half the size.
On GPUs the speed difference between 'float32'`` and 'float64' is much greater.
Often we develop our code using double-precision expressions, and then wonder if
we might get the same answer much more quickly with single-precision arithmetic.
If we have used ``tensor.dmatrix`` and ``tensor.dvector`` and so on throughout
our code, it could be tedious to switch to single-precision Variables. To make
switching precisions easier, Theano provides the ``floatX`` module.
>>> from theano.floatX import xmatrix, xvector, xtensor4
>>> import numpy
>>> a = xvector('a')
>>> b = xmatrix()
>>> c = xtensor4()
These calls are identical to ``dvector``, ``dmatrix``, and ``dtensor4`` by default, but a
single environment variable can switch them to ``fvector``, ``fmatrix`` and ``ftensor4``.
You can set the floatX precision via ``floatX`` in the :envvar:`THEANO_FLAGS`.
It defaults to ``'float64'``. To set it to ``'float32'`` in *bash* for example, type ``export THEANO_FLAGS=floatX=float64``.
To set it from within your program call :func:`set_floatX`
The current floatX precision is stored in ``theano.config.floatX`` as a string.
Its value is either 'float32' or 'float64'.
So it is easy to allocate a numpy vector of the floatX dtype.
>>> import theano.config as config
>>> print config.floatX # either 'float32' or 'float64'
>>> x = numpy.asarray([1,2,3], dtype=config.floatX)
Reference
==========
.. function:: xscalar(name=None)
Alias for either :func:`dscalar` or :func:`fscalar`
.. function:: xvector(name=None)
Alias for either :func:`dvector` or :func:`fvector`
.. function:: xmatrix(name=None)
Alias for either :func:`dmatrix` or :func:`fmatrix`
.. function:: xrow(name=None)
Alias for either :func:`drow` or :func:`frow`
.. function:: xcol(name=None)
Alias for either :func:`dcol` or :func:`fcol`
.. function:: xtensor3(name=None)
Alias for either :func:`dtensor3` or :func:`ftensor3`
.. function:: xtensor4(name=None)
Alias for either :func:`dtensor4` or :func:`ftensor4`
.. function:: set_floatX(dtype=config.floatX)
Reset the :func:`xscalar`, ... :func:`xtensor4` aliases to return Variables with given dtype.
This is called at import-time when setting floatX in :envvar:`THEANO_FLAGS`.
......@@ -263,7 +263,7 @@ them perfectly, but a dscalar otherwise.
.. note::
When config.floatX==float32 (see :module:`config`), then Python floats
When config.floatX==float32 (see :mod:`config`), then Python floats
are stored instead as single-precision floats.
For fine control of this rounding policy, see
......
......@@ -22,7 +22,7 @@ Theano defines the following modes by name:
but can identify many kinds of problems.
The default mode is typically ``FAST_RUN``, but it can be controlled via
the environment variable ``THEANO_DEFAULT_MODE``, which can in turn be
the configuration variable :attr:`config.mode`, which can in turn be
overridden by setting `theano.compile.mode.default_mode` directly,
which can in turn be overridden by passing the keyword argument to
:func:`theano.function <function.function>`.
......@@ -30,7 +30,6 @@ which can in turn be overridden by passing the keyword argument to
================= =============================================================== ===============================================================================
short name Full constructor What does it do?
================= =============================================================== ===============================================================================
(default) ``compile.mode.Mode(linker='py', optimizer=None)`` Python implementations with zero graph modifications.
FAST_COMPILE ``compile.mode.Mode(linker='c|py', optimizer='fast_compile')`` C implementations where available, quick and cheap graph transformations
FAST_RUN ``compile.mode.Mode(linker='c|py', optimizer='fast_run')`` C implementations where available, all available graph transformations.
DEBUG_MODE ``compile.debugmode.DebugMode()`` Both implementations where available, all available graph transformations.
......
......@@ -16,7 +16,7 @@ Setting up CUDA
The first thing you'll need for Theano to use your GPU is Nvidia's
GPU-programming toolchain. You should install at least the CUDA driver and the CUDA Toolkit, as
:ref:`described here <http://www.nvidia.com/object/cuda_get.html>`. The CUDA
`described here <http://www.nvidia.com/object/cuda_get.html>`_. The CUDA
Toolkit installs a folder on your computer with subfolders *bin*, *lib*,
*include*, and some more too. (Sanity check: The *bin* subfolder should contain an *nvcc*
program which is the compiler for GPU code.) This folder is called the *cuda
......
......@@ -221,12 +221,8 @@ predefined_modes = {'FAST_COMPILE': FAST_COMPILE,
'SANITY_CHECK': SANITY_CHECK}
##
# The default mode used by functions and modules is read from the environment
# variable THEANO_DEFAULT_MODE. Unit tests will run using this value. If the env. var.
# is not set, it will default to 'FAST_RUN'
# The default mode used by functions and modules is read from the configuration.
# keep default_mode.optimizer==default_optimizer and default_mode.linker==default_linker!
##
default_mode = config.mode
def get_mode(string):
......
......@@ -319,10 +319,10 @@ class ProfileMode(Mode):
register_mode('PROFILE_MODE',ProfileMode())
def atexit_print_default_profile_mode():
"""Print the summary of the predefied mode PROFILE_MODE if used.
"""Print the summary of the predefined mode PROFILE_MODE if used.
This all to have the summary printed at exit when we do
THEANO_DEFAULT_MODE=PROFILE_MODE
This all to have the summary printed at exit when
config.mode=PROFILE_MODE
"""
prof_mode=predefined_modes["PROFILE_MODE"]
if prof_mode.local_time[0]>0:
......
......@@ -18,16 +18,19 @@ def debug(*msg):
# Compile cuda_ndarray.cu
# This need that nvcc (part of cuda) is installed. If it is not, a warning is
# printed and this module will not be working properly (we set `enable_cuda`
# printed and this module will not be working properly (we set `cuda_available`
# to False).
# This variable is True by default, and set to False if something goes wrong
# when trying to initialize cuda.
enable_cuda = True
cuda_available = True
# Global variable to avoid displaying the same warning multiple times.
cuda_warning_is_displayed = False
#This variable is set to True when we enable the cuda.(i.e. when use() is called)
cuda_enabled = False
# Code factorized within a function so that it may be called from multiple
# places (which is not currently the case, but may be useful in the future).
def set_cuda_disabled():
......@@ -38,8 +41,8 @@ def set_cuda_disabled():
Note that there is no point calling this function from outside of
`cuda.__init__`, since it has no effect once the module is loaded.
"""
global enable_cuda, cuda_warning_is_displayed
enable_cuda = False
global cuda_available, cuda_warning_is_displayed
cuda_available = False
if not cuda_warning_is_displayed:
cuda_warning_is_displayed = True
warning('Cuda is disabled, cuda-based code will thus not be '
......@@ -70,7 +73,7 @@ try:
if not nvcc_compiler.is_nvcc_available():
set_cuda_disabled()
if enable_cuda:
if cuda_available:
code = open(os.path.join(cuda_path, "cuda_ndarray.cu")).read()
if not os.path.exists(cuda_ndarray_loc):
......@@ -84,7 +87,7 @@ except Exception, e:
error( "Failed to compile cuda_ndarray.cu: %s" % str(e))
set_cuda_disabled()
if enable_cuda:
if cuda_available:
#check if their is an old cuda_ndarray that was loading instead of the one we compiled!
import cuda_ndarray.cuda_ndarray
if os.path.join(config.compiledir,'cuda_ndarray','cuda_ndarray.so')!=cuda_ndarray.cuda_ndarray.__file__:
......@@ -104,7 +107,8 @@ if enable_cuda:
import cuda_ndarray
def use(device=config.device):
def use(device):
global cuda_enabled, enabled_cuda
if device.startswith('gpu'):
device = int(device[3:])
elif device == 'cpu':
......@@ -122,8 +126,10 @@ def use(device=config.device):
gpu_init(device)
handle_shared_float32(True)
use.device_number = device
cuda_enabled = True
except RuntimeError, e:
_logger.warning("ERROR: Not using GPU. Initialisation of device %i failed. %s" %(device, e))
enabled_cuda = False
elif use.device_number != device:
logging.getLogger('theano.sandbox.cuda').warning("WARNING: ignoring call to use(%s), GPU number %i is already in use." %(str(device), use.device_number))
optdb.add_tags('gpu',
......@@ -144,5 +150,6 @@ def handle_shared_float32(tf):
else:
raise NotImplementedError('removing our handler')
if enable_cuda and config.device.startswith('gpu'):
use()
if cuda_available and config.device.startswith('gpu'):
use(config.device)
......@@ -381,11 +381,11 @@ def local_gpu_conv(node):
gpu_conv = GpuConvOp_from_ConvOp(node.op)
return [host_from_gpu(gpu_conv(gpu_from_host(img), gpu_from_host(kern)))]
import theano.sandbox.downsample
import theano.tensor.signal.downsample as downsample
@register_opt()
@local_optimizer([])
def local_gpu_downsample_factor_max(node):
if isinstance(node.op, theano.sandbox.downsample.DownsampleFactorMax):
if isinstance(node.op, downsample.DownsampleFactorMax):
x, = node.inputs
if (x.owner and x.owner.op == host_from_gpu):
gpu_ds = GpuDownsampleFactorMax(node.op.ds, node.op.ignore_border)
......@@ -394,7 +394,7 @@ def local_gpu_downsample_factor_max(node):
@register_opt()
@local_optimizer([])
def local_gpu_downsample_factor_max_grad(node):
if isinstance(node.op, theano.sandbox.downsample.DownsampleFactorMaxGrad):
if isinstance(node.op, downsample.DownsampleFactorMaxGrad):
x,z,gz = node.inputs
if (x.owner and x.owner.op == host_from_gpu):
gpu_ds_grad = GpuDownsampleFactorMaxGrad(node.op.ds, node.op.ignore_border)
......
......@@ -11,7 +11,7 @@ import theano.tensor as T
# Skip test if cuda_ndarray is not available.
from nose.plugins.skip import SkipTest
import theano.sandbox.cuda as cuda_ndarray
if cuda_ndarray.enable_cuda == False:
if cuda_ndarray.cuda_enabled == False:
raise SkipTest('Optional package cuda disabled')
import theano.sandbox.cuda as tcn
......
......@@ -270,7 +270,7 @@ def test_bench_elemwise(n_iter=1000, **kwargs):
# Skip test if cuda_ndarray is not available.
from nose.plugins.skip import SkipTest
import theano.sandbox.cuda as cuda_ndarray
if cuda_ndarray.enable_cuda == False:
if cuda_ndarray.cuda_enabled == False:
raise SkipTest('Optional package cuda disabled')
import theano.sandbox.cuda
theano.sandbox.cuda.use()
......
......@@ -8,12 +8,12 @@ import numpy
# Skip test if cuda_ndarray is not available.
from nose.plugins.skip import SkipTest
import theano.sandbox.cuda as cuda_ndarray
if cuda_ndarray.enable_cuda == False:
if cuda_ndarray.cuda_enabled == False:
raise SkipTest('Optional package cuda disabled')
import theano.sandbox.cuda as tcn
from theano.sandbox.downsample import DownsampleFactorMax
from theano.tensor.signal.downsample import DownsampleFactorMax
import theano.compile.mode
......
......@@ -5,7 +5,7 @@ import theano
# Skip test if cuda_ndarray is not available.
from nose.plugins.skip import SkipTest
import theano.sandbox.cuda as cuda_ndarray
if cuda_ndarray.enable_cuda == False:
if cuda_ndarray.cuda_enabled == False:
raise SkipTest('Optional package cuda disabled')
cuda_tensor4 = cuda_ndarray.CudaNdarrayType([False]*4)
......
......@@ -3,7 +3,7 @@ import theano
import theano.sandbox.cuda as cuda_ndarray
# Skip test if cuda_ndarray is not available.
from nose.plugins.skip import SkipTest
if cuda_ndarray.enable_cuda == False:
if cuda_ndarray.cuda_enabled == False:
raise SkipTest('Optional package cuda disabled')
import numpy
......
......@@ -7,14 +7,14 @@ from theano import tensor
import theano.tensor.nnet
import theano.sandbox.conv
import theano.sandbox.downsample
import theano.tensor.signal.downsample as downsample
import numpy
# Skip test if cuda_ndarray is not available.
from nose.plugins.skip import SkipTest
import theano.sandbox.cuda as cuda_ndarray
if cuda_ndarray.enable_cuda == False:
if cuda_ndarray.cuda_enabled == False:
raise SkipTest('Optional package cuda disabled')
import theano.sandbox.cuda as tcn
......@@ -307,7 +307,7 @@ def run_conv_nnet2_classif(use_gpu, isize, ksize, n_batch, n_iter,
conv_op.set_flops()
conv_op1.set_flops()
ds_op = theano.sandbox.downsample.DownsampleFactorMax((2,2), ignore_border=False)
ds_op = downsample.DownsampleFactorMax((2,2), ignore_border=False)
if downsample_ops:
hid = tensor.tanh(ds_op(conv_op(x, w0)+b0.dimshuffle((0,'x','x'))))
else:
......
......@@ -8,7 +8,7 @@ import numpy
# Skip test if cuda_ndarray is not available.
from nose.plugins.skip import SkipTest
import theano.sandbox.cuda as cuda_ndarray
if cuda_ndarray.enable_cuda == False:
if cuda_ndarray.cuda_available == False:
raise SkipTest('Optional package cuda disabled')
import theano.compile.mode
......
""" Ops for downsampling images.
import sys
print >> sys.stderr, "DEPRECATION: theano.sandbox.downsample is deprecated. Use theano.tensor.signal.downsample instead."
Planned:
DownsampleFactorMax, DownsampleAvg, DownsampleSoftmax.
"""
#This file should move along with conv.py
from theano import gof, Op, tensor, Variable, Apply
from theano.printing import Print
import numpy, theano
import __builtin__
class DownsampleFactorMaxGrad(Op):
def __init__(self, ds, ignore_border):
self.ds = tuple(ds)
self.ignore_border = ignore_border
def __eq__(self, other):
return type(self) == type(other) and self.ds == other.ds and self.ignore_border == other.ignore_border
def __hash__(self):
return hash(type(self)) ^ hash(self.ds) ^ hash(self.ignore_border)
def __str__(self):
return '%s{%s,%s}' % (self.__class__.__name__, self.ds, self.ignore_border)
def make_node(self, x, maxout, gz):
# make_node should only be called by the grad function of DownsampleFactorMax,
# so these asserts should not fail.
assert isinstance(x, Variable) and x.ndim==4
assert isinstance(maxout, Variable) and maxout.ndim==4
assert isinstance(gz, Variable) and gz.ndim==4
return Apply(self, [x, maxout, gz], [x.type()])
def perform(self, node, (x, maxout, gz), (gx_stg,)):
gx = numpy.zeros_like(x)
ds0, ds1 = self.ds
shape2 = (x.shape[2] / ds0 * ds0)
if not self.ignore_border: shape2 = x.shape[2]
shape3 = (x.shape[3] / ds1 * ds1)
if not self.ignore_border: shape3 = x.shape[3]
for n in xrange(x.shape[0]):
for k in xrange(x.shape[1]):
for i in xrange(shape2):
zi = i / ds0
for j in xrange(shape3):
zj = j / ds1
if (maxout[n,k,zi,zj] == x[n,k,i,j]):
gx[n,k,i,j] = gz[n,k,zi,zj]
else: gx[n,k,i,j] = 0
gx_stg[0] = gx
def c_code(self, node, name, (x, z, gz), (gx,), sub):
fail = sub['fail']
ignore_border = int(self.ignore_border)
ds0, ds1 = self.ds
return """
int x_typenum = PyArray_ObjectType((PyObject*)%(x)s, 0);
int z_typenum = PyArray_ObjectType((PyObject*)%(z)s, 0);
int gz_typenum = PyArray_ObjectType((PyObject*)%(gz)s, 0);
int x_shp0_usable;
int x_shp1_usable;
int z_shp0, z_shp1;
if ((x_typenum != z_typenum) || (x_typenum != gz_typenum))
{
PyErr_SetString(PyExc_ValueError, "input types must all match");
%(fail)s;
}
if(%(x)s->nd!=4)
{
PyErr_SetString(PyExc_ValueError, "x must be a 4d ndarray");
%(fail)s;
}
if(%(z)s->nd!=4)
{
PyErr_SetString(PyExc_ValueError, "z must be a 4d ndarray");
%(fail)s;
}
if(%(gz)s->nd!=4)
{
PyErr_SetString(PyExc_ValueError, "gz must be a 4d ndarray");
%(fail)s;
}
z_shp0 = %(z)s->dimensions[2];
z_shp1 = %(z)s->dimensions[3];
if (%(ignore_border)s)
{
x_shp0_usable = z_shp0 * %(ds0)s;
x_shp1_usable = z_shp1 * %(ds1)s;
}
else
{
x_shp0_usable = %(x)s->dimensions[2];
x_shp1_usable = %(x)s->dimensions[3];
}
if ((!%(gx)s)
|| *PyArray_DIMS(%(gx)s)!=4
||(%(gx)s->dimensions[0] != %(x)s->dimensions[0])
||(%(gx)s->dimensions[1] != %(x)s->dimensions[1])
||(%(gx)s->dimensions[2] != %(x)s->dimensions[2])
||(%(gx)s->dimensions[3] != %(x)s->dimensions[3])
)
{
Py_XDECREF(%(gx)s);
%(gx)s = (PyArrayObject*) PyArray_ZEROS(4, %(x)s->dimensions, x_typenum,0);
}
for(int b=0;b<%(x)s->dimensions[0];b++){
for(int k=0;k<%(x)s->dimensions[1];k++){
int mini_i = 0;
int zi = 0;
for(int i=0;i< x_shp0_usable; i++){
int mini_j = 0;
int zj = 0;
for(int j=0; j< x_shp1_usable; j++){
dtype_%(x)s * __restrict__ xp = ((dtype_%(x)s*)(PyArray_GETPTR4(%(x)s,b,k,i,j)));
dtype_%(gx)s * __restrict__ gxp = ((dtype_%(gx)s*)(PyArray_GETPTR4(%(gx)s,b,k,i,j)));
dtype_%(z)s * __restrict__ zp = ((dtype_%(z)s*)(PyArray_GETPTR4(%(z)s,b,k,zi,zj)));
dtype_%(gz)s * __restrict__ gzp = ((dtype_%(gz)s*)(PyArray_GETPTR4(%(gz)s,b,k,zi,zj)));
gxp[0] = (zp[0] == xp[0]) ? gzp[0] : 0;
mini_j = (mini_j + 1 == %(ds1)s) ? 0 : mini_j+1;
zj += (mini_j == 0);
}//for j
mini_i = (mini_i + 1 == %(ds0)s) ? 0 : mini_i+1;
zi += (mini_i == 0);
for (int j = x_shp1_usable; j < %(x)s->dimensions[3]; ++j) {
dtype_%(gx)s * gxp = ((dtype_%(gx)s*)(PyArray_GETPTR4(%(gx)s,b,k,i,j)));
gxp[0] = 0;
}
}//for i
for(int i = x_shp0_usable; i < %(x)s->dimensions[2]; i++){
for (int j = 0; j < %(x)s->dimensions[3]; ++j) {
dtype_%(gx)s * gxp = ((dtype_%(gx)s*)(PyArray_GETPTR4(%(gx)s,b,k,i,j)));
gxp[0] = 0;
}
}
}//for k
}//for b
""" %locals()
def c_code_cache_version(self):
return ()
def max_pool2D(input, ds, ignore_border=False):
"""
Takes as input a N-D tensor, where N >= 2. It downscales the input image by
the specified factor, by keeping only the maximum value of non-overlapping
patches of size (ds[0],ds[1])
:type input: N-D theano tensor of input images.
:param input: input images. Max pooling will be done over the 2 last dimensions.
:type ds: tuple of length 2
:param ds: factor by which to downscale. (2,2) will halve the image in each
dimension.
:param ignore_border: boolean value. When True, (5,5) input with ds=(2,2)
will generate a (2,2) output. (3,3) otherwise.
"""
if input.ndim < 2:
raise NotImplementedError('max_pool2D requires a dimension >= 2')
# extract image dimensions
img_shape = input.shape[-2:]
# count the number of "leading" dimensions, store as dmatrix
batch_size = tensor.prod(input.shape[:-2])
batch_size = tensor.shape_padright(batch_size,1)
# store as 4D tensor with shape: (batch_size,1,height,width)
new_shape = tensor.cast(tensor.join(0, batch_size,
tensor.as_tensor([1,]), img_shape), 'int64')
input_4D = tensor.reshape(input, new_shape, ndim=4)
# downsample mini-batch of images
op = DownsampleFactorMax(ds, ignore_border)
output = op(input_4D)
# restore to original shape
outshp = tensor.join(0, input.shape[:-2], output.shape[-2:])
return tensor.reshape(output, outshp, ndim=input.ndim)
class DownsampleFactorMax(Op):
"""
For N-dimensional tensors, consider that the last two dimensions span images.
This Op downsamples these images by a factor ds, by taking the max over non-
overlapping rectangular regions.
"""
@staticmethod
def out_shape(imgshape, ds, ignore_border=False):
"""Return the shape of the output from this op, for input of given shape and flags.
:param imgshape: the shape of a tensor of images. The last two elements are interpreted
as the number of rows, and the number of cols.
:type imgshape: tuple, list, or similar.
:param ds: downsample factor over rows and columns
:type ds: list or tuple of two ints
:param ignore_border: if ds doesn't divide imgshape, do we include an extra row/col of
partial downsampling (False) or ignore it (True).
:type ignore_border: bool
:rtype: list
:returns: the shape of the output from this op, for input of given shape. This will
have the same length as imgshape, but with last two elements reduced as per the
downsampling & ignore_border flags.
"""
if len(imgshape) < 2:
raise TypeError('imgshape must have at least two elements (rows, cols)')
r, c = imgshape[-2:]
rval = list(imgshape[:-2])+[ r/ds[0], c/ds[1]]
if not ignore_border:
if r % ds[0]:
rval[-2] += 1
if c % ds[1]:
rval[-1] += 1
return rval
def __init__(self, ds, ignore_border=False):
"""
:param ds: downsample factor over rows and columns
:type ds: list or tuple of two ints
:param ignore_border: if ds doesn't divide imgshape, do we include an extra row/col of
partial downsampling (False) or ignore it (True).
:type ignore_border: bool
TODO: why is poolsize an op parameter here?
"""
self.ds = tuple(ds)
self.ignore_border = ignore_border
def __eq__(self, other):
return type(self) == type(other) and self.ds == other.ds and self.ignore_border == other.ignore_border
def __hash__(self):
return hash(type(self)) ^ hash(self.ds) ^ hash(self.ignore_border)
def __str__(self):
return '%s{%s,%s}' % (self.__class__.__name__, self.ds, self.ignore_border)
def make_node(self, x):
if x.type.ndim != 4:
raise TypeError()
# TODO: consider restrucing the dtype?
return gof.Apply(self, [x], [x.type()])
def perform(self, node, (x,), (z,)):
"""
"""
if len(x.shape)!=4:
raise NotImplementedError('DownsampleFactorMax requires 4D input for now')
if z[0] is None:
z[0] = numpy.zeros(self.out_shape(x.shape, self.ds, self.ignore_border)) -float('inf')
z[0] = theano._asarray(z[0], dtype=x.dtype)
zz=z[0]
ds0, ds1 = self.ds
if self.ignore_border:
x_usable2 = (x.shape[2] / ds0 * ds0)
else: x_usable2 = x.shape[2]
if self.ignore_border:
x_usable3 = (x.shape[3] / ds1 * ds1)
else: x_usable3 = x.shape[3]
for n in xrange(x.shape[0]):
for k in xrange(x.shape[1]):
for i in xrange(x_usable2):
zi = i / ds0
for j in xrange(x_usable3):
zj = j / ds1
zz[n,k,zi,zj] = __builtin__.max(zz[n,k,zi,zj], x[n,k,i,j])
def grad(self,(x,), (gz,)):
maxout = self(x)
return [DownsampleFactorMaxGrad(self.ds, ignore_border=self.ignore_border)(x, maxout, gz)]
def c_code(self, node, name, (x,), (z, ), sub):
fail=sub['fail']
ignore_border = int(self.ignore_border)
ds0, ds1 = self.ds
return """
int typenum = PyArray_ObjectType((PyObject*)%(x)s, 0);
int x_shp0_usable;
int x_shp1_usable;
int z_shp0, z_shp1;
if(%(x)s->nd!=4)
{
PyErr_SetString(PyExc_ValueError, "x must be a 4d ndarray");
%(fail)s;
}
z_shp0 = %(x)s->dimensions[2] / %(ds0)s;
z_shp1 = %(x)s->dimensions[3] / %(ds1)s;
if (%(ignore_border)s)
{
x_shp0_usable = z_shp0 * %(ds0)s;
x_shp1_usable = z_shp1 * %(ds1)s;
}
else
{
z_shp0 += (%(x)s->dimensions[2] %% %(ds0)s) ? 1 : 0;
z_shp1 += (%(x)s->dimensions[3] %% %(ds1)s) ? 1 : 0;
x_shp0_usable = %(x)s->dimensions[2];
x_shp1_usable = %(x)s->dimensions[3];
}
if ((!%(z)s)
|| *PyArray_DIMS(%(z)s)!=4
||(%(z)s->dimensions[0] != %(x)s->dimensions[0])
||(%(z)s->dimensions[1] != %(x)s->dimensions[1])
||(%(z)s->dimensions[2] != z_shp0)
||(%(z)s->dimensions[3] != z_shp1)
)
{
if (%(z)s) Py_XDECREF(%(z)s);
npy_intp dims[4] = {0,0,0,0};
dims[0]=%(x)s->dimensions[0];
dims[1]=%(x)s->dimensions[1];
dims[2]=z_shp0;
dims[3]=z_shp1;
%(z)s = (PyArrayObject*) PyArray_ZEROS(4, dims, typenum,0); //TODO: zeros not necessary
}
if (z_shp0 && z_shp1)
{
for(int b=0;b<%(x)s->dimensions[0];b++){
for(int k=0;k<%(x)s->dimensions[1];k++){
int mini_i = 0;
int zi = 0;
for(int i=0;i< x_shp0_usable; i++){
int mini_j = 0;
int zj = 0;
for(int j=0; j<x_shp1_usable; j++){
dtype_%(x)s a = ((dtype_%(x)s*)(PyArray_GETPTR4(%(x)s,b,k,i,j)))[0];
dtype_%(z)s * __restrict__ z = ((dtype_%(z)s*)(PyArray_GETPTR4(%(z)s,b,k,zi,zj)));
z[0] = (((mini_j|mini_i) == 0) || z[0] < a) ? a : z[0];
mini_j = ((mini_j + 1) == %(ds1)s) ? 0 : mini_j+1;
zj += (mini_j == 0);
}
mini_i = ((mini_i + 1) == %(ds0)s) ? 0 : mini_i+1;
zi += (mini_i == 0);
}
}
}
}
""" % locals()
def c_code_cache_version(self):
return ()
from theano.tensor.signal.downsample import *
import unittest, sys, time
import numpy as N
import theano.tensor as T
from theano.tests import unittest_tools as utt
from theano.sandbox.downsample import DownsampleFactorMax
from theano import function, Mode
def max_pool(images=None, imshp=None, maxpoolshp=None, ignore_border=True):
"""Implements a max pooling layer
Uses the same API as sp.max_pool but uses the Downsample op instead.
Takes as input a 2D tensor of shape batch_size x img_size and performs max pooling.
Max pooling downsamples by taking the max value in a given area, here defined by
maxpoolshp. Outputs a 2D tensor of shape batch_size x output_size.
Parameters are keyword arguments in order to use func_to_mod.
@param images: 2D tensor containing images on which to apply convolution.
Assumed to be of shape batch_size x img_size
@param imgshp: tuple containing image dimensions
@param maxpoolshp: tuple containing shape of area to max pool over
@output out1: symbolic result (2D tensor)
@output out2: logical shape of the output
"""
if len(imshp) == 2:
imshp = (1,) + imshp
elif len(imshp)!=3:
raise NotImplementedError("!")
# all these reshapes should happen in place
imrshp = T.stack(images.shape[0],
*[T.as_tensor(x) for x in imshp])
imtensor = T.reshape(images, imrshp)
maxpop = DownsampleFactorMax(maxpoolshp, ignore_border)
rval = maxpop(imtensor)
return T.flatten(rval,2), maxpop.out_shape(imshp, maxpoolshp, ignore_border)
class TestDownsampleFactorMax(unittest.TestCase):
def test_maxpool(self):
# generate flatted images
maxpoolshps = ((1,1),(2,2),(3,3),(2,3))
imval = N.random.rand(4,10,64,64)
images = T.dmatrix()
dmatrix4=T.TensorType('float64', (False, False, False, False))
images4=dmatrix4()
tctot, tpytot, ntot = [],[],[]
for maxpoolshp in maxpoolshps:
for border in [True,False]:
print 'maxpoolshp', maxpoolshp,'border', border
# numeric verification
xi=0
yi=0
if not border:
if imval.shape[-2] % maxpoolshp[0]:
xi += 1
if imval.shape[-1] % maxpoolshp[1]:
yi += 1
my_output_val = N.zeros((imval.shape[0], imval.shape[1],
imval.shape[2]/maxpoolshp[0]+xi,
imval.shape[3]/maxpoolshp[1]+yi))
time1=time.time()
for n in range(imval.shape[0]):
for k in range(imval.shape[1]):
for i in range(my_output_val.shape[2]):
ii = i*maxpoolshp[0]
for j in range(my_output_val.shape[3]):
jj = j*maxpoolshp[1]
patch = imval[n,k,ii:ii+maxpoolshp[0],jj:jj+maxpoolshp[1]]
my_output_val[n,k,i,j] = N.max(patch)
my_output_val = my_output_val.reshape(imval.shape[0],-1)
ntot+=[time.time()-time1]
# symbolic stuff
#### wrapper to DownsampleFactorMax op ####
output, outshp = max_pool(images, imval.shape[1:], maxpoolshp, border)
assert N.prod(my_output_val.shape[1:]) == N.prod(outshp)
assert N.prod(my_output_val.shape[1:]) == N.prod(outshp)
f = function([images,],[output,])
imval2=imval.reshape(imval.shape[0],-1)
output_val = f(imval2)
assert N.all(output_val == my_output_val)
#DownsampleFactorMax op
maxpool_op = DownsampleFactorMax(maxpoolshp, ignore_border=border)(images4)
f = function([images4],maxpool_op,mode=Mode(linker="py"))
f2 = function([images4],maxpool_op,mode=Mode(linker="c"))
f3 = function([images4],maxpool_op)#for when we want to use the debug mode
time1=time.time()
output_val = f(imval)
tctot+=[time.time()-time1]
assert (N.abs(my_output_val.flatten()-output_val.flatten())<1e-5).all()
time1=time.time()
output_val = f2(imval)
tpytot+=[time.time()-time1]
assert (N.abs(my_output_val.flatten()-output_val.flatten())<1e-5).all()
output_val = f3(imval)
print 'Numpy processing time: %.3fs'%sum(ntot),ntot
print 'c Theano(DownsampleFactorMax) processing time: %.3fs'%sum(tctot),tctot
print 'py Theano(DownsampleFactorMax) processing time: %.3fs'%sum(tpytot),tpytot
d=N.asarray(ntot)/tctot
print 'speed up c theano(DownsampleFactorMax) vs manual: %.3f'%d.mean(),d
d=N.asarray(ntot)/tpytot
print 'speed up py theano(DownsampleFactorMax) vs manual: %.3f'%d.mean(),d
def test_DownsampleFactorMax_grad(self):
# generate flatted images
maxpoolshps = ((1,1),(3,2),(2,3))
imval = N.random.rand(2,3,3,4) * 10.0 #more variance means numeric gradient will be more accurate
do_theano=True
for maxpoolshp in maxpoolshps:
for border in [True,False]:
print 'maxpoolshp', maxpoolshp, 'border', border
def mp(input):
return DownsampleFactorMax(maxpoolshp, ignore_border=border)(input)
utt.verify_grad(mp, [imval])
if __name__ == '__main__':
t = TestDownsampleFactorMax("test_maxpool").run()
#t.test_maxpool()
from theano.tests import main
# main("test_sp")
......@@ -28,5 +28,3 @@ import nnet # used for softmax, sigmoid, etc.
......@@ -261,11 +261,14 @@ def _wrap_tensor_into_member(x):
compile.module.register_wrapper(_obj_is_wrappable_as_tensor, _wrap_tensor_into_member)
if int(config.tensor.cmp_sloppy)>1:
# This environment variable is a quick-and-dirty way to get low-precision comparisons.
# For a more precise setting of these tolerances set them explicitly in your user code by
# assigning, for example, "theano.tensor.basic.float32_atol = ..."
#when THEANO_CMP_SLOPPY>1 we are even more sloppy. This is usefull to test the gpu as they don't use extended precision and this cause some difference bigger then the normal sloppy.
# This config variable is a quick-and-dirty way to get low-precision
# comparisons. For a more precise setting of these tolerances set
# them explicitly in your user code by assigning, for example,
# "theano.tensor.basic.float32_atol = ..."
# When config.tensor.cmp_sloppy>1 we are even more sloppy. This is
# useful to test the GPU as they don't use extended precision and
# this cause some difference bigger then the normal sloppy.
float32_atol = 5e-4
float32_rtol = 1e-3
float64_rtol = 1e-4
......@@ -3597,8 +3600,10 @@ def verify_grad(op, pt, n_tests=2, rng=None, eps=None, tol=None, mode=None, cast
o_fn = function(tensor_pt, o_output)
o_fn_out = o_fn(*[p.copy() for p in pt])
random_projection = rng.rand(*o_fn_out.shape)
# random_projection should not have elements too small,
# otherwise too much precision is lost in numerical gradient
random_projection = rng.rand(*o_fn_out.shape) + 0.5
if cast_to_output_type:
random_projection = numpy.array(random_projection,
dtype=o_output.dtype)
......
......@@ -44,7 +44,7 @@ def ldflags(libs=True, flags=False):
"""Return a list of libraries against which an Op's object file should be
linked to benefit from a BLAS implementation.
Default: ['blas'], but environment variable THEANO_BLAS_LDFLAGS overrides this.
Default: ['blas'], but configuration variable config.blas.ldflags overrides this.
"""
rval = []
for t in config.blas.ldflags.split():
......@@ -52,9 +52,9 @@ def ldflags(libs=True, flags=False):
t0, t1, t2 = t[0:3]
assert t0 == '-'
except:
raise ValueError('invalid token in THEANO_BLAS_LDFLAGS', t)
raise ValueError('invalid token in config.blas.ldflags', t)
if t1 == 'L':
raise ValueError('library dir not allowed in THEANO_BLAS_LDFLAGS', t)
raise ValueError('library dir not allowed in config.blas.ldflags', t)
elif libs and t1=='l': # example -lmkl
rval.append(t[2:])
elif flags and t1!='l': # example -openmp
......
......@@ -822,7 +822,14 @@ class CAReduce(Op):
to_reduce = reversed(sorted(axis))
if to_reduce:
for dimension in to_reduce:
variable = self.ufunc.reduce(variable, dimension)
# If it's a zero-size array, use scalar_op.identity if available
if variable.shape[dimension] == 0:
if hasattr(self.scalar_op, 'identity'):
variable = self.scalar_op.identity
else:
raise ValueError("Input (%s) has zero-size on axis %s, but self.scalar_op (%s) has no attribute 'identity'" % (variable, dimension, self.scalar_op))
else:
variable = self.ufunc.reduce(variable, dimension)
output[0] = theano._asarray(variable, dtype = node.outputs[0].type.dtype)
else:
output[0] = numpy.copy(variable)
......
"""
Contains an op for convolving input images with a set of filters. This was
developed especially for Convolutional Neural Networks.
"""
__docformat__ = "restructuredtext en"
import numpy
import theano
import theano.tensor as tensor
from theano import gof, Op, tensor, config
import logging
_logger=logging.getLogger("theano.signal.conv")
def _debug(*msg):
_logger.debug(' '.join(msg))
def _warn(*msg):
_logger.warn(' '.join(msg))
def conv2d(input, filters, image_shape=None, filter_shape=None,
border_mode='valid', subsample=(1,1), **kargs):
"""
This function will build the symbolic graph for convolving a stack of input
images with a set of filters. The implementation is modelled after
Convolutional Neural Networks (CNN). It is simply a wrapper to the ConvOp but
provides a much cleaner interface.
:type input: symbolic 4D tensor
:param input: mini-batch of feature map stacks, of shape image_shape.
:type filters: symbolic 4D tensor
:param filters: set of filters used in CNN layer of shape filter_shape
:param border_mode:
'valid'-- only apply filter to complete patches of the image. Generates
output of shape: image_shape - filter_shape + 1
'full' -- zero-pads image to multiple of filter shape to generate output of
shape: image_shape + filter_shape - 1
:type subsample: tuple of len 2
:param subsample: factor by which to subsample the output
:type image_shape: tuple of len 4
:param image_shape: (batch size, stack size, nb row, nb col)
:type filter_shape: tuple of len 4
:param filter_shape: (nb filters, stack size, nb row, nb col)
:param kwargs: kwargs are passed onto ConvOp. Can be used to set the following:
unroll_batch, unroll_kern, unroll_patch (see ConvOp doc)
"""
if image_shape and filter_shape:
assert image_shape[1]==filter_shape[1]
if filter_shape is not None:
nkern = filter_shape[0]
kshp = filter_shape[2:]
else:
nkern, kshp = None, None
if image_shape is not None:
bsize = image_shape[0]
imshp = image_shape[1:]
else:
bsize, imshp = None, None
op = ConvOp(output_mode=border_mode, dx=subsample[0], dy=subsample[1],
imshp=imshp, kshp=kshp, nkern=nkern, bsize=bsize,**kargs)
return op(input, filters)
class ConvOp(Op):
"""
This Op serves a dual purpose: it can implement a vanilla 2D convolution
(as taught in any signal processing class) or implement the
convolutional layers found in Convolutional Neural Networks.
In this setting, a set of 3D images is convolved with a set of 3D kernels,
with the particularity that their leading dimensions are of equal length.
Vanilla 2D convolution is treated as a special case of this.
The input parameter represents a mini-batch of multiple images. Its shape is:
batch size x num. input feature maps x image height x image width
The kernel parameter represents a set of 3D kernels. Its shape is:
number of filters x num. input images x filter height x filter width
The output of ConvOp is a 4D tensor, generated as follows:
output[b,k,:,:] = \sum_i input[b,i,:,:] * filter[k,i,:,:] \forall b,k
where b is the mini-batch index, k the filter index and * is the convolution
operator.
"""
__attrnames = ['imshp', 'kshp', 'nkern', 'bsize', 'dx', 'dy', 'out_mode',
'unroll_batch', 'unroll_kern', 'unroll_patch',
'imshp_logical', 'kshp_logical', 'kshp_logical_top_aligned']
"""These attributes uniquely identify the behaviour of this op for given inputs"""
@staticmethod
def getOutputShape(inshp, kshp, (dx,dy)=(1,1), mode='valid'):
"""
Computes the output dimensions of convolving an image of shape "inshp"
with kernels of shape "kshp".
:param inshp: (rows,cols) of input image
:param kshp: (rows,cols) of filters
:param mode: 'valid' or 'full' (see 'border_mode' in conv2d's doc)
:return: (rows,cols) of output image
"""
if mode=='valid': s = -1
else: s = 1
inshp, kshp = numpy.array(inshp), numpy.array(kshp)
return numpy.int64(numpy.ceil((inshp[1:] + s*kshp - s*1)/\
numpy.array([dx,dy], dtype='float')))
def __init__(self, imshp=None, kshp=None, nkern=None, bsize=None,
dx=None, dy=None,
output_mode='valid', unroll_batch=0,
unroll_kern=0,
unroll_patch=True,
imshp_logical=None,
kshp_logical=None,
kshp_logical_top_aligned=True,
verbose=0,
version=-1):
"""
Initializes a ConvOp with given output_mode (full/valid). All other
parameters are optional and are only used to generate more optimized c
code.
NOTES ON OPTIMIZATION:
If ALL (imshp, kshp, nkern and bsize) parameters are provided, we can
generate faster c-code. This make a significant difference for the
'full' output_mode with unroll_patch=True. The current fastest
implementation on x86-64 uses {unroll_batch=4, unroll_kern=4,
unroll_patch=False} with all other shape parameters being provided.
For optimizing other architectures, see:
Kazushige Goto and Robert A. Van De Geijn, Anatomy of High-Performance
Matrix Multiplication, (mr x nr). ACM Transactions on Mathematical
Software, May 2008.
Figure 12: (mr x nr). For x86 use 2x4, itanium 8x8, etc.
:type output_mode: string
:param output_mode: 'valid' -- gives an output smaller then the image
'full' -- gives an output bigger then the image
Optional parameters: (will generate more optimal c code)
:type imshp: tuple of len 2 or 3: 2 for 2d image, 3 for a stack of 2d images.
:param imshp: (stacksize, nb image row, nb image col)
:type kshp: tuple of len 2
:param kshp: (nb kernel row, nb kernel col)
:type nkern: int
:param nkern: the number of kernel
:type bsize: int
:param bsize: the size of the minibatch
:type dx: int
:param dx: patch stride rows
:type dy: int
:param dx: patch stride cols
Params which select the version of code used:
:type unroll_patch: bool
:param unroll_patch: use a version of c_code that unroll the patch loop that don't
request all shape information to work, but if all shape information are present, will
use it to hardcode the value in the code for faster code.
:type unroll_batch:int
:param unroll_batch: use a version of c_code that unroll the batch(by unroll_batch) and
the nkern(by unroll_kern) loop. The size must by a multiple of bsize or nkern
respectively.
:type unroll_kern:int
:param unroll_kern: use a version of c_code that unroll the batch(by unroll_batch) and
the nkern(by unroll_kern) loop. The size must by a multiple of bsize or nkern
respectively.
:type verbose: int
:param verbose: passed to GpuConv
:type version: int
:param version: passed to GpuConv
:param imshp_logical: used internally when we generate the gradient when dx!=1 or dy!=1
:param kshp_logical: idem
:param kshp_logical_top_aligned: idem
"""
all_shape = imshp is not None and kshp is not None and \
nkern is not None and bsize is not None
if (unroll_batch>0 or unroll_kern>0) and not all_shape:
raise Exception("In ConvOp, when using unroll_batch and unroll_nkern, all shape are needed")
if not all_shape:
unroll_patch = True
if imshp is not None:
imshp = tuple(imshp)
if len(imshp)==2:
imshp = (1,)+imshp
elif len(imshp)==3:
imshp = imshp
else:
raise Exception("bad len for imshp")
self.imshp = imshp
if kshp is not None:
kshp = tuple(kshp)
self.kshp = kshp
self.nkern = nkern
self.bsize=bsize
self.dx=dx
self.dy=dy
self.verbose=verbose
self.version=version
# a triple
self.imshp_logical = self.imshp
if imshp_logical is not None: self.imshp_logical = tuple(imshp_logical)
assert (self.imshp is None and self.imshp_logical is None) or \
(len(self.imshp) == len(self.imshp_logical))
# a pair
self.kshp_logical = self.kshp
if kshp_logical is not None: self.kshp_logical = tuple(kshp_logical)
self.kshp_logical_top_aligned = kshp_logical_top_aligned
self.unroll_batch=unroll_batch
self.unroll_kern=unroll_kern
self.unroll_patch=unroll_patch
if self.unroll_batch>0 and self.bsize % self.unroll_batch!=0:
if self.bsize<=self.unroll_batch:
self.unroll_batch = self.bsize
else:
#find the maximum value under unroll_batch that would work
new=self.unroll_batch
assert(new>=1)
while self.bsize % new!=0:
new-=1
warnstr = "OPTIMISATION WARNING: in ConvOp.__init__() unroll_batch(%i)"\
"must be 0 or a divisor of bsize(%i). We revert it to %i. This"\
"won't change the result, but may make it slower."
_warn(warnstr % (self.unroll_batch, self.bsize, new))
self.unroll_batch=new
if self.unroll_kern>0 and self.nkern % unroll_kern!=0:
if self.nkern<=self.unroll_kern:
self.unroll_kern = self.nkern
else:
#find the maximum value under unroll_kern that would work
new=self.unroll_kern
assert(new>=1)
while self.nkern % new!=0:
new-=1
warnstr = "OPTIMISATION WARNING: in ConvOp.__init__() unroll_kern(%i)"\
"should be 0 or a divisor of nkern(%i). We revert it to %i."\
"This won't change the result, but may make it slower."
_warn(warnstr % (self.unroll_kern, self.nkern, new))
self.unroll_kern=new
if all_shape:
self.outshp = ConvOp.getOutputShape(self.imshp_logical, self.kshp_logical, (dx,dy), output_mode)
self.fulloutshp = ConvOp.getOutputShape(self.imshp_logical, self.kshp_logical, (1,1), output_mode)
else:
self.outshp = None
self.fulloutshp = None
self.out_mode = output_mode
if not self.out_mode in ["valid", "full"]:
raise Exception("Mode %s not implemented"%self.out_mode)
if all_shape and not (self.outshp > 0).all():
raise Exception(("Bad size for the output shape. Verify that [post-"\
"supersampling] input shape (%s) and kern shape(%s) are ok. "\
"(Hint: kerns must fit inside image in valid mode)")%
(self.imshp_logical,self.kshp_logical))
self._rehash()
if config.op.set_flops:
self.set_flops()
def __eq__(self, other):
if type(self) != type(other):
return False
for a in self.__attrnames:
if getattr(self, a) != getattr(other, a):
return False
return True
def __setstate__(self, d):
self.__dict__.update(d)
self._rehash()
def _rehash(self):
hashval = hash(type(self))
for a in self.__attrnames:
hashval = hashval ^ hash(getattr(self, a))
self.__hashval = hashval
def __hash__(self):
return self.__hashval
def __str__(self):
return "ConvOp{" +",".join(str((a, getattr(self, a))) for a in self.__attrnames) + "}"
def set_flops(self):
""" Usefull with the hack in profilemode to print the MFlops"""
if self.out_mode=="valid":
self.flops=self.kshp[0]*self.kshp[1]*2#nb mul and add by output pixed
self.flops*=self.outshp[0]*self.outshp[1]#nb flops by output image
self.flops*=self.imshp[0]*self.nkern*self.bsize#for all outputs images#n_stack==self.imshp[0]
else: #full mode not implemented
self.flops=0
for out_row in range(self.outshp[0]):#loop over output row
for out_col in range(self.outshp[0]):#loop over output col
for row in range(self.kshp[0]):#loop over kern row
if (row+out_row-self.kshp[0]+1<0 or
row+out_row-self.kshp[0]+1>=self.imshp[1]):
continue
col=0
max_col=self.kshp[1]
img_col=out_col-self.kshp[1]+1
max_col=min(max_col,self.imshp[2]-img_col)
if img_col<0:
col=-img_col
img_col+=col
while col < max_col: #loop over kern col
self.flops+=2
col+=1
self.flops*=self.imshp[0]*self.nkern*self.bsize#for all outputs images#n_stack==self.imshp[0]
assert self.flops == self.bsize * self.nkern * self.imshp[0] * \
self.kshp[0] * self.kshp[1] * self.imshp[1] * self.imshp[2] * 2
def make_node(self, inputs, kerns):
# TODO: find a way to make ConvOp work for N-D (after NIPS09)
"""
inputs - 4 dim: batches x stacksize x rows x cols
kerns - 4 dim: nkern x stackidx x rows x cols
"""
outdim = kerns.ndim
_inputs = tensor.as_tensor_variable(inputs)
_kerns = tensor.as_tensor_variable(kerns)
# TODO: lift this restriction by upcasting either inputs or kerns
if _inputs.ndim != 4:
raise TypeError('make_node requires 4D tensor of inputs')
if _kerns.ndim != 4:
raise TypeError('make_node requires 4D tensor of kernels')
if _inputs.type.dtype != _kerns.type.dtype:
raise NotImplementedError("The image and the kernel must have the same type."
"inputs(%s), kerns(%s)"%(_inputs.dtype, _kerns.dtype))
output = tensor.tensor(dtype=_inputs.type.dtype,
broadcastable=[_inputs.broadcastable[0],
_kerns.broadcastable[0], False, False]);
return gof.Apply(self, [_inputs, _kerns], [output])
def perform(self,node, (img2d, filtersflipped), (z,)):
"""
By default if len(img2d.shape)==3, we
"""
# TODO: move these back out to global scope when they no longer cause an atexit error
from scipy.signal.signaltools import _valfrommode, _bvalfromboundary
from scipy.signal.sigtools import _convolve2d
imshp = self.imshp
if imshp is None:
imshp = tuple(img2d.shape[1:])
kshp = self.kshp
if kshp is None:
kshp = tuple(filtersflipped.shape[2:])
bsize = self.bsize
if bsize is None:
bsize = img2d.shape[0]
nkern = self.nkern
if nkern is None:
nkern = filtersflipped.shape[0]
imshp_logical = self.imshp_logical
if imshp_logical is None:
imshp_logical = imshp
kshp_logical = self.kshp_logical
if kshp_logical is None:
kshp_logical = kshp
if self.fulloutshp is not None:
fulloutshp = tuple(self.fulloutshp)
else:
fulloutshp = tuple(ConvOp.getOutputShape(imshp_logical, kshp_logical, (1,1), self.out_mode))
if z[0] is None:
z[0] = numpy.zeros((bsize,)+(nkern,)+fulloutshp,
dtype=img2d.dtype)
zz=z[0]
val = _valfrommode(self.out_mode)
bval = _bvalfromboundary('fill')
stacklen = imshp[0]
img2d = img2d.reshape((bsize,)+ imshp)
filtersflipped = filtersflipped.reshape((nkern,stacklen)+kshp)
if self.imshp != self.imshp_logical:
# assuming that to get from imshp to imshp logical we insert zeros in missing spots
rstride = int(numpy.ceil(imshp_logical[1] / float(imshp[1])))
cstride = int(numpy.ceil(imshp_logical[2] / float(imshp[2])))
buf = numpy.zeros((bsize,)+ imshp_logical, dtype=img2d.dtype)
buf[:,:,::rstride, ::cstride] = img2d
img2d = buf
del buf, rstride, cstride
if kshp != kshp_logical:
rstride = int(numpy.ceil(kshp_logical[0] / float(kshp[0])))
cstride = int(numpy.ceil(kshp_logical[1] / float(kshp[1])))
buf = numpy.zeros((nkern,stacklen)+ self.kshp_logical, dtype=filtersflipped.dtype)
if self.kshp_logical_top_aligned:
roffset=coffset=0
else:
roffset=(kshp_logical[0] - (kshp[0]*rstride) - 1+rstride) % rstride
coffset=(kshp_logical[1] - (kshp[1]*cstride) - 1+cstride) % cstride
assert roffset >= 0
assert coffset >= 0
buf[:,:,roffset::rstride, coffset::cstride] = filtersflipped
filtersflipped = buf
del buf, rstride, cstride
for b in range(bsize):
for n in range(nkern):
zz[b,n,...].fill(0)
for im0 in range(stacklen):
zz[b,n,...] += _convolve2d(\
img2d[b,im0,...], filtersflipped[n,im0,...],1,val, bval, 0)
#We copy it to remove the Stride mismatch warning from DEBUG_MODE.
#The copy make that we return an object with the same stride as the c version.
#The copy don't affect the performence during our experience as in that case we
#execute the c version which is much faster.
if self.dx>1 or self.dy>1:
zz = zz[:,:,0::self.dx,0::self.dy].copy()
z[0]=zz
def grad(self, (inputs, kerns), (gz,)):
if self.imshp != self.imshp_logical or self.kshp != self.kshp_logical:
raise NotImplementedError('todo')
#if self.dx!=1 or self.dy!=1:
#raise Exception("ERROR: We disable ConvOp.grad now when dx!=1 or "\
#"dy!=1 as we think their is a high probability of bug in it."\
#"We need to raise the error on the gradient to .1!")
all_shape = self.imshp is not None and self.kshp is not None and \
self.nkern is not None and self.bsize is not None
if not all_shape and (self.dx!=1 or self.dy!=1):
raise Exception("ConvOp.grad when dx!=1 or dy!=1 we must have all "\
"the optional shape information")
####### Determine gradient on kernels ########
assert inputs.ndim==4 and kerns.ndim==4
newin = inputs.dimshuffle((1,0,2,3))
newgz = gz.dimshuffle((1,0,2,3))
(bsize, nkern) = None, None
imshp = None
kshp = None
un_p = self.unroll_patch
imshp_logical = None
if self.out_mode == 'valid':
(img, filters) = (newin, newgz)
kshp_logical = self.fulloutshp
kshp_logical_top_aligned=False
if all_shape:
(bsize, nkern) = (self.imshp[0], self.nkern)
imshp = (self.bsize, self.imshp[1], self.imshp[2])
kshp = self.outshp
un_b = self.unroll_batch
un_k = self.unroll_kern
elif self.out_mode == 'full':
(img, filters) = (newgz, newin)
kshp_logical = None
kshp_logical_top_aligned=True
if all_shape:
imshp_logical = (self.bsize, self.fulloutshp[0], self.fulloutshp[1])
(bsize, nkern) = (self.nkern, self.imshp[0])
imshp = (self.bsize, self.outshp[0], self.outshp[1])
kshp = self.imshp[1:]
un_b = self.unroll_kern
un_k = self.unroll_batch
else:
raise NotImplementedError('Only [full,valid] modes are currently supported.')
filters = filters[:,:,::-1,::-1] #flip them
#find good value for the unroll
if all_shape and un_b!=0 and bsize%un_b!=0:
if bsize<un_b:
un_b = bsize
else:
un_b = 1
_warn("OPTIMISATION WARNING: in ConvOp.grad() we can't determine "\
"a good unroll value for the batch. Maybe you can optimize this!")
if un_k!=0 and nkern%un_k!=0:
if nkern<un_k:
un_k = nkern
else:
un_k = 1
_warn("OPTIMISATION WARNING: in ConvOp.grad() we can't determine "\
"a good unroll value for the kernel. Maybe you can optimize this!")
dw = ConvOp(imshp, kshp, nkern, bsize, 1,1, output_mode='valid',
unroll_batch=un_b, unroll_kern=un_k, unroll_patch=un_p,
imshp_logical=imshp_logical,
kshp_logical=kshp_logical,
kshp_logical_top_aligned=kshp_logical_top_aligned,
version=self.version,
verbose=self.verbose)
if hasattr(self,'flops'):
dw.set_flops()
dw = dw(img,filters)
if all_shape:
assert (dw.owner.op.outshp==self.kshp).all()
if self.out_mode == 'valid':
# before DimShuffle, dw is of shape visdim x nkern x kshp[0] x kshp[1]
dw = dw.dimshuffle((1,0,2,3))
dw = dw[:,:,::-1,::-1]
####### Determine gradient on inputs ########
mode = 'valid'
if not self.out_mode == 'full':
mode = 'full'
filters = kerns.dimshuffle((1,0,2,3))
filters = filters[:,:,::-1,::-1]
nkern = None
imshp = None
imshp_logical = None
kshp = None
if all_shape:
nkern = self.imshp[0]
imshp = (self.nkern, self.outshp[0], self.outshp[1])
imshp_logical=(self.nkern, self.fulloutshp[0], self.fulloutshp[1])
din = ConvOp(imshp, self.kshp, nkern, self.bsize,
1,1, output_mode=mode,
unroll_batch=un_b, unroll_kern=un_k, unroll_patch=un_p,
imshp_logical=imshp_logical,
kshp_logical=None,
version=-1,#we we change the mode, we don't forward the version.
verbose=self.verbose)
if hasattr(self,'flops'):
din.set_flops()
din = din(gz,filters)
print 'self.imshp = ', self.imshp
print 'din.owner.op.outshp = ', din.owner.op.outshp
assert (din.owner.op.outshp is None and self.imshp is None) or \
(din.owner.op.outshp is None) or \
(din.owner.op.outshp==self.imshp[1:]).all()
return [din, dw]
def c_headers(self):
return ['<numpy/noprefix.h>', '<iostream>', '<sstream>' ]
def c_code_cache_version(self):
return (1)
def c_support_code(self):
return """
#define STRIDES(arr) ((arr)->strides)
#define FULL 2
#define SAME 1
#define VALID 0
#define MOD %
using namespace std;
""" + tensor.blas.blas_header_text()
def c_libraries(self):
return tensor.blas.ldflags()
def c_code(self, node, name, (img2d, filtersflipped), (z, ), sub):
if node.inputs[0].type.dtype != node.inputs[1].type.dtype:
raise NotImplementedError()
assert node.inputs[0].type.dtype == node.inputs[1].type.dtype
d=locals()
d.update(sub)
all_shape = self.imshp is not None and self.kshp is not None and \
self.nkern is not None and self.bsize is not None
d["self_out_mode"]=self.out_mode
d["self_dx"]=self.dx
d["self_dy"]=self.dy
d["mode"]=self.out_mode.upper()
d["mode"]=self.out_mode.upper()
d["affectation"]="="
if all_shape:
d["self_bsize"]=self.bsize
d["self_nkern"]=self.nkern
d["self_outshp0"]=self.outshp[0]
d["self_outshp1"]=self.outshp[1]
d["self_imshp0"]=self.imshp[0]
d["self_imshp1"]=self.imshp[1]
d["self_imshp2"]=self.imshp[2]
d["self_kshp0"]=self.kshp[0]
d["self_kshp1"]=self.kshp[1]
d["self_kshp_logical_r"] = self.kshp_logical[0]
d["self_kshp_logical_c"] = self.kshp_logical[1]
d["self_kshp_logical_stride_r"] = int(numpy.ceil(self.kshp_logical[0] / float(self.kshp[0])))
d["self_kshp_logical_stride_c"] = int(numpy.ceil(self.kshp_logical[1] / float(self.kshp[1])))
d["self_imshp_logical_r"] = self.imshp_logical[1] #numpy.B. 1 not 0
d["self_imshp_logical_c"] = self.imshp_logical[2]#numpy.B. 2 not 1
d["self_imshp_logical_stride_r"] = int(numpy.ceil(self.imshp_logical[1] / float(self.imshp[1])))
d["self_imshp_logical_stride_c"] = int(numpy.ceil(self.imshp_logical[2] / float(self.imshp[2])))
if not self.imshp[0]==1: d["affectation"]="+="
d["all_shape"]=1
d["dim_zz_const"]="const"
else:
d["self_bsize"]="%(img2d)s->dimensions[0]"%d
d["self_nkern"]="%(filtersflipped)s->dimensions[0]"%d
d["self_outshp0"]="-1"
d["self_outshp1"]="-1"
d["self_imshp0"]="%(img2d)s->dimensions[1]"%d
d["self_imshp1"]="%(img2d)s->dimensions[2]"%d
d["self_imshp2"]="%(img2d)s->dimensions[3]"%d
d["self_kshp0"]="%(filtersflipped)s->dimensions[2]"%d
d["self_kshp1"]="%(filtersflipped)s->dimensions[3]"%d
d["affectation"]="+="
d["all_shape"]=0
d["dim_zz_const"]=""
if self.kshp_logical_top_aligned:
d["self_kshp_logical_offset_r"] = 0
d["self_kshp_logical_offset_c"] = 0
elif self.imshp != self.imshp_logical or self.kshp != self.kshp_logical:
rstride = d["self_kshp_logical_stride_r"]
cstride = d["self_kshp_logical_stride_c"]
d["self_kshp_logical_offset_r"] = (self.kshp_logical[0] - (self.kshp[0]*rstride) - 1+rstride) % rstride
d["self_kshp_logical_offset_c"] = (self.kshp_logical[1] - (self.kshp[1]*cstride) - 1+cstride) % cstride
del rstride, cstride
if node.inputs[0].type.dtype=="float32": d["type"]="float"
elif node.inputs[0].type.dtype=="float64": d["type"]="double"
else: raise Exception("Type %s not implemented"%node.inputs[0].type.dtype)
d["gemm"]='dgemm_'
if not d["type"]=="double":d["gemm"]='sgemm_'
if self.imshp != self.imshp_logical or self.kshp != self.kshp_logical:
if self.verbose:
_debug("return imshp!=imshp_logical or self.kshp != self.kshp_logical shape version")
return _conv_op_code_a % d
if self.unroll_patch:
if self.verbose:
_debug("return unroll patch version. all_shape=", all_shape)
return _conv_op_code_unroll_patch%d
if self.unroll_batch>0 or self.unroll_kern>0:
if self.unroll_batch<=0: self.unroll_batch=1
if self.unroll_kern<=0: self.unroll_kern=1
if self.verbose:
_debug("return unrolled batch (%s) and kern code (%s)",
str(self.unroll_batch), str(self.unroll_kern))
return gen_conv_code_unroll_batch_kern(d, self.unroll_batch,
self.unroll_kern)
#TODO: should we choose the unroll size automatically with the bigger divisor under 5?
if self.out_mode == 'valid' and self.dx==0 and self.dy==0:
if self.verbose:
_debug("return gemm version")
return _conv_op_code_valid_gemm % d
else:
if self.verbose:
_debug("return no gemm version")
return _conv_op_code_a % d
_conv_op_code_a = """
const int mode=%(mode)s;
int typenum=0, typenum_f=0;
PyArrayObject *ain1=NULL, *ain2=NULL, *filtersflipped_arr=NULL, *img2d_arr=NULL;
const %(type)s fill_value = 0;
int type_im=PyArray_TYPE(%(img2d)s);
int type_ker=PyArray_TYPE(%(filtersflipped)s);
npy_intp dim_zz[2]={%(self_outshp0)s,%(self_outshp1)s};
npy_intp dim_im_phys[2]={%(self_imshp1)s,%(self_imshp2)s};
npy_intp dim_im_log[2]={%(self_imshp_logical_r)s,%(self_imshp_logical_c)s};
npy_intp dim_ker_phys[2]={%(self_kshp0)s,%(self_kshp1)s};
npy_intp dim_ker_log[2]={%(self_kshp_logical_r)s,%(self_kshp_logical_c)s};
PyArray_Dims img2d_shape;
npy_intp img2d_dim[4]={1,1,0,0};
img2d_shape.ptr=img2d_dim;
img2d_shape.len=4;
PyArray_Dims kerns_shape;
npy_intp kerns_dim[4]={1,1,0,0};
kerns_shape.ptr=kerns_dim;
kerns_shape.len=4;
PyObject *img2d=NULL, *contig, *filtersflipped=NULL;
if(%(img2d)s->nd==2){
img2d_dim[3]=%(img2d)s->dimensions[1];
img2d_dim[2]=%(img2d)s->dimensions[0];
}else if(%(img2d)s->nd==3){
img2d_dim[3]=%(img2d)s->dimensions[2];
img2d_dim[2]=%(img2d)s->dimensions[1];
img2d_dim[0]=%(img2d)s->dimensions[0];
}else if(%(img2d)s->nd==4){
img2d_dim[3]=%(img2d)s->dimensions[3];
img2d_dim[2]=%(img2d)s->dimensions[2];
img2d_dim[1]=%(img2d)s->dimensions[1];
img2d_dim[0]=%(img2d)s->dimensions[0];
}else {
PyErr_SetString(PyExc_ValueError, "img don't have a good shape");
%(fail)s;
}
if(%(filtersflipped)s->nd==3){
kerns_dim[3]=%(filtersflipped)s->dimensions[2];
kerns_dim[2]=%(filtersflipped)s->dimensions[1];
kerns_dim[0]=%(filtersflipped)s->dimensions[0];
}else if(%(filtersflipped)s->nd==4){
kerns_dim[3]=%(filtersflipped)s->dimensions[3];
kerns_dim[2]=%(filtersflipped)s->dimensions[2];
kerns_dim[1]=%(filtersflipped)s->dimensions[1];
kerns_dim[0]=%(filtersflipped)s->dimensions[0];
}else{
std:stringstream temp;
temp << "nddim="<<%(filtersflipped)s->nd;
std::string param = temp.str();
PyErr_SetString(PyExc_ValueError,
("kernel don't have a good shape. " + param).c_str());
%(fail)s;
}
img2d = PyArray_Newshape(%(img2d)s,&img2d_shape, PyArray_CORDER);
img2d_arr = (PyArrayObject*)img2d;
if ((img2d_arr->strides[3] != (npy_intp)sizeof(%(type)s))
|| (img2d_arr->strides[2] != img2d_arr->dimensions[3]*(npy_intp)sizeof(%(type)s))){
contig = (PyObject*)(PyArray_GETCONTIGUOUS((PyArrayObject*)img2d));
Py_DECREF(img2d);
img2d = contig;
if (!PyArray_ISCONTIGUOUS(img2d)){
PyErr_SetString(PyExc_ValueError, "img2d isn't contiguous");
%(fail)s;
}
}
img2d_arr = (PyArrayObject*)img2d;
filtersflipped = PyArray_Newshape(%(filtersflipped)s,&kerns_shape, PyArray_CORDER);
filtersflipped_arr = (PyArrayObject*)filtersflipped;
if ((filtersflipped_arr->strides[3] != (npy_intp)sizeof(%(type)s))
|| (filtersflipped_arr->strides[2] != filtersflipped_arr->dimensions[3]*(npy_intp)sizeof(%(type)s))){
contig = (PyObject*)(PyArray_GETCONTIGUOUS((PyArrayObject*)filtersflipped));
Py_DECREF(filtersflipped);
filtersflipped = contig;
if (!PyArray_ISCONTIGUOUS(filtersflipped)){
PyErr_SetString(PyExc_ValueError, "filtersflipped isn't contiguous");
%(fail)s;
}
}
filtersflipped_arr = (PyArrayObject*)filtersflipped;
if(mode != VALID && mode != FULL){
PyErr_SetString(PyExc_ValueError, "invalid mode, only full and valid are supported"); %(fail)s;
}
typenum = PyArray_ObjectType((PyObject*)%(img2d)s, 0);
typenum_f = PyArray_ObjectType((PyObject*)%(filtersflipped)s, 0);
if (typenum < 0) {PyErr_SetString(PyExc_ValueError, "Invalid type"); %(fail)s;}
if (typenum != typenum_f) {PyErr_SetString(PyExc_ValueError, "Input types must match"); %(fail)s;}
if (!img2d) %(fail)s;
if (!filtersflipped) %(fail)s;
if ((!%(z)s)
|| *PyArray_DIMS(%(z)s)!=4
||(%(z)s->dimensions[0] != %(self_bsize)s)
||(%(z)s->dimensions[1] != %(self_nkern)s)
||(%(z)s->dimensions[2] != dim_zz[0])
|| (%(z)s->dimensions[3] != dim_zz[1])
)
{
{Py_XDECREF(%(z)s);}
npy_intp dims[4] = {0,0,0,0};
dims[0]=%(self_bsize)s;
dims[1]=%(self_nkern)s;
dims[2]=dim_zz[0];
dims[3]=dim_zz[1];
%(z)s = (PyArrayObject*) PyArray_ZEROS(4, dims, typenum,0);
}else{
//PyArray_FILLWBYTE((PyObject*)%(z)s,0);
}
int Os[2];
Os[0]=%(self_outshp0)s;
Os[1]=%(self_outshp1)s;
for(int b=0;b< %(self_bsize)s;b++){
for(int n_kern=0;n_kern<%(self_nkern)s;n_kern++){
//assertions
if (%(z)s->strides[0] != %(z)s->dimensions[1] *%(z)s->dimensions[2] *%(z)s->dimensions[3] * (npy_intp)sizeof(%(type)s)) %(fail)s;
if (%(z)s->strides[1] != %(z)s->dimensions[2] * %(z)s->dimensions[3] * (npy_intp)sizeof(%(type)s)) %(fail)s;
if (%(z)s->strides[2] != %(z)s->dimensions[3] * (npy_intp)sizeof(%(type)s)) %(fail)s;
if (%(z)s->strides[3] != (npy_intp)sizeof(%(type)s)) %(fail)s;
%(type)s * __restrict__ out=(%(type)s *)(PyArray_GETPTR2(%(z)s,b,n_kern));
for (int i = 0; i < dim_zz[0]*dim_zz[1]; ++i) out[i] = 0;
for(int stack_size=0;stack_size<%(self_imshp0)s;stack_size++){
const %(type)s * __restrict__ in=(%(type)s *)(PyArray_GETPTR2(img2d,b,stack_size));
const %(type)s * __restrict__ hvals=(%(type)s *)(PyArray_GETPTR2(filtersflipped,n_kern,stack_size));
for (int iter_m=0; iter_m < Os[0]; iter_m++) {
/// Reposition index into input image based on requested output size
int pos_m = iter_m*%(self_dx)s; //row position in logical output image
int new_m; //row anchor in logical input image (we will loop upward from here)
if (mode == FULL) new_m = pos_m ;
else new_m = (pos_m+dim_ker_log[0]-1);
for (int iter_n=0; iter_n < Os[1]; iter_n++) { // loop over columns
int pos_n=iter_n*%(self_dy)s; // current col position in logical output image
%(type)s sum=0;
// Sum over kernel, if index into image is out of bounds
// fill with the value
for (int j_log=0; j_log < %(self_kshp_logical_r)s; j_log++) { // loop over logical rows in kernel
int ind0_log = (new_m-j_log); // ind0_log: row position in logical input image
if ((j_log < %(self_kshp_logical_offset_r)s) || (j_log - %(self_kshp_logical_offset_r)s) MOD %(self_kshp_logical_stride_r)s)
continue;
if (ind0_log MOD %(self_imshp_logical_stride_r)s)
continue;
int j_phys = ((j_log- %(self_kshp_logical_offset_r)s) / %(self_kshp_logical_stride_r)s);
int ind0_phys = (ind0_log / %(self_imshp_logical_stride_r)s);
//std::cerr <<"j_log" << j_log << " j_phys " << j_phys << " " << ind0_phys << "\\n";
if(mode==FULL){
const %(type)s * idx_hvals=&hvals[j_phys*dim_ker_phys[1]]; //This is a pointer to the current row of the kernel
if(ind0_log < 0 || ind0_log >= dim_im_log[0]){
// the current row of the kernel is off the image
}else{
int k = max((int)(pos_n-dim_im_log[1])+1,0);
int max_k=min(pos_n+1,(int)dim_ker_log[1]);
const %(type)s * idx_in=&in[ind0_phys*dim_im_phys[1]];
for (int ind1_log=pos_n-k; k<max_k; k++,ind1_log--) {
if (1)
{
if ((k < %(self_kshp_logical_offset_c)s) || (k - %(self_kshp_logical_offset_c)s) MOD %(self_kshp_logical_stride_c)s)
continue;
if (ind1_log MOD %(self_imshp_logical_stride_c)s)
continue;
}
sum+= idx_hvals[(k-%(self_kshp_logical_offset_c)s) / %(self_kshp_logical_stride_c)s] * idx_in[ind1_log / %(self_imshp_logical_stride_c)s];
}
}
}else{
const %(type)s* idx_in=&in[ind0_phys*dim_im_phys[1]]; //JB: should be dim_im[1] right? (was dim_im[0])
const %(type)s* idx_hvals=&hvals[j_phys*dim_ker_phys[1]];
int new_n = (pos_n+dim_ker_log[1]-1);
if (%(self_imshp_logical_stride_c)s != 1) // a general loop
{
for (int k=0,last=new_n; k < dim_ker_log[1]; k++,last--) {
if ((k < %(self_kshp_logical_offset_c)s) || (k - %(self_kshp_logical_offset_c)s) MOD %(self_kshp_logical_stride_c)s)
continue;
else if (last MOD %(self_imshp_logical_stride_c)s)
continue;
else
{
sum+=idx_hvals[(k-%(self_kshp_logical_offset_c)s) / %(self_kshp_logical_stride_c)s]*idx_in[last/%(self_imshp_logical_stride_c)s];
}
}
}
else // self_imshp_stride_c == 1
{
int offset = %(self_kshp_logical_offset_c)s;
int k_phys=0;
for (int k_log=offset,last=new_n-offset; k_log < dim_ker_log[1]; ) {
sum += idx_hvals[k_phys]*idx_in[last];
++k_phys;
last -= %(self_kshp_logical_stride_c)s;
k_log += %(self_kshp_logical_stride_c)s;
}
}
}
}//for j
out[iter_m*dim_zz[1]+iter_n] %(affectation)s sum;
}//for n
}//for m
}//for stack_size
if (0 && (mode==FULL)){
for (int i = 0; i < dim_zz[0]*dim_zz[1]; ++i)
std::cout << " " << out[i];
std::cout << "\\n";
}
}//for n_kern
}//for b
Py_XDECREF(img2d);
Py_XDECREF(filtersflipped);
"""
#########
######### ConvOp c_code for valid mode (uses gemm)
#########
_conv_op_code_valid_gemm = """
int typenum=0, typenum_f=0;
PyArrayObject *ain1=NULL, *ain2=NULL, *img2d_arr=NULL;
const int NKERN = %(self_nkern)s;
int type_im=PyArray_TYPE(%(img2d)s);
int type_ker=PyArray_TYPE(%(filtersflipped)s);
npy_intp dim_zz[2]={%(self_outshp0)s,%(self_outshp1)s};
npy_intp dim_im[2]={%(self_imshp1)s,%(self_imshp2)s};
npy_intp dim_ker[2]={%(self_kshp0)s,%(self_kshp1)s};
PyArray_Dims img2d_shape;
npy_intp img2d_dim[4]={1,1,0,0};
img2d_shape.ptr=img2d_dim;
img2d_shape.len=4;
PyArray_Dims kerns_shape;
npy_intp kerns_dim[4]={1,1,0,0};
kerns_shape.ptr=kerns_dim;
kerns_shape.len=4;
PyObject *img2d=NULL, *contig;
if(%(img2d)s->nd==2){
img2d_dim[3]=%(img2d)s->dimensions[1];
img2d_dim[2]=%(img2d)s->dimensions[0];
}else if(%(img2d)s->nd==3){
img2d_dim[3]=%(img2d)s->dimensions[2];
img2d_dim[2]=%(img2d)s->dimensions[1];
img2d_dim[0]=%(img2d)s->dimensions[0];
}else if(%(img2d)s->nd==4){
img2d_dim[3]=%(img2d)s->dimensions[3];
img2d_dim[2]=%(img2d)s->dimensions[2];
img2d_dim[1]=%(img2d)s->dimensions[1];
img2d_dim[0]=%(img2d)s->dimensions[0];
}else {
PyErr_SetString(PyExc_ValueError, "img don't have a good shape");
%(fail)s;
}
if(%(filtersflipped)s->nd==3){
kerns_dim[3]=%(filtersflipped)s->dimensions[2];
kerns_dim[2]=%(filtersflipped)s->dimensions[1];
kerns_dim[0]=%(filtersflipped)s->dimensions[0];
}else if(%(filtersflipped)s->nd==4){
kerns_dim[3]=%(filtersflipped)s->dimensions[3];
kerns_dim[2]=%(filtersflipped)s->dimensions[2];
kerns_dim[1]=%(filtersflipped)s->dimensions[1];
kerns_dim[0]=%(filtersflipped)s->dimensions[0];
}else{
std:stringstream temp;
temp << "nddim="<<%(filtersflipped)s->nd;
std::string param = temp.str();
PyErr_SetString(PyExc_ValueError,
("kernel don't have a good shape. " + param).c_str());
%(fail)s;
}
if (NKERN != kerns_dim[0])
{
PyErr_SetString(PyExc_NotImplementedError, "nonsense nkern");
%(fail)s;
}
img2d = PyArray_Newshape(%(img2d)s,&img2d_shape, PyArray_CORDER);
img2d_arr = (PyArrayObject*)img2d;
if ((img2d_arr->strides[3] != (npy_intp)sizeof(%(type)s))
|| (img2d_arr->strides[2] != img2d_arr->dimensions[3]*(npy_intp)sizeof(%(type)s))){
contig = (PyObject*)(PyArray_GETCONTIGUOUS((PyArrayObject*)img2d));
Py_DECREF(img2d);
img2d = contig;
if (!PyArray_ISCONTIGUOUS(img2d)){
PyErr_SetString(PyExc_ValueError, "img2d isn't contiguous");
%(fail)s;
}
}
img2d_arr = (PyArrayObject*)img2d;
typenum = PyArray_ObjectType((PyObject*)%(img2d)s, 0);
typenum_f = PyArray_ObjectType((PyObject*)%(filtersflipped)s, 0);
if (typenum < 0) {PyErr_SetString(PyExc_ValueError, "Invalid type"); %(fail)s;}
if (typenum != typenum_f) {PyErr_SetString(PyExc_ValueError, "Input types must match"); %(fail)s;}
if (!img2d) {
PyErr_SetString(PyExc_ValueError, "Null argument img2d");
%(fail)s;
}
if ((!%(z)s)
|| *PyArray_DIMS(%(z)s)!=4
||(%(z)s->dimensions[0] != %(self_bsize)s)
||(%(z)s->dimensions[1] != %(self_nkern)s)
||(%(z)s->dimensions[2] != dim_zz[0])
|| (%(z)s->dimensions[3] != dim_zz[1])
)
{
{Py_XDECREF(%(z)s);}
npy_intp dims[4] = {0,0,0,0};
dims[0]=%(self_bsize)s;
dims[1]=%(self_nkern)s;
dims[2]=dim_zz[0];
dims[3]=dim_zz[1];
%(z)s = (PyArrayObject*) PyArray_ZEROS(4, dims, typenum,0);
}else{
PyArray_FILLWBYTE((PyObject*)%(z)s,0);
}
int Os[2];
Os[0] = dim_im[0]-dim_ker[0]+1;
Os[1] = dim_im[1]-dim_ker[1]+1;
// allocate a temporary buffer for storing the inner product of each nth kernel row
// with each row of an image
{
%(type)s * kbuf = (%(type)s *)malloc((Os[0] * NKERN + PyArray_Size((PyObject*)%(filtersflipped)s))* (npy_intp)sizeof(%(type)s));
int kbufstride = NKERN;
%(type)s * myfilters = kbuf + Os[0] * NKERN;
//copy out filtersflipped into filters un-flipped format
//std::cerr << "__filling myfilters__\\n";
for(int i=0;i < kerns_dim[0];++i){
for(int j=0;j < kerns_dim[1];++j){
for(int k=0;k < kerns_dim[2];++k){
for(int l=0;l < kerns_dim[3];++l){
%(type)s * ff = ((%(filtersflipped)s)->nd == 3)
? (%(type)s *)PyArray_GETPTR3(%(filtersflipped)s, i, kerns_dim[2]-1-k, kerns_dim[3]-1-l)
: (%(type)s *)PyArray_GETPTR4(%(filtersflipped)s, i, j, kerns_dim[2]-1-k, kerns_dim[3]-1-l);
myfilters[i * (kerns_dim[1]*kerns_dim[2]*kerns_dim[3])
+ j * (kerns_dim[2]*kerns_dim[3])
+ k * (kerns_dim[3])
+ l] = ff[0];
//std::cerr << " " << ff[0];
}
//std::cerr << "\\n";
}
//std::cerr << "(end of stack/batch " <<j << "/" << i << " ) \\n";
}
}
//std::cerr << "-----new loop ----\\n";
for(int b=0;b< %(self_bsize)s;b++){
for (int img_col = 0; img_col < Os[1]; ++img_col){
for (int filter_row = 0; filter_row < kerns_dim[2]; ++filter_row){
for (int stackidx = 0; stackidx < %(self_imshp0)s; ++stackidx){
%(type)s * img_colview =
(%(type)s *)(PyArray_GETPTR4(img2d, b, stackidx, filter_row, img_col));
%(type)s * filter_rows = myfilters + stackidx * (kerns_dim[2]*kerns_dim[3]) +
filter_row * kerns_dim[3];
//std::cerr << "filterview offset: " << filter_rows - myfilters << "\\n";
char N = 'N'; char T = 'T';
int Nz0 = Os[0];
int Nz1 = NKERN;
int K = kerns_dim[3];
%(type)s alpha = 1.0;
%(type)s beta = stackidx ? 1.0 : 0.0;
int imgview_stride = dim_im[1];
int filter_rows_stride =kerns_dim[1]*kerns_dim[2]*kerns_dim[3];
//remember, Fortran wants a column-major interpretation
assert(img2d->strides[3] == (npy_intp)sizeof(%(type)s));
if (0){
std::cerr << "b " << b << " img_col " << img_col << " filterrow " << filter_row << " stackidx " <<stackidx << "\\n";
std::cerr << "colview (physical layout) stride: " << imgview_stride << "\\n";
for (int ii = 0; ii < Nz0; ++ii){
for (int jj = 0; jj < K; ++jj){
std::cerr << " " << img_colview[ii * imgview_stride + jj];
}
std::cerr << "\\n";
}
std::cerr << "filterview ("<<filter_row<<"'th rows) stride: " << filter_rows_stride << "\\n";
for (int ii = 0; ii < Nz1; ++ii){
for (int jj = 0; jj < K; ++jj){
std::cerr << " " << filter_rows[ii * filter_rows_stride + jj];
}
std::cerr << "\\n";
}
std::cerr << Nz1 << " " << Nz0 << " " << K << "\\n" ;
}
%(gemm)s(&T, &N,
&Nz1, &Nz0, &K,
&alpha,
filter_rows, &filter_rows_stride,
img_colview, &imgview_stride,
&beta, kbuf, &kbufstride);
if (0){
std::cerr << "z (logical layout) beta" << beta << "\\n";
for (int ii = 0; ii < Nz0; ++ii){
for (int jj = 0; jj < Nz1; ++jj){
std::cerr << " " << kbuf[ii * kbufstride + jj];
}
std::cerr << "\\n";
}
}
}
// now kbuf the sum over the stack, put it into the outbuf
for (int img_row = 0; img_row < Os[0]; ++img_row) {
for (int kernel_idx = 0; kernel_idx < NKERN; ++kernel_idx) {
%(type)s * z_p = (%(type)s *)PyArray_GETPTR4(%(z)s, b, kernel_idx, img_row, img_col);
if (0)
{
if (b >= %(z)s->dimensions[0]) %(fail)s;
if (kernel_idx >= %(z)s->dimensions[1]) %(fail)s;
if (img_row >= %(z)s->dimensions[2]) %(fail)s;
if (img_col >= %(z)s->dimensions[3]) %(fail)s;
}
z_p[0] += kbuf[img_row * kbufstride + kernel_idx];
}
}
}
}
}
free(kbuf);
}
Py_XDECREF(img2d);
"""
def gen_conv_code_unroll_batch_kern(d,unroll_bsize=1, unroll_ksize=1):
""" c_code for ConvOp that unroll the batch size loop
"""
assert unroll_bsize>0 and unroll_ksize>0
if d.has_key("unroll_bsize") or d.has_key("unroll_ksize") or d.has_key("unroll_iter") or d.has_key("unroll_biter") or d.has_key("unroll_kiter"):
raise Exception("We can't use this dictionnary as we will overwrite some of its containt")
d=d.copy()
d["unroll_bsize"]=unroll_bsize
d["unroll_ksize"]=unroll_ksize
def my_dup(st,size):
s=""
for i in range(size):
d["unroll_iter"]=i
s+=st%d
return s+"\n"
def my_dup2(st):
s=""
iter=0
for i in range(unroll_bsize):
d["unroll_biter"]=i
for j in range(unroll_ksize):
d["unroll_kiter"]=j
d["unroll_iter"]=iter
iter+=1
s+=st%d
return s+"\n"
ret = """
const int mode=%(mode)s;
int typenum=0, typenum_f=0;
PyArrayObject *ain1=NULL, *ain2=NULL, *filtersflipped_arr=NULL, *img2d_arr=NULL;
const %(type)s fill_value = 0;
int type_im=PyArray_TYPE(%(img2d)s);
int type_ker=PyArray_TYPE(%(filtersflipped)s);
npy_intp dim_zz[2]={%(self_outshp0)s,%(self_outshp1)s};
npy_intp dim_im[2]={%(self_imshp1)s,%(self_imshp2)s};
npy_intp dim_ker[2]={%(self_kshp0)s,%(self_kshp1)s};
PyArray_Dims img2d_shape;
npy_intp img2d_dim[4]={1,1,0,0};
img2d_shape.ptr=img2d_dim;
img2d_shape.len=4;
PyArray_Dims kerns_shape;
npy_intp kerns_dim[4]={1,1,0,0};
kerns_shape.ptr=kerns_dim;
kerns_shape.len=4;
PyObject *img2d=NULL, *contig, *filtersflipped=NULL;
if(%(img2d)s->nd==2){
img2d_dim[3]=%(img2d)s->dimensions[1];
img2d_dim[2]=%(img2d)s->dimensions[0];
}else if(%(img2d)s->nd==3){
img2d_dim[3]=%(img2d)s->dimensions[2];
img2d_dim[2]=%(img2d)s->dimensions[1];
img2d_dim[0]=%(img2d)s->dimensions[0];
}else if(%(img2d)s->nd==4){
img2d_dim[3]=%(img2d)s->dimensions[3];
img2d_dim[2]=%(img2d)s->dimensions[2];
img2d_dim[1]=%(img2d)s->dimensions[1];
img2d_dim[0]=%(img2d)s->dimensions[0];
}else {
std:stringstream temp;
temp << "nddim="<<%(img2d)s->nd;
std::string param = temp.str();
PyErr_SetString(PyExc_ValueError,
("img don't have a good shape. " + param).c_str());
%(fail)s;
}
if(%(filtersflipped)s->nd==3){
kerns_dim[3]=%(filtersflipped)s->dimensions[2];
kerns_dim[2]=%(filtersflipped)s->dimensions[1];
kerns_dim[0]=%(filtersflipped)s->dimensions[0];
}else if(%(filtersflipped)s->nd==4){
kerns_dim[3]=%(filtersflipped)s->dimensions[3];
kerns_dim[2]=%(filtersflipped)s->dimensions[2];
kerns_dim[1]=%(filtersflipped)s->dimensions[1];
kerns_dim[0]=%(filtersflipped)s->dimensions[0];
}else{
PyErr_SetString(PyExc_ValueError, "kernel don't have a good shape");
%(fail)s;
}
img2d = PyArray_Newshape(%(img2d)s,&img2d_shape, PyArray_CORDER);
img2d_arr = (PyArrayObject*)img2d;
if ((img2d_arr->strides[3] != (npy_intp)sizeof(%(type)s))
|| (img2d_arr->strides[2] != img2d_arr->dimensions[3]*(npy_intp)sizeof(%(type)s))){
contig = (PyObject*)(PyArray_GETCONTIGUOUS((PyArrayObject*)img2d));
Py_DECREF(img2d);
img2d = contig;
if (!PyArray_ISCONTIGUOUS(img2d)){
PyErr_SetString(PyExc_ValueError, "img2d isn't contiguous");
%(fail)s;
}
}
img2d_arr = (PyArrayObject*)img2d;
filtersflipped = PyArray_Newshape(%(filtersflipped)s,&kerns_shape, PyArray_CORDER);
filtersflipped_arr = (PyArrayObject*)filtersflipped;
if ((filtersflipped_arr->strides[3] != (npy_intp)sizeof(%(type)s))
|| (filtersflipped_arr->strides[2] != filtersflipped_arr->dimensions[3]*(npy_intp)sizeof(%(type)s))){
contig = (PyObject*)(PyArray_GETCONTIGUOUS((PyArrayObject*)filtersflipped));
Py_DECREF(filtersflipped);
filtersflipped = contig;
if (!PyArray_ISCONTIGUOUS(filtersflipped)){
PyErr_SetString(PyExc_ValueError, "filtersflipped isn't contiguous");
%(fail)s;
}
}
filtersflipped_arr = (PyArrayObject*)filtersflipped;
if(mode != VALID && mode != FULL){
PyErr_SetString(PyExc_ValueError, "invalid mode, only full and valid are supported"); %(fail)s;
}
typenum = PyArray_ObjectType((PyObject*)%(img2d)s, 0);
typenum_f = PyArray_ObjectType((PyObject*)%(filtersflipped)s, 0);
if (typenum < 0) {PyErr_SetString(PyExc_ValueError, "Invalid type"); %(fail)s;}
if (typenum != typenum_f) {PyErr_SetString(PyExc_ValueError, "Input types must match"); %(fail)s;}
if (!img2d) %(fail)s;
if (!filtersflipped) %(fail)s;
if ((!%(z)s)
|| *PyArray_DIMS(%(z)s)!=4
||(%(z)s->dimensions[0] != %(self_bsize)s)
||(%(z)s->dimensions[1] != %(self_nkern)s)
||(%(z)s->dimensions[2] != dim_zz[0])
|| (%(z)s->dimensions[3] != dim_zz[1])
)
{
{Py_XDECREF(%(z)s);}
npy_intp dims[4] = {0,0,0,0};
dims[0]=%(self_bsize)s;
dims[1]=%(self_nkern)s;
dims[2]=dim_zz[0];
dims[3]=dim_zz[1];
%(z)s = (PyArrayObject*) PyArray_ZEROS(4, dims, typenum,0);
}else{
//PyArray_FILLWBYTE((PyObject*)%(z)s,0);
}
int Os[2];
Os[0]=%(self_outshp0)s;
Os[1]=%(self_outshp1)s;
for(int b=0;b< %(self_bsize)s ;b+=%(unroll_bsize)s){
for(int n_kern=0;n_kern<%(self_nkern)s;n_kern+=%(unroll_ksize)s){
//assertions
if (%(z)s->strides[0] != %(z)s->dimensions[1] *%(z)s->dimensions[2] *%(z)s->dimensions[3] * (npy_intp)sizeof(%(type)s)) %(fail)s;
if (%(z)s->strides[1] != %(z)s->dimensions[2] * %(z)s->dimensions[3] * (npy_intp)sizeof(%(type)s)) %(fail)s;
if (%(z)s->strides[2] != %(z)s->dimensions[3] * (npy_intp)sizeof(%(type)s)) %(fail)s;
if (%(z)s->strides[3] != (npy_intp)sizeof(%(type)s)) %(fail)s;
"""%d
ret+=my_dup2("%(type)s * __restrict__ out%(unroll_iter)s=(%(type)s *)(PyArray_GETPTR2(%(z)s,b+%(unroll_biter)s,n_kern+%(unroll_kiter)s));")
ret+=my_dup("for (int i = 0; i < dim_zz[0]*dim_zz[1]; ++i) out%(unroll_iter)s[i] = 0;",unroll_bsize*unroll_ksize)
ret+="""
for(int stack_size=0;stack_size<%(self_imshp0)s;stack_size++){
"""%d
ret+=my_dup("const %(type)s * __restrict__ in%(unroll_iter)d=(%(type)s *)(PyArray_GETPTR2(img2d,b+%(unroll_iter)s,stack_size));", unroll_bsize)
ret+=my_dup("const %(type)s * __restrict__ hvals%(unroll_iter)s=(%(type)s *)(PyArray_GETPTR2(filtersflipped,n_kern+%(unroll_iter)s,stack_size));",unroll_ksize)
ret+="""
int new_m;
for (int iter_m=0; iter_m < Os[0]; iter_m++) {
// Reposition index into input image based on requested output size
int pos_m = iter_m*%(self_dx)s;//The position of the patch in the image
if (mode == FULL) new_m = pos_m ;
else new_m = (pos_m+dim_ker[0]-1);
for (int iter_n=0; iter_n < Os[1]; iter_n++) { // loop over columns
int pos_n=iter_n*%(self_dy)s;
"""%d
ret+=my_dup("%(type)s sum%(unroll_iter)s=0;", unroll_bsize*unroll_ksize)
ret+="""
// Sum over kernel, if index into image is out of bounds
// fill with the value
for (int j=0; j < dim_ker[0]; j++) {
int ind0 = (new_m-j);
if(mode==FULL){
"""%d
ret+=my_dup("const %(type)s * idx_hvals%(unroll_iter)s=&hvals%(unroll_iter)s[j*dim_ker[1]];",unroll_ksize)
ret+="""
if(ind0 < 0 || ind0 >= dim_im[0]){
if(fill_value!=0)
for (int k=0; k < dim_ker[1]; k++) {
"""%d
ret+=my_dup2("sum%(unroll_iter)s += idx_hvals%(unroll_kiter)s[k] * fill_value;")
ret+="""
}
}else{
//do the part where kernel is to the right of the img
int k=0,max_k=max((int)(pos_n-dim_im[1])+1,0);
if(fill_value!=0){
for(k=0;k<max_k;k++){
"""%d
ret+=my_dup2("sum%(unroll_iter)s += idx_hvals%(unroll_kiter)s[k] * fill_value;")
ret+="""
}
}else {k=max_k;}
//do the part where the kernel is on the img
max_k=min(pos_n+1,(int)dim_ker[1]);
"""%d
ret+=my_dup("const %(type)s * idx_in%(unroll_iter)s=&in%(unroll_iter)s[ind0*dim_im[1]];", unroll_bsize)
ret+="""
for (int ind1=pos_n-k; k<max_k; k++,ind1--) {
"""%d
ret+=my_dup2("sum%(unroll_iter)s+= idx_hvals%(unroll_kiter)s[k] * idx_in%(unroll_biter)s[ind1];")
ret+="""
}
//do the part to the left of the img
if(fill_value!=0)
for(;k<dim_ker[1];k++){
"""%d
ret+=my_dup2("sum%(unroll_iter)s += idx_hvals%(unroll_kiter)s[k] * fill_value;")
ret+="""
}
}
}else{//valid mode
"""%d
ret+=my_dup("const %(type)s* idx_in%(unroll_iter)s=&in%(unroll_iter)s[ind0*dim_im[1]];", unroll_bsize)
ret+=my_dup("const %(type)s* idx_hvals%(unroll_iter)s=&hvals%(unroll_iter)s[j*dim_ker[1]];",unroll_ksize)
ret+="""
int new_n = (pos_n+dim_ker[1]-1);
for (int k=0,last=new_n; k < dim_ker[1]; k++,last--) {
"""%d
ret+=my_dup2("sum%(unroll_iter)s+=idx_hvals%(unroll_kiter)s[k]*idx_in%(unroll_biter)s[last];")
ret+="""
}
}
}//for j
"""%d
ret+=my_dup("out%(unroll_iter)s[iter_m*dim_zz[1]+iter_n] %(affectation)s sum%(unroll_iter)s;", unroll_bsize*unroll_ksize)
ret+="""
}//for n
}//for m
}//for stack_size
}//for n_kern
}//for b
Py_XDECREF(img2d);
Py_XDECREF(filtersflipped);
"""
return ret
_conv_op_code_unroll_patch = """
const int mode=%(mode)s;
int typenum=0, typenum_f=0;
PyArrayObject *ain1=NULL, *ain2=NULL, *filtersflipped_arr=NULL, *img2d_arr=NULL;
const %(type)s fill_value = 0;//only value of 0 are currently tested and correctly implemented
int type_im=PyArray_TYPE(%(img2d)s);
int type_ker=PyArray_TYPE(%(filtersflipped)s);
const npy_intp dim_im[2]={%(self_imshp1)s,%(self_imshp2)s};
const npy_intp dim_ker[2]={%(self_kshp0)s,%(self_kshp1)s};
%(dim_zz_const)s npy_intp dim_zz[2]={%(self_outshp0)s,%(self_outshp1)s};
#if !%(all_shape)s
if (mode == FULL) {
dim_zz[0] = (int)ceil((dim_im[0]+dim_ker[0]-1)/float(%(self_dx)s));
dim_zz[1] = (int)ceil((dim_im[1]+dim_ker[1]-1)/float(%(self_dy)s));
} else {
dim_zz[0] = (int)ceil((dim_im[0]-dim_ker[0]+1)/float(%(self_dx)s));
dim_zz[1] = (int)ceil((dim_im[1]-dim_ker[1]+1)/float(%(self_dy)s));
}
#endif
PyArray_Dims img2d_shape;
npy_intp img2d_dim[4]={1,1,0,0};
img2d_shape.ptr=img2d_dim;
img2d_shape.len=4;
PyArray_Dims kerns_shape;
npy_intp kerns_dim[4]={1,1,0,0};
kerns_shape.ptr=kerns_dim;
kerns_shape.len=4;
PyObject *img2d=NULL, *contig, *filtersflipped=NULL;
if(%(img2d)s->nd==2){
img2d_dim[3]=%(img2d)s->dimensions[1];
img2d_dim[2]=%(img2d)s->dimensions[0];
}else if(%(img2d)s->nd==3){
img2d_dim[3]=%(img2d)s->dimensions[2];
img2d_dim[2]=%(img2d)s->dimensions[1];
img2d_dim[0]=%(img2d)s->dimensions[0];
}else if(%(img2d)s->nd==4){
img2d_dim[3]=%(img2d)s->dimensions[3];
img2d_dim[2]=%(img2d)s->dimensions[2];
img2d_dim[1]=%(img2d)s->dimensions[1];
img2d_dim[0]=%(img2d)s->dimensions[0];
}else {
PyErr_Format(PyExc_ValueError,
"image don't have a good number of dimensions %%d. ", %(filtersflipped)s->nd);
%(fail)s;
}
if(%(filtersflipped)s->nd==3){
kerns_dim[3]=%(filtersflipped)s->dimensions[2];
kerns_dim[2]=%(filtersflipped)s->dimensions[1];
kerns_dim[0]=%(filtersflipped)s->dimensions[0];
}else if(%(filtersflipped)s->nd==4){
kerns_dim[3]=%(filtersflipped)s->dimensions[3];
kerns_dim[2]=%(filtersflipped)s->dimensions[2];
kerns_dim[1]=%(filtersflipped)s->dimensions[1];
kerns_dim[0]=%(filtersflipped)s->dimensions[0];
}else{
PyErr_Format(PyExc_ValueError,
"kernel don't have a good number of dimensions %%d. ", %(filtersflipped)s->nd);
%(fail)s;
}
img2d = PyArray_Newshape(%(img2d)s,&img2d_shape, PyArray_CORDER);
img2d_arr = (PyArrayObject*)img2d;
if ((img2d_arr->strides[3] != sizeof(%(type)s))
|| (img2d_arr->strides[2] != img2d_arr->dimensions[3]*sizeof(%(type)s))){
contig = (PyObject*)(PyArray_GETCONTIGUOUS((PyArrayObject*)img2d));
Py_DECREF(img2d);
img2d = contig;
if (!PyArray_ISCONTIGUOUS(img2d)){
PyErr_SetString(PyExc_ValueError, "img2d isn't contiguous");
%(fail)s;
}
}
img2d_arr = (PyArrayObject*)img2d;
filtersflipped = PyArray_Newshape(%(filtersflipped)s,&kerns_shape, PyArray_CORDER);
filtersflipped_arr = (PyArrayObject*)filtersflipped;
if ((filtersflipped_arr->strides[3] != sizeof(%(type)s))
|| (filtersflipped_arr->strides[2] != filtersflipped_arr->dimensions[3]*sizeof(%(type)s))){
contig = (PyObject*)(PyArray_GETCONTIGUOUS((PyArrayObject*)filtersflipped));
Py_DECREF(filtersflipped);
filtersflipped = contig;
if (!PyArray_ISCONTIGUOUS(filtersflipped)){
PyErr_SetString(PyExc_ValueError, "filtersflipped isn't contiguous");
%(fail)s;
}
}
filtersflipped_arr = (PyArrayObject*)filtersflipped;
if(mode != VALID && mode != FULL){
PyErr_SetString(PyExc_ValueError, "invalid mode, only full and valid are supported"); %(fail)s;
}
if(dim_zz[0]<=0 || dim_zz[1]<=0){
PyErr_Format(PyExc_ValueError,
"Output dimensions are not valid %%dx%%d",dim_zz[0],dim_zz[1]);
%(fail)s;
}
typenum = PyArray_ObjectType((PyObject*)%(img2d)s, 0);
typenum_f = PyArray_ObjectType((PyObject*)%(filtersflipped)s, 0);
if (typenum < 0) {PyErr_SetString(PyExc_ValueError, "Invalid type"); %(fail)s;}
if (typenum != typenum_f) {PyErr_SetString(PyExc_ValueError, "Input types must match"); %(fail)s;}
if (!img2d) %(fail)s;
if (!filtersflipped) %(fail)s;
if ((!%(z)s)
|| *PyArray_DIMS(%(z)s)!=4
||(%(z)s->dimensions[0] != %(self_bsize)s)
||(%(z)s->dimensions[1] != %(self_nkern)s)
||(%(z)s->dimensions[2] != dim_zz[0])
|| (%(z)s->dimensions[3] != dim_zz[1])
)
{
if (%(z)s) Py_DECREF(%(z)s);
npy_intp dims[4] = {0,0,0,0};
if(!dims) %(fail)s;
dims[0]=%(self_bsize)s;
dims[1]=%(self_nkern)s;
dims[2]=dim_zz[0];
dims[3]=dim_zz[1];
%(z)s = (PyArrayObject*) PyArray_ZEROS(4, dims, typenum,0);
}else{
//PyArray_FILLWBYTE((PyObject*)%(z)s,0);
}
for(int b=0;b< %(self_bsize)s;b++){
for(int n_kern=0;n_kern<%(self_nkern)s;n_kern++){
//assertions
if (%(z)s->strides[0] != %(z)s->dimensions[1] *%(z)s->dimensions[2] *%(z)s->dimensions[3] * sizeof(%(type)s)) %(fail)s;
if (%(z)s->strides[1] != %(z)s->dimensions[2] * %(z)s->dimensions[3] * sizeof(%(type)s)) %(fail)s;
if (%(z)s->strides[2] != %(z)s->dimensions[3] * sizeof(%(type)s)) %(fail)s;
if (%(z)s->strides[3] != sizeof(%(type)s)) %(fail)s;
%(type)s * __restrict__ out=(%(type)s *)(PyArray_GETPTR2(%(z)s,b,n_kern));
for (int i = 0; i < dim_zz[0]*dim_zz[1]; ++i) out[i] = 0;
for(int stack_size=0;stack_size<%(self_imshp0)s;stack_size++){
const %(type)s * __restrict__ in=(%(type)s *)(PyArray_GETPTR2(img2d,b,stack_size));
const %(type)s * __restrict__ hvals=(%(type)s *)(PyArray_GETPTR2(filtersflipped,n_kern,stack_size));
int new_m;
for (int iter_m=0; iter_m < dim_zz[0]; iter_m++) {
// Reposition index into input image based on requested output size
int pos_m = iter_m*%(self_dx)s;//The position of the patch in the image
if (mode == FULL) new_m = pos_m ;
else new_m = (pos_m+dim_ker[0]-1);
for (int iter_n=0; iter_n < dim_zz[1]; iter_n++) { // loop over columns
int pos_n=iter_n*%(self_dy)s;
%(type)s sum=0;
%(type)s sum2=0;
%(type)s sum3=0;
%(type)s sum4=0;
int nb_sum=0;
// Sum over kernel, if index into image is out of bounds
// fill with the value
for (int j=0; j < dim_ker[0]; j++) {
int ind0 = (new_m-j);
if(mode==FULL){
const %(type)s * idx_hvals=&hvals[j*dim_ker[1]];
if(ind0 < 0 || ind0 >= dim_im[0]){
if(fill_value!=0)
for (int k=0; k < dim_ker[1]; k++) {
sum+= idx_hvals[k] * fill_value;
}
}else{
//do the part where kernel is to the right of the img
int k=0,max_k=max((int)(pos_n-dim_im[1])+1,0);
if(fill_value!=0){
for(k=0;k<max_k;k++){
sum+= idx_hvals[k]*fill_value;
}
}else {k=max_k;}
//do the part where the kernel is on the img
max_k=min(pos_n+1,(int)dim_ker[1]);
const %(type)s * idx_in=&in[ind0*dim_im[1]];
if(iter_n + 4*%(self_dy)s < dim_zz[1]
&& iter_n>dim_ker[1]-1+3
&& iter_n<dim_im[1]-dim_ker[1]+1-3){
nb_sum=4;
for (int ind1=pos_n-k; k<max_k; k++,ind1--) {
sum+=idx_hvals[k]*idx_in[ind1];
sum2+=idx_hvals[k]*idx_in[ind1+%(self_dy)s];
sum3+=idx_hvals[k]*idx_in[ind1+2*%(self_dy)s];
sum4+=idx_hvals[k]*idx_in[ind1+3*%(self_dy)s];
}
}else if(iter_n + 2*%(self_dy)s < dim_zz[1]
&& iter_n>dim_ker[1]-1
&& iter_n<dim_im[1]-dim_ker[1]+1){
nb_sum=2;
for (int ind1=pos_n-k; k<max_k; k++,ind1--) {
sum+=idx_hvals[k]*idx_in[ind1];
sum2+=idx_hvals[k]*idx_in[ind1+%(self_dy)s];
}
}else{
nb_sum=1;
/*
%(type)s sum_=0;
if((k-max_k) & 0x1 != 0){
sum+= idx_hvals[k] * idx_in[pos_n-k];
}
for (int ind1=pos_n-k; k<max_k; k+=2,ind1-=2) {
sum+= idx_hvals[k] * idx_in[ind1];
sum_+= idx_hvals[k+1] * idx_in[ind1-1];
}
sum+=sum_;
*/
for (int ind1=pos_n-k; k<max_k; k++,ind1--) {
sum+=idx_hvals[k]*idx_in[ind1];
}
}
//do the part to the left of the img
if(fill_value!=0)
for(;k<dim_ker[1];k++) sum+= idx_hvals[k]*fill_value;
}
}else{//valid mode
const %(type)s* idx_in=&in[ind0*dim_im[1]];
const %(type)s* idx_hvals=&hvals[j*dim_ker[1]];
if(iter_n + 4*%(self_dy)s < dim_zz[1]){
nb_sum=4;
for (int k=dim_ker[1]-1,im_idx=pos_n; k >=0; k--,im_idx++) {
sum+=idx_hvals[k]*idx_in[im_idx];
sum2+=idx_hvals[k]*idx_in[im_idx+%(self_dy)s];
sum3+=idx_hvals[k]*idx_in[im_idx+2*%(self_dy)s];
sum4+=idx_hvals[k]*idx_in[im_idx+3*%(self_dy)s];
}
}else if(iter_n + 2*%(self_dy)s < dim_zz[1]){
nb_sum=2;
for (int k=dim_ker[1]-1,im_idx=pos_n; k >=0; k--,im_idx++) {
sum+=idx_hvals[k]*idx_in[im_idx];
sum2+=idx_hvals[k]*idx_in[im_idx+%(self_dy)s];
}
}else{
nb_sum=1;
for (int k=dim_ker[1]-1,im_idx=pos_n; k >=0; k--,im_idx++) {
sum+=idx_hvals[k]*idx_in[im_idx];
}
}
}//else valid mode
}//for j
switch(nb_sum){
case 4: out[iter_m*dim_zz[1]+iter_n+3] %(affectation)s sum4;
case 3: out[iter_m*dim_zz[1]+iter_n+2] %(affectation)s sum3;
case 2: out[iter_m*dim_zz[1]+iter_n+1] %(affectation)s sum2;
case 1: out[iter_m*dim_zz[1]+iter_n] %(affectation)s sum;
}
iter_n+=nb_sum-1;
}//for iter_n
}//for iter_m
}//for stack_size
}//for n_kern
}//for b
Py_XDECREF(img2d);
Py_XDECREF(filtersflipped);
"""
""" Ops for downsampling images.
Planned:
DownsampleFactorMax, DownsampleAvg, DownsampleSoftmax.
"""
#This file should move along with conv.py
from theano import gof, Op, tensor, Variable, Apply
import numpy, theano
import __builtin__
def max_pool2D(input, ds, ignore_border=False):
"""
Takes as input a N-D tensor, where N >= 2. It downscales the input image by
the specified factor, by keeping only the maximum value of non-overlapping
patches of size (ds[0],ds[1])
:type input: N-D theano tensor of input images.
:param input: input images. Max pooling will be done over the 2 last dimensions.
:type ds: tuple of length 2
:param ds: factor by which to downscale. (2,2) will halve the image in each
dimension.
:param ignore_border: boolean value. When True, (5,5) input with ds=(2,2)
will generate a (2,2) output. (3,3) otherwise.
"""
if input.ndim < 2:
raise NotImplementedError('max_pool2D requires a dimension >= 2')
# extract image dimensions
img_shape = input.shape[-2:]
# count the number of "leading" dimensions, store as dmatrix
batch_size = tensor.prod(input.shape[:-2])
batch_size = tensor.shape_padright(batch_size,1)
# store as 4D tensor with shape: (batch_size,1,height,width)
new_shape = tensor.cast(tensor.join(0, batch_size,
tensor.as_tensor([1,]), img_shape), 'int64')
input_4D = tensor.reshape(input, new_shape, ndim=4)
# downsample mini-batch of images
op = DownsampleFactorMax(ds, ignore_border)
output = op(input_4D)
# restore to original shape
outshp = tensor.join(0, input.shape[:-2], output.shape[-2:])
return tensor.reshape(output, outshp, ndim=input.ndim)
class DownsampleFactorMax(Op):
"""
For N-dimensional tensors, consider that the last two dimensions span images.
This Op downsamples these images by a factor ds, by taking the max over non-
overlapping rectangular regions.
"""
@staticmethod
def out_shape(imgshape, ds, ignore_border=False):
"""Return the shape of the output from this op, for input of given shape and flags.
:param imgshape: the shape of a tensor of images. The last two elements are interpreted
as the number of rows, and the number of cols.
:type imgshape: tuple, list, or similar.
:param ds: downsample factor over rows and columns
:type ds: list or tuple of two ints
:param ignore_border: if ds doesn't divide imgshape, do we include an extra row/col of
partial downsampling (False) or ignore it (True).
:type ignore_border: bool
:rtype: list
:returns: the shape of the output from this op, for input of given shape. This will
have the same length as imgshape, but with last two elements reduced as per the
downsampling & ignore_border flags.
"""
if len(imgshape) < 2:
raise TypeError('imgshape must have at least two elements (rows, cols)')
r, c = imgshape[-2:]
rval = list(imgshape[:-2])+[ r/ds[0], c/ds[1]]
if not ignore_border:
if r % ds[0]:
rval[-2] += 1
if c % ds[1]:
rval[-1] += 1
return rval
def __init__(self, ds, ignore_border=False):
"""
:param ds: downsample factor over rows and columns
:type ds: list or tuple of two ints
:param ignore_border: if ds doesn't divide imgshape, do we include an extra row/col of
partial downsampling (False) or ignore it (True).
:type ignore_border: bool
TODO: why is poolsize an op parameter here?
"""
self.ds = tuple(ds)
self.ignore_border = ignore_border
def __eq__(self, other):
return type(self) == type(other) and self.ds == other.ds and self.ignore_border == other.ignore_border
def __hash__(self):
return hash(type(self)) ^ hash(self.ds) ^ hash(self.ignore_border)
def __str__(self):
return '%s{%s,%s}' % (self.__class__.__name__, self.ds, self.ignore_border)
def make_node(self, x):
if x.type.ndim != 4:
raise TypeError()
# TODO: consider restrucing the dtype?
return gof.Apply(self, [x], [x.type()])
def perform(self, node, (x,), (z,)):
"""
"""
if len(x.shape)!=4:
raise NotImplementedError('DownsampleFactorMax requires 4D input for now')
if z[0] is None:
z[0] = numpy.zeros(self.out_shape(x.shape, self.ds, self.ignore_border)) -float('inf')
z[0] = theano._asarray(z[0], dtype=x.dtype)
zz=z[0]
ds0, ds1 = self.ds
if self.ignore_border:
x_usable2 = (x.shape[2] / ds0 * ds0)
else: x_usable2 = x.shape[2]
if self.ignore_border:
x_usable3 = (x.shape[3] / ds1 * ds1)
else: x_usable3 = x.shape[3]
for n in xrange(x.shape[0]):
for k in xrange(x.shape[1]):
for i in xrange(x_usable2):
zi = i / ds0
for j in xrange(x_usable3):
zj = j / ds1
zz[n,k,zi,zj] = __builtin__.max(zz[n,k,zi,zj], x[n,k,i,j])
def grad(self,(x,), (gz,)):
maxout = self(x)
return [DownsampleFactorMaxGrad(self.ds, ignore_border=self.ignore_border)(x, maxout, gz)]
def c_code(self, node, name, (x,), (z, ), sub):
fail=sub['fail']
ignore_border = int(self.ignore_border)
ds0, ds1 = self.ds
return """
int typenum = PyArray_ObjectType((PyObject*)%(x)s, 0);
int x_shp0_usable;
int x_shp1_usable;
int z_shp0, z_shp1;
if(%(x)s->nd!=4)
{
PyErr_SetString(PyExc_ValueError, "x must be a 4d ndarray");
%(fail)s;
}
z_shp0 = %(x)s->dimensions[2] / %(ds0)s;
z_shp1 = %(x)s->dimensions[3] / %(ds1)s;
if (%(ignore_border)s)
{
x_shp0_usable = z_shp0 * %(ds0)s;
x_shp1_usable = z_shp1 * %(ds1)s;
}
else
{
z_shp0 += (%(x)s->dimensions[2] %% %(ds0)s) ? 1 : 0;
z_shp1 += (%(x)s->dimensions[3] %% %(ds1)s) ? 1 : 0;
x_shp0_usable = %(x)s->dimensions[2];
x_shp1_usable = %(x)s->dimensions[3];
}
if ((!%(z)s)
|| *PyArray_DIMS(%(z)s)!=4
||(%(z)s->dimensions[0] != %(x)s->dimensions[0])
||(%(z)s->dimensions[1] != %(x)s->dimensions[1])
||(%(z)s->dimensions[2] != z_shp0)
||(%(z)s->dimensions[3] != z_shp1)
)
{
if (%(z)s) Py_XDECREF(%(z)s);
npy_intp dims[4] = {0,0,0,0};
dims[0]=%(x)s->dimensions[0];
dims[1]=%(x)s->dimensions[1];
dims[2]=z_shp0;
dims[3]=z_shp1;
%(z)s = (PyArrayObject*) PyArray_ZEROS(4, dims, typenum,0); //TODO: zeros not necessary
}
if (z_shp0 && z_shp1)
{
for(int b=0;b<%(x)s->dimensions[0];b++){
for(int k=0;k<%(x)s->dimensions[1];k++){
int mini_i = 0;
int zi = 0;
for(int i=0;i< x_shp0_usable; i++){
int mini_j = 0;
int zj = 0;
for(int j=0; j<x_shp1_usable; j++){
dtype_%(x)s a = ((dtype_%(x)s*)(PyArray_GETPTR4(%(x)s,b,k,i,j)))[0];
dtype_%(z)s * __restrict__ z = ((dtype_%(z)s*)(PyArray_GETPTR4(%(z)s,b,k,zi,zj)));
z[0] = (((mini_j|mini_i) == 0) || z[0] < a) ? a : z[0];
mini_j = ((mini_j + 1) == %(ds1)s) ? 0 : mini_j+1;
zj += (mini_j == 0);
}
mini_i = ((mini_i + 1) == %(ds0)s) ? 0 : mini_i+1;
zi += (mini_i == 0);
}
}
}
}
""" % locals()
def c_code_cache_version(self):
return ()
class DownsampleFactorMaxGrad(Op):
def __init__(self, ds, ignore_border):
self.ds = tuple(ds)
self.ignore_border = ignore_border
def __eq__(self, other):
return type(self) == type(other) and self.ds == other.ds and self.ignore_border == other.ignore_border
def __hash__(self):
return hash(type(self)) ^ hash(self.ds) ^ hash(self.ignore_border)
def __str__(self):
return '%s{%s,%s}' % (self.__class__.__name__, self.ds, self.ignore_border)
def make_node(self, x, maxout, gz):
# make_node should only be called by the grad function of DownsampleFactorMax,
# so these asserts should not fail.
assert isinstance(x, Variable) and x.ndim==4
assert isinstance(maxout, Variable) and maxout.ndim==4
assert isinstance(gz, Variable) and gz.ndim==4
return Apply(self, [x, maxout, gz], [x.type()])
def perform(self, node, (x, maxout, gz), (gx_stg,)):
gx = numpy.zeros_like(x)
ds0, ds1 = self.ds
shape2 = (x.shape[2] / ds0 * ds0)
if not self.ignore_border: shape2 = x.shape[2]
shape3 = (x.shape[3] / ds1 * ds1)
if not self.ignore_border: shape3 = x.shape[3]
for n in xrange(x.shape[0]):
for k in xrange(x.shape[1]):
for i in xrange(shape2):
zi = i / ds0
for j in xrange(shape3):
zj = j / ds1
if (maxout[n,k,zi,zj] == x[n,k,i,j]):
gx[n,k,i,j] = gz[n,k,zi,zj]
else: gx[n,k,i,j] = 0
gx_stg[0] = gx
def c_code(self, node, name, (x, z, gz), (gx,), sub):
fail = sub['fail']
ignore_border = int(self.ignore_border)
ds0, ds1 = self.ds
return """
int x_typenum = PyArray_ObjectType((PyObject*)%(x)s, 0);
int z_typenum = PyArray_ObjectType((PyObject*)%(z)s, 0);
int gz_typenum = PyArray_ObjectType((PyObject*)%(gz)s, 0);
int x_shp0_usable;
int x_shp1_usable;
int z_shp0, z_shp1;
if ((x_typenum != z_typenum) || (x_typenum != gz_typenum))
{
PyErr_SetString(PyExc_ValueError, "input types must all match");
%(fail)s;
}
if(%(x)s->nd!=4)
{
PyErr_SetString(PyExc_ValueError, "x must be a 4d ndarray");
%(fail)s;
}
if(%(z)s->nd!=4)
{
PyErr_SetString(PyExc_ValueError, "z must be a 4d ndarray");
%(fail)s;
}
if(%(gz)s->nd!=4)
{
PyErr_SetString(PyExc_ValueError, "gz must be a 4d ndarray");
%(fail)s;
}
z_shp0 = %(z)s->dimensions[2];
z_shp1 = %(z)s->dimensions[3];
if (%(ignore_border)s)
{
x_shp0_usable = z_shp0 * %(ds0)s;
x_shp1_usable = z_shp1 * %(ds1)s;
}
else
{
x_shp0_usable = %(x)s->dimensions[2];
x_shp1_usable = %(x)s->dimensions[3];
}
if ((!%(gx)s)
|| *PyArray_DIMS(%(gx)s)!=4
||(%(gx)s->dimensions[0] != %(x)s->dimensions[0])
||(%(gx)s->dimensions[1] != %(x)s->dimensions[1])
||(%(gx)s->dimensions[2] != %(x)s->dimensions[2])
||(%(gx)s->dimensions[3] != %(x)s->dimensions[3])
)
{
Py_XDECREF(%(gx)s);
%(gx)s = (PyArrayObject*) PyArray_ZEROS(4, %(x)s->dimensions, x_typenum,0);
}
for(int b=0;b<%(x)s->dimensions[0];b++){
for(int k=0;k<%(x)s->dimensions[1];k++){
int mini_i = 0;
int zi = 0;
for(int i=0;i< x_shp0_usable; i++){
int mini_j = 0;
int zj = 0;
for(int j=0; j< x_shp1_usable; j++){
dtype_%(x)s * __restrict__ xp = ((dtype_%(x)s*)(PyArray_GETPTR4(%(x)s,b,k,i,j)));
dtype_%(gx)s * __restrict__ gxp = ((dtype_%(gx)s*)(PyArray_GETPTR4(%(gx)s,b,k,i,j)));
dtype_%(z)s * __restrict__ zp = ((dtype_%(z)s*)(PyArray_GETPTR4(%(z)s,b,k,zi,zj)));
dtype_%(gz)s * __restrict__ gzp = ((dtype_%(gz)s*)(PyArray_GETPTR4(%(gz)s,b,k,zi,zj)));
gxp[0] = (zp[0] == xp[0]) ? gzp[0] : 0;
mini_j = (mini_j + 1 == %(ds1)s) ? 0 : mini_j+1;
zj += (mini_j == 0);
}//for j
mini_i = (mini_i + 1 == %(ds0)s) ? 0 : mini_i+1;
zi += (mini_i == 0);
for (int j = x_shp1_usable; j < %(x)s->dimensions[3]; ++j) {
dtype_%(gx)s * gxp = ((dtype_%(gx)s*)(PyArray_GETPTR4(%(gx)s,b,k,i,j)));
gxp[0] = 0;
}
}//for i
for(int i = x_shp0_usable; i < %(x)s->dimensions[2]; i++){
for (int j = 0; j < %(x)s->dimensions[3]; ++j) {
dtype_%(gx)s * gxp = ((dtype_%(gx)s*)(PyArray_GETPTR4(%(gx)s,b,k,i,j)));
gxp[0] = 0;
}
}
}//for k
}//for b
""" %locals()
def c_code_cache_version(self):
return ()
import sys, time, unittest
import numpy
import numpy as N
from theano.tests import unittest_tools as utt
from theano import function, Mode
import theano.tensor as T
from theano.tensor.signal.conv import ConvOp
def flip(kern, kshp):
"flip the kernel as scipy.convolv2d do it flipped."
flip = N.zeros(kern.shape)
if len(kern.shape)==2:
kern=kern.reshape(-1)
it = reversed(kern)
for i in range(kshp[0]):
for j in range(kshp[1]):
flip[i,j] = it.next()
elif len(kern.shape)==3:
kern=kern.reshape(kern.shape[0],-1)
for k in range(kern.shape[0]):
it = reversed(kern[k,:])
for i in range(kshp[0]):
for j in range(kshp[1]):
flip[k,i,j] = it.next()
elif len(kern.shape)==4:
kern=kern.reshape(kern.shape[0],kern.shape[1],-1)
for k in range(kern.shape[0]):
for m in range(kern.shape[1]):
it = reversed(kern[k,m,:])
for i in range(kshp[0]):
for j in range(kshp[1]):
flip[k,m,i,j] = it.next()
else:
raise NotImplementedError()
return flip
global_rng = N.random.RandomState(3423489)
dmatrix4=T.TensorType('float64', (False, False, False, False))
def exec_multilayer_conv_nnet(conv_mode, ss, bsize, imshp, kshps, nkerns,
unroll_batch=0, unroll_kern=0, img=T.dmatrix(), validate=True,
conv_op_py=False, do_print=True, repeat=1,
unroll_patch=False, unroll_patch_size=False, verbose=0):
# build actual input images
imgval = global_rng.rand(bsize, imshp[0], imshp[1], imshp[2])
a=T.dmatrix()
kerns = [a for i in nkerns]
inputs4=dmatrix4()
kerns4=dmatrix4()
# for each layer
ntot=0
tctot=0
tpytot=0
for kshp, kern, nkern, n_layer in zip(kshps, kerns, nkerns, range(len(nkerns))):
if do_print:
print '************* layer %i ***************' % n_layer
print conv_mode, ss, n_layer, kshp, nkern
# actual values
w = global_rng.random_sample(N.r_[nkern,imshp[0],kshp])
w_flip = flip(w,kshp).reshape(w.shape)
## manual implementation
# check first stage
padimg = imgval
if conv_mode == 'full':
padimg_shp = N.array(imshp[1:]) + 2*(N.array(kshp) - N.array([1,1]))
padimg = N.zeros(N.r_[bsize,imshp[0],padimg_shp])
padimg[:, :, kshp[0]-1:-kshp[0]+1,
kshp[1]-1:-kshp[1]+1] = imgval
outshp = N.hstack((nkern, ConvOp.getOutputShape(imshp, kshp, ss, conv_mode)))
time1 = time.time()
outval = N.zeros(N.r_[bsize,outshp])
if validate:
# causes an atexit problem
from scipy.signal.sigtools import _convolve2d
from scipy.signal.signaltools import _valfrommode, _bvalfromboundary
val = _valfrommode(conv_mode)
bval = _bvalfromboundary('fill')
for b in range(bsize): # loop over batches
for n in range(nkern): # loop over filters
for i in range(imshp[0]): # loop over input feature maps
outval[b,n,...] += _convolve2d(\
imgval[b,i,...], w_flip[n,i,...],1,val, bval, 0)[0::ss[0],0::ss[1]]
ntot += time.time() - time1
# ConvOp
if unroll_patch and not unroll_patch_size:
conv_op = ConvOp(dx=ss[0],dy=ss[1], output_mode=conv_mode,
unroll_patch=unroll_patch, verbose=verbose)(inputs4, kerns4)
else:
conv_op = ConvOp(imshp, kshp, nkern, bsize, ss[0],ss[1], conv_mode,
unroll_batch=unroll_batch, unroll_kern=unroll_kern, unroll_patch=unroll_patch, verbose=verbose)(inputs4, kerns4)
l1shp=N.hstack((nkern,
ConvOp.getOutputShape(imshp, kshp, ss, conv_mode)))
propup2 = function([inputs4, kerns4], conv_op)
propup3 = function([inputs4, kerns4], conv_op, mode=Mode(linker="py"))
time1 = time.time()
for i in range(repeat):
hidval2_ = propup2(imgval,w_flip)
hidval2 = hidval2_#[:,:,0::ss[0],0::ss[1]]
tctot += time.time() - time1
if conv_op_py:
time1 = time.time()
for i in range(repeat):
hidval3_ = propup3(imgval,w_flip)
hidval3 = hidval3_#[:,:,0::ss[0],0::ss[1]]
tpytot += time.time() - time1
assert (N.abs(hidval2-hidval3)<1e-5).all()
else:
tpytot += 0
if validate:
temp = N.abs(outval - hidval2)
assert (temp < 1e-5).all()
if validate and conv_op_py:
temp = N.abs(outval - hidval3)
assert (temp < 1e-5).all()
imshp = tuple(outshp)
imgval = outval.reshape(bsize,outshp[0],outshp[1],outshp[2])
return tctot, tpytot, ntot
def speed_multilayer_conv():
# calculate the speed up of different combination of unroll
# put the paramter to the same you will try.
validate=False# we don't validate the result to have it much faster!
verbose=1
unroll_batch = [1,2,4,5,10,20]
unroll_kern = [1,2,4,5,10,20]
unroll_batch = [1,4,5]
unroll_kern = [1,4,5]
unroll_patch = [True, False]
bsize = 20 # batch size
imshp_start = (1,48,48)#un square shape to test more corner case.
kshps = ([11,12],[12,11])#un square shape to test more corner case.
nkerns = [20,20] # per output pixel
ssizes = [(1,1),]#(1,1)]#(2,2) bugged
convmodes = ['valid','full']
do_convolve2=False
a=T.dmatrix()
kerns = [a for i in nkerns]
assert len(kshps)==len(nkerns)==len(kerns)
timing = N.zeros((len(unroll_batch),len(unroll_kern),3,len(convmodes)*len(ssizes)))
t_b_k=[]
#calculate the timing with unrolling
print 'time unroll batch kern'
t_=[[ 7.60572791, 3.95069814, 3.74271464], [ 4.05631089, 2.90384555, 2.93613672], [ 3.90551591, 2.92595196, 3.00102282]]
best=[0.52690219879150391, 2.4266397953033447]
worst=[0.92042708396911621, 6.8822150230407715]
best=[]
worst=[]
t_=[]
for unroll_b, n_b in zip(unroll_batch,range(len(unroll_batch))):
for unroll_k, n_k in zip(unroll_kern,range(len(unroll_kern))):
t_b_k.append(str(unroll_b)+"/"+str(unroll_k))
if not t_:
tctot, tpytot, ntot=[],[],[]
for conv_mode, n_mode in zip(convmodes,range(len(convmodes))):
for ss, n_ss in zip(ssizes,range(len(ssizes))):
tctot_, tpytot_, ntot_ = exec_multilayer_conv_nnet(conv_mode, ss, bsize, imshp_start, kshps, nkerns, unroll_batch=unroll_b, unroll_kern=unroll_k, validate=validate, verbose=verbose,do_print=False)
tctot+=[tctot_]
tpytot+=[tpytot_]
ntot+=[ntot_]
if unroll_b==4 and unroll_k==4:
#print "unroll 4/4",tctot
best=tctot
if unroll_b==1 and unroll_k==1:
#print "unroll 1/1",tctot
worst=tctot
timing[n_b,n_k]=[tctot, tpytot, ntot]#[sum(tctot), sum(tpytot), sum(ntot)]
if not t_:
t=timing[:,:,0,:]#We select only the c timing.
else:
t=t_
t=N.asarray(t)
#calculate the old timing
print 'time old version'
tctot_=[0.52555489540100098, 6.6634182929992676]
tctot,tpytot,ntot=[],[],[]
tctot_=[]
if not tctot_:
for conv_mode, n_mode in zip(convmodes,range(len(convmodes))):
for ss, n_ss in zip(ssizes,range(len(ssizes))):
tctot_, tpytot_, ntot_ = exec_multilayer_conv_nnet(conv_mode, ss, bsize, imshp_start, kshps, nkerns, unroll_batch=0, unroll_kern=0, validate=validate, verbose=verbose,do_print=False)
tctot+=[tctot_]
tpytot+=[tpytot_]
ntot+=[ntot_]
else: tctot=N.asarray(tctot_)
print "old code timing %.3fs"%sum(tctot),tctot
best=N.asarray(best)
worst=N.asarray(worst)
print "timing for unrolled version"
print t_b_k
print t
t_detail=t
t = t.sum(axis=2)
print "max %.3fs"%t.max(), "max param(batch unloop size/kernel unloop size)", t_b_k[t.argmax()]
print "min %.3fs"%t.min(), "min param(batch unloop size/kernel unloop size)", t_b_k[t.argmin()]
print "speedup vs (1/1)%.3fx, vs old %.3fx"% (t.max()/t.min(),sum(tctot)/t.min())
print worst/best,tctot/best
#calculate the timing of unroll_patch
print 'time unroll_patch'
tctot_patch = []
tctot_patch_size = []
for conv_mode, n_mode in zip(convmodes,range(len(convmodes))):
for ss, n_ss in zip(ssizes,range(len(ssizes))):
tctot_, tpytot_, ntot_ = exec_multilayer_conv_nnet(conv_mode, ss, bsize, imshp_start, kshps, nkerns, unroll_batch=0, unroll_kern=0, validate=validate,unroll_patch=True,verbose=verbose,do_print=False)
tctot_patch += [tctot_]
tctot_, tpytot_, ntot_ = exec_multilayer_conv_nnet(conv_mode, ss, bsize, imshp_start, kshps, nkerns, unroll_batch=0, unroll_kern=0, validate=validate,unroll_patch=True,verbose=verbose,do_print=False,unroll_patch_size=True)
tctot_patch_size += [tctot_]
t_patch=sum(tctot_patch)
print "unroll_patch without shape time", tctot_patch
print "speedup vs (1/1)%.3fx, vs old %.3fx"% (t.max()/t_patch,sum(tctot)/t_patch)
print best/tctot_patch, worst/tctot_patch
t_patch_size=sum(tctot_patch_size)
print "unroll_patch with shape time", tctot_patch_size
print "speedup vs (1/1)%.3fx, vs old %.3fx"% (t.max()/t_patch_size,sum(tctot)/t_patch_size)
print best/tctot_patch_size, worst/tctot_patch_size
return
if __name__ == '__main__':
speed_multilayer_conv()
import sys, time, unittest
import numpy
from scipy import signal
import theano
import theano.tensor as T
from theano import function, Mode
from theano.tests import unittest_tools as utt
from theano.tensor.signal import conv
from theano.tensor.basic import _allclose
class TestConv2D(unittest.TestCase):
def setUp(self):
utt.seed_rng()
self.input = T.dtensor4('input')
self.filters = T.dtensor4('filters')
def validate(self, image_shape, filter_shape,
border_mode='valid', subsample=(1,1),
N_image_shape=None, N_filter_shape=None,
input=None, filters=None,
unroll_batch=0, unroll_kern=0, unroll_patch=True,
verify_grad=True):
if N_image_shape is None:
N_image_shape = image_shape
if N_filter_shape is None:
N_filter_shape = filter_shape
if not input:
input = self.input
if not filters:
filters = self.filters
############# THEANO IMPLEMENTATION ############
# we create a symbolic function so that verify_grad can work
def sym_conv2d(input, filters):
# define theano graph and function
return conv.conv2d(input, filters, image_shape, filter_shape,
border_mode, subsample, unroll_batch=unroll_batch,
unroll_kern=unroll_kern, unroll_patch=unroll_patch)
output = sym_conv2d(input, filters)
theano_conv = theano.function([input, filters], output)
# initialize input and compute result
image_data = numpy.random.random(N_image_shape)
filter_data = numpy.random.random(N_filter_shape)
theano_output = theano_conv(image_data, filter_data)
############# REFERENCE IMPLEMENTATION ############
s = 1. if border_mode is 'full' else -1.
out_shape2d = numpy.array(N_image_shape[-2:]) +\
s*numpy.array(N_filter_shape[-2:]) - s
out_shape2d = numpy.ceil(out_shape2d / numpy.array(subsample))
out_shape = (N_image_shape[0],N_filter_shape[0]) + tuple(out_shape2d)
ref_output = numpy.zeros(out_shape)
# loop over output feature maps
for k in range(N_filter_shape[0]):
# loop over input feature maps
for l in range(N_filter_shape[1]):
filter2d = filter_data[k,l,:,:]
# loop over mini-batches
for b in range(N_image_shape[0]):
image2d = image_data[b,l,:,:]
output2d = signal.convolve2d(image2d, filter2d, border_mode)
ref_output[b,k,:,:] +=\
output2d[::subsample[0],::subsample[1]]
self.failUnless(_allclose(theano_output, ref_output))
############# TEST GRADIENT ############
if verify_grad:
utt.verify_grad(sym_conv2d, [image_data, filter_data])
def test_basic(self):
"""
Tests that basic convolutions work for odd and even dimensions of image and filter
shapes, as well as rectangular images and filters.
"""
self.validate((3,2,8,8), (4,2,5,5), 'valid')
self.validate((3,2,7,5), (5,2,2,3), 'valid')
self.validate((3,2,7,5), (5,2,3,2), 'valid')
self.validate((3,2,8,8), (4,2,5,5), 'full')
self.validate((3,2,7,5), (5,2,2,3), 'full')
# test filter same size as input
self.validate((3,2,3,3), (4,2,3,3), 'valid')
def test_unroll_patch_false(self):
"""
unroll_patch is True by default. Test basic convs with False.
"""
self.validate((3,2,7,5), (5,2,2,3), 'valid', unroll_patch=False)
self.validate((3,2,7,5), (5,2,2,3), 'full', unroll_patch=False)
self.validate((3,2,3,3), (4,2,3,3), 'valid', unroll_patch=False)
def test_unroll_special(self):
"""
(unroll_kern, unroll_batch) in (0,1),(1,0) is special case.
"""
self.validate((6,2,3,3), (3,2,2,2), 'valid', unroll_batch=1)
def test_unroll_batch(self):
"""
Test mini-batch unrolling for various legal values.
"""
# mini-batch of size 6 is multiple of 2 and 3. Should work.
self.validate((6,2,3,3), (3,2,2,2), 'valid', unroll_batch=2, verify_grad=False)
self.validate((6,2,3,3), (3,2,2,2), 'valid', unroll_batch=3, verify_grad=False)
def test_unroll_kern(self):
"""
Test kernel unrolling for various legal values.
"""
# 6 filters is a multiple of 2 and 3. Should work.
self.validate((2,3,3,3), (6,3,2,2), 'valid', unroll_kern=2, verify_grad=False)
self.validate((2,3,3,3), (6,3,2,2), 'valid', unroll_kern=3, verify_grad=False)
def test_subsample(self):
"""
Tests convolution where subsampling != (1,1)
"""
self.validate((3,2,7,5), (5,2,2,3), 'valid', subsample=(2,2))
self.validate((3,2,7,5), (5,2,2,3), 'full', subsample=(2,2))
self.validate((3,2,7,5), (5,2,2,3), 'valid', subsample=(2,1))
def test_invalid_filter_shape(self):
"""
Tests scenario where filter_shape[1] != input_shape[1]
"""
def f():
self.validate((3,2,8,8), (4,3,5,5), 'valid')
self.failUnlessRaises(AssertionError, f)
def test_missing_info(self):
"""
Test convolutions for various pieces of missing info.
"""
self.validate(None, None,
N_image_shape=(3,2,8,8),
N_filter_shape=(4,2,5,5))
self.validate((3,2,None,None), None,
N_image_shape=(3,2,8,8),
N_filter_shape=(4,2,5,5))
self.validate((None,2,None,None), (None,2,5,5),
N_image_shape=(3,2,8,8),
N_filter_shape=(4,2,5,5))
def test_full_mode(self):
"""
Tests basic convolution in full mode and case where filter
is larger than the input image.
"""
self.validate((3,2,5,5), (4,2,8,8), 'full')
def f():
self.validate((3,2,5,5), (4,2,8,8), 'valid')
self.failUnlessRaises(Exception, f)
def test_wrong_input(self):
"""
Make sure errors are raised when image and kernel are not 4D tensors
"""
try:
self.validate((3,2,8,8), (4,2,5,5), 'valid', input = T.dmatrix())
self.validate((3,2,8,8), (4,2,5,5), 'valid', filters = T.dvector())
self.validate((3,2,8,8), (4,2,5,5), 'valid', input = T.dtensor3())
# should never reach here
self.fail()
except:
pass
import unittest, sys, time
import numpy
import theano.tensor as tensor
from theano.tests import unittest_tools as utt
from theano.tensor.signal.downsample import DownsampleFactorMax, max_pool2D
from theano import function, Mode
class TestDownsampleFactorMax(unittest.TestCase):
def setUp(self):
utt.seed_rng()
@staticmethod
def numpy_max_pool2D(input, ds, ignore_border=False):
'''Helper function, implementing max_pool2D in pure numpy'''
if len(input.shape) < 2:
raise NotImplementedError('input should have at least 2 dim, shape is %s'\
% str(input.shape))
xi=0
yi=0
if not ignore_border:
if input.shape[-2] % ds[0]:
xi += 1
if input.shape[-1] % ds[1]:
yi += 1
out_shp = list(input.shape[:-2])
out_shp.append(input.shape[-2]/ds[0]+xi)
out_shp.append(input.shape[-1]/ds[1]+yi)
output_val = numpy.zeros(out_shp)
for k in numpy.ndindex(input.shape[:-2]):
for i in range(output_val.shape[-2]):
ii = i*ds[0]
for j in range(output_val.shape[-1]):
jj = j*ds[1]
patch = input[k][ii:ii+ds[0],jj:jj+ds[1]]
output_val[k][i,j] = numpy.max(patch)
return output_val
def test_DownsampleFactorMax(self):
rng = numpy.random.RandomState(utt.fetch_seed())
# generate random images
maxpoolshps = ((1,1),(2,2),(3,3),(2,3))
imval = rng.rand(4,10,64,64)
images = tensor.dtensor4()
for maxpoolshp in maxpoolshps:
for ignore_border in [True,False]:
print 'maxpoolshp =', maxpoolshp
print 'ignore_border =', ignore_border
## Pure Numpy computation
numpy_output_val = self.numpy_max_pool2D(imval, maxpoolshp, ignore_border)
output = max_pool2D(images, maxpoolshp, ignore_border)
f = function([images,],[output,])
output_val = f(imval)
assert numpy.all(output_val == numpy_output_val)
#DownsampleFactorMax op
maxpool_op = DownsampleFactorMax(maxpoolshp, ignore_border=ignore_border)(images)
f = function([images], maxpool_op)
output_val = f(imval)
assert (numpy.abs(output_val - numpy_output_val) < 1e-5).all()
def test_DownsampleFactorMax_grad(self):
rng = numpy.random.RandomState(utt.fetch_seed())
maxpoolshps = ((1,1),(3,2),(2,3))
imval = rng.rand(2,3,3,4) * 10.0 #more variance means numeric gradient will be more accurate
for maxpoolshp in maxpoolshps:
for ignore_border in [True,False]:
print 'maxpoolshp =', maxpoolshp
print 'ignore_border =', ignore_border
def mp(input):
return DownsampleFactorMax(maxpoolshp, ignore_border=ignore_border)(input)
utt.verify_grad(mp, [imval], rng=rng)
def test_max_pool2D_2D(self):
rng = numpy.random.RandomState(utt.fetch_seed())
maxpoolshps = ((1,1),(3,2))
imval = rng.rand(4,7)
images = tensor.dmatrix()
for maxpoolshp in maxpoolshps:
for ignore_border in [True,False]:
print 'maxpoolshp =', maxpoolshp
print 'ignore_border =', ignore_border
numpy_output_val = self.numpy_max_pool2D(imval, maxpoolshp, ignore_border)
output = max_pool2D(images, maxpoolshp, ignore_border)
output_val = function([images], output)(imval)
assert numpy.all(output_val == numpy_output_val)
def mp(input):
return max_pool2D(input, maxpoolshp, ignore_border)
utt.verify_grad(mp, [imval], rng=rng)
def test_max_pool2D_3D(self):
rng = numpy.random.RandomState(utt.fetch_seed())
maxpoolshps = [(1,2)]
imval = rng.rand(2,3,4)
images = tensor.dtensor3()
for maxpoolshp in maxpoolshps:
for ignore_border in [True,False]:
print 'maxpoolshp =', maxpoolshp
print 'ignore_border =', ignore_border
numpy_output_val = self.numpy_max_pool2D(imval, maxpoolshp, ignore_border)
output = max_pool2D(images, maxpoolshp, ignore_border)
output_val = function([images], output)(imval)
assert numpy.all(output_val == numpy_output_val)
c = tensor.sum(output)
c_val = function([images], c)(imval)
g = tensor.grad(c, images)
g_val = function([images], [g.shape, tensor.min(tensor.min(tensor.min(g))), tensor.max(tensor.max(tensor.max(g)))])(imval)
def mp(input):
return max_pool2D(input, maxpoolshp, ignore_border)
utt.verify_grad(mp, [imval], rng=rng)
def test_max_pool2D_6D(self):
rng = numpy.random.RandomState(utt.fetch_seed())
maxpoolshps = [(3,2)]
imval = rng.rand(2,1,1,1,3,4)
images = tensor.TensorType('float64', [False]*6)()
for maxpoolshp in maxpoolshps:
for ignore_border in [True,False]:
print 'maxpoolshp =', maxpoolshp
print 'ignore_border =', ignore_border
numpy_output_val = self.numpy_max_pool2D(imval, maxpoolshp, ignore_border)
output = max_pool2D(images, maxpoolshp, ignore_border)
output_val = function([images], output)(imval)
assert numpy.all(output_val == numpy_output_val)
def mp(input):
return max_pool2D(input, maxpoolshp, ignore_border)
utt.verify_grad(mp, [imval], rng=rng)
if __name__ == '__main__':
unittest.main()
......@@ -133,6 +133,8 @@ class test_CAReduce(unittest.TestCase):
((5, 6), (1, )),
((5, 6), ()),
((2, 3, 4, 5), (0, 1, 3)),
((5, 0), (0, )),
((5, 0), (1, )),
((), ())]:
x = TensorType('float64', [(entry == 1) for entry in xsh])('x')
e = CAReduce(add, axis = tosum)(x)
......@@ -149,7 +151,7 @@ class test_CAReduce(unittest.TestCase):
def test_c(self):
self.with_linker(gof.CLinker())
if __name__ == '__main__':
unittest.main()
import unittest
import numpy
import theano.tensor as T
from theano.configparser import config, AddConfigVar, IntParam
from theano.configparser import config, AddConfigVar, StrParam
import os, sys
AddConfigVar('unittests.rseed',
"Seed to use for randomized unit tests",
IntParam(666))
"Seed to use for randomized unit tests. Special value 'random' means using a seed of None.",
StrParam(666))
def fetch_seed(pseed=None):
"""
Returns the seed to use for running the unit tests.
If an explicit seed is given, it will be used for seending numpy's rng.
If not, it will try to get a seed from the THEANO_UNITTEST_SEED variable.
If THEANO_UNITTEST_SEED is set to "random", it will seed the rng. with None,
If an explicit seed is given, it will be used for seeding numpy's rng.
If not, it will use config.unittest.rseed (its default value is 666).
If config.unittest.rseed is set to "random", it will seed the rng with None,
which is equivalent to seeding with a random seed.
If THEANO_UNITTEST_SEED is not defined, it will use a default seed of 666.
Useful for seeding RandomState objects.
>>> rng = numpy.random.RandomState(unittest_tools.fetch_seed())
......@@ -35,7 +34,7 @@ def fetch_seed(pseed=None):
#backport
#seed = int(seed) if seed else None
except ValueError:
print >> sys.stderr, 'Error: THEANO_UNITTEST_SEED contains '\
print >> sys.stderr, 'Error: config.unittests.rseed contains '\
'invalid seed, using None instead'
seed = None
......@@ -49,7 +48,7 @@ def seed_rng(pseed=None):
seed = fetch_seed(pseed)
if pseed and pseed!=seed:
print >> sys.stderr, 'Warning: using seed given by THEANO_UNITTEST_SEED=%i'\
print >> sys.stderr, 'Warning: using seed given by config.unittests.rseed=%i'\
'instead of seed %i given as parameter' % (seed, pseed)
numpy.random.seed(seed)
return seed
......
Markdown 格式
0%
您添加了 0 到此讨论。请谨慎行事。
请先完成此评论的编辑!
注册 或者 后发表评论