Commit f4cecd01 authored by Olivier Delalleau

Merged

......@@ -137,17 +137,23 @@ following methods:
the gradient of the Op's output but rather the gradient of some
other criterion C with respect to the Op's input.
If the outputs of your op are [ f_1, ... f_n], then
``output_derivatives`` gives [ grad_{f_1} C, grad_{f_2} C, ... , grad_{f_n} C ]
If the inputs of your op are [x_1, ..., x_n], then your Op.grad should
return [ grad_{x_1} C, grad_{x_2} C, ..., grad_{x_n} C ]
where (grad_{y} z)_i = partial z / partial y_i (and i can have any
number of dimensions)
(note: in the case where i is 2 dimensional, this definition of grad
If the outputs of your op are :math:`[ f_1, ... f_n]`, then
``output_derivatives`` gives
:math:`[ grad_{f_1}(C), grad_{f_2}(C), ... , grad_{f_n}(C) ]`.
If the inputs of your op are :math:`[x_1, ..., x_m]`, then your Op.grad
should return :math:`[ grad_{x_1}(C), grad_{x_2}(C), ..., grad_{x_m}(C) ]`,
where :math:`(grad_{y} z)_i = \frac{\partial z}{\partial y_i}`
(and :math:`i` can have any number of dimensions).
(Note: in the case where :math:`i` is 2-dimensional, this definition of grad
is different from the standard mathematical definition of the gradient
of a scalar with respect to a matrix, where you transpose the indices)
of a scalar with respect to a matrix, where you transpose the indices.)
In other words, :func:`grad` does not return
:math:`\frac{\partial f_i}{\partial x_j}`, but
:math:`\frac{\partial C}{\partial x_j} =
\frac{\partial C}{\partial f_i} \cdot \frac{\partial f_i}{\partial x_j}`.
Both the partial differentiation and that multiplication have to be
performed by :func:`grad`.
At a bare minimum, a new Op must define ``make_node`` and ``perform``, which have no defaults.
......
......@@ -8,7 +8,7 @@ arrays efficiently. Theano features:
* **tight integration with numpy** -- Use `numpy.ndarray` in Theano-compiled functions.
* **transparent use of a GPU** -- Perform data-intensive calculations up to 140x faster than on a CPU (float32 only).
* **symbolic differentiation** -- Let Theano do your derivatives.
* **efficient symbolic differentiation** -- Theano does your derivatives for functions with one or many inputs.
* **speed and stability optimizations** -- Get the right answer for ``log(1+x)`` even when ``x`` is really tiny.
* **dynamic C code generation** -- Evaluate expressions faster.
* **extensive unit-testing and self-verification** -- Detect and diagnose many types of mistakes.
......
......@@ -272,61 +272,84 @@ that fail on your platform (use the ``theano-users@googlegroups.com`` mailing li
but note that you must first register to it, by going to `theano-users`_).
Windows V1(bigger install, but simpler instruction + try instruction for gpu)
-----------------------------------------------------------------------------
- If you don't have Python yet, I would recommend the Python(x,y)
distribution. It is only one installation and contains the most
important packages (NumPy, SciPy, IPython, Matplotlib, Mingw, Nose,
etc.).
- Next you should install Mercurial and download Theano.
Command line version: http://mercurial.selenic.com/
One gui version(Tortoise hg): http://mercurial.selenic.com/downloads/
the command is
hg clone http://hg.assembla.com/theano Theano
- Theano needs 1 environment variable:
a) system variable PYTHONPATH with value C:\...\Theano
(installation folder of theano)
In the USERPROFILE directory
you should create a configuration file .theanorc.
.theanorc.txt is also accepted on Windows if the environment
variable THEANORC is not set. The file should have the following
two lines:
Windows V1 (bigger install, but simpler instructions + tentative GPU instructions)
----------------------------------------------------------------------------------
- Install `Python(x,y) <http://www.pythonxy.com>`_. It is a single installation
file that contains additional packages like Numpy, Scipy, IPython, Matplotlib,
MinGW, Nose, etc. Note that this implies you do not already have a Python
installation (if you do have one, then you will need to either remove it first,
or install those additional packages manually as described in the V2 instructions).
- Install Mercurial. You can use either the
`command-line version <http://mercurial.selenic.com/>`_ or the
`GUI version (TortoiseHg) <http://mercurial.selenic.com/downloads/>`_ (for the purpose of
simply downloading Theano, the command line version is enough).
- Start a shell (hit the Start button and run the ``cmd`` command) and navigate to
the directory where you want to install Theano (it is ok to just stay in the
default directory, which should be your user profile directory). Then download
Theano with:
.. code-block:: bash
hg clone http://hg.assembla.com/theano Theano
- Add (or edit) the PYTHONPATH environment variable (available through Control
Panel / System / Advanced / Environment Variables), so that it contains
the full installation directory of Theano. Restart a shell (``cmd``) to verify
that it works:
.. code-block:: bash
C:\Users\login>echo %PYTHONPATH%
C:\Users\login\Theano
- Create a new ``.theanorc`` text file (or ``.theanorc.txt``, which is easier
to create under Windows) in your user profile directory, with the following
two lines:
.. code-block:: bash
[blas]
ldflags =
This is enough to run Theano! It will use NumPy for dot products
which, however, is pretty fast (see below).
To test that theano read correctly the .theanorc or .theanorc.txt file,
in python run:
- You are now ready to run Theano.
It will use NumPy for dot products, which is still pretty fast (see below for
optional instructions on how to compile your own BLAS library).
To test that Theano correctly read your configuration file, run Python (the
easiest way is to just type ``python`` in a shell) and run the following:
.. code-block:: bash
.. code-block:: python
import theano
print theano.config.blas.ldflags
That should print the same content as what is in your config file.
- (Optional) If you want a faster and/or multithreaded BLAS library, you can
compile GotoBLAS2. I did not try to compile ATLAS because I read that
it is slower than Goto and very difficult to compile (especially for
This should print the same content as in your config file, i.e. nothing
(if your config file was not read properly, it would print ``-lblas``).
Windows V1.5 (optional follow-up to V1 instructions)
----------------------------------------------------
- If you want a faster and/or multithreaded BLAS library, you can
compile GotoBLAS2. We did not try to compile ATLAS because we read that
it is slower than Goto and more difficult to compile (especially on
Windows).
GotoBLAS can be downloaded after a simple registration (the most
recent version is 1.13 right now). To compile it, you need to install
two more programs: MSYS and Perl (for example ActivePerl). Actually,
the GotoBLAS makefiles expect a full UNIX environment (like Cygwin)
but the BLAS compilation seems to work with only MSYS and Perl. The
LAPACK compilation fails, but we don't need it anyway.
GotoBLAS2 can be downloaded
`here <http://www.tacc.utexas.edu/tacc-projects/gotoblas2/downloads>`_
after registering on the website (we tested v1.13).
To compile it, you also need to install MSYS and Perl (for instance
ActivePerl).
The GotoBLAS makefiles actually expect a full UNIX environment (like
Cygwin) but the BLAS compilation seems to work with only MSYS and Perl.
The LAPACK compilation fails, but is not needed anyway.
(WORK-IN-PROGRESS, TO BE CONTINUED)
Compilation steps:
a) Unpack GotoBLAS2 (using 7-zip or the MSYS tar command)
a) Unpack GotoBLAS2 (using `7-zip <http://www.7-zip.org/>`_ or the
MSYS tar command).
b) open MSYS, change directory to GotoBLAS2 (cd command)
......@@ -354,7 +377,7 @@ Windows V1(bigger install, but simpler instruction + try instruction for gpu)
b) The Windows binaries of NumPy were compiled with ATLAS and are surprisingly fast.
c) GotoBLAS is even faster, in particular if you have several kernels.
- (Optional) Gpu on Windows. Not sur it work! Can you report success/error on the `theano-user <http://groups.google.ca/group/theano-users?pli=1>`_ mailing list?
- (Optional) GPU on Windows. We are not sure it works! Can you report success or failure on the `theano-users <http://groups.google.com/group/theano-users>`_ mailing list?
These are instructions for the 32-bit version of Python; the one that comes with Python(x,y) is 32-bit.
......
......@@ -174,9 +174,9 @@ Config Attributes
A list of optimizer tags that we don't want included in the default Mode.
If multiple tags, separate them by ':'.
Ex: to remove the elemwise inplace optimizer(slow for big graph)
use the flags: optimizer_excluding:inplace_opt
inplace_opt is the name of that optimization.
Ex: to remove the elemwise inplace optimizer (slow for big graphs),
use the flag ``optimizer_excluding:inplace_opt``, where
``inplace_opt`` is the name of that optimization.
.. attribute:: optimizer_including
......
......@@ -18,11 +18,15 @@ awkward to use when :func:`tensor.grad` can do the job.
.. function:: grad_sources_inputs(sources, graph_inputs, warn_type=True)
A gradient source is a pair (``r``, ``g_r``), in which ``r`` is a `Variable`, and ``g_r`` is a
`Variable` that is a gradient wrt ``r``.
A gradient source is a pair (``v``, ``g_v``), in which ``v`` is
a `Variable`, and ``g_v`` is a `Variable` that is a gradient wrt
``v``. More specifically, ``g_v`` is the gradient of an external
scalar cost, ``cost`` (that is not explicitly used), wrt ``v``.
This function traverses the graph backward from the ``v`` sources,
calling ``op.grad(...)`` for all ops with some non-None gradient on an output.
calling ``op.grad(...)`` for all ops with some non-None gradient
on an output, to compute gradients of ``cost`` wrt intermediate
variables and ``graph_inputs``.
The ``op.grad(...)`` functions are called like this:
......@@ -30,14 +34,20 @@ awkward to use when :func:`tensor.grad` can do the job.
op.grad(op.inputs[:], [total_gradient(v) for v in op.outputs])
This call to ``op.grad`` should return a list or tuple: one symbolic gradient per input.
If ``op`` has a single input, then ``op.grad`` should return a list or tuple of length 1.
This call to ``op.grad`` should return a list or tuple: one symbolic
gradient per input. These gradients represent the gradients of
the same implicit ``cost`` mentioned above, wrt ``op.inputs``. Note
that this is **not** the same as the gradient of ``op.outputs`` wrt
``op.inputs``.
For each input wrt to which ``op`` is not differentiable, it should return ``None`` instead
of a `Variable` instance.
If ``op`` has a single input, then ``op.grad`` should return a list
or tuple of length 1.
For each input wrt to which ``op`` is not differentiable, it should
return ``None`` instead of a `Variable` instance.
If a source ``r`` receives a gradient from another source ``r2``,
then the effective gradient on ``r`` is the sum of both gradients.
If a source ``r`` receives a gradient from another source ``r2``, then the effective
gradient on ``r`` is the sum of both gradients.
:type sources: list of pairs of Variable: (v, gradient-on-v) to
initialize the total_gradient dictionary
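The summing rule above can be sketched in plain Python (a toy accumulator, not Theano's actual implementation): when a variable receives gradient contributions from several sources, the entries of the ``total_gradient`` dictionary are accumulated by addition.

```python
# Toy gradient accumulator; the names are illustrative only.
total_gradient = {}

def accumulate(var, g):
    """Add a gradient contribution for `var` into the running total."""
    total_gradient[var] = total_gradient.get(var, 0.0) + g

# For C = x*y + x at x=2, y=3, dC/dx arrives from two ops and is summed:
x, y = 2.0, 3.0
accumulate('x', y)    # contribution from the product term
accumulate('x', 1.0)  # contribution from the linear term
```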
......
......@@ -463,15 +463,6 @@ TensorVariable
(0, 'x', 1) -> AxB to Ax1xB
(1, 'x', 0) -> AxB to Bx1xA
See :func:`dimshuffle`.
(The above link just points back to this paragraph. Maybe whoever
wrote that meant to refer to theano.tensor.DimShuffle)
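The two dimshuffle patterns listed above have direct numpy analogues, shown here only to illustrate the shape transformations:

```python
import numpy as np

a = np.zeros((4, 5))           # an AxB matrix with A=4, B=5
ax1xb = a[:, np.newaxis, :]    # (0, 'x', 1): AxB -> Ax1xB
bx1xa = a.T[:, np.newaxis, :]  # (1, 'x', 0): AxB -> Bx1xA
```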
.. method:: flatten(ndim=1)
Returns a view of this tensor with `ndim` dimensions, whose shape for the first
......@@ -500,11 +491,11 @@ Shaping and Shuffling
=====================
To re-order the dimensions of a variable, to insert or remove broadcastable
dimensions, see :meth:`_tensor_py_operators.dimshuffle`
dimensions, see :meth:`_tensor_py_operators.dimshuffle`.
.. function:: shape(x)
Returns lvector representing shape of `x`
Returns an lvector representing the shape of `x`.
.. function:: reshape(x, newshape, ndim=None)
......@@ -562,7 +553,8 @@ dimensions, see :meth:`_tensor_py_operators.dimshuffle`
Make `x` broadcastable in the specified axes `axes`. For
example, `unbroadcast(x,0)` will make the first dimension of `x`
broadcastable.
broadcastable. When performing the function, if the length of `x`
along that dimension is not 1, a ``ValueError`` will be raised.
.. function:: flatten(x, outdim=1)
......@@ -1105,10 +1097,14 @@ Gradient / Differentiation
Return symbolic gradients for one or more variables with respect to some
cost.
For more information about how automatic differentiation works in Theano,
see :mod:`gradient`. For information on how to implement the gradient of
a certain Op, see :func:`grad`.
:type cost: 0-d tensor variable
:type wrt: tensor variable or list of tensor variables
:type g_cost: same as `cost`
:type g_cost: same as type of `cost`
:type consider_constant: list of variables
:type warn_type: bool
......@@ -1121,7 +1117,7 @@ Gradient / Differentiation
expression
:rtype: variable or list of variables (matching `wrt`)
:returns: gradients with respect to cost for each of the `wrt` terms
:returns: gradients of the cost with respect to each of the `wrt` terms
......
.. _basictutaliasing:
===============
Memory Aliasing
===============
The aggressive reuse of memory is one of the ways Theano makes code fast, and
it's important for the correctness and speed of your program that you understand
which buffers Theano might alias to which others.
This file describes the principles for how Theano treats memory, and explains
when you might want to change the default behaviour of some functions and
methods for faster performance.
The memory model: 2 spaces
==========================
There are some simple principles that guide Theano's treatment of memory. The
main idea is that there is a pool of memory managed by Theano, and Theano tracks
changes to values in that pool.
1. Theano manages its own memory space, which typically does not overlap with
the memory of normal python variables that non-theano code creates.
2. Theano functions only modify buffers that are in its memory space.
3. Theano's memory space includes the buffers allocated to store shared
variables and the temporaries used to evaluate Functions.
4. Physically, Theano's memory space may be spread across the host, one or
more GPU devices, and in the future may even include objects on a remote machine.
5. The memory allocated for a shared variable buffer is unique: it is never
aliased to another shared variable.
6. Theano's managed memory is constant while Theano Functions are not running
and Theano library code is not running.
7. The default behaviour of Function is to return user-space values for
outputs, and to expect user-space values for inputs.
The distinction between Theano-managed memory and user-managed memory can be
broken down by some Theano functions (e.g. ``In``, ``Out``, ``shared``, ``get_value``) by using
a ``borrow=True`` flag. This can make those methods faster (by avoiding copy
operations) at the expense of risking subtle bugs in the overall program (by
aliasing memory).
The rest of this section is aimed at helping you to understand when it is safe
to use the ``borrow=True`` argument and reap the benefit of faster code.
Borrowing when creating shared variables
========================================
A ``borrow`` argument can be provided to the shared-variable constructor.
.. code-block:: python
import numpy, theano
np_array = numpy.ones(2, dtype='float32')
s_default = shared(np_array)
s_false = shared(np_array, borrow=False)
s_true = shared(np_array, borrow=True)
By default (``s_default``) and when explicitly setting ``borrow=False``, the
shared variable we construct gets a [deep] copy of ``np_array``. So changes we
subsequently make to ``np_array`` have no effect on our shared variable.
.. code-block:: python
np_array += 1  # now it is an array of 2.0 s
s_default.value  # -> array([1.0, 1.0])
s_false.value    # -> array([1.0, 1.0])
s_true.value     # -> array([2.0, 2.0])
If we are running this with the CPU as the device,
then changes we make to ``np_array`` *right away* will show up in ``s_true.value``,
because numpy arrays are mutable and ``s_true`` is using the ``np_array``
object as its internal buffer.
However, this aliasing of ``np_array`` and ``s_true`` is *inconsistent and fragile*!
It is inconsistent because if Theano is using a GPU device, then the borrow flag
has no effect.
It is fragile because
if we call a Theano function that updates the value of ``s_true``, the aliasing
relationship *may* or *may not* be broken (it depends on what the Theano
function does).
*Take home message:*
It is safe practice (and a good idea) to use ``borrow=True`` in a shared
variable constructor when the shared variable stands for a large object (in
terms of memory footprint) and you do not want to create copies of it in
memory.
It is not a reliable technique to use ``borrow=True`` to modify shared variables
by side-effect, because with some devices (e.g. GPU devices) this technique will
not work.
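The borrow semantics can be mimicked with plain numpy (an illustrative sketch only; with a real shared variable on a GPU the borrow flag has no effect):

```python
import numpy as np

np_array = np.ones(2, dtype='float32')
s_false = np_array.copy()  # borrow=False: the shared variable owns a copy
s_true = np_array          # borrow=True (CPU): the shared variable aliases the buffer

np_array += 1              # mutate the original array in place
# the copy is unaffected; the alias sees the change
```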
Borrowing when accessing value of shared variables
==================================================
Retrieving
----------
A ``borrow`` argument can also be used to control how a shared variable's value is retrieved.
.. code-block:: python
s = shared(np_array)
v_false = s.get_value(borrow=False) # N.B. borrow default is False
v_true = s.get_value(borrow=True)
When ``borrow=False`` is passed to ``get_value``, it means that the return value
may not be aliased to any part of Theano's internal memory.
When ``borrow=True`` is passed to ``get_value``, it means that the return value
*might* be aliased to some of Theano's internal memory.
But both of these calls might create copies of the internal memory.
The reason that ``borrow=True`` might still make a copy is that the internal
representation of a shared variable might not be what you expect. When you
create a variable by passing a numpy array for example, then ``get_value()``
must return a numpy array too. That's how Theano can make the GPU use
transparent. But when you are using a GPU (or in future perhaps a remote machine), then the numpy.ndarray
is not the internal representation of your data.
If you really want Theano to return its internal representation *and never copy it*
then you should use the ``return_internal_type=True`` argument to
``get_value``. It will never copy the internal object (always return in
constant time), but might return various datatypes depending on contextual
factors (e.g. the compute device, the dtype of the numpy array).
.. code-block:: python
v_internal = s.get_value(borrow=True, return_internal_type=True)
It is possible to use ``borrow=False`` in conjunction with
``return_internal_type=True``, which will return a deep copy of the internal object.
This is primarily for internal debugging, not for typical use.
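A toy model of these combinations (an assumption-laden sketch, not Theano's real internals; ``GpuBuffer`` and ``SharedVar`` are made-up names standing in for an on-device array type and a shared variable):

```python
import numpy as np

class GpuBuffer:
    """Stand-in for an internal, non-ndarray representation (e.g. on a GPU)."""
    def __init__(self, data):
        self.data = np.asarray(data)
    def copy(self):
        return GpuBuffer(self.data.copy())
    def to_ndarray(self):
        return self.data.copy()  # transferring off the device implies a copy

class SharedVar:
    def __init__(self, value):
        self._internal = GpuBuffer(value)
    def get_value(self, borrow=False, return_internal_type=False):
        if return_internal_type:
            # borrow=True: hand back the internal object itself (constant time)
            return self._internal if borrow else self._internal.copy()
        # must return an ndarray, so a copy happens even with borrow=True
        return self._internal.to_ndarray()

s = SharedVar([1.0, 2.0])
```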
*Take home message:*
It is safe (and sometimes much faster) to use ``get_value(borrow=True)`` when
your code does not modify the return value. *Do not use this to modify a shared
variable by side-effect* because it will make your code device-dependent.
Modification of GPU variables by this sort of side-effect is impossible.
Assigning
---------
Shared variables also have a ``set_value`` method that can accept an optional ``borrow=True`` argument.
The semantics are similar to those of creating a new shared variable -
``borrow=False`` is the default and ``borrow=True`` means that Theano *may*
reuse the buffer you provide as the internal storage for the variable.
A standard pattern for manually updating the value of a shared variable is as
follows.
.. code-block:: python
s.set_value(
some_inplace_fn(s.get_value(borrow=True)),
borrow=True)
This pattern works regardless of the compute device, and when the compute device
makes it possible to expose Theano's internal variables without a copy, then it
goes as fast as an in-place update.
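Why this pattern avoids copies can be seen with a minimal mock (a hypothetical ``Shared`` class, not Theano's API) that honors the borrow flag on both accessors:

```python
import numpy as np

class Shared:
    """Toy shared variable honoring the borrow flag on get/set."""
    def __init__(self, value):
        self._buf = np.array(value)  # default: take a copy
    def get_value(self, borrow=False):
        return self._buf if borrow else self._buf.copy()
    def set_value(self, new, borrow=False):
        self._buf = new if borrow else np.array(new)

s = Shared(np.zeros(3))
buf = s.get_value(borrow=True)                     # no copy on read
s.set_value(np.add(buf, 1, out=buf), borrow=True)  # in-place update, no copy on write
```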
Borrowing when constructing Function objects
============================================
A ``borrow`` argument can also be provided to the ``In`` and ``Out`` objects
that control how ``theano.function`` handles its arguments and return value(s).
.. code-block:: python
import theano, theano.tensor
x = theano.tensor.matrix()
y = 2*x
f = theano.function([theano.In(x, borrow=True)], theano.Out(y, borrow=True))
Borrowing an input means that Theano will treat the argument you provide as if
it were part of Theano's pool of temporaries. Consequently, your input
may be reused as a buffer (and overwritten!) during the computation of other variables in the
course of evaluating that function (e.g. ``f``).
Borrowing an output means that Theano will not insist on allocating a fresh
output buffer every time you call the function. It will possibly reuse the same one as
a previous call, and overwrite the old contents. Consequently, it may overwrite
old return values by side effect.
It is also possible to pass a ``return_internal_type=True`` flag to the ``Out``
variable, which has the same interpretation as the ``return_internal_type`` flag
of the shared variable's ``get_value`` method.
*Take home message:*
When an input ``x`` to a function is not needed after the function returns and you
would like to make it available to Theano as additional workspace, then consider
marking it with ``In(x, borrow=True)``. It may make the function faster and
reduce its memory requirement.
When a return value ``y`` is large (in terms of memory footprint) and you only
need to read from it once, right after it is returned, then consider marking it
with ``Out(y, borrow=True)``.
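What ``Out(y, borrow=True)`` permits can be sketched with an ordinary Python callable that recycles one output buffer across calls (names are illustrative, not Theano's machinery):

```python
import numpy as np

class DoubleFn:
    """Toy compiled function y = 2*x that reuses one output buffer per call."""
    def __init__(self):
        self._out = None  # the reused output buffer
    def __call__(self, x):
        x = np.asarray(x, dtype='float64')
        if self._out is None or self._out.shape != x.shape:
            self._out = np.empty_like(x)
        np.multiply(x, 2, out=self._out)  # overwrites the previous return value
        return self._out

f = DoubleFn()
first = f([1.0, 2.0])
second = f([10.0, 20.0])  # the buffer held by `first` is overwritten here
```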
......@@ -144,6 +144,8 @@ array([[ 0.25 , 0.19661193],
The resulting function computes the gradient of its first argument
with respect to the second. In this way, Theano can be used for
`automatic differentiation <http://en.wikipedia.org/wiki/Automatic_differentiation>`_.
Contrary to what this page says, Theano performs efficient symbolic differentiation
even for functions with many inputs.
.. note::
......
......@@ -510,7 +510,7 @@ class Function(object):
# Set positional arguments
i = 0
for arg in args:
for arg_index, arg in enumerate(args):
#TODO: provide a Param option for skipping the filter if we
# really want speed.
s = self.input_storage[i]
......@@ -520,7 +520,7 @@ class Function(object):
try:
s.storage[0] = s.type.filter(arg, strict=s.strict)
except Exception, e:
e.args = tuple(list(e.args)+["Bad input argument at index %d"%(list(args).index(arg))])
e.args = tuple(list(e.args)+["Bad input argument at index %d" % arg_index])
raise
s.provided += 1
i+=1
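The change from ``list(args).index(arg)`` to ``enumerate`` matters because ``.index`` returns the position of the *first* equal element, so the error message pointed at the wrong argument whenever two arguments compared equal:

```python
args = (1.0, 5.0, 1.0)

# buggy: duplicate values always report the first occurrence
buggy = [list(args).index(a) for a in args]
# fixed: enumerate yields the true position of each argument
fixed = [i for i, _ in enumerate(args)]
```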
......
import os
import subprocess
import logging
from theano.configparser import TheanoConfigParser, AddConfigVar, EnumStr, StrParam, IntParam, FloatParam, BoolParam
_logger = logging.getLogger('theano.configdefaults')
def warning(*msg):
_logger.warning('WARNING theano.configdefaults: '+' '.join(msg))
config = TheanoConfigParser()
AddConfigVar('floatX',
......@@ -24,10 +31,22 @@ AddConfigVar('mode',
"Default compilation mode",
EnumStr('Mode', 'ProfileMode', 'DebugMode', 'FAST_RUN', 'FAST_COMPILE', 'PROFILE_MODE', 'DEBUG_MODE'))
#Keep the default linker the same as the one for the mode FAST_RUN
AddConfigVar('linker',
"Default linker. If not None, will use this linker with the Mode object(not ProfileMode or DebugMode)",
EnumStr('c|py', 'py', 'c', 'c|py_nogc', 'c&py'))
# Test whether or not gcc is present: disable C code if it is not
try:
subprocess.Popen('gcc', stdout=subprocess.PIPE, stderr=subprocess.PIPE)
# Keep the default linker the same as the one for the mode FAST_RUN
AddConfigVar('linker',
"Default linker. If not None, will use this linker with the Mode "+
"object (not ProfileMode or DebugMode)",
EnumStr('c|py', 'py', 'c', 'c|py_nogc', 'c&py'))
except OSError:
# gcc is not present, linker should default to python only
AddConfigVar('linker',
"Default linker. If not None, will use this linker with the Mode object(not ProfileMode or DebugMode)",
EnumStr('py', 'c|py', 'c', 'c|py_nogc', 'c&py'))
warning('GCC not detected! Theano will be unable to execute optimized '+
'C-implementations (for both CPU and GPU) and will default to '+
'Python implementations. Performance will be severely degraded.')
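The detection logic above boils down to: spawning a program that is not on the PATH raises ``OSError``. A minimal standalone sketch:

```python
import subprocess

def program_exists(name):
    """Return True if `name` can be spawned, False if absent from PATH."""
    try:
        p = subprocess.Popen([name], stdout=subprocess.PIPE,
                             stderr=subprocess.PIPE)
        p.communicate()  # wait for the process so no zombie is left behind
        return True
    except OSError:
        return False
```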
#Keep the default optimizer the same as the one for the mode FAST_RUN
AddConfigVar('optimizer',
......@@ -88,19 +107,23 @@ AddConfigVar('experimental.mrg',
###
### To disable some warning about old bug that are fixed now.
###
AddConfigVar('warn.old_bug_default',
"If False, disables by default the warnings about old Theano bugs. If you have never used Theano before, you can set it to False.",
BoolParam(True))
default_warn = config.warn.old_bug_default
AddConfigVar('warn.argmax_pushdown_bug',
"Warn if past versions of Theano generated a bug with the theano.tensor.nnet.nnet.local_argmax_pushdown optimization. Fixed 27 May 2010.",
BoolParam(True))
BoolParam(default_warn))
AddConfigVar('warn.gpusum_01_011_0111_bug',
"Warn if we are in a case where old versions of Theano had a silent bug with the GpuSum patterns 01, 011 and 0111 when the first dimension was larger than 4096. Fixed 31 May 2010.",
BoolParam(True))
BoolParam(default_warn))
AddConfigVar('warn.sum_sum_bug',
"Warn if we are in a case where Theano versions between revision 9923a40c7b7a and 2 August 2010 (fix date) generated bad code when there were 2 consecutive sums in the graph. Fixed 2 August 2010.",
BoolParam(True))
BoolParam(default_warn))
AddConfigVar('warn.sum_div_dimshuffle_bug',
"Warn if previous versions of Theano (between rev. 3bd9b789f5e8, 2010-06-16, and cfc6322e5ad4, 2010-08-03) would have given incorrect results. This bug was triggered by a sum of a division of dimshuffled tensors.",
BoolParam(True))
BoolParam(default_warn))
......@@ -50,7 +50,7 @@ class CLinkerObject(object):
- `MethodNotDefined`: Subclass does not implement this method
"""
raise utils.MethodNotDefined("c_lib_dirs", type(self), self.__class__.__name__)
raise utils.MethodNotDefined("c_header_dirs", type(self), self.__class__.__name__)
def c_libraries(self):
"""Optional: Return a list of libraries required by code returned by
......
......@@ -23,7 +23,7 @@ def debugprint(obj, depth=-1, print_type=False, file=None):
:type file: None, 'str', or file-like object
:param file: print to this file ('str' means to return a string)
:returns: str if `file`=='str', else file arg
:returns: string if `file` == 'str', else file arg
Each line printed represents a Variable in the graph.
The indentation of each line corresponds to its depth in the symbolic graph.
......
......@@ -18,6 +18,7 @@ from theano.sandbox.cuda.nnet import (
GpuCrossentropySoftmax1HotWithBiasDx,
GpuSoftmax, GpuSoftmaxWithBias)
from theano.compile import optdb
from theano.tensor.blas import _is_real_vector, _is_real_matrix
#optdb.print_summary() # this shows what is currently registered (in a so-far crude way...)
gpu_optimizer = EquilibriumDB()
......@@ -57,12 +58,12 @@ class InputToGpuOptimizer(Optimizer):
if new_input.type==input.type:
env.replace_validate(input, new_input, "To allow further optimisation to move Ops to gpu")
except Exception, e:
#as we currently only support float32, this can fail.
#Using try except make that we won't need
#as we currently only support float32, this can fail.
#Using try except make that we won't need
pass
#we register it before all other gpu optimizer to be sure that the input are on the gpu.
gpu_seqopt.register('InputToGpuOptimizer', InputToGpuOptimizer(),
gpu_seqopt.register('InputToGpuOptimizer', InputToGpuOptimizer(),
0, 'fast_run', 'fast_compile', 'merge')#TODO: how to make it mandatory for gpu_seqopt?
@local_optimizer([])
......@@ -72,9 +73,9 @@ def local_cut_gpu_host_gpu(node):
if tensor.opt.opt.check_chain(node, host_from_gpu, gpu_from_host):
return [node.inputs[0].owner.inputs[0]]
return False
gpu_cut_copies.register('cut_gpu_host_transfers', local_cut_gpu_host_gpu,
gpu_cut_copies.register('cut_gpu_host_transfers', local_cut_gpu_host_gpu,
'fast_run', 'inplace', 'gpu')
gpu_cut_copies.register('cut_gpu_constant_transfers', tensor.opt.constant_folding,
gpu_cut_copies.register('cut_gpu_constant_transfers', tensor.opt.constant_folding,
'fast_run', 'gpu')
#register it into canonicalize to allow other optimization to work without
#botering with this useless pattern.
......@@ -83,7 +84,7 @@ compile.optdb['canonicalize'].register('local_cut_gpu_host_gpu', local_cut_gpu_h
@register_opt()
@local_optimizer([])
def local_gpu_elemwise_0(node):
"""elemwise(..., host_from_gpu, ...)
"""elemwise(..., host_from_gpu, ...)
-> host_from_gpu(elemwise(gpu_from_host, ..., gpu_from_host)
"""
if isinstance(node.op, tensor.Elemwise):
......@@ -92,25 +93,29 @@ def local_gpu_elemwise_0(node):
#don't set any inplace pattern. gpu_insert_inplace_optimizer will do it later
new_op = GpuElemwise(node.op.scalar_op)
# first establish that float32 can store all inputs
upcastable = set(['float32', 'int8', 'int16', 'uint8', 'uint16'])
# case 1 - all inputs are already float32
if numpy.all([i.type.dtype == 'float32' for i in node.inputs]):
#TODO: change this when fusion makes Elemwise with multiple outputs
return [host_from_gpu(new_op(*(gpu_from_host(i) for i in node.inputs)))]
# THIS IS PROBABLY TRUE....
# case 2 - it would still be ok if some inputs were upcast to float32
# first establish that float32 can store all inputs
upcastable = set(['float32', 'int8', 'int16', 'uint8', 'uint16'])
if numpy.all([i.type.dtype in upcastable for i in node.inputs]):
gpu_elemwise = new_op(*(gpu_from_host(i) for i in node.inputs))
# case 2 - it is still ok if some inputs were upcast to float32
elif numpy.all([i.type.dtype in upcastable for i in node.inputs]):
# second - establish that a new node with upcasted inputs has the same outputs
# types as the original node
casted = node.op.make_node(*[tensor.cast(i, 'float32') for i in node.inputs])
if [o.type for o in casted.outputs] == [o.type for o in node.outputs]:
new_inputs = [gpu_from_host(tensor.cast(i, 'float32')) for i in node.inputs]
gpu_elemwise = new_op(*new_inputs)
else:
return False
else:
return False
return [host_from_gpu(new_op(*new_inputs))]
gpu_elemwise = split_huge_add_or_mul(gpu_elemwise.owner).outputs[0]
return [host_from_gpu(gpu_elemwise)]
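Why exactly these dtypes are considered upcastable: float32 has a 24-bit significand, so every (u)int8 and (u)int16 value is represented exactly, while int32 or float64 values would lose precision. numpy's safe-casting rules agree:

```python
import numpy as np

upcastable = {'float32', 'int8', 'int16', 'uint8', 'uint16'}
# every dtype in the set casts safely (losslessly) to float32
safe = all(np.can_cast(np.dtype(dt), np.float32) for dt in upcastable)
# wider types do not: they would lose precision
lossy = np.can_cast(np.int32, np.float32) or np.can_cast(np.float64, np.float32)
```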
@register_opt()
@local_optimizer([])
def local_gpu_elemwise_1(node):
......@@ -124,7 +129,9 @@ def local_gpu_elemwise_1(node):
#don't set any inplace pattern. gpu_insert_inplace_optimizer will do it later
new_op = GpuElemwise(elemwise_node.op.scalar_op)
if all([i.dtype=='float32' for i in elemwise_node.inputs]):
return [new_op(*[gpu_from_host(i) for i in elemwise_node.inputs])]
gpu_elemwise = new_op(*[gpu_from_host(i) for i in elemwise_node.inputs])
gpu_elemwise = split_huge_add_or_mul(gpu_elemwise.owner).outputs[0]
return [gpu_elemwise]
return False
@register_opt()
......@@ -138,18 +145,68 @@ def local_gpu_dimshuffle_0(node):
input, = node.inputs
if input.owner and isinstance(input.owner.op, HostFromGpu):
# move the add to a GpuAdd
new_op = GpuDimShuffle(node.op.input_broadcastable,
new_op = GpuDimShuffle(node.op.input_broadcastable,
node.op.new_order)
return [host_from_gpu(new_op(gpu_from_host(input)))]
if node.op == gpu_from_host:
host_input = node.inputs[0]
if host_input.owner and isinstance(host_input.owner.op, tensor.DimShuffle):
dimshuffle_node = host_input.owner
new_op = GpuDimShuffle(dimshuffle_node.op.input_broadcastable,
new_op = GpuDimShuffle(dimshuffle_node.op.input_broadcastable,
dimshuffle_node.op.new_order)
return [new_op(gpu_from_host(dimshuffle_node.inputs[0]))]
return False
@register_opt()
@local_optimizer([])
def local_gpu_dot_to_dot22(node):
"""
gpu_from_host(dot) -> gpudot(gpu_from_host)
dot(host_from_gpu) -> host_from_gpu(gpudot)
This optimization solves the vector-matrix multiplication issue by
transforming the vector into a matrix, applying gpu_dot22 and reshaping
the output.
A more suitable solution would be to use the right cuBLAS call.
"""
if node.op == gpu_from_host:
host_input = node.inputs[0]
if host_input.owner and host_input.owner.op == tensor.basic.dot:
x, y = host_input.owner.inputs
# case one vector X matrix
if _is_real_vector(x) and _is_real_matrix(y):
new_op = GpuDimShuffle((False,), ['x',0])
shape_out = y.shape[1].dimshuffle(['x'])
gpu_x = new_op(gpu_from_host(x))
gpu_y = gpu_from_host(y)
# case two matrix X vector
elif _is_real_matrix(x) and _is_real_vector(y):
new_op = GpuDimShuffle((False,), [0,'x'])
shape_out = x.shape[0].dimshuffle(['x'])
gpu_x = gpu_from_host(x)
gpu_y = new_op(gpu_from_host(y))
return [GpuReshape(1)(gpu_dot22(gpu_x, gpu_y), shape_out)]
if node.op == tensor.basic.dot:
if numpy.any([(i.owner and i.owner.op == host_from_gpu) for i in node.inputs]):
x, y = node.inputs
if _is_real_vector(x) and _is_real_matrix(y):
new_op = GpuDimShuffle((False,), ['x',0])
shape_out = y.shape[1].dimshuffle(['x'])
gpu_x = new_op(gpu_from_host(x))
gpu_y = gpu_from_host(y)
elif _is_real_matrix(x) and _is_real_vector(y):
new_op = GpuDimShuffle((False,), [0,'x'])
shape_out = x.shape[0].dimshuffle(['x'])
gpu_x = gpu_from_host(x)
gpu_y = new_op(gpu_from_host(y))
return [host_from_gpu(GpuReshape(1)(gpu_dot22(gpu_x, gpu_y),
shape_out))]
return False
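The transformation above can be sketched in plain numpy, with `numpy.newaxis` standing in for the `GpuDimShuffle` that adds the broadcastable axis and `numpy.dot` for `gpu_dot22` (array values here are illustrative):

```python
import numpy

# Vector-matrix case: promote the vector to shape (1, n) (the 'x' axis added
# by the dimshuffle), do a 2-d "dot22", then reshape back to 1-d.
x = numpy.arange(3, dtype='float32')                 # vector, shape (3,)
y = numpy.arange(12, dtype='float32').reshape(3, 4)  # matrix, shape (3, 4)

x2 = x[numpy.newaxis, :]                    # shape (1, 3)
out = numpy.dot(x2, y).reshape(y.shape[1])  # (1, 4) -> (4,)
assert numpy.allclose(out, numpy.dot(x, y))

# Matrix-vector case: the vector becomes an (n, 1) column instead.
v = numpy.arange(4, dtype='float32')
out2 = numpy.dot(y, v[:, numpy.newaxis]).reshape(y.shape[0])  # (3, 1) -> (3,)
assert numpy.allclose(out2, numpy.dot(y, v))
```

Note that the 1-d result length is `y.shape[1]` in the vector-matrix case and `x.shape[0]` in the matrix-vector case.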
@register_opt()
@local_optimizer([])
def local_gpu_dot22(node):
......@@ -188,6 +245,50 @@ def local_gpu_dot22scalar(node):
return [host_from_gpu(gpu_dot22scalar(gpu_from_host(x), gpu_from_host(y),tensor.blas._as_scalar(scalar)))]
return False
@register_opt()
@local_optimizer([])
def local_gpu_gemv_as_gemm(node):
"""
gpu_from_host(gemv) -> gpu_gemm(gpu_from_host)
gemv(host_from_gpu) -> host_from_gpu(gpu_gemm)
This optimization implements gemv on the gpu by transforming the vectors
into single-column matrices, applying gemm, and removing the extra
dimension from the output.
"""
gemvs = {tensor.blas.gemv_inplace: gpu_gemm_inplace,
tensor.blas.gemv_no_inplace: gpu_gemm_no_inplace}
if node.op == gpu_from_host:
host_input = node.inputs[0]
if host_input.owner and host_input.owner.op in gemvs:
op = host_input.owner.op
z, a, x, y, b = host_input.owner.inputs
return [
GpuDimShuffle((False,True),[0])(gemvs[op](
GpuDimShuffle((False,),[0,'x'])(gpu_from_host(z))
, a
, gpu_from_host(x)
, GpuDimShuffle((False,),[0,'x'])(gpu_from_host(y))
, b))]
if node.op in gemvs:
z, a, x, y, b = node.inputs
x_on_gpu = (x.owner and x.owner.op == host_from_gpu)
y_on_gpu = (y.owner and y.owner.op == host_from_gpu)
z_on_gpu = (z.owner and z.owner.op == host_from_gpu)
if x_on_gpu or y_on_gpu or z_on_gpu:
return [host_from_gpu(GpuDimShuffle((False,True),[0])(
gemvs[node.op](
GpuDimShuffle((False,),[0,'x'])(gpu_from_host(z))
, a
, gpu_from_host(x)
, GpuDimShuffle((False,),[0,'x'])(gpu_from_host(y))
, b)))]
return False
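Numerically, the rewrite above rests on the identity that gemv, `b*z + a*dot(A, y)` on vectors, gives the same result as a gemm on single-column matrices. A plain-numpy sketch (the `alpha`/`beta` names and values are illustrative):

```python
import numpy

rng = numpy.random.RandomState(0)
A = rng.rand(3, 4).astype('float32')
y = rng.rand(4).astype('float32')
z = rng.rand(3).astype('float32')
alpha, beta = numpy.float32(0.5), numpy.float32(2.0)

# gemv: beta*z + alpha*dot(A, y), everything 1-d
gemv_out = beta * z + alpha * numpy.dot(A, y)

# gemm form: dimshuffle the vectors into (n, 1) columns, run the 2-d gemm,
# then drop the extra axis again (the outer GpuDimShuffle in the code above)
z_col = z[:, numpy.newaxis]   # shape (3, 1)
y_col = y[:, numpy.newaxis]   # shape (4, 1)
gemm_out = (beta * z_col + alpha * numpy.dot(A, y_col))[:, 0]

assert numpy.allclose(gemv_out, gemm_out)
```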
@register_opt()
@local_optimizer([])
def local_gpu_gemm(node):
......@@ -421,7 +522,7 @@ def local_gpu_crossentorpy_softmax_argmax_1hot_with_bias(node):
x,b,y = node.inputs
if x.owner and x.owner.op == host_from_gpu:
gpu_x, = x.owner.inputs
# if y is a cast to integers, we can go to the underlying thing if we want,
# since this gpu op will cast to integers internally anyway
int_cast_ops = (
tensor.basic._convert_to_int32,
......@@ -436,8 +537,8 @@ def local_gpu_crossentorpy_softmax_argmax_1hot_with_bias(node):
gpu_from_host(b),
gpu_from_host(cast(y, 'float32')))
am_dtype = node.outputs[2].type.dtype
return [host_from_gpu(gpu_nll),
host_from_gpu(gpu_sm),
cast(host_from_gpu(gpu_am), am_dtype)]
return False
......@@ -633,7 +734,7 @@ else:
#GpuElemwise inplace
gpu_insert_inplace_optimizer = tensor.opt.insert_inplace_optimizer_op(GpuElemwise)
compile.optdb.register('gpu_inplace_opt', gpu_insert_inplace_optimizer, 75, 'fast_run', 'inplace','gpu_inplace')
@register_opt()
@local_optimizer([tensor.Alloc])
......@@ -654,7 +755,7 @@ def local_gpualloc(node):
new_out = host_from_gpu(gpu_alloc(val2, *shp))
# Sigh. it's an annoying thing about theano
# that you can't add information to the graph.
# If for some reason it has come to light that
# one of the dimensions is broadcastable, we have to hide that
# or the optimization won't go through.
if new_out.type != old_out.type:
......@@ -668,24 +769,42 @@ def local_gpualloc(node):
#if old_out.type != new_out.type:
#import pdb; pdb.set_trace()
return [new_out]
def max_inputs_to_GpuElemwise(node):
"""
Return the maximum number of inputs this GpuElemwise Apply node can accept.
This is needed as there is currently a limit of 256 bytes of parameters
for the gpu function. This measures the number of parameters we put in our
gpu function and computes the maximum number of inputs that respects the
256 byte limit.
"""
#TODO: detect the size of gpu pointer and c int.
int_size = 8
ptr_size = 8
argument_limit = 256  # it was 240, with this note: 16 bytes are used for block and thread coords etc.
size_param_mandatory = int_size #for numels
size_param_mandatory += int_size * node.inputs[0].type.ndim  # for the shape
size_param_mandatory += sum((ptr_size + int_size * i.type.ndim) for i in node.outputs)
nb_bytes_avail = argument_limit-size_param_mandatory
nb_bytes_per_inputs = (node.inputs[0].ndim*int_size)+ptr_size
max_nb_inputs = nb_bytes_avail//nb_bytes_per_inputs
return max_nb_inputs
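Under the assumptions hard-coded above (8-byte ints and pointers, a 256-byte parameter limit), the budget works out as follows for an elemwise node with 2-d inputs and a single output. The numbers are a worked example of the same arithmetic, not measured values:

```python
int_size = 8
ptr_size = 8
argument_limit = 256
ndim = 2         # 2-d inputs and output
n_outputs = 1

mandatory = int_size                                    # numels
mandatory += int_size * ndim                            # the shape
mandatory += (ptr_size + int_size * ndim) * n_outputs   # output: data ptr + strides
assert mandatory == 48

bytes_avail = argument_limit - mandatory                # 208 bytes left for inputs
bytes_per_input = ndim * int_size + ptr_size            # strides + data ptr = 24
max_nb_inputs = bytes_avail // bytes_per_input
assert max_nb_inputs == 8  # so a 2-d add/mul node must be split at 8 inputs
```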
def split_huge_add_or_mul(node):
"""
For add and mul, it can happen that we have too many inputs.
That will make nvcc fail to compile our generated code.
We don't want nodes in the graph that can't execute,
as this breaks DebugMode.
This should not happen for other GpuElemwise ops, as only the fusion
optimization can generate ops with too many inputs, and it checks for that.
"""
if node.op.scalar_op in (scal.add, scal.mul):
max_nb_inputs = max_inputs_to_GpuElemwise(node)
while len(node.inputs)>max_nb_inputs:
inner_op = []
for i in range(0,len(node.inputs),max_nb_inputs):
inner_op.append(node.op(*node.inputs[i:i+max_nb_inputs]))
node = node.op(*inner_op).owner
return node
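The splitting strategy reduces to repeatedly grouping the inputs into chunks of at most `max_nb_inputs` and replacing each chunk by one partial result. A pure-Python sketch, with `sum` standing in for the GpuElemwise add:

```python
def split_huge_sum(inputs, max_nb_inputs):
    # Mirror of split_huge_add_or_mul: chunk the inputs until a single
    # node (here, one call to sum) can take them all.
    while len(inputs) > max_nb_inputs:
        inputs = [sum(inputs[i:i + max_nb_inputs])
                  for i in range(0, len(inputs), max_nb_inputs)]
    return sum(inputs)

# 100 inputs with a limit of 10 become 10 partial sums, then one final sum,
# and the result is unchanged.
assert split_huge_sum(list(range(100)), 10) == sum(range(100))
```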
......@@ -759,28 +759,47 @@ def test_many_arg_elemwise():
rng = numpy.random.RandomState( [1,2,3])
for num_args in [25]:
for op_to_test in [ theano.tensor.add, theano.tensor.mul ]:
for nb_dim in [2,3,4,5]:
shapes = [rng.randint(1,5) for i in range(nb_dim)]
args = [ numpy.cast['float32'](rng.randn(*shapes)) for arg in xrange(0,num_args) ]
symb_args = [ theano.tensor.TensorType('float32', (False,)*nb_dim)() for arg in xrange(0,num_args) ]
outputs = []
for mode in [ mode_with_gpu, mode_without_gpu ]:
#test the optimization local_gpu_elemwise_0
f = theano.function( symb_args, op_to_test(*symb_args), mode = mode.excluding("local_gpu_elemwise_1") )
outputs.append( f( * args) )
#assert that the test was done on the gpu.
if mode is mode_with_gpu:
assert any([isinstance(node.op, cuda.GpuElemwise) for node in f.maker.env.nodes])
#test the optimization local_gpu_elemwise_1
f = theano.function( symb_args,
cuda.gpu_from_host(op_to_test(*symb_args)),
mode = mode.excluding("local_gpu_elemwise_0") )
out = f( * args)
#assert that the test was done on the gpu.
if mode is mode_with_gpu:
assert any([isinstance(node.op, cuda.GpuElemwise) for node in f.maker.env.nodes])
assert numpy.allclose(out, outputs[-1])
results_gpu, results_cpu = outputs
assert numpy.allclose(results_gpu, results_cpu)
def test_duplicate_arg_elemwise():
A = theano.tensor.fmatrix()
B = A + A
f = theano.function([A],B, mode = mode_with_gpu)
Aval = numpy.random.RandomState([1,2,3]).randn(5,5)
Bval = Aval + Aval
assert numpy.allclose(Bval,f(Aval))
......
import sys, time
from theano.compile.sharedvalue import shared
import numpy
from theano.compile.pfunc import pfunc
from theano import tensor
import theano
# Skip test if cuda_ndarray is not available.
from nose.plugins.skip import SkipTest
import theano.sandbox.cuda as cuda
if cuda.cuda_available == False:
raise SkipTest('Optional package cuda disabled')
import theano.compile.mode
from theano.sandbox.cuda.type import CudaNdarrayType
if theano.config.mode=='FAST_COMPILE':
......@@ -49,6 +49,9 @@ def test_int_pow():
#theano.printing.debugprint(f)
def test_softmax():
x = tensor.fmatrix()
......@@ -78,7 +81,7 @@ def test_opt_gpujoin_onlyajoin():
b = cuda.shared_constructor(_b)
c = tensor.join(1,a,b)
f = theano.function([], c, mode=mode_with_gpu)
#theano.printing.debugprint(f)
......@@ -105,7 +108,7 @@ def test_opt_gpujoin_joinvectors_elemwise_then_minusone():
b_prime = tensor.sin(b)
c = tensor.join(0,a_prime,b_prime)
d = c[:-1]
f = theano.function([], d, mode=mode_with_gpu)
......
import sys, time
from theano import shared
from theano.compile.pfunc import pfunc
from theano import tensor
import numpy
import theano
import theano.tensor as TT
# Skip test if cuda_ndarray is not available.
from nose.plugins.skip import SkipTest
import theano.sandbox.cuda as cuda_ndarray
if cuda_ndarray.cuda_available == False:
raise SkipTest('Optional package cuda disabled')
import theano.sandbox.cuda as tcn
import theano.sandbox.cuda as cuda
import theano.sandbox.cuda.basic_ops as B
import theano.sandbox.cuda.blas as blasop
import theano.compile.mode
from theano.tests import unittest_tools as utt
### Tolerance factor used in these tests !!!
atol = 1e-6
##########################
if theano.config.mode=='FAST_COMPILE':
mode_with_gpu = theano.compile.mode.get_mode('FAST_RUN').including('gpu')
mode_without_gpu = theano.compile.mode.get_mode('FAST_RUN').excluding('gpu')
else:
mode_with_gpu = theano.compile.mode.get_default_mode().including('gpu')
mode_without_gpu = theano.compile.mode.get_default_mode().excluding('gpu')
def test_dot_vm():
''' Test vector dot matrix '''
v = theano.shared( numpy.array(numpy.random.rand(2), dtype='float32'))
m = theano.shared( numpy.array(numpy.random.rand(2,2),
dtype='float32'))
no_gpu_f = theano.function([], theano.dot(v,m), mode = mode_without_gpu)
gpu_f = theano.function([], theano.dot(v,m), mode = mode_with_gpu)
# Assert they produce the same output
assert numpy.allclose(no_gpu_f(), gpu_f(), atol = atol)
# Assert that the gpu version actually uses gpu
assert sum([isinstance(node.op, blasop.GpuDot22) for node in
gpu_f.maker.env.toposort() ]) == 1
def test_dot_mv():
''' Test matrix dot vector '''
v = theano.shared( numpy.array(numpy.random.rand(2), dtype='float32'))
m = theano.shared( numpy.array(numpy.random.rand(2,2),
dtype='float32'))
no_gpu_f = theano.function([], theano.dot(m,v), mode = mode_without_gpu)
gpu_f = theano.function([], theano.dot(m,v), mode = mode_with_gpu)
# Assert they produce the same output
assert numpy.allclose(no_gpu_f(), gpu_f(), atol = atol)
# Assert that the gpu version actually uses gpu
assert sum([isinstance(node.op, blasop.GpuDot22) for node in
gpu_f.maker.env.toposort() ]) == 1
def test_gemv1():
''' Is this the same test as test_gemv2 ? '''
v1 = theano.shared( numpy.array(numpy.random.rand(2) , dtype='float32'))
v2 = theano.shared( numpy.array(numpy.random.rand(2) , dtype='float32'))
m = theano.shared( numpy.array(numpy.random.rand(2,2), dtype='float32'))
no_gpu_f = theano.function([], v2+theano.dot(m,v1), mode = mode_without_gpu)
gpu_f = theano.function([], v2+theano.dot(m,v1), mode = mode_with_gpu)
# Assert they produce the same output
assert numpy.allclose(no_gpu_f(), gpu_f(), atol = atol)
# Assert that the gpu version actually uses gpu
assert sum([isinstance(node.op, blasop.GpuGemm) for node in
gpu_f.maker.env.toposort() ]) == 1
def test_gemv2():
''' Is this the same test as test_gemv1 ? '''
v1 = theano.shared( numpy.array(numpy.random.rand(2) , dtype='float32'))
v2 = theano.shared( numpy.array(numpy.random.rand(2) , dtype='float32'))
m = theano.shared( numpy.array(numpy.random.rand(2,2), dtype='float32'))
no_gpu_f = theano.function([], v2+theano.dot(v1,m), mode = mode_without_gpu)
gpu_f = theano.function([], v2+theano.dot(v1,m), mode = mode_with_gpu)
# Assert they produce the same output
assert numpy.allclose(no_gpu_f(), gpu_f(), atol = atol)
# Assert that the gpu version actually uses gpu
assert sum([isinstance(node.op, blasop.GpuGemm) for node in
gpu_f.maker.env.toposort() ]) == 1
if __name__=='__main__':
test_dot_vm()
test_dot_mv()
test_gemv1()
test_gemv2()
......@@ -15,7 +15,7 @@ class DebugLinker(gof.WrapLinker):
copy_originals = False,
check_types = True,
compare_variables = True,
compare_fn = (lambda x, y: x == y)):
gof.WrapLinker.__init__(self,
linkers = linkers,
wrapper = self.wrapper)
......
......@@ -46,9 +46,6 @@ class Images2Neibs(Op):
return Apply(self, [ten4, neib_shape,neib_step], [T.matrix(dtype=ten4.type.dtype)])
def grad(self, (pvals, unis), (gz,)):
return [None, None]
def c_code_cache_version(self):
return (3,)
......@@ -224,6 +221,11 @@ def neibs2images(neibs, neib_shape, original_shape):
Return a 4d tensor of shape `original_shape`.
"""
# TODO: handle the case where patches either overlap
# TODO: handle the case where patches are not directly adjacent
# TODO: at least separate these cases so that the following code does not incorrectly
# handle them by accident.
raise NotImplementedError('check for overlapping patches or non-adjacent patches.')
neibs = T.as_tensor_variable(neibs)
neib_shape = T.as_tensor_variable(neib_shape)
original_shape = T.as_tensor_variable(original_shape)
......
......@@ -3,22 +3,27 @@ import unittest
from nose.plugins.skip import SkipTest
import numpy
try:
import scipy.sparse as sp
import scipy.sparse
except ImportError:
pass # the variable enable_sparse will be used to disable the test file.
import theano
from theano import compile
from theano.sparse import enable_sparse
if enable_sparse == False:
raise SkipTest('Optional package sparse disabled')
from theano.sparse.basic import _is_dense, _is_sparse, _is_dense_variable, _is_sparse_variable
from theano.sparse.basic import _mtypes
from theano.sparse import as_sparse_variable, CSC, CSR, CSM, CSMProperties, SparseType, StructuredDotCSC
from theano.sparse import add, structured_dot, transpose
from theano.sparse import csc_from_dense, csr_from_dense, dense_from_sparse
from theano.tests import unittest_tools as utt
from theano import tensor
def eval_outputs(outputs):
return compile.function([], outputs)()[0]
......
......@@ -57,7 +57,7 @@ __oplist_constructor_list = []
"""List of functions to be listed as op constructors in the oplist (`gen_oplist`, doc/oplist.txt)."""
def constructor(f):
"""Add `f` to :doc:`oplist`.
Make `f` appear as a constructor in the oplist (`gen_oplist`, doc/oplist.txt).
"""
__oplist_constructor_list.append(f)
......@@ -80,7 +80,7 @@ if 0:
if hasattr(x, '_as_CudaNdarrayVariable'):
return x._as_CudaNdarrayVariable() #TODO: pass name and ndim arguments
return as_tensor_variable(x, name, ndim)
def as_tensor_variable(x, name = None, ndim=None):
"""Return `x`, transformed into a `TensorType`
......@@ -158,7 +158,7 @@ class NumpyAutocaster(object):
When config.floatX is float32 (at the time of calling), then this function downcasts float
and numpy.float arguments to numpy.float32, if float32 is in the self.dtypes list.
Python ints are always 64bit and floats are always double precision.
This class uses the algorithm in __call__ to use a narrower dtype when no precision would
be lost, and to even lose precision when this is demanded by the list of dtypes (e.g. to
......@@ -182,7 +182,7 @@ class NumpyAutocaster(object):
# recall: float is numpy.float
if isinstance(x, float) and config.floatX in self.dtypes and config.floatX == 'float32':
return theano._asarray(x, dtype='float32')
for dtype in self.dtypes:
x_ = theano._asarray(x, dtype=dtype)
if numpy.all(x == x_):
......@@ -200,7 +200,7 @@ autocast_float = NumpyAutocaster(('float32', 'float64'))
# this autocasting, and in future, our ops might be smarter about factoring out upcasts. The
# advantage of this mechanism is to combine it with floatX so that 1.0 + xmatrix() will always
# have the same type as the xmatrix().
#
class autocast_float_as(object):
"""This class makes it possible to temporarily and locally adjust autocasting behaviour.
......@@ -222,7 +222,7 @@ class autocast_float_as(object):
def constant_or_value(x, rtype, name=None, ndim=None, dtype=None):
"""Return a symbolic `Constant` with value `x`
:Exceptions:
- `TypeError`: `x` could not be converted to a numpy.ndarray
- `ValueError`: `x` could not be expanded to have ndim dimensions
......@@ -295,19 +295,19 @@ if int(config.tensor.cmp_sloppy)>1:
# useful to test the GPU as they don't use extended precision and
# this cause some difference bigger then the normal sloppy.
float32_atol = 5e-4
float32_rtol = 1e-3
float64_rtol = 1e-4
float64_atol = 1e-3
elif int(config.tensor.cmp_sloppy):
float32_atol = 1e-4
float32_rtol = 1e-3
float64_rtol = 1e-4
float64_atol = 1e-3
else:
#If you change those values in a test, don't forget to put them back when the test ends.
#Don't forget the case when the test fails.
float32_atol = 1e-5
float32_rtol = 1e-3
# defaults in numpy.allclose
float64_rtol = 1.0000000000000001e-05
......@@ -395,7 +395,7 @@ class TensorType(Type):
if self.dtype=='floatX':
self.dtype=config.floatX
### broadcastable is immutable, and all elements are either True or False
self.broadcastable = tuple(bool(b) for b in broadcastable)
self.dtype_specs() # error checking is done there
self.name = name
self.numpy_dtype = numpy.dtype(self.dtype)
......@@ -438,12 +438,12 @@ class TensorType(Type):
except Exception, e:
return str(e)
return "value is valid"
def dtype_specs(self):
"""Return a tuple (python type, c type, numpy typenum) that corresponds to
self.dtype.
This function is used internally as part of C code generation.
"""
#TODO: add more type correspondances for e.g. int32, int64, float32,
......@@ -483,7 +483,7 @@ class TensorType(Type):
a_eq_b = (a==b)
r = numpy.all(a_eq_b)
if r: return True
# maybe the trouble is that there are NaNs
a_missing = numpy.isnan(a)
if a_missing.any():
b_missing = numpy.isnan(b)
......@@ -546,7 +546,7 @@ class TensorType(Type):
#set it to False
cmp_elemwise = numpy.where(both_inf&cmp_elemwise,
a==b,cmp_elemwise)
#check the sign of the inf
both_inf = numpy.where(both_inf,a==b,both_inf)
......@@ -554,7 +554,7 @@ class TensorType(Type):
both_inf += a_inf
if allow_remove_nan:
both_missing += a_missing
# Combine all information.
return (cmp_elemwise + both_missing + both_inf).all()
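The comparison logic above can be condensed into a small numpy sketch (a simplification: it ignores the `allow_remove_nan` option and the sloppy-tolerance machinery): finite entries must be close, NaNs must coincide, and infinities must match in sign.

```python
import numpy

def values_eq_approx_sketch(a, b, rtol=1e-5, atol=1e-8):
    # Approximate equality that treats matching NaNs and same-signed
    # infinities as equal, unlike plain numpy.allclose.
    a, b = numpy.asarray(a, dtype='float64'), numpy.asarray(b, dtype='float64')
    if a.shape != b.shape:
        return False
    finite = numpy.isfinite(a) & numpy.isfinite(b)
    with numpy.errstate(invalid='ignore'):  # inf - inf would warn
        close = finite & (numpy.absolute(a - b) <= (atol + rtol * numpy.absolute(b)))
    both_nan = numpy.isnan(a) & numpy.isnan(b)
    both_inf = numpy.isinf(a) & numpy.isinf(b) & (a == b)  # same sign of inf
    return bool((close | both_nan | both_inf).all())

assert values_eq_approx_sketch([1.0, numpy.nan, numpy.inf],
                               [1.0 + 1e-7, numpy.nan, numpy.inf])
assert not values_eq_approx_sketch([numpy.inf], [-numpy.inf])
assert not values_eq_approx_sketch([numpy.nan], [0.0])
```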
......@@ -634,8 +634,6 @@ class TensorType(Type):
def c_extract(self, name, sub):
"""Override `CLinkerOp.c_extract` """
return """
%(name)s = NULL;
if (py_%(name)s == Py_None) {
......@@ -649,11 +647,13 @@ class TensorType(Type):
PyErr_SetString(PyExc_ValueError, "expected an ndarray");
%(fail)s
}
type_num_%(name)s = ((PyArrayObject*)py_%(name)s)->descr->type_num; //we expect %(type_num)s
if (!PyArray_ISALIGNED(py_%(name)s)) {
PyErr_Format(PyExc_NotImplementedError,
"expected an aligned array of type %%d (%(type_num)s), got non-aligned array of type %%d",
%(type_num)s, type_num_%(name)s);
%(fail)s
}
if (type_num_%(name)s != %(type_num)s) {
PyErr_Format(PyExc_ValueError, "expected type_num %%d (%(type_num)s) got %%d", %(type_num)s, type_num_%(name)s);
%(fail)s
......@@ -885,7 +885,7 @@ class _tensor_py_operators:
def __abs__(self): return abs_(self)
def __neg__(self): return neg(self)
#CASTS
#### REMOVED THESE BECAUSE PYTHON appears to require __int__ to return an int. -JB 20081112
#def __int__(self): return convert_to_int32(self)
#def __float__(self): return convert_to_float64(self)
......@@ -898,7 +898,7 @@ class _tensor_py_operators:
def __ge__(self,other): return ge(self, other)
#BITWISE
def __invert__(self): return invert(self)
def __and__(self,other): return and_(self, other)
def __or__(self,other): return or_(self, other)
def __xor__(self,other): return xor(self, other)
......@@ -910,27 +910,27 @@ class _tensor_py_operators:
# def __ixor__(self, other): return _xor_inplace(self, other)
#ARITHMETIC - NORMAL
def __add__(self,other):
try:
return add(self,other)
except Exception, e:
return NotImplemented
def __sub__(self,other):
try:
return sub(self,other)
except Exception, e:
return NotImplemented
def __mul__(self,other):
try:
return mul(self,other)
except Exception, e:
return NotImplemented
def __div__(self,other):
try:
return div_proxy(self,other)
except Exception, e:
return NotImplemented
def __pow__(self,other):
try:
return pow(self,other)
except Exception, e:
......@@ -1031,12 +1031,12 @@ class _tensor_py_operators:
def __getslice__(self, *args):
args = slice(*args),
return self.__getitem__(args)
#COPYING
def copy(self):
return tensor_copy(self)
def __iter__(self):
def __iter__(self):
try:
for i in xrange(get_vector_length(self)):
yield self[i]
......@@ -1044,7 +1044,7 @@ class _tensor_py_operators:
# This prevents accidental iteration via builtin.sum(self)
raise TypeError('TensorType does not support iteration. '
'Maybe you are using builtin.sum instead of theano.tensor.sum? (Maybe .max?)')
# CONVENIENT ACCESS TO TYPE PROPERTIES
ndim = property(lambda self: self.type.ndim)
......@@ -1053,7 +1053,7 @@ class _tensor_py_operators:
"""The broadcastable signature of this tensor.
See :doc:`broadcasting` for details.
"""
dtype = property(lambda self: self.type.dtype)
""" The dtype of this tensor. """
......@@ -1095,7 +1095,7 @@ class _tensor_py_operators:
def get_constant_value(self):
return get_constant_value(self)
class TensorVariable(Variable, _tensor_py_operators):
"""Subclass to add the tensor operators to the basic `Variable` class."""
TensorType.Variable = TensorVariable
......@@ -1115,7 +1115,7 @@ class TensorConstantSignature(tuple):
#N.B. compare shape to ensure no broadcasting in ==
#N.B. compare elementwise last because it is the most expensive check
return (t0 == t1) and (d0.shape == d1.shape) \
and (self.sum == other.sum) and (numpy.all(d0 == d1))
def __hash__(self):
t, d = self
return hashtype(self) ^ hash(t) ^ hash(d.shape) ^ hash(self.sum)
......@@ -1130,7 +1130,7 @@ class TensorConstantSignature(tuple):
class TensorConstant(Constant, _tensor_py_operators):
"""Subclass to add the tensor operators to the basic `Constant` class.
To create a TensorConstant, use the `constant` function in this module.
"""
def signature(self):
......@@ -1139,7 +1139,7 @@ TensorType.Constant = TensorConstant
class TensorValue(Value, _tensor_py_operators):
"""Subclass to add the tensor operators to the basic `Value` class.
To create a TensorValue, use the `value` function in this module.
:note: Value is deprecated by SharedVariable
......@@ -1167,8 +1167,8 @@ def _elemwise(scalar_op, name, doc_prefix=''):
inplace = elemwise.Elemwise(inplace_scalar_op, {0: 0}, name = name+"_inplace")
# don't add the inplace versions, they aren't supposed to be part of the user interface
_constructor_list.append(straight)
# This is here so that gen_oplist can detect which module declared these variables.
straight.__module__ = 'tensor'
......@@ -1181,7 +1181,7 @@ def _elemwise(scalar_op, name, doc_prefix=''):
def _redefine(real_symbol_value, module='tensor'):
"""Replace the value associated with a function symbol.
This is useful to trick epydoc into doing what we want. It's a hack.
"""
real_symbol_value.__module__ = 'tensor.basic'
......@@ -1275,7 +1275,7 @@ def _conversion(real_value, name):
_convert_to_int8 = _conversion(elemwise.Elemwise(scal.convert_to_int8), 'int8')
"""Cast to 8-bit integer"""
_convert_to_int16 = _conversion(elemwise.Elemwise(scal.convert_to_int16), 'int16')
"""Cast to 16-bit integer"""
......@@ -1287,7 +1287,7 @@ _convert_to_int64 = _conversion(elemwise.Elemwise(scal.convert_to_int64), 'int64
_convert_to_uint8 = _conversion(elemwise.Elemwise(scal.convert_to_uint8), 'uint8')
"""Cast to unsigned 8-bit integer"""
_convert_to_uint16 = _conversion(elemwise.Elemwise(scal.convert_to_uint16), 'uint16')
"""Cast to unsigned 16-bit integer"""
......@@ -1324,9 +1324,9 @@ _cast_mapping = {
'complex128': _convert_to_complex128}
@constructor
def cast(x, dtype):
"""Symbolically cast `x` to a Tensor of type `dtype`."""
if dtype=='floatX': dtype = config.floatX
_x = as_tensor_variable(x)
if _x.type.dtype == dtype:
return _x
......@@ -1382,7 +1382,7 @@ pprint.assign(_shape, printing.MemberPrinter('shape'))
class MaxAndArgmax(Op):
"""Calculate the max and argmax over a given axis.
.. note::
If axis is None it means to calculate the max over the last dimension which is
......@@ -1393,7 +1393,7 @@ class MaxAndArgmax(Op):
nin=2 # tensor, axis
nout=2 # max val, max idx
E_axis = 'invalid axis'
def __eq__(self,other):
return type(self)==type(other)
def __hash__(self):
......@@ -1422,7 +1422,7 @@ class MaxAndArgmax(Op):
inputs = [x, axis]
#TODO: figure things out if axis is a constant
broadcastable = [False] * (x.type.ndim - 1)
outputs = [tensor(x.type.dtype, broadcastable,name='max'),
tensor('int32', broadcastable,name='argmax')]
return Apply(self, inputs, outputs)
def perform(self, node, (x, axis), (max, max_idx)):
......@@ -1445,7 +1445,7 @@ class MaxAndArgmax(Op):
# gMax * dMax/dx + gArgMax * dArgMax/dx, gMax * dMax/daxis + gArgMax * dArgMax/daxis
# g_max has one less dimension than x, so you need to complete g_max to x's shape
# when axis=0 the broadcasting mechanism does it automatically
if not ( axis.data == 0 or axis.data == x.ndim-1):
raise NotImplementedError('MaxAndArgmax gradient with axis corresponding to internal dimension')
if axis.data==0:
......@@ -1874,7 +1874,7 @@ if 0:
class Alloc(gof.Op):
"""Create a Tensor from an initial value and a desired shape
alloc(value, shape0, shape1, ..., shapeN)
Returns an N-dimensional tensor initialized by `value` using something equivalent to
>>> z = numpy.zeros(shape, value.dtype)
......@@ -1883,7 +1883,7 @@ class Alloc(gof.Op):
The result has N dimensions, has the dtype of `value` and is obtained by broadcasting value
over the output ndarray.
This Op is used to replace fill() during optimizations because after shapes are lifted,
the first argument to fill can often be pruned from the graph.
"""
def __init__(self):
......@@ -1943,7 +1943,7 @@ class Alloc(gof.Op):
pass
return ret
alloc = Alloc()
pprint.assign(alloc, printing.FunctionPrinter('alloc'))
......@@ -2006,8 +2006,8 @@ def mean(input, axis = None, op = False):
:param axis: compute the mean along this axis of the tensor.
None means all axes (like numpy).
:type axis: None or int or (list of int) (see `Sum`)
:note: for gpu, if you manually cast the input to float32 before calling
mean, everything will be done on the gpu.
"""
if op:
......@@ -2117,7 +2117,7 @@ class Default(gof.Op):
if x is None:
# why copy? Theano can't yet understand out[0] being a view of either x or y,
# so we can be a view of x, but only a copy of y.
out[0] = default.copy()
else:
out[0] = x
default = Default()
......@@ -2221,7 +2221,7 @@ class Subtensor(Op):
integers are indexes into the inputs array, and the start/stop/step members
of each slice are also integer indexes into the inputs array (or None). The
inputs array is the tensor x, followed by scalar integer variables.
@todo: add support for advanced tensor indexing (in Subtensor_dx too).
The idx_list is a tuple similar in structure to the sort of key you might expect in numpy's
......@@ -2230,7 +2230,11 @@ class Subtensor(Op):
can additionally be a Scalar instance, and slice components can also be Scalar instances
too.
"""
e_invalid = ('The index list is longer (size %d) than the number of '
             'dimensions of the tensor (namely %d). You are asking for '
             'a dimension of the tensor that does not exist! You might '
             'need to use dimshuffle to add extra dimensions to your '
             'tensor.')
e_subslice = 'nested slicing is not supported'
e_indextype = "Invalid index type or slice for Subtensor"
debug = 0
......@@ -2246,7 +2250,7 @@ class Subtensor(Op):
elif isinstance(entry, slice):
helper(entry.start)
helper(entry.stop)
helper(entry.step)
for idx in idxs:
helper(idx)
return ret
......@@ -2312,11 +2316,13 @@ class Subtensor(Op):
def make_node(self, x, *inputs):
x = as_tensor_variable(x)
inputs = tuple(self.my_as_scalar(a) for a in inputs)
idx_list = list(self.idx_list)
if len(idx_list) > x.type.ndim:
exception = ValueError(Subtensor.e_invalid%(len(idx_list),
x.type.ndim))
exception.subtensor_invalid = True
raise exception
#infer the broadcasting pattern
padded = idx_list + [slice(0,sys.maxint,1)] * (x.type.ndim - len(idx_list))
......@@ -2412,7 +2418,7 @@ class Subtensor(Op):
msg += [(entry.start, entry.stop, entry.step)]
else:
msg += [entry]
idx_list = tuple(msg)
#backport
#idx_list = tuple((entry.start, entry.stop, entry.step)
......@@ -2472,7 +2478,7 @@ class SubtensorPrinter:
msg3 = ""
else:
msg3 = ":%s" % entry.step
sidxs.append("%s:%s%s" % (msg1, msg2, msg3))
#backport
#sidxs.append("%s:%s%s" % ("" if entry.start is None or entry.start == 0 else entry.start,
......@@ -2531,10 +2537,10 @@ def inc_subtensor(x, y, inplace=False, set_instead_of_inc=False):
class IncSubtensor(Op):
"""Increment a subtensor.
This is like numpy's
This is like numpy's
It is used internally to implement the gradient on SubTensor.
:param set_instead_of_inc: if True set the subtensor to the value instead
......@@ -2592,11 +2598,13 @@ class IncSubtensor(Op):
def make_node(self, x, y, *inputs):
x, y = map(as_tensor_variable, [x, y])
inputs = tuple(map(Subtensor.my_as_scalar, inputs))
idx_list = list(self.idx_list)
if len(idx_list) > x.type.ndim:
exception = ValueError(Subtensor.e_invalid%(len(idx_list),
x.type.ndim))
exception.subtensor_invalid = True
raise exception
#infer the broadcasting pattern
padded = idx_list + [slice(0,sys.maxint,1)] * (x.type.ndim - len(idx_list))
......@@ -2671,11 +2679,11 @@ class Split(Op):
"""Partition a `TensorVariable` along some axis.
.. python::
x = vector()
splits = lvector()
# you have to declare right away how many split_points there will be.
ra, rb, rc = split(x, splits, n_splits = 3, axis = 0)
f = function([x, splits], [ra, rb, rc])
......@@ -2709,16 +2717,16 @@ class Split(Op):
node = self.make_node(*inputs, **kwargs)
node.tag.trace = traceback.extract_stack()[:-1]
return node.outputs
def make_node(self, x, axis, splits):
"""WRITEME"""
x = as_tensor_variable(x)
axis = as_tensor_variable(axis)
splits = as_tensor_variable(splits)
if splits.type not in int_vector_types:
raise TypeError('splits must have type tensor.lvector', splits.type)
if axis.type not in int_types:
raise TypeError('axis must have type lscalar', axis.type)
# # The following lines are necessary if we allow splits of zero
......@@ -2738,21 +2746,21 @@ class Split(Op):
#in python 2.4, x.shape[numpy.asarray(1)] doesn't work.
if sys.version_info[0:2]==(2, 4) and axis.size==1:
axis=int(axis)
try:
len_along_axis = x.shape[axis]
except:
raise ValueError('Split.perform() with axis=(%s) is invalid for x.shape==(%s)'
%(axis, x.shape))
if len(splits) != self.len_splits:
raise ValueError('In Split.perform(), len(splits) != len_splits.',
(len(splits), self.len_splits))
if numpy.sum(splits) != len_along_axis:
raise ValueError('The splits sum to %s, expected %s' % (numpy.sum(splits), len_along_axis))
if not all(splits):
raise ValueError('Cannot have a split of zero.')
# Checking is done, let's roll the splitting algorithm!
# Basically we step along the given axis of x, extracting subtensors of size splits[i]
# as we go along.
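The comment above describes stepping along the axis, taking subtensors of size `splits[i]`. A numpy sketch of the same algorithm (`split_np` is an illustrative stand-in, not the Op's `perform`; `np.split` takes cut points rather than piece sizes, hence the cumulative sum):

```python
import numpy as np

def split_np(x, splits, axis=0):
    """Sketch of Split.perform: cut x into len(splits) pieces along
    `axis`, piece i having size splits[i].  The checks mirror the
    ones above: splits must sum to the length along the axis and
    contain no zeros."""
    assert np.sum(splits) == x.shape[axis]
    assert all(splits)
    return np.split(x, np.cumsum(splits)[:-1], axis=axis)

x = np.arange(6)
a, b, c = split_np(x, [1, 2, 3])
```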
......@@ -2826,7 +2834,7 @@ def addbroadcast(x, *axes):
def unbroadcast(x, *axes):
"""
Make the input impossible to broadcast in the specified axes.
We apply the opt here so as not to pollute the graph, especially during GPU optimization.
"""
rval = Rebroadcast(*[(axis, False) for axis in axes])(x)
......@@ -2835,7 +2843,7 @@ def unbroadcast(x, *axes):
def patternbroadcast(x, broadcastable):
"""
Make the input adopt the given broadcastable pattern.
We apply the opt here so as not to pollute the graph, especially during GPU optimization.
"""
rval = Rebroadcast(*[(i,broadcastable[i]) for i in range(len(broadcastable))])(x)
......@@ -2853,7 +2861,7 @@ class Join(Op):
For joins involving scalar values, see @stack.
.. python::
x, y, z = tensor.matrix(), tensor.matrix(), tensor.matrix()
u = tensor.vector()
......@@ -2952,7 +2960,7 @@ class Join(Op):
return [None] + split_gz
else:
# assume that this isn't differentiable
return [None] * (1 + len(tensors))
def _native_grad(self, axis_and_tensors, (gz,)):
"""WRITEME"""
......@@ -3006,7 +3014,7 @@ pprint.assign(lambda pstate, r: r.owner and isinstance(r.owner.op, Join),
@constructor
def shape_padleft(t, n_ones=1):
"""Reshape `t` by left-padding the shape with `n_ones` 1s
See also: `shape_padright` and `Dimshuffle`
"""
_t = as_tensor_variable(t)
......@@ -3017,7 +3025,7 @@ def shape_padleft(t, n_ones=1):
@constructor
def shape_padright(t, n_ones=1):
"""Reshape `t` by right-padding the shape with `n_ones` 1s
See also: `shape_padleft` and `Dimshuffle`
"""
_t = as_tensor_variable(t)
......@@ -3045,10 +3053,10 @@ def stack(*tensors):
@constructor
def concatenate(tensor_list, axis=0):
"""Alias for `join`(axis, *tensor_list).
This function is similar to `join`, but uses the signature of numpy's concatenate function.
:Exceptions:
- `TypeError` : the tensor_list must be a tuple or list
......@@ -3072,7 +3080,7 @@ def get_vector_length(v):
:Exceptions:
- `TypeError` : `v` does not have the proper type.
- `ValueError` : No special case applies, the length is not known.
In general this is not possible, but for a number of special cases the length can be
determined at compile / graph-construction time. This function implements these special
cases.
......@@ -3165,7 +3173,7 @@ else:
class Reshape(Op):
"""Perform a reshape operation of the input x to the new shape shp.
The number of dimensions to reshape to (ndim) must be known at graph
build time."""
view_map = {0: [0]} #output 0 is potentially aliased to inputs [0]
def __init__(self, ndim, name = None):
......@@ -3248,7 +3256,7 @@ class Flatten(Op):
def grad(self, (x,), (g_out,)):
return [reshape(g_out, shape(x), x.ndim)]
def flatten(x, outdim=1):
return Flatten(outdim)(x)
class TileGrad(Op):
......@@ -3634,7 +3642,7 @@ class AdvancedSubtensor(Op):
# TODO: in general, we need to re-pack the inputs into a valid index, just like
# subtensor
out[0] = inputs[0].__getitem__(inputs[1:])
#return
#raise NotImplementedError()
def grad(self, inputs, (gz,)):
......@@ -3703,7 +3711,7 @@ class Dot(Op):
return hash(type(self))
# the rationale for Dot22 is related to getting GEMM Ops into the graph. See Dot22 in tensor.blas for details.
def make_node(self, *inputs):
inputs = map(as_tensor_variable, inputs)
......@@ -3764,7 +3772,7 @@ class Dot(Op):
elif x.type.ndim == 1 and y.type.ndim > 1:
rval = dot(gz, y.T), outer(x.T, gz)
elif x.type.ndim > 1 and y.type.ndim == 1:
rval = outer(gz, y.T), dot(x.T, gz)
else:
rval = dot(gz, y.T), dot(x.T, gz)
return cast(rval[0], x.dtype), cast(rval[1], y.dtype)
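The matrix-matrix branch above returns `(dot(gz, y.T), dot(x.T, gz))`. A quick numpy sanity check of that formula against a finite difference (illustrative only, not Theano's `verify_grad`; the cost `C = sum(gz * dot(x, y))` is a stand-in for "some scalar cost whose gradient w.r.t. dot(x, y) is gz"):

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.rand(2, 3)
y = rng.rand(3, 4)
gz = rng.rand(2, 4)          # gradient of some scalar cost C w.r.t. dot(x, y)

# Analytic gradients, matching the ndim>1 / ndim>1 branch above.
gx = np.dot(gz, y.T)
gy = np.dot(x.T, gz)

# Finite-difference estimate of dC/dx[0,0] for C = sum(gz * dot(x, y)).
eps = 1e-6
xp = x.copy()
xp[0, 0] += eps
num = (np.sum(gz * np.dot(xp, y)) - np.sum(gz * np.dot(x, y))) / eps
```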
......@@ -3865,7 +3873,7 @@ class TensorDot(Op):
if len(axes[0])!=len(axes[1]):
raise ValueError("The two sub-lists in axes must have the same length")
assert len(axes[0])==len(axes[1])
self.axes = axes
def __eq__(self, other):
......@@ -3887,7 +3895,7 @@ class TensorDot(Op):
if axesdim > x.type.ndim or axesdim > y.type.ndim:
raise TypeError('Cannot sum over more dimensions than input. %i > %i,%i' %
(axesdim, x.type.ndim, y.type.ndim))
outdim = x.type.ndim + y.type.ndim - 2*axesdim
output = tensor(dtype=scal.upcast(x.dtype, y.dtype),
broadcastable=[False]*outdim);
......@@ -3904,7 +3912,7 @@ class TensorDot(Op):
def grad(self, (x, y), (gz,)):
gx, gy = tensordot_grad(self.axes)(x, y, gz)
return [gx, gy]
def __str__(self):
return "tensordot"
tensordot = TensorDot
......@@ -3923,7 +3931,7 @@ class Outer(Op):
if nx != 1: raise TypeError('non-vector arg0 to outer()', x)
if ny != 1: raise TypeError('non-vector arg1 to outer()', y)
bz = [x.type.broadcastable[0], y.type.broadcastable[0]]
i_dtypes = [input.type.dtype for input in inputs]
......@@ -3997,8 +4005,8 @@ class numeric_grad:
#
# There is a relationship between the step size and the function value and the measurement
# error that is incurred due to rounding. The finite difference we measure is
# delta = f(x0) - f(x0+eps)
#
# For maximum precision, f should be close to zero.
# For every power of 2 that f departs from zero, we lose a bit of precision in delta.
#
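The precision argument above (every power of 2 that f departs from zero costs a bit of precision in delta) is easy to demonstrate with a forward difference at a small and a large base point; `fd` is an illustrative helper, not `numeric_grad` itself:

```python
# For f(x) = x**2 the exact derivative at x0 is 2*x0.  With a fixed
# float64 step (cf. type_eps below), the forward difference is accurate
# when f(x0) is small, and degrades badly as f(x0) grows, because
# f(x0+eps) - f(x0) is computed in the floating-point grid of f(x0).
def fd(f, x0, eps=1e-7):
    return (f(x0 + eps) - f(x0)) / eps

f = lambda x: x * x
err_small = abs(fd(f, 1.0) - 2.0)      # tiny: truncation ~ eps
err_large = abs(fd(f, 1e6) - 2e6)      # large: rounding dominates
```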
......@@ -4009,7 +4017,7 @@ class numeric_grad:
# bias into our measurement in general for non-linear functions.
#
# It would be interesting to have a version of numeric grad that used an adaptive stepsize.
#
# For now, we use a heuristic that catches very bad gradients, but is not perfectly
# accurate.
type_eps = {'float64': 1e-7,
......@@ -4161,7 +4169,7 @@ def verify_grad(fun, pt, n_tests=2, rng=None, eps=None, abs_tol=None, rel_tol=No
mode=None, cast_to_output_type=False):
""" Test a gradient by Finite Difference Method. Raise error on failure.
Example:
>>> verify_grad(theano.tensor.tanh,
(numpy.asarray([[2,3,4], [-1, 3.3, 9.9]]),),
rng=numpy.random)
......@@ -4187,8 +4195,8 @@ def verify_grad(fun, pt, n_tests=2, rng=None, eps=None, abs_tol=None, rel_tol=No
debug mode, which can be very slow if it has to verify a lot
of intermediate computations.
:note: This op does not support multiple outputs. In tests/test_scan.py there is
an experimental verify_grad that covers that case as well by using random
projections.
"""
assert isinstance(pt, (list,tuple))
......@@ -4244,7 +4252,7 @@ def verify_grad(fun, pt, n_tests=2, rng=None, eps=None, abs_tol=None, rel_tol=No
t_r = shared(random_projection())
#random projection of o onto t_r
cost = sum(t_r * o_output) #This sum() is defined above, it's not the builtin sum.
cost_fn = function(tensor_pt, cost)
#todo-- determine if this is actually needed
......
......@@ -101,6 +101,11 @@ class DimShuffle(Op):
self.new_order = new_order
self.inplace = inplace
for i in xrange(len(new_order)-1):
j = new_order[i]
if j != 'x' and j in new_order[i+1:]:
raise ValueError("The same input dimension may not appear twice in the list of output dimensions", (new_order))
# list of dimensions of the input to drop
self.drop = []
i2j = {} # this maps i before dropping dimensions to j after dropping dimensions so self.shuffle can be set properly later on
......
......@@ -848,9 +848,39 @@ class ConvOp(Op):
using namespace std;
""" + tensor.blas.blas_header_text()
def use_blas(self):
""" Return True if we will generate code that uses gemm.
"""
#the gemm version only supports that case
if self.out_mode == 'valid' and self.dx==0 and self.dy==0:
#We use a faster version in those cases.
if (self.imshp != self.imshp_logical or self.kshp != self.kshp_logical
or self.unroll_patch or self.unroll_batch>0 or self.unroll_kern>0):
return False
return True
return False
def c_libraries(self):
if self.use_blas():
return tensor.blas.ldflags()
return []
def c_compile_args(self):
if self.use_blas():
return tensor.blas.ldflags(libs=False, flags=True)
return []
def c_lib_dirs(self):
if self.use_blas():
return tensor.blas.ldflags(libs=False, libs_dir=True)
return []
def c_header_dirs(self):
if self.use_blas():
return tensor.blas.ldflags(libs=False, include_dir=True)
return []
def c_code(self, node, name, (img2d, filtersflipped), (z, ), sub):
if node.inputs[0].type.dtype != node.inputs[1].type.dtype:
raise NotImplementedError()
......
......@@ -119,11 +119,11 @@ def insert_inplace_optimizer_op(OP):
"""
#we should not validate too often as this takes too much time to execute!
#It is the _dfs_toposort() fct in theano/gof/destroyhandler.py
#that takes so much time.
#Should we try to use another lib that does toposort?
# igraph: http://igraph.sourceforge.net/
# networkx: https://networkx.lanl.gov/
#Should we try to use cython?
# compiling only that fct is not enough; should we try to add the deque class too?
# and init the deque and other list to an upper bound number of element?
#Should Theano do online toposort as in http://code.google.com/p/acyclic/?
......@@ -213,7 +213,7 @@ def insert_inplace_optimizer_op(OP):
insert_inplace_optimizer = insert_inplace_optimizer_op(T.Elemwise)
compile.optdb.register('inplace_opt', insert_inplace_optimizer, 75, 'fast_run', 'inplace')
def register_canonicalize(lopt, *tags, **kwargs):
name = (kwargs and kwargs.pop('name')) or lopt.__name__
......@@ -304,7 +304,7 @@ class MakeVector(T.Op):
"""Concatenate a number of scalars together into a vector
This is a simple version of stack() that introduces far less cruft into the graph.
Should work with 0 inputs. The constant_folding optimization will remove it.
"""
def __init__(self, dtype='int64'):
......@@ -398,7 +398,7 @@ class Shape_i(T.Op):
%(out)s=(PyArrayObject*)PyArray_ZEROS(0, NULL, PyArray_INT64, 0);
((npy_int64*)PyArray_DATA(%(out)s))[0]=%(x)s->dimensions[%(i)s];
"""%locals()
elif node.inputs[0].type.__class__.__name__=="CudaNdarrayType":
#Don't want to import cuda stuff here.
return """
......@@ -413,12 +413,12 @@ class Shape_i(T.Op):
class ShapeFeature(object):
"""Graph optimizer for removing all calls to shape()
This optimizer replaces all Shapes and Subtensors of Shapes with Shape_i and MakeVector
Ops.
This optimizer has several goals:
1. to 'lift' Shapes to as close to the inputs as possible.
2. to infer the shape of every node in the graph in terms of the input shapes.
3. remove all fills (T.second, T.fill) from the graph
......@@ -430,7 +430,7 @@ class ShapeFeature(object):
Many optimizations refuse to work on nodes with multiple clients.
Lifting is done by using an `<Op>.infer_shape` function if one is present, or else using a
conservative default. An Op that supports shape-lifting should define an
infer_shape(self, node, input_shapes) function. The argument input_shapes is a tuple
of tuples... there is an interior tuple for each input to the node. The tuple has as many
elements as dimensions. The element in position i of tuple j represents the i'th shape
......@@ -439,9 +439,9 @@ class ShapeFeature(object):
the output[j].shape[i] of the function. If an output is not a TensorType, then None should
be returned instead of a tuple for that output.
For example, the infer_shape for a matrix-matrix product would accept
input_shapes=((x0,x1), (y0,y1)) and return ((x0, y1),).
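Under the convention just described, an `infer_shape` for a matrix-matrix product might look like the sketch below. In Theano the shape elements are symbolic scalars, but plain ints illustrate the contract the same way; the concrete call with `None` placeholders is for demonstration only:

```python
def infer_shape(self, node, input_shapes):
    """Sketch of the shape-lifting hook described above: for a
    matrix-matrix product, input_shapes is ((x0, x1), (y0, y1)) and
    the output shape is (x0, y1).  There is one interior tuple per
    input, with one element per dimension."""
    (x0, x1), (y0, y1) = input_shapes
    return ((x0, y1),)

# Illustration with concrete shapes (self and node unused in this sketch):
out_shapes = infer_shape(None, None, ((5, 3), (3, 7)))
```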
Inferring the shape of internal nodes in the graph is important for doing size-driven
optimizations. If we know how big various intermediate results will be, we can estimate
......@@ -495,7 +495,7 @@ class ShapeFeature(object):
return T.constant(s_i, dtype='int64')
if type(s_i) in (tuple,list):
# this dimension is the same as many of the inputs
# which tells us that if one of the inputs is known,
# the others all become known.
# TODO: should be implemented in Elemwise, and Dot
#
......@@ -506,7 +506,7 @@ class ShapeFeature(object):
raise TypeError('Shape element must be scalar', s_i)
return s_i
else:
raise TypeError('Unsupported shape element',
s_i, type(s_i), getattr(s_i, 'type', None))
def set_shape(self, r, s):
......@@ -534,7 +534,7 @@ class ShapeFeature(object):
assert not hasattr(env, 'shape_feature')
env.shape_feature = self
self.shape_of = {} # Variable -> tuple(scalars) or None (All tensor vars map to tuple)
self.scheduled = {} # Variable ->
self.lscalar_one = T.constant(1, dtype='int64')
assert self.lscalar_one.type == T.lscalar
for node in env.toposort():
......@@ -622,7 +622,7 @@ def local_fill_to_alloc(node):
This is an important optimization because with the shape_to_shape_i optimization, the
dependency on 's' is often removed.
"""
if node.op == T.fill:
r, v = node.inputs
......@@ -637,7 +637,7 @@ def local_fill_to_alloc(node):
shape_of = node.env.shape_feature.shape_of
# TODO: cut out un-necessary dimshuffles of v
rval = [T.alloc(T.cast(v, node.outputs[0].dtype), *shape_of[node.outputs[0]])]
#if rval[0].type != node.outputs[0].type:
#print >> sys.stderr, theano.printing.debugprint(node.outputs[0], file='str')
......@@ -700,7 +700,7 @@ def local_subtensor_make_vector(node):
raise
if isinstance(idx, (scalar.Scalar, T.TensorType)):
# The idx is a Scalar, ie a Type. This means the actual index
# is contained in node.inputs[1]
old_idx, idx = idx, node.inputs[1]
assert idx.type == old_idx
......@@ -773,7 +773,7 @@ class Assert(T.Op):
cond = [T.as_tensor_variable(c) for c in conds]
assert numpy.all([c.type.ndim == 0 for c in cond])
return gof.Apply(self, [value]+cond, [value.type()])
def __str__(self):
return self.__class__.__name__
def perform(self, node, inputs, (out,)):
......@@ -807,7 +807,7 @@ class Assert(T.Op):
def infer_shape(self, node, input_shapes):
return [input_shapes[0]]
assert_ = Assert()
@register_specialize
......@@ -818,13 +818,13 @@ def local_remove_useless_assert(node):
for c in node.inputs[1:]:
try:
const = get_constant_value(c)
if 0!=const.ndim or const==0:
#Should we raise an error here? How to be sure it is not caught?
cond.append(c)
except TypeError:
cond.append(c)
if len(cond)==0:
return [node.inputs[0]]
if len(cond)!=len(node.inputs)-1:
......@@ -873,12 +873,12 @@ def local_alloc_elemwise(node):
isinstance(i.owner.inputs[0].owner.op,T.Alloc)):
no_broad_idx = idx
break
assert no_broad_idx>=0
assert_op = node.inputs[no_broad_idx]
cmp_op = assert_op
new = []
for i in node.inputs:
if i.owner and isinstance(i.owner.op,T.Alloc) and i.owner.inputs[0].type != i.owner.outputs[0].type:
#when i.owner.inputs[0].type == i.owner.outputs[0].type we will remove that alloc later
......@@ -1017,8 +1017,8 @@ def local_IncSubtensor_serialize(node):
IncSubtensor(Elemwise{second}(a, 0), g(f(a[2])), [2])
This is much worse because this time we have to produce 3 matrices the size of 'a', just so
we can add them together.
This Op rearranges IncSubtensor's that all work on the same initial argument (here,
Elemwise{second}(a,0)) into a chain. The advantage of the chain structure is that each one
can be optimized later in the pipeline to operate inplace.
......@@ -1028,7 +1028,7 @@ def local_IncSubtensor_serialize(node):
#
# add(x, incsubtensor(b, c), incsubtensor(b, d))
# -> incsubtensor(incsubtensor(add(x,b,b), c), d)
"""
def movable(i):
# Return True iff this is a incsubtensor that we can move
......@@ -1138,7 +1138,7 @@ def local_rebroadcast_lift(node):
def apply_rebroadcast_opt(rval):
"""
Apply as many times as required the optimization local_useless_rebroadcast
and local_rebroadcast_lift.
:param rval: a Variable
......@@ -1149,7 +1149,7 @@ def apply_rebroadcast_opt(rval):
while changed and rval.owner:
changed = False
rval2 = theano.tensor.opt.local_useless_rebroadcast.transform(rval.owner)
if rval2:
assert len(rval2)==1
rval = rval2[0]
changed = True
......@@ -1216,7 +1216,7 @@ def local_mul_switch_sink(node):
fct[0].values_eq_approx = fct[0].type.values_eq_approx_remove_nan
return fct
except TypeError:
pass
try:
if get_constant_value(switch.inputs[2]) == 0.:
listmul = node.inputs[:idx] + node.inputs[idx+1:]
......@@ -1274,7 +1274,7 @@ def local_reshape_chain(node):
"""
if not opt.check_chain(node, T.Reshape, T.Reshape):
return False
# TODO: this can permit a failing program to run by eliminating the lower
# reshape
return [node.op(node.inputs[0].owner.inputs[0], node.inputs[1])]
......@@ -1304,7 +1304,7 @@ if 0:
y_shape = node.env.shape_feature.shape_of[y]
def tmp(thing):
try:
return T.get_constant_value(thing)
except (TypeError, ValueError), e:
print e, thing.owner.inputs[0]
......@@ -1322,15 +1322,15 @@ def local_fill_cut(node):
If c.type == a.type.
"""
# this optimization is essentially for getting broadcasting to replace fill.
# This is always possible when using a Compound Elemwise operation,
# but it is not always possible without one (consider filling a large matrix with a scalar,
# and then adding another scalar. The only numbers that count are the two scalars, but we
# can't ignore the large matrix because it gives the shape of the result.
if not opt.check_chain(node, T.Elemwise):
return False
output = node.outputs[0]
try:
#reference is some input with the same type as the input but that is not produced by a fill
......@@ -1397,7 +1397,7 @@ class Canonizer(gof.LocalOptimizer):
Simplification tool.
Usage: Canonizer(main, inverse, reciprocal, calculate)
* main: a suitable Op class that is commutative, associative and
takes one to an arbitrary number of inputs, e.g. add or
mul
......@@ -1421,7 +1421,7 @@ class Canonizer(gof.LocalOptimizer):
T = theano.tensor
add_canonizer = Canonizer(T.add, T.sub, T.neg, lambda n, d: sum(n) - sum(d))
mul_canonizer = Canonizer(T.mul, T.true_div, T.inv, lambda n, d: prod(n) / prod(d))
Examples of optimizations mul_canonizer can perform:
x / x -> 1
(x * y) / x -> y
......@@ -1659,7 +1659,7 @@ class Canonizer(gof.LocalOptimizer):
# Lists representing the *constant* elements of num and denum
numct, denumct = [], []
for v in orig_num:
ct = self.get_constant(v)
if ct is not None:
......@@ -1788,7 +1788,7 @@ register_canonicalize(local_mul_canonizer, name = 'local_mul_canonizer')
@gof.local_optimizer([T.neg])
def local_neg_to_mul(node):
if node.op == T.neg:
return [T.mul(numpy.array(-1, dtype = node.inputs[0].dtype),
node.inputs[0])]
register_canonicalize(local_neg_to_mul)
......@@ -1797,7 +1797,7 @@ register_canonicalize(local_neg_to_mul)
def local_sum_mul_by_scalar(node):
"""sum(scalar * smth) -> scalar * sum(smth)
"""
# TODO: if the thing inside the Sum is a division,
# we should get at the numerator....
if isinstance(node.op, T.Sum):
thing_summed, = node.inputs
......@@ -1935,7 +1935,7 @@ def local_sum_sum(node):
# special case of local_cut_useless_reduce
return [T.Sum(None)(summed.owner.inputs[0])]
if node.op.axis is None:
# we're summing up everything anyway so let's
# do it all at once
return [T.Sum(None)(summed.owner.inputs[0])]
......@@ -1983,7 +1983,6 @@ def local_sum_alloc(node):
if summed.owner and isinstance(summed.owner.op, T.Alloc):
input = summed.owner.inputs[0]
shapes = summed.owner.inputs[1:]
if node.op.axis is None or node.op.axis == tuple(range(input.ndim)):
try:
val = get_constant_value(input)
......@@ -2019,7 +2018,7 @@ register_specialize(local_mul_to_neg)
@register_specialize
@gof.local_optimizer([T.neg])
def local_neg_neg(node):
# other specializations shouldn't put this in,
# but sometimes they do
if node.op == T.neg:
if node.inputs[0].owner and node.inputs[0].owner.op == T.neg:
......@@ -2177,11 +2176,11 @@ def local_pow_specialize_device(node):
rval1 = None
rval1_scal = None
while y_to_do>0:
log_to_do = int(numpy.log2(y_to_do))
if rval1:
rval1 *= pow2[log_to_do]
rval1_scal *= pow2_scal[log_to_do]
else:
rval1 = pow2[log_to_do]
rval1_scal = pow2_scal[log_to_do]
y_to_do -= 2**log_to_do
......@@ -2197,7 +2196,7 @@ def local_pow_specialize_device(node):
rval[0] = T.cast(rval[0], odtype)
assert rval[0].type == node.outputs[0].type, (rval, node.outputs)
return rval
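The `log2` loop above greedily decomposes an integer exponent into powers of two, so that `x**y` can be built from repeated squarings. A standalone sketch of just the decomposition (`pow2_decompose` is an illustrative helper, not part of the optimizer):

```python
import numpy

def pow2_decompose(y):
    """Greedily express a positive integer y as a sum of powers of
    two, mirroring the log_to_do loop in local_pow_specialize_device."""
    parts = []
    y_to_do = y
    while y_to_do > 0:
        log_to_do = int(numpy.log2(y_to_do))
        parts.append(2 ** log_to_do)
        y_to_do -= 2 ** log_to_do
    return parts

# x**11 becomes x**8 * x**2 * x**1, i.e. three terms from the table of
# repeated squarings instead of ten multiplications.
parts = pow2_decompose(11)
```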
@gof.local_optimizer([T.mul])
def local_mul_specialize(node):
"""Remove special-case constants from mul arguments
......@@ -2210,7 +2209,7 @@ def local_mul_specialize(node):
neg = False
new_inputs = []
for input in node.inputs:
# remove any neg arguments
while input.owner and input.owner.op == T.neg:
neg ^= True
input = input.owner.inputs[0]
......@@ -2303,8 +2302,8 @@ def check_for_x_over_absX(numerators, denominators):
if den.owner and den.owner.op == T.abs_ and den.owner.inputs[0] in numerators:
if den.owner.inputs[0].type.dtype.startswith('complex'):
#TODO: Make an Op that projects a complex number to have unit length
# but projects 0 to 0. That would be a weird Op, but consistent with the
# special case below. I heard there's some convention in Matlab that is
# similar to this... but not sure.
pass
else:
......@@ -2319,7 +2318,7 @@ local_mul_canonizer.add_simplifier(check_for_x_over_absX, 'X_over_absX')
def local_abs_lift(node):
"""
Move the abs toward the input. This is needed for check_for_x_over_absX to apply in more cases.
"""
if node.op == T.abs_ and node.inputs[0].owner:
assert node.nin == 1
......@@ -2328,13 +2327,13 @@ def local_abs_lift(node):
if node.inputs[0].owner.op == T.true_div:
i = node.inputs[0].owner.inputs
return [T.true_div(T.abs_(i[0]),T.abs_(i[1]))]
@register_specialize
@gof.local_optimizer([])
def local_abs_merge(node):
"""
Merge the abs generated by local_abs_lift when the canonizer no longer needs it.
"""
if node.op == T.mul and sum([i.owner.op == T.abs_ for i in node.inputs if i.owner])>1:
inputs = []
......@@ -2570,7 +2569,7 @@ def constant_folding(node):
return msg
register_canonicalize(constant_folding, 'fast_compile')
register_stabilize(constant_folding) # because
register_specialize(constant_folding)
......@@ -2598,7 +2597,7 @@ def _is_minus1(expr):
return False
#1+erf(x)=>erfc(-x)
local_one_plus_erf = gof.PatternSub((T.add,
dict(pattern='y', constraint = _is_1),
(T.erf, 'x')),
(T.erfc, (T.neg, 'x')),
......@@ -2608,7 +2607,7 @@ register_stabilize(local_one_plus_erf, name='local_one_plus_erf')
register_specialize(local_one_plus_erf, name='local_one_plus_erf')
#1-erf(x)=>erfc(x)
local_one_minus_erf = gof.PatternSub((T.sub,
dict(pattern='y', constraint = _is_1),
(T.erf, 'x')),
(T.erfc, 'x'),
......@@ -2629,7 +2628,7 @@ register_specialize(local_one_minus_erf2)
#1+(-erf(x))=>erfc(x)
#This is a different graph than the previous one, as the canonicalizer doesn't work completely
local_one_plus_neg_erf = gof.PatternSub((T.add,
dict(pattern='y', constraint = _is_1),
(T.neg,(T.erf, 'x'))),
(T.erfc, 'x'),
......@@ -2640,7 +2639,7 @@ register_specialize(local_one_plus_neg_erf, name='local_one_plus_neg_erf')
#(-1)+erf(x) => -erfc(x)
#don't need erf(x)+(-1) as the canonicalize will put the -1 as the first argument.
local_erf_minus_one = gof.PatternSub((T.add,
dict(pattern='y', constraint = _is_minus1),
(T.erf, 'x')),
(T.neg,(T.erfc, 'x')),
......@@ -2650,7 +2649,7 @@ register_stabilize(local_erf_minus_one, name='local_erf_minus_one')
register_specialize(local_erf_minus_one, name='local_erf_minus_one')
#1-erfc(x) => erf(x)
local_one_minus_erfc = gof.PatternSub((T.sub,
dict(pattern='y', constraint = _is_1),
(T.erfc, 'x')),
(T.erf, 'x'),
......@@ -2665,7 +2664,7 @@ local_one_minus_erfc2 = gof.PatternSub((T.add,
(T.erf, 'x'),
allow_multiple_clients = True,
name='local_one_minus_erfc2')
register_canonicalize(local_one_minus_erfc2)
register_stabilize(local_one_minus_erfc2)
register_specialize(local_one_minus_erfc2)
......@@ -2675,13 +2674,13 @@ local_one_minus_erfc3 = gof.PatternSub((T.add,
(T.erf, 'x'),
allow_multiple_clients = True,
name='local_one_minus_erfc3')
register_canonicalize(local_one_minus_erfc3)
register_stabilize(local_one_minus_erfc3)
register_specialize(local_one_minus_erfc3)
#1+(-erfc(x)) => erf(x)
#This is a different graph than the previous one, as the canonicalizer doesn't work completely
local_one_add_neg_erfc = gof.PatternSub((T.add,
dict(pattern='y', constraint = _is_1),
(T.neg,(T.erfc, 'x'))),
(T.erf, 'x'),
......@@ -2691,7 +2690,7 @@ register_stabilize(local_one_add_neg_erfc, name='local_one_add_neg_erfc')
register_specialize(local_one_add_neg_erfc, name='local_one_add_neg_erfc')
#(-1)+erfc(-x)=>erf(x)
local_erf_neg_minus_one = gof.PatternSub((T.add,
dict(pattern='y', constraint = _is_minus1),
(T.erfc, (T.neg,'x'))),
(T.erf, 'x'),
......@@ -2701,7 +2700,7 @@ register_stabilize(local_erf_neg_minus_one, name='local_erf_neg_minus_one')
register_specialize(local_erf_neg_minus_one, name='local_erf_neg_minus_one')
#(-1)+erfc(-1*x)=>erf(x)
local_erf_neg_minus_one2 = gof.PatternSub((T.add,
dict(pattern='y', constraint = _is_minus1),
(T.erfc, (T.mul,-1,'x'))),
(T.erf, 'x'),
......@@ -2732,7 +2731,7 @@ def local_log_erfc(node):
x = node.inputs[0].owner.inputs[0]
stab_value = -x**2-T.log(x)-.5*T.log(numpy.pi)+T.log(1-1/(2*x**2)+3/(4*x**4)-15/(8*x**6))
if node.outputs[0].dtype=='float32':
threshold = 10.0541949
elif node.outputs[0].dtype=='float64':
......@@ -2749,7 +2748,7 @@ def local_log_erfc(node):
#for float64: threshold=26.63; see the end of the fct for the explanation
#for float32: threshold=9.3; see the end of the fct for the explanation
#TODO: remove the constraint that there are only 2 inputs to mul and that exp(x**2) is the second.
#TODO: at the test point 10 in float32, there is instability in the original value.
# the original gives -30.0, the stab -20.1, and in float64 -18.1.
# Make sure the test doesn't generate an error in that case!
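The stabilized expression exists because `erfc(x)` underflows to 0 for large `x`, so `log(erfc(x))` is unusable there. A quick demonstration with the standard library's `math.erfc`, using the same asymptotic expansion as `stab_value` above (the choice of `x = 30.0` is illustrative, well past the float64 threshold):

```python
import math

x = 30.0
# erfc(30) ~ exp(-900)/(30*sqrt(pi)), far below the float64 range,
# so it underflows to exactly 0.0 and log(erfc(x)) would fail.
underflowed = math.erfc(x) == 0.0

# Stabilized value from the same expansion as stab_value:
# log(erfc(x)) ~ -x**2 - log(x) - 0.5*log(pi)
#                + log(1 - 1/(2*x**2) + 3/(4*x**4) - 15/(8*x**6))
stab = (-x**2 - math.log(x) - 0.5 * math.log(math.pi)
        + math.log(1 - 1/(2*x**2) + 3/(4*x**4) - 15/(8*x**6)))
```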
@register_stabilize
......@@ -2809,7 +2808,7 @@ def local_grad_log_erfc_neg(node):
new_inputs.append(i)
return new_inputs
mul_inputs = check_input(mul_neg.owner.inputs)
#put the constant first
for i in range(len(mul_inputs)):
if isinstance(i, Constant):
......@@ -2821,7 +2820,7 @@ def local_grad_log_erfc_neg(node):
mul_inputs[i]=tmp
break
mul_neg = T.mul(*mul_inputs)
try:
cst2 = get_constant_value(mul_neg.owner.inputs[0])
except TypeError:
......@@ -2840,25 +2839,25 @@ def local_grad_log_erfc_neg(node):
return False
if cst2!=-1:
if (not erfc_x.owner or erfc_x.owner.op != T.mul
or len(erfc_x.owner.inputs)!=2):
#todo implement that case
return False
if erfc_x.owner.inputs[1] is not mul_neg.owner.inputs[1]:
return False
x = erfc_x
try:
cst = get_constant_value(erfc_x.owner.inputs[0])
except TypeError:
return False
if cst2 != -cst*2:
return False
#The constant is valid. Must check that the
elif erfc_x is not x:
return False
else:
return False
......@@ -3014,7 +3013,7 @@ def local_elemwise_fusion_op(OP):
try:
s_new_out.owner.op.c_code(s_new_out.owner, "test_presence_of_c_code",
["x" for x in s_g],
"z",{})
except MethodNotDefined:
_logger.info("%s does not implement the c_code function. As well as being potentially slow, this disables loop fusion of this op." % str(s_new_out.owner.op))
return False
......@@ -3046,19 +3045,18 @@ def local_elemwise_fusion_op(OP):
return False
# print "local_elemwise_fusion: FUSED",nb_elemwise+1,"elemwise!"
#we fuse as many as we can at the same time to make debug mode faster
#debug mode will be faster as it won't test all intermediate steps.
while True:
ret = local_fuse(n)
if ret is not False and ret is not None:
#print n,ret
assert len(ret)==len(n.outputs)
assert len(ret)==1
n = ret[0].owner
else: break
return n.outputs
return local_fuse
......
......@@ -647,7 +647,7 @@ TanhInplaceTester = makeBroadcastTester(op = inplace.tanh_inplace,
grad = _grad_broadcast_unary_normal,
inplace = True)
#inplace ops when the input is integer and the output is float*
# don't have a well defined behavior. We don't test that case.
_good_broadcast_unary_normal_no_int = _good_broadcast_unary_normal.copy()
del _good_broadcast_unary_normal_no_int['integers']
......@@ -903,7 +903,7 @@ class T_max_and_argmax(unittest.TestCase):
def test_grad(self):
data = numpy.random.rand(2,3)
n = as_tensor_variable(data)
def check_grad_max(data, max_grad_data, axis=None):
#This works only for axis in [0, None]
assert axis in [0,None]
......@@ -915,7 +915,7 @@ class T_max_and_argmax(unittest.TestCase):
else:
for id,v in enumerate(argmax):
z[v*numpy.prod(data.shape[data.ndim-1:axis:-1])+id]+=1
z = z.reshape(data.shape)
assert numpy.all(max_grad_data == z)
......@@ -1053,7 +1053,7 @@ class T_argmin_argmax(unittest.TestCase):
def test_grad_argmin(self):
data = numpy.random.rand(2,3)
n = as_tensor_variable(data)
#test grad of argmin
utt.verify_grad(lambda v: argmin(v), [data])
......@@ -1072,7 +1072,7 @@ class T_argmin_argmax(unittest.TestCase):
def test_grad_argmax(self):
data = numpy.random.rand(2,3)
n = as_tensor_variable(data)
#test grad of argmax
utt.verify_grad(lambda v: argmax(v), [data])
......@@ -1172,7 +1172,7 @@ class T_min_max(unittest.TestCase):
v = eval_outputs(fct(n,-2))
self.failUnless(v.shape == (3,))
self.failUnless(numpy.all(v == nfct(n.value,-2)))
v = eval_outputs(fct(n,-1).shape)
assert v==(2)
v = eval_outputs(fct(n,-2).shape)
......@@ -1220,7 +1220,7 @@ class T_min_max(unittest.TestCase):
def test_grad_max(self):
data = numpy.random.rand(2,3)
n = as_tensor_variable(data)
def check_grad_max(data, max_grad_data, axis=None):
#This works only for axis in [0, None]
assert axis in [0,None]
......@@ -1232,7 +1232,7 @@ class T_min_max(unittest.TestCase):
else:
for id,v in enumerate(argmax):
z[v*numpy.prod(data.shape[data.ndim-1:axis:-1])+id]+=1
z = z.reshape(data.shape)
assert numpy.all(max_grad_data == z)
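The one-hot construction that check_grad_max verifies can be sketched in plain NumPy, outside the test suite (a hypothetical standalone illustration for the axis=0 case):

```python
import numpy as np

# Gradient of max along axis 0: a 1 at each argmax position, 0 elsewhere.
data = np.array([[0.1, 0.9],
                 [0.8, 0.2]])
idx = data.argmax(axis=0)               # argmax per column: [1, 0]
z = np.zeros_like(data)
z[idx, np.arange(data.shape[1])] = 1.0  # one-hot at the winning rows
# z == [[0., 1.], [1., 0.]]
```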
......@@ -1252,7 +1252,7 @@ class T_min_max(unittest.TestCase):
def test_grad_min(self):
data = numpy.random.rand(2,3)
n = as_tensor_variable(data)
def check_grad_min(data, min_grad_data, axis=None):
#This works only for axis in [0, None]
assert axis in [0,None]
......@@ -1264,7 +1264,7 @@ class T_min_max(unittest.TestCase):
else:
for id,v in enumerate(argmin):
z[v*numpy.prod(data.shape[data.ndim-1:axis:-1])+id]+=1
z = z.reshape(data.shape)
assert numpy.all(min_grad_data == z)
......@@ -1304,7 +1304,7 @@ class T_subtensor(unittest.TestCase):
try:
t = n[0]
except ValueError, e:
self.failUnless(e[0] is Subtensor.e_invalid)
self.failUnless(hasattr(e,'subtensor_invalid'))
return
self.fail()
......@@ -1356,7 +1356,7 @@ class T_subtensor(unittest.TestCase):
try:
t = n[0,0]
except ValueError, e:
self.failUnless(e[0] is Subtensor.e_invalid)
self.failUnless(hasattr(e,'subtensor_invalid'))
return
self.fail()
def test1_ok_elem(self):
......@@ -2561,7 +2561,7 @@ def test_flatten_outdim_invalid():
# TODO: write test case for Tile Op
def test_tile():
print >> sys.stderr, "WARNING: No testcase for Tile"
pass
class TestARange(unittest.TestCase):
......@@ -2724,7 +2724,7 @@ class TestARange(unittest.TestCase):
f = function([stop], out.shape, mode=mode)
assert len(f.maker.env.toposort())==2
#[Elemwise{Cast{int64}}(stop), MakeVector(Elemwise{Cast{int64}}.0)]
assert out.dtype == start.type.dtype
assert numpy.all(f(5) == len(numpy.arange(0,5)))
assert numpy.all(f(11) == len(numpy.arange(0,11)))
......@@ -2961,7 +2961,7 @@ class test_tensordot(unittest.TestCase):
self.failUnless(numpy.allclose(numpy.tensordot(aval,bval,axes),
f5(aval,bval)))
utt.verify_grad(TensorDot(axes), [aval,bval])
axes = (axes[1],axes[0])
c = tensordot(axes)(btens, atens)
f6 = inplace_func([btens,atens],c)
......@@ -3051,7 +3051,7 @@ class test_tensordot(unittest.TestCase):
def test_tensordot_grad(self):
#We test it manually as we recreate the op in the make_node
amat = matrix()
bmat = matrix()
gzmat = matrix()
......@@ -3245,17 +3245,17 @@ class test_broadcast(unittest.TestCase):
Test that the unbroadcast function doesn't insert unneeded Rebroadcast ops
and that consecutive Rebroadcast ops are fused.
"""
x=matrix()
assert unbroadcast(x,0) is x
assert unbroadcast(x,1) is x
assert unbroadcast(x,1,0) is x
assert unbroadcast(x,0,1) is x
assert addbroadcast(x,0) is not x
assert addbroadcast(x,1) is not x
assert addbroadcast(x,1,0).owner.inputs[0] is x
assert unbroadcast(addbroadcast(x,0),0) is x
assert addbroadcast(unbroadcast(x,0),0) is not x
x=row()
......@@ -3263,15 +3263,15 @@ class test_broadcast(unittest.TestCase):
assert unbroadcast(x,1) is x
assert unbroadcast(x,1,0) is not x
assert unbroadcast(x,0,1) is not x
assert addbroadcast(x,0) is x
assert addbroadcast(x,1).owner.inputs[0] is x
assert addbroadcast(x,1,0).owner.inputs[0] is x
assert addbroadcast(x,0,1).owner.inputs[0] is x
assert unbroadcast(addbroadcast(x,1),1) is x
assert addbroadcast(unbroadcast(x,1),1) is not x
#the first unbroadcast removes the broadcastable flag, so the second
#should not add one
assert unbroadcast(unbroadcast(x,0),0).owner.inputs[0] is x
......@@ -3281,10 +3281,10 @@ class test_broadcast(unittest.TestCase):
assert unbroadcast(unbroadcast(x,1),0).owner.inputs[0] is x
assert addbroadcast(unbroadcast(x,1),0).owner.inputs[0] is x
assert addbroadcast(unbroadcast(x,0),0) is x
def test_mod():
"""
We add this test because not all languages and C implementations give the
same sign for the result. This checks that the c_code of `Mod` is
implemented as in Python. That is what we want.
"""
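The sign discrepancy this test guards against can be shown directly in Python, whose `%` always takes the sign of the divisor, while C's `%` truncates toward zero:

```python
# Python's % follows the sign of the divisor.
assert (-7) % 3 == 2    # C's -7 % 3 yields -1
assert 7 % (-3) == -2   # C's 7 % -3 yields 1
```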
......@@ -3298,7 +3298,7 @@ def test_mod():
def test_mod_compile():
"""
This test generates an Elemwise of Composite such as:
Elemwise{Composite{Composite{Composite{Composite{mod,EQ},Switch},mul},add}}
The generated c_code did not compile as of 30 June 2010. The compilation is fixed in the same commit.
......@@ -3342,6 +3342,20 @@ def test_unalign():
if not should_raise:
raise Exception("Theano raised an exception when none was expected")
def test_dimshuffle_duplicate():
x = theano.tensor.vector()
success = False
try:
y = theano.tensor.DimShuffle((False, ), (0, 0))(x)
except ValueError, e:
assert str(e).find("may not appear twice") != -1
success = True
assert success
if __name__ == '__main__':
if 1:
unittest.main()
......
......@@ -2087,6 +2087,13 @@ if __name__ == '__main__':
# unittest.main()
test_fusion().tes_memory_leak()
def test_local_mul_to_neg():
"""
Test that a multiplication by -1 or -1.0 yields the appropriate data type
"""
a = T.imatrix()
f1 = theano.function([a], -1*a)
f2 = theano.function([a], -1.0*a)
aval = numpy.random.randint(0,10,(2,2))
assert f1(aval).dtype == a.dtype
assert f2(aval).dtype == 'float64'