Commit b6ea8d67 authored by serdyuk

Merge github.com:Theano/Theano into doc_fix

@@ -242,3 +242,41 @@ Numba Ops
Want C speed without writing C code for your new Op? You can use Numba
to generate the C code for you! Here is an `example
Op <https://gist.github.com/nouiz/5492778#file-theano_op-py>`_ doing that.
.. _alternate_theano_types:

Alternate Theano Types
======================

Most ops in Theano are used to manipulate tensors. However, Theano also
supports many other variable types. The supported types are listed below,
along with pointers to the relevant documentation.
* :class:`TensorType <tensor.TensorType>` : Theano type that represents
  a multidimensional array containing elements that all have the same
  type. Variables of this Theano type are represented in C as objects of
  class
  `PyArrayObject <http://docs.scipy.org/doc/numpy/reference/c-api.types-and-structures.html#PyArrayObject>`_.

* :ref:`TypedList <libdoc_typed_list>` : Theano type that represents a
  typed list (a list where every element in the list has the same Theano
  type). Variables of this Theano type are represented in C as objects
  of class `PyListObject <https://docs.python.org/2/c-api/list.html>`_.

* :ref:`Scalar <libdoc_scalar>` : Theano type that represents a C
  primitive type. The C type associated with this Theano type is the
  represented C primitive itself.

* :ref:`SparseType <sparse_ops>` : Theano type used to represent sparse
  tensors. There is no equivalent C type for this Theano type, but you
  can split a sparse variable into its parts as TensorVariables. Those
  can then be used as inputs to an op with C code.

* :class:`Generic <theano.gof.type.Generic>` : Theano type that
  represents a simple Python Object. Variables of this Theano type are
  represented in C as objects of class `PyObject
  <https://docs.python.org/2/c-api/structures.html#c.PyObject>`_.

* :class:`CDataType <theano.gof.type.CDataType>` : Theano type that
  represents a C data type. The C type associated with this Theano type
  depends on the data being represented.
@@ -4,7 +4,6 @@

==========================
Frequently Asked Questions
==========================

TypeError: object of type 'TensorVariable' has no len()
-------------------------------------------------------
@@ -63,6 +62,13 @@ compilation but it will also use more memory because

``optimizer_excluding=inplace`` excludes inplace optimizations, resulting
in a trade-off between speed of compilation and memory usage.
The Theano flag `reoptimize_unpickled_function` controls whether an unpickled
Theano function should reoptimize its graph. Theano users can use the standard
Python pickle tools to save a compiled Theano function. When pickling, both the
graph before and the graph after optimization are saved, including shared
variables. When the flag is set to True, the graph is reoptimized when it is
unpickled. Otherwise, the optimization is skipped and the optimized graph from
the pickled file is used directly.
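Like any Theano flag, this can be set for a single run through the ``THEANO_FLAGS`` environment variable. A configuration sketch (the script name ``load_model.py`` is illustrative):

```shell
# Reuse the optimized graph stored in the pickle instead of reoptimizing.
THEANO_FLAGS='reoptimize_unpickled_function=False' python load_model.py
```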
Faster Theano function
----------------------

...
@@ -47,21 +47,11 @@

    :type: class:`Container`
.. autofunction:: theano.compile.sharedvalue.shared

.. function:: shared_constructor(ctor)

    Append `ctor` to the list of shared constructors (see :func:`shared`).

    Each registered constructor ``ctor`` will be called like this:
@@ -69,12 +59,4 @@

        ctor(value, name=name, strict=strict, **kwargs)

    If it does not support the given value, it must raise a TypeError.
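The dispatch logic described above — try each registered constructor and use the first one that does not raise a TypeError — can be sketched in plain Python. The names here (``registry``, ``my_shared``, ``int_ctor``) are illustrative, not Theano's actual internals:

```python
# Illustrative sketch of the constructor-dispatch pattern described above.
# Names are hypothetical; this is not Theano's implementation.
registry = []

def shared_constructor(ctor):
    """Register `ctor`; constructors are tried in reverse order."""
    registry.append(ctor)
    return ctor

def my_shared(value, name=None, strict=False, **kwargs):
    # Try the most recently registered constructor first.
    for ctor in reversed(registry):
        try:
            return ctor(value, name=name, strict=strict, **kwargs)
        except TypeError:
            continue  # this ctor does not support `value`; try the next
    raise TypeError("No suitable constructor for %r" % (value,))

@shared_constructor
def int_ctor(value, name=None, strict=False, **kwargs):
    if not isinstance(value, int):
        raise TypeError("int_ctor only accepts ints")
    return ("int-shared", value, name)
```

Here ``my_shared(3)`` succeeds through ``int_ctor``, while ``my_shared("x")`` falls through every constructor and raises a TypeError, mirroring the contract stated above.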
@@ -335,7 +335,7 @@ import theano and print the config variable, as in:

    Default: False

    Does the memory profile print the min peak memory usage?
    It only works when profile=True, profile_memory=True

.. attribute:: profiling.destination

@@ -462,6 +462,20 @@ import theano and print the config variable, as in:

    Link arguments to link against a (Fortran) level-3 blas implementation.
.. attribute:: config.experimental.local_alloc_elemwise_assert

    Bool value: either True or False
    Default: True

    When the local_alloc_optimization is applied, add an assert to highlight
    shape errors. Without such asserts, this optimization could hide errors
    in the user code. We add the assert only if we cannot infer that the
    shapes are equivalent; as such, this optimization does not always
    introduce an assert in the graph. Removing the assert could speed up
    execution.
.. attribute:: config.cuda.root

    Default: $CUDA_ROOT or failing that, "/usr/local/cuda"
@@ -683,6 +697,16 @@ import theano and print the config variable, as in:

    optimization phase. Theano users do not need to use this. This is
    to help debug shape errors in Theano optimization.
.. attribute:: config.reoptimize_unpickled_function

    Bool value, default: True

    Theano users can use the standard Python pickle tools to save a compiled
    Theano function. When pickling, both the graph before and the graph after
    optimization are saved, including shared variables. When set to True, the
    graph is reoptimized when it is unpickled. Otherwise, the optimization is
    skipped and the optimized graph is used directly.
.. attribute:: config.exception_verbosity

    String Value: ``'low'``, ``'high'``.

...
@@ -9,11 +9,11 @@

   :synopsis: low-level automatic differentiation

.. moduleauthor:: LISA

Symbolic gradient is usually computed from :func:`gradient.grad`, which offers a
more convenient syntax for the common case of wanting the gradient in some
expressions with respect to a scalar cost. The :func:`grad_sources_inputs`
function does the underlying work, and is more flexible, but is also more
awkward to use when :func:`gradient.grad` can do the job.

.. automodule:: theano.gradient

...
@@ -754,6 +754,8 @@ Creating Tensor

>>> f(x, x, x, x).shape
(2, 2, 4, 4)

.. autofunction:: theano.tensor.basic.choose
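``choose`` follows ``numpy.choose`` semantics: given an index array ``a`` and a list of ``choices``, output element ``i`` is taken from row ``choices[a[i]]`` at position ``i``. A plain-Python sketch of the 1-d case (illustrative only, not the Theano implementation):

```python
# Plain-Python sketch of 1-d choose semantics (mirrors numpy.choose):
# out[i] = choices[a[i]][i]
def choose(a, choices):
    return [choices[idx][i] for i, idx in enumerate(a)]

# out[0] = choices[2][0] = 20, out[1] = choices[3][1] = 31, ...
result = choose([2, 3, 1, 0],
                [[0,  1,  2,  3],
                 [10, 11, 12, 13],
                 [20, 21, 22, 23],
                 [30, 31, 32, 33]])
# result == [20, 31, 12, 3]
```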
Reductions
==========
@@ -1630,125 +1632,11 @@ Linear Algebra

Gradient / Differentiation
==========================

.. automodule:: theano.gradient
    :members: grad
Return symbolic gradients for one or more variables with respect to some
cost. See the :ref:`gradient <libdoc_gradient>` page for complete documentation
of the gradient module.
For more information about how automatic differentiation works in Theano,
see :mod:`gradient`. For information on how to implement the gradient of
a certain Op, see :func:`grad`.
:type cost: 0-d tensor variable
:type wrt: tensor variable or list of tensor variables
:type g_cost: same as type of `cost`
:type consider_constant: list of variables
:type warn_type: bool
:param cost: a scalar with respect to which we are differentiating
:param wrt: term[s] for which we want gradients
:param g_cost: the gradient on the cost
:param consider_constant: variables whose gradients will be held at 0.
:param warn_type: True will trigger warnings via the logging module when
the gradient on an expression has a different type than the original
expression
:rtype: variable or list of variables (matching `wrt`)
:returns: gradients of the cost with respect to each of the `wrt` terms
.. function:: subgraph_grad(wrt, end, start=None, cost=None, details=False)
With respect to `wrt`, computes gradients of cost and/or from existing
`start` gradients, up to the `end` variables of a symbolic digraph.
In other words, computes gradients for a subgraph of the
symbolic theano function. Ignores all disconnected inputs.
This can be useful when one needs to perform the gradient descent
iteratively (e.g. one layer at a time in an MLP), or when a particular
operation is not differentiable in theano (e.g. stochastic sampling
from a multinomial). In the latter case, the gradient of the
non-differentiable process could be approximated by a user-defined
formula, which could be calculated using the gradients of a cost
with respect to samples (0s and 1s). These gradients are obtained
by performing a subgraph_grad from the `cost` or previously known gradients
(`start`) up to the outputs of the stochastic process (`end`).
A dictionary mapping gradients obtained from the user-defined
differentiation of the process, to variables, could then be fed into
another subgraph_grad as `start` with any other `cost` (e.g. weight decay).
In an MLP, we could use subgraph_grad to iteratively backpropagate:
.. testcode:: subgraph_grad

    import theano
    import numpy as np

    x, t = theano.tensor.fvector('x'), theano.tensor.fvector('t')
    w1 = theano.shared(np.random.randn(3, 4))
    w2 = theano.shared(np.random.randn(4, 2))
    a1 = theano.tensor.tanh(theano.tensor.dot(x, w1))
    a2 = theano.tensor.tanh(theano.tensor.dot(a1, w2))
    cost2 = theano.tensor.sqr(a2 - t).sum()
    cost2 += theano.tensor.sqr(w2.sum())
    cost1 = theano.tensor.sqr(w1.sum())

    params = [[w2], [w1]]
    costs = [cost2, cost1]
    grad_ends = [[a1], [x]]

    next_grad = None
    param_grads = []
    for i in xrange(2):
        param_grad, next_grad = theano.subgraph_grad(
            wrt=params[i], end=grad_ends[i],
            start=next_grad, cost=costs[i]
        )
        next_grad = dict(zip(grad_ends[i], next_grad))
        param_grads.extend(param_grad)
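The staging idea underneath ``subgraph_grad`` — stop the backward pass at intermediate variables, then feed those gradients back in as the ``start`` of the next stage — can be illustrated with plain scalar calculus. This is a toy sketch of the chain rule in two stages, not Theano code:

```python
# Toy scalar illustration of staged backpropagation (chain rule in two stages).
# cost = (a1 * w2)**2 where a1 = x * w1; we differentiate w.r.t. w2 first,
# then pass d(cost)/d(a1) as the "start" gradient of the next stage.
x, w1, w2 = 3.0, 2.0, 5.0
a1 = x * w1                  # forward, stage 1
cost = (a1 * w2) ** 2        # forward, stage 2

# Stage 1 of the backward pass: gradients w.r.t. w2 and the stage boundary a1.
g_w2 = 2 * (a1 * w2) * a1    # d(cost)/d(w2)
g_a1 = 2 * (a1 * w2) * w2    # d(cost)/d(a1), the "start" gradient for stage 2

# Stage 2: continue from g_a1 down to w1.
g_w1 = g_a1 * x              # d(cost)/d(w1) = d(cost)/d(a1) * d(a1)/d(w1)
# g_w1 == 900.0, matching d/dw1 of (x*w1*w2)**2 = 2 * x**2 * w1 * w2**2
```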
:type wrt: list of variables
:param wrt:
    Gradients are computed with respect to `wrt`.

:type end: list of variables
:param end:
    Theano variables at which to end gradient descent (they are
    considered constant in theano.grad). For convenience, the
    gradients with respect to these variables are also returned.

:type start: dictionary of variables
:param start:
    If not None, a dictionary mapping variables to their
    gradients. This is useful when the gradient on some variables
    is known. These are used to compute the gradients backwards up
    to the variables in `end` (they are used as known_grad in
    theano.grad).

:type cost: scalar (0-dimensional) variable
:param cost:
    Additional costs for which to compute the gradients. For
    example, these could be weight decay, an l1 constraint, MSE,
    NLL, etc. May optionally be None if start is provided.

    .. warning::

        If the gradient of `cost` with respect to any of the `start`
        variables is already part of the `start` dictionary, then it
        may be counted twice with respect to `wrt` and `end`.

:type details: bool
:param details:
    When True, additionally returns the list of gradients from
    `start` and of `cost`, respectively, with respect to `wrt` (not
    `end`).

:rtype: Tuple of 2 or 4 Lists of Variables
:return: Returns lists of gradients with respect to `wrt` and `end`,
         respectively.

.. versionadded:: 0.6.1
.. _R_op_list:

...
@@ -22,6 +22,33 @@

.. moduleauthor:: LISA
.. note::

    As of October 21st, 2014, the default GPU image convolution
    changed. Here is the algorithm:

    - If we can use `cuDNN <https://developer.nvidia.com/cuDNN>`_, use it.
    - If not, use the gemm version (slower than cuDNN, uses more memory).

    If users do not want the extra memory usage of the gemm
    version, they can enable the legacy code, which is even slower but
    does not use extra memory. For this, use the Theano flag
    ``optimizer_excluding=conv_gemm``.

    There is no reason to use the legacy code or the gemm version if
    cuDNN is available.

    Two other options:

    - There is also the fft version, which is the fastest in some cases,
      but uses even more memory. It does not support striding to reduce
      computation and has some shape restrictions.
    - There is also the cuda_convnet convolution in Pylearn2. It uses a
      different memory layout, has shape restrictions, but does not use
      extra memory and is faster than the legacy convolution.
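The legacy fallback described above is selected through a Theano flag. A configuration sketch (the script name ``my_script.py`` is illustrative):

```shell
# Exclude the gemm convolution optimization, falling back to the legacy code.
THEANO_FLAGS='optimizer_excluding=conv_gemm' python my_script.py
```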
TODO: Give examples on how to use these things! They are pretty complicated.

- Convolution operators implemented:

...
==================
Advanced Indexing
==================

Continue the Advanced Indexing project that is on either github or bitbucket.
@@ -17,6 +17,76 @@ Isolating the Problem/Testing Theano Compiler

You can run your Theano function in a :ref:`DebugMode<using_debugmode>`.
This tests the Theano optimizations and helps to find where NaN, inf and other problems come from.
Interpreting Error Messages
---------------------------

Even in its default configuration, Theano tries to display useful error
messages. Consider the following faulty code.

.. code-block:: python

    import numpy as np
    import theano
    import theano.tensor as T

    x = T.vector()
    y = T.vector()
    z = x + x
    z = z + y
    f = theano.function([x, y], z)
    f(np.ones((2,)), np.ones((3,)))
Running the code above, we see:

.. code-block:: bash

    Traceback (most recent call last):
      File "test0.py", line 10, in <module>
        f(np.ones((2,)), np.ones((3,)))
      File "/PATH_TO_THEANO/theano/compile/function_module.py", line 605, in __call__
        self.fn.thunks[self.fn.position_of_error])
      File "/PATH_TO_THEANO/theano/compile/function_module.py", line 595, in __call__
        outputs = self.fn()
    ValueError: Input dimension mis-match. (input[0].shape[0] = 3, input[1].shape[0] = 2)
    Apply node that caused the error: Elemwise{add,no_inplace}(<TensorType(float64, vector)>, <TensorType(float64, vector)>, <TensorType(float64, vector)>)
    Inputs types: [TensorType(float64, vector), TensorType(float64, vector), TensorType(float64, vector)]
    Inputs shapes: [(3,), (2,), (2,)]
    Inputs strides: [(8,), (8,), (8,)]
    Inputs scalar values: ['not scalar', 'not scalar', 'not scalar']

    HINT: Re-running with most Theano optimization disabled could give you a back-traces when this node was created. This can be done with by setting the Theano flags 'optimizer=fast_compile'. If that does not work, Theano optimization can be disabled with 'optimizer=None'.
    HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint of this apply node.
Arguably the most useful information is approximately half-way through
the error message, where the kind of error is displayed along with its
cause (`ValueError: Input dimension mis-match. (input[0].shape[0] = 3,
input[1].shape[0] = 2`).
Below it, some other information is given, such as the apply node that
caused the error, as well as the input types, shapes, strides and
scalar values.
The two hints can also be helpful when debugging. Using the Theano flag
``optimizer=fast_compile`` or ``optimizer=None`` can often tell you
the faulty line, while ``exception_verbosity=high`` will display a
debugprint of the apply node. Using these hints, the end of the error
message becomes:
.. code-block:: bash

    Backtrace when the node is created:
      File "test0.py", line 8, in <module>
        z = z + y

    Debugprint of the apply node:
    Elemwise{add,no_inplace} [@A] <TensorType(float64, vector)> ''
     |Elemwise{add,no_inplace} [@B] <TensorType(float64, vector)> ''
     | |<TensorType(float64, vector)> [@C] <TensorType(float64, vector)>
     | |<TensorType(float64, vector)> [@C] <TensorType(float64, vector)>
     |<TensorType(float64, vector)> [@D] <TensorType(float64, vector)>
Here we can see that the error can be traced back to the line ``z = z + y``.
For this example, using ``optimizer=fast_compile`` worked. If it did not,
you could set ``optimizer=None`` or use test values.
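Both hints can be combined in a single re-run by setting the flags on the command line (``test0.py`` refers to the faulty script above):

```shell
# Minimal optimization plus verbose exceptions, per the two HINTs.
THEANO_FLAGS='optimizer=fast_compile,exception_verbosity=high' python test0.py
```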
Using Test Values
-----------------
@@ -26,13 +96,19 @@ on-the-fly, before a ``theano.function`` is ever compiled. Since optimizations

haven't been applied at this stage, it is easier for the user to locate the
source of some bug. This functionality is enabled through the config flag
``theano.config.compute_test_value``. Its use is best shown through the
following example. Here, we use ``exception_verbosity=high`` and
``optimizer=fast_compile``, which would not tell you the line at fault.
``optimizer=None`` would, and it could therefore be used instead of test values.
.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T

    # compute_test_value is 'off' by default, meaning this feature is inactive
    theano.config.compute_test_value = 'off'  # Use 'warn' to activate this feature

    # configure shared variables
    W1val = numpy.random.rand(2, 10, 10).astype(theano.config.floatX)
@@ -42,6 +118,8 @@ following example.

    # input which will be of shape (5,10)
    x = T.matrix('x')

    # provide Theano with a default test-value
    #x.tag.test_value = numpy.random.rand(5, 10)
    # transform the shared variable in some way. Theano does not
    # know off hand that the matrix func_of_W1 has shape (20, 10)

@@ -61,35 +139,32 @@ Running the above code generates the following error message:

.. code-block:: bash
    Definition in:
      File "/u/desjagui/workspace/PYTHON/theano/gof/opt.py", line 1102, in apply
        lopt_change = self.process_node(fgraph, node, lopt)
      File "/u/desjagui/workspace/PYTHON/theano/gof/opt.py", line 882, in process_node
        replacements = lopt.transform(node)
      File "/u/desjagui/workspace/PYTHON/Theano/theano/tensor/blas.py", line 1030, in local_dot_to_dot22
        return [_dot22(*node.inputs)]
      File "/u/desjagui/workspace/PYTHON/Theano/theano/gof/op.py", line 324, in __call__
        self.add_tag_trace(node)
    For the full definition stack trace set the Theano flags traceback.limit to -1

    Traceback (most recent call last):
      File "test1.py", line 31, in <module>
        f(numpy.random.rand(5, 10))
      File "PATH_TO_THEANO/theano/compile/function_module.py", line 605, in __call__
        self.fn.thunks[self.fn.position_of_error])
      File "PATH_TO_THEANO/theano/compile/function_module.py", line 595, in __call__
        outputs = self.fn()
    ValueError: Shape mismatch: x has 10 cols (and 5 rows) but y has 20 rows (and 10 cols)
    Apply node that caused the error: Dot22(x, DimShuffle{1,0}.0)
    Inputs types: [TensorType(float64, matrix), TensorType(float64, matrix)]
    Inputs shapes: [(5, 10), (20, 10)]
    Inputs strides: [(80, 8), (8, 160)]
    Inputs scalar values: ['not scalar', 'not scalar']

    Debugprint of the apply node:
    Dot22 [@A] <TensorType(float64, matrix)> ''
     |x [@B] <TensorType(float64, matrix)>
     |DimShuffle{1,0} [@C] <TensorType(float64, matrix)> ''
       |Flatten{2} [@D] <TensorType(float64, matrix)> ''
         |DimShuffle{2,0,1} [@E] <TensorType(float64, 3D)> ''
           |W1 [@F] <TensorType(float64, 3D)>

    HINT: Re-running with most Theano optimization disabled could give you a back-traces when this node was created. This can be done with by setting the Theano flags 'optimizer=fast_compile'. If that does not work, Theano optimization can be disabled with 'optimizer=None'.

If the above is not informative enough, by instrumenting the code ever
so slightly, we can get Theano to reveal the exact source of the error.
.. code-block:: python
@@ -108,18 +183,22 @@ value. This allows Theano to evaluate symbolic expressions on-the-fly (by

calling the ``perform`` method of each op), as they are being defined. Sources
of error can thus be identified with much more precision and much earlier in
the compilation pipeline. For example, running the above code yields the
following error message, which properly identifies *line 24* as the culprit.
.. code-block:: bash
    Traceback (most recent call last):
      File "test2.py", line 24, in <module>
        h1 = T.dot(x, func_of_W1)
      File "PATH_TO_THEANO/theano/tensor/basic.py", line 4734, in dot
        return _dot(a, b)
      File "PATH_TO_THEANO/theano/gof/op.py", line 545, in __call__
        required = thunk()
      File "PATH_TO_THEANO/theano/gof/op.py", line 752, in rval
        r = p(n, [x[0] for x in i], o)
      File "PATH_TO_THEANO/theano/tensor/basic.py", line 4554, in perform
        z[0] = numpy.asarray(numpy.dot(x, y))
    ValueError: matrices are not aligned
The ``compute_test_value`` mechanism works as follows:

...
.. _tut_multi_cores:
=============================
Multi cores support in Theano
=============================

...
@@ -494,7 +494,8 @@ def char_from_number(number):

def debugprint(r, prefix='', depth=-1, done=None, print_type=False,
               file=sys.stdout, print_destroy_map=False,
               print_view_map=False, order=None, ids='CHAR',
               stop_on_name=False, prefix_child=None,
               scan_ops=None):
    """Print the graph leading to `r` to given depth.

    :param r: Variable instance

@@ -502,10 +503,10 @@ def debugprint(r, prefix='', depth=-1, done=None, print_type=False,
    :param depth: maximum recursion depth (Default -1 for unlimited).
    :param done: dict of Apply instances that have already been printed
                 and their associated printed ids
    :param print_type: whether to print the Variable type after the other infos
    :param file: file-like object to which to print
    :param print_destroy_map: whether to print the op destroy_map after other info
    :param print_view_map: whether to print the op view_map after other info
    :param order: If not empty will print the index in the toposort.
    :param ids: How do we print the identifier of the variable
                id - print the python id value

@@ -514,6 +515,8 @@ def debugprint(r, prefix='', depth=-1, done=None, print_type=False,
                "" - don't print an identifier
    :param stop_on_name: When True, if a node in the graph has a name,
                         we don't print anything below it.
    :param scan_ops: Scan ops in the graph will be added inside this list
                     for later printing purposes.
    """
    if depth == 0:

@@ -525,6 +528,9 @@ def debugprint(r, prefix='', depth=-1, done=None, print_type=False,
    if done is None:
        done = dict()
    if scan_ops is None:
        scan_ops = []

    if print_type:
        type_str = ' <%s>' % r.type
    else:

@@ -575,37 +581,45 @@ def debugprint(r, prefix='', depth=-1, done=None, print_type=False,
        o = ''
        if order:
            o = str(order.index(r.owner))
        already_printed = a in done  # get_id_str put it in the dict
        id_str = get_id_str(a)

        if len(a.outputs) == 1:
            print >> file, '%s%s %s%s \'%s\' %s %s %s' % (prefix, a.op,
                                                          id_str,
                                                          type_str,
                                                          r_name,
                                                          destroy_map_str,
                                                          view_map_str,
                                                          o)
        else:
            print >> file, '%s%s.%i %s%s \'%s\' %s %s %s' % (prefix, a.op,
                                                             a.outputs.index(r),
                                                             id_str, type_str,
                                                             r_name,
                                                             destroy_map_str,
                                                             view_map_str,
                                                             o)
        if not already_printed:
            if (not stop_on_name or
                    not (hasattr(r, 'name') and r.name is not None)):
                new_prefix = prefix_child + ' |'
                new_prefix_child = prefix_child + ' |'

                for idx, i in enumerate(a.inputs):
                    if idx == len(a.inputs) - 1:
                        new_prefix_child = prefix_child + ' '

                    if hasattr(i, 'owner') and hasattr(i.owner, 'op'):
                        if isinstance(i.owner.op, theano.scan_module.scan_op.Scan):
                            scan_ops.append(i)

                    debugprint(i, new_prefix, depth=depth - 1, done=done,
                               print_type=print_type, file=file, order=order,
                               ids=ids, stop_on_name=stop_on_name,
                               prefix_child=new_prefix_child, scan_ops=scan_ops)
    else:
        #this is an input variable
        id_str = get_id_str(r)
@@ -624,7 +638,6 @@ def _optcheck_fgraph(input_specs, output_specs, accept_inplace=False):

    :type accept_inplace: Bool

    :rtype: `FunctionGraph`
    :returns: a new FunctionGraph with a cloned graph, with debugging `Feature` instances already installed.
    """
    orig_inputs = [spec.variable for spec in input_specs]
    updates = [spec.update for spec in input_specs if spec.update]
@@ -2152,7 +2165,7 @@ class _Maker(FunctionMaker):  # inheritance buys a few helper functions

        # Check if some input variables are unused
        self._check_unused_inputs(inputs, outputs, on_unused_input)

        # Make a list of (SymbolicInput|SymbolicInputKits, indices, [SymbolicInput,...]), one
        # tuple for each input. (See Function.indices for more details)
        indices = [[input] + self.expand_in(input, _inputs) for input in inputs]

...
...@@ -169,15 +169,33 @@ def shared(value, name=None, strict=False, allow_downcast=None, **kwargs): ...@@ -169,15 +169,33 @@ def shared(value, name=None, strict=False, allow_downcast=None, **kwargs):
"""Return a SharedVariable Variable, initialized with a copy or
reference of `value`.

This function iterates over
:ref:`constructor functions <shared_constructor>`
to find a suitable SharedVariable subclass.
The suitable one is the first constructor that accepts the given value.

This function is meant as a convenient default. If you want to use a
specific shared variable constructor, consider calling it directly.

``theano.shared`` is a shortcut to this function.

:note: By passing kwargs, you effectively limit the set of
potential constructors to those that can accept those kwargs.

:note: Some shared variables have ``borrow`` as an extra kwarg.
`See <http://deeplearning.net/software/theano/tutorial/aliasing.html#borrowing-when-creating-shared-variables>`_ for details.

:note: Some shared variables have ``broadcastable`` as an extra kwarg.
As shared variable shapes can change, all dimensions default
to not being broadcastable, even if ``value`` has a shape of 1
along some dimension. This parameter allows you to create,
for example, a `row` or `column` 2d tensor.

.. attribute:: constructors

A list of shared variable constructors that will be tried in reverse
order.
"""
try:
...
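The constructor dispatch described in the docstring above can be sketched in plain Python. This is an illustrative sketch only — the names `register_constructor`, `generic_shared`, and `scalar_shared` are invented for the example and are not Theano's real API. The point it shows: constructors are tried in reverse registration order, and the first one that accepts the value (does not raise `TypeError`) wins.

```python
# Invented, minimal model of the shared() constructor dispatch.
constructors = []

def register_constructor(ctor):
    constructors.append(ctor)
    return ctor

@register_constructor
def generic_shared(value, **kwargs):
    # Fallback constructor: accepts any value.
    return ("generic", value)

@register_constructor
def scalar_shared(value, **kwargs):
    # A more specific constructor, registered later so it is tried first.
    if not isinstance(value, (int, float)):
        raise TypeError("scalar_shared only accepts numbers")
    return ("scalar", value)

def shared(value, **kwargs):
    # Constructors are tried in reverse registration order; the first
    # one that accepts the value is used.
    for ctor in reversed(constructors):
        try:
            return ctor(value, **kwargs)
        except TypeError:
            continue
    raise TypeError("No suitable constructor for %r" % (value,))
```

Passing kwargs narrows the candidate set in the same way: a constructor that cannot accept a given kwarg raises `TypeError` and the next one is tried.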
...@@ -118,6 +118,7 @@ AddConfigVar('print_active_device',
BoolParam(True, allow_override=False),
in_c_key=False)

# Do not add FAST_RUN_NOGC to this list (nor any other ALL CAPS shortcut).
# The way to get FAST_RUN_NOGC is with the flag 'linker=c|py_nogc'.
# The old all capital letter way of working is deprecated as it is not
...@@ -465,6 +466,12 @@ AddConfigVar('unpickle_function',
BoolParam(True),
in_c_key=False)
AddConfigVar('reoptimize_unpickled_function',
"Re-optimize the graph when a Theano function is unpickled from disk.",
BoolParam(True, allow_override=True),
in_c_key=False)
"""Note to developers:
Generally your exceptions should use an apply node's __str__
method when exception_verbosity == 'low'. When exception_verbosity
...@@ -538,3 +545,11 @@ AddConfigVar('check_input',
"(particularly for scalars) and reduce the number of generated C "
"files.",
BoolParam(True))
AddConfigVar('cache_optimizations',
"WARNING: work in progress, does not work yet. "
"Specify if the optimization cache should be used. This cache "
"stores any optimized graph and its optimizations. It currently "
"slows down the first optimization a lot, and could still contain "
"some bugs. Use at your own risk.",
BoolParam(False))
...@@ -55,7 +55,7 @@ from theano.gof.link import \
Container, Linker, LocalLinker, PerformLinker, WrapLinker, WrapLinkerMany
from theano.gof.op import \
Op, OpenMPOp, PureOp, COp, ops_with_inner_function
from theano.gof.opt import (
Optimizer,
...
...@@ -662,6 +662,7 @@ class DestroyHandler(toolbox.Bookkeeper):
The following data structures remain to be converted:
<unknown>
"""
pickle_rm_attr = ["destroyers"]

def __init__(self, do_imports_on_attach=True):
self.fgraph = None
...@@ -720,15 +721,7 @@ class DestroyHandler(toolbox.Bookkeeper):
" or in conflict with another plugin.")

####### Annotate the FunctionGraph ############
self.unpickle(fgraph)
fgraph.destroy_handler = self
self.fgraph = fgraph
...@@ -743,6 +736,15 @@ class DestroyHandler(toolbox.Bookkeeper):
if self.do_imports_on_attach:
toolbox.Bookkeeper.on_attach(self, fgraph)
def unpickle(self, fgraph):
def get_destroyers_of(r):
droot, impact, root_destroyer = self.refresh_droot_impact()
try:
return [root_destroyer[droot[r]]]
except Exception:
return []
fgraph.destroyers = get_destroyers_of
def refresh_droot_impact(self):
"""
Makes sure self.droot, self.impact, and self.root_destroyer are
...
...@@ -87,6 +87,11 @@ class FunctionGraph(utils.object2):
#TODO: document what variables are[not] set in the FunctionGraph when a feature
is added via the constructor. How constructed is the FunctionGraph?
Note: the intermediate nodes between 'inputs' and 'outputs' are not explicitly
passed.

:param inputs: input nodes of the graph, usually declared by the user.
:param outputs: output nodes of the graph.
:param clone: If true, we will clone the graph. This is
useful to remove the constant cache problem.
...@@ -724,17 +729,42 @@ class FunctionGraph(utils.object2):
return self.__str__()
### clone ###
def clone(self, check_integrity=True):
"""WRITEME"""
return self.clone_get_equiv(check_integrity)[0]

def clone_get_equiv(self, check_integrity=True):
"""WRITEME"""
equiv = graph.clone_get_equiv(self.inputs, self.outputs)
if check_integrity:
self.check_integrity()
e = FunctionGraph([equiv[i] for i in self.inputs],
[equiv[o] for o in self.outputs])
if check_integrity:
e.check_integrity()
for feature in self._features:
e.attach_feature(feature)
return e, equiv
def __getstate__(self):
"""This is needed because some features introduce instance methods,
which are not picklable.
"""
d = self.__dict__.copy()
for feature in self._features:
for attr in getattr(feature, "pickle_rm_attr", []):
del d[attr]
# The class Updater takes functions as parameters; they may be
# lambda functions, which are unpicklable.
# execute_callbacks_times holds references to optimizers, which
# can't be pickled because decorators with parameters aren't
# picklable.
if "execute_callbacks_times" in d:
del d["execute_callbacks_times"]
return d
def __setstate__(self, dct):
self.__dict__.update(dct)
for feature in self._features:
if hasattr(feature, "unpickle"):
feature.unpickle(self)
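The `__getstate__`/`__setstate__` pair above implements a reusable pattern: each feature declares the unpicklable attributes it installed (`pickle_rm_attr`) and provides an `unpickle()` hook to reinstall them after deserialization. Here is a minimal, self-contained sketch with invented class names; it round-trips through `copy.deepcopy`, which exercises the same `__getstate__`/`__setstate__` hooks that pickle uses.

```python
import copy

class HistoryFeature(object):
    # Attributes this feature installs that cannot be pickled.
    pickle_rm_attr = ["checkpoint"]

    def unpickle(self, graph):
        # Reinstall the unpicklable helper (a lambda) after deserialization.
        graph.checkpoint = lambda: len(graph.log)

class Graph(object):
    def __init__(self):
        self.log = [1, 2, 3]
        self._features = [HistoryFeature()]
        self._features[0].unpickle(self)

    def __getstate__(self):
        d = self.__dict__.copy()
        # Drop every attribute a feature declared as unpicklable.
        for feature in self._features:
            for attr in getattr(feature, "pickle_rm_attr", []):
                del d[attr]
        return d

    def __setstate__(self, dct):
        self.__dict__.update(dct)
        # Ask each feature to reinstall what was dropped.
        for feature in self._features:
            if hasattr(feature, "unpickle"):
                feature.unpickle(self)

g = Graph()
g2 = copy.deepcopy(g)  # round-trips through __getstate__/__setstate__
```

With real pickling the flow is identical: the lambda never enters the serialized state, and the feature recreates it on load.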
...@@ -135,9 +135,14 @@ class Apply(Node):
if len(self.outputs) == 1:
return self.outputs[0]
else:
raise AttributeError(
"%s.default_output should be an output index." % self.op)
elif not isinstance(do, (int, long)):
raise AttributeError("%s.default_output should be an int or long" %
self.op)
elif do < 0 or do >= len(self.outputs):
raise AttributeError("%s.default_output is out of range." %
self.op)
return self.outputs[do]

def env_getter(self):
...@@ -873,6 +878,7 @@ def is_same_graph(var1, var2, givens=None, debug=False):
# Get result from the merge-based function.
rval1 = is_same_graph_with_merge(var1=var1, var2=var2, givens=givens)
# Get result from the function `equal_computations` from scan_utils.
use_equal_computations = True
if givens:
# We need to build the `in_xs` and `in_ys` lists. To do this, we need
...
...@@ -1024,7 +1024,7 @@ static PyTypeObject lazylinker_ext_CLazyLinkerType = {
static PyObject * get_version(PyObject *dummy, PyObject *args)
{
PyObject *result = PyFloat_FromDouble(0.21);
return result;
}
...
...@@ -14,7 +14,8 @@ from theano.gof import cmodule
_logger = logging.getLogger('theano.gof.lazylinker_c')
force_compile = False
version = 0.21  # must match constant returned in function get_version()

def try_import():
global lazylinker_ext
...@@ -22,6 +23,7 @@ def try_import():
import lazylinker_ext
del sys.path[0]

def try_reload():
sys.path[0:0] = [config.compiledir]
reload(lazylinker_ext)
...
...@@ -154,9 +154,10 @@ def raise_with_op(node, thunk=None, exc_info=None):
else:
hints.append(
"HINT: Re-running with most Theano optimizations disabled could"
" give you a back-trace of when this node was created. This can"
" be done by setting the Theano flag"
" 'optimizer=fast_compile'. If that does not work,"
" Theano optimizations can be disabled with 'optimizer=None'.")

if theano.config.exception_verbosity == 'high':
f = StringIO.StringIO()
...@@ -616,6 +617,7 @@ class PerformLinker(LocalLinker):
f.allow_gc = self.allow_gc  # HACK: this is a way of passing an arg to Function.__call__
add_clear_storage(f, computed, storage_map)
f.storage_map = storage_map
return f, [Container(input, storage) for input, storage in zip(fgraph.inputs, input_storage)], \
[Container(output, storage, True) for output, storage in zip(fgraph.outputs, output_storage)], \
...
...@@ -13,6 +13,8 @@ __contact__ = "theano-dev <theano-dev@googlegroups.com>"
__docformat__ = "restructuredtext en"

import logging
import numpy
import os
import sys
import warnings
...@@ -974,3 +976,177 @@ int main( int argc, const char* argv[] )
self.update_self_openmp()
return super(OpenMPOp, self).make_thunk(node, storage_map,
compute_map, no_recycling)
class COp(Op):
""" Class to allow an op to have an external C implementation.
An op can use this class by inheriting from it and calling its
__init__() method, providing it with a path to an external file containing
the C implementation and the name of the function, in that file, to call
to perform the computations for the op.
"""
def __init__(self, func_file, func_name):
self.func_file = func_file
self.func_name = func_name
# Define the markers that can be used to delimit sections in the
# external C code
self.support_code_marker = "THEANO_SUPPORT_CODE_SECTION"
self.apply_code_marker = "THEANO_APPLY_CODE_SECTION"
self.c_code_markers = [self.support_code_marker,
self.apply_code_marker]
# Load the external C code
with open(self.func_file, "r") as f:
self.func_code = f.read()

# Separate the contents of the file into sections and validate that
# at least one of the necessary code sections has been defined
self.code_sections = self.parse_external_c_code(self.func_code)
if not any(marker in self.code_sections
for marker in self.c_code_markers):
raise RuntimeError("The provided C implementation does not "
"define a support code section or an apply "
"code section.")
def parse_external_c_code(self, code):
# Obtain the positions of the C code markers used in the C code
positions = [(code.index(marker), marker)
for marker in self.c_code_markers if marker in code]
# Go over the markers in their order of occurrence and extract
# the C code they concern
positions.sort()
code_sections = {}
for i in range(len(positions)):
marker_start, marker = positions[i]
if i < len(positions) - 1:
# This is not the last section in the code : extract the code
# between the beginning of the current marker and the
# beginning of the next one.
next_marker_start = positions[i+1][0]
section = code[marker_start: next_marker_start]
else:
# This is the last section in the code : extract the remaining
# C code
section = code[marker_start:]
cleaned_section = section.replace(marker, "")
code_sections[marker] = cleaned_section
return code_sections
def c_code_cache_version(self):
return hash(self.func_code)
def c_support_code(self):
if self.support_code_marker in self.code_sections:
return self.code_sections[self.support_code_marker]
else:
raise utils.MethodNotDefined("c_support_code",
type(self), self.__class__.__name__)
def c_support_code_apply(self, node, name):
if self.apply_code_marker in self.code_sections:
apply_code = self.code_sections[self.apply_code_marker]
if hasattr(self, 'check_inputs') and self.check_inputs == False:
return apply_code
else:
define_macros, undef_macros = self.get_c_macros(node, name)
return os.linesep.join([define_macros, apply_code,
undef_macros])
else:
raise utils.MethodNotDefined("c_support_code_apply",
type(self), self.__class__.__name__)
def format_c_function_args(self, inp, out):
# Generate a string containing the arguments sent to the external C
# function. The argstring will be of the format:
# "input0, input1, input2, &output0, &output1"
return ", ".join(list(inp) + ["&%s" % o for o in out])
def get_c_macros(self, node, name):
define_template = "#define %s %s" + os.linesep
undef_template = "#undef %s" + os.linesep
define_macros = ""
undef_macros = ""
# Extract the various properties of the input and output variables
variables = node.inputs + node.outputs
variable_names = (["INPUT_%i" % i for i in range(len(node.inputs))] +
["OUTPUT_%i" % i for i in range(len(node.outputs))])
variable_dtypes_names = [v.dtype for v in variables]
variable_dtypes = [numpy.dtype(d) for d in variable_dtypes_names]
variable_typenums = [d.num for d in variable_dtypes]
variable_itemsizes = [d.itemsize for d in variable_dtypes]
# Generate dtype macros
for i in range(len(variables)):
macro_name = "DTYPE_" + variable_names[i]
macro_value = "npy_" + variable_dtypes_names[i]
define_macros += define_template % (macro_name, macro_value)
undef_macros += undef_template % macro_name
# Generate typenum macros
for i in range(len(variables)):
macro_name = "TYPENUM_" + variable_names[i]
macro_value = variable_typenums[i]
define_macros += define_template % (macro_name, macro_value)
undef_macros += undef_template % macro_name
# Generate itemsize macros
for i in range(len(variables)):
macro_name = "ITEMSIZE_" + variable_names[i]
macro_value = variable_itemsizes[i]
define_macros += define_template % (macro_name, macro_value)
undef_macros += undef_template % macro_name
# Generate a macro to mark code as being apply-specific
define_macros += define_template % ("APPLY_SPECIFIC(str)",
"str##_%s" % name)
undef_macros += undef_template % "APPLY_SPECIFIC"
return define_macros, undef_macros
def c_code(self, node, name, inp, out, sub):
func_name = self.func_name
func_args = self.format_c_function_args(inp, out)
fail = sub['fail']
# Generate the code to define/undefine the C macros
define_macros, undef_macros = self.get_c_macros(node, name)
# Generate the C code
c_code = """
%(define_macros)s
{
int result = %(func_name)s(%(func_args)s);
if (result != 0)
{
%(fail)s;
}
}
%(undef_macros)s
""" % locals()
return c_code
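The section-splitting logic in `parse_external_c_code` can be exercised on its own. Below is a standalone version of it (assuming the same two marker strings used in `__init__`) applied to a toy external C file: each marker begins a section that runs until the next marker, or the end of the file.

```python
SUPPORT = "THEANO_SUPPORT_CODE_SECTION"
APPLY = "THEANO_APPLY_CODE_SECTION"

def parse_external_c_code(code, markers=(SUPPORT, APPLY)):
    # Locate each marker present in the code, in order of occurrence.
    positions = sorted((code.index(m), m) for m in markers if m in code)
    sections = {}
    for i, (start, marker) in enumerate(positions):
        # A section runs from its marker to the next marker (or EOF),
        # with the marker itself stripped out.
        end = positions[i + 1][0] if i + 1 < len(positions) else len(code)
        sections[marker] = code[start:end].replace(marker, "")
    return sections

c_file = """\
THEANO_SUPPORT_CODE_SECTION
int helper(int x) { return x + 1; }
THEANO_APPLY_CODE_SECTION
int APPLY_SPECIFIC(run)(int x, int *out) { *out = helper(x); return 0; }
"""
sections = parse_external_c_code(c_file)
```

The `APPLY_SPECIFIC(run)` call in the toy C file relies on the macro that `get_c_macros` defines per apply node, so the same external file can be compiled once per instantiation without symbol clashes.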
...@@ -22,8 +22,6 @@ import theano
from theano import config
from theano.gof.python25 import any, all, deque

_logger = logging.getLogger('theano.gof.opt')
...@@ -154,7 +152,7 @@ def inplace_optimizer(f):
class SeqOptimizer(Optimizer, list):
# inherit from Optimizer first to get Optimizer.__hash__
"""WRITEME
Takes a list of L{Optimizer} instances and applies them
sequentially.
...@@ -825,6 +823,68 @@ class LocalOptimizer(object):
(' ' * level), self.__class__.__name__, id(self))
class LocalSeqOptimizer(LocalOptimizer, list):
"""
This allows trying a group of local optimizers in sequence.
As soon as one of them applies, we return without trying the
following ones.
"""
# inherit from Optimizer first to get Optimizer.__hash__
def __init__(self, *opts, **kw):
"""WRITEME"""
if len(opts) == 1 and isinstance(opts[0], (list, tuple)):
opts = opts[0]
self[:] = opts
self.failure_callback = kw.pop('failure_callback', None)
def tracks(self):
t = []
for l in self:
tt = l.tracks()
if tt:
t.extend(tt)
return t
def transform(self, node):
"""Transform a subgraph whose output is `node`.

Subclasses should implement this function so that it returns one of
the following:

- False, to indicate that no optimization can be applied to this
`node`;
- <list of variables>, to use in place of `node`'s outputs in the
greater graph; or
- dict(old variables -> new variables), a dictionary mapping old
variables to the new variables that replace them.

:type node: an Apply instance
"""
for l in self:
ret = l.transform(node)
if ret:
return ret
def add_requirements(self, fgraph):
"""
If this local optimization wants to add some requirements to the
fgraph, this is the place to do it.
"""
for l in self:
l.add_requirements(fgraph)
def print_summary(self, stream=sys.stdout, level=0, depth=-1):
name = getattr(self, 'name', None)
print >> stream, "%s%s %s id=%i" % (
(' ' * level), self.__class__.__name__, name, id(self))
# This way, -1 will do all depth
if depth != 0:
depth -= 1
for opt in self:
opt.print_summary(stream, level=(level + 2), depth=depth)
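The dispatch rule of `LocalSeqOptimizer` ("the first optimizer that returns something truthy wins") can be illustrated with plain callables standing in for `LocalOptimizer` instances. The node representation and the names `LocalSeq`, `fold_add_zero`, and `fold_mul_one` are invented for this sketch.

```python
class LocalSeq(list):
    """Try each optimizer in order; return the first truthy result."""
    def transform(self, node):
        for opt in self:
            ret = opt(node)
            if ret:
                return ret
        return False  # no optimizer applied

def fold_add_zero(node):
    # x + 0 -> x
    if node[0] == "add" and node[2] == 0:
        return node[1]
    return False

def fold_mul_one(node):
    # x * 1 -> x
    if node[0] == "mul" and node[2] == 1:
        return node[1]
    return False

seq = LocalSeq([fold_add_zero, fold_mul_one])
```

Because only the first applicable rewrite fires per call, later optimizers in the sequence act as fallbacks rather than being composed on one pass.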
class FromFunctionLocalOptimizer(LocalOptimizer):
"""WRITEME"""
def __init__(self, fn, tracks=None, requirements=()):
...@@ -1241,6 +1301,30 @@ class PatternSub(LocalOptimizer):

# Use the following classes to apply LocalOptimizers
class Updater:
def __init__(self, importer, pruner, chin):
self.importer = importer
self.pruner = pruner
self.chin = chin
def on_import(self, fgraph, node, reason):
if self.importer:
self.importer(node)
def on_prune(self, fgraph, node, reason):
if self.pruner:
self.pruner(node)
def on_change_input(self, fgraph, node, i, r, new_r, reason):
if self.chin:
self.chin(node, i, r, new_r, reason)
def on_detach(self, fgraph):
# To allow pickling this object
self.importer = None
self.pruner = None
self.chin = None
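Moving `Updater` to module level (instead of defining it inside the method with closure-captured callbacks, as the removed code below did) matters for serialization: lambdas and locally defined classes cannot be pickled, which is why `on_detach` clears the callbacks before a graph carrying this feature is serialized. A quick check of that constraint, using an invented `FeatureStub` in place of the real class:

```python
import pickle

# Lambdas (and other locally defined functions) cannot be pickled;
# this is why the callbacks must be cleared before serialization.
try:
    pickle.dumps(lambda: None)
    lambda_picklable = True
except Exception:
    lambda_picklable = False

class FeatureStub(object):
    """Mimics Updater.on_detach clearing unpicklable callbacks."""
    def __init__(self, importer):
        self.importer = importer

    def on_detach(self):
        # Null the callback so the object stays picklable even when
        # the callback itself is a lambda.
        self.importer = None

f = FeatureStub(importer=lambda node: node)
f.on_detach()
```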
class NavigatorOptimizer(Optimizer):
"""Abstract class
...@@ -1329,18 +1413,7 @@ class NavigatorOptimizer(Optimizer):
if importer is None and pruner is None:
return None

u = Updater(importer, pruner, chin)
fgraph.attach_feature(u)
return u
...
...@@ -223,6 +223,7 @@ class SequenceDB(DB):
other tags) fast_run and fast_compile optimizers are drawn is a SequenceDB.
"""
seq_opt = opt.SeqOptimizer

def __init__(self, failure_callback=opt.SeqOptimizer.warn):
super(SequenceDB, self).__init__()
...@@ -256,13 +257,13 @@ class SequenceDB(DB):
# the order we want.
opts.sort(key=lambda obj: obj.name)
opts.sort(key=lambda obj: self.__position__[obj.name])
ret = self.seq_opt(opts, failure_callback=self.failure_callback)
if hasattr(tags[0], 'name'):
ret.name = tags[0].name
return ret

def print_summary(self, stream=sys.stdout):
print >> stream, self.__class__.__name__ + " (id %i)" % id(self)
positions = self.__position__.items()

def c(a, b):
...@@ -279,6 +280,13 @@ class SequenceDB(DB):
return sio.getvalue()
class LocalSequenceDB(SequenceDB):
"""
This generates local optimizers instead of global ones.
"""
seq_opt = opt.LocalSeqOptimizer
class ProxyDB(DB):
"""
Wrap an existing proxy.
...
import pickle
import unittest

import theano
from theano.gof import CachedConstantError, FunctionGraph
from theano import tensor as tt


class TFunctionGraph(unittest.TestCase):
...@@ -15,3 +17,10 @@ class TFunctionGraph(unittest.TestCase):
v = theano.tensor.constant(1)
assert v.cached
FunctionGraph([], [v + 1])
def test_pickle(self):
v = tt.vector()
func = theano.gof.FunctionGraph([v], [v + 1])
s = pickle.dumps(func)
func2 = pickle.loads(s)
...@@ -20,6 +20,7 @@ from theano import tensor
from theano.ifelse import ifelse
import theano


class TestCallbacks(unittest.TestCase):
"""
Test the VM_Linker's callback argument, which can be useful for debugging.
...@@ -34,7 +35,7 @@ class TestCallbacks(unittest.TestCase):
def test_callback(self):
a, b, c = tensor.scalars('abc')
f = function([a, b, c], (a + b) + c,
mode=Mode(
optimizer=None,
linker=vm.VM_Linker(callback=self.callback)))
...@@ -44,13 +45,12 @@ class TestCallbacks(unittest.TestCase):
f(1, 2, 3)
assert sum(self.n_callbacks.values()) == len(f.maker.fgraph.toposort()) * 2

def test_callback_with_ifelse(self):
a, b, c = tensor.scalars('abc')
f = function([a, b, c], ifelse(a, 2*b, 2*c),
mode=Mode(
optimizer=None,
linker=vm.VM_Linker(callback=self.callback)))
f(1, 2, 3)
assert self.n_callbacks['IfElse'] == 2
...@@ -71,6 +71,7 @@ def test_speed():
for d in xrange(depth):
z = (z+z)
return z

def time_numpy():
steps_a = 5
steps_b = 100
...@@ -78,10 +79,10 @@ def test_speed():
numpy_version(x, steps_a)
t0 = time.time()
# print numpy_version(x, steps_a)
t1 = time.time()
t2 = time.time()
# print numpy_version(x, steps_b)
t3 = time.time()
t_a = t1 - t0
t_b = t3 - t2
...@@ -94,18 +95,17 @@ def test_speed():
steps_a = 5
steps_b = 100
x = tensor.vector()
a = build_graph(x, steps_a)
b = build_graph(x, steps_b)

f_a = function([x], a,
mode=Mode(optimizer=None, linker=linker()),
#profile='f_a speed test %s'%name,
)
f_b = function([x], b,
mode=Mode(optimizer=None, linker=linker()),
#profile='f_b speed test %s'%name,
)

f_a([2.0, 3.0])
t0 = time.time()
...@@ -122,17 +122,18 @@ def test_speed():
t_b = t3 - t2
print "%s takes %f s/Kop" % (
name,
(1000*(t_b-t_a) / (steps_b - steps_a)))

time_linker('c|py', OpWiseCLinker)
time_linker('vmLinker', vm.VM_Linker)
time_linker('vmLinker_nogc', lambda: vm.VM_Linker(allow_gc=False))
if theano.config.cxx:
time_linker('vmLinker_CLOOP', lambda: vm.VM_Linker(allow_gc=False,
use_cloop=True))
time_numpy()


def test_speed_lazy():
def build_graph(x, depth=5):
...@@ -148,17 +149,16 @@ def test_speed_lazy():
a = build_graph(x, steps_a)
b = build_graph(x, steps_b)

f_a = function([x], a,
mode=Mode(optimizer=None,
linker=linker()),
#profile='f_a lazy ifelse %s'%name,
)
f_b = function([x], b,
mode=Mode(optimizer=None,
linker=linker()),
#profile='f_b lazy ifelse %s'%name,
)

f_a([2.0])
t0 = time.time()
...@@ -179,15 +179,20 @@ def test_speed_lazy():
(1000*(t_b-t_a) / (steps_b - steps_a)))

time_linker('vmLinker', vm.VM_Linker)
time_linker('vmLinker_nogc', lambda: vm.VM_Linker(allow_gc=False))
if theano.config.cxx:
time_linker('vmLinker_C', lambda: vm.VM_Linker(allow_gc=False,
use_cloop=True))


def test_allow_gc_cvm():
mode = theano.config.mode
if mode in ['DEBUG_MODE', 'DebugMode']:
mode = "FAST_RUN"

v = theano.tensor.vector()
f = theano.function([v], v + 1, mode=mode)
f([1])

n = list(f.maker.fgraph.apply_nodes)[0].outputs[0]
assert f.fn.storage_map[n][0] is None
...@@ -262,8 +267,8 @@ if run_memory_usage_tests: ...@@ -262,8 +267,8 @@ if run_memory_usage_tests:
a = build_graph(x, steps_a) a = build_graph(x, steps_a)
f_a = function([x], a, f_a = function([x], a,
mode=Mode(optimizer=None, mode=Mode(optimizer=None,
linker=linker())) linker=linker()))
for i in xrange(100000): for i in xrange(100000):
f_a([2.0]) f_a([2.0])
...@@ -296,8 +301,8 @@ if run_memory_usage_tests: ...@@ -296,8 +301,8 @@ if run_memory_usage_tests:
a = build_graph(x, steps_a) a = build_graph(x, steps_a)
f_a = function([x], a, f_a = function([x], a,
mode=Mode(optimizer=None, mode=Mode(optimizer=None,
linker=linker())) linker=linker()))
for i in xrange(500000): for i in xrange(500000):
f_a([2.0]) f_a([2.0])
......
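The benchmark above estimates per-step cost by differencing the timings of two graph sizes. A minimal plain-Python sketch of that measurement pattern (the names `per_step_ms` and `work` are illustrative stand-ins, not Theano code):

```python
import time

def per_step_ms(fn, steps_a, steps_b, reps=50):
    """Estimate the cost per extra step by differencing two run lengths."""
    t0 = time.time()
    for _ in range(reps):
        fn(steps_a)
    t1 = time.time()
    for _ in range(reps):
        fn(steps_b)
    t2 = time.time()
    # ((time for steps_b runs) - (time for steps_a runs)) per extra step
    return 1000 * ((t2 - t1) - (t1 - t0)) / (reps * (steps_b - steps_a))

def work(steps):
    # stand-in for evaluating a graph with `steps` nodes
    x = 2.0
    for _ in range(steps):
        x = x * 0.5 + 1.0
    return x

print("%.6f ms/step" % per_step_ms(work, 100, 1000))
```

Differencing two run lengths cancels the fixed per-call overhead, which is why the test reports milliseconds per extra step rather than raw wall time.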
@@ -104,7 +104,32 @@ class Bookkeeper(Feature):
        self.on_prune(fgraph, node, 'Bookkeeper.detach')


+class GetCheckpoint:
+
+    def __init__(self, history, fgraph):
+        self.h = history
+        self.fgraph = fgraph
+
+    def __call__(self):
+        return len(self.h.history[self.fgraph])
+
+
+class LambdExtract:
+
+    def __init__(self, fgraph, node, i, r, reason=None):
+        self.fgraph = fgraph
+        self.node = node
+        self.i = i
+        self.r = r
+        self.reason = reason
+
+    def __call__(self):
+        return self.fgraph.change_input(self.node, self.i, self.r,
+                                        reason=("Revert", self.reason))
+
+
class History(Feature):
+    pickle_rm_attr = ["checkpoint", "revert"]

    def __init__(self):
        self.history = {}
@@ -114,7 +139,14 @@ class History(Feature):
            raise AlreadyThere("History feature is already present or in"
                               " conflict with another plugin.")
        self.history[fgraph] = []
-        fgraph.checkpoint = lambda: len(self.history[fgraph])
+        # Don't call unpickle here: ReplaceValidate.on_attach()'s call to
+        # History.on_attach() would then call ReplaceValidate.unpickle
+        # and not History.unpickle.
+        fgraph.checkpoint = GetCheckpoint(self, fgraph)
+        fgraph.revert = partial(self.revert, fgraph)
+
+    def unpickle(self, fgraph):
+        fgraph.checkpoint = GetCheckpoint(self, fgraph)
        fgraph.revert = partial(self.revert, fgraph)

    def on_detach(self, fgraph):
@@ -126,8 +158,7 @@ class History(Feature):
        if self.history[fgraph] is None:
            return
        h = self.history[fgraph]
-        h.append(lambda: fgraph.change_input(node, i, r,
-                                             reason=("Revert", reason)))
+        h.append(LambdExtract(fgraph, node, i, r, reason))

    def revert(self, fgraph, checkpoint):
        """
@@ -144,47 +175,66 @@
class Validator(Feature):
+    pickle_rm_attr = ["validate", "consistent"]

    def on_attach(self, fgraph):
        for attr in ('validate', 'validate_time'):
            if hasattr(fgraph, attr):
                raise AlreadyThere("Validator feature is already present or in"
                                   " conflict with another plugin.")
+        # Don't call unpickle here: ReplaceValidate.on_attach()'s call to
+        # Validator.on_attach() would then call ReplaceValidate.unpickle
+        # and not Validator.unpickle.
+        fgraph.validate = partial(self.validate_, fgraph)
+        fgraph.consistent = partial(self.consistent_, fgraph)

-        def validate():
-            t0 = time.time()
-            ret = fgraph.execute_callbacks('validate')
-            t1 = time.time()
-            if fgraph.profile:
-                fgraph.profile.validate_time += t1 - t0
-            return ret
-        fgraph.validate = validate
-
-        def consistent():
-            try:
-                fgraph.validate()
-                return True
-            except Exception:
-                return False
-        fgraph.consistent = consistent
+    def unpickle(self, fgraph):
+        fgraph.validate = partial(self.validate_, fgraph)
+        fgraph.consistent = partial(self.consistent_, fgraph)

    def on_detach(self, fgraph):
        del fgraph.validate
        del fgraph.consistent

+    def validate_(self, fgraph):
+        t0 = time.time()
+        ret = fgraph.execute_callbacks('validate')
+        t1 = time.time()
+        if fgraph.profile:
+            fgraph.profile.validate_time += t1 - t0
+        return ret
+
+    def consistent_(self, fgraph):
+        try:
+            fgraph.validate()
+            return True
+        except Exception:
+            return False


class ReplaceValidate(History, Validator):
+    pickle_rm_attr = ["replace_validate", "replace_all_validate",
+                      "replace_all_validate_remove"] + \
+        History.pickle_rm_attr + Validator.pickle_rm_attr

    def on_attach(self, fgraph):
-        History.on_attach(self, fgraph)
-        Validator.on_attach(self, fgraph)
-        for attr in ('replace_validate', 'replace_all_validate'):
+        for attr in ('replace_validate', 'replace_all_validate',
+                     'replace_all_validate_remove'):
            if hasattr(fgraph, attr):
                raise AlreadyThere("ReplaceValidate feature is already present"
                                   " or in conflict with another plugin.")
+        History.on_attach(self, fgraph)
+        Validator.on_attach(self, fgraph)
+        self.unpickle(fgraph)
+
+    def unpickle(self, fgraph):
+        History.unpickle(self, fgraph)
+        Validator.unpickle(self, fgraph)
        fgraph.replace_validate = partial(self.replace_validate, fgraph)
-        fgraph.replace_all_validate = partial(self.replace_all_validate, fgraph)
+        fgraph.replace_all_validate = partial(self.replace_all_validate,
+                                              fgraph)
        fgraph.replace_all_validate_remove = partial(
            self.replace_all_validate_remove, fgraph)
@@ -247,6 +297,12 @@ class ReplaceValidate(History, Validator):
            print >> out, reason, replacements
        raise ReplacementDidntRemovedError()

+    def __getstate__(self):
+        d = self.__dict__.copy()
+        if "history" in d:
+            del d["history"]
+        return d


class NodeFinder(Bookkeeper):
......
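The motivation for `GetCheckpoint` and `LambdExtract` above is picklability: a lambda stored on the fgraph cannot be pickled at all, while an instance of a module-level callable class holding the same state can be (hence also the `pickle_rm_attr` lists for the bound partials that still cannot). A small illustrative sketch, with the made-up name `GetLength` standing in for `GetCheckpoint`:

```python
import pickle

class GetLength:
    """Picklable stand-in for ``lambda: len(seq)``; state lives in
    instance attributes, and the class is found again by import."""
    def __init__(self, seq):
        self.seq = seq

    def __call__(self):
        return len(self.seq)

# A lambda cannot be pickled: functions are pickled by qualified name,
# and '<lambda>' cannot be looked up again on unpickling.
try:
    pickle.dumps(lambda: 0)
    lambda_ok = True
except Exception:
    lambda_ok = False

print("lambda picklable:", lambda_ok)   # False
print(GetLength([1, 2, 3])())           # 3
```

The same reasoning applies to the closures `validate`/`consistent` replaced by the `validate_`/`consistent_` methods plus `partial` in the diff.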
@@ -694,7 +694,7 @@ class VM_Linker(link.LocalLinker):
            if k.owner and k.clients:
                ls = []
                for cl in k.clients:
-                    if cl[0] is not 'output':
+                    if cl[0] != 'output':
                        ls += cl[0].outputs
                dependencies[k] += ls
        return dependencies
@@ -924,7 +924,7 @@ class VM_Linker(link.LocalLinker):
            self.updated_vars
            )
        vm.storage_map = storage_map
        return (vm,
                [link.Container(input, storage)
......
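The `cl[0] is not 'output'` fix above matters because `is` tests object identity, which for strings depends on CPython's interning, an implementation detail that can change between builds and code paths. A quick illustration:

```python
a = 'output'
b = ''.join(['out', 'put'])   # equal value, but built at runtime

print(a == b)   # True: value equality, which is what the code intends
print(a is b)   # False in CPython: a distinct, non-interned object
```

Relying on `is` for string comparison happens to work for short literals that get interned, which is exactly why such bugs go unnoticed until a non-interned string shows up.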
@@ -356,9 +356,21 @@ def grad(cost, wrt, consider_constant=None,
         disconnected_inputs='raise', add_names=True,
         known_grads=None, return_disconnected='zero'):
    """
-    :type cost: Scalar (0-dimensional) Variable.
+    Return symbolic gradients for one or more variables with respect to some
+    cost.
+
+    For more information about how automatic differentiation works in Theano,
+    see :mod:`gradient`. For information on how to implement the gradient of
+    a certain Op, see :func:`grad`.
+
+    :type cost: Scalar (0-dimensional) tensor variable.
        May optionally be None if known_grads is provided.
-    :type wrt: Variable or list of Variables.
+    :param cost: a scalar with respect to which we are differentiating
+    :type wrt: Tensor variable or list of variables.
+    :param wrt: term[s] for which we want gradients
+    :type consider_constant: list of variables
    :param consider_constant: a list of expressions not to backpropagate
        through
@@ -389,9 +401,10 @@ def grad(cost, wrt, consider_constant=None,
        None
        - 'Disconnected' : returns variables of type DisconnectedType
-    :rtype: Variable or list/tuple of Variables (depending upon `wrt`)
+    :rtype: variable or list/tuple of Variables (matching `wrt`)
-    :return: symbolic expression of gradient of `cost` with respect to `wrt`.
+    :return: symbolic expression of the gradient of `cost` with respect to
+        each of the `wrt` terms.
        If an element of `wrt` is not differentiable with respect
        to the output, then a zero variable is returned.
        It returns an object of same type as `wrt`: a list/tuple
@@ -567,6 +580,33 @@ def subgraph_grad(wrt, end, start=None, cost=None, details=False):
        subgraph_grad as `start` with any other `cost` (e.g. weight
        decay).

+    In an MLP, we could use subgraph_grad to iteratively backpropagate:
+
+    .. code-block:: python
+
+        x, t = theano.tensor.fvector('x'), theano.tensor.fvector('t')
+        w1 = theano.shared(np.random.randn(3, 4))
+        w2 = theano.shared(np.random.randn(4, 2))
+        a1 = theano.tensor.tanh(theano.tensor.dot(x, w1))
+        a2 = theano.tensor.tanh(theano.tensor.dot(a1, w2))
+        cost2 = theano.tensor.sqr(a2 - t).sum()
+        cost2 += theano.tensor.sqr(w2.sum())
+        cost1 = theano.tensor.sqr(w1.sum())
+
+        params = [[w2], [w1]]
+        costs = [cost2, cost1]
+        grad_ends = [[a1], [x]]
+
+        next_grad = None
+        param_grads = []
+        for i in xrange(2):
+            param_grad, next_grad = theano.subgraph_grad(
+                wrt=params[i], end=grad_ends[i],
+                start=next_grad, cost=costs[i]
+            )
+            next_grad = dict(zip(grad_ends[i], next_grad))
+            param_grads.extend(param_grad)
+
    :type wrt: list of variables
    :param wrt:
        Gradients are computed with respect to `wrt`.
@@ -593,7 +633,14 @@ def subgraph_grad(wrt, end, start=None, cost=None, details=False):
        : If the gradients of `cost` with respect to any of the `start`
        variables is already part of the `start` dictionary, then it may
        be counted twice with respect to `wrt` and `end`.

+    .. warning::
+
+        If the gradients of `cost` with respect to any of the `start`
+        variables is already part of the `start` dictionary, then it
+        may be counted twice with respect to `wrt` and `end`.
+
    :type details: bool
    :param details:
        When True, additionally returns the list of gradients from
@@ -605,6 +652,7 @@ def subgraph_grad(wrt, end, start=None, cost=None, details=False):
    :return: Returns lists of gradients with respect to `wrt` and `end`,
        respectively.

+    .. versionadded:: 0.6.1
    '''
    assert ((cost is not None) or (start is not None))
    assert isinstance(end, list)
......
@@ -435,7 +435,7 @@ where each of the optimizations does the following things:
    acceptable_ops = (theano.tensor.basic.Dot,
                      theano.tensor.basic.Reshape,
                      theano.tensor.basic.Shape,
-                     theano.tensor.basic.SpecifyShape,
+                     theano.tensor.SpecifyShape,
                      theano.tensor.basic.MaxAndArgmax,
                      theano.tensor.Subtensor,
                      theano.tensor.IncSubtensor,
......
@@ -201,41 +201,43 @@ if __name__ == "__main__":

    Test time in float32

-   cuda version        6.0  5.5  5.0  4.2  4.1  4.0  3.2  3.0  # note
+   cuda version   6.5  6.0  5.5  5.0  4.2  4.1  4.0  3.2  3.0  # note
    gpu
    K6000/NOECC         0.06s
    K40                 0.07s
    K20m/ECC            0.07s
    K20/NOECC           0.07s
    M2090               0.19s
    C2075               0.25s
    M2075               0.25s
    M2070               0.25s  0.27s  0.32s
    M2070-Q             0.48s  0.27s  0.32s
    M2050(Amazon)       0.25s
    C1060                      0.46s
    K600                1.04s
    GTX Titan Black     0.05s
    GTX Titan(D15U-50)  0.06s  0.06s  don't work
    GTX 780             0.06s
+   GTX 970             0.08s
    GTX 680             0.11s  0.12s  0.154s  0.218s
    GTX 580             0.16s  0.16s  0.164s  0.203s
    GTX 480             0.19s  0.19s  0.192s  0.237s  0.27s
+   GTX 750 Ti          0.20s
    GTX 470             0.23s  0.23s  0.238s  0.297s  0.34s
    GTX 660             0.18s  0.20s  0.23s
    GTX 560             0.30s
    GTX 650 Ti          0.27s
    GTX 765M            0.27s
    GTX 460             0.37s  0.45s
    GTX 285             0.42s  0.452s  0.452s  0.40s  # cuda 3.0 seems faster? driver version?
    750M                0.49s
    GTX 550 Ti          0.57s
    GT 520              2.68s  3.06s
    520M                2.44s  3.19s  # with bumblebee on Ubuntu 12.04
    GT 220              3.80s
    GT 210              6.35s
    8500 GT             10.68s
    """

    t, impl = execute(not options.print_only, not options.quiet,
......
@@ -44,6 +44,8 @@ if MutableSet is not None:
    import weakref

    class Link(object):
+        # This means we need to use a pickle protocol different from the
+        # default. Otherwise, there are pickling errors.
        __slots__ = 'prev', 'next', 'key', '__weakref__'

        def __getstate__(self):
......
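The comment added above concerns the interaction of `__slots__` with pickle: a slotted class has no instance `__dict__`, so the oldest pickle protocol cannot serialize it without explicit state hooks, and the `__weakref__` slot must be excluded from any state in either case since weak references are not picklable. A rough sketch of the mechanics (not OrderedSet's actual code):

```python
import weakref

class Link(object):
    __slots__ = 'prev', 'next', 'key', '__weakref__'

l = Link()
l.key = 'a'

# No instance __dict__: state has to be gathered slot by slot,
# which is what a custom __getstate__ does.
print(hasattr(l, '__dict__'))            # False
state = {s: getattr(l, s) for s in Link.__slots__
         if s != '__weakref__' and hasattr(l, s)}
print(state)                             # {'key': 'a'}

# the '__weakref__' slot re-enables weak references despite __slots__
r = weakref.ref(l)
print(r() is l)                          # True
```

With such a `__getstate__`/`__setstate__` pair in place, the newer binary pickle protocols handle the slotted instances cleanly.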
@@ -102,10 +102,38 @@ def debugprint(obj, depth=-1, print_type=False,
    else:
        raise TypeError("debugprint cannot print an object of this type",
                        obj)

+    scan_ops = []
    for r in results_to_print:
+        # Add the parent scan op to the list as well
+        if hasattr(r.owner, 'op') and isinstance(r.owner.op,
+                                                 theano.scan_module.scan_op.Scan):
+            scan_ops.append(r)
+
        debugmode.debugprint(r, depth=depth, done=done, print_type=print_type,
                             file=_file, order=order, ids=ids,
-                            stop_on_name=stop_on_name)
+                            scan_ops=scan_ops, stop_on_name=stop_on_name)
+
+    if len(scan_ops) > 0:
+        print >> file, ""
+        new_prefix = ' >'
+        new_prefix_child = ' >'
+        print >> file, "Inner graphs of the scan ops:"
+
+        for s in scan_ops:
+            print >> file, ""
+            debugmode.debugprint(s, depth=depth, done=done,
+                                 print_type=print_type, file=_file, ids=ids,
+                                 scan_ops=scan_ops, stop_on_name=stop_on_name)
+
+            for idx, i in enumerate(s.owner.op.outputs):
+                if hasattr(i, 'owner') and hasattr(i.owner, 'op'):
+                    if isinstance(i.owner.op, theano.scan_module.scan_op.Scan):
+                        scan_ops.append(i)
+
+                debugmode.debugprint(r=i, prefix=new_prefix, depth=depth,
+                                     done=done, print_type=print_type,
+                                     file=file, ids=ids,
+                                     stop_on_name=stop_on_name,
+                                     prefix_child=new_prefix_child,
+                                     scan_ops=scan_ops)

    if file is _file:
        return file
    elif file == 'str':
@@ -964,7 +992,7 @@ def pydotprint_variables(vars,
        if nd.owner:
            plot_apply(nd.owner, depth)
    try:
-        g.write_png(outfile, prog='dot')
+        g.write(outfile, prog='dot', format=format)
    except pd.InvocationException, e:
        # Some versions of pydot are bugged/don't work correctly with
        # empty labels. Provide a better user error message.
@@ -978,6 +1006,7 @@ def pydotprint_variables(vars,
                " Theano. Using another version of pydot could"
                " fix this problem. The pydot error is: " +
                e.message)
+        raise
    print 'The output file is available at', outfile
......
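The change above accumulates Scan nodes while printing the main graph, then prints each collected inner graph in a second pass. That control flow can be sketched without Theano; the dict-based `nodes` below are made-up stand-ins for apply nodes:

```python
def render(nodes):
    """Render a flat graph first, then the inner graph of any 'scan'-like
    node collected along the way, mirroring debugprint above."""
    lines = []
    nested = []
    for n in nodes:
        lines.append(n['name'])
        if 'inner' in n:          # analogous to isinstance(op, Scan)
            nested.append(n)
    if nested:
        lines.append('Inner graphs:')
        for n in nested:
            lines.append('%s:' % n['name'])
            # inner nodes get the ' >' prefix, as in the diff
            lines.extend(' > ' + i for i in n['inner'])
    return lines

for line in render([{'name': 'add'},
                    {'name': 'scan', 'inner': ['mul', 'tanh']}]):
    print(line)
```

Appending to `scan_ops` while iterating over it (as the real code does for nested Scans) makes the loop also visit inner graphs discovered along the way.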
@@ -3025,15 +3025,17 @@ CudaNdarray_ptr_int_size(PyObject* _unused, PyObject* args)
{
    int *gpu_data = (int*)device_malloc(sizeof(int)*2);
    if(gpu_data == NULL){
-        return PyErr_Format(PyExc_MemoryError,
-                "CudaNdarray_ptr_int_size: Can't allocate memory on the gpu.");
+        return NULL;
    }
    get_gpu_ptr_size<<<1,1>>>(gpu_data);
-    if (cudaSuccess != cudaGetLastError()){
+
+    cudaError_t cudaErr = cudaGetLastError();
+    if (cudaSuccess != cudaErr){
        device_free(gpu_data);
        return PyErr_Format(PyExc_RuntimeError,
-                "CudaNdarray_ptr_int_size: error when calling the gpu code.");
+                "CudaNdarray_ptr_int_size: error when calling the gpu code. (%s)",
+                cudaGetErrorString(cudaErr));
    }
    // Transfer the result to cpu
......
@@ -586,6 +586,31 @@ def test_dnn_valid():
        yield t


+def test_default_conv():
+    """Just test that we introduce the right GPU convolution
+    version.
+
+    """
+    img = theano.tensor.ftensor4()
+    fil = theano.tensor.ftensor4()
+
+    c = theano.tensor.nnet.conv2d(img, fil)
+    f = theano.function([img, fil], c, mode=theano_mode)
+    if cuda.dnn.dnn_available():
+        assert any([isinstance(a.op, GpuDnnConv)
+                    for a in f.maker.fgraph.apply_nodes])
+    else:
+        assert any([isinstance(a.op, cuda.blas.GpuCorrMM)
+                    for a in f.maker.fgraph.apply_nodes])
+
+    mode = theano_mode.excluding('local_gpu_conv', 'local_conv_gemm')
+    f = theano.function([img, fil], c, mode=mode)
+    assert any([isinstance(a.op, cuda.blas.GpuConv)
+                for a in f.maker.fgraph.apply_nodes])
+
+
def _test_full(cls, mode=None, version=[-1], extra_shapes=[]):
    seed_rng()
    shapes = get_basic_shapes()
@@ -722,6 +747,10 @@ def test_dnn_subsample():


class TestConv2DGPU(unittest.TestCase):
+    conv_ops = (cuda.blas.GpuConv,
+                cuda.dnn.GpuDnnConvBase,
+                cuda.blas.BaseGpuCorrMM)

    def test_logical_shapes(self):
        seed_rng()
        for stride in range(1, 4):
@@ -748,7 +777,7 @@ class TestConv2DGPU(unittest.TestCase):
        func = theano.function([a, A], image_estimate, mode=theano_mode)
        #theano.printing.debugprint(func,)
-        assert any([isinstance(node.op, theano.sandbox.cuda.blas.GpuConv)
+        assert any([isinstance(node.op, self.conv_ops)
                    for node in func.maker.fgraph.toposort()])

        a_in = numpy.random.randn(*featshp).astype("float32")
......
@@ -83,7 +83,7 @@ class TestConv2dFFT(unittest.TestCase):
        # make sure we inserted the fft trickery
        topo = f_fft.maker.fgraph.toposort()
        assert sum(isinstance(n.op, theano.sandbox.cuda.fftconv.CuFFTOp)
-                   for n in topo) == 2
+                   for n in topo) == 2, topo
        res_ref = f_ref()
@@ -112,7 +112,7 @@ class TestConv2dFFT(unittest.TestCase):
        # make sure we inserted the fft trickery
        topo = f_fft.maker.fgraph.toposort()
        assert sum(isinstance(n.op, theano.sandbox.cuda.fftconv.CuFFTOp)
-                   for n in topo) == 2
+                   for n in topo) == 2, topo
        res_ref = f_ref()
        res_fft = f_fft()
......
@@ -396,7 +396,11 @@ def build_conv_nnet2_classif(use_gpu, isize, ksize, n_batch,
    if use_gpu:
        # Check that GpuConv is used
        topo = train.maker.fgraph.toposort()
-        assert len([n for n in topo if isinstance(n.op, tcn.blas.GpuConv)]) > 0
+        conv_ops = (tcn.blas.GpuConv,
+                    tcn.dnn.GpuDnnConvBase,
+                    tcn.blas.BaseGpuCorrMM)
+        assert len([n for n in topo if isinstance(n.op, conv_ops)]) > 0

    shape_target = (n_batch, n_out)
    return train, params, shape_img, shape_target, mode
......
@@ -78,13 +78,17 @@ def safe_to_cpu(x):
        return x


-def op_lifter(OP):
+def op_lifter(OP, cuda_only=False):
    """
    OP(..., host_from_gpu(), ...) -> host_from_gpu(GpuOP(...))
    gpu_from_host(OP(inp0, ...)) -> GpuOP(inp0, ...)
    """
    def f(maker):
        def local_opt(node):
+            dev = theano.sandbox.gpuarray.init_dev.device
+            if cuda_only and not dev.startswith('cuda'):
+                return
+
            if type(node.op) in OP:
                # Either one of our inputs is on the gpu or
@@ -484,25 +488,25 @@ def local_gpua_eye(node):


@register_opt('fast_compile')
-@op_lifter([tensor.nnet.CrossentropySoftmaxArgmax1HotWithBias])
+@op_lifter([tensor.nnet.CrossentropySoftmaxArgmax1HotWithBias], cuda_only=True)
def local_gpua_crossentropysoftmaxargmax1hotwithbias(node):
    return GpuCrossentropySoftmaxArgmax1HotWithBias()


@register_opt('fast_compile')
-@op_lifter([tensor.nnet.CrossentropySoftmax1HotWithBiasDx])
+@op_lifter([tensor.nnet.CrossentropySoftmax1HotWithBiasDx], cuda_only=True)
def local_gpua_crossentropysoftmax1hotwithbiasdx(node):
    return GpuCrossentropySoftmax1HotWithBiasDx()


@register_opt('fast_compile')
-@op_lifter([tensor.nnet.Softmax])
+@op_lifter([tensor.nnet.Softmax], cuda_only=True)
def local_gpua_softmax(node):
    return GpuSoftmax()


@register_opt('fast_compile')
-@op_lifter([tensor.nnet.SoftmaxWithBias])
+@op_lifter([tensor.nnet.SoftmaxWithBias], cuda_only=True)
def local_gpua_softmaxwithbias(node):
    return GpuSoftmaxWithBias()
......
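The `cuda_only` flag above turns `op_lifter` into a parameterized decorator factory: the configuration is captured by the outer call and consulted inside the generated local optimizer. A stripped-down sketch of the same pattern, with no Theano types and `device` as a made-up stand-in for `init_dev.device`:

```python
def op_lifter(ops, cuda_only=False):
    """Return a decorator that wraps `maker` in a backend guard."""
    def decorator(maker):
        def local_opt(node, device='cuda0'):
            if cuda_only and not device.startswith('cuda'):
                return None          # refuse to lift on non-CUDA backends
            if type(node) in ops:
                return maker(node)
            return None
        return local_opt
    return decorator

@op_lifter([int], cuda_only=True)
def lift_int(node):
    return ('GpuInt', node)

print(lift_int(3))                     # ('GpuInt', 3)
print(lift_int(3, device='opencl0'))   # None: guard rejected the backend
```

Because the guard lives in the shared wrapper, each of the softmax/cross-entropy optimizers above gains the CUDA-only behaviour by changing one decorator argument rather than repeating the check in every function body.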
@@ -734,6 +734,13 @@ class GPU_mrg_uniform(mrg_uniform_base, GpuOp):
        unsigned int threads_per_block = std::min((unsigned int)n_streams_used_in_this_call, (unsigned int)NUM_VECTOR_OP_THREADS_PER_BLOCK);
        unsigned int n_blocks = std::min(ceil_intdiv((unsigned int)n_streams_used_in_this_call, threads_per_block), (unsigned int)NUM_VECTOR_OP_BLOCKS);

+        if (n_streams > (unsigned int)NUM_VECTOR_OP_THREADS_PER_BLOCK * (unsigned int)NUM_VECTOR_OP_BLOCKS)
+        {
+            PyErr_Format(PyExc_ValueError, "On GPU, n_streams should be at most %%u",
+                         (unsigned int)NUM_VECTOR_OP_THREADS_PER_BLOCK * (unsigned int)NUM_VECTOR_OP_BLOCKS);
+            %(fail)s;
+        }
+
        if (threads_per_block * n_blocks < n_streams)
        {
            if (! %(nodename)s_printed_warning)
@@ -761,7 +768,7 @@ class GPU_mrg_uniform(mrg_uniform_base, GpuOp):
        """ % locals()

    def c_code_cache_version(self):
-        return (8,)
+        return (9,)


class GPUA_mrg_uniform(GpuKernelBase, mrg_uniform_base):
......
@@ -17,6 +17,7 @@ import unittest
from theano.tests import unittest_tools as utt
from nose.plugins.skip import SkipTest
from nose.plugins.attrib import attr
+from nose.tools import assert_raises

#TODO: test gpu
# Done in test_consistency_GPU_{serial,parallel}
@@ -306,6 +307,30 @@ def test_consistency_GPU_parallel():
    assert(numpy.allclose(samples, java_samples))


+def test_GPU_nstreams_limit():
+    """Verify that a ValueError is raised when n_streams
+    is greater than 2**20 on GPU. This is the value of
+    (NUM_VECTOR_OP_THREADS_PER_BLOCK * NUM_VECTOR_OP_BLOCKS).
+
+    """
+    if not cuda_available:
+        raise SkipTest('Optional package cuda not available')
+
+    seed = 12345
+    R = MRG_RandomStreams(seed=seed, use_cuda=True)
+
+    def eval_uniform(size, nstreams):
+        if theano.config.mode == "FAST_COMPILE":
+            mode = "FAST_RUN"
+        else:
+            mode = None
+        out = R.uniform(size=size, nstreams=nstreams, dtype='float32')
+        f = theano.function([], out, mode=mode)
+        return f()
+
+    eval_uniform((10,), 2**20)
+    assert_raises(ValueError, eval_uniform, (10,), 2**20 + 1)
+
+
def test_consistency_GPUA_serial():
    '''Verify that the random numbers generated by GPUA_mrg_uniform, serially,
    are the same as the reference (Java) implementation by L'Ecuyer et al.
......
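The new test exercises the `n_streams` guard added to the CUDA code above: at most `2**20` streams, the product of threads per block and block count, is accepted, and the boundary itself still passes. The boundary behaviour can be sketched in plain Python (`request_streams` is an illustrative stand-in, not a Theano API):

```python
MAX_STREAMS = 2 ** 20  # NUM_VECTOR_OP_THREADS_PER_BLOCK * NUM_VECTOR_OP_BLOCKS

def request_streams(n_streams):
    """Mimic the C-level check: reject anything past the hard limit."""
    if n_streams > MAX_STREAMS:
        raise ValueError("On GPU, n_streams should be at most %d"
                         % MAX_STREAMS)
    return n_streams

print(request_streams(2 ** 20))        # exactly at the limit: accepted
try:
    request_streams(2 ** 20 + 1)
except ValueError as e:
    print("rejected:", e)
```

Testing both `limit` and `limit + 1`, as `test_GPU_nstreams_limit` does, pins the off-by-one behaviour of the guard.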
@@ -65,3 +65,6 @@ from theano.tensor.sort import sort, argsort
from theano.tensor.extra_ops import (DiffOp, bincount, squeeze,
        repeat, bartlett, fill_diagonal, fill_diagonal_offset,
        cumsum, cumprod)
+
+# SpecifyShape is defined in theano.compile, but should be available in tensor
+from theano.compile import SpecifyShape, specify_shape
@@ -1494,11 +1494,11 @@ class GemmOptimizer(Optimizer):
        callbacks_before = fgraph.execute_callbacks_times.copy()
        callback_before = fgraph.execute_callbacks_time

-        class Updater:
-            def on_import(self, fgraph, new_node, reason):
-                if new_node is not node:
-                    nodelist.append(new_node)
+        def on_import(new_node):
+            if new_node is not node:
+                nodelist.append(new_node)

-        u = Updater()
+        u = theano.gof.opt.Updater(on_import, None, None)
        fgraph.attach_feature(u)
        while did_something:
            nb_iter += 1
......
@@ -182,10 +182,20 @@ class DimShuffle(Op):
        input = as_tensor_variable(_input)
        ib = tuple(input.type.broadcastable)
        if not ib == self.input_broadcastable:
-            raise TypeError((
-                "The number of dimensions and/or broadcastable pattern of the "
-                "input is incorrect for this op. Expected %s, got %s."
-                % (self.input_broadcastable, ib)))
+            if len(ib) != len(self.input_broadcastable):
+                raise TypeError((
+                    "The number of dimensions of the "
+                    "input is incorrect for this op. Expected %s, got %s."
+                    % (self.input_broadcastable, ib)))
+            for expected, b in zip(self.input_broadcastable, ib):
+                if expected is True and b is False:
+                    raise TypeError((
+                        "The broadcastable pattern of the "
+                        "input is incorrect for this op. Expected %s, got %s."
+                        % (self.input_broadcastable, ib)))
+                # else, expected == b, or expected is False and b is True.
+                # Both cases are good.
        ob = []
        for value in self.new_order:
            if value == 'x':
......
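The relaxed DimShuffle check above only rejects a dimension that was promised broadcastable but is not; an actually broadcastable dimension where a non-broadcastable one was expected is now accepted. The rule in isolation, as a sketch (`check_broadcastable` is an illustrative helper, not a Theano function):

```python
def check_broadcastable(expected, got):
    if len(expected) != len(got):
        raise TypeError("wrong number of dimensions:"
                        " expected %s, got %s" % (expected, got))
    for e, b in zip(expected, got):
        # only "expected broadcastable, got non-broadcastable" is an error
        if e is True and b is False:
            raise TypeError("bad broadcastable pattern:"
                            " expected %s, got %s" % (expected, got))

check_broadcastable((False, True), (False, True))   # exact match: fine
check_broadcastable((False, True), (True, True))    # False -> True: now fine
try:
    check_broadcastable((True, False), (False, False))
except TypeError as e:
    print("rejected:", e)
```

Splitting the old combined error into a dimension-count branch and a pattern branch also yields messages that name the actual mismatch.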
@@ -65,14 +65,20 @@ def make_constant(args):
    return tuple(map(conv, args))


-def get_idx_list(inputs, idx_list):
+def get_idx_list(inputs, idx_list, get_count=False):
    '''
    Given a list of inputs to the subtensor and its idx_list, reorders
-    the inputs according to the idx list to get the right values
+    the inputs according to the idx list to get the right values.
+
+    If get_count=True, instead returns the number of inputs consumed
+    during this process.
    '''

+    # The number of indices
+    n = len(inputs) - 1

    # The subtensor (or idx_list) does not depend on the inputs.
-    if len(inputs) == 1:
+    if n == 0:
        return tuple(idx_list)
    indices = list(reversed(list(inputs[1:])))
@@ -87,7 +93,10 @@ def get_idx_list(inputs, idx_list):
        else:
            return entry
    cdata = tuple(map(convert, idx_list))
-    return cdata
+    if get_count:
+        return n - len(indices)
+    else:
+        return cdata


def get_canonical_form_slice(theslice, length):
......
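The `get_count` flag above reuses the same traversal to report how many of the symbolic inputs the index list consumed: each placeholder pops one input, so the count is the initial number of inputs minus what remains. A self-contained approximation of that logic (placeholders are modelled by the string `'x'`; the real code matches Theano `Type` entries instead):

```python
def get_idx_list(inputs, idx_list, get_count=False):
    n = len(inputs) - 1                  # number of non-constant indices
    if n == 0:
        return 0 if get_count else tuple(idx_list)
    indices = list(reversed(inputs[1:]))

    def convert(entry):
        # a placeholder consumes the next symbolic input
        return indices.pop() if entry == 'x' else entry

    cdata = tuple(convert(e) for e in idx_list)
    if get_count:
        return n - len(indices)          # how many inputs were popped
    return cdata

print(get_idx_list(['t', 5, 7], ['x', 2, 'x']))                  # (5, 2, 7)
print(get_idx_list(['t', 5, 7], ['x', 2, 'x'], get_count=True))  # 2
```

Note the sketch returns 0 for the no-input case under `get_count`, an assumption on my part; the diff's `n == 0` branch returns `tuple(idx_list)` regardless of the flag.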