Commit 48c63a85 authored by Frédéric Bastien

Merge pull request #2069 from abergeron/doc

Doc
@@ -6,28 +6,26 @@ Making arithmetic Ops on double

Now that we have a ``double`` type, we have yet to use it to perform
computations. We'll start by defining multiplication.

.. _op_contract:

Op's contract
=============

An Op is any object which inherits from :class:`gof.Op`. It has to
define the following methods.
.. function:: make_node(*inputs)

This method is responsible for creating output Variables of a
suitable symbolic Type to serve as the outputs of this Op's
application. The Variables found in ``*inputs`` must be operated on
using Theano's symbolic language to compute the symbolic output
Variables. This method should put these outputs into an Apply
instance, and return the Apply instance.

This method creates an Apply node representing the application of
the Op on the inputs provided. If the Op cannot be applied to these
inputs, it must raise an appropriate exception.
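For instance, a minimal sketch of ``make_node`` for an Op with one
tensor input whose output has the same type (mirroring the
``DoubleOp`` example later in this document):

.. code-block:: python

    import theano

    def make_node(self, x):
        # raise if x cannot be converted to a tensor Variable
        x = theano.tensor.as_tensor_variable(x)
        return theano.Apply(self, [x], [x.type()])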
The inputs of the Apply instance returned by this call must be
ordered correctly: a subsequent ``self.make_node(*apply.inputs)``

@@ -35,10 +33,12 @@ following methods:
.. function:: perform(node, inputs, output_storage)

This method computes the function associated to this Op. ``node`` is
an Apply node created by the Op's ``make_node`` method. ``inputs``
is a list of references to data to operate on using non-symbolic
statements (i.e., statements in Python or Numpy). ``output_storage``
is a list of storage cells where the variables of the computation
must be put.

More specifically:
@@ -52,20 +52,20 @@ following methods:

- ``output_storage``: This is a list of storage cells where the output
  is to be stored. A storage cell is a one-element list. It is
  forbidden to change the length of the list(s) contained in
  ``output_storage``. There is one storage cell for each output of
  the Op.

  The data put in ``output_storage`` must match the type of the
  symbolic output. This is a situation where the ``node`` argument
  can come in handy.

  A function Mode may allow ``output_storage`` elements to persist
  between evaluations, or it may reset ``output_storage`` cells to
  hold a value of ``None``. It can also pre-allocate some memory
  for the Op to use. This feature can allow ``perform`` to reuse
  memory between calls, for example. If there is something
  preallocated in the ``output_storage``, it will be of the right
  dtype, but can have the wrong shape and have any stride pattern.
This method must be determined by the inputs. That is to say, if
it is evaluated once on inputs A and returned B, then if ever

@@ -77,6 +77,10 @@ following methods:
operations <views_and_inplace>` before writing a ``perform``
implementation that does either of these things.

Instead of (or in addition to) ``perform()``, you can also provide a
:ref:`C implementation <cop>`. For more details, refer to the
documentation for :ref:`op`.
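Putting the pieces above together, a minimal sketch of ``perform``
for a hypothetical Op that doubles its single input:

.. code-block:: python

    def perform(self, node, inputs, output_storage):
        x, = inputs             # the input's current value
        z, = output_storage     # a one-element list for the output
        z[0] = x * 2            # store the result in the storage cell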
.. function:: __eq__(other)

``other`` is also an Op.

@@ -89,6 +93,10 @@ following methods:
inputs (same view_map). For more details, see
:ref:`views_and_inplace`.

.. note::
    If you set `__props__`, this will be automatically generated.
.. function:: __hash__()

If two Op instances compare equal, then they **must** return the

@@ -98,179 +106,286 @@ following methods:

lifetime of self. Op instances should be immutable in this
sense.
.. note::
    If you set `__props__`, this will be automatically generated.

.. _op_optional:

Optional methods or attributes
==============================

.. attribute:: __props__

*Default:* Undefined

Must be a tuple. Lists the names of the attributes which influence
the computation performed. This will also enable the automatic
generation of appropriate __eq__, __hash__ and __str__ methods.
Set it to `()` if you have no attributes that are relevant to
the computation.

.. versionadded:: 0.7
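For instance, a sketch of an Op parametrized by an ``axis``
attribute (the class and attribute names here are illustrative, not
part of Theano's API):

.. code-block:: python

    import theano

    class MyAxisOp(theano.Op):
        # __eq__, __hash__ and __str__ are generated from this tuple:
        # two instances compare equal iff their ``axis`` values match.
        __props__ = ('axis',)

        def __init__(self, axis):
            self.axis = axis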
.. attribute:: default_output

*Default:* None

If this member variable is an integer, then the default
implementation of ``__call__`` will return
``node.outputs[self.default_output]``, where ``node`` was returned
by ``make_node``. Otherwise, the entire list of outputs will be
returned, unless it is of length 1, where the single element will be
returned by itself.
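As a hedged sketch, an Op whose ``make_node`` creates two outputs
could make ``__call__`` hand back only the first one (the class name
is hypothetical):

.. code-block:: python

    class TwoOutputsOp(theano.Op):
        default_output = 0  # __call__ returns node.outputs[0]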
.. function:: make_thunk(node, storage_map, compute_map, no_recycling)
This function must return a thunk, that is, a zero-argument
function that encapsulates the computation to be performed by this
op on the arguments of the node.
:param node: Apply instance
The node for which a thunk is requested.
:param storage_map: dict of lists
This maps variables to one-element lists holding the variable's
current value. The one-element list acts as pointer to the value
and allows sharing that "pointer" with other nodes and instances.
:param compute_map: dict of lists
This maps variables to one-element lists holding booleans. If
the value is 0 then the variable has not been computed and the
value should not be considered valid. If the value is 1 the
variable has been computed and the value is valid. If the value
is 2 the variable has been garbage-collected and is no longer
valid, but shouldn't be required anymore for this call.
:param no_recycling: WRITEME
WRITEME
The returned function must ensure that it sets the computed
variables as computed in the `compute_map`.
Defining this function removes the requirement for :meth:`perform`
or C code, as you will define the thunk for the computation
yourself.
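A minimal sketch, assuming a one-input, one-output node whose
(illustrative) computation doubles its input:

.. code-block:: python

    def make_thunk(self, node, storage_map, compute_map, no_recycling):
        # one-element lists acting as pointers to the values
        x_cell = storage_map[node.inputs[0]]
        out_cell = storage_map[node.outputs[0]]
        out_computed = compute_map[node.outputs[0]]

        def thunk():
            out_cell[0] = x_cell[0] * 2  # the actual computation
            out_computed[0] = 1          # mark the output as computed

        return thunk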
.. function:: __call__(*inputs, **kwargs)
By default this is a convenience function which calls
:meth:`make_node` with the supplied arguments and returns the
result indexed by `default_output`. This can be overridden by
subclasses to do anything else, but must return either a theano
Variable or a list of Variables.
If you feel the need to override `__call__` to change the graph
based on the arguments, you should instead create a function that
will use your Op and build the graphs that you want and call that
instead of the Op instance directly.
.. function:: infer_shape(node, shapes)
This function is needed for shape optimization. ``shapes`` is a
list with one tuple for each input of the Apply node (which corresponds
to the inputs of the op). Each tuple contains as many elements as the
number of dimensions of the corresponding input. The value of each element
is the shape (number of items) along the corresponding dimension of that
specific input.
While this might sound complicated, it is nothing more than the shape
of each input as symbolic variables (one per dimension).
The function should return a list with one tuple for each output.
Each tuple should contain the corresponding output's computed shape.
Implementing this method will allow Theano to compute the output's
shape without computing the output itself, potentially sparing you
a costly recomputation.
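For example, a sketch for an elementwise Op whose output has the
same shape as its (single) input:

.. code-block:: python

    def infer_shape(self, node, shapes):
        # shapes[0] is a tuple of symbolic dimensions, one per axis
        return [shapes[0]]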
.. function:: flops(inputs, outputs)
This method is only used to print more information in the memory
profiler: it reports the mega flops and giga flops per second for
each apply node. It takes as inputs two lists: one for the inputs
and one for the outputs. They contain tuples that are the shapes of
the corresponding inputs/outputs.
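As a sketch, an elementwise multiplication of two inputs performs
one floating point operation per output element:

.. code-block:: python

    import numpy

    def flops(self, inputs, outputs):
        # outputs[0] is the shape tuple of the single output
        return numpy.prod(outputs[0])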
.. function:: __str__()
This allows you to specify a more informative string representation of your
Op. If an Op has parameters, it is highly recommended to have the
``__str__`` method include the name of the op and the Op's parameters'
values.
.. note::
    If you set `__props__`, this will be automatically generated.
    You can still override it for custom output.
.. function:: do_constant_folding(node)
*Default:* Return True
By default, when optimizations are enabled, we remove Apply nodes
whose inputs are all constants during function compilation,
replacing them with a Theano constant variable. This way, the Apply
node is not executed at each function call. If you want to force the
execution of an op during the function call, make
do_constant_folding return False.
As done in the Alloc op, you can return False only in some cases by
analyzing the graph from the node parameter.
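For instance, a minimal sketch that unconditionally disables
constant folding for an Op:

.. code-block:: python

    def do_constant_folding(self, node):
        # always execute this Op at function call time,
        # even when all of its inputs are constants
        return False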
Gradient
========

These are the functions required to work with gradient.grad().

.. function:: grad(inputs, output_gradients)
If the Op being defined is differentiable, its gradient may be
specified symbolically in this method. Both ``inputs`` and
``output_gradients`` are lists of symbolic Theano Variables and
those must be operated on using Theano's symbolic language. The grad
method must return a list containing one Variable for each
input. Each returned Variable represents the gradient with respect
to that input computed based on the symbolic gradients with respect
to each output.

If the output is not differentiable with respect to an input then
this method should be defined to return a variable of type NullType
for that input. Likewise, if you have not implemented the grad
computation for some input, you may return a variable of type
NullType for that input. theano.gradient contains convenience
methods that can construct the variable for you:
:func:`theano.gradient.grad_undefined` and
:func:`theano.gradient.grad_not_implemented`, respectively.

If an element of output_gradient is of type
theano.gradient.DisconnectedType, it means that the cost is not a
function of this output. If any of the op's inputs participate in
the computation of only disconnected outputs, then Op.grad should
return DisconnectedType variables for those inputs.

If the grad method is not defined, then Theano assumes it has been
forgotten. Symbolic differentiation will fail on a graph that
includes this Op.

It must be understood that the Op's grad method is not meant to
return the gradient of the Op's output. theano.tensor.grad computes
gradients; Op.grad is a helper function that computes terms that
appear in gradients.
If an Op has a single vector-valued output y and a single
vector-valued input x, then the grad method will be passed x and a
second vector z. Define J to be the Jacobian of y with respect to
x. The Op's grad method should return dot(J.T,z). When
theano.tensor.grad calls the grad method, it will set z to be the
gradient of the cost C with respect to y. If this op is the only op
that acts on x, then dot(J.T,z) is the gradient of C with respect to
x. If there are other ops that act on x, theano.tensor.grad will
have to add up the terms of x's gradient contributed by the other
op's grad method.

In practice, an op's input and output are rarely implemented as
single vectors. Even if an op's output consists of a list
containing a scalar, a sparse matrix, and a 4D tensor, you can think
of these objects as being formed by rearranging a vector. Likewise
for the input. In this view, the values computed by the grad method
still represent a Jacobian-vector product.

In practice, it is probably not a good idea to explicitly construct
the Jacobian, which might be very large and very sparse. However,
the returned value should be equal to the Jacobian-vector product.
So long as you implement this product correctly, you need not
understand what theano.tensor.grad is doing, but for the curious the
mathematical justification is as follows:

In essence, the grad method must simply implement through symbolic
Variables and operations the chain rule of differential
calculus. The chain rule is the mathematical procedure that allows
one to calculate the total derivative :math:`\frac{d C}{d x}` of the
final scalar symbolic Variable C with respect to a primitive
symbolic Variable x found in the list ``inputs``. The grad method
does this using ``output_gradients`` which provides the total
derivative :math:`\frac{d C}{d f}` of C with respect to a symbolic
Variable that is returned by the Op (this is provided in
``output_gradients``), as well as the knowledge of the total
derivative :math:`\frac{d f}{d x}` of the latter with respect to the
primitive Variable (this has to be computed).

In mathematics, the total derivative of a scalar variable (C) with
respect to a vector of scalar variables (x), i.e. the gradient, is
customarily represented as the row vector of the partial
derivatives, whereas the total derivative of a vector of scalar
variables (f) with respect to another (x), is customarily
represented by the matrix of the partial derivatives, i.e. the
Jacobian matrix. In this convenient setting, the chain rule
instructs that the gradient of the final scalar variable C with
respect to the primitive scalar variables in x through those in f is
simply given by the matrix product:
:math:`\frac{d C}{d x} = \frac{d C}{d f} * \frac{d f}{d x}`.

Here, the chain rule must be implemented in a similar but slightly
more complex setting: Theano provides in the list
``output_gradients`` one gradient for each of the Variables returned
by the Op. Where f is one such particular Variable, the
corresponding gradient found in ``output_gradients`` and
representing :math:`\frac{d C}{d f}` is provided with a shape
similar to f and thus not necessarily as a row vector of scalars.
Furthermore, for each Variable x of the Op's list of input variables
``inputs``, the returned gradient representing
:math:`\frac{d C}{d x}` must have a shape similar to that of
Variable x.

If the output list of the op is :math:`[f_1, ... f_n]`, then the
list ``output_gradients`` is :math:`[grad_{f_1}(C), grad_{f_2}(C),
..., grad_{f_n}(C)]`. If ``inputs`` consists of the list
:math:`[x_1, ..., x_m]`, then Op.grad should return the list
:math:`[grad_{x_1}(C), grad_{x_2}(C), ..., grad_{x_m}(C)]`, where
:math:`(grad_{y}(Z))_i = \frac{\partial Z}{\partial y_i}` (and
:math:`i` can stand for multiple dimensions).

In other words, :func:`grad` does not return
:math:`\frac{d f_i}{d x_j}`, but instead the appropriate dot product
specified by the chain rule:
:math:`\frac{d C}{d x_j} = \frac{d C}{d f_i} \cdot \frac{d f_i}{d x_j}`.
Both the partial differentiation and the multiplication have to be
performed by :func:`grad`.

Theano currently imposes the following constraints on the values
returned by the grad method:
1) They must be Variable instances.

2) When they are types that have dtypes, they must never have an integer dtype.

The output gradients passed *to* Op.grad will also obey these constraints.
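As a minimal sketch, consider an Op computing y = 2 * x: its
Jacobian is 2 * I, so grad returns dot(J.T, z) = 2 * z for the
single input:

.. code-block:: python

    def grad(self, inputs, output_gradients):
        x, = inputs
        z, = output_gradients    # z represents dC/dy
        return [2 * z]           # dot(J.T, z) with J = 2 * I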
Integers are a tricky subject. Integers are the main reason for
having DisconnectedType, NullType or zero gradient. When you have an
integer as an argument to your grad method, recall the definition of
a derivative to help you decide what value to return:

:math:`\frac{d f}{d x} = \lim_{\epsilon \rightarrow 0} (f(x+\epsilon)-f(x))/\epsilon`.

Suppose your function f has an integer-valued output. For most
functions you're likely to implement in theano, this means your
gradient should be zero, because f(x+epsilon) = f(x) for almost all
x. (The only other option is that the gradient could be undefined,
if your function is discontinuous everywhere, like the rational
indicator function.)

Suppose your function f has an integer-valued input. This is a
little trickier, because you need to think about what you mean
mathematically when you make a variable integer-valued in
theano. Most of the time in machine learning we mean "f is a
function of a real-valued x, but we are only going to pass in
integer values of x". In this case, f(x+epsilon) exists, so the
gradient through f should be the same whether x is an integer or a
floating point variable. Sometimes what we mean is "f is a function
of an integer-valued x, and f is only defined where x is an
integer." Since f(x+epsilon) doesn't exist, the gradient is
undefined. Finally, many times in theano, integer-valued inputs
don't actually affect the elements of the output, only its shape.
If your function f has both an integer-valued input and an
integer-valued output, then both rules have to be combined:

@@ -290,63 +405,75 @@ following methods:
Its gradient is zero almost everywhere, so Op.grad should return
zeros in the shape of x and y.

2) f(x,y) = dot product between x and y. x is floating point and y is an integer.

   In this case the output is floating point. It doesn't matter
   that y is an integer. We consider f to still be defined at
   f(x,y+epsilon). The gradient is exactly the same as if y were
   floating point.

3) f(x,y) = argmax of x along axis y.

   The gradient with respect to y is undefined, because f(x,y) is
   not defined for floating point y. How could you take an argmax
   along a fractional axis? The gradient with respect to x is
   0, because f(x+epsilon, y) = f(x) almost everywhere.

4) f(x,y) = a vector with y elements, each of which taking on the value x.

   The grad method should return DisconnectedType()() for y,
   because the elements of f don't depend on y. Only the shape of
   f depends on y. You probably also want to implement a
   connection_pattern method to encode this.

5) f(x) = int(x) converts float x into an int. g(y) = float(y) converts an integer y into a float.

   If the final cost C = 0.5 * g(y) = 0.5 g(f(x)), then the
   gradient with respect to y will be 0.5, even if y is an
   integer. However, the gradient with respect to x will be 0,
   because the output of f is integer-valued.
.. function:: connection_pattern(node)

Sometimes needed for proper operation of gradient.grad().
Optional.

Returns a list of list of bools.

Op.connection_pattern[input_idx][output_idx] is true if the
elements of inputs[input_idx] have an effect on the elements of
outputs[output_idx].

The ``node`` parameter is needed to determine the number of
inputs. Some ops such as Subtensor take a variable number of
inputs.

If no connection_pattern is specified, gradient.grad will
assume that all inputs have some elements connected to some
elements of all outputs.

This method conveys two pieces of information that are otherwise
not part of the theano graph:

1) Which of the op's inputs are truly ancestors of each of the
   op's outputs. Suppose an op has two inputs, x and y, and
   outputs f(x) and g(y). y is not really an ancestor of f, but
   it appears to be so in the theano graph.

2) Whether the actual elements of each input/output are relevant
   to a computation. For example, the shape op does not read its
   input's elements, only its shape metadata. d shape(x) / dx should
   thus raise a disconnected input exception (if these exceptions
   are enabled). As another example, the elements of the Alloc op's
   outputs are not affected by the shape arguments to the Alloc op.

Failing to implement this function for an op that needs it can
result in two types of incorrect behavior:

1) gradient.grad erroneously raising a TypeError reporting that
   a gradient is undefined.

2) gradient.grad failing to raise a ValueError reporting that
   an input is disconnected.

Even if connection_pattern is not implemented correctly, if
gradient.grad returns an expression, that expression will be
numerically correct.
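A sketch for an Alloc-like Op whose first input provides the
output's values while the remaining inputs only specify its shape:

.. code-block:: python

    def connection_pattern(self, node):
        # one row per input; the single column is the only output
        return [[True]] + [[False] for _ in node.inputs[1:]]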
.. function:: R_op(inputs, eval_points)

Optional, to work with gradient.R_op().

This function implements the application of the R-operator on the
function represented by your op. Let us assume that function is :math:`f`,

@@ -373,54 +500,6 @@ following methods:

the outputs) back to their corresponding shapes and return them as the
output of the :func:`R_op` method.
At a bare minimum, a new Op must define ``make_node`` and ``perform``, which
have no defaults.
Defining an Op: ``mul``
=======================

@@ -442,12 +521,9 @@ First, we'll instantiate a ``mul`` Op:

This function must take as many arguments as the operation we are
defining is supposed to take as inputs---in this example that would be
two. This function ensures that both inputs have the ``double`` type.
Since multiplying two doubles yields a double, this function makes an
Apply node with an output Variable of type ``double``.

.. If you modify this code, also change :
.. theano/tests/test_tutorial.py:T_extending.test_extending_1

...
@@ -67,16 +67,17 @@ installation and configuration, see :ref:`installing Theano <install>`.

Status
======

.. raw:: html

    <a href="http://travis-ci.org/Theano/Theano/builds"><img src="https://secure.travis-ci.org/Theano/Theano.png?branch=master" /></a>&nbsp;

.. raw:: html

    <a href="https://crate.io/packages/Theano/"><img src="https://pypip.in/v/Theano/badge.png" alt="Latest PyPI version" /></a>&nbsp;

.. raw:: html

    <a href="https://crate.io/packages/Theano/"><img src="https://pypip.in/d/Theano/badge.png" alt="Number of PyPI downloads" /></a>&nbsp;

.. _available on PyPI: http://pypi.python.org/pypi/Theano
.. _Related Projects: https://github.com/Theano/Theano/wiki/Related-projects

...
.. ../../../../theano/sandbox/linalg/ops.py
.. ../../../../theano/sandbox/linalg

.. _libdoc_sandbox_linalg:

===================================================================
:mod:`sandbox.linalg` -- Linear Algebra Ops
===================================================================

...
@@ -32,18 +32,20 @@ TODO: Give examples on how to use these things! They are pretty complicated.

Most of the more efficient GPU implementations listed below can be used
as an automatic replacement for nnet.conv2d by enabling specific graph
optimizations.

- :func:`conv2d_fft <theano.sandbox.cuda.fftconv.conv2d_fft>`: This
  is a GPU-only version of nnet.conv2d that uses an FFT transform
  to perform the work. conv2d_fft should not be used directly as
  it does not provide a gradient. Instead, use nnet.conv2d and
  allow Theano's graph optimizer to replace it by the FFT version
  by setting
  ``THEANO_FLAGS=optimizer_including=conv_fft_valid:conv_fft_full``
  in your environment. This is not enabled by default because it
  has some restrictions on input and uses a lot more memory. Also
  note that it requires CUDA >= 5.0, scikits.cuda >= 0.5.0 and
  PyCUDA to run. To deactivate the FFT optimization on a specific
  nnet.conv2d while the optimization flags are active, you can set
  its ``version`` parameter to ``'no_fft'``. To enable it for just
  one Theano function:

.. code-block:: python

...
.. ../../../../theano/sandbox/slinalg.py

.. _libdoc_slinalg:

===================================================================
:mod:`tensor.slinalg` -- Linear Algebra Ops Using Scipy
===================================================================

...
@@ -42,25 +42,22 @@ Inputs and Outputs are lists of Theano variables.

how to make a quality contribution.

Op Structure
============

This is an overview of the methods you typically have to implement to
make a new op. It does not provide extensive coverage of all the
possibilities you may encounter or need. For that, refer to
:ref:`op_contract`.
.. code-block:: python

    import theano

    class MyOp(theano.Op):
        __props__ = ()

        def make_node(self, *inputs):
            pass

        # Python implementation:

@@ -72,11 +69,13 @@ Op Contract

            # ...
            pass

        # Other implementations (pycuda, ...):
        def make_thunk(self, node, storage_map, _, _2):
            pass

        # optional:
        check_input = True

        def __init__(self, ...):
            pass

@@ -89,43 +88,47 @@ Op Contract

        def infer_shape(node, (i0_shapes, ...)):
            pass

        def flops(self, inputs, outputs):
            pass
.. ../extending/op.txt

There are two mandatory methods that one needs to implement. The
first one is :func:`make_node`. The second one would describe the
computations that are required to be done at run time. Currently there
are 2 different possibilities: implement the :func:`perform` and/or
:func:`c_code <Op.c_code>` methods (and other related :ref:`c methods
<cop>`), or the :func:`make_thunk` method. ``perform`` allows you to
easily wrap an existing Python function into Theano. ``c_code`` and
the related methods allow the op to generate C code that will be
compiled and linked by Theano. On the other hand, ``make_thunk`` will
be called only once during compilation and should generate a
``thunk``: a standalone function that when called will do the wanted
computations. This is useful if you want to generate code and compile
it yourself. For example, this allows you to use PyCUDA to compile GPU
code.
The :attr:`__props__` attribute serves to make the Op generate an
appropriate :func:`__eq__` and :func:`__hash__` for your Op. It must
be a tuple that lists the properties that influence how the
computation is performed (usually these are those that you set in
:func:`__init__`). If you don't have any properties, then you should
set this attribute to the empty tuple `()`. This requires a
development version after September 1st, 2014, or version 0.7.

:func:`__eq__` and :func:`__hash__` will be used by the optimization
phase to merge nodes that are doing an equivalent computation (same
inputs, same operation). It is especially important that two Ops that
compare equal (have the same values for all the properties listed in
__props__ and the same type) compute the same thing when presented
with the same inputs.

This attribute will also generate a suitable :func:`__str__` method
for your Op. You may override this default with a custom one if you
want another format for the output.
The :func:`infer_shape` method lets Theano infer the shape of a
variable somewhere in the middle of the computational graph without
actually computing the outputs (when possible). This can be helpful
if one only needs the shape of the output instead of the actual
outputs.
The :func:`flops` method lets the memory profiler print the number of
mega flops and giga flops per second. It takes as inputs two lists:
one for the inputs and one for the outputs. They contain tuples that
are the shapes of the corresponding inputs/outputs.
The :func:`grad` method is required if you want to differentiate some
cost whose expression includes your op.

@@ -135,8 +138,9 @@ string representation of your op.

The :func:`R_op` method is needed if you want ``theano.tensor.Rop`` to
work with your op.
The optional boolean :attr:`check_input` attribute is used to specify
if you want the types used in your op to check their inputs in their
c_code. It can be used to speed up compilation, reduce overhead
(particularly for scalars) and reduce the number of generated C files.
Op Example
----------

@@ -147,16 +151,11 @@ Op Example
import theano

class DoubleOp(theano.Op):
    __props__ = ()

    def make_node(self, x):
        # check that the theano version has support for __props__
        assert hasattr(self, '_props')
        x = theano.tensor.as_tensor_variable(x)
        return theano.Apply(self, [x], [x.type()])
@@ -327,24 +326,27 @@ For instance, to verify the Rop method of the DoubleOp, you can use this:
Testing GPU Ops
---------------

Ops to be executed on the GPU should inherit from the
``theano.sandbox.cuda.GpuOp`` and not ``theano.Op``. This allows
Theano to distinguish them. Currently, we use this to test if the
NVIDIA driver works correctly with our sum reduction code on the GPU.

Running Your Tests
==================

To perform your tests, you may select either one of the three
following methods:

theano-nose
-----------

The method of choice to conduct tests is to run the file
``theano-nose``. In a regular Theano installation, the latter will be
on the operating system's path and directly accessible from any
folder. Otherwise, it can be accessed in the ``Theano/bin``
folder. The following command lines may be used for the corresponding
purposes:
* ``theano-nose --theano``: Run every test found in Theano's path.

@@ -352,23 +354,25 @@ lines may be used for the corresponding purposes:

* ``theano-nose test_file.py``: Run every test found in the file *test_file.py*.

The following are particularly useful for development purposes since
they call for particular classes or even for particular tests:

* ``theano-nose test_file.py:test_DoubleRop``: Run every test found inside the class *test_DoubleRop*.

* ``theano-nose test_file.py:test_DoubleRop.test_double_op``: Run only the test *test_double_op*
  in the class *test_DoubleRop*.

Help with the use and functionalities of ``theano-nose`` may be
obtained by running it with the command line parameter ``--help
(-h)``.
nosetests
---------

The command ``nosetests`` can also be used. Although it lacks the
useful functionalities that ``theano-nose`` provides, ``nosetests``
can be called similarly to ``theano-nose`` from any folder in Python's
path like so:

``nosetests [suffix similar to the above]``.
@@ -378,9 +382,10 @@ More documentation on ``nosetests`` is available here:

In-file
-------

One may also add a block of code similar to the following at the end
of the file containing a specific test of interest and run the
file. In this example, the test *test_DoubleRop* in the class
*test_double_op* would be performed.

.. code-block:: python
@@ -407,7 +412,8 @@ Modify and execute to compute: x * y.

Modify and execute the example to return two outputs: x + y and x - y.

You can omit the Rop functions. Try to implement the testing apparatus
described above.

(Notice that Theano's current *elemwise fusion* optimization is
only applicable to computations involving a single output. Hence, to gain
@@ -453,6 +459,7 @@ signature:

It converts the python function to a callable object that takes as
inputs Theano variables that were declared.

as_op Example
-------------

...
@@ -7,8 +7,8 @@ from theano import tensor

from theano.compat.six import StringIO
from theano.sandbox.cuda.type import CudaNdarrayType
from theano.sandbox.cuda import GpuOp
from theano.sandbox.cuda.basic_ops import (as_cuda_ndarray_variable,
                                           gpu_contiguous)

class GpuDot22(GpuOp):

...
from theano import Op, Apply
from theano.compat.six import StringIO
from theano.sandbox.cuda import GpuOp
from theano.sandbox.cuda.basic_ops import as_cuda_ndarray_variable
from theano.sandbox.cuda.kernel_codegen import (nvcc_kernel,
                                                inline_softmax,

...
@@ -1143,11 +1143,12 @@ class GetItem2Lists(gof.op.Op):

get_item_2lists = GetItem2Lists()
"""Select elements of sparse matrix, returning them in a vector.

:param x: Sparse matrix.

:param index: List of two lists, first list indicating the row of
    each element and second list indicating its column.

:return: The corresponding elements in `x`.

"""
@@ -1737,13 +1738,14 @@ class Diag(gof.op.Op):

diag = Diag()
"""Extract the diagonal of a square sparse matrix as a dense vector.

:param x: A square sparse matrix in csc format.

:return: A dense vector representing the diagonal elements.

.. note::

    The grad implemented is regular, i.e. not structured, since the
    output is a dense vector.

"""

...
...@@ -863,18 +863,21 @@ class FillDiagonalOffset(gof.Op): ...@@ -863,18 +863,21 @@ class FillDiagonalOffset(gof.Op):
return [wr_a, wr_val,wr_offset] return [wr_a, wr_val,wr_offset]
fill_diagonal_offset_ = FillDiagonalOffset()

def fill_diagonal_offset(a, val, offset):
    """
    Returns a copy of an array with all elements of the main diagonal
    set to a specified scalar value.

    :param a: Rectangular array of two dimensions.
    :param val: Scalar value to fill the diagonal whose type must be
        compatible with that of array 'a' (i.e. 'val' cannot be viewed
        as an upcast of 'a').
    :param offset: Scalar value: offset of the diagonal from the main
        diagonal. Can be a positive or negative integer.
    :return: An array identical to 'a' except that its offset diagonal
        is filled with scalar 'val'. The output is unwrapped.
    """
    return fill_diagonal_offset_(a, val, offset)
...@@ -496,20 +496,35 @@ def qr(a, mode="full"):
Factor the matrix a as qr, where q
is orthonormal and r is upper-triangular.

:type a:
    array_like, shape (M, N)
:param a:
    Matrix to be factored.

:type mode:
    one of 'reduced', 'complete', 'r', 'raw', 'full' and
    'economic', optional
:keyword mode:
    If K = min(M, N), then

    'reduced'
        returns q, r with dimensions (M, K), (K, N)
    'complete'
        returns q, r with dimensions (M, M), (M, N)
    'r'
        returns r only with dimensions (K, N)
    'raw'
        returns h, tau with dimensions (N, M), (K,)
    'full'
        alias of 'reduced', deprecated (default)
    'economic'
        returns h from 'raw', deprecated. The options 'reduced',
        'complete', and 'raw' are new in numpy 1.8, see the notes for more
        information. The default is 'reduced' and to maintain backward
        compatibility with earlier versions of numpy both it and the old
...@@ -518,21 +533,25 @@ def qr(a, mode="full"):
    deprecated. The modes 'full' and 'economic' may be passed using only
    the first letter for backwards compatibility, but all others
    must be spelled out.
    Default mode is 'full', which is also the default for numpy 1.6.1.

:note: Default mode was left to 'full' as 'full' and 'reduced' do
    the same thing in the new numpy version, but only 'full' works
    with previous numpy versions.

:rtype q:
    matrix of float or complex, optional
:return q:
    A matrix with orthonormal columns. When mode = 'complete' the
    result is an orthogonal/unitary matrix depending on whether or
    not a is real/complex. The determinant may be either +/- 1 in
    that case.
:rtype r:
    matrix of float or complex, optional
:return r:
    The upper-triangular matrix.
"""
x = [[2, 1], [3, 4]]
if isinstance(numpy.linalg.qr(x, mode), tuple):
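A minimal usage sketch (assuming ``qr`` is exposed as
``theano.tensor.nlinalg.qr``; 'reduced' is passed explicitly since the
default here is 'full')::

    import numpy as np
    import theano
    from theano import tensor
    from theano.tensor.nlinalg import qr

    A = tensor.dmatrix('A')
    q, r = qr(A, mode='reduced')          # two symbolic outputs
    f = theano.function([A], [q, r])
    Q, R = f(np.random.rand(4, 3))        # Q: (4, 3), R: (3, 3)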
...@@ -549,8 +568,6 @@ class SVD(Op):
def __init__(self, full_matrices=True, compute_uv=True):
""" """
inputs :
--------
full_matrices : bool, optional full_matrices : bool, optional
If True (default), u and v have the shapes (M, M) and (N, N), If True (default), u and v have the shapes (M, M) and (N, N),
respectively. respectively.
...@@ -582,21 +599,18 @@ def svd(a, full_matrices=1, compute_uv=1):
""" """
This function performs the SVD on CPU. This function performs the SVD on CPU.
Parameters : :type full_matrices: bool, optional
------------ :param full_matrices:
full_matrices : bool, optional
If True (default), u and v have the shapes (M, M) and (N, N), If True (default), u and v have the shapes (M, M) and (N, N),
respectively. respectively.
Otherwise, the shapes are (M, K) and (K, N), respectively, Otherwise, the shapes are (M, K) and (K, N), respectively,
where K = min(M, N). where K = min(M, N).
compute_uv : bool, optional :type compute_uv: bool, optional
:param compute_uv:
Whether or not to compute u and v in addition to s. Whether or not to compute u and v in addition to s.
True by default. True by default.
Returns : :returns: U, V and D matrices.
-------
U, V and D matrices.
""" """
return SVD(full_matrices, compute_uv)(a) return SVD(full_matrices, compute_uv)(a)
...
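A minimal usage sketch (assuming ``svd`` is exposed as
``theano.tensor.nlinalg.svd`` and, like numpy, returns the three factors
when ``compute_uv`` is true)::

    import numpy as np
    import theano
    from theano import tensor
    from theano.tensor.nlinalg import svd

    A = tensor.dmatrix('A')
    u, s, v = svd(A)                      # defaults: full_matrices=1, compute_uv=1
    f = theano.function([A], [u, s, v])
    U, S, V = f(np.random.rand(4, 3))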
...@@ -533,31 +533,33 @@ class Conv3D(theano.Op):
return strutil.render_string(codeSource, locals())
_conv3D = Conv3D()

def conv3D(V, W, b, d):
    """
    3D "convolution" of multiple filters on a minibatch
    (does not flip the kernel, moves kernel with a user specified stride)

    :param V: Visible unit, input.
        dimensions: (batch, row, column, time, in channel)
    :param W: Weights, filter.
        dimensions: (out channel, row, column, time, in channel)
    :param b: bias, shape == (W.shape[0],)
    :param d: strides when moving the filter over the input (dx, dy, dt)

    :note: The order of dimensions does not correspond to the one in `conv2d`.
        This is for optimization.

    :note: The GPU implementation is very slow. You should use
        :func:`conv3d2d <theano.tensor.nnet.conv3d2d.conv3d>` for a
        GPU graph instead.

    :see: Someone made a script that shows how to swap the axes
        between both 3d convolution implementations in Theano. See
        the last `attachment
        <https://groups.google.com/d/msg/theano-users/1S9_bZgHxVw/0cQR9a4riFUJ>`_.
    """
    return _conv3D(V, W, b, d)
def computeH(V, W, b, d):
    assert len(W.shape) == 5
...
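A minimal usage sketch of the new ``conv3D`` wrapper (assuming the import
path ``theano.tensor.nnet.Conv3D``; all names and shapes below are
illustrative)::

    import numpy as np
    import theano
    from theano import tensor
    from theano.tensor.nnet.Conv3D import conv3D

    # V: (batch, row, column, time, in channel)
    # W: (out channel, row, column, time, in channel)
    V = tensor.TensorType('float64', (False,) * 5)('V')
    W = tensor.TensorType('float64', (False,) * 5)('W')
    b = tensor.dvector('b')               # shape == (W.shape[0],)
    d = tensor.ivector('d')               # strides (dx, dy, dt)
    H = conv3D(V, W, b, d)
    f = theano.function([V, W, b, d], H)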