Commit dec92d76 authored by Eric Larsen, committed by Frederic

Revise docs: keepdims and Op's grad contract

Parent d320f322
@@ -19,12 +19,14 @@ following methods:
.. function:: make_node(*inputs)
    This method is responsible for creating output Variables of a
    suitable symbolic Type to serve as the outputs of this Op's
    application. The Variables found in ``*inputs`` must be operated on
    using Theano's symbolic language to compute the symbolic output
    Variables. This method should put these outputs into an Apply
    instance, and return the Apply instance.
    This method creates an Apply node representing the application of
    the Op on the inputs provided. If the Op cannot be applied to
    these inputs, it must raise an appropriate exception.
    The inputs of the Apply instance returned by this call must be
@@ -33,25 +35,27 @@ following methods:
.. function:: perform(node, inputs, output_storage)
    This method computes the function associated to this Op. ``node`` is
    an Apply node created by the Op's ``make_node`` method. ``inputs``
    is a list of references to data to operate on using non-symbolic
    statements (i.e., statements in Python, NumPy, or C).
    ``output_storage`` is a list of storage cells where the variables of
    the computation must be put.

    More specifically:
    - ``node``: This is a reference to an Apply node which was
      previously obtained via the ``Op``'s ``make_node`` method. It is
      typically not used in simple Ops, but it contains symbolic
      information that could be required for complex Ops.
    - ``inputs``: This is a list of data from which the values stored in
      ``output_storage`` are to be computed using non-symbolic language.
    - ``output_storage``: This is a list of storage cells where the
      output is to be stored. A storage cell is a one-element list. It
      is forbidden to change the length of the list(s) contained in
      ``output_storage``. There is one storage cell for each output of
      the Op.
    The data put in ``output_storage`` must match the type of the
    symbolic output. This is a situation where the ``node`` argument
    can come in handy.
@@ -96,45 +100,65 @@ following methods:
.. function:: grad(inputs, output_gradients)
    Optional (but needed to have it work with {tensor,sparse}.grad()).
    If the Op being defined is differentiable, its gradient may be
    specified symbolically in this method. Both ``inputs`` and
    ``output_gradients`` are lists of symbolic Theano Variables, and
    those must be operated on using Theano's symbolic language. The grad
    method must return a list containing one Variable (or ``None``) for
    each input. Each returned Variable represents the gradient with
    respect to that input, computed based on the symbolic gradients with
    respect to each output.
    If the output is not differentiable with respect to any of the
    inputs, then this method should be defined to return
    ``[None for i in inputs]``. If this method is not defined, then
    Theano assumes it has been forgotten. Symbolic differentiation will
    fail on a graph that includes this Op.
    It must be understood that the grad method is not meant to return
    the gradient of the Op's output but rather the gradient of some
    other scalar criterion C with respect to the Op's input.
    In essence, the grad method must simply implement, through symbolic
    Variables and operations, the chain rule of differential calculus.
    The chain rule is the mathematical procedure that allows one to
    calculate the total derivative :math:`\frac{d C}{d x}` of the final
    scalar symbolic Variable C with respect to a primitive symbolic
    Variable x found in the list ``inputs``, based on the knowledge of
    the total derivative :math:`\frac{d C}{d f}` of C with respect to a
    symbolic Variable f that is returned by the Op (this is provided in
    ``output_gradients``), as well as the knowledge of the total
    derivative :math:`\frac{d f}{d x}` of the latter with respect to the
    primitive Variable (this has to be computed).
    In mathematics, the total derivative of a scalar variable (C) with
    respect to a vector of scalar variables (x), i.e. the gradient, is
    customarily represented as the row vector of the partial
    derivatives, whereas the total derivative of a vector of scalar
    variables (f) with respect to another (x) is customarily represented
    by the matrix of the partial derivatives, i.e. the Jacobian matrix.
    In this convenient setting, the chain rule states that the gradient
    of the final scalar variable C with respect to the primitive scalar
    variables in x, through those in f, is simply given by the matrix
    product :math:`\frac{d C}{d x} = \frac{d C}{d f} \cdot \frac{d f}{d x}`.
    Here, the chain rule must be implemented in a similar but slightly
    more complex setting: Theano provides in the list
    ``output_gradients`` one gradient for each of the Variables returned
    by the Op. Where f is one such particular Variable, the
    corresponding gradient found in ``output_gradients`` and
    representing :math:`\frac{d C}{d f}` is provided with a shape
    similar to that of f, and thus not necessarily as a row vector of
    scalars. Furthermore, for each Variable x in the Op's list of input
    variables ``inputs``, the returned gradient representing
    :math:`\frac{d C}{d x}` must have a shape similar to that of
    Variable x.
    If the output list of the Op is :math:`[f_1, ..., f_n]`, then the
    list ``output_gradients`` is
    :math:`[grad_{f_1}(C), grad_{f_2}(C), ..., grad_{f_n}(C)]`. If
    ``inputs`` consists of the list :math:`[x_1, ..., x_m]`, then
    Op.grad should return the list
    :math:`[grad_{x_1}(C), grad_{x_2}(C), ..., grad_{x_m}(C)]`, where
    :math:`(grad_{y}(Z))_i = \frac{\partial Z}{\partial y_i}` (and
    :math:`i` can stand for multiple dimensions).
    In other words, :func:`grad` does not return
    :math:`\frac{d f_i}{d x_j}`, but instead the appropriate dot product
    specified by the chain rule:
    :math:`\frac{d C}{d x_j} = \frac{d C}{d f_i} \cdot \frac{d f_i}{d x_j}`.
    Both the partial differentiation and the multiplication have to be
    performed by :func:`grad`.
.. function:: infer_shape(node, shapes)
...
@@ -650,16 +650,17 @@ Reductions
.. function:: max(x, axis=None, keepdims=False)
    :Parameter: *x* - symbolic Tensor (or compatible)
    :Parameter: *axis* - axis or axes along which to compute the maximum
    :Parameter: *keepdims* - (boolean) If this is set to True, the axes
        which are reduced are left in the result as dimensions with size
        one. With this option, the result will broadcast correctly
        against the original tensor.
    :Returns: maximum of *x* along *axis*

    :note: see maximum for elemwise max

    axis can be:

    * *None* - in which case the maximum is computed along all axes
      (like numpy)
    * an *int* - computed along this axis
    * a *list of ints* - computed along these axes
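    The keepdims semantics described here match NumPy's; a quick sketch
    of why keeping the reduced axis helps broadcasting (the array values
    are arbitrary examples):

    ```python
    import numpy as np

    x = np.array([[1.0, 5.0, 2.0],
                  [4.0, 3.0, 6.0]])

    # With keepdims=True the reduced axis survives with size one:
    m = x.max(axis=1, keepdims=True)   # shape (2, 1) instead of (2,)
    print(m.ravel())                   # -> [5. 6.]

    # Because of that size-one axis, the result broadcasts cleanly
    # against the original tensor, with no manual reshape:
    normalized = x / m
    print(normalized.max(axis=1))      # -> [1. 1.]
    ```

    Without ``keepdims``, ``x / x.max(axis=1)`` would fail here, since a
    shape ``(2,)`` result does not broadcast against ``(2, 3)``.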
.. function:: argmax(x, axis=None, keepdims=False)
@@ -687,16 +688,17 @@ Reductions
.. function:: min(x, axis=None, keepdims=False)
    :Parameter: *x* - symbolic Tensor (or compatible)
    :Parameter: *axis* - axis or axes along which to compute the minimum
    :Parameter: *keepdims* - (boolean) If this is set to True, the axes
        which are reduced are left in the result as dimensions with size
        one. With this option, the result will broadcast correctly
        against the original tensor.
    :Returns: minimum of *x* along *axis*

    :note: see minimum for elemwise min

    axis can be:

    * *None* - in which case the minimum is computed along all axes
      (like numpy)
    * an *int* - computed along this axis
    * a *list of ints* - computed along these axes
.. function:: argmin(x, axis=None, keepdims=False)
...