Commit 8c449ca8 authored by Pascal Lamblin

Extends a bit the documentation on automatic differentiation.

Parent 8bbbc82c
@@ -137,17 +137,23 @@ following methods:
the gradient of the Op's output but rather the gradient of some
other criterion C with respect to the Op's input.
If the outputs of your op are :math:`[f_1, \ldots, f_n]`, then
``output_derivatives`` gives
:math:`[\mathrm{grad}_{f_1}(C), \mathrm{grad}_{f_2}(C), \ldots, \mathrm{grad}_{f_n}(C)]`.
If the inputs of your op are :math:`[x_1, \ldots, x_m]`, then your Op.grad
should return :math:`[\mathrm{grad}_{x_1}(C), \mathrm{grad}_{x_2}(C), \ldots, \mathrm{grad}_{x_m}(C)]`,
where :math:`(\mathrm{grad}_{y} z)_i = \frac{\partial z}{\partial y_i}`
(and :math:`i` can have any number of dimensions).
(Note: in the case where :math:`i` is 2-dimensional, this definition of
grad is different from the standard mathematical definition of the
gradient of a scalar with respect to a matrix, where you would transpose
the indices.)
In other words, :func:`grad` does not return
:math:`\frac{\partial f_i}{\partial x_j}`, but
:math:`\frac{\partial C}{\partial x_j} =
\frac{\partial C}{\partial f_i} \cdot \frac{\partial f_i}{\partial x_j}`.
Both the partial derivation and that multiplication have to be done by
:func:`grad`.
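As an illustrative sketch of this convention (plain NumPy, not actual Theano code; the op and the helper name ``square_grad`` are hypothetical), consider an elementwise square op, :math:`f(x) = x^2`. Its grad receives the gradient of the external cost :math:`C` wrt the op's output and returns the gradient of :math:`C` wrt the op's input, applying the chain rule itself:

```python
import numpy as np

def square_grad(inputs, output_derivatives):
    """Sketch of an Op.grad for an elementwise square, f(x) = x ** 2.

    `output_derivatives` holds grad_f(C), the gradient of the external
    cost C wrt this op's output.  We return grad_x(C), one entry per
    input, via the chain rule: dC/dx_i = dC/df_i * df_i/dx_i = g_i * 2 * x_i.
    """
    (x,) = inputs
    (g,) = output_derivatives  # gradient of C wrt the op's single output
    return [2.0 * x * g]       # one symbolic gradient per input

# If C = sum(f), then grad_f(C) is all ones, so grad_x(C) = 2 * x.
x = np.array([1.0, 2.0, 3.0])
g = np.ones_like(x)
(gx,) = square_grad([x], [g])
```

Note that the returned value is the full product :math:`\frac{\partial C}{\partial f} \cdot \frac{\partial f}{\partial x}`, not the Jacobian of the op alone.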
At a bare minimum, a new Op must define ``make_node`` and ``perform``, which have no defaults.
......
@@ -18,11 +18,15 @@ awkward to use when :func:`tensor.grad` can do the job.
.. function:: grad_sources_inputs(sources, graph_inputs, warn_type=True)
A gradient source is a pair (``v``, ``g_v``), in which ``v`` is
a `Variable`, and ``g_v`` is a `Variable` that is a gradient wrt
``v``. More specifically, ``g_v`` is the gradient of an external
scalar cost, ``cost`` (that is not explicitly used), wrt ``v``.
This function traverses the graph backward from the ``r`` sources,
calling ``op.grad(...)`` for all ops with some non-None gradient
on an output, to compute gradients of ``cost`` wrt intermediate
variables and ``graph_inputs``.
The ``op.grad(...)`` functions are called like this:
@@ -30,14 +34,20 @@ awkward to use when :func:`tensor.grad` can do the job.
op.grad(op.inputs[:], [total_gradient(v) for v in op.outputs])
This call to ``op.grad`` should return a list or tuple: one symbolic
gradient per input. These gradients represent the gradients of
the same implicit ``cost`` mentioned above, wrt ``op.inputs``. Note
that this is **not** the same as the gradient of ``op.outputs`` wrt
``op.inputs``.
If ``op`` has a single input, then ``op.grad`` should return a list
or tuple of length 1.
For each input wrt which ``op`` is not differentiable, it should
return ``None`` instead of a `Variable` instance.
If a source ``r`` receives a gradient from another source ``r2``,
then the effective gradient on ``r`` is the sum of both gradients.
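This summation can be sketched in plain Python (a toy model, not Theano's actual implementation; the ``total_gradient`` dictionary and the sample pairs are hypothetical): when several sources contribute a gradient to the same variable, the contributions are accumulated by addition before being propagated further.

```python
from collections import defaultdict

# Toy sketch of combining gradients from several sources on the same
# variable: each contribution is summed into a running total.
total_gradient = defaultdict(float)

# Hypothetical (variable-name, gradient) pairs; "r" receives two
# contributions, so its effective gradient is their sum.
sources = [("r", 1.5), ("r2", 0.5), ("r", 2.0)]
for var, g in sources:
    total_gradient[var] += g
```

In the real traversal the values are symbolic `Variable` instances rather than floats, but the accumulation rule is the same.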
:type sources: list of pairs of Variable: (v, gradient-on-v) to
initialize the total_gradient dictionary
......
@@ -1105,10 +1105,14 @@ Gradient / Differentiation
Return symbolic gradients for one or more variables with respect to some
cost.
For more information about how automatic differentiation works in Theano,
see :mod:`gradient`. For information on how to implement the gradient of
a certain Op, see :func:`grad`.
:type cost: 0-d tensor variable
:type wrt: tensor variable or list of tensor variables
:type g_cost: same as type of `cost`
:type consider_constant: list of variables
:type warn_type: bool
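As a numeric illustration of what this function computes (plain NumPy, not Theano; the ``cost`` helper here is hypothetical): for the scalar cost :math:`C(x) = \sum_i x_i^2`, the symbolic gradient wrt ``x`` is :math:`2x`, which can be checked against a central finite difference.

```python
import numpy as np

def cost(x):
    """Hypothetical scalar cost: C(x) = sum(x_i ** 2)."""
    return (x ** 2).sum()

x = np.array([0.5, -1.0, 2.0])
analytic = 2.0 * x  # what grad(cost, x) would return, evaluated at x

# Central finite difference along each coordinate as a sanity check.
eps = 1e-6
numeric = np.array([
    (cost(x + eps * np.eye(len(x))[i]) - cost(x - eps * np.eye(len(x))[i]))
    / (2 * eps)
    for i in range(len(x))
])
```

In Theano the same quantity would be built symbolically and compiled, rather than evaluated numerically as done here.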
......