testgroup / pytensor · Commits

Commit dec92d76, authored Jun 19, 2012 by Eric Larsen, committed Jul 04, 2012 by Frederic.

    revision doc keepdims et op's contract

Parent: d320f322
Showing 2 changed files with 85 additions and 59 deletions:

- doc/extending/op.txt (+67, -43)
- doc/library/tensor/basic.txt (+18, -16)
doc/extending/op.txt
...

@@ -19,12 +19,14 @@ following methods:

.. function:: make_node(*inputs)

    This method is responsible for creating output Variables of a suitable
    symbolic Type to serve as the outputs of this Op's application. The
    Variables found in ``*inputs`` must be operated on using Theano's
    symbolic language to compute the symbolic output Variables. This method
    should put these outputs into an Apply instance, and return the Apply
    instance.

    This method creates an Apply node representing the application of the Op
    on the inputs provided. If the Op cannot be applied to these inputs, it
    must raise an appropriate exception.

    The inputs of the Apply instance returned by this call must be

...
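As a rough illustration of the contract described above, the following sketch uses hypothetical stand-in classes (``Variable``, ``Apply``, ``DoubleOp`` are placeholders, not Theano's real classes, which carry far more information): a ``make_node``-style method creates output Variables of a suitable Type and ties inputs and outputs together in an Apply record.

```python
# Hypothetical stand-ins for the make_node contract; Theano's actual
# Variable/Apply classes are richer than this sketch.
class Variable:
    def __init__(self, type_name):
        self.type_name = type_name

class Apply:
    def __init__(self, op, inputs, outputs):
        self.op, self.inputs, self.outputs = op, inputs, outputs

class DoubleOp:
    def make_node(self, x):
        # Create an output Variable of a Type suitable for this Op's
        # result (here simply the same Type as the input), then wrap
        # everything in an Apply instance and return it.
        out = Variable(x.type_name)
        return Apply(self, [x], [out])

x = Variable("float64_scalar")
node = DoubleOp().make_node(x)
print(len(node.inputs), len(node.outputs))  # 1 1
```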
@@ -33,25 +35,27 @@ following methods:
...
@@ -33,25 +35,27 @@ following methods:
.. function:: perform(node, inputs, output_storage)
.. function:: perform(node, inputs, output_storage)
This method computes the function associated to this Op. The
This method computes the function associated to this Op. ``node`` is an Apply node created by the Op's ``make_node``
``node`` is an Apply node created by the Op's ``make_node``
method. ``inputs`` is a list of references to data to operate on using non-symbolic statements,
method, ``inputs`` is a list of references to data to operate on,
(i.e., statements in Python, Numpy and C languages). ``output_storage`` is a list of storage cells where the
and ``output_storage`` is a list of storage cells where the
variables of the computation must be put.
variables of the computation must be put. More specifically:
More specifically:
- ``node``: This is a reference to an Apply node which was previously
- ``node``: This is a reference to an Apply node which was previously
obtained via the ``Op``'s ``make_node`` method. It is typically not
obtained via the ``Op``'s ``make_node`` method. It is typically not
used in simple Ops, but it contains symbolic information that
used in simple Ops, but it contains symbolic information that
could be required for complex Ops.
could be required for complex Ops.
- ``inputs``: This is a list of data.
- ``inputs``: This is a list of data from which the values stored in ``output_storage``
are to be computed using non-symbolic language.
- ``output_storage``: This is a list of storage cells.
- ``output_storage``: This is a list of storage cells
where the output is to be stored
.
A storage cell is a one-element list. It is forbidden to change
A storage cell is a one-element list. It is forbidden to change
the length of the list(s) contained in ``output_storage``. There is
the length of the list(s) contained in ``output_storage``. There is
one storage cell for each output of the Op.
one storage cell for each output of the Op.
The data
you
put in ``output_storage`` must match the type of the
The data put in ``output_storage`` must match the type of the
symbolic output. This is a situation where the ``node`` argument
symbolic output. This is a situation where the ``node`` argument
can come in handy.
can come in handy.
...
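The storage-cell convention can be sketched in plain Python/NumPy (``DoubleOp`` is a hypothetical Op used only for illustration, not part of Theano):

```python
import numpy as np

class DoubleOp:
    def perform(self, node, inputs, output_storage):
        # inputs holds numeric data, not symbolic Variables
        x, = inputs
        # Write the result INTO the one-element storage cell; the cell
        # itself (and the length of output_storage) must not be replaced.
        output_storage[0][0] = np.asarray(2 * x)

cell = [None]                  # one storage cell for the Op's single output
DoubleOp().perform(None, [np.array([1.0, 2.0])], [cell])
print(cell[0])                 # [2. 4.]
```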
@@ -96,45 +100,65 @@ following methods:
...
@@ -96,45 +100,65 @@ following methods:
.. function:: grad(inputs, output_gradients)
.. function:: grad(inputs, output_gradients)
Optional (but needed if you want to have it work with {tensor,sparse}.grad())
Optional (but needed to have it work with {tensor,sparse}.grad()).
If the Op you are defining is differentiable, you can define its
gradient symbolically in this method.
Both the ``inputs`` and ``output_gradients`` will be list of Theano
If the Op being defined is differentiable, its gradient may be specified
Variables. This method must return a list containing one Variable
symbolically in this method. Both ``inputs`` and ``output_gradients``
(or ``None``) for each input. Each returned Variable represents the
are lists of symbolic Theano Variables and those must be operated on using
gradient with respect to that input given the symbolic gradients
Theano's symbolic language. The grad method must return a list containing
with respect to each output.
one Variable (or ``None``) for each input. Each returned Variable represents
the gradient with respect to that input computed based on the symbolic gradients with
respect to each output.
If the output is not differentiable with respect to any inputs,
If the output is not differentiable with respect to any inputs,
then this method should be defined to return ``[None for i in
then this method should be defined to return ``[None for i in
inputs]``.
inputs]``. If this method is not defined, then Theano assumes it has been
If this method is not defined, then Theano assumes it has been
forgotten. Symbolic differentiation will fail on a graph that
forgotten. Symbolic differentiation will fail on a graph that
includes this Op.
includes this Op.
It is important to understand that this is not meant to return
It must be understood that the grad method is not meant to return the
the gradient of the Op's output but rather the gradient of some
gradient of the Op's output but rather the gradient of some other scalar
other criterion C with respect to the Op's input.
criterion C with respect to the Op's input.
If the outputs of your op are :math:`[ f_1, ... f_n]`, then
In essence, the grad method must simply implement through symbolic Variables
``output_gradients`` is
and operations the chain rule of differential calculus. The chain rule
:math:`[ grad_{f_1}(C), grad_{f_2}(C), ... , grad_{f_n}(C) ]`.
is the mathematical procedure that allows to calculate the total derivative
If the inputs of your op are :math:`[x_1, ..., x_m]`, then your Op.grad
:math:`\frac{d C}{d x}` of the final scalar symbolic Variable C with respect to a
should return :math:`[ grad_{x_1}(C), grad_{x_2}(C), ..., grad_{x_m}(C) ]`,
primitive symbolic Variable x found in the list ``inputs``,
where :math:`(grad_{y} z)_i = \frac{\partial z}{\partial y_i}`
based on the knowledge of the total derivative :math:`\frac{d C}{d f}` of
(and :math:`i` can have any number of dimensions).
C with respect to a symbolic Variable that is returned by the Op (this is provided
(Note: in the case where i is 2 dimensional, this definition of grad
in ``output_gradients``), as well as the knowledge of the total derivative :math:`\frac{d f}{d x}` of the
is different from the standard mathematical definition of the gradient
latter with respect to the primitive Variable (this has to be computed).
of a scalar with respect to a matrix, where you transpose the indices.)
In Mathematics, the total derivative of a scalar variable (C) with respect to a vector of
scalar variables (x), i.e. the gradient, is customarily represented as the
row vector of the partial derivatives, whereas the total derivative of a vector of
scalar variables (f) with respect to another (x), is customarily represented by the matrix of
the partial derivatives, i.e.the jacobian matrix. In this convenient setting,
the chain rule instructs that the gradient of the final scalar variable C with respect
to the primitive scalar variables in x through those in f is simply given by the matrix product:
:math:`\frac{d C}{d x} = \frac{d C}{d f} * \frac{d f}{d x}`.
Here, the chain rule must be implemented in a similar but slightly more complex
setting: Theano provides in the list ``output_gradients`` one gradient for each
of the Variables returned by the Op. Where f is one such particular Variable,
the corresponding gradient found in ``output_gradients`` and representing
:math:`\frac{d C}{d f}` is provided with a shape similar to f and thus not
necessarily as a row vector of scalars. Furthermore, for each Variable x of
the Op's list of input variables ``inputs``, the returned gradient representing
:math:`\frac{d C}{d x}` must have a shape similar to that of Variable x.
If the output list of the op is :math:`[f_1, ... f_n]`, then the list
``output_gradients`` is :math:`[grad_{f_1}(C), grad_{f_2}(C), ... , grad_{f_n}(C)]`.
If ``inputs`` consists of the list :math:`[x_1, ..., x_m]`, then Op.grad
should return the list :math:`[grad_{x_1}(C), grad_{x_2}(C), ..., grad_{x_m}(C)]`,
where :math:`(grad_{y}(Z))_i = \frac{\partial Z}{\partial y_i}` (and :math:`i` can stand for multiple dimensions).
In other words, :func:`grad` does not return
In other words, :func:`grad` does not return
:math:`\frac{
\partial f_i}{\partial x_j}`, but
:math:`\frac{
d f_i}{d x_j}`, but instead the appropriate dot product specified by the chain rule:
:math:`\frac{
\partial C}{\partial
x_j} =
:math:`\frac{
d C}{d
x_j} =
\frac{
\partial C}{\partial f_i} \cdot \frac{\partial f_i}{\partial
x_j}`.
\frac{
d C}{d f_i} \cdot \frac{d f_i}{d
x_j}`.
Both the partial d
erivation and that multiplication have to be done
by
Both the partial d
ifferentiation and the multiplication have to be performed
by
:func:`grad`.
:func:`grad`.
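The matrix-product form of the chain rule above can be checked numerically with NumPy; the shapes and values here are arbitrary, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 3                       # x has m entries, f has n entries
dC_df = rng.normal(size=(1, n))   # row vector: total derivative dC/df
df_dx = rng.normal(size=(n, m))   # Jacobian matrix: df/dx

# Chain rule: dC/dx = dC/df @ df/dx, a row vector with one entry per x_j
dC_dx = dC_df @ df_dx
assert dC_dx.shape == (1, m)

# Entry j is the dot product over i of (dC/df_i) * (df_i/dx_j)
j = 2
expected = sum(dC_df[0, i] * df_dx[i, j] for i in range(n))
assert np.isclose(dC_dx[0, j], expected)
```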
.. function:: infer_shape(node, shapes)

...
doc/library/tensor/basic.txt
...

@@ -650,16 +650,17 @@ Reductions

.. function:: max(x, axis=None, keepdims=False)

    :Parameter: *x* - symbolic Tensor (or compatible)
    :Parameter: *axis* - axis or axes along which to compute the maximum
    :Parameter: *keepdims* - (boolean) If this is set to True, the axes which
        are reduced are left in the result as dimensions with size one. With
        this option, the result will broadcast correctly against the
        original tensor.
    :Returns: maximum of *x* along *axis*
    :note: see maximum for elemwise max

    axis can be:

    * *None* - in which case the maximum is computed along all axes (like numpy)
    * an *int* - computed along this axis
    * a *list of ints* - computed along these axes

.. function:: argmax(x, axis=None, keepdims=False)

...
@@ -687,16 +688,17 @@ Reductions

.. function:: min(x, axis=None, keepdims=False)

    :Parameter: *x* - symbolic Tensor (or compatible)
    :Parameter: *axis* - axis or axes along which to compute the minimum
    :Parameter: *keepdims* - (boolean) If this is set to True, the axes which
        are reduced are left in the result as dimensions with size one. With
        this option, the result will broadcast correctly against the
        original tensor.
    :Returns: minimum of *x* along *axis*
    :note: see minimum for elemwise min

    axis can be:

    * *None* - in which case the minimum is computed along all axes (like numpy)
    * an *int* - computed along this axis
    * a *list of ints* - computed along these axes

.. function:: argmin(x, axis=None, keepdims=False)

...
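The ``keepdims`` behaviour documented above matches NumPy's, which can serve as a quick reference:

```python
import numpy as np

x = np.arange(6.0).reshape(2, 3)

# Without keepdims, the reduced axis disappears from the shape.
assert x.max(axis=1).shape == (2,)

# With keepdims, the reduced axis is kept with size one, so the result
# broadcasts correctly against the original tensor.
m = x.max(axis=1, keepdims=True)
assert m.shape == (2, 1)
assert ((x - m) <= 0).all()      # broadcasts without an explicit reshape

# axis=None reduces over the flattened tensor (like numpy).
assert x.max(axis=None, keepdims=True).shape == (1, 1)
```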