testgroup / pytensor · Commit 229b876f
Authored Aug 29, 2014 by Arnaud Bergeron

Rework the Op contract documentation and add a bit about __props__.

Parent: cfc493d1

1 changed file: doc/extending/op.txt (+324, -250)
...
@@ -6,28 +6,26 @@ Making arithmetic Ops on double

Now that we have a ``double`` type, we have yet to use it to perform
computations. We'll start by defining multiplication.

.. _op_contract:

Op's contract
=============

An Op is any object which inherits from :class:`gof.Op`. It has to
define the following methods.

.. function:: make_node(*inputs)

    This method is responsible for creating output Variables of a
    suitable symbolic Type to serve as the outputs of this Op's
    application. The Variables found in ``*inputs`` must be operated on
    using Theano's symbolic language to compute the symbolic output
    Variables. This method should put these outputs into an Apply
    instance, and return the Apply instance.

    This method creates an Apply node representing the application of
    the Op on the inputs provided. If the Op cannot be applied to these
    inputs, it must raise an appropriate exception.

    The inputs of the Apply instance returned by this call must be
    ordered correctly: a subsequent ``self.make_node(*apply.inputs)``
...
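The contract above can be illustrated without Theano. The following is a minimal sketch that uses hypothetical stand-in classes for ``Variable`` and ``Apply`` (the real ones live in ``theano.gof``) to show the shape of a ``make_node`` for a multiplication op on the ``double`` type:

```python
# Hypothetical stand-ins for theano.gof.Variable and theano.gof.Apply,
# just to illustrate the make_node contract; not Theano's real classes.
class Variable:
    def __init__(self, type_name):
        self.type = type_name

class Apply:
    def __init__(self, op, inputs, outputs):
        self.op, self.inputs, self.outputs = op, inputs, outputs

class Mul:
    def make_node(self, x, y):
        # Validate the inputs, then create an output Variable of a
        # suitable Type and wrap everything in an Apply instance.
        if x.type != 'double' or y.type != 'double':
            raise TypeError('Mul only works on doubles')
        return Apply(self, [x, y], [Variable('double')])

node = Mul().make_node(Variable('double'), Variable('double'))
```

The returned ``node`` carries the correctly ordered inputs and the freshly created output Variable.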
@@ -35,9 +33,11 @@ following methods:

.. function:: perform(node, inputs, output_storage)

    This method computes the function associated to this Op. ``node`` is
    an Apply node created by the Op's ``make_node`` method. ``inputs``
    is a list of references to data to operate on using non-symbolic
    statements (i.e., statements in Python, Numpy and C
    languages). ``output_storage`` is a list of storage cells where the
    variables of the computation must be put.

    More specifically:
@@ -52,20 +52,20 @@ following methods:

    - ``output_storage``: This is a list of storage cells where the output is to be stored.
      A storage cell is a one-element list. It is forbidden to change
      the length of the list(s) contained in ``output_storage``. There is
      one storage cell for each output of the Op.

      The data put in ``output_storage`` must match the type of the
      symbolic output. This is a situation where the ``node`` argument
      can come in handy.

      A function Mode may allow ``output_storage`` elements to persist
      between evaluations, or it may reset ``output_storage`` cells to
      hold a value of ``None``. It can also pre-allocate some memory
      for the Op to use. This feature can allow ``perform`` to reuse
      memory between calls, for example. If there is something
      preallocated in the ``output_storage``, it will be of the correct
      dtype, but can have the wrong shape and have any stride pattern.

    This method must be determined by the inputs. That is to say, if
    it is evaluated once on inputs A and returned B, then if ever
...
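The storage-cell mechanics described above can be sketched without any Theano dependency. This hypothetical ``perform`` for a multiplication op fills the one-element list in place rather than replacing it:

```python
# Hypothetical perform for a multiplication op (no Theano dependency).
# Each storage cell is a one-element list that acts as a shared pointer.
def perform(node, inputs, output_storage):
    x, y = inputs
    z = output_storage[0]   # the cell for the op's single output
    z[0] = x * y            # fill the cell; never change the list's length

storage = [[None]]          # one pre-made cell per output
perform(None, (3.0, 4.0), storage)
# storage[0][0] is now 12.0
```

Because the caller keeps its own reference to the same one-element list, mutating ``z[0]`` makes the result visible outside ``perform``.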
@@ -77,6 +77,10 @@ following methods:

    operations <views_and_inplace>` before writing a ``perform``
    implementation that does either of these things.

    Instead of (or in addition to) ``perform()`` you can also provide a
    :ref:`C implementation <cop>`. For more details, refer to the
    documentation for :ref:`op`.

.. function:: __eq__(other)

    ``other`` is also an Op.
...
@@ -89,6 +93,10 @@ following methods:

    inputs (same view_map). For more details, see
    :ref:`views_and_inplace`.

    .. note::
        If you set `__props__`, this will be automatically generated.

.. function:: __hash__()

    If two Op instances compare equal, then they **must** return the
...
@@ -98,179 +106,281 @@ following methods:
    lifetime of self. Op instances should be immutable in this
    sense.

    .. note::
        If you set `__props__`, this will be automatically generated.

.. _op_optional:

Optional methods or attributes
==============================

.. attribute:: __props__

    *Default:* Undefined

    Must be a tuple. Lists the names of the attributes which influence
    the computation performed. This will also enable the automatic
    generation of appropriate __eq__, __hash__ and __str__ methods.
    Should be set to `()` if you have no attributes that are relevant to
    the computation, in order to still generate the methods.

.. attribute:: default_output

    *Default:* None

    If this member variable is an integer, then the default
    implementation of ``__call__`` will return
    ``node.outputs[self.default_output]``, where ``node`` was returned
    by ``make_node``. Otherwise, the entire list of outputs will be
    returned.
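The behaviour that ``__props__`` enables can be approximated in plain Python. This is only a sketch of the kind of methods that get generated, not Theano's actual implementation:

```python
# Sketch of the __eq__/__hash__/__str__ methods that __props__ implies:
# two instances compare equal iff they have the same class and the same
# values for every attribute listed in __props__.
class PropsMixin:
    __props__ = ()

    def _props(self):
        return tuple(getattr(self, name) for name in self.__props__)

    def __eq__(self, other):
        return type(self) == type(other) and self._props() == other._props()

    def __hash__(self):
        return hash((type(self), self._props()))

    def __str__(self):
        args = ", ".join("%s=%r" % (n, v)
                         for n, v in zip(self.__props__, self._props()))
        return "%s{%s}" % (type(self).__name__, args)

class ScalarMul(PropsMixin):
    __props__ = ('factor',)   # 'factor' influences the computation

    def __init__(self, factor):
        self.factor = factor
```

With this, ``ScalarMul(2) == ScalarMul(2)`` holds and equal ops hash identically, which is exactly what the Op contract's ``__eq__``/``__hash__`` requirements demand.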
.. function:: make_thunk(node, storage_map, compute_map, no_recycling)

    This function must return a thunk, that is, a zero-argument
    function that encapsulates the computation to be performed by this
    op on the arguments of the node.

    :param node: Apply instance
        The node for which a thunk is requested.
    :param storage_map: dict of lists
        This maps variables to one-element lists holding the variable's
        current value. The one-element list acts as a pointer to the value
        and allows sharing that "pointer" with other nodes and instances.
    :param compute_map: dict of lists
        This maps variables to one-element lists holding booleans. If
        the value is 0 then the variable has not been computed and the
        value should not be considered valid. If the value is 1 the
        variable has been computed and the value is valid. If the value
        is 2 the variable has been garbage-collected and is no longer
        valid, but shouldn't be required anymore for this call.
    :param no_recycling: WRITEME
        WRITEME

    The returned function must ensure that it sets the computed
    variables as computed in the `compute_map`.

    If you make your op class inherit from :class:`gof.Op`, then you
    can use the much easier :ref:`perform_meth` method below.
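A library-free sketch of the thunk contract, using strings as stand-ins for Variables and addition as a stand-in for the op's real computation:

```python
# Hypothetical make_thunk: the thunk reads input cells from storage_map,
# writes its output cell, and flags the output as computed in compute_map.
def make_thunk(inputs, outputs, storage_map, compute_map):
    def thunk():
        values = [storage_map[v][0] for v in inputs]
        result = sum(values)                 # stand-in for the op's computation
        storage_map[outputs[0]][0] = result  # write through the shared cell
        compute_map[outputs[0]][0] = 1       # mark the output as computed
    return thunk

storage_map = {'x': [2.0], 'y': [3.0], 'z': [None]}
compute_map = {'z': [0]}
thunk = make_thunk(['x', 'y'], ['z'], storage_map, compute_map)
thunk()
# storage_map['z'][0] is now 5.0 and compute_map['z'][0] is 1
```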
.. function:: __call__(*inputs, **kwargs)

    By default this is a convenience function which calls
    :meth:`make_node` with the supplied arguments and returns the
    result indexed by `default_output`. This can be overridden by
    subclasses to do anything else, but must return an Apply node
    representing the computation to be performed.

    In cases where the returned graph may differ based on the arguments
    or their types, it is recommended to create a helper function
    rather than overriding `__call__` on an Op.
.. function:: infer_shape(node, shapes)

    This function is needed for shape optimization. ``shapes`` is a
    list with one tuple for each input of the Apply node (which corresponds
    to the inputs of the op). Each tuple contains as many elements as the
    number of dimensions of the corresponding input. The value of each element
    is the shape (number of items) along the corresponding dimension of that
    specific input.

    While this might sound complicated, it is nothing more than the shape
    of each input as symbolic variables (one per dimension).

    The function should return a list with one tuple for each output.
    Each tuple should contain the corresponding output's computed shape.

    Implementing this method will allow Theano to compute the output's
    shape without computing the output itself, potentially sparing you
    a costly recomputation.
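For instance, a hypothetical outer-product op taking two 1-d inputs could implement it as follows (plain Python with tuples standing in for the symbolic shapes):

```python
# Hypothetical infer_shape for an outer-product-like op: the output of
# outer(x, y) has shape (len(x), len(y)).
def infer_shape(node, shapes):
    (x_len,), (y_len,) = shapes   # one tuple per input, one entry per dimension
    return [(x_len, y_len)]       # one tuple per output

# infer_shape(None, [(3,), (4,)]) == [(3, 4)]
```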
.. function:: flops(inputs, outputs)

    It is only used to have more information printed by the memory
    profiler. It makes it print the mega flops and giga flops per
    second for each apply node. It takes as inputs two lists: one for the
    inputs and one for the outputs. They contain tuples that are the
    shapes of the corresponding inputs/outputs.
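As a sketch, a hypothetical dense matrix-multiply op could count its operations like this (the n*m*(2k - 1) count is the usual one for an (n, k) by (k, m) product: k multiplies and k - 1 adds per output element):

```python
# Hypothetical flops count for a dense matrix multiply: an (n, k) by
# (k, m) product performs n*m*(2k - 1) floating point operations.
def flops(input_shapes, output_shapes):
    (n, k), (_, m) = input_shapes
    return n * m * (2 * k - 1)

# flops([(2, 3), (3, 4)], [(2, 4)]) == 40
```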
.. function:: __str__()

    This allows you to specify a more informative string representation of your
    Op. If an Op has parameters, it is highly recommended to have the
    ``__str__`` method include the name of the op and the Op's parameters'
    values.

    .. note::
        If you set `__props__`, this will be automatically generated.
        You can still override it for custom output.
.. function:: do_constant_folding(node)

    *Default:* Return True

    By default when optimizations are enabled, we remove during
    function compilation Apply nodes whose inputs are all constants.
    We replace the Apply node with a Theano constant variable.
    This way, the Apply node is not executed at each function
    call. If you want to force the execution of an op during the
    function call, make do_constant_folding return False.

    As done in the Alloc op, you can return False only in some cases by
    analyzing the graph from the node parameter.

If you want your op to work with gradient.grad() you also need to
implement the functions described below.

Gradient
========

These are the functions required to work with gradient.grad().
.. function:: grad(inputs, output_gradients)

    If the Op being defined is differentiable, its gradient may be
    specified symbolically in this method. Both ``inputs`` and
    ``output_gradients`` are lists of symbolic Theano Variables and
    those must be operated on using Theano's symbolic language. The grad
    method must return a list containing one Variable for each
    input. Each returned Variable represents the gradient with respect
    to that input computed based on the symbolic gradients with respect
    to each output.

    If the output is not differentiable with respect to an input
    then this method should be defined to return a variable of type
    NullType for that input. Likewise, if you have not implemented the
    grad computation for some input, you may return a variable of type
    NullType for that input. theano.gradient contains convenience
    methods that can construct the variable for you:
    :func:`theano.gradient.grad_undefined` and
    :func:`theano.gradient.grad_not_implemented`, respectively.
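The shape of the contract can be sketched numerically (a real implementation returns symbolic Variables, not floats). For a hypothetical elementwise multiply op z = x * y:

```python
# Hypothetical grad for z = x * y: one gradient term per input, each
# computed from the output gradient supplied by the caller.
def grad(inputs, output_gradients):
    x, y = inputs
    (gz,) = output_gradients      # d cost / d z, supplied by the caller
    return [gz * y, gz * x]       # [d cost / d x, d cost / d y]

gx, gy = grad([3.0, 4.0], [1.0])
# gx == 4.0 and gy == 3.0
```

Note that the list returned has exactly one entry per input, in the same order as ``inputs``.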
    If an element of output_gradient is of type
    theano.gradient.DisconnectedType, it means that the cost is not a
    function of this output. If any of the op's inputs participate in
    the computation of only disconnected outputs, then Op.grad should
    return DisconnectedType variables for those inputs.

    If the grad method is not defined, then Theano assumes it has been
    forgotten. Symbolic differentiation will fail on a graph that
    includes this Op.

    It must be understood that the Op's grad method is not meant to
    return the gradient of the Op's output. theano.tensor.grad computes
    gradients; Op.grad is a helper function that computes terms that
    appear in gradients.
    If an Op has a single vector-valued output y and a single
    vector-valued input x, then the grad method will be passed x and a
    second vector z. Define J to be the Jacobian of y with respect to
    x. The Op's grad method should return dot(J.T,z). When
    theano.tensor.grad calls the grad method, it will set z to be the
    gradient of the cost C with respect to y. If this op is the only op
    that acts on x, then dot(J.T,z) is the gradient of C with respect to
    x. If there are other ops that act on x, theano.tensor.grad will
    have to add up the terms of x's gradient contributed by the other
    op's grad method.

    In practice, an op's input and output are rarely implemented as
    single vectors. Even if an op's output consists of a list
    containing a scalar, a sparse matrix, and a 4D tensor, you can think
    of these objects as being formed by rearranging a vector. Likewise
    for the input. In this view, the values computed by the grad method
    still represent a Jacobian-vector product.

    In practice, it is probably not a good idea to explicitly construct
    the Jacobian, which might be very large and very sparse. However,
    the returned value should be equal to the Jacobian-vector product.
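A numerical illustration (plain Python, not Theano code) of the rule above, for the linear map y = A x whose Jacobian J is A itself:

```python
# For y = A x the Jacobian J is A, so grad must return dot(J.T, z).
# The product is computed directly, without ever materializing J.T.
A = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]   # 3x2 Jacobian: x has 2 elements, y has 3

def grad(x, z):
    n_in = len(A[0])
    # (J.T z)_j = sum_i A[i][j] * z[i]
    return [sum(A[i][j] * z[i] for i in range(len(A)))
            for j in range(n_in)]

z = [1.0, 0.0, -1.0]       # d cost / d y, supplied by theano.tensor.grad
g = grad([0.0, 0.0], z)
# g == [-4.0, -4.0], i.e. dot(J.T, z)
```

The returned vector has the same length as x, matching the requirement below that each returned gradient have a shape similar to the corresponding input.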
    So long as you implement this product correctly, you need not
    understand what theano.tensor.grad is doing, but for the curious the
    mathematical justification is as follows:

    In essence, the grad method must simply implement through symbolic
    Variables and operations the chain rule of differential
    calculus. The chain rule is the mathematical procedure that allows
    one to calculate the total derivative :math:`\frac{d C}{d x}` of the
    final scalar symbolic Variable C with respect to a primitive
    symbolic Variable x found in the list ``inputs``. The grad method
    does this using ``output_gradients`` which provides the total
    derivative :math:`\frac{d C}{d f}` of C with respect to a symbolic
    Variable that is returned by the Op (this is provided in
    ``output_gradients``), as well as the knowledge of the total
    derivative :math:`\frac{d f}{d x}` of the latter with respect to the
    primitive Variable (this has to be computed).

    In mathematics, the total derivative of a scalar variable (C) with
    respect to a vector of scalar variables (x), i.e. the gradient, is
    customarily represented as the row vector of the partial
    derivatives, whereas the total derivative of a vector of scalar
    variables (f) with respect to another (x) is customarily
    represented by the matrix of the partial derivatives, i.e. the
    Jacobian matrix. In this convenient setting, the chain rule
    instructs that the gradient of the final scalar variable C with
    respect to the primitive scalar variables in x through those in f is
    simply given by the matrix product:
    :math:`\frac{d C}{d x} = \frac{d C}{d f} * \frac{d f}{d x}`.

    Here, the chain rule must be implemented in a similar but slightly
    more complex setting: Theano provides in the list
    ``output_gradients`` one gradient for each of the Variables returned
    by the Op. Where f is one such particular Variable, the
    corresponding gradient found in ``output_gradients`` and
    representing :math:`\frac{d C}{d f}` is provided with a shape
    similar to f and thus not necessarily as a row vector of scalars.
    Furthermore, for each Variable x of the Op's list of input variables
    ``inputs``, the returned gradient representing
    :math:`\frac{d C}{d x}` must have a shape similar to that of
    Variable x.

    If the output list of the op is :math:`[f_1, ..., f_n]`, then the
    list ``output_gradients`` is :math:`[grad_{f_1}(C), grad_{f_2}(C),
    ..., grad_{f_n}(C)]`. If ``inputs`` consists of the list
    :math:`[x_1, ..., x_m]`, then Op.grad should return the list
    :math:`[grad_{x_1}(C), grad_{x_2}(C), ..., grad_{x_m}(C)]`, where
    :math:`(grad_{y}(Z))_i = \frac{\partial Z}{\partial y_i}` (and
    :math:`i` can stand for multiple dimensions).

    In other words, :func:`grad` does not return
    :math:`\frac{d f_i}{d x_j}`, but instead the appropriate dot
    product specified by the chain rule:
    :math:`\frac{d C}{d x_j} = \frac{d C}{d f_i} \cdot \frac{d f_i}{d x_j}`.
    Both the partial differentiation and the multiplication have to be
    performed by :func:`grad`.
    Theano currently imposes the following constraints on the values
    returned by the grad method:

    1) They must be Variable instances.
    2) When they are types that have dtypes, they must never have an
       integer dtype.

    The output gradients passed *to* Op.grad will also obey these
    constraints.
    Integers are a tricky subject. Integers are the main reason for
    having DisconnectedType, NullType or zero gradient. When you have an
    integer as an argument to your grad method, recall the definition of
    a derivative to help you decide what value to return:

    :math:`\frac{d f}{d x} = \lim_{\epsilon \rightarrow 0} (f(x+\epsilon)-f(x))/\epsilon`.

    Suppose your function f has an integer-valued output. For most
    functions you're likely to implement in theano, this means your
    gradient should be zero, because f(x+epsilon) = f(x) for almost all
    x. (The only other option is that the gradient could be undefined,
    if your function is discontinuous everywhere, like the rational
    indicator function.)

    Suppose your function f has an integer-valued input. This is a
    little trickier, because you need to think about what you mean
    mathematically when you make a variable integer-valued in
    theano. Most of the time in machine learning we mean "f is a
    function of a real-valued x, but we are only going to pass in
    integer values of x". In this case, f(x+epsilon) exists, so the
    gradient through f should be the same whether x is an integer or a
    floating point variable. Sometimes what we mean is "f is a function
    of an integer-valued x, and f is only defined where x is an
    integer." Since f(x+epsilon) doesn't exist, the gradient is
    undefined. Finally, many times in theano, integer valued inputs
    don't actually affect the elements of the output, only its shape.

    If your function f has both an integer-valued input and an
    integer-valued output, then both rules have to be combined:
...
@@ -290,63 +400,75 @@ following methods:
       Its gradient is zero almost everywhere, so Op.grad should return
       zeros in the shape of x and y.

    2) f(x,y) = dot product between x and y. x is floating point and y is an integer.

       In this case the output is floating point. It doesn't matter
       that y is an integer. We consider f to still be defined at
       f(x,y+epsilon). The gradient is exactly the same as if y were
       floating point.

    3) f(x,y) = argmax of x along axis y.

       The gradient with respect to y is undefined, because f(x,y) is
       not defined for floating point y. How could you take an argmax
       along a fractional axis? The gradient with respect to x is
       0, because f(x+epsilon, y) = f(x) almost everywhere.

    4) f(x,y) = a vector with y elements, each of which taking on the value x

       The grad method should return DisconnectedType()() for y,
       because the elements of f don't depend on y. Only the shape of
       f depends on y. You probably also want to implement a
       connection_pattern method to encode this.

    5) f(x) = int(x) converts float x into an int. g(y) = float(y) converts an integer y into a float.

       If the final cost C = 0.5 * g(y) = 0.5 g(f(x)), then the
       gradient with respect to y will be 0.5, even if y is an
       integer. However, the gradient with respect to x will be 0,
       because the output of f is integer-valued.
.. function:: infer_shape(node, shapes)

   Optional.

   This function is needed for shape optimization. ``shapes`` is a
   list with one tuple for each input of the Apply node (which
   corresponds to the inputs of the op). Each tuple contains as many
   elements as the number of dimensions of the corresponding input.
   The value of each element is the shape (number of items) along the
   corresponding dimension of that specific input.

   While this might sound complicated, it is nothing more than the
   shape of each input as symbolic variables (one per dimension).

   The function should return a list with one tuple for each output.
   Each tuple should contain the corresponding output's computed shape.

   Implementing this method will allow Theano to compute the output's
   shape without computing the output itself, potentially sparing you
   a costly recomputation.

.. function:: connection_pattern(node)

   Sometimes needed for proper operation of gradient.grad().

   Returns a list of lists of bools.
   ``Op.connection_pattern[input_idx][output_idx]`` is true if the
   elements of ``inputs[input_idx]`` have an effect on the elements of
   ``outputs[output_idx]``.

   The ``node`` parameter is needed to determine the number of
   inputs. Some ops such as Subtensor take a variable number of
   inputs.

   If no connection_pattern is specified, gradient.grad will assume
   that all inputs have some elements connected to some elements of
   all outputs.
   This method conveys two pieces of information that are otherwise
   not part of the theano graph:

   1) Which of the op's inputs are truly ancestors of each of the
      op's outputs. Suppose an op has two inputs, x and y, and
      outputs f(x) and g(y). y is not really an ancestor of f, but
      it appears to be so in the theano graph.
   2) Whether the actual elements of each input/output are relevant
      to a computation. For example, the shape op does not read its
      input's elements, only its shape metadata; d shape(x) / dx
      should thus raise a disconnected input exception (if these
      exceptions are enabled). As another example, the elements of
      the Alloc op's outputs are not affected by the shape arguments
      to the Alloc op.

   Failing to implement this function for an op that needs it can
   result in two types of incorrect behavior:

   1) gradient.grad erroneously raising a TypeError reporting that
      a gradient is undefined.
   2) gradient.grad failing to raise a ValueError reporting that
      an input is disconnected.

   Even if connection_pattern is not implemented correctly, if
   gradient.grad returns an expression, that expression will be
   numerically correct.

.. function:: flops(inputs, outputs)

   Optional.

   It is only used to have more information printed by the memory
   profiler. It makes it print the mega flops and giga flops per
   second for each apply node. It takes as inputs two lists: one for
   the inputs and one for the outputs. They contain tuples that are
   the shapes of the corresponding inputs/outputs.

.. function:: make_thunk(node, storage_map, compute_map, no_recycling)

   TODO
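A small stand-alone sketch can tie ``infer_shape``, ``connection_pattern`` and ``flops`` together. ``MatMulOp`` below is a hypothetical pseudo-op (not ``theano.tensor.dot``), and plain ints and tuples stand in for Theano's symbolic shapes:

```python
class MatMulOp:
    """Pseudo-op for an (m, k) x (k, n) matrix product (illustration only)."""

    def infer_shape(self, node, shapes):
        # shapes has one tuple per input; with real Theano the entries
        # would be symbolic scalars, but the arithmetic is the same.
        (m, k), (_k, n) = shapes
        return [(m, n)]

    def connection_pattern(self, node):
        # Every element of each input can affect every output element.
        return [[True], [True]]

    def flops(self, inputs, outputs):
        # inputs/outputs are lists of shape tuples; an (m, k) x (k, n)
        # product costs about 2 * m * k * n floating point operations.
        (m, k), (_, n) = inputs
        return 2 * m * k * n
```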
.. function:: R_op(inputs, eval_points)

   Optional, to work with gradient.R_op().
   This function implements the application of the R-operator on the
   function represented by your op. Let us assume that function is :math:`f`,
...
   the outputs) back to their corresponding shapes and return them as the
   output of the :func:`R_op` method.
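As a concrete illustration of what ``R_op`` computes, here is a non-Theano sketch for an elementwise product f(x, y) = x * y, with plain floats standing in for symbolic variables; the result is the Jacobian applied to the evaluation points:

```python
class MulOp:
    """Pseudo-op for f(x, y) = x * y (illustration only)."""

    def R_op(self, inputs, eval_points):
        x, y = inputs
        ex, ey = eval_points
        # R_op returns (df/dx) * ex + (df/dy) * ey; eval_points may
        # contain None for inputs we do not differentiate along.
        rop = 0.0
        if ex is not None:
            rop += y * ex   # df/dx = y
        if ey is not None:
            rop += x * ey   # df/dy = x
        return [rop]        # one entry per output
```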
.. attribute:: default_output
*Default:* None
If this member variable is an integer, then the default
implementation of ``__call__`` will return
``node.outputs[self.default_output]``, where ``node`` was returned
by ``make_node``. Otherwise, the entire list of outputs will be
returned.
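The interaction between ``default_output`` and ``__call__`` can be mimicked in plain Python. ``SortWithIndicesOp`` is a made-up example, and ``make_node`` here returns the list of outputs directly instead of an Apply node:

```python
class BaseOp:
    """Minimal stand-in for the part of gof.Op described above."""
    default_output = None

    def __call__(self, *inputs):
        outputs = self.make_node(*inputs)  # stands in for node.outputs
        if self.default_output is not None:
            return outputs[self.default_output]
        return outputs


class SortWithIndicesOp(BaseOp):
    """Two outputs (sorted values, argsort); callers usually want values."""
    default_output = 0  # __call__ returns only the sorted values

    def make_node(self, x):
        order = sorted(range(len(x)), key=lambda i: x[i])
        return [sorted(x), order]
```

Setting ``default_output = None`` instead would make calls return the full list of outputs.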
.. function:: __call__(*inputs)
Syntactic shortcut to make_node which returns the output
Variables of the Op.
*Default:* this is implemented in the parent class and you do not need to change it.
.. function:: __str__()
*Default:* python default: module_path_to_your_class.CLASSNAME
This allows you to specify a more informative string representation of your
Op. If an Op has parameters, it is highly recommended to have the
``__str__`` method include the name of the op and the Op's parameters'
values.
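For example, a hypothetical op parametrized by an exponent could implement the recommendation like this:

```python
class ScalarPowOp:
    """Hypothetical op raising its input to a fixed exponent."""

    def __init__(self, exponent):
        self.exponent = exponent

    def __str__(self):
        # Include both the op's name and its parameter values.
        return "ScalarPowOp{exponent=%s}" % self.exponent
```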
.. function:: do_constant_folding(node)
*Default:* Return True
   By default, when optimizations are enabled, Apply nodes whose
   inputs are all constants are removed during function compilation
   and replaced with a Theano constant variable, so that they are not
   executed at each function call. If you want to force the execution
   of an op during the function call, make do_constant_folding return
   False. As done in the Alloc op, you can return False only in some
   cases, by analyzing the graph from the node parameter.
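In the spirit of the Alloc example, a sketch of a selective do_constant_folding might look as follows. ``FakeNode`` and its ``out_size`` attribute are stand-ins for a real Apply node and whatever graph analysis the op performs:

```python
class BigAllocLikeOp:
    """Hypothetical op whose output can be very large."""

    def do_constant_folding(self, node):
        # Folding a huge constant output would bake a big block of
        # memory into the compiled function; keep large cases lazy.
        return node.out_size <= 1024


class FakeNode:
    """Stand-in for an Apply node, exposing only what the sketch needs."""

    def __init__(self, out_size):
        self.out_size = out_size
```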
At a bare minimum, a new Op must define ``make_node`` and ``perform``, which
have no defaults.
You can also provide a :ref:`C implementation <cop>` of
``perform()``. For more details, refer to the documentation for
:ref:`op`.
Defining an Op: ``mul``
=======================

...