Commit fdb564af authored by Razvan Pascanu

new graph description in the tutorial

Parent efa9630b
......@@ -293,7 +293,8 @@ the substitutions have to work in any order.
Using Random Numbers
====================
Because in Theano you first express everything symbolically and
afterwards compile this expression to get functions,
using pseudo-random numbers is not as straightforward as it is in
numpy, though also not too complicated.
......
......@@ -5,57 +5,93 @@
Graph Structures
================
Debugging or profiling code written in Theano is not that simple if you
do not know what goes on under the hood. This chapter is meant to
introduce you to a required minimum of the inner workings of Theano;
for more details see :ref:`extending`.
The first step in writing Theano code is to write down all mathematical
relations using symbolic placeholders (**variables**). When writing down
these expressions you use operations like ``+``, ``-``, ``**``,
``sum()``, ``tanh()``. All these are represented internally as **ops**.
An **op** represents a certain computation on some type of inputs
producing some type of output. You can see it as a function definition
in most programming languages.
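As a rough analogy (plain Python, not Theano code), an **op** is like a
function definition and its application to data is like a function call:

```python
def add(a, b):
    # The definition plays the role of an "op": it describes a
    # computation without referring to any particular data.
    return a + b

# Calling it on concrete inputs plays the role of an "apply" node.
result = add(2, 3)
print(result)  # 5
```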
Theano builds internally a graph structure composed of interconnected
**variable** nodes, **op** nodes and **apply** nodes. An
**apply** node represents the application of an **op** to some
**variables**. It is important to distinguish between the
definition of a computation, represented by an **op**, and its application
to some actual data, which is represented by the **apply** node. For more
details about these building blocks see :ref:`variable`, :ref:`op`,
:ref:`apply`. A graph example is the following:
**Code**
.. code-block:: python
x = T.dmatrix('x')
y = T.dmatrix('y')
z = x + y
**Diagram**
.. figure:: apply.png
:align: center
Arrows represent references to the Python objects pointed at. The blue
box is an :ref:`apply` node. Red boxes are :ref:`variable` nodes. Green
circles are :ref:`Ops <op>`. Purple boxes are :ref:`Types <type>`.
The graph can be traversed starting from a root (the result of some
computation) down to its leaves using the ``owner`` field.
Take for example the following code:
.. code-block:: python
x = T.dmatrix('x')
y = x*2.
``y`` is such a root, though there can be others; for example, if you also
had ``z = x + 2``, then ``z`` would be a root as well. If you print
``type(y.owner)`` you get ``<class 'theano.gof.graph.Apply'>``, which
is the apply node that connects the op and the inputs to get this
output. You can now print the name of the op that is applied to get
``y``:
>>> y.owner.op.name
'Elemwise{mul,no_inplace}'
So an elementwise multiplication is used to compute ``y``. This
multiplication is done between the inputs:
>>> len(y.owner.inputs)
2
>>> y.owner.inputs[0]
x
>>> y.owner.inputs[1]
InplaceDimShuffle{x,x}.0
Note that the second input is not 2 as we would have expected. This is
because 2 was first :ref:`broadcasted <broadcasting>` to a matrix of the
same shape as ``x``. This is done by using the op ``DimShuffle``:
>>> type(y.owner.inputs[1])
<class 'theano.tensor.basic.TensorVariable'>
>>> type(y.owner.inputs[1].owner)
<class 'theano.gof.graph.Apply'>
>>> y.owner.inputs[1].owner.op
<theano.tensor.elemwise.DimShuffle object at 0x14675f0>
>>> y.owner.inputs[1].owner.inputs
[2.0]
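The implicit replication performed here is the same broadcasting behaviour
NumPy applies when combining a scalar with a matrix (a NumPy illustration of
the concept, not Theano's internal mechanism):

```python
import numpy as np

x = np.array([[1.0, 2.0], [3.0, 4.0]])
# The scalar 2.0 is broadcast to x's shape before the elementwise
# multiply, as if it had been replicated into a 2x2 matrix of 2.0s.
y = x * 2.0
print(y)
# [[2. 4.]
#  [6. 8.]]
```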
Starting from this graph structure, it is easy to understand how
*automatic differentiation* is done, and how the symbolic relations
can be optimized for performance or stability.
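The ``owner`` and ``inputs`` pointers are all that is needed to walk such an
expression graph. A minimal pure-Python sketch of this traversal (toy classes
chosen for illustration, not Theano's actual implementation):

```python
class Variable:
    """A node holding data or a symbolic input; leaves have owner=None."""
    def __init__(self, name=None, owner=None):
        self.name = name
        self.owner = owner

class Apply:
    """A node recording the application of an op to some input variables."""
    def __init__(self, op, inputs):
        self.op = op
        self.inputs = inputs
        self.outputs = [Variable(owner=self)]

def pretty(var):
    """Recursively render a variable as an expression string by
    following owner (variable -> apply) and inputs (apply -> variables)."""
    if var.owner is None:
        return var.name
    args = ", ".join(pretty(i) for i in var.owner.inputs)
    return "%s(%s)" % (var.owner.op, args)

x = Variable('x')
y = Apply('mul', [x, Variable('2.0')]).outputs[0]
print(pretty(y))  # mul(x, 2.0)
```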
Automatic Differentiation
=========================
......@@ -66,9 +102,10 @@ graph from the outputs back towards the inputs through all :ref:`apply`
nodes (:ref:`apply` nodes are those that define what computations the
graph does). For each such :ref:`apply` node, its :ref:`op` defines
how to compute the gradient of the node's outputs with respect to its
inputs. Note that if an :ref:`op` does not provide this information,
it is assumed that the gradient does not exist, and all results that
depend on this gradient will be 0s. Using the
`chain rule <http://en.wikipedia.org/wiki/Chain_rule>`_,
these gradients can be composed in order to obtain the expression of the
gradient of the graph's output with respect to the graph's inputs.
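As a toy illustration of composing gradients with the chain rule (plain
Python with a hypothetical function, unrelated to Theano's actual ``grad``
implementation): for ``f(x) = tanh(x**2)``, the outer op contributes
``1 - tanh(u)**2`` and the inner op contributes ``2*x``.

```python
import math

def f(x):
    return math.tanh(x ** 2)

def df_dx(x):
    # chain rule: d tanh(u)/du = 1 - tanh(u)**2, with u = x**2, du/dx = 2*x
    u = x ** 2
    return (1.0 - math.tanh(u) ** 2) * 2.0 * x

# sanity-check the composed gradient against a numerical derivative
eps = 1e-6
x0 = 0.7
numeric = (f(x0 + eps) - f(x0 - eps)) / (2 * eps)
print(abs(df_dx(x0) - numeric) < 1e-6)  # True
```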
......