Commit 5acb548f
authored Jan 18, 2010 by James Bergstra

merged

Parents: 92cb4b40, 3d889b6e
Showing 2 changed files with 77 additions and 39 deletions:

doc/tutorial/examples.txt (+2, -1)
doc/tutorial/symbolic_graphs.txt (+75, -38)
doc/tutorial/examples.txt
...
@@ -293,7 +293,8 @@ the substitutions have to work in any order.
Using Random Numbers
====================
Because everything has to be expressed symbolically firstly in Theano,
Because in Theano you first express everything symbolically and
afterwards compile this expression to get functions,
using pseudo-random numbers is not as straightforward as it is in
numpy, though also not too complicated.
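As a first taste of the pattern, here is a minimal sketch (assuming
``RandomStreams`` from ``theano.tensor.shared_randomstreams``):

.. code-block:: python

   import theano
   from theano.tensor.shared_randomstreams import RandomStreams

   srng = RandomStreams(seed=234)   # container for the random state
   rv_u = srng.uniform((2, 2))      # symbolic 2x2 matrix of uniform draws
   f = theano.function([], rv_u)    # compile first,
   f()                              # then sample; each call draws new numbers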
...
doc/tutorial/symbolic_graphs.txt
...
@@ -5,57 +5,93 @@
Graph Structures
================
In order to be able to take advantage of Theano, you need to understand
how Theano works. Theano represents mathematical computations as graphs
(for a detailed rendering see :ref:`graphstructures`; parts of this
are directly taken from there). Graphs are composed of interconnected
:ref:`apply` and :ref:`variable` nodes. They are associated with *function
application* and *data*, respectively. An operation is represented by
an :ref:`op` and data types are represented by :ref:`type` instances.
Here is a piece of code and a diagram showing the structure built by
that piece of code. This should help you understand how these pieces fit
together:
-----------------------
Debugging or profiling code written in Theano is not that simple if you
do not know what goes on under the hood. This chapter is meant to
introduce you to a required minimum of the inner workings of Theano;
for more details see :ref:`extending`.
The first step in writing Theano code is to write down all mathematical
relations using symbolic placeholders (**variables**). When writing down
these expressions you use operations like ``+``, ``-``, ``**``,
``sum()``, ``tanh()``. All these are represented internally as **ops**.
An **op** represents a certain computation on some type of inputs
producing some type of output. You can see it as a function definition
in most programming languages.
Theano builds internally a graph structure composed of interconnected
**variable** nodes, **op** nodes and **apply** nodes. An
**apply** node represents the application of an **op** to some
**variables**. It is important to draw the distinction between the
definition of a computation, represented by an **op**, and its application
to some actual data, which is represented by the **apply** node. For more
details about these building blocks see :ref:`variable`, :ref:`op`,
:ref:`apply`. A graph example is the following:
**Code**
.. code-block:: python

   x = dmatrix('x')
   y = dmatrix('y')
   x = T.dmatrix('x')
   y = T.dmatrix('y')
   z = x + y
**Diagram**
.. image:: apply.png
.. figure:: apply.png
   :align: center
Arrows represent references to the Python objects pointed at. The blue
box is an :ref:`apply` node. Red boxes are :ref:`variable` nodes. Green
circles are :ref:`Ops <op>`. Purple boxes are :ref:`Types <type>`.
The graph can be traversed starting from a root (the result of some
computation) down to its leaves using the ``owner`` field.
Take for example the following code:
.. code-block:: python

   x = T.dmatrix('x')
   y = x*2.
``y`` is such a root, though there can be others; for example, if you also
had ``z = x + 2``, then ``z`` would be a root as well. If you print
``type(y.owner)`` you get ``<class 'theano.gof.graph.Apply'>``, which
is the apply node that connects the op and the inputs to get this
output. You can now print the name of the op that is applied to get
``y``:
>>> y.owner.op.name
'Elemwise{mul,no_inplace}'
-----------------------
So an elementwise multiplication is used to compute ``y``. This
multiplication is done between the inputs
Arrows represent references to the Python objects pointed at. The blue
box is an :ref:`apply` node. Red boxes are :ref:`variable` nodes. Green
circles are :ref:`Ops <op>`. Purple boxes are :ref:`Types <type>`.
>>> len(y.owner.inputs)
2
>>> y.owner.inputs[0]
x
>>> y.owner.inputs[1]
InplaceDimShuffle{x,x}.0
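To make the traversal concrete, here is a minimal sketch of walking a
graph from a root down to its leaves, using only the ``owner`` and
``inputs`` fields described above:

.. code-block:: python

   def walk(var, depth=0):
       # print the variable, then recurse into the apply node that made it
       print('  ' * depth + str(var))
       if var.owner is not None:          # inputs like ``x`` have no owner
           for inp in var.owner.inputs:   # the apply node's input variables
               walk(inp, depth + 1)

   walk(y)   # prints y, then x and the broadcasted constant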
When we create :ref:`Variables <variable>` and then :ref:`apply`
:ref:`Ops <op>` to them to make more Variables, we build a
bi-partite, directed, acyclic graph. Variables point to the Apply nodes
representing the function application producing them via their
``owner`` field. These Apply nodes point in turn to their input and
output Variables via their ``inputs`` and ``outputs`` fields.
(Apply instances also contain a list of references to their ``outputs``, but
those pointers don't count in this graph.)
Note that the second input is not 2 as we would have expected. This is
because 2 was first :ref:`broadcasted <broadcasting>` to a matrix of the
same shape as ``x``. This is done by using the ``DimShuffle`` op:
The ``owner`` field of both ``x`` and ``y`` points to ``None`` because
they are not the result of another computation. If one of them were the
result of another computation, its ``owner`` field would point to another
blue box like ``z`` does, and so on.
>>> type(y.owner.inputs[1])
<class 'theano.tensor.basic.TensorVariable'>
>>> type(y.owner.inputs[1].owner)
<class 'theano.gof.graph.Apply'>
>>> y.owner.inputs[1].owner.op
<theano.tensor.elemwise.DimShuffle object at 0x14675f0>
>>> y.owner.inputs[1].owner.inputs
[2.0]
Note that the ``Apply`` instance's ``outputs`` field points to
``z``, and ``z.owner`` points back to the ``Apply`` instance.
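For example, with ``z = x + y`` from the first example above, one would
expect these back references to hold:

>>> z.owner.outputs[0] is z
True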
Starting from this graph structure it is easy to understand how
*automatic differentiation* is done, and how the symbolic relations
can be optimized for performance or stability.
The graph structure is needed for *Optimizations* and *Automatic
Differentiation*.
Automatic Differentiation
=========================
...
@@ -66,9 +102,10 @@ graph from the outputs back towards the inputs through all :ref:`apply`
nodes (:ref:`apply` nodes are those that define what computations the
graph does). For each such :ref:`apply` node, its :ref:`op` defines
how to compute the gradient of the node's outputs with respect to its
inputs. Note that if an :ref:`op` does not define how to compute the
gradient, then any expression containing this :ref:`op` is not
differentiable. Using the `chain rule <http://en.wikipedia.org/wiki/Chain_rule>`_
inputs. Note that if an :ref:`op` does not provide this information,
it is assumed that the gradient does not exist, and all results that
depend on this gradient will be 0s. Using the
`chain rule <http://en.wikipedia.org/wiki/Chain_rule>`_
these gradients can be composed in order to obtain the expression of the
gradient of the graph's output with respect to the graph's inputs.
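For instance, here is a minimal sketch of asking for such a gradient
(assuming ``T.grad``, Theano's symbolic gradient interface):

.. code-block:: python

   import theano
   import theano.tensor as T

   x = T.dscalar('x')
   y = x ** 2                     # symbolic graph for x squared
   gy = T.grad(y, x)              # symbolic gradient, built via the chain rule
   f = theano.function([x], gy)   # compile the gradient expression
   f(4.0)                         # => array(8.0)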
...