Commit 00183e72 authored by Olivier Delalleau

Merge pull request #905 from nouiz/add_exerc_docu_rebase

Documentation improvements
......@@ -19,7 +19,7 @@ I wrote a new optimization, but it's not getting used...
Remember that you have to register optimizations with the :ref:`optdb`
for them to get used by the normal modes like FAST_COMPILE, FAST_RUN,
and DEBUG_MODE.
and DebugMode.
I wrote a new optimization, and it changed my results even though I'm pretty sure it is correct.
......
......@@ -168,7 +168,7 @@ not modify any of the inputs.
TODO: EXPLAIN DESTROYMAP and VIEWMAP BETTER AND GIVE EXAMPLE.
When developing an Op, you should run computations in DebugMode, by using
argument ``mode='DEBUG_MODE'`` to ``theano.function``. DebugMode is
argument ``mode='DebugMode'`` to ``theano.function``. DebugMode is
slow, but it can catch many common violations of the Op contract.
TODO: Like what? How? Talk about Python vs. C too.
......
......@@ -6,15 +6,15 @@ Extending Theano
================
This documentation is for users who want to extend Theano with new Types, new
This advanced tutorial is for users who want to extend Theano with new Types, new
Operations (Ops), and new graph optimizations.
Along the way, it also introduces many aspects of how Theano works, so it is
also good for you if you are interested in getting more under the hood with
Theano itself.
Before tackling this tutorial, it is highly recommended to read the
:ref:`tutorial`.
Before tackling this more advanced presentation, it is highly recommended to read the
introductory :ref:`Tutorial<tutorial>`.
The first few pages will walk you through the definition of a new :ref:`type`,
``double``, and a basic arithmetic :ref:`operation <op>` on that Type. We
......
......@@ -289,7 +289,7 @@ Example:
f = T.function([a,b],[c],mode='FAST_RUN')
m = theano.Module()
minstance = m.make(mode='DEBUG_MODE')
minstance = m.make(mode='DebugMode')
Whenever possible, unit tests should omit this parameter. Leaving
out the mode will ensure that unit tests use the default mode.
......@@ -306,7 +306,7 @@ type this:
THEANO_FLAGS='mode=FAST_COMPILE' nosetests
THEANO_FLAGS='mode=FAST_RUN' nosetests
THEANO_FLAGS='mode=DEBUG_MODE' nosetests
THEANO_FLAGS='mode=DebugMode' nosetests
.. _random_value_in_tests:
......
.. _glossary:
Glossary of terminology
=======================
Glossary
========
.. glossary::
......
......@@ -190,12 +190,10 @@ Here is the state of that vision as of 24 October 2011 (after Theano release
* Will provide better support for GPU on Windows and use an OpenCL backend on CPU.
* Loops work, but not all related optimizations are currently done.
* The cvm linker allows lazy evaluation. It works, but some work is still
needed before enabling it by default.
* The cvm linker allows lazy evaluation. It is the current default linker.
* All tests pass with linker=cvm?
* How to have `DEBUG_MODE` check it? Right now, DebugMode checks the computation non-lazily.
* The profiler used by cvm is less complete than `PROFILE_MODE`.
* How to have `DebugMode` check it? Right now, DebugMode checks the computation non-lazily.
* The profiler used by cvm is less complete than `ProfileMode`.
* SIMD parallelism on the CPU comes from the compiler.
* Multi-core parallelism is only supported for gemv and gemm, and only
......
......@@ -29,7 +29,7 @@ DebugMode can be used as follows:
x = tensor.dvector('x')
f = theano.function([x], 10*x, mode='DEBUG_MODE')
f = theano.function([x], 10*x, mode='DebugMode')
f(5)
f(0)
......@@ -42,7 +42,7 @@ It can also be used by passing a DebugMode instance as the mode, as in
If any problem is detected, DebugMode will raise an exception according to
what went wrong, either at call time (``f(5)``) or compile time (
``f = theano.function(x, 10*x, mode='DEBUG_MODE')``). These exceptions
``f = theano.function(x, 10*x, mode='DebugMode')``). These exceptions
should *not* be ignored; talk to your local Theano guru or email the
users list if you cannot make the exception go away.
......@@ -51,7 +51,7 @@ In the example above, there is no way to guarantee that a future call to say,
``f(-1)`` won't cause a problem. DebugMode is not a silver bullet.
If you instantiate DebugMode using the constructor ``compile.DebugMode``
rather than the keyword ``DEBUG_MODE`` you can configure its behaviour via
rather than the keyword ``DebugMode`` you can configure its behaviour via
constructor arguments.
Reference
......@@ -133,7 +133,7 @@ Reference
The keyword version of DebugMode (which you get by using ``mode='DEBUG_MODE'``)
The keyword version of DebugMode (which you get by using ``mode='DebugMode'``)
is quite strict, and can raise several different Exception types.
The following are DebugMode exceptions you might encounter:
......@@ -200,7 +200,7 @@ There following are DebugMode exceptions you might encounter:
in the same order when run several times in a row. This can happen if any
steps are ordered by ``id(object)`` somehow, such as via the default object
hash function. A stochastic optimization invalidates the pattern of work
whereby we debug in DEBUG_MODE and then run the full-size jobs in FAST_RUN.
whereby we debug in DebugMode and then run the full-size jobs in FAST_RUN.
.. class:: InvalidValueError(DebugModeError)
......
.. _libdoc_compile_mode:
======================================
:mod:`mode` -- controlling compilation
======================================
......@@ -17,9 +20,10 @@ Theano defines the following modes by name:
- ``'FAST_COMPILE'``: Apply just a few graph optimizations and only use Python implementations.
- ``'FAST_RUN'``: Apply all optimizations, and use C implementations where possible.
- ``'DEBUG_MODE'``: Verify the correctness of all optimizations, and compare C and python
implementations. This mode can take much longer than the other modes,
but can identify many kinds of problems.
- ``'DebugMode'``: A mode for debugging. See :ref:`DebugMode <debugmode>` for details.
- ``'ProfileMode'``: A mode for profiling. See :ref:`ProfileMode <profilemode>` for details.
- ``'DEBUG_MODE'``: Deprecated. Use the string DebugMode.
- ``'PROFILE_MODE'``: Deprecated. Use the string ProfileMode.
The default mode is typically ``FAST_RUN``, but it can be controlled via the
configuration variable :attr:`config.mode`, which can be
......
......@@ -13,7 +13,7 @@
Guide
=====
The config module contains many attributes that modify Theano's behavior. Many of these
The config module contains many ``attributes`` that modify Theano's behavior. Many of these
attributes are consulted during the import of the ``theano`` module and many are assumed to be
read-only.
......
......@@ -13,7 +13,7 @@
.. toctree::
:maxdepth: 1
fgraph
fg
toolbox
type
......
......@@ -12,18 +12,18 @@
Guide
======
Symbolic printing: the Print() Op
----------------------------------
Printing during execution
-------------------------
Intermediate values in a computation cannot be printed in
the normal python way with the print statement, because Theano has no *statements*.
Instead there is the `Print` Op.
Instead there is the :class:`Print` Op.
>>> x = T.dvector()
>>> hello_world_op = Print('hello world')
>>> hello_world_op = printing.Print('hello world')
>>> printed_x = hello_world_op(x)
>>> f = function([x], printed_x)
>>> f([1,2,3])
>>> f([1, 2, 3])
>>> # output: "hello world __str__ = [ 1. 2. 3.]"
If you print more than one thing in a function like `f`, they will not
......@@ -39,15 +39,15 @@ Printing graphs
---------------
Theano provides two functions (:func:`theano.pp` and
:func:`theano.debugprint`) to print a graph to the terminal before or after
:func:`theano.printing.debugprint`) to print a graph to the terminal before or after
compilation. These two functions print expression graphs in different ways:
:func:`pp` is more compact and math-like, :func:`debugprint` is more verbose.
Theano also provides :func:`pydotprint` that creates a png image of the function.
Theano also provides :func:`theano.printing.pydotprint` that creates a png image of the function.
1) The first is :func:`theano.pp`.
1) The first is :func:`theano.pp`.
>>> x = T.dscalar('x')
>>> y = x**2
>>> y = x ** 2
>>> gy = T.grad(y, x)
>>> pp(gy) # print out the gradient prior to optimization
'((fill((x ** 2), 1.0) * 2) * (x ** (2 - 1)))'
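The unoptimized expression printed above reduces to ``2 * x``. As a sanity check outside of Theano (a plain NumPy sketch, not Theano code), we can mirror the printed expression and compare it with a finite-difference estimate of the derivative of ``x ** 2``:

```python
import numpy as np

def grad_expr(x):
    # mirrors '((fill((x ** 2), 1.0) * 2) * (x ** (2 - 1)))':
    # fill((x ** 2), 1.0) is an array of ones shaped like x ** 2,
    # so the whole expression reduces to 2 * x
    return (np.ones_like(x ** 2) * 1.0 * 2) * (x ** (2 - 1))

x = np.array(3.0)
eps = 1e-6
finite_diff = ((x + eps) ** 2 - (x - eps) ** 2) / (2 * eps)
print(grad_expr(x), finite_diff)  # both approximately 6.0
```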
......@@ -71,56 +71,63 @@ iteration number or other kinds of information in the name.
To make graphs legible, :func:`pp` hides some Ops that are actually in the graph. For example,
automatic DimShuffles are not shown.
2) The second function to print a graph is :func:`theano.printing.debugprint(variable_or_function, depth=-1)`
2) The second function to print a graph is :func:`theano.printing.debugprint`
>>> theano.printing.debugprint(f.maker.fgraph.outputs[0])
Elemwise{mul,no_inplace} 46950805397392
2.0 46950805310800
x 46950804895504
Elemwise{mul,no_inplace} [@A] ''
|TensorConstant{2.0} [@B]
|x [@C]
Each line printed represents a Variable in the graph.
The line `` x 46950804895504`` means the variable named 'x' at memory
location 46950804895504. If you accidentally have two variables called 'x' in
your graph, their different memory locations will be your clue.
The line ``|x [@C]`` means the variable named ``x`` with debugprint identifier
``[@C]`` is an input of the Elemwise. If you accidentally have two variables called ``x`` in
your graph, their different debugprint identifiers will be your clue.
The line `` 2.0 46950805310800`` means that there is a constant 2.0 at the
given memory location.
The line ``|TensorConstant{2.0} [@B]`` means that there is a constant 2.0
with this debugprint identifier.
The line `` Elemwise{mul,no_inplace} 46950805397392`` is indented less than
The line ``Elemwise{mul,no_inplace} [@A] ''`` is indented less than
the other ones, because it means there is a variable computed by multiplying
the other (more indented) ones together.
the other (more indented) ones together.
The ``|`` symbols are just there to help read big graphs. They group
together inputs to a node.
Sometimes, you'll see a Variable but not the inputs underneath. That can
happen when that Variable has already been printed. Where else has it been
printed? Look for the memory address using the Find feature of your text
printed? Look for debugprint identifier using the Find feature of your text
editor.
>>> theano.printing.debugprint(gy)
Elemwise{mul} 46950804894224
Elemwise{mul} 46950804735120
Elemwise{second,no_inplace} 46950804626128
Elemwise{pow,no_inplace} 46950804625040
x 46950658736720
2 46950804039760
1.0 46950804625488
2 46950804039760
Elemwise{pow} 46950804737616
x 46950658736720
Elemwise{sub} 46950804736720
2 46950804039760
InplaceDimShuffle{} 46950804736016
1 46950804735760
Elemwise{mul} [@A] ''
|Elemwise{mul} [@B] ''
| |Elemwise{second,no_inplace} [@C] ''
| | |Elemwise{pow,no_inplace} [@D] ''
| | | |x [@E]
| | | |TensorConstant{2} [@F]
| | |TensorConstant{1.0} [@G]
| |TensorConstant{2} [@F]
|Elemwise{pow} [@H] ''
|x [@E]
|Elemwise{sub} [@I] ''
|TensorConstant{2} [@F]
|InplaceDimShuffle{} [@J] ''
|TensorConstant{1} [@K]
>>> theano.printing.debugprint(gy, depth=2)
Elemwise{mul} 46950804894224
Elemwise{mul} 46950804735120
Elemwise{pow} 46950804737616
Elemwise{mul} [@A] ''
|Elemwise{mul} [@B] ''
|Elemwise{pow} [@C] ''
If the depth parameter is provided, it limits the number of levels that are
shown.
3) The function :func:`theano.printing.pydotprint(fct, outfile=SOME_DEFAULT_VALUE)` will print a compiled theano function to a png file.
3) The function :func:`theano.printing.pydotprint` will print a compiled theano function to a png file.
In the image, Apply nodes (the applications of ops) are shown as boxes and variables are shown as ovals.
The number at the end of each label indicates graph position.
......@@ -170,10 +177,13 @@ Reference
running the function will print the value that `x` takes in the graph.
.. function:: theano.printing.pp(*args)
.. autofunction:: theano.printing.debugprint
TODO
.. function:: theano.pp(*args)
.. autofunction:: theano.printing.debugprint
Just a shortcut to :func:`theano.printing.pp`
.. autofunction:: theano.printing.pp(*args)
.. autofunction:: theano.printing.pydotprint
......@@ -136,19 +136,35 @@ arange must have its length specified at creation time.
Simple accumulation into a scalar, ditching lambda
-------------------------------------------------
This should be fairly self-explanatory.
Although this example would seem almost self-explanatory, it stresses a
pitfall to be careful of: the initial output state that is supplied, that is
``outputs_info``, must be of a **shape similar to that of the output variable**
generated at each iteration and moreover, it **must not involve an implicit
downcast** of the latter.
.. code-block:: python
import numpy as np
import theano
import theano.tensor as T
up_to = T.iscalar("up_to")
# define a named function, rather than using lambda
def accumulate_by_adding(arange_val, sum_to_date):
return sum_to_date + arange_val
seq = T.arange(up_to)
# An unauthorized implicit downcast from the dtype of 'seq', to that of
# 'T.as_tensor_variable(0)' which is of dtype 'int8' by default would occur
# if this instruction were to be used instead of the next one:
# outputs_info = T.as_tensor_variable(0)
outputs_info = T.as_tensor_variable(np.asarray(0, seq.dtype))
scan_result, scan_updates = theano.scan(fn=accumulate_by_adding,
outputs_info=T.as_tensor_variable(0),
sequences=T.arange(up_to))
outputs_info=outputs_info,
sequences=seq)
triangular_sequence = theano.function(inputs=[up_to], outputs=scan_result)
# test
......@@ -157,7 +173,6 @@ This should be fairly self-explanatory.
print [n * (n + 1) // 2 for n in xrange(some_num)]
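The recurrence above can be sketched in plain Python/NumPy (illustrative only; ``theano.scan`` returns the state after every step, which is what makes the output the full triangular sequence):

```python
import numpy as np

def accumulate_by_adding(arange_val, sum_to_date):
    # identical step function to the scan example above
    return sum_to_date + arange_val

up_to = 15
seq = np.arange(up_to)
state = np.asarray(0, seq.dtype)  # initial state, matching seq's dtype
results = []
for v in seq:                     # roughly what scan does under the hood
    state = accumulate_by_adding(v, state)
    results.append(int(state))

print(results)  # the triangular numbers n * (n + 1) // 2
```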
Another simple example
----------------------
......
.. currentmodule:: tensor
.. _libdoc_basic_tensor:
===========================
Basic Tensor Functionality
===========================
......@@ -532,7 +534,7 @@ dimensions, see :meth:`_tensor_py_operators.dimshuffle`.
.. function:: shape_padright(x,n_ones = 1)
.. function:: shape_padright(x, n_ones=1)
Reshape `x` by right padding the shape with `n_ones` 1s. Note that all
these new dimensions will be broadcastable. To make them non-broadcastable
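A rough NumPy analogue of the reshaping (illustrative; Theano's version additionally marks the new dimensions as broadcastable):

```python
import numpy as np

def shape_padright_sketch(x, n_ones=1):
    # append n_ones length-1 axes on the right of the shape
    return x.reshape(x.shape + (1,) * n_ones)

a = np.arange(6).reshape(2, 3)
print(shape_padright_sketch(a).shape)     # (2, 3, 1)
print(shape_padright_sketch(a, 2).shape)  # (2, 3, 1, 1)
```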
......@@ -597,7 +599,7 @@ dimensions, see :meth:`_tensor_py_operators.dimshuffle`.
Create a matrix by filling the shape of `a` with `b`
.. function:: eye(n, m = None, k = 0, dtype=theano.config.floatX)
.. function:: eye(n, m=None, k=0, dtype=theano.config.floatX)
:param n: number of rows in output (value or theano scalar)
:param m: number of columns in output (value or theano scalar)
......@@ -1065,11 +1067,11 @@ Mathematical
Returns a variable representing the exponential of a, ie e^a.
.. function:: maximum(a,b)
.. function:: maximum(a, b)
Returns a variable representing the maximum element by element of a and b
.. function:: minimum(a,b)
.. function:: minimum(a, b)
Returns a variable representing the minimum element by element of a and b
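Both behave like their NumPy counterparts, comparing the two arguments element by element (a NumPy sketch of the semantics):

```python
import numpy as np

a = np.array([1, 5, 2])
b = np.array([4, 3, 2])
print(np.maximum(a, b))  # [4 5 2]
print(np.minimum(a, b))  # [1 3 2]
```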
......
.. _adding:
========================================
Baby steps - Adding two numbers together
========================================
====================
Baby Steps - Algebra
====================
Adding two scalars
Adding two Scalars
==================
So, to get us started with Theano and get a feel of what we're working with,
To get us started with Theano and get a feel of what we're working with,
let's make a simple function: add two numbers together. Here is how you do
it:
......@@ -34,12 +33,12 @@ Let's break this down into several steps. The first step is to define
two symbols (*Variables*) representing the quantities that you want
to add. Note that from now on, we will use the term
*Variable* to mean "symbol" (in other words,
``x``, ``y``, ``z`` are all *Variable* objects). The output of the function
``f`` is a ``numpy.ndarray`` with zero dimensions.
*x*, *y*, *z* are all *Variable* objects). The output of the function
*f* is a ``numpy.ndarray`` with zero dimensions.
If you are following along and typing into an interpreter, you may have
noticed that there was a slight delay in executing the ``function``
instruction. Behind the scenes, ``f`` was being compiled into C code.
instruction. Behind the scene, *f* was being compiled into C code.
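The zero-dimensional result mentioned above is an ordinary NumPy object; for example:

```python
import numpy as np

# what a call like f(2, 3) hands back: a 0-d array wrapping the scalar
z = np.array(5.0)
print(z.ndim, z.shape)  # 0 ()
```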
.. note:
......@@ -52,12 +51,10 @@ instruction. Behind the scenes, ``f`` was being compiled into C code.
>>> x = theano.tensor.ivector()
>>> y = -x
``x`` and ``y`` are both Variables, i.e. instances of the
*x* and *y* are both Variables, i.e. instances of the
``theano.gof.graph.Variable`` class. The
type of both ``x`` and ``y`` is ``theano.tensor.ivector``.
type of both *x* and *y* is ``theano.tensor.ivector``.
-------------------------------------------
**Step 1**
......@@ -68,9 +65,9 @@ In Theano, all symbols must be typed. In particular, ``T.dscalar``
is the type we assign to "0-dimensional arrays (`scalar`) of doubles
(`d`)". It is a Theano :ref:`type`.
``dscalar`` is not a class. Therefore, neither ``x`` nor ``y``
``dscalar`` is not a class. Therefore, neither *x* nor *y*
are actually instances of ``dscalar``. They are instances of
:class:`TensorVariable`. ``x`` and ``y``
:class:`TensorVariable`. *x* and *y*
are, however, assigned the theano Type ``dscalar`` in their ``type``
field, as you can see here:
......@@ -83,52 +80,49 @@ TensorType(float64, scalar)
>>> x.type is T.dscalar
True
You can learn more about the structures in Theano in :ref:`graphstructures`.
By calling ``T.dscalar`` with a string argument, you create a
*Variable* representing a floating-point scalar quantity with the
given name. If you provide no argument, the symbol will be unnamed. Names
are not required, but they can help debugging.
More will be said in a moment regarding Theano's inner structure. You
could also learn more by looking into :ref:`graphstructures`.
-------------------------------------------
**Step 2**
The second step is to combine ``x`` and ``y`` into their sum ``z``:
The second step is to combine *x* and *y* into their sum *z*:
>>> z = x + y
``z`` is yet another *Variable* which represents the addition of
``x`` and ``y``. You can use the :ref:`pp <libdoc_printing>`
function to pretty-print out the computation associated to ``z``.
*z* is yet another *Variable* which represents the addition of
*x* and *y*. You can use the :ref:`pp <libdoc_printing>`
function to pretty-print out the computation associated to *z*.
>>> print pp(z)
(x + y)
-------------------------------------------
**Step 3**
The last step is to create a function taking ``x`` and ``y`` as inputs
and giving ``z`` as output:
The last step is to create a function taking *x* and *y* as inputs
and giving *z* as output:
>>> f = function([x, y], z)
The first argument to :func:`function <function.function>` is a list of Variables
that will be provided as inputs to the function. The second argument
is a single Variable *or* a list of Variables. For either case, the second
argument is what we want to see as output when we apply the function.
argument is what we want to see as output when we apply the function. *f* may
then be used like a normal Python function.
``f`` may then be used like a normal Python function.
Adding two matrices
Adding two Matrices
===================
You might already have guessed how to do this. Indeed, the only change
from the previous example is that you need to instantiate ``x`` and
``y`` using the matrix Types:
from the previous example is that you need to instantiate *x* and
*y* using the matrix Types:
.. If you modify this code, also change :
.. theano/tests/test_tutorial.py:T_adding.test_adding_2
......@@ -138,14 +132,14 @@ from the previous example is that you need to instantiate ``x`` and
>>> z = x + y
>>> f = function([x, y], z)
``dmatrix`` is the Type for matrices of doubles. And then we can use
``dmatrix`` is the Type for matrices of doubles. Then we can use
our new function on 2D arrays:
>>> f([[1, 2], [3, 4]], [[10, 20], [30, 40]])
array([[ 11., 22.],
[ 33., 44.]])
The variable is a numpy array. We can also use numpy arrays directly as
The variable is a NumPy array. We can also use NumPy arrays directly as
inputs:
>>> import numpy
......@@ -159,18 +153,36 @@ by :ref:`broadcasting <libdoc_tensor_broadcastable>`.
The following types are available:
* **byte**: bscalar, bvector, bmatrix, brow, bcol, btensor3, btensor4
* **32-bit integers**: iscalar, ivector, imatrix, irow, icol, itensor3, itensor4
* **64-bit integers**: lscalar, lvector, lmatrix, lrow, lcol, ltensor3, ltensor4
* **float**: fscalar, fvector, fmatrix, frow, fcol, ftensor3, ftensor4
* **double**: dscalar, dvector, dmatrix, drow, dcol, dtensor3, dtensor4
* **complex**: cscalar, cvector, cmatrix, crow, ccol, ctensor3, ctensor4
* **byte**: ``bscalar, bvector, bmatrix, brow, bcol, btensor3, btensor4``
* **16-bit integers**: ``wscalar, wvector, wmatrix, wrow, wcol, wtensor3, wtensor4``
* **32-bit integers**: ``iscalar, ivector, imatrix, irow, icol, itensor3, itensor4``
* **64-bit integers**: ``lscalar, lvector, lmatrix, lrow, lcol, ltensor3, ltensor4``
* **float**: ``fscalar, fvector, fmatrix, frow, fcol, ftensor3, ftensor4``
* **double**: ``dscalar, dvector, dmatrix, drow, dcol, dtensor3, dtensor4``
* **complex**: ``cscalar, cvector, cmatrix, crow, ccol, ctensor3, ctensor4``
The previous list is not exhaustive. A guide to all types compatible
with numpy arrays may be found :ref:`here <libdoc_tensor_creation>`.
The previous list is not exhaustive and a guide to all types compatible
with NumPy arrays may be found here: :ref:`tensor creation<libdoc_tensor_creation>`.
.. note::
You, the user---not the system architecture---have to choose whether your
program will use 32- or 64-bit integers (``i`` prefix vs. the ``l`` prefix)
and floats (``f`` prefix vs. the ``d`` prefix).
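The prefixes correspond to NumPy dtypes; a quick check of the widths involved (the ``c`` prefix mapping to single-precision complex is an assumption worth verifying against the full type reference):

```python
import numpy as np

# prefix -> NumPy dtype, following the list above
prefixes = {
    'b': np.dtype('int8'),       # byte
    'w': np.dtype('int16'),      # 16-bit integers
    'i': np.dtype('int32'),      # 32-bit integers
    'l': np.dtype('int64'),      # 64-bit integers
    'f': np.dtype('float32'),    # float
    'd': np.dtype('float64'),    # double
    'c': np.dtype('complex64'),  # complex (assumed single precision)
}
for p, dt in sorted(prefixes.items()):
    print(p, dt.name, dt.itemsize, "bytes")
```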
-------------------------------------------
**Exercise**
.. code-block:: python
import theano
a = theano.tensor.vector() # declare variable
out = a + a ** 10 # build symbolic expression
f = theano.function([a], out) # compile function
print f([0, 1, 2]) # prints `array([0, 2, 1026])`
Modify and execute this code to compute this expression: a ** 2 + b ** 2 + 2 * a * b.
:download:`Solution<adding_solution_1.py>`
#!/usr/bin/env python
# Theano tutorial
# Solution to Exercise in section 'Baby Steps - Algebra'
import theano
a = theano.tensor.vector() # declare variable
b = theano.tensor.vector() # declare variable
out = a ** 2 + b ** 2 + 2 * a * b # build symbolic expression
f = theano.function([a, b], out) # compile function
print f([1, 2], [4, 5]) # prints [ 25. 49.]
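The exercise's expression is the expansion of ``(a + b) ** 2``, which explains the expected output; a quick NumPy check:

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([4.0, 5.0])
out = a ** 2 + b ** 2 + 2 * a * b
print(out)           # [25. 49.]
print((a + b) ** 2)  # same values: the expression is (a + b) ** 2
```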
......@@ -4,53 +4,56 @@
Conditions
==========
**IfElse vs switch**
IfElse vs Switch
================
- Build condition over symbolic variables.
- IfElse Op takes a `boolean` condition and two variables to compute as input.
- Switch take a `tensor` as condition and two variables to compute as input.
- Switch is an elementwise operation. It is more general than IfElse.
- While Switch Op evaluates both 'output' variables, IfElse Op is lazy and only
evaluates one variable respect to the condition.
- Both ops build a condition over symbolic variables.
- ``IfElse`` takes a *boolean* condition and two variables as inputs.
- ``Switch`` takes a *tensor* as condition and two variables as inputs.
``switch`` is an elementwise operation and is thus more general than ``ifelse``.
- Whereas ``switch`` evaluates both *output* variables, ``ifelse`` is lazy and only
evaluates one variable with respect to the condition.
**Example**
.. code-block:: python
from theano import tensor as T
from theano.ifelse import ifelse
import theano, time, numpy
a,b = T.scalars('a','b')
x,y = T.matrices('x','y')
a,b = T.scalars('a', 'b')
x,y = T.matrices('x', 'y')
z_switch = T.switch(T.lt(a,b), T.mean(x), T.mean(y))
z_lazy = ifelse(T.lt(a,b), T.mean(x), T.mean(y))
z_switch = T.switch(T.lt(a, b), T.mean(x), T.mean(y))
z_lazy = ifelse(T.lt(a, b), T.mean(x), T.mean(y))
f_switch = theano.function([a,b,x,y], z_switch,
f_switch = theano.function([a, b, x, y], z_switch,
mode=theano.Mode(linker='vm'))
f_lazyifelse = theano.function([a,b,x,y], z_lazy,
f_lazyifelse = theano.function([a, b, x, y], z_lazy,
mode=theano.Mode(linker='vm'))
val1 = 0.
val2 = 1.
big_mat1 = numpy.ones((10000,1000))
big_mat2 = numpy.ones((10000,1000))
big_mat1 = numpy.ones((10000, 1000))
big_mat2 = numpy.ones((10000, 1000))
n_times = 10
tic = time.clock()
for i in xrange(n_times):
f_switch(val1, val2, big_mat1, big_mat2)
print 'time spent evaluating both values %f sec'%(time.clock()-tic)
print 'time spent evaluating both values %f sec' % (time.clock() - tic)
tic = time.clock()
for i in xrange(n_times):
f_lazyifelse(val1, val2, big_mat1, big_mat2)
print 'time spent evaluating one value %f sec'%(time.clock()-tic)
print 'time spent evaluating one value %f sec' % (time.clock() - tic)
In this example, IfElse Op spend less time (about an half) than Switch
since it computes only one variable instead of both.
In this example, the ``IfElse`` op spends less time (about half as much) than ``Switch``
since it computes only one variable out of the two.
.. code-block:: python
......@@ -59,11 +62,10 @@ since it computes only one variable instead of both.
time spent evaluating one value 0.3500 sec
It is actually important to use ``linker='vm'`` or ``linker='cvm'``,
otherwise IfElse will compute both variables and take the same computation
time as the Switch Op. The linker is not currently set by default to 'cvm' but
it will be in a near future.
Unless ``linker='vm'`` or ``linker='cvm'`` are used, ``ifelse`` will compute both
variables and take the same computation time as ``switch``. Although the linker
is not currently set by default to ``cvm``, it will be in the near future.
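The eager-versus-lazy behaviour can be sketched in plain Python (hypothetical branch functions, not Theano internals): like ``switch``, ``numpy.where`` receives both branch values already computed, while a Python conditional expression, like a lazy ``ifelse``, computes only the taken branch:

```python
import numpy as np

calls = {"then": 0, "else": 0}

def then_branch():
    calls["then"] += 1
    return np.mean(np.ones((10, 10)))

def else_branch():
    calls["else"] += 1
    return np.mean(np.zeros((10, 10)))

# switch-like: both branches are evaluated before the selection
eager = np.where(0 < 1, then_branch(), else_branch())

# ifelse-like: only the branch that is selected gets evaluated
lazy = then_branch() if 0 < 1 else else_branch()

print(calls)  # {'then': 2, 'else': 1}
```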
There is not an optimization to automatically change a switch with a
broadcasted scalar to an ifelse, as this is not always the faster. See
There is no automatic optimization replacing a ``switch`` with a
broadcasted scalar to an ``ifelse``, as this is not always faster. See
this `ticket <http://www.assembla.com/spaces/theano/tickets/764>`_.
#!/usr/bin/env python
# Theano tutorial
# Solution to Exercise in section 'Extending Theano'
import unittest
import theano
# 1. Op returns x * y
class ProdOp(theano.Op):
def __eq__(self, other):
return type(self) == type(other)
def __hash__(self):
return hash(type(self))
def __str__(self):
return self.__class__.__name__
def make_node(self, x, y):
x = theano.tensor.as_tensor_variable(x)
y = theano.tensor.as_tensor_variable(y)
outdim = x.ndim
output = (theano.tensor.TensorType
(dtype=theano.scalar.upcast(x.dtype, y.dtype),
broadcastable=[False] * outdim)())
return theano.Apply(self, inputs=[x, y], outputs=[output])
def perform(self, node, inputs, output_storage):
x, y = inputs
z = output_storage[0]
z[0] = x * y
def infer_shape(self, node, i0_shapes):
return [i0_shapes[0]]
def grad(self, inputs, output_grads):
return [output_grads[0] * inputs[1], output_grads[0] * inputs[0]]
# 2. Op returns x + y and x - y
class SumDiffOp(theano.Op):
def __eq__(self, other):
return type(self) == type(other)
def __hash__(self):
return hash(type(self))
def __str__(self):
return self.__class__.__name__
def make_node(self, x, y):
x = theano.tensor.as_tensor_variable(x)
y = theano.tensor.as_tensor_variable(y)
outdim = x.ndim
output1 = (theano.tensor.TensorType
(dtype=theano.scalar.upcast(x.dtype, y.dtype),
broadcastable=[False] * outdim)())
output2 = (theano.tensor.TensorType
(dtype=theano.scalar.upcast(x.dtype, y.dtype),
broadcastable=[False] * outdim)())
return theano.Apply(self, inputs=[x, y], outputs=[output1, output2])
def perform(self, node, inputs, output_storage):
x, y = inputs
z1, z2 = output_storage
z1[0] = x + y
z2[0] = x - y
def infer_shape(self, node, i0_shapes):
return [i0_shapes[0], i0_shapes[0]]
def grad(self, inputs, output_grads):
og1, og2 = output_grads
if og1 is None:
og1 = theano.tensor.zeros_like(og2)
if og2 is None:
og2 = theano.tensor.zeros_like(og1)
return [og1 + og2, og1 - og2]
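The ``grad`` methods above can be sanity-checked numerically without Theano. For ``z = x * y`` summed to a scalar, ``ProdOp.grad`` returns ``g * y`` and ``g * x``, which should agree with a finite-difference estimate (a NumPy sketch):

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.rand(3, 2)
y = rng.rand(3, 2)

# analytic gradients of sum(x * y), i.e. ProdOp.grad with upstream g = 1
gx = np.ones_like(x) * y
gy = np.ones_like(y) * x

# finite-difference check on one entry of x
eps = 1e-6
x_pert = x.copy()
x_pert[0, 0] += eps
fd = (np.sum(x_pert * y) - np.sum(x * y)) / eps

print(gx[0, 0], fd)  # nearly identical
```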
# 3. Testing apparatus
import numpy
from theano.gof import Op, Apply
from theano import tensor, function, printing
from theano.tests import unittest_tools as utt
class TestProdOp(utt.InferShapeTester):
rng = numpy.random.RandomState(43)
def setUp(self):
super(TestProdOp, self).setUp()
self.op_class = ProdOp # case 1
def test_perform(self):
x = theano.tensor.matrix()
y = theano.tensor.matrix()
f = theano.function([x, y], self.op_class()(x, y))
x_val = numpy.random.rand(5, 4)
y_val = numpy.random.rand(5, 4)
out = f(x_val, y_val)
assert numpy.allclose(x_val * y_val, out)
def test_gradient(self):
utt.verify_grad(self.op_class(), [numpy.random.rand(5, 4),
numpy.random.rand(5, 4)],
n_tests=1, rng=TestProdOp.rng)
def test_infer_shape(self):
x = tensor.dmatrix()
y = tensor.dmatrix()
self._compile_and_check([x, y], [self.op_class()(x, y)],
[numpy.random.rand(5, 6),
numpy.random.rand(5, 6)],
self.op_class)
class TestSumDiffOp(utt.InferShapeTester):
rng = numpy.random.RandomState(43)
def setUp(self):
super(TestSumDiffOp, self).setUp()
self.op_class = SumDiffOp
def test_perform(self):
x = theano.tensor.matrix()
y = theano.tensor.matrix()
f = theano.function([x, y], self.op_class()(x, y))
x_val = numpy.random.rand(5, 4)
y_val = numpy.random.rand(5, 4)
out = f(x_val, y_val)
assert numpy.allclose([x_val + y_val, x_val - y_val], out)
def test_gradient(self):
def output_0(x, y):
return self.op_class()(x, y)[0]
def output_1(x, y):
return self.op_class()(x, y)[1]
utt.verify_grad(output_0, [numpy.random.rand(5, 4),
numpy.random.rand(5, 4)],
n_tests=1, rng=TestSumDiffOp.rng)
utt.verify_grad(output_1, [numpy.random.rand(5, 4),
numpy.random.rand(5, 4)],
n_tests=1, rng=TestSumDiffOp.rng)
def test_infer_shape(self):
x = tensor.dmatrix()
y = tensor.dmatrix()
# adapt the choice of the next instruction to the op under test
self._compile_and_check([x, y], self.op_class()(x, y),
[numpy.random.rand(5, 6),
numpy.random.rand(5, 6)],
self.op_class)
if __name__ == "__main__":
unittest.main()
......@@ -8,33 +8,46 @@ Frequently Asked Questions
TypeError: object of type 'TensorVariable' has no len()
-------------------------------------------------------
If you receive this error:
If you receive the following error, it is because the Python function *__len__* cannot
be implemented on Theano variables:
.. code-block:: python
TypeError: object of type 'TensorVariable' has no len()
We can't implement the __len__ function on Theano Variables. This is
because Python requires that this function returns an integer, but we
can't do this as we are working with symbolic variables. You can use
`var.shape[0]` as a workaround.
Python requires that *__len__* returns an integer, yet it cannot be done as Theano's variables are symbolic. However, `var.shape[0]` can be used as a workaround.
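The constraint is Python's, not Theano's: ``__len__`` must return a plain non-negative integer, which a symbolic variable cannot supply. A minimal illustration with a toy class (hypothetical, for demonstration only):

```python
class Symbolic(object):
    """A toy stand-in for a symbolic variable with unknown length."""
    def __len__(self):
        # a symbolic length is not an integer, so Python rejects it
        return "unknown"

try:
    len(Symbolic())
except TypeError as e:
    print("TypeError:", e)
```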
Also we can't change the above error message into a more explicit one
because of some other Python internal behavior that can't be modified.
This error message cannot be made more explicit because the relevant aspects of Python's
internals cannot be modified.
Faster gcc optimization
-----------------------
You can enable faster gcc optimization with the cxxflags. This list of flags was suggested on the mailing list::
You can enable faster gcc optimization with the ``cxxflags``. This list of flags was suggested on the mailing list::
cxxflags=-march=native -O3 -ffast-math -ftree-loop-distribution -funroll-loops -ftracer
Use it at your own risk. Some people warned that the -ftree-loop-distribution optimization caused them wrong results in the past.
Also the -march=native must be used with care if you have NFS. In that case, you MUST set the compiledir to a local path of the computer.
Use it at your own risk. Some people warned that the ``-ftree-loop-distribution`` optimization resulted in wrong results in the past.
Also the ``-march=native`` flag must be used with care if you have NFS. In that case, you MUST set the compiledir to a local path of the computer.
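The same settings can live in a ``.theanorc`` file instead of ``THEANO_FLAGS``. A sketch (the flag list is the one above; the path is illustrative, pick a local disk on your machine):

```ini
[global]
# With an NFS home directory, point the compilation cache at a local disk.
base_compiledir = /tmp/theano_compiledir

[gcc]
cxxflags = -march=native -O3 -ffast-math -ftree-loop-distribution -funroll-loops -ftracer
```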
Related Projects
----------------
We try to list in this `wiki page <https://github.com/Theano/Theano/wiki/Related-projects>`_ other Theano related projects.
"What are Theano's Limitations?"
--------------------------------
Theano offers a good amount of flexibility, but has some limitations too.
You must answer for yourself the following question: How can my algorithm be cleverly written
so as to make the most of what Theano can do?
Here is a list of some of the known limitations:
- *While*- or *for*-Loops within an expression graph are supported, but only via
the :func:`theano.scan` op (which puts restrictions on how the loop body can
interact with the rest of the graph).
- Neither *goto* nor *recursion* is supported or planned within expression graphs.
......@@ -7,54 +7,130 @@ PyCUDA/CUDAMat/Gnumpy compatibility
PyCUDA
======
Currently, PyCUDA and Theano have different objects to store GPU
data. The two implementations do not support the same set of features.
Theano's implementation is called *CudaNdarray* and supports
*strides*. It also only supports the *float32* dtype. PyCUDA's implementation
is called *GPUArray* and doesn't support *strides*. However, it can deal with
all NumPy and CUDA dtypes.
We are currently working on a common base object for both that will
also mimic NumPy. Until this is ready, here is some information on how to
use both objects in the same script.
Transfer
--------
You can use the ``theano.misc.pycuda_utils`` module to convert GPUArrays to and
from CudaNdarrays. The functions ``to_cudandarray(x, copyif=False)`` and
``to_gpuarray(x)`` return a new object that occupies the same memory space
as the original; otherwise they raise a *ValueError*. Because GPUArrays don't
support strides, a strided CudaNdarray must first be copied to a contiguous
one. The resulting GPUArray won't share the same
memory region. If you want this behavior, set ``copyif=True`` in
``to_gpuarray``.
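The following NumPy-only analogy (not Theano or PyCUDA code) illustrates the stride issue: making a strided array contiguous requires a copy, and the copy no longer shares memory with the original, just as with ``copyif=True``:

```python
import numpy

a = numpy.arange(12, dtype=numpy.float32).reshape(3, 4)
strided = a[:, ::2]                            # non-contiguous view into a
assert not strided.flags['C_CONTIGUOUS']

contiguous = numpy.ascontiguousarray(strided)  # analogous to copyif=True
assert contiguous.flags['C_CONTIGUOUS']
assert not numpy.shares_memory(a, contiguous)  # the copy is independent
```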
Compiling with PyCUDA
---------------------
You can use PyCUDA to compile CUDA functions that work directly on
CudaNdarrays. Here is an example from the file ``theano/misc/tests/test_pycuda_theano_simple.py``:
.. code-block:: python

    import numpy

    import theano
    import theano.sandbox.cuda as cuda_ndarray
    import theano.misc.pycuda_init

    import pycuda
    import pycuda.driver as drv
    import pycuda.gpuarray


    def test_pycuda_theano():
        """Simple example with PyCUDA function and Theano CudaNdarray object."""
        from pycuda.compiler import SourceModule
        mod = SourceModule("""
        __global__ void multiply_them(float *dest, float *a, float *b)
        {
            const int i = threadIdx.x;
            dest[i] = a[i] * b[i];
        }
        """)
        multiply_them = mod.get_function("multiply_them")

        a = numpy.random.randn(100).astype(numpy.float32)
        b = numpy.random.randn(100).astype(numpy.float32)

        # Test with Theano object
        ga = cuda_ndarray.CudaNdarray(a)
        gb = cuda_ndarray.CudaNdarray(b)
        dest = cuda_ndarray.CudaNdarray.zeros(a.shape)
        # Launch one thread per element (100 elements here)
        multiply_them(dest, ga, gb,
                      block=(100, 1, 1), grid=(1, 1))
        assert (numpy.asarray(dest) == a * b).all()
Theano Op using a PyCUDA function
---------------------------------
You can use a GPU function compiled with PyCUDA in a Theano op:
.. code-block:: python

    import numpy

    import theano
    import theano.misc.pycuda_init
    from pycuda.compiler import SourceModule
    import theano.sandbox.cuda as cuda


    class PyCUDADoubleOp(theano.Op):
        def __eq__(self, other):
            return type(self) == type(other)

        def __hash__(self):
            return hash(type(self))

        def __str__(self):
            return self.__class__.__name__

        def make_node(self, inp):
            inp = cuda.basic_ops.gpu_contiguous(
                cuda.basic_ops.as_cuda_ndarray_variable(inp))
            assert inp.dtype == "float32"
            return theano.Apply(self, [inp], [inp.type()])

        def make_thunk(self, node, storage_map, _, _2):
            mod = SourceModule("""
            __global__ void my_fct(float *i0, float *o0, int size) {
                int i = blockIdx.x * blockDim.x + threadIdx.x;
                if (i < size) {
                    o0[i] = i0[i] * 2;
                }
            }""")
            pycuda_fct = mod.get_function("my_fct")
            inputs = [storage_map[v] for v in node.inputs]
            outputs = [storage_map[v] for v in node.outputs]

            def thunk():
                z = outputs[0]
                if z[0] is None or z[0].shape != inputs[0][0].shape:
                    z[0] = cuda.CudaNdarray.zeros(inputs[0][0].shape)
                grid = (int(numpy.ceil(inputs[0][0].size / 512.)), 1)
                pycuda_fct(inputs[0][0], z[0], numpy.intc(inputs[0][0].size),
                           block=(512, 1, 1), grid=grid)
            return thunk
CUDAMat
=======
There are functions for conversion between CUDAMat objects and Theano's CudaNdarray objects.
They follow the same principles as the PyCUDA converters above and can be found in
``theano.misc.cudamat_utils.py``.
.. TODO: this statement is unclear:
WARNING: There is a peculiar problem associated with stride/shape with those converters.
In order to work, the test needs a *transpose* and *reshape*...
Gnumpy
======
There are conversion functions between Gnumpy *garray* objects and Theano CudaNdarray objects.
They follow the same principles as the PyCUDA converters and can be found in ``theano.misc.gnumpy_utils.py``.
......@@ -5,20 +5,21 @@
Tutorial
========
Let us start an interactive session (e.g. with ``python`` or ``ipython``) and import Theano.
>>> from theano import *
Several of the symbols you will need to use are in the ``tensor`` subpackage
of Theano. Let us import that subpackage under a handy name like
``T`` (the tutorials will frequently use this convention).
>>> import theano.tensor as T
If that succeeded you are ready for the tutorial, otherwise check your
installation (see :ref:`install`).
Throughout the tutorial, bear in mind that there is a :ref:`glossary` as well
as *index* and *modules* links in the upper-right corner of each page to help
you out.
.. toctree::
......@@ -27,18 +28,18 @@ you out.
   numpy
   adding
   examples
   symbolic_graphs
   printing_drawing
   gradients
   modes
   loading_and_saving
   conditions
   loop
   sparse
   using_gpu
   gpu_data_convert
   aliasing
   shape_info
   remarks
   debug_faq
   extending_theano
   faq
......@@ -6,8 +6,8 @@ Loading and Saving
==================
Python's standard way of saving class instances and reloading them
is the pickle_ mechanism. Many Theano objects can be *serialized* (and
*deserialized*) by ``pickle``, however, a limitation of ``pickle`` is that
it does not save the code or data of a class along with the instance of
the class being serialized. As a result, reloading objects created by a
previous version of a class can be really problematic.
......@@ -24,7 +24,7 @@ as you would in the course of any other Python program.
.. _pickle: http://docs.python.org/library/pickle.html
The Basics of Pickling
======================
The two modules ``pickle`` and ``cPickle`` have the same functionalities, but
......@@ -45,7 +45,7 @@ You can serialize (or *save*, or *pickle*) objects to a file with
.. note::
    If you want your saved object to be stored efficiently, don't forget
    to use ``cPickle.HIGHEST_PROTOCOL``. The resulting file can be
    dozens of times smaller than with the default protocol.
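For example (shown here with Python's ``pickle`` module, which ``cPickle`` mirrors; protocol 0 is the old text-based default), the highest protocol can shrink the output considerably:

```python
import pickle

data = list(range(1000))
default = pickle.dumps(data, protocol=0)                 # text-based protocol
highest = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
assert len(highest) < len(default)                       # binary is much smaller
```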
.. note::
......@@ -81,7 +81,7 @@ For more details about pickle's usage, see
`Python documentation <http://docs.python.org/library/pickle.html#usage>`_.
Short-Term Serialization
========================
If you are confident that the class instance you are serializing will be
......@@ -114,7 +114,7 @@ For instance, you can define functions along the lines of:
        self.training_set = cPickle.load(file(self.training_set_file, 'rb'))
Long-Term Serialization
=======================
If the implementation of the class you want to save is quite unstable, for
......@@ -126,7 +126,7 @@ maybe defining the attributes you want to save, rather than the ones you
don't.
For instance, if the only parameters you want to save are a weight
matrix *W* and a bias *b*, you can define:
.. code-block:: python
......@@ -138,8 +138,8 @@ matrix ``W`` and a bias ``b``, you can define:
        self.W = W
        self.b = b
If at some point in time *W* is renamed to *weights* and *b* to
*bias*, the older pickled files will still be usable, if you update these
functions to reflect the change in name:
.. code-block:: python
......@@ -152,6 +152,6 @@ functions to reflect the change in name:
        self.weights = W
        self.bias = b
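Here is a self-contained sketch of this pattern (the class name ``Layer`` is hypothetical): ``__getstate__`` keeps writing the old attribute names, and ``__setstate__`` maps them onto the new ones, so pickles made before the rename keep loading:

```python
import pickle

class Layer(object):
    def __init__(self, W, b):
        self.weights = W                 # renamed from W
        self.bias = b                    # renamed from b

    def __getstate__(self):
        # Keep the on-disk names stable across the rename.
        return {'W': self.weights, 'b': self.bias}

    def __setstate__(self, state):
        # Older pickles stored 'W' and 'b'; map them to the new names.
        self.weights = state['W']
        self.bias = state['b']

layer = pickle.loads(pickle.dumps(Layer([1.0, 2.0], 0.0)))
```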
For more information on advanced use of ``pickle`` and its internals, see Python's
pickle_ documentation.
......@@ -4,4 +4,94 @@
Loop
====
Scan
====
- A general form of *recurrence*, which can be used for looping.
- *Reduction* and *map* (loop over the leading dimensions) are special cases of ``scan``.
- You ``scan`` a function along some input sequence, producing an output at each time-step.
- The function can see the *previous K time-steps* of its output.
- ``sum()`` could be computed by scanning the *z + x(i)* function over a list, given an initial state of *z=0*.
- Often a *for* loop can be expressed as a ``scan()`` operation, and ``scan`` is the closest that Theano comes to looping.
- Advantages of using ``scan`` over *for* loops:
  - The number of iterations can be part of the symbolic graph.
- Minimizes GPU transfers (if GPU is involved).
- Computes gradients through sequential steps.
- Slightly faster than using a *for* loop in Python with a compiled Theano function.
- Can lower the overall memory usage by detecting the actual amount of memory needed.
The full documentation can be found in the library: :ref:`Scan <lib_scan>`.
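As a rough illustration (plain Python, not Theano's actual implementation), the recurrence that ``scan`` expresses looks like this, with ``sum()`` recovered as a scan of *z + x(i)* from *z = 0*:

```python
def scan_sketch(fn, sequence, outputs_info):
    """Minimal sketch of scan: thread a state through a sequence."""
    outputs = []
    state = outputs_info                 # initial state, e.g. z = 0
    for x in sequence:
        state = fn(x, state)             # one "time-step" of the recurrence
        outputs.append(state)
    return outputs                       # scan returns every time-step

# sum() as a scan of z + x(i) with initial state z = 0:
partial_sums = scan_sketch(lambda x, z: z + x, [1, 2, 3, 4], 0)
total = partial_sums[-1]                 # keep only the last value
```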
**Scan Example: Computing pow(A,k)**
.. code-block:: python

    import theano
    import theano.tensor as T

    theano.config.warn.subtensor_merge_bug = False

    k = T.iscalar("k")
    A = T.vector("A")

    def inner_fct(prior_result, A):
        return prior_result * A

    # Symbolic description of the result
    result, updates = theano.scan(fn=inner_fct,
                                  outputs_info=T.ones_like(A),
                                  non_sequences=A, n_steps=k)

    # Scan has provided us with A ** 1 through A ** k.  Keep only the last
    # value. Scan notices this and does not waste memory saving them.
    final_result = result[-1]

    power = theano.function(inputs=[A, k], outputs=final_result,
                            updates=updates)

    print power(range(10), 2)
    # [ 0.  1.  4.  9.  16.  25.  36.  49.  64.  81.]
**Scan Example: Calculating a Polynomial**
.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T

    theano.config.warn.subtensor_merge_bug = False

    coefficients = T.vector("coefficients")
    x = T.scalar("x")
    max_coefficients_supported = 10000

    # Generate the components of the polynomial
    full_range = T.arange(max_coefficients_supported)
    components, updates = theano.scan(fn=lambda coeff, power, free_var:
                                      coeff * (free_var ** power),
                                      outputs_info=None,
                                      sequences=[coefficients, full_range],
                                      non_sequences=x)
    polynomial = components.sum()
    calculate_polynomial = theano.function(inputs=[coefficients, x],
                                           outputs=polynomial)

    test_coeff = numpy.asarray([1, 0, 2], dtype=numpy.float32)
    print calculate_polynomial(test_coeff, 3)
    # 19.0
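As a quick sanity check of the expected output, the same polynomial can be evaluated in plain Python:

```python
# Evaluate 1 * 3**0 + 0 * 3**1 + 2 * 3**2 without Theano
coefficients = [1, 0, 2]
x = 3
value = sum(c * x ** p for p, c in enumerate(coefficients))
assert value == 19                       # matches the Theano result above
```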
-------------------------------------------
**Exercise**
Run both examples.
Modify and execute the polynomial example to have the reduction done by ``scan``.
:download:`Solution<loop_solution_1.py>`
#!/usr/bin/env python
# Theano tutorial
# Solution to Exercise in section 'Loop'

import numpy
import theano
import theano.tensor as tt

# 1. First example
theano.config.warn.subtensor_merge_bug = False

k = tt.iscalar("k")
A = tt.vector("A")


def inner_fct(prior_result, A):
    return prior_result * A

# Symbolic description of the result
result, updates = theano.scan(fn=inner_fct,
                              outputs_info=tt.ones_like(A),
                              non_sequences=A, n_steps=k)

# Scan has provided us with A ** 1 through A ** k.  Keep only the last
# value. Scan notices this and does not waste memory saving them.
final_result = result[-1]

power = theano.function(inputs=[A, k], outputs=final_result,
                        updates=updates)

print power(range(10), 2)
# [ 0.  1.  4.  9.  16.  25.  36.  49.  64.  81.]

# 2. Second example
coefficients = tt.vector("coefficients")
x = tt.scalar("x")
max_coefficients_supported = 10000

# Generate the components of the polynomial
full_range = tt.arange(max_coefficients_supported)
components, updates = theano.scan(fn=lambda coeff, power, free_var:
                                  coeff * (free_var ** power),
                                  sequences=[coefficients, full_range],
                                  outputs_info=None,
                                  non_sequences=x)
polynomial = components.sum()
calculate_polynomial1 = theano.function(inputs=[coefficients, x],
                                        outputs=polynomial)

test_coeff = numpy.asarray([1, 0, 2], dtype=numpy.float32)
print calculate_polynomial1(test_coeff, 3)
# 19.0

# 3. Reduction performed inside scan
theano.config.warn.subtensor_merge_bug = False

coefficients = tt.vector("coefficients")
x = tt.scalar("x")
max_coefficients_supported = 10000

# Generate the components of the polynomial
full_range = tt.arange(max_coefficients_supported)
outputs_info = tt.as_tensor_variable(numpy.asarray(0, 'float64'))

components, updates = theano.scan(
    fn=lambda coeff, power, prior_value, free_var:
        prior_value + (coeff * (free_var ** power)),
    sequences=[coefficients, full_range],
    outputs_info=outputs_info,
    non_sequences=x)
polynomial = components[-1]
calculate_polynomial = theano.function(inputs=[coefficients, x],
                                       outputs=polynomial, updates=updates)

test_coeff = numpy.asarray([1, 0, 2], dtype=numpy.float32)
print calculate_polynomial(test_coeff, 3)
# 19.0
#!/usr/bin/env python
# Theano tutorial
# Solution to Exercise in section 'Configuration Settings and Compiling Modes'

import numpy
import theano
import theano.tensor as tt

theano.config.floatX = 'float32'

rng = numpy.random

N = 400
feats = 784
D = (rng.randn(N, feats).astype(theano.config.floatX),
     rng.randint(size=N, low=0, high=2).astype(theano.config.floatX))
training_steps = 10000

# Declare Theano symbolic variables
x = tt.matrix("x")
y = tt.vector("y")
w = theano.shared(rng.randn(feats).astype(theano.config.floatX), name="w")
b = theano.shared(numpy.asarray(0., dtype=theano.config.floatX), name="b")
x.tag.test_value = D[0]
y.tag.test_value = D[1]
#print "Initial model:"
#print w.get_value(), b.get_value()

# Construct Theano expression graph
p_1 = 1 / (1 + tt.exp(-tt.dot(x, w) - b))  # Probability of having a one
prediction = p_1 > 0.5  # The prediction that is done: 0 or 1
xent = -y * tt.log(p_1) - (1 - y) * tt.log(1 - p_1)  # Cross-entropy
cost = tt.cast(xent.mean(), 'float32') + \
       0.01 * (w ** 2).sum()  # The cost to optimize
gw, gb = tt.grad(cost, [w, b])

# Compile expressions to functions
train = theano.function(
    inputs=[x, y],
    outputs=[prediction, xent],
    updates={w: w - 0.01 * gw, b: b - 0.01 * gb},
    name="train")
predict = theano.function(inputs=[x], outputs=prediction,
                          name="predict")

if any([x.op.__class__.__name__ in ['Gemv', 'CGemv', 'Gemm', 'CGemm'] for x in
        train.maker.fgraph.toposort()]):
    print 'Used the cpu'
elif any([x.op.__class__.__name__ in ['GpuGemm', 'GpuGemv'] for x in
          train.maker.fgraph.toposort()]):
    print 'Used the gpu'
else:
    print 'ERROR, not able to tell if theano used the cpu or the gpu'
    print train.maker.fgraph.toposort()

for i in range(training_steps):
    pred, err = train(D[0], D[1])
#print "Final model:"
#print w.get_value(), b.get_value()

print "target values for D"
print D[1]
print "prediction on D"
print predict(D[0])
......@@ -24,7 +24,7 @@ where each example has dimension 5. If this would be the input of a
neural network then the weights from the input to the first hidden
layer would represent a matrix of size (5, #hid).
Consider this array:
>>> numpy.asarray([[1., 2], [3, 4], [5, 6]])
array([[ 1., 2.],
......@@ -37,7 +37,7 @@ This is a 3x2 matrix, i.e. there are 3 rows and 2 columns.
To access the entry in the 3rd row (row #2) and the 1st column (column #0):
>>> numpy.asarray([[1., 2], [3, 4], [5, 6]])[2, 0]
5.0
......@@ -61,5 +61,5 @@ array([2., 4., 6.])
The smaller array ``b`` (actually a scalar here, which works like a 0-d array) in this case is *broadcasted* to the same size
as ``a`` during the multiplication. This trick is often useful in
simplifying how expressions are written. More detail about *broadcasting*
can be found in the `numpy user guide <http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html>`__.
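Broadcasting is not limited to scalars; for instance, a 1-d array can be broadcast against a 2-d one when the trailing dimensions match:

```python
import numpy

m = numpy.ones((2, 3))
row = numpy.asarray([10., 20., 30.])
result = m + row          # row is replicated across both rows of m
# result is [[11., 21., 31.], [11., 21., 31.]]
```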
.. _tutorial_printing_drawing:
==============================
Printing/Drawing Theano graphs
==============================
.. TODO: repair the defective links in the next paragraph
Theano provides two functions (:func:`theano.pp` and
:func:`theano.printing.debugprint`) to print a graph to the terminal before or after
compilation. These two functions print expression graphs in different ways:
:func:`pp` is more compact and math-like, :func:`debugprint` is more verbose.
Theano also provides :func:`pydotprint` that creates a *png* image of the function.
You can read about them in :ref:`libdoc_printing`.
Consider again the logistic regression example, but notice the additional printing instructions.
The following output depicts the pre- and post-compilation graphs.
.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T

    rng = numpy.random

    N = 400
    feats = 784
    D = (rng.randn(N, feats).astype(theano.config.floatX),
         rng.randint(size=N, low=0, high=2).astype(theano.config.floatX))
    training_steps = 10000

    # Declare Theano symbolic variables
    x = T.matrix("x")
    y = T.vector("y")
    w = theano.shared(rng.randn(feats).astype(theano.config.floatX), name="w")
    b = theano.shared(numpy.asarray(0., dtype=theano.config.floatX), name="b")
    x.tag.test_value = D[0]
    y.tag.test_value = D[1]
    #print "Initial model:"
    #print w.get_value(), b.get_value()

    # Construct Theano expression graph
    p_1 = 1 / (1 + T.exp(-T.dot(x, w) - b))  # Probability of having a one
    prediction = p_1 > 0.5  # The prediction that is done: 0 or 1
    xent = -y * T.log(p_1) - (1 - y) * T.log(1 - p_1)  # Cross-entropy
    cost = xent.mean() + 0.01 * (w ** 2).sum()  # The cost to optimize
    gw, gb = T.grad(cost, [w, b])

    # Compile expressions to functions
    train = theano.function(
        inputs=[x, y],
        outputs=[prediction, xent],
        updates={w: w - 0.01 * gw, b: b - 0.01 * gb},
        name="train")
    predict = theano.function(inputs=[x], outputs=prediction,
                              name="predict")

    if any([x.op.__class__.__name__ == 'Gemv' for x in
            train.maker.fgraph.toposort()]):
        print 'Used the cpu'
    elif any([x.op.__class__.__name__ == 'GpuGemm' for x in
              train.maker.fgraph.toposort()]):
        print 'Used the gpu'
    else:
        print 'ERROR, not able to tell if theano used the cpu or the gpu'
        print train.maker.fgraph.toposort()

    for i in range(training_steps):
        pred, err = train(D[0], D[1])
    #print "Final model:"
    #print w.get_value(), b.get_value()

    print "target values for D"
    print D[1]
    print "prediction on D"
    print predict(D[0])

    # Print the picture graphs
    # after compilation
    theano.printing.pydotprint(predict,
                               outfile="pics/logreg_pydotprint_predic.png",
                               var_with_name_simple=True)
    # before compilation
    theano.printing.pydotprint_variables(prediction,
                                         outfile="pics/logreg_pydotprint_prediction.png",
                                         var_with_name_simple=True)
    theano.printing.pydotprint(train,
                               outfile="pics/logreg_pydotprint_train.png",
                               var_with_name_simple=True)
Pretty Printing
===============
``theano.printing.pprint(variable)``
>>> theano.printing.pprint(prediction) # (pre-compilation)
gt((TensorConstant{1} / (TensorConstant{1} + exp(((-(x \\dot w)) - b)))),TensorConstant{0.5})
Debug Printing
==============
``theano.printing.debugprint({fct, variable, list of variables})``
>>> theano.printing.debugprint(prediction) # (pre-compilation)
Elemwise{gt,no_inplace} [@181772236] ''
|Elemwise{true_div,no_inplace} [@181746668] ''
| |InplaceDimShuffle{x} [@181746412] ''
| | |TensorConstant{1} [@181745836]
| |Elemwise{add,no_inplace} [@181745644] ''
| | |InplaceDimShuffle{x} [@181745420] ''
| | | |TensorConstant{1} [@181744844]
| | |Elemwise{exp,no_inplace} [@181744652] ''
| | | |Elemwise{sub,no_inplace} [@181744012] ''
| | | | |Elemwise{neg,no_inplace} [@181730764] ''
| | | | | |dot [@181729676] ''
| | | | | | |x [@181563948]
| | | | | | |w [@181729964]
| | | | |InplaceDimShuffle{x} [@181743788] ''
| | | | | |b [@181730156]
|InplaceDimShuffle{x} [@181771788] ''
| |TensorConstant{0.5} [@181771148]
>>> theano.printing.debugprint(predict) # (post-compilation)
Elemwise{Composite{neg,{sub,{{scalar_sigmoid,GT},neg}}}} [@183160204] '' 2
|dot [@183018796] '' 1
| |x [@183000780]
| |w [@183000812]
|InplaceDimShuffle{x} [@183133580] '' 0
| |b [@183000876]
|TensorConstant{[ 0.5]} [@183084108]
Picture Printing
================
>>> theano.printing.pydotprint_variables(prediction) # (pre-compilation)
.. image:: ../hpcs2011_tutorial/pics/logreg_pydotprint_prediction.png
:width: 800 px
Notice that ``pydotprint()`` requires *Graphviz* and Python's ``pydot``.
>>> theano.printing.pydotprint(predict) # (post-compilation)
.. image:: ../hpcs2011_tutorial/pics/logreg_pydotprint_predic.png
:width: 800 px
>>> theano.printing.pydotprint(train) # This is a small train example!
.. image:: ../hpcs2011_tutorial/pics/logreg_pydotprint_train.png
:width: 1500 px
......@@ -5,7 +5,8 @@
Python tutorial
***************
In this documentation, we suppose that the reader knows Python. Here is a small list of Python
tutorials/exercises if you need to learn it or only need a refresher:
* `Python Challenge <http://www.pythonchallenge.com/>`__
* `Dive into Python <http://diveintopython.net/>`__
......
.. _tutorial_general_remarks:
=====================
Some General Remarks
=====================
Theano offers quite a bit of flexibility, but has some limitations too.
How should you write your algorithm to make the most of what Theano can do?
Limitations
-----------
- While- or for-Loops within an expression graph are supported, but only via
the :func:`theano.scan` op (which puts restrictions on how the loop body can
interact with the rest of the graph).
- Neither ``goto`` nor recursion is supported or planned within expression graphs.
.. _shape_info:
==========================================
How Shape Information is Handled by Theano
==========================================
It is not possible to strictly enforce the shape of a Theano variable when
building a graph since the particular value provided at run-time for a parameter of a
Theano function may condition the shape of the Theano variables in its graph.
Currently, information regarding shape is used in two ways in Theano:
- To generate faster C code for the 2d convolution on the CPU and the GPU,
when the exact output shape is known in advance.
- To remove computations in the graph when we only want to know the
shape, but not the actual value of a variable. This is done with the
`Op.infer_shape <http://deeplearning.net/software/theano/extending/cop.html#Op.infer_shape>`_
method.
Example:
.. code-block:: python

    import theano

    x = theano.tensor.matrix('x')
    f = theano.function([x], (x ** 2).shape)
    theano.printing.debugprint(f)
    # MakeVector [@43860304] ''   2
    #  |Shape_i{0} [@43424912] ''   1
......@@ -32,15 +32,15 @@ Currently shape informations are used for 2 things in Theano:
    #  |Shape_i{1} [@43797968] ''   0
    #  | |x [@43423568]
The output of this compiled function does not contain any multiplication
or power. Theano has removed them to compute directly the shape of the
output.
Shape Inference Problem
=======================
Theano propagates information about shape in the graph. Sometimes this
can lead to errors. Consider this example:
.. code-block:: python
......@@ -48,9 +48,9 @@ can lead to errors. For example:
    import theano

    x = theano.tensor.matrix('x')
    y = theano.tensor.matrix('y')
    z = theano.tensor.join(0, x, y)
    xv = numpy.random.rand(5, 4)
    yv = numpy.random.rand(3, 3)
    f = theano.function([x, y], z.shape)
    theano.printing.debugprint(f)
......@@ -83,61 +83,61 @@ can lead to errors. For example:
    #  |y [@44540304]

    f(xv, yv)
    # Raises a dimensions mismatch error.
As you can see, when asking only for the shape of some computation (``join`` in the
example), an inferred shape is computed directly, without executing
the computation itself (there is no ``join`` in the first output or debugprint).
This makes the computation of the shape faster, but it can also hide errors. In
this example, the computation of the shape of the output of ``join`` is done only
based on the first input Theano variable, which leads to an error.
This might happen with other ops such as ``elemwise`` and ``dot``, for example.
Indeed, to perform some optimizations (for speed or stability, for instance),
Theano assumes that the computation is correct and consistent
in the first place, as it does here.
You can detect those problems by running the code without this
optimization, using the Theano flag
``optimizer_excluding=local_shape_to_shape_i``. You can also obtain the
same effect by running in the modes ``FAST_COMPILE`` (it will not apply this
optimization, nor most other optimizations) or ``DebugMode`` (it will test
before and after all optimizations (much slower)).
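For contrast, the eager NumPy equivalent of the ``join`` above (a NumPy analogy, not Theano code) checks every dimension at call time, so the mismatch surfaces immediately:

```python
import numpy

xv = numpy.random.rand(5, 4)
yv = numpy.random.rand(3, 3)
try:
    numpy.concatenate([xv, yv], axis=0)  # second dimensions differ: 4 vs 3
    mismatch = False
except ValueError:
    mismatch = True                      # NumPy rejects inconsistent shapes
```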
Specifying Exact Shape
======================
Currently, specifying a shape is not as easy and flexible as we wish, and we plan to
improve this. Here is the current state of what can be done:
- You can pass the shape info directly to the ``ConvOp`` created
when calling ``conv2d``. You simply set the parameters ``image_shape``
and ``filter_shape`` inside the call. They must be tuples of 4
elements. For example:
.. code-block:: python

    theano.tensor.nnet.conv2d(..., image_shape=(7, 3, 5, 5), filter_shape=(2, 3, 4, 4))
- You can use the ``SpecifyShape`` op to add shape information anywhere in the
  graph. This allows Theano to perform more optimizations. In the following example,
  it makes it possible to precompute the Theano function's output as a constant.
.. code-block:: python

    import theano

    x = theano.tensor.matrix()
    x_specify_shape = theano.tensor.specify_shape(x, (2, 2))
    f = theano.function([x], (x_specify_shape ** 2).shape)
    theano.printing.debugprint(f)
    # [2 2] [@72791376]
Future Plans
============
The parameter "constant shape" will be added to ``theano.shared()``. This is probably
its most frequent use case. It will make the code
simpler and will make it possible to check that the shape does not change when
the ``shared`` variable is updated.
......@@ -4,9 +4,6 @@
Sparse
======
In general, *sparse* matrices provide the same functionality as regular
matrices. The difference lies in the way their elements are
represented and stored in memory: only the non-zero elements are stored.
......
......@@ -5,27 +5,31 @@
Theano Graphs
=============
Debugging or profiling code written in Theano is not that simple if you
do not know what goes on under the hood. This chapter is meant to
introduce you to a required minimum of the inner workings of Theano.
For more detail see :ref:`extending`.
The first step in writing Theano code is to write down all mathematical
relations using symbolic placeholders (**variables**). When writing down
these expressions you use operations like ``+``, ``-``, ``**``,
``sum()``, ``tanh()``. All these are represented internally as **ops**.
An *op* represents a certain computation on some type of inputs
producing some type of output. You can see it as a *function definition*
in most programming languages.
Theano builds internally a graph structure composed of interconnected
**variable** nodes, **op** nodes and **apply** nodes. An
*apply* node represents the application of an *op* to some
*variables*. It is important to draw the difference between the
definition of a computation represented by an *op* and its application
to some actual data which is represented by the *apply* node. For more
detail about these building blocks refer to :ref:`variable`, :ref:`op`,
:ref:`apply`. Here is an example of a graph:
**Code**
......@@ -50,9 +54,9 @@ details about these building blocks see :ref:`variable`, :ref:`op`,
WARNING: hyper-links and ref's seem to break the PDF build when placed
into this figure caption.
Arrows in this figure represent references to the
Python objects pointed at. The blue
box is an :ref:`Apply` node. Red boxes are :ref:`Variable` nodes. Green
circles are :ref:`Ops <op>`. Purple boxes are :ref:`Types <type>`.
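The relationship between these three kinds of nodes can be sketched with toy
classes (these are illustrative only, not Theano's actual implementation): an
op describes a computation, an apply node records one application of an op to
specific variables, and each variable remembers the apply node that produced
it through its ``owner`` field.

```python
class Variable(object):
    def __init__(self, name=None):
        self.name = name
        self.owner = None  # the Apply node that produces this variable, if any

class Apply(object):
    def __init__(self, op, inputs, outputs):
        self.op, self.inputs, self.outputs = op, inputs, outputs
        for out in outputs:
            out.owner = self  # link each output back to its apply node

class Op(object):
    def __init__(self, name):
        self.name = name
    def __call__(self, *inputs):
        # Applying an op builds a new Apply node and a fresh output variable.
        out = Variable()
        Apply(self, list(inputs), [out])
        return out

mul = Op('mul')
x = Variable('x')
y = mul(x, Variable('2'))
print(y.owner.op.name)         # 'mul'
print(y.owner.inputs[0].name)  # 'x'
```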
......@@ -63,17 +67,17 @@ Take for example the following code:
.. code-block:: python
x = T.dmatrix('x')
y = x * 2.
If you enter ``type(y.owner)`` you get ``<class 'theano.gof.graph.Apply'>``,
which is the apply node that connects the op and the inputs to get this
output. You can now print the name of the op that is applied to get
*y*:
>>> y.owner.op.name
'Elemwise{mul,no_inplace}'
Hence, an elementwise multiplication is used to compute *y*. This
multiplication is done between the inputs:
>>> len(y.owner.inputs)
......@@ -85,7 +89,7 @@ InplaceDimShuffle{x,x}.0
Note that the second input is not 2 as we would have expected. This is
because 2 was first :term:`broadcasted <broadcasting>` to a matrix of
same shape as *x*. This is done by using the op ``DimShuffle``:
>>> type(y.owner.inputs[1])
<class 'theano.tensor.basic.TensorVariable'>
......@@ -97,9 +101,9 @@ same shape as x. This is done by using the op ``DimShuffle`` :
[2.0]
Starting from this graph structure it is easier to understand how
*automatic differentiation* proceeds and how the symbolic relations
can be *optimized* for performance or stability.
Automatic Differentiation
......@@ -107,16 +111,19 @@ Automatic Differentiation
Having the graph structure, computing automatic differentiation is
simple. The only thing :func:`tensor.grad` has to do is to traverse the
graph from the outputs back towards the inputs through all *apply*
nodes (*apply* nodes are those that define which computations the
graph does). For each such *apply* node, its *op* defines
how to compute the *gradient* of the node's outputs with respect to its
inputs. Note that if an *op* does not provide this information,
it is assumed that the *gradient* is not defined.
Using the
`chain rule <http://en.wikipedia.org/wiki/Chain_rule>`_
these gradients can be composed in order to obtain the expression of the
*gradient* of the graph's output with respect to the graph's inputs.
A following section of this tutorial will examine the topic of :ref:`differentiation<tutcomputinggrads>`
in greater detail.
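As a toy illustration of this traversal (plain Python, not Theano's
implementation), each op below records how to push the gradient of its output
back onto its inputs, and the recorded rules are replayed from the outputs
towards the inputs:

```python
class Var(object):
    def __init__(self, value):
        self.value = value
        self.grad = 0.0

def mul(a, b, tape):
    out = Var(a.value * b.value)
    # The op's gradient rule: d(out)/da = b, d(out)/db = a.
    tape.append(lambda: (setattr(a, 'grad', a.grad + out.grad * b.value),
                         setattr(b, 'grad', b.grad + out.grad * a.value)))
    return out

def add(a, b, tape):
    out = Var(a.value + b.value)
    tape.append(lambda: (setattr(a, 'grad', a.grad + out.grad),
                         setattr(b, 'grad', b.grad + out.grad)))
    return out

tape = []
x = Var(3.0)
y = add(mul(x, x, tape), x, tape)  # y = x**2 + x
y.grad = 1.0
for backward in reversed(tape):    # chain rule, from outputs to inputs
    backward()
print(x.grad)  # dy/dx = 2*x + 1 = 7.0
```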
Optimizations
......@@ -124,7 +131,7 @@ Optimizations
When compiling a Theano function, what you give to the
:func:`theano.function <function.function>` is actually a graph
(starting from the output variables you can traverse the graph up to
the input variables). While this graph structure shows how to compute
the output from the input, it also offers the possibility to improve the
way this computation is carried out. The way optimizations work in
......@@ -135,4 +142,27 @@ identical subgraphs and ensure that the same values are not computed
twice or reformulate parts of the graph to a GPU specific version.
For example, one (simple) optimization that Theano uses is to replace
the pattern :math:`\frac{xy}{y}` by :math:`x`.
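A toy sketch of this rewrite (plain Python, not Theano's optimizer API):
expressions are nested tuples such as ``('div', ('mul', 'x', 'y'), 'y')``,
and the optimizer replaces a matching subtree by its simplified form. For
brevity it only matches one operand order.

```python
def simplify(expr):
    """Rewrite (a * y) / y -> a, recursively, on tuple expression trees."""
    if not isinstance(expr, tuple):
        return expr
    op = expr[0]
    args = [simplify(a) for a in expr[1:]]
    if (op == 'div' and isinstance(args[0], tuple)
            and args[0][0] == 'mul' and args[0][2] == args[1]):
        return args[0][1]  # (a * y) / y  ->  a
    return (op,) + tuple(args)

print(simplify(('div', ('mul', 'x', 'y'), 'y')))  # 'x'
```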
Further information regarding the optimization :ref:`process <optimization>`
and the specific :ref:`optimizations <optimizations>` that are applied
is available in the library documentation.
**Example**
Symbolic programming involves a change of paradigm that will become clearer
as we apply it. Consider the following example of an optimization:
>>> import theano
>>> a = theano.tensor.vector("a") # declare symbolic variable
>>> b = a + a ** 10 # build symbolic expression
>>> f = theano.function([a], b) # compile function
>>> print f([0, 1, 2]) # prints `array([0,2,1026])`
====================================================== =====================================================
Unoptimized graph Optimized graph
====================================================== =====================================================
.. image:: ../hpcs2011_tutorial/pics/f_unoptimized.png .. image:: ../hpcs2011_tutorial/pics/f_optimized.png
====================================================== =====================================================