Commit edfd9f24 authored by Eric Larsen, committed by Frederic

Correct Theano's tutorial: one more round of corrections

Parent dba02a39
@@ -33,12 +33,12 @@ Let's break this down into several steps. The first step is to define
two symbols (*Variables*) representing the quantities that you want
to add. Note that from now on, we will use the term
*Variable* to mean "symbol" (in other words,
*x*, *y*, *z* are all *Variable* objects). The output of the function
*f* is a ``numpy.ndarray`` with zero dimensions.

If you are following along and typing into an interpreter, you may have
noticed that there was a slight delay in executing the ``function``
instruction. Behind the scenes, *f* was being compiled into C code.

.. note::
@@ -51,9 +51,9 @@ instruction. Behind the scenes, *f* was being compiled into C code.

>>> x = theano.tensor.ivector()
>>> y = -x

*x* and *y* are both Variables, i.e. instances of the
``theano.gof.graph.Variable`` class. The
type of both *x* and *y* is ``theano.tensor.ivector``.
**Step 1**

@@ -65,9 +65,9 @@ In Theano, all symbols must be typed. In particular, ``T.dscalar``
is the type we assign to "0-dimensional arrays (`scalar`) of doubles
(`d`)". It is a Theano :ref:`type`.

``dscalar`` is not a class. Therefore, neither *x* nor *y*
are actually instances of ``dscalar``. They are instances of
:class:`TensorVariable`. *x* and *y*
are, however, assigned the Theano Type ``dscalar`` in their ``type``
field, as you can see here:
@@ -91,13 +91,13 @@ could also learn more by looking into :ref:`graphstructures`.

**Step 2**

The second step is to combine *x* and *y* into their sum *z*:

>>> z = x + y

*z* is yet another *Variable* which represents the addition of
*x* and *y*. You can use the :ref:`pp <libdoc_printing>`
function to pretty-print out the computation associated to *z*.

>>> print pp(z)
(x + y)
@@ -105,15 +105,15 @@ function to pretty-print out the computation associated to *z*.

**Step 3**

The last step is to create a function taking *x* and *y* as inputs
and giving *z* as output:

>>> f = function([x, y], z)

The first argument to :func:`function <function.function>` is a list of Variables
that will be provided as inputs to the function. The second argument
is a single Variable *or* a list of Variables. In either case, the second
argument is what we want to see as output when we apply the function. *f* may
then be used like a normal Python function.
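As a plain-NumPy aside (not Theano code) on what "a ``numpy.ndarray`` with zero dimensions" means: it is an array with an empty shape that still wraps a single scalar value.

```python
import numpy as np

# A 0-d array: empty shape, but it holds one scalar value,
# just like the result returned by the compiled function.
z = np.asarray(16.5 + 12.25)

print(z.ndim)    # 0
print(z.shape)   # ()
print(float(z))  # 28.75
```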
@@ -121,8 +121,8 @@ Adding two Matrices
===================

You might already have guessed how to do this. Indeed, the only change
from the previous example is that you need to instantiate *x* and
*y* using the matrix Types:

.. If you modify this code, also change :
.. theano/tests/test_tutorial.py:T_adding.test_adding_2
@@ -153,12 +153,12 @@ by :ref:`broadcasting <libdoc_tensor_broadcastable>`.

The following types are available:

* **byte**: ``bscalar, bvector, bmatrix, brow, bcol, btensor3, btensor4``
* **32-bit integers**: ``iscalar, ivector, imatrix, irow, icol, itensor3, itensor4``
* **64-bit integers**: ``lscalar, lvector, lmatrix, lrow, lcol, ltensor3, ltensor4``
* **float**: ``fscalar, fvector, fmatrix, frow, fcol, ftensor3, ftensor4``
* **double**: ``dscalar, dvector, dmatrix, drow, dcol, dtensor3, dtensor4``
* **complex**: ``cscalar, cvector, cmatrix, crow, ccol, ctensor3, ctensor4``

The previous list is not exhaustive and a guide to all types compatible
with NumPy arrays may be found here: :ref:`tensor creation<libdoc_tensor_creation>`.
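For readers mapping these one-letter prefixes to NumPy dtypes, a small sketch of the standard Theano naming correspondence (b = int8, i = int32, l = int64, f = float32, d = float64, c = complex64):

```python
import numpy as np

# Theano's one-letter type prefixes correspond to these NumPy dtypes
prefix_dtype = {
    "b": np.int8,       # byte
    "i": np.int32,      # 32-bit integer
    "l": np.int64,      # 64-bit integer
    "f": np.float32,    # float
    "d": np.float64,    # double
    "c": np.complex64,  # complex
}

# e.g. a "dmatrix" holds float64 values
m = np.zeros((2, 3), dtype=prefix_dtype["d"])
print(m.dtype)  # float64
```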
......
(diff collapsed)
@@ -8,11 +8,11 @@ IfElse vs Switch
================

- Both ops build a condition over symbolic variables.
- ``IfElse`` takes a *boolean* condition and two variables as inputs.
- ``Switch`` takes a *tensor* as condition and two variables as inputs.
  ``switch`` is an elementwise operation and is thus more general than ``ifelse``.
- Whereas ``switch`` evaluates both *output* variables, ``ifelse`` is lazy and only
  evaluates one variable with respect to the condition.

**Example**
@@ -52,7 +52,7 @@ IfElse vs Switch

    f_lazyifelse(val1, val2, big_mat1, big_mat2)
    print 'time spent evaluating one value %f sec' % (time.clock() - tic)

In this example, the ``IfElse`` op spends less time (about half as much) than ``Switch``
since it computes only one variable out of the two.
.. code-block:: python

@@ -64,7 +64,7 @@ since it computes only one variable out of the two.

Unless ``linker='vm'`` or ``linker='cvm'`` are used, ``ifelse`` will compute both
variables and take the same computation time as ``switch``. Although the linker
is not currently set by default to ``cvm``, it will be in the near future.

There is no automatic optimization replacing a ``switch`` with a
broadcasted scalar by an ``ifelse``, as this is not always faster. See
......
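As a NumPy/Python analogy (not the Theano API): an elementwise ``switch`` behaves like ``numpy.where``, which evaluates both branch arrays, while a lazy ``ifelse`` behaves like a Python conditional that touches only the chosen branch.

```python
import numpy as np

a = np.array([-1.0, 2.0, -3.0])

# switch-like: elementwise selection; both branch arrays are fully evaluated
elementwise = np.where(a > 0, a, -a)  # absolute value, element by element

# ifelse-like: one boolean condition; only the chosen branch is used
condition = a.sum() > 0
lazy = a if condition else -a

print(elementwise)  # [1. 2. 3.]
```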
(diff collapsed)
@@ -74,7 +74,7 @@ Computing More than one Thing at the Same Time

Theano supports functions with multiple outputs. For example, we can
compute the :ref:`elementwise <libdoc_tensor_elementwise>` difference, absolute difference, and
squared difference between two matrices *a* and *b* at the same time:

.. If you modify this code, also change :
.. theano/tests/test_tutorial.py:T_examples.test_examples_3
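The elided snippet builds one Theano function with three outputs; the three quantities themselves are easy to mirror in plain NumPy (an illustration of the computation, not Theano code):

```python
import numpy as np

def differences(a, b):
    """Return the elementwise difference, absolute difference,
    and squared difference of two arrays at once."""
    diff = a - b
    return diff, np.abs(diff), diff ** 2

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[4.0, 2.0], [1.0, 0.0]])
d, ad, sd = differences(a, b)
print(d)  # elementwise difference of a and b
```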
@@ -123,7 +123,7 @@ array(35.0)

This makes use of the :ref:`Param <function_inputs>` class which allows
you to specify properties of your function's parameters with greater detail. Here we
give a default value of 1 for *y* by creating a ``Param`` instance with
its ``default`` field set to 1.
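``Param``'s ``default`` field plays roughly the role that default arguments play in plain Python; a rough analogy only (the hypothetical *f* below is ordinary Python, not a compiled Theano function):

```python
# Rough Python analogy of Param's `default` field:
# y defaults to 1 when the caller omits it.
def f(x, y=1):
    return x + y

print(f(33))     # 34
print(f(33, 2))  # 35
```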
Inputs with default values must follow inputs without default
@@ -149,7 +149,7 @@ array(34.0)
array(33.0)

.. note::
   ``Param`` does not know the name of the local variables *y* and *w*
   that are passed as arguments. The symbolic variable objects have name
   attributes (set by ``dscalars`` in the example above) and *these* are the
   names of the keyword parameters in the functions that we build. This is
@@ -171,7 +171,7 @@ example, let's say we want to make an accumulator: at the beginning,
the state is initialized to zero. Then, on each function call, the state
is incremented by the function's argument.

First let's define the *accumulator* function. It adds its argument to the
internal state, and returns the old state value.

.. If you modify this code, also change :
@@ -187,13 +187,13 @@ so-called :ref:`shared variables<libdoc_compile_shared>`.

These are hybrid symbolic and non-symbolic variables whose value may be shared
between multiple functions. Shared variables can be used in symbolic expressions just like
the objects returned by ``dmatrices(...)`` but they also have an internal
value that defines the value taken by this symbolic variable in *all* the
functions that use it. It is called a *shared* variable because its value is
shared between many functions. The value can be accessed and modified by the
``.get_value()`` and ``.set_value()`` methods. We will come back to this soon.

The other new thing in this code is the ``updates`` parameter of ``function``.
``updates`` must be supplied with a list of pairs of the form (shared-variable, new expression).
It can also be a dictionary whose keys are shared-variables and values are
the new expressions. Either way, it means "whenever this function runs, it
will replace the ``.value`` of each shared variable with the result of the
@@ -241,9 +241,9 @@ achieve a similar result by returning the new expressions, and working with
them in NumPy as usual. The updates mechanism can be a syntactic convenience,
but it is mainly there for efficiency. Updates to shared variables can
sometimes be done more quickly using in-place algorithms (e.g. low-rank matrix
updates). Also, Theano has more control over where and how shared variables are
allocated, which is one of the important elements of getting good performance
on the :ref:`GPU<using_gpu>`.
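The accumulator's behavior (add the argument to hidden state, return the old state value) can be mimicked with a plain Python closure; this sketches the semantics only, not Theano's shared-variable and ``updates`` machinery:

```python
def make_accumulator(initial=0):
    """Return a function that adds its argument to an internal
    state and returns the state's previous value."""
    state = [initial]  # mutable cell standing in for the shared variable

    def accumulator(inc):
        old = state[0]
        state[0] = old + inc  # the "update": state <- state + inc
        return old

    return accumulator

acc = make_accumulator()
print(acc(1))    # 0   (the old state)
print(acc(300))  # 1
print(acc(0))    # 301
```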
It may happen that you expressed some formula using a shared variable, but
you do *not* want to use its value. In this case, you can use the
@@ -326,16 +326,16 @@ so we get different random numbers every time.

>>> f_val1 = f()  # different numbers from f_val0

When we add the extra argument ``no_default_updates=True`` to
``function`` (as in *g*), then the random number generator state is
not affected by calling the returned function. So, for example, calling
*g* multiple times will return the same numbers.

>>> g_val0 = g()  # different numbers from f_val0 and f_val1
>>> g_val1 = g()  # same numbers as g_val0!

An important remark is that a random variable is drawn at most once during any
single function execution. So the *nearly_zeros* function is guaranteed to
return approximately 0 (except for rounding error) even though the *rv_u*
random variable appears three times in the output expression.

>>> nearly_zeros = function([], rv_u + rv_u - 2 * rv_u)
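The draw-at-most-once rule means all three occurrences of *rv_u* share a single value per call; in plain NumPy terms (an analogy, not how Theano implements it):

```python
import numpy as np

rng = np.random.default_rng(0)

def nearly_zeros():
    # The random variable is drawn once per call...
    rv_u = rng.uniform(size=(2, 2))
    # ...and that single draw is reused at every occurrence,
    # so the expression cancels.
    return rv_u + rv_u - 2 * rv_u

print(nearly_zeros())  # a 2x2 array of zeros
```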
@@ -363,8 +363,8 @@ Sharing Streams Between Functions
---------------------------------

As usual for shared variables, the random number generators used for random
variables are common between functions. So our *nearly_zeros* function will
update the state of the generators used in function *f* above.

For example:
@@ -416,8 +416,9 @@ The preceding elements are featured in this more realistic example. It will be

    prediction = p_1 > 0.5                    # The prediction thresholded
    xent = -y*T.log(p_1) - (1-y)*T.log(1-p_1) # Cross-entropy loss function
    cost = xent.mean() + 0.01*(w**2).sum()    # The cost to minimize
    gw, gb = T.grad(cost, [w, b])             # Compute the gradient of the cost
                                              # (we shall return to this in a
                                              # following section of this tutorial)

    # Compile
    train = theano.function(
......
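The (partially elided) example trains a logistic regression. The forward pass and cost it builds symbolically can be sketched in plain NumPy, with the gradient left to ``T.grad`` in the real Theano code; the names ``w``, ``b``, ``x``, ``y`` mirror the snippet above, and the sizes are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
N, feats = 8, 5
x = rng.standard_normal((N, feats))
y = rng.integers(0, 2, size=N).astype(float)
w = rng.standard_normal(feats)
b = 0.0

# Forward pass mirroring the symbolic expressions in the snippet
p_1 = 1.0 / (1.0 + np.exp(-(x @ w + b)))             # probability of class 1
prediction = p_1 > 0.5                               # thresholded prediction
xent = -y * np.log(p_1) - (1 - y) * np.log(1 - p_1)  # cross-entropy loss
cost = xent.mean() + 0.01 * (w ** 2).sum()           # cost with L2 penalty

print(cost > 0)  # True: cross-entropy and the penalty are both non-negative
```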
@@ -8,12 +8,12 @@ Extending Theano

Theano Graphs
-------------

- Theano works with symbolic graphs.
- Those graphs are bi-partite graphs (graphs with 2 types of nodes).
- The 2 types of nodes are Apply and Variable nodes.
- Each Apply node has a link to the op that it executes.
  Inputs and Outputs are lists of Theano variables.

.. image:: ../hpcs2011_tutorial/pics/apply_node.png
   :width: 500 px
@@ -93,12 +93,12 @@ The first one is :func:`make_node`. The second one
would describe the computations that are required to be done
at run time. Currently there are 2 different possibilities:
implement the :func:`perform`
and/or :func:`c_code <Op.c_code>` methods (and other related :ref:`c methods
<cop>`), or the :func:`make_thunk` method. ``perform`` allows
you to easily wrap an existing Python function into Theano. ``c_code``
and related methods allow the op to generate C code that will be
compiled and linked by Theano. On the other hand, ``make_thunk``
will be called only once during compilation and should generate
a ``thunk``: a standalone function that when called will do the desired computations.
This is useful if you want to generate code and compile it yourself. For
example, this allows you to use PyCUDA to compile GPU code.
@@ -117,7 +117,7 @@ The :func:`grad` method is required if you want to differentiate some cost whose
includes your op.

The :func:`__str__` method is useful in order to provide a more meaningful
string representation of your op.

The :func:`R_op` method is needed if you want ``theano.tensor.Rop`` to
work with your op.
@@ -185,9 +185,9 @@ in a file and execute it with the ``nosetests`` program.

**Basic Tests**

Basic tests are done by you just by using the op and checking that it
returns the right answer. If you detect an error, you must raise an
exception. You can use the ``assert`` keyword to automatically raise an
``AssertionError``.
.. code-block:: python
@@ -211,10 +211,10 @@ exception. You can use the ``assert`` keyword to automatically raise an

**Testing the infer_shape**

When a class inherits from the ``InferShapeTester`` class, it gets the
``self._compile_and_check`` method that tests the op's ``infer_shape``
method. It tests that the op gets optimized out of the graph if only
the shape of the output is needed and not the output
itself. Additionally, it checks that the optimized graph computes
the correct shape, by comparing it to the actual shape of the computed
output.
@@ -222,8 +222,8 @@ output.
parameters the lists of input and output Theano variables, as would be
provided to ``theano.function``, and a list of real values to pass to the
compiled function (don't use shapes that are symmetric, e.g. (3, 3),
as they can easily hide errors). It also takes the op class as a parameter to
verify that no instance of it appears in the shape-optimized graph.

If there is an error, the function raises an exception. If you want to
see it fail, you can implement an incorrect ``infer_shape``.
@@ -248,7 +248,7 @@ see it fail, you can implement an incorrect ``infer_shape``.

**Testing the gradient**

The function :ref:`verify_grad <validating_grad>`
verifies the gradient of an op or Theano graph. It compares the
analytic (symbolically computed) gradient and the numeric
gradient (computed through the Finite Difference Method).
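The comparison ``verify_grad`` performs can be sketched in plain Python for a scalar input (a simplified illustration; the real utility handles arbitrary tensor inputs and random projections):

```python
def check_grad(f, analytic_grad, x, eps=1e-6, tol=1e-4):
    """Compare an analytic gradient against a central finite difference."""
    numeric = (f(x + eps) - f(x - eps)) / (2 * eps)
    assert abs(numeric - analytic_grad(x)) < tol, (numeric, analytic_grad(x))

# Example: f(x) = x**2 has gradient 2*x
check_grad(lambda x: x ** 2, lambda x: 2 * x, x=3.0)
print("gradient check passed")
```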
@@ -266,13 +266,12 @@ the multiplication by 2).

.. TODO: repair defective links in the following paragraph

The class :class:`RopLop_checker` defines the functions
:func:`RopLop_checker.check_mat_rop_lop`, :func:`RopLop_checker.check_rop_lop` and
:func:`RopLop_checker.check_nondiff_rop`. These allow you to test the
implementation of the Rop method of a particular op.

For instance, to verify the Rop method of the DoubleOp, you can use this:
.. code-block:: python
@@ -290,7 +289,7 @@ Running your tests

You can run ``nosetests`` in the Theano folder to run all of Theano's
tests, including yours if they are somewhere in the directory
structure. You can run ``nosetests test_file.py`` to run only the
tests in that file. You can run ``nosetests
test_file.py:test_DoubleRop`` to run only the tests inside that test
class. You can run ``nosetests
test_file.py:test_DoubleRop.test_double_op`` to run only one
particular test. More documentation is available on the `nosetests website
<http://readthedocs.org/docs/nose/en/latest/>`_.
You can also add this block at the end of the test file and run the file:
.. code-block:: python
@@ -311,14 +310,13 @@ You can also add this at the end of the test file:

**Testing GPU Ops**

Ops to be executed on the GPU should inherit from ``theano.sandbox.cuda.GpuOp``
and not ``theano.Op``. This allows Theano to distinguish them. Currently, we
use this to test if the NVIDIA driver works correctly with our sum reduction
code on the GPU.

A more extensive discussion of this section's topic may be found in the advanced
tutorial :ref:`Extending Theano<extending>`
-------------------------------------------
......
@@ -8,19 +8,17 @@ Frequently Asked Questions

TypeError: object of type 'TensorVariable' has no len()
-------------------------------------------------------

If you receive the following error, it is because the Python function ``__len__``
cannot be implemented on Theano variables:

.. code-block:: python

    TypeError: object of type 'TensorVariable' has no len()

Python requires that ``__len__`` returns an integer, yet this cannot be done for
Theano's symbolic variables. However, ``var.shape[0]`` can be used as a workaround.

This error message cannot be made more explicit because the relevant aspects of
Python's internals cannot be modified.
Faster gcc optimization
......
@@ -9,13 +9,13 @@ PyCUDA

Currently, PyCUDA and Theano have different objects to store GPU
data. The two implementations do not support the same set of features.
Theano's implementation is called *CudaNdarray* and supports
*strides*. It also only supports the *float32* dtype. PyCUDA's implementation
is called *GPUArray* and doesn't support *strides*. However, it can deal with
all NumPy and CUDA dtypes.

We are currently working on having the same base object for both that will
also mimic NumPy. Until this is ready, here is some information on how to
use both objects in the same script.

Transfer
@@ -24,8 +24,8 @@ Transfer

You can use the ``theano.misc.pycuda_utils`` module to convert GPUArray to and
from CudaNdarray. The functions ``to_cudandarray(x, copyif=False)`` and
``to_gpuarray(x)`` return a new object that occupies the same memory space
as the original. Otherwise it raises a ``ValueError``. Because GPUArrays don't
support strides, if the CudaNdarray is strided, we could copy it to
have a non-strided copy. The resulting GPUArray won't share the same
memory region. If you want this behavior, set ``copyif=True`` in
``to_gpuarray``.
...@@ -122,13 +122,15 @@ CUDAMat

There are functions for conversion between CUDAMat objects and Theano's CudaNdarray objects.
They obey the same principles as Theano's PyCUDA functions and can be found in
``theano.misc.cudamat_utils.py``.

.. TODO: this statement is unclear:

WARNING: There is a peculiar problem associated with stride/shape with those converters.
In order to work, the test needs a *transpose* and *reshape*...

Gnumpy
======

There are conversion functions between Gnumpy *garray* objects and Theano CudaNdarray objects.
They are also similar to Theano's PyCUDA functions and can be found in ``theano.misc.gnumpy_utils.py``.
...@@ -10,12 +10,14 @@ Computing Gradients
===================

Now let's use Theano for a slightly more sophisticated task: create a
function which computes the derivative of some expression *y* with
respect to its parameter *x*. To do this we will use the macro ``T.grad``.
For instance, we can compute the
gradient of :math:`x^2` with respect to :math:`x`. Note that
:math:`d(x^2)/dx = 2 \cdot x`.

.. TODO: fix the vertical positioning of the expressions in the preceding paragraph
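Before turning to the Theano code, the identity above can be spot-checked numerically. This is a plain NumPy sketch (not part of the original tutorial): a central finite-difference quotient is compared against :math:`2x`.

```python
import numpy as np

def numerical_derivative(f, x, eps=1e-6):
    """Central finite-difference approximation of df/dx at x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

square = lambda t: t ** 2
for x in [-3.0, 0.5, 2.0]:
    # The approximation should agree with the analytic derivative 2*x.
    assert abs(numerical_derivative(square, x) - 2 * x) < 1e-4
```

Of course, ``T.grad`` works symbolically rather than by finite differences; the check is only a sanity test of the math.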
Here is the code to compute this gradient:

.. If you modify this code, also change :
...@@ -36,7 +38,7 @@ array(188.40000000000001)

In this example, we can see from ``pp(gy)`` that we are computing
the correct symbolic gradient.
``fill((x ** 2), 1.0)`` means to make a matrix of the same shape as
``x ** 2`` and fill it with 1.0.

.. note::
    The optimizer simplifies the symbolic gradient expression. You can see
...@@ -56,7 +58,7 @@ logistic is: :math:`ds(x)/dx = s(x) \cdot (1 - s(x))`.

.. figure:: dlogistic.png

    A plot of the gradient of the logistic function, with *x* on the x-axis
    and :math:`ds(x)/dx` on the y-axis.
...@@ -71,17 +73,17 @@ logistic is: :math:`ds(x)/dx = s(x) \cdot (1 - s(x))`.

array([[ 0.25      ,  0.19661193],
       [ 0.19661193,  0.10499359]])

In general, for any **scalar** expression *s*, ``T.grad(s, w)`` provides
the Theano expression for computing :math:`\frac{\partial s}{\partial w}`. In
this way Theano can be used for doing **efficient** symbolic differentiation
(as the expression returned by ``T.grad`` will be optimized during compilation), even for
functions with many inputs (see `automatic differentiation <http://en.wikipedia.org/wiki/Automatic_differentiation>`_ for a description
of symbolic differentiation).

.. note::
    The second argument of ``T.grad`` can be a list, in which case the
    output is also a list. The order in both lists is important: element
    *i* of the output list is the gradient of the first argument of
    ``T.grad`` with respect to the *i*-th element of the list given as second argument.

    The first argument of ``T.grad`` has to be a scalar (a tensor
...@@ -95,14 +97,17 @@ of symbolic differentiation).
Computing the Jacobian
======================

In Theano's parlance, the term *Jacobian* designates the tensor comprising the
first partial derivatives of the output of a function with respect to its inputs.
(This is a generalization of the so-called Jacobian matrix in Mathematics.)
Theano implements the :func:`theano.gradient.jacobian` macro that does all
that is needed to compute the Jacobian. The following text explains how
to do it manually.

In order to manually compute the Jacobian of some function *y* with
respect to some parameter *x* we need to use ``scan``. What we
do is to loop over the entries in *y* and compute the gradient of
*y[i]* with respect to *x*.
.. note::
...@@ -110,7 +115,7 @@ do is to loop over the entries in ``y`` and compute the gradient of

    manner all kinds of recurrent equations. While creating
    symbolic loops (and optimizing them for performance) is a hard task,
    efforts are being made to improve the performance of ``scan``. We
    shall return to ``scan`` later in this tutorial.

>>> x = T.dvector('x')
>>> y = x**2
...@@ -120,31 +125,33 @@ do is to loop over the entries in ``y`` and compute the gradient of

array([[ 8.,  0.],
       [ 0.,  8.]])

What we do in this code is to generate a sequence of *ints* from *0* to
``y.shape[0]`` using ``T.arange``. Then we loop through this sequence, and
at each step, we compute the gradient of element *y[i]* with respect to
*x*. ``scan`` automatically concatenates all these rows, generating a
matrix which corresponds to the Jacobian.
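As a numerical sanity check (a NumPy sketch, independent of Theano), the same Jacobian can be approximated by finite differences, one column per input entry, reproducing the ``array([[ 8., 0.], [ 0., 8.]])`` result shown above for the elementwise square at ``[4, 4]``:

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Finite-difference Jacobian: column j holds the derivatives of
    every output entry with respect to input entry j."""
    y = f(x)
    J = np.empty((y.size, x.size))
    for j in range(x.size):
        dx = np.zeros_like(x)
        dx[j] = eps
        J[:, j] = (f(x + dx) - f(x - dx)) / (2 * eps)
    return J

x = np.array([4.0, 4.0])
J = numerical_jacobian(lambda v: v ** 2, x)
# For y = x**2 the Jacobian is diag(2*x), i.e. [[8, 0], [0, 8]].
assert np.allclose(J, np.diag(2 * x))
```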
.. note::
    There are some pitfalls to be aware of regarding ``T.grad``. One of them is that you
    cannot re-write the above expression of the Jacobian as
    ``theano.scan(lambda y_i,x: T.grad(y_i,x), sequences=y,
    non_sequences=x)``, even though from the documentation of ``scan`` this
    seems possible. The reason is that *y_i* will not be a function of
    *x* anymore, while *y[i]* still is.
Computing the Hessian
=====================

In Theano, the term *Hessian* has the usual mathematical meaning: it is the
matrix comprising the second-order partial derivatives of a function with scalar
output and vector input. Theano implements the :func:`theano.gradient.hessian`
macro that does all that is needed to compute the Hessian. The following text
explains how to do it manually.

You can compute the Hessian manually similarly to the Jacobian. The only
difference is that now, instead of computing the Jacobian of some expression
*y*, we compute the Jacobian of ``T.grad(cost, x)``, where *cost* is some
scalar.
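The "Jacobian of the gradient" idea can be illustrated numerically (a NumPy sketch, not Theano's API): for the cost :math:`\sum_i x_i^2` the Hessian is :math:`2I`.

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-5):
    """Central finite-difference gradient of a scalar function f at x."""
    g = np.empty_like(x)
    for i in range(x.size):
        d = np.zeros_like(x)
        d[i] = eps
        g[i] = (f(x + d) - f(x - d)) / (2 * eps)
    return g

def numerical_hessian(f, x, eps=1e-5):
    """Hessian built as the Jacobian of the gradient, column by column."""
    n = x.size
    H = np.empty((n, n))
    for j in range(n):
        d = np.zeros_like(x)
        d[j] = eps
        H[:, j] = (numerical_gradient(f, x + d) - numerical_gradient(f, x - d)) / (2 * eps)
    return H

cost = lambda v: np.sum(v ** 2)   # Hessian of sum(x**2) is 2*I
x = np.array([1.0, -2.0, 3.0])
assert np.allclose(numerical_hessian(cost, x), 2 * np.eye(3), atol=1e-3)
```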
...@@ -181,12 +188,12 @@ R-operator

The *R operator* is built to evaluate the product between a Jacobian and a
vector, namely :math:`\frac{\partial f(x)}{\partial x} v`. The formulation
can be extended even for *x* being a matrix, or a tensor in general, in which
case the Jacobian also becomes a tensor and the product becomes some kind
of tensor product. Because in practice we end up needing to compute such
expressions in terms of weight matrices, Theano supports this more generic
form of the operation. In order to evaluate the *R-operation* of
expression *y*, with respect to *x*, multiplying the Jacobian with *v*,
you need to do something similar to this:
...@@ -221,19 +228,19 @@ array([[ 0.,  0.],

.. note::
    `v`, the *point of evaluation*, differs between the *L-operator* and the *R-operator*.
    For the *L-operator*, the point of evaluation needs to have the same shape
    as the output, whereas for the *R-operator* this point should
    have the same shape as the input parameter. Furthermore, the results of these two
    operations differ. The result of the *L-operator* is of the same shape
    as the input parameter, while the result of the *R-operator* has a shape similar
    to that of the output.
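The key property of the R-operation can be demonstrated numerically (a NumPy sketch with a hypothetical ``jvp`` helper, not Theano's ``Rop``): a Jacobian-vector product is just a directional derivative, so it never needs the full Jacobian.

```python
import numpy as np

def jvp(f, x, v, eps=1e-6):
    """Jacobian-vector product J(x) @ v approximated as a directional
    derivative; v has the same shape as the input, as the note says."""
    return (f(x + eps * v) - f(x - eps * v)) / (2 * eps)

W = np.ones((2, 2))          # a weight matrix, as in the tutorial's setting
v = np.array([2.0, 2.0])     # point of evaluation, shaped like the input
f = lambda x: W @ x          # y = W x, so the Jacobian of f is W itself
assert np.allclose(jvp(f, np.zeros(2), v), W @ v)
```

The result has the shape of the output, exactly as described for the *R-operator* above.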
Hessian times a Vector
======================

If you need to compute the *Hessian times a vector*, you can make use of the
above-defined operators to do it more efficiently than actually computing
the exact Hessian and then performing the product. Due to the symmetry of the
Hessian matrix, you have two options that will
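The efficiency claim can be illustrated numerically (a NumPy sketch, not Theano's API): a Hessian-vector product needs only two gradient evaluations, never the Hessian itself.

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-5):
    """Central finite-difference gradient of a scalar function f at x."""
    g = np.empty_like(x)
    for i in range(x.size):
        d = np.zeros_like(x)
        d[i] = eps
        g[i] = (f(x + d) - f(x - d)) / (2 * eps)
    return g

def hessian_vector_product(f, x, v, eps=1e-5):
    """H(x) @ v via a directional difference of the gradient: the
    Hessian is never materialized, only two gradients are computed."""
    return (numerical_gradient(f, x + eps * v)
            - numerical_gradient(f, x - eps * v)) / (2 * eps)

cost = lambda w: np.sum(w ** 2)          # Hessian is 2*I
x = np.array([1.0, 2.0])
v = np.array([4.0, 4.0])
assert np.allclose(hessian_vector_product(cost, x, v), 2 * v, atol=1e-3)
```

For an input of size *n*, this costs two gradients instead of the :math:`n \times n` Hessian, which is the point of using the operators above.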
...@@ -267,7 +274,7 @@ Final Pointers
==============

* The ``grad`` function works symbolically: it receives and returns Theano variables.

* ``grad`` can be compared to a macro since it can be applied repeatedly.
...@@ -276,5 +283,5 @@ Final Pointers

* Built-in functions make it possible to compute *vector times Jacobian* and *vector times Hessian* efficiently.

* Work is in progress on the optimizations required to compute efficiently the full
  Jacobian and the Hessian matrix, as well as the *Jacobian times vector* expression.
...@@ -6,8 +6,8 @@ Loading and Saving
==================

Python's standard way of saving class instances and reloading them
is the pickle_ mechanism. Many Theano objects can be *serialized* (and
*deserialized*) by ``pickle``; however, a limitation of ``pickle`` is that
it does not save the code or data of a class along with the instance of
the class being serialized. As a result, reloading objects created by a
previous version of a class can be really problematic.
...@@ -126,7 +126,7 @@ maybe defining the attributes you want to save, rather than the ones you

don't.

For instance, if the only parameters you want to save are a weight
matrix *W* and a bias *b*, you can define:

.. code-block:: python
...@@ -138,8 +138,8 @@ matrix ``W`` and a bias ``b``, you can define:

        self.W = W
        self.b = b

If at some point in time *W* is renamed to *weights* and *b* to
*bias*, the older pickled files will still be usable, if you update these
functions to reflect the change in name:

.. code-block:: python
...@@ -152,6 +152,6 @@ functions to reflect the change in name:

        self.weights = W
        self.bias = b
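The elided snippets above can be sketched end to end (a minimal self-contained example; the ``Layer`` class and attribute names are hypothetical stand-ins for the tutorial's model): ``__getstate__`` selects what gets pickled, and ``__setstate__`` accepts both the old and the new attribute names.

```python
import pickle

class Layer(object):
    """Hypothetical model: only the parameters are worth saving."""
    def __init__(self, weights=None, bias=None):
        self.weights = weights
        self.bias = bias
        self.scratch = "recomputed, not pickled"

    def __getstate__(self):
        # Persist only the parameters, under explicit key names.
        return {"W": self.weights, "b": self.bias}

    def __setstate__(self, state):
        # Accept pickles written before the W/b -> weights/bias rename.
        self.weights = state.get("weights", state.get("W"))
        self.bias = state.get("bias", state.get("b"))
        self.scratch = "recomputed, not pickled"

layer = Layer(weights=[[1.0, 2.0]], bias=[0.5])
restored = pickle.loads(pickle.dumps(layer))
assert restored.weights == [[1.0, 2.0]] and restored.bias == [0.5]
```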
For more information on advanced use of ``pickle`` and its internals, see Python's
pickle_ documentation.
...@@ -9,10 +9,10 @@ Scan
====

- A general form of *recurrence*, which can be used for looping.
- *Reduction* and *map* (loop over the leading dimensions) are special cases of ``scan``.
- You ``scan`` a function along some input sequence, producing an output at each time-step.
- The function can see the *previous K time-steps* of your function.
- ``sum()`` could be computed by scanning the *z + x(i)* function over a list, given an initial state of *z=0*.
- Often a *for* loop can be expressed as a ``scan()`` operation, and ``scan`` is the closest that Theano comes to looping.
- Advantages of using ``scan`` over *for* loops:
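The ``sum()`` bullet above can be sketched in plain Python (an illustrative fold, not Theano's actual ``scan`` implementation): the state *z* starts at 0, the step function is applied once per element, and each intermediate state is recorded like ``scan``'s per-time-step outputs.

```python
def scan_like(step, sequence, initial_state):
    """Minimal fold mimicking scan: apply `step` to the running state and
    each element, collecting every intermediate output."""
    outputs = []
    z = initial_state
    for x_i in sequence:
        z = step(z, x_i)
        outputs.append(z)
    return outputs

partial_sums = scan_like(lambda z, x_i: z + x_i, [1, 2, 3, 4], 0)
assert partial_sums == [1, 3, 6, 10]   # the last output is sum()
```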
...@@ -30,6 +30,7 @@ The full documentation can be found in the library: :ref:`Scan <lib_scan>`.

    import theano
    import theano.tensor as T
    theano.config.warn.subtensor_merge_bug = False

    k = T.iscalar("k"); A = T.vector("A")
...@@ -54,8 +55,10 @@ The full documentation can be found in the library: :ref:`Scan <lib_scan>`.

.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T
    theano.config.warn.subtensor_merge_bug = False

    coefficients = theano.tensor.vector("coefficients")
    x = T.scalar("x"); max_coefficients_supported = 10000
...
...@@ -9,14 +9,14 @@ Configuration Settings and Compiling Modes

Configuration
=============

The ``config`` module contains several *attributes* that modify Theano's behavior. Many of these
attributes are examined during the import of the ``theano`` module and several are assumed to be
read-only.

*As a rule, the attributes in the* ``config`` *module should not be modified inside the user code.*

Theano's code comes with default values for these attributes, but you can
override them from your ``.theanorc`` file, and override those values in turn by
the :envvar:`THEANO_FLAGS` environment variable.

The order of precedence is:
...@@ -110,6 +110,8 @@ time the execution using the command line ``time python file.py``.

.. TODO: To be resolved:
.. Solution said:
.. You will need to use: ``theano.config.floatX`` and ``ndarray.astype("str")``
.. Why the latter portion?
...@@ -119,10 +121,10 @@ time the execution using the command line ``time python file.py``.

* Apply the Theano flag ``floatX=float32`` (through ``theano.config.floatX``) in your code.
* Cast inputs before storing them into a shared variable.
* Circumvent the automatic cast of *int32* with *float32* to *float64*:

  * Insert manual casts in your code or use *[u]int{8,16}*.
  * Insert a manual cast around the mean operator (this involves division by length, which is an *int64*).
  * Notice that a new casting mechanism is being developed.
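The *int32*-with-*float32* pitfall in the bullets above can be seen directly in NumPy, which applies the same promotion rule, along with the manual casts that avoid it:

```python
import numpy as np

i = np.arange(4, dtype=np.int32)
f = np.ones(4, dtype=np.float32)

# int32 combined with float32 is silently promoted to float64:
assert (i + f).dtype == np.float64

# A manual cast keeps the computation in float32:
assert (i.astype(np.float32) + f).dtype == np.float32

# A small integer type ([u]int{8,16}) also avoids the promotion:
assert (i.astype(np.int16) + f).dtype == np.float32
```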
-------------------------------------------
...@@ -130,7 +132,7 @@ time the execution using the command line ``time python file.py``.

Mode
====

Every time :func:`theano.function <function.function>` is called,
the symbolic relationships between the input and output Theano *variables*
are optimized and compiled. The way this compilation occurs
is controlled by the value of the ``mode`` parameter.
...@@ -139,9 +141,9 @@ Theano defines the following modes by name:

- ``'FAST_COMPILE'``: Apply just a few graph optimizations and only use Python implementations.
- ``'FAST_RUN'``: Apply all optimizations, and use C implementations where possible.
- ``'DEBUG_MODE'``: Verify the correctness of all optimizations, and compare C and Python
  implementations. This mode can take much longer than the other modes, but can identify
  several kinds of problems.
- ``'PROFILE_MODE'``: Same optimizations as ``FAST_RUN``, but also print some profiling information.

The default mode is typically ``FAST_RUN``, but it can be controlled via
...@@ -152,18 +154,18 @@ which can be overridden by passing the keyword argument to

================= =============================================================== ===============================================================================
short name        Full constructor                                                What does it do?
================= =============================================================== ===============================================================================
``FAST_COMPILE``  ``compile.mode.Mode(linker='py', optimizer='fast_compile')``    Python implementations only, quick and cheap graph transformations
``FAST_RUN``      ``compile.mode.Mode(linker='c|py', optimizer='fast_run')``      C implementations where available, all available graph transformations.
``DEBUG_MODE``    ``compile.debugmode.DebugMode()``                               Both implementations where available, all available graph transformations.
``PROFILE_MODE``  ``compile.profilemode.ProfileMode()``                           C implementations where available, all available graph transformations, print profile information.
================= =============================================================== ===============================================================================
Linkers
=======

A mode is composed of 2 things: an optimizer and a linker. Some modes,
like ``PROFILE_MODE`` and ``DEBUG_MODE``, add logic around the optimizer and
linker. ``PROFILE_MODE`` and ``DEBUG_MODE`` use their own linker.

You can select which linker to use with the Theano flag :attr:`config.linker`.
Here is a table to compare the different linkers.
...@@ -184,8 +186,8 @@ DebugMode no yes VERY HIGH Make many checks on what

.. [#gc] Garbage collection of intermediate results during computation.
         Otherwise, the memory space used by the ops is kept between
         Theano function calls, in order not to
         reallocate memory, which lowers the overhead (makes it faster...).
.. [#cpy1] Default
.. [#cpy2] Deprecated
...@@ -201,10 +203,10 @@ While normally you should use the ``FAST_RUN`` or ``FAST_COMPILE`` mode,

it is useful at first (especially when you are defining new kinds of
expressions or new optimizations) to run your code using the DebugMode
(available via ``mode='DEBUG_MODE'``). The DebugMode is designed to
run several self-checks and assertions that can help diagnose
possible programming errors leading to incorrect output. Note that
``DEBUG_MODE`` is much slower than ``FAST_RUN`` or ``FAST_COMPILE``, so
use it only during development (not when you launch 1000 processes on a
cluster!).
...@@ -225,14 +227,16 @@ DebugMode is used as follows:

If any problem is detected, DebugMode will raise an exception according to
what went wrong, either at call time (``f(5)``) or compile time
(``f = theano.function(x, 10*x, mode='DEBUG_MODE')``). These exceptions
should *not* be ignored; talk to your local Theano guru or email the
users list if you cannot make the exception go away.
Some kinds of errors can only be detected for certain input value combinations.
In the example above, there is no way to guarantee that a future call to, say,
``f(-1)``, won't cause a problem. DebugMode is not a silver bullet.

.. TODO: repair the following link

If you instantiate DebugMode using the constructor (see :class:`DebugMode`)
rather than the keyword ``DEBUG_MODE``, you can configure its behaviour via
...@@ -277,7 +281,7 @@ implementation only, should use the gof.PerformLinker (or "py" for

short). On the other hand, a user wanting to profile his graph using C
implementations wherever possible should use the ``gof.OpWiseCLinker``
(or "c|py"). For testing the speed of your code we would recommend
using the ``fast_run`` optimizer and the ``gof.OpWiseCLinker`` linker.
Compiling your Graph with ProfileMode
-------------------------------------
...@@ -300,7 +304,7 @@ the desired timing information, indicating where your graph is spending most

of its time. This is best shown through an example. Let's use our logistic
regression example.

Compiling the module with ``ProfileMode`` and calling ``profmode.print_summary()``
generates the following output:

.. code-block:: python
...@@ -352,14 +356,14 @@ generates the following output:
This output has two components. In the first section called
*Apply-wise summary*, timing information is provided for the worst
offending ``Apply`` nodes. This corresponds to the individual op applications
within your graph that took the longest to execute (so if you use
``dot`` twice, you will see two entries there). In the second portion,
the *Op-wise summary*, the execution time of all ``Apply`` nodes executing
the same op are grouped together and the total execution time per op
is shown (so if you use ``dot`` twice, you will see only one entry
there, corresponding to the sum of the time spent in each of them).
Finally, notice that the ``ProfileMode`` also shows which ops were running a C
implementation.
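The relationship between the two summaries can be sketched in a few lines of plain Python (the per-node timing data below is made up for illustration; this is not ``ProfileMode``'s internal code):

```python
from collections import defaultdict

# Hypothetical per-Apply timings: (op name, seconds) for each Apply node.
apply_times = [("dot", 3.4), ("dot", 2.1), ("exp", 0.8)]

# Apply-wise summary: one entry per Apply node, worst offenders first.
apply_wise = sorted(apply_times, key=lambda t: t[1], reverse=True)

# Op-wise summary: total time of all Apply nodes running the same op.
op_wise = defaultdict(float)
for op_name, seconds in apply_times:
    op_wise[op_name] += seconds

print(apply_wise)     # two separate "dot" entries
print(dict(op_wise))  # a single "dot" entry with the summed time
```

Using ``dot`` twice thus yields two entries in the first summary but only one in the second.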
.. _tutorial_printing_drawing:
==============================
Printing/Drawing Theano graphs
==============================
.. TODO: repair the defective links in the next paragraph
Theano provides two functions (:func:`theano.pp` and
:func:`theano.printing.debugprint`) to print a graph to the terminal before or after
compilation. These two functions print expression graphs in different ways:
:func:`pp` is more compact and math-like, :func:`debugprint` is more verbose.
Theano also provides :func:`pydotprint` that creates a *png* image of the function.
You can read about them in :ref:`libdoc_printing`.
Consider again the logistic regression example, but notice the additional printing instructions.
The following output depicts the pre- and post-compilation graphs.
.. code-block:: python
import numpy
import theano
import theano.tensor as T
rng = numpy.random
N = 400
feats = 784
D = (rng.randn(N, feats).astype(theano.config.floatX),
rng.randint(size=N,low=0, high=2).astype(theano.config.floatX))
training_steps = 10000
# Declare Theano symbolic variables
x = T.matrix("x")
y = T.vector("y")
w = theano.shared(rng.randn(feats).astype(theano.config.floatX), name="w")
b = theano.shared(numpy.asarray(0., dtype=theano.config.floatX), name="b")
x.tag.test_value = D[0]
y.tag.test_value = D[1]
#print "Initial model:"
#print w.get_value(), b.get_value()
# Construct Theano expression graph
p_1 = 1 / (1 + T.exp(-T.dot(x, w)-b)) # Probability of having a one
prediction = p_1 > 0.5 # The prediction that is done: 0 or 1
xent = -y*T.log(p_1) - (1-y)*T.log(1-p_1) # Cross-entropy
cost = xent.mean() + 0.01*(w**2).sum() # The cost to optimize
gw,gb = T.grad(cost, [w,b])
# Compile expressions to functions
train = theano.function(
inputs=[x,y],
outputs=[prediction, xent],
updates={w:w-0.01*gw, b:b-0.01*gb},
name = "train")
predict = theano.function(inputs=[x], outputs=prediction,
name = "predict")
if any( [x.op.__class__.__name__=='Gemv' for x in
train.maker.fgraph.toposort()]):
print 'Used the cpu'
elif any( [x.op.__class__.__name__=='GpuGemm' for x in
train.maker.fgraph.toposort()]):
print 'Used the gpu'
else:
print 'ERROR, not able to tell if theano used the cpu or the gpu'
print train.maker.fgraph.toposort()
for i in range(training_steps):
pred, err = train(D[0], D[1])
#print "Final model:"
#print w.get_value(), b.get_value()
print "target values for D"
print D[1]
print "prediction on D"
print predict(D[0])
# Print the picture graphs
# after compilation
theano.printing.pydotprint(predict,
outfile="pics/logreg_pydotprint_predic.png",
var_with_name_simple=True)
# before compilation
theano.printing.pydotprint_variables(prediction,
outfile="pics/logreg_pydotprint_prediction.png",
var_with_name_simple=True)
theano.printing.pydotprint(train,
outfile="pics/logreg_pydotprint_train.png",
var_with_name_simple=True)
Pretty Printing
===============
``theano.printing.pprint(variable)``
>>> theano.printing.pprint(prediction) # (pre-compilation)
gt((TensorConstant{1} / (TensorConstant{1} + exp(((-(x \\dot w)) - b)))),TensorConstant{0.5})
Debug Printing
==============
``theano.printing.debugprint({fct, variable, list of variables})``
>>> theano.printing.debugprint(prediction) # (pre-compilation)
Elemwise{gt,no_inplace} [@181772236] ''
|Elemwise{true_div,no_inplace} [@181746668] ''
| |InplaceDimShuffle{x} [@181746412] ''
| | |TensorConstant{1} [@181745836]
| |Elemwise{add,no_inplace} [@181745644] ''
| | |InplaceDimShuffle{x} [@181745420] ''
| | | |TensorConstant{1} [@181744844]
| | |Elemwise{exp,no_inplace} [@181744652] ''
| | | |Elemwise{sub,no_inplace} [@181744012] ''
| | | | |Elemwise{neg,no_inplace} [@181730764] ''
| | | | | |dot [@181729676] ''
| | | | | | |x [@181563948]
| | | | | | |w [@181729964]
| | | | |InplaceDimShuffle{x} [@181743788] ''
| | | | | |b [@181730156]
|InplaceDimShuffle{x} [@181771788] ''
| |TensorConstant{0.5} [@181771148]
>>> theano.printing.debugprint(predict) # (post-compilation)
Elemwise{Composite{neg,{sub,{{scalar_sigmoid,GT},neg}}}} [@183160204] '' 2
|dot [@183018796] '' 1
| |x [@183000780]
| |w [@183000812]
|InplaceDimShuffle{x} [@183133580] '' 0
| |b [@183000876]
|TensorConstant{[ 0.5]} [@183084108]
Picture Printing
================
>>> theano.printing.pydotprint_variables(prediction) # (pre-compilation)
.. image:: ../hpcs2011_tutorial/pics/logreg_pydotprint_prediction.png
:width: 800 px
Notice that ``pydotprint()`` requires *Graphviz* and Python's ``pydot``.
>>> theano.printing.pydotprint(predict) # (post-compilation)
.. image:: ../hpcs2011_tutorial/pics/logreg_pydotprint_predic.png
:width: 800 px
>>> theano.printing.pydotprint(train) # This is a small train example!
.. image:: ../hpcs2011_tutorial/pics/logreg_pydotprint_train.png
:width: 1500 px
Some general Remarks
=====================

.. TODO: This discussion is awkward. Even with this beneficial reordering (28 July 2012) its purpose and message are unclear.

Limitations
-----------

Theano offers a good amount of flexibility, but has some limitations too.
How then can you write your algorithm to make the most of what Theano can do?

- *While*- or *for*-Loops within an expression graph are supported, but only via
  the :func:`theano.scan` op (which puts restrictions on how the loop body can
  interact with the rest of the graph).

- Neither *goto* nor *recursion* is supported or planned within expression graphs.
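As a rough intuition for the loop restriction (this is not Theano's actual API), ``scan`` behaves like a cumulative fold: the loop body is a pure function of the current element and the previous state, which is what allows it to live inside an expression graph. A plain-Python analogue:

```python
def scan_like(step, sequence, initial):
    """Apply step(element, state) -> new_state over sequence,
    returning every intermediate state, like a cumulative fold."""
    states = []
    state = initial
    for element in sequence:
        state = step(element, state)
        states.append(state)
    return states

# A cumulative sum expressed as a side-effect-free "loop body".
print(scan_like(lambda x, s: s + x, [1, 2, 3, 4], 0))  # [1, 3, 6, 10]
```

The body cannot reach outside its arguments, which is precisely the kind of restriction ``scan`` places on how the loop interacts with the rest of the graph.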
Currently, information regarding shape is used in two ways in Theano:
`Op.infer_shape <http://deeplearning.net/software/theano/extending/cop.html#Op.infer_shape>`_
method.

Example:

.. code-block:: python

    ...
Shape Inference Problem
=======================

Theano propagates information about shape in the graph. Sometimes this
can lead to errors. Consider this example:

.. code-block:: python

    ...
example), an inferred shape is computed directly, without executing
the computation itself (there is no ``join`` in the first output or debugprint).
This makes the computation of the shape faster, but it can also hide errors. In
this example, the computation of the shape of the output of ``join`` is done only
based on the first input Theano variable, which leads to an error.

This might happen with other ops such as ``elemwise`` and ``dot``, for example.
Indeed, to perform some optimizations (for speed or stability, for instance),
Theano assumes that the computation is correct and consistent
in the first place, as it does here.
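The pitfall can be illustrated with a toy shape-inference routine (a deliberate simplification, not Theano's real implementation): if the inferred shape of a ``join`` along axis 0 trusts the first input's shape for every other axis, a mismatch in the remaining inputs goes undetected.

```python
def join_shape_first_input_only(axis, shapes):
    """Toy inference: take the first input's shape and only sum the
    joined axis -- mismatches on the other axes are silently ignored."""
    out = list(shapes[0])
    out[axis] = sum(s[axis] for s in shapes)
    return tuple(out)

# The second input has 3 columns instead of 2: actually executing the
# join would fail, but the inferred shape is computed without noticing.
print(join_shape_first_input_only(0, [(2, 2), (4, 3)]))  # (6, 2)
```

The inferred shape looks plausible even though the computation itself is inconsistent, which is exactly how the error stays hidden.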
You can detect those problems by running the code without this
optimization, using the Theano flag
``optimizer_excluding=local_shape_to_shape_i``. You can also obtain the
same effect by running in the modes ``FAST_COMPILE`` (it will not apply this
optimization, nor most other optimizations) or ``DEBUG_MODE`` (it will test
before and after all optimizations, which is much slower).
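For instance, the flag can be passed through the ``THEANO_FLAGS`` environment variable when launching your script (``train.py`` is a hypothetical file name here):

```shell
THEANO_FLAGS='optimizer_excluding=local_shape_to_shape_i' python train.py
```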
Currently, specifying a shape is not as easy and flexible as we wish, and we plan an
upgrade. Here is the current state of what can be done:

- You can pass the shape info directly to the ``ConvOp`` created
  when calling ``conv2d``. You simply set the parameters ``image_shape``
  and ``filter_shape`` inside the call. They must be tuples of 4
  elements. For example:

  .. code-block:: python

      theano.tensor.nnet.conv2d(..., image_shape=(7,3,5,5), filter_shape=(2,3,4,4))
- You can use the ``SpecifyShape`` op to add shape information anywhere in the
  graph. This allows some optimizations to be performed. In the following example,
  it makes it possible to precompute the Theano function to a constant.
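As a side note on what those 4-element tuples mean, the output shape of a 2D convolution can be computed by hand; the sketch below assumes the default ``'valid'`` border mode and is not Theano's code:

```python
def conv2d_valid_output_shape(image_shape, filter_shape):
    """(batch, channels, rows, cols) convolved with
    (nfilters, channels, frows, fcols) in 'valid' mode."""
    batch, channels, rows, cols = image_shape
    nfilters, fchannels, frows, fcols = filter_shape
    assert channels == fchannels, "channel counts must agree"
    # 'valid' mode: the filter must fit entirely inside the image.
    return (batch, nfilters, rows - frows + 1, cols - fcols + 1)

print(conv2d_valid_output_shape((7, 3, 5, 5), (2, 3, 4, 4)))  # (7, 2, 2, 2)
```

Knowing both tuples in advance is what lets the op check consistency and pick a faster implementation.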
Future Plans
============

The parameter "constant shape" will be added to ``theano.shared()``. This is probably
the most frequent occurrence with ``shared`` variables. It will make the code
simpler and will make it possible to check that the shape does not change when
updating the ``shared`` variable.
relations using symbolic placeholders (**variables**). When writing down
these expressions you use operations like ``+``, ``-``, ``**``,
``sum()``, ``tanh()``. All these are represented internally as **ops**.
An **op** represents a certain computation on some type of inputs
producing some type of output. You can see it as a *function definition*
in most programming languages.

Theano builds internally a graph structure composed of interconnected
Take for example the following code:

.. code-block:: python

    x = T.dmatrix('x')
    y = x*2.

If you enter ``type(y.owner)`` you get ``<class 'theano.gof.graph.Apply'>``,
which is the apply node that connects the op and the inputs to get this
output. You can now print the name of the op that is applied to get
*y*:

>>> y.owner.op.name
'Elemwise{mul,no_inplace}'

Hence, an elementwise multiplication is used to compute *y*. This
multiplication is done between the inputs:

>>> len(y.owner.inputs)
InplaceDimShuffle{x,x}.0

Note that the second input is not 2 as we would have expected. This is
because 2 was first :term:`broadcasted <broadcasting>` to a matrix of the
same shape as *x*. This is done by using the op ``DimShuffle``:

>>> type(y.owner.inputs[1])
<class 'theano.tensor.basic.TensorVariable'>
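The owner/inputs structure explored above can be mimicked with two tiny classes (a deliberately minimal sketch, not Theano's actual ``Variable`` and ``Apply`` classes):

```python
class Variable:
    def __init__(self, name=None, owner=None):
        self.name = name
        self.owner = owner  # the Apply node that produced this Variable


class Apply:
    def __init__(self, op_name, inputs):
        self.op_name = op_name   # stands in for the real op object
        self.inputs = inputs
        self.output = Variable(owner=self)


x = Variable(name="x")
two = Variable(name="2")
y = Apply("mul", [x, two]).output

print(y.owner.op_name)                    # which op produced y
print([v.name for v in y.owner.inputs])   # walk back to its inputs
```

Following ``owner`` and ``inputs`` pointers like this is exactly how one traverses a Theano graph from outputs back to inputs.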
Using the chain rule,
these gradients can be composed in order to obtain the expression of the
gradient of the graph's output with respect to the graph's inputs.

A following section of this tutorial will examine the topic of differentiation
in greater detail.
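Composing per-op gradients amounts to the chain rule; a minimal numeric illustration in plain Python (not Theano's grad machinery):

```python
import math

# f(x) = exp(x**2): two "ops" (square, then exp), each with a local gradient.
def f(x):
    return math.exp(x ** 2)

def grad_f(x):
    square = x ** 2
    d_square_dx = 2 * x               # local gradient of the first op
    d_exp_dsquare = math.exp(square)  # local gradient of the second op
    return d_exp_dsquare * d_square_dx  # chain rule: compose them

# Check against a central finite-difference approximation.
eps = 1e-6
numeric = (f(1.5 + eps) - f(1.5 - eps)) / (2 * eps)
print(abs(grad_f(1.5) - numeric) < 1e-4)  # True
```

Each op only needs to know its own local derivative; the graph structure dictates how those pieces multiply together.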
Optimizations
=============

When compiling a Theano function, what you give to the
:func:`theano.function <function.function>` is actually a graph
(starting from the output variables you can traverse the graph up to
the input variables). While this graph structure shows how to compute
the output from the input, it also offers the possibility to improve the
way this computation is carried out. The way optimizations work in