Commit 07d66494 authored by Joseph Turian

Updates to documentation

Parent 669e606d
@@ -114,7 +114,7 @@ Setup on OS-X
Note that compiling gcc42 takes a significant time (hours) so it's probably
not the best solution if you're in a rush! In my (Doomie) experience, scipy
failed to compile the first time I tried the command, but the second time
it compiled fine. Same thing with py25-zlib.

- Install some kind of BLAS library (TODO: how?)
@@ -305,9 +305,9 @@ This is done by setting the ``destroy_map`` field of the op. ``destroy_map`` mus

Viewers
-------

Similarly, an Op might not modify the inputs, but return an output which shares state with one or several of its inputs. For example, ``transpose`` can be done efficiently by viewing the same data as the original with modified dimensions and strides. That is fine, but the compiler needs to be told.
This is done by setting the ``view_map`` field of the op. It works like the ``destroy_map`` field: to an output index is associated the list of inputs that it shares state with. For example, ``transpose.view_map == {0: [0]}`` because its first output uses the same data as its first input. ``view_map`` is conservative: if there is any probability that an output will be the view of an input, that input must be in the view list of that output.
Important note: currently, an output can only be the view of one input. This is limiting, as an 'if' or 'switch' op would need to declare its output as a view of both its then and else branches, but for the time being the framework is not powerful enough to handle it. A future version should address this issue.
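The view relationship that ``transpose`` exploits can be seen directly in numpy
(a standalone illustration; Theano's ``transpose`` Op is not involved here):

.. code-block:: python

    import numpy as np

    a = np.arange(6).reshape(2, 3)
    b = a.T                   # a view: same data, swapped dimensions and strides
    b[0, 1] = 99              # writing through the view...
    assert a[1, 0] == 99      # ...modifies the original array
    assert b.base is a        # numpy records that b borrows a's memory

This is exactly the relationship that ``view_map == {0: [0]}`` declares: output
0 shares storage with input 0.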
@@ -316,7 +316,7 @@ Hidden outputs (as a form of op state)

For performance purposes, an ``op`` might want to have a hidden internal state.

Example: if we expect to call the op repeatedly on incrementally bigger inputs, we might want private output storage that's a lot bigger than needed and take incrementally bigger views on it, to save allocation overhead. In order to do this, we can have two outputs: one that we will return normally and will contain the answer and the other that will be the (larger) container. In this case, the advanced note in the 'reusing outputs' section applies. Furthermore, ``__call__`` should be overridden to only return the first output instead of both of them. Here is what the example's ``perform`` and ``__call__`` would look like:

.. code-block:: python
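The Theano code for this example is collapsed in this diff. As a rough
standalone sketch of the buffering idea (plain Python and numpy, with
hypothetical names, not Theano's actual Op API):

.. code-block:: python

    import numpy as np

    class DoubleIt:
        """Toy stand-in for an op with a hidden, larger output buffer."""

        def __init__(self):
            self._buf = np.empty(8)               # the hidden second "output"

        def perform(self, x):
            x = np.asarray(x, dtype=float)
            if x.size > self._buf.size:           # grow geometrically so we
                self._buf = np.empty(2 * x.size)  # reallocate only rarely
            out = self._buf[:x.size]              # incrementally bigger view
            np.multiply(x, 2.0, out=out)          # write the answer in place
            return out

        def __call__(self, x):
            return self.perform(x)                # only the first output is returned

Each call returns a view into the private buffer, so repeated calls on inputs
of similar size perform no allocation at all.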
@@ -27,6 +27,21 @@ However, if the link target is ambiguous, Sphinx will generate errors.

NB the ``:api:`` reference is special magic by Olivier, in
./scripts/docgen.py.
How to add TODO comments in Sphinx documentation
-------------------------------------------------
To include a TODO comment in Sphinx documentation, use an indented block as
follows::

    .. TODO: This is a comment.
    .. You have to put .. at the beginning of every line :(
    .. These lines should all be indented.

It will not appear in the output generated.

.. TODO: Check it out, this won't appear.
.. Nor will this.
How to write API documentation
---------------------------------------
@@ -292,7 +292,7 @@ Complex models can be implemented by subclassing ``Module`` (though that is not

self.l2_coef = M.Member(T.scalar()) # we can add a hyper parameter if we need to
return self.l2_coef * T.sum(self.w * self.w)

Here is how we use the model:

.. code-block:: python
@@ -7,8 +7,13 @@ Sparse matrices

scipy.sparse
------------

Note that you want scipy >= 0.7.0.

.. warning::

    In scipy 0.6, ``scipy.csc_matrix.dot`` has a bug with singleton
    dimensions. There may be more bugs. It also has inconsistent
    implementation of sparse matrices.

We describe the details of the compressed sparse matrix types.

``scipy.sparse.csc_matrix``
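As a quick illustration of the compressed sparse column layout (this assumes
scipy is installed; it is not part of the original page):

.. code-block:: python

    import numpy as np
    from scipy.sparse import csc_matrix

    m = csc_matrix(np.array([[1., 0.],
                             [0., 2.]]))
    # CSC stores three arrays: the nonzero values, their row indices,
    # and the offsets at which each column's entries start.
    print(m.data)     # [ 1.  2.]
    print(m.indices)  # [0 1]
    print(m.indptr)   # [0 1 2]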
@@ -157,7 +157,7 @@ State example
=============

In this example, we'll look at a complete logistic regression model, with
training by gradient descent.

.. code-block:: python
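The Theano listing is collapsed in this diff. As a plain-numpy sketch of the
same idea, logistic regression trained by gradient descent on a toy dataset
(all names here are illustrative, not taken from the tutorial's code):

.. code-block:: python

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    # toy dataset: the AND function, which is linearly separable
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    t = np.array([0., 0., 0., 1.])

    w, b, lr = np.zeros(2), 0.0, 0.5
    for _ in range(2000):                  # simple full-batch gradient descent
        p = sigmoid(X.dot(w) + b)          # predicted probabilities
        err = p - t                        # gradient of cross-entropy w.r.t. logits
        w -= lr * X.T.dot(err) / len(t)
        b -= lr * err.mean()

    pred = sigmoid(X.dot(w) + b) > 0.5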
@@ -31,7 +31,7 @@ not limited to:

* constant folding
* merging of similar subgraphs, to avoid calculating the same values more than once
* arithmetic simplification (``x*y/x -> y``)
* inserting efficient BLAS_ operations
* using inplace operations wherever it is safe to do so.
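The arithmetic simplification above can be illustrated with sympy (used here
only to show the idea of expression rewriting; it is not what Theano's
optimizer does internally, and it assumes sympy is installed):

.. code-block:: python

    import sympy

    x, y = sympy.symbols('x y')
    expr = x * y / x      # the x's cancel during expression construction
    print(expr)           # y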
@@ -47,7 +47,7 @@ Theano is released under a BSD license (:ref:`link <license>`)

Sneak peek
==========

Here is an example of how to use Theano. It doesn't show
off many of Theano's features, but it illustrates concretely what
Theano is.
@@ -110,7 +110,7 @@ There exist another symbolic package in Python, namely sympy_. Theano

is different from sympy in the sense that while Theano allows symbolic
manipulation it puts more emphasis on the evaluation of these expressions
and being able to repeatedly evaluate them on many different inputs. Theano
is also better suited to handling large tensors which have no
assumed structures.

If numpy_ is to be compared to MATLAB_ and sympy_ to Mathematica_,
@@ -43,17 +43,20 @@ The following libraries and software are optional:

Easy install
------------

The following command will install the latest revision of Theano
on your system:

.. TODO: Does this install the latest package version, or the latest Mercurial
.. revision?

.. code-block:: bash

    easy_install http://pylearn.org/hg/theano/archive/tip.tar.gz

.. TODO: make sure this works
.. TODO: change the command to install the latest *stable* version of
.. Theano, when we figure out where to put it.

--------------
@@ -17,7 +17,7 @@ an input provided by the end user (using c_extract) or it might simply

have been calculated by another operation. For each of the outputs,
the variables associated to them will be declared and initialized.
The operation then has to compute what it needs to using the
input variables and place the results in the output variables.
@@ -88,7 +88,7 @@ variables x_name, y_name and output_name are all of the primitive C

Implementing multiplication is as simple as multiplying the two input
doubles and setting the output double to what comes out of it. If you
had more than one output, you would just set the variable(s) for
each output to what they should be.

.. warning::
@@ -154,7 +154,7 @@ it, it's best to publish it somewhere.

""" % dict(name = name)
double.c_init = c_init

This function has to initialize the
double we declared previously to a suitable value. This is useful if
we want to avoid dealing with garbage values, especially if our data
type is a pointer. This is not going to be called for all Results with
@@ -375,7 +375,7 @@ like this:

//c_cleanup for x
}

It's not pretty, but it gives you an idea of how things
work (note that the variable names won't be x, y, z, etc. - they will
get a unique mangled name). The ``fail`` code runs a goto to the
appropriate label in order to run all cleanup that needs to be
@@ -138,11 +138,10 @@ type and it should make an Apply node with an output Result of type

mul.make_node = make_node

The first two lines make sure that both inputs are Results of the
``double`` type that we created in the previous section. We would not
want to multiply two arbitrary types, it would not make much sense
(and we'd be screwed when we implement this in C!)
The last line is the meat of the definition. There we create an Apply
node representing the application of ``mul`` to ``x`` and ``y``. Apply

@@ -178,8 +177,8 @@ understand the role of all three arguments of ``perform``:

  return, per our own definition.

- *output_storage*: This is a list of storage cells. There is one
  storage cell for each output of the Op. A storage cell is
  a one-element list (note: it is forbidden to change the
  length of the list(s) contained in output_storage). In this example,
  output_storage will contain a single storage cell for the
  multiplication's result.
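To make the storage-cell mechanics concrete, here is a minimal stand-alone
imitation of how a ``perform`` method fills ``output_storage`` (plain Python,
not Theano's actual machinery):

.. code-block:: python

    def perform(node, inputs, output_storage):
        """Multiply two doubles; store the result in the first cell."""
        x, y = inputs
        z = output_storage[0]   # the one-element list (a storage cell)
        z[0] = x * y            # replace the cell's contents, never the list

    output_storage = [[None]]   # one cell, holding None by default
    perform(None, (3.0, 4.0), output_storage)
    print(output_storage[0][0])  # 12.0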
@@ -204,18 +203,19 @@ Here, ``z`` is a list of one element. By default, ``z == [None]``.

:ref:`op` documentation.

.. warning::

    The data you put in ``output_storage`` must match the type of the
    symbolic output. This is a situation where the ``node`` argument
    can come in handy. In this example, we gave ``z`` the Theano type
    ``double`` in ``make_node``, which means that a Python ``float``
    must be put there. You should not put, say, an ``int`` in ``z[0]``
    because Theano assumes Ops handle typing properly.
Trying out our new Op
=====================

In the following code, we use our new Op:

>>> x, y = double('x'), double('y')
>>> z = mul(x, y)
>>> f = theano.function([x, y], z)
@@ -224,7 +224,7 @@ Trying out our new Op

>>> f(5.6, 6.7)
37.519999999999996

Note that there is an implicit call to
``double.filter()`` on each argument, so if we give integers as inputs
they are magically casted to the right type. Now, what if we try this?
@@ -237,7 +237,8 @@ Traceback (most recent call last):

AttributeError: 'int' object has no attribute 'type'

Well, ok. We'd like our Op to be a bit more flexible. This can be done
by modifying ``make_node`` to accept Python ``int`` or ``float`` as
``x`` and/or ``y``:

.. code-block:: python
@@ -252,8 +253,8 @@ by fixing ``make_node`` a little bit:

mul.make_node = make_node

Whenever we pass a Python int or float instead of a Result as ``x`` or
``y``, make_node will convert it to :ref:`constant` for us. ``gof.Constant``
is a :ref:`result` we statically know the value of.

>>> x = double('x')
>>> z = mul(x, 2)

@@ -263,18 +264,16 @@ is basically a :ref:`result` we statically know the value of.

>>> f(3.4)
6.7999999999999998
Now the code works the way we want it to.

Final version
=============

The above example is pedagogical. When you define the other basic arithmetic
operations ``add``, ``sub`` and ``div``, the code for ``make_node`` can be
shared between these Ops. Here is a revised implementation of these four
arithmetic operators:
.. code-block:: python

@@ -313,37 +312,27 @@ operators (well, pending revision of this tutorial, I guess):

    div = BinaryDoubleOp(name = 'div',
                         fn = lambda x, y: x / y)
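The full listing is collapsed in this diff, but the pattern it relies on can
be sketched without Theano: one small class parametrized by a name and a
binary function (illustrative code, not the tutorial's actual
``BinaryDoubleOp``):

.. code-block:: python

    class BinaryOp:
        """Toy version of the parametrized-Op idea: one class, many ops."""

        def __init__(self, name, fn):
            self.name = name
            self.fn = fn            # the only thing that differs between ops

        def __call__(self, x, y):   # stands in for make_node + perform
            return self.fn(x, y)

    add = BinaryOp('add', lambda x, y: x + y)
    sub = BinaryOp('sub', lambda x, y: x - y)
    mul = BinaryOp('mul', lambda x, y: x * y)
    div = BinaryOp('div', lambda x, y: x / y)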
Instead of working directly on an instance of Op, we create a subclass of
Op that we can parametrize. All the operations we define are binary. They
all work on two inputs with type ``double``. They all return a single
Result of type ``double``. Therefore, ``make_node`` does the same thing
for all these operations, except for the Op reference ``self`` passed
as first argument to Apply. We define ``perform`` using the function
``fn`` passed in the constructor.

This design is a flexible way to define basic operations without
duplicating code. The same way a Type subclass represents a set of
structurally similar types (see previous section), an Op subclass
represents a set of structurally similar operations: operations that
have the same input/output types, operations that only differ in one
small detail, etc. If you see common patterns in several Ops that you
want to define, it can be a good idea to abstract out what you can.
Remember that an Op is just an object which satisfies the contract
described above on this page and that you should use all the tools at
your disposal to create these objects as efficiently as possible.

**Exercise**: Make a generic DoubleOp, where the number of
arguments can also be given as a parameter.
**Next:** `Implementing double in C`_
@@ -11,7 +11,7 @@ Before tackling this tutorial, it is highly recommended to read the

The advanced tutorial is meant to give the reader a greater
understanding of the building blocks of Theano. Through this tutorial
we are going to define one :ref:`type`, ``double``, and basic
arithmetic :ref:`operations <op>` on that Type. We will first define
them using a Python implementation and then we will add a C
implementation.
@@ -166,7 +166,7 @@ first input (rank 0).

Purely destructive operations
=============================

While some operations will operate inplace on their inputs, some might
simply destroy or corrupt them. For example, an Op could do temporary
calculations right in its inputs. If that is the case, Theano also
needs to be notified. The way to notify Theano is to assume that some
@@ -176,7 +176,7 @@ optimization you wrote. For example, consider the following:

>>> e
[div(mul(add(y, z), x), add(y, z))]

Nothing happened here. The reason is: ``add(y, z) != add(y,
z)``. That is the case for efficiency reasons. To fix this problem we
first need to merge the parts of the graph that represent the same
computation, using the ``merge_optimizer`` defined in
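The point about ``add(y, z) != add(y, z)`` is about object identity, not
values: building the same expression twice yields two distinct graph nodes. A
minimal imitation (not Theano's actual classes):

.. code-block:: python

    class Apply:
        """Graph node; equality is identity, as for Theano's Apply nodes."""
        def __init__(self, op, inputs):
            self.op = op
            self.inputs = inputs

    a = Apply('add', ['y', 'z'])
    b = Apply('add', ['y', 'z'])
    print(a == b)   # False: structurally identical, but distinct objects

Comparing by structure would cost a graph traversal on every comparison; a
merge pass does that work once instead.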
@@ -14,7 +14,7 @@ WRITEME

Don't define new Ops unless you have to
=======================================

It is usually not useful to define Ops that can be easily
implemented using other already existing Ops. For example, instead of
writing a "sum_square_difference" Op, you should probably just write a
simple function:
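The function body is collapsed in this diff. A numpy version of the idea (the
Theano version would build the same expression on symbolic matrices):

.. code-block:: python

    import numpy as np

    def sum_square_difference(a, b):
        # composed entirely from existing elementwise and reduction ops
        return ((a - b) ** 2).sum()

    print(sum_square_difference(np.array([1., 2.]), np.array([0., 0.])))  # 5.0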
@@ -30,6 +30,12 @@ add. Note that from now on, we will use the term :term:`Result` to

mean "symbol" (in other words, ``x``, ``y``, ``z`` are all Result
objects).
If you are following along and typing into an interpreter, you may have
noticed that there was a slight delay in executing the ``function``
instruction. Behind the scenes, ``f`` was being compiled into C code.
.. TODO: help
-------------------------------------------

**Step 1**
@@ -119,16 +125,15 @@ The result is a numpy array. We can also use numpy arrays directly as

inputs:

>>> import numpy
>>> f(numpy.array([[1, 2], [3, 4]]), numpy.array([[10, 20], [30, 40]]))
array([[ 11.,  22.],
       [ 33.,  44.]])
It is possible to add scalars to matrices, vectors to matrices,
scalars to vectors, etc. The behavior of these operations is defined
by :term:`broadcasting`.
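Broadcasting follows the numpy rules; for example, adding a scalar or a
vector to a matrix (shown here with numpy itself):

.. code-block:: python

    import numpy as np

    m = np.array([[1., 2.], [3., 4.]])
    print(m + 10.0)                  # the scalar is broadcast to every element
    print(m + np.array([10., 20.]))  # the vector is broadcast across the rows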
The following types are available:
* **byte**: bscalar, bvector, bmatrix
* **32-bit integers**: iscalar, ivector, imatrix

@@ -136,16 +141,15 @@ The following types are readily available:

* **float**: fscalar, fvector, fmatrix
* **double**: dscalar, dvector, dmatrix
The previous list is not exhaustive. A guide to all types compatible
with numpy arrays may be found :ref:`here <predefinedtypes>`.
.. note::

    Watch out for the distinction between 32 and 64 bit integers (i
    prefix vs the l prefix) and between 32 and 64 bit floats (f prefix
    vs the d prefix).
**Next:** `More examples`_
@@ -17,39 +17,63 @@ the logistic curve, which is given by:

s(x) = \frac{1}{1 + e^{-x}}

.. figure:: logistic.png

   A plot of the logistic function, with x on the x-axis and s(x) on the
   y-axis.
You want to compute the function :term:`elementwise` on matrices of
doubles, which means that you want to apply this function to each
individual element of the matrix.
Well, what you do is this:

>>> x = T.dmatrix('x')
>>> s = 1 / (1 + T.exp(-x))
>>> logistic = function([x], s)
>>> logistic([[0, 1], [-1, -2]])
array([[ 0.5       ,  0.73105858],
       [ 0.26894142,  0.11920292]])
The reason logistic is performed elementwise is because all of its
operations---division, addition, exponentiation, and negation---are
themselves elementwise operations.

It is also the case that:

.. math::

    s(x) = \frac{1}{1 + e^{-x}} = \frac{1 + \tanh(x/2)}{2}

We can verify that this alternate form produces the same values:

>>> s2 = (1 + T.tanh(x / 2)) / 2
>>> logistic2 = function([x], s2)
>>> logistic2([[0, 1], [-1, -2]])
array([[ 0.5       ,  0.73105858],
       [ 0.26894142,  0.11920292]])
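The identity can also be checked numerically with plain numpy, independently
of Theano:

.. code-block:: python

    import numpy as np

    x = np.linspace(-5, 5, 101)
    s1 = 1 / (1 + np.exp(-x))
    s2 = (1 + np.tanh(x / 2)) / 2
    print(np.allclose(s1, s2))   # True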
Computing more than one thing at the same time
==============================================

Theano supports functions with multiple outputs. For example, we can
compute the :term:`elementwise` difference, absolute difference, and
squared difference between two matrices ``x`` and ``y`` at the same time:

>>> x, y = T.dmatrices('xy')
>>> diff = x - y
>>> abs_diff = abs(diff)
>>> diff_squared = diff**2
>>> f = function([x, y], [diff, abs_diff, diff_squared])
When we use the function, it will return the three results (the printing
was reformatted for readability):

>>> f([[1, 1], [1, 1]], [[0, 1], [2, 3]])
[array([[ 1.,  0.],
       [-1., -2.]]),
 array([[ 1.,  0.],
       [ 1.,  2.]]),
 array([[ 1.,  0.],
       [ 1.,  4.]])]
@@ -62,9 +86,12 @@ Computing gradients
===================

Now let's use Theano for a slightly more sophisticated task: create a
function which computes the derivative of some expression ``y`` with
respect to its parameter ``x``. For instance, we can compute the
gradient of :math:`x^2` with respect to :math:`x`. Note that
:math:`d(x^2)/dx = 2 \cdot x`.

Here is code to compute this gradient:

>>> x = T.dscalar('x')
>>> y = x**2
@@ -76,17 +103,26 @@ array(8.0)

array(188.40000000000001)
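As a sanity check, :math:`d(x^2)/dx = 2x` can be verified with a plain-Python
finite difference, without ``T.grad``:

.. code-block:: python

    def f(x):
        return x ** 2

    def numeric_grad(f, x, h=1e-6):
        # central-difference approximation of df/dx
        return (f(x + h) - f(x - h)) / (2 * h)

    print(numeric_grad(f, 4.0))   # approximately 8.0, i.e. 2*x at x = 4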
We can also compute the gradient of complex expressions such as the
logistic function defined above. It turns out that the derivative of the
logistic is: :math:`ds(x)/dx = s(x) \cdot (1 - s(x))`.
.. figure:: dlogistic.png

   A plot of the gradient of the logistic function, with x on the x-axis
   and :math:`ds(x)/dx` on the y-axis.
>>> x = T.dmatrix('x')
>>> s = 1 / (1 + T.exp(-x))
>>> gs = T.grad(s, x)
>>> dlogistic = function([x], gs)
>>> dlogistic([[0, 1], [-1, -2]])
array([[ 0.25 , 0.19661193],
[ 0.19661193, 0.10499359]])
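The closed form above is easy to verify directly with NumPy. This standalone sketch is independent of Theano and reproduces the values in the doctest:

```python
import numpy as np

def logistic(x):
    """The logistic (sigmoid) function s(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([[0, 1], [-1, -2]], dtype=float)
s = logistic(x)
# derivative of the logistic: s(x) * (1 - s(x))
print(s * (1 - s))
```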
The resulting function computes the gradient of its first argument
with respect to the second. In this way, Theano can be used for
`automatic differentiation`_.
.. note::

...@@ -125,7 +161,7 @@ Making a function with state
It is also possible to make a function with an internal state. For
example, let's say we want to make an accumulator: at the beginning,
the state is initialized to zero. Then, on each function call, the state
is incremented by the function's argument. We'll also make it so that
the increment has a default value of 1.
...@@ -136,12 +172,12 @@ First let's define the accumulator function:

>>> new_state = state + inc
>>> accumulator = function([(inc, 1), ((state, new_state), 0)], new_state)
The first argument is a pair. As we saw in the previous section, this
means that ``inc`` is an input with a default value of 1. The
second argument has syntax that creates an internal state or
closure. The syntax is ``((state_result, new_state_result),
initial_value)``. What this means is that every time ``accumulator``
is called, the value of the internal ``state`` will be replaced
by the value computed as ``new_state``. In this case, the state will
be replaced by the result of incrementing it by ``inc``.
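These semantics can be mimicked in plain Python with a closure. This illustrates the behaviour only; in Theano the state lives inside the compiled function, and ``make_accumulator`` is our own illustrative name:

```python
def make_accumulator(initial_value=0):
    # the internal state, initialized to zero by default
    state = {'value': initial_value}

    def accumulator(inc=1):  # inc has a default value of 1
        # on every call, the state is replaced by state + inc
        state['value'] += inc
        return state['value']

    return accumulator

acc = make_accumulator()
print(acc())     # 1
print(acc(300))  # 301
```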
...@@ -152,7 +188,7 @@ however you like as long as the name does not conflict with the names
of other inputs.

Anyway, let's try it out! The state can be accessed using the square
brackets notation ``[]``. You may access the state either by using
the :ref:`result` representing it or the name of that
:ref:`result`. In our example we can access the state either with the
``state`` object or the string 'state'.
...@@ -174,8 +210,8 @@ array(301.0)

>>> accumulator['state']
array(301.0)
It is possible to reset the state. This is done
by assigning to the state using the square brackets
notation:

>>> accumulator['state'] = 5
set terminal svg font "Bitstream Vera Sans,10" size 300,200
set output "logistic.svg"
set xrange [-6:6]
set xzeroaxis linetype -1
set yzeroaxis linetype -1
set xtics axis nomirror
set ytics axis nomirror 0,0.5,1
set key off
set grid
set border 1
set samples 400
plot 1/(1 + exp(-x)) with line linetype rgbcolor "blue" linewidth 2
set ytics axis nomirror 0,0.25
set output "dlogistic.svg"
plot 1/(1 + exp(-x)) * (1 - 1/(1 + exp(-x))) with line linetype rgbcolor "blue" linewidth 2
...@@ -3,11 +3,11 @@

Using Module
============
Now that we're familiar with the basics, we introduce Theano's more
advanced interface, Module. This interface allows you to define Theano
"objects" which can have many state variables and many methods sharing
these states. This is what you should use to define complex systems such
as a neural network.
Remake of the "state" example

...@@ -61,7 +61,7 @@ defined in our Module.
The ``inc`` variable doesn't need to be declared as a Member because it
will only serve as an input to the method we will define. This is why
it is defined as an :ref:`external` variable. Do note that it is
inconsequential if you do declare it as a Member - it is unlikely
to cause you any problems.

.. note::
...@@ -52,7 +52,7 @@ object for each of fn and gn).

>>> m.nearly_zeros = Method([], rv_u + rv_u - 2 * rv_u)
This function will always return a 2x2 matrix of small numbers, or possibly
zeros. It illustrates that random variables are not re-drawn every time they
are used; they are only drawn once (per call).
...@@ -84,7 +84,7 @@ seed method of a RandomStreamsInstance.

Of course, a RandomStreamsInstance can contain several RandomState instances and
these will *not* all be seeded to the same seed_value. They will all be seeded
deterministically and probably uniquely as a function of the seed_value.
Seeding the generator in this way makes it possible to repeat random streams.
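The same reproducibility idea can be seen with NumPy's ``RandomState``, which Theano's random streams are built on. This is a NumPy illustration, not the Theano API:

```python
import numpy as np

# two generators seeded with the same value produce identical streams
a = np.random.RandomState(seed=42)
b = np.random.RandomState(seed=42)

draw_a = a.uniform(size=(2, 2))
draw_b = b.uniform(size=(2, 2))
print((draw_a == draw_b).all())  # True
```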
...@@ -22,7 +22,7 @@ much longer than intended - maybe we should just link to it! --OB

Predefined types
----------------
Predefined types are
located in the ``theano.tensor`` package. The names of the types follow
a recipe:
...@@ -53,9 +53,9 @@ col [m, 1] No Yes
matrix [m, n] No No
====== ====== ========================================== =============================================
So, if you want a row of 32-bit floats, it is available
as ``theano.tensor.frow``. If you want a matrix of 32-bit
integers it is available as ``theano.tensor.imatrix``.
Each of the types described above can be constructed by two methods:
a singular version (e.g., ``dmatrix``) and a plural version
...@@ -108,16 +108,18 @@ complex128 complex 128 (two float64)

.. note::
   Even though ``theano.tensor`` does not define any type
   using ``complex`` dtypes (``complex64`` or ``complex128``),
   you can define them explicitly with ``Tensor`` (see example
   below). However, few operations are fully supported for complex
   types: as of version 0.1, only elementary operations (``+-*/``)
   have C implementations. Additionally, complex types have received
   little testing.
The broadcastable pattern indicates both the number of dimensions and
whether a particular dimension must have length 1.
Here is a table mapping the :term:`broadcastable
<broadcasting>` pattern to what kind of tensor it encodes:
===================== =================================
...@@ -136,14 +138,18 @@ pattern interpretation
[False, False, False] A MxNxP tensor (pattern of a + b)
===================== =================================
For dimensions in which broadcasting is False, the length of this
dimension can be 1 or more. For dimensions in which broadcasting is True,
the length of this dimension must be 1.
When two tensors have a different number of dimensions, the broadcastable
pattern is *expanded to the left*, by padding with ``True``. For example,
a vector's pattern, ``[False]``, could be expanded to ``[True, False]``, and
would behave like a row (1xN matrix). In the same way, a matrix (``[False,
False]``) would behave like a 1xNxP tensor (``[True, False, False]``).
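NumPy follows the same left-padding rule, so the effect of these patterns can be seen directly in a plain NumPy sketch:

```python
import numpy as np

m = np.ones((3, 4))   # pattern [False, False]: a 3x4 matrix
v = np.arange(4.0)    # pattern [False]: a vector of length 4

# v's shape (4,) is padded on the left to (1, 4), so it behaves
# like a row and is added to every row of m
result = m + v
print(result.shape)  # (3, 4)
```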
So if we wanted to create a type representing a 3D array of unsigned If we wanted to create a type representing a 3D array of unsigned
bytes, we would simply do: bytes, we would do:
.. code-block:: python

...@@ -158,10 +164,8 @@ bytes, we would simply do:

Ops
===
There are many operations available in the ``theano.tensor`` package.
See :ref:`oplist`.
...@@ -24,7 +24,7 @@ difficult, we will give our Op a solid C implementation.

Implementing a new Op in Python
===============================
You are required to define two
methods - one to create the :ref:`apply` node every time your Op is
applied to some inputs, declaring the outputs in the process, and
another to operate on the inputs. There is also one optional method