Commit 3e0c5bb7 authored by Cesar Laurent

Made the doc more clear.

Parent 5ce67be5
@@ -210,82 +210,6 @@ with all values set to zero except at the provided array indices.
This demonstrates that you can introduce new Theano variables into a scan function.
Multiple outputs, several tap values - Recurrent Neural Network with Scan
--------------------------------------------------------------------------
The examples above showed simple uses of scan. However, scan also supports
referring not only to the prior result and the current sequence value, but
also to results and sequence values further back in time.
This is needed, for example, to implement an RNN using scan. Assume
that our RNN is defined as follows:
.. math::
    x(n) = \tanh( W x(n-1) + W^{in}_1 u(n) + W^{in}_2 u(n-4) +
                  W^{feedback} y(n-1) )

    y(n) = W^{out} x(n - 3)
Note that this network is far from a classical recurrent neural
network and might be useless. We defined it this way
to better illustrate the features of scan.
In this case we have a sequence ``u`` over which we need to iterate,
and two outputs ``x`` and ``y``. To implement this with scan we first
construct a function that computes one iteration step:
.. code-block:: python

    def oneStep(u_tm4, u_t, x_tm3, x_tm1, y_tm1, W, W_in_1, W_in_2, W_feedback, W_out):

        x_t = T.tanh(theano.dot(x_tm1, W) + \
                     theano.dot(u_t, W_in_1) + \
                     theano.dot(u_tm4, W_in_2) + \
                     theano.dot(y_tm1, W_feedback))
        y_t = theano.dot(x_tm3, W_out)

        return [x_t, y_t]
As a naming convention for the variables, we use ``a_tmb`` to mean ``a`` at
``t-b`` and ``a_tpb`` to mean ``a`` at ``t+b``.
Note the order in which the parameters are given and in which the
result is returned. Try to respect the chronological order among
the taps (time slices of sequences or outputs) used. For scan it is crucial
only that the variables representing the different time taps appear in the
same order as the one in which these taps are given. Not only the taps but
also the variables must respect this order, since this is how scan figures
out what should be represented by what. Given that we have all
the Theano variables needed, we construct our RNN as follows:
.. code-block:: python

    u  = T.matrix()  # it is a sequence of vectors
    x0 = T.matrix()  # initial state of x has to be a matrix, since
                     # it has to cover x[-3]
    y0 = T.vector()  # y0 is just a vector since scan has only to provide
                     # y[-1]

    ([x_vals, y_vals], updates) = theano.scan(fn=oneStep,
                                              sequences=dict(input=u, taps=[-4, -0]),
                                              outputs_info=[dict(initial=x0, taps=[-3, -1]), y0],
                                              non_sequences=[W, W_in_1, W_in_2, W_feedback, W_out])
    # for second input y, scan adds -1 in output_taps by default
Now ``x_vals`` and ``y_vals`` are symbolic variables pointing to the
sequence of x and y values generated by iterating over u. The
``sequence_taps`` and ``outputs_taps`` give scan information about exactly
which slices are needed. Note that if we want to use ``x[t-k]`` we do
not need to also have ``x[t-(k-1)], x[t-(k-2)],..``, but when applying
the compiled function, the numpy array given to represent this sequence
should be large enough to cover these values. Assume that we compile the
above function, and we give as ``u`` the array ``uvals = [0,1,2,3,4,5,6,7,8]``.
Abusing notation slightly, scan will consider ``uvals[0]`` as ``u[-4]``, and
will start scanning from ``uvals[4]`` towards the end.
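The index alignment described above can be checked with a small plain-numpy sketch (illustrative only, no Theano involved): with taps ``[-4, 0]`` and the 9-element ``uvals``, scan can take 5 steps, and step ``t`` receives ``(u_tm4, u_t) = (uvals[t], uvals[t+4])``.

```python
import numpy as np

uvals = np.arange(9)      # [0, 1, 2, 3, 4, 5, 6, 7, 8]
taps = [-4, 0]
offset = -min(taps)       # 4: uvals[0] plays the role of u[-4]
n_steps = len(uvals) - offset

# the (u_tm4, u_t) pair handed to oneStep at each step
pairs = [(int(uvals[t]), int(uvals[t + offset])) for t in range(n_steps)]
print(n_steps, pairs)     # 5 [(0, 4), (1, 5), (2, 6), (3, 7), (4, 8)]
```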
Using shared variables - Gibbs sampling
---------------------------------------
@@ -317,15 +241,7 @@ the following:
gibbs10 = theano.function([sample], values[-1], updates=updates)
The first, and probably most crucial observation is that the updates
dictionary becomes important in this case. It links a shared variable
with its updated value after k steps. In this case it tells how the
random streams get updated after 10 iterations. If you do not pass this
@@ -355,51 +271,152 @@ after each step of scan. If we write :
We will see that because ``b`` does not use the updated version of
``a``, it will be 2, ``c`` will be 12, while ``a.value`` is ``11``.
If we call the function again, ``b`` will become 12, ``c`` will be 22
and ``a.value`` 21. If we do not pass the ``updates`` dictionary to the
function, then ``a.value`` will always remain 1, ``b`` will always be 2 and
``c`` will always be ``12``.
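The call-by-call behaviour described above can be emulated in a few lines of plain Python (a toy sketch, not Theano; ``SharedVariable`` and the exact form of ``b`` and ``c`` are hypothetical reconstructions chosen only to match the values quoted above):

```python
class SharedVariable:
    """Toy stand-in for a Theano shared variable."""
    def __init__(self, value):
        self.value = value

def compiled_function(a):
    # Mimics a compiled function where output ``b`` is built from ``a``
    # itself (so it sees the value at call time), while ``c`` is built
    # from updates[a] (so it sees the value after the 10 steps).
    old = a.value
    new = old + 10        # ten scan steps, each adding 1 to ``a``
    b = old + 1           # computed from the pre-update value
    c = new + 1           # computed from the updated value
    a.value = new         # the updates dictionary is applied on return
    return b, c

a = SharedVariable(1)
r1 = compiled_function(a)   # (2, 12); a.value is now 11
r2 = compiled_function(a)   # (12, 22); a.value is now 21
```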
The second observation is that if we use shared variables (``W``, ``bvis``,
``bhid``) but do not iterate over them (i.e., scan doesn't really need to know
anything in particular about them, just that they are used inside the
function applied at each step), we do not need to pass them as arguments:
scan will find them on its own and add them to the graph.
However, passing them to the scan function is good practice, as it avoids
the Scan Op calling any earlier (external) Op over and over. This results in a
simpler computational graph, which speeds up the optimization and the
execution. To pass the shared variables to scan you need to put them in a list
and give it to the ``non_sequences`` argument. Here is the Gibbs sampling code
updated:
.. code-block:: python

    W = theano.shared(W_values)  # we assume that ``W_values`` contains the
                                 # initial values of your weight matrix
    bvis = theano.shared(bvis_values)
    bhid = theano.shared(bhid_values)

    trng = T.shared_randomstreams.RandomStreams(1234)

    # OneStep, with explicit use of the shared variables (W, bvis, bhid)
    def OneStep(vsample, W, bvis, bhid):
        hmean = T.nnet.sigmoid(theano.dot(vsample, W) + bhid)
        hsample = trng.binomial(size=hmean.shape, n=1, p=hmean)
        vmean = T.nnet.sigmoid(theano.dot(hsample, W.T) + bvis)
        return trng.binomial(size=vsample.shape, n=1, p=vmean,
                             dtype=theano.config.floatX)

    sample = theano.tensor.vector()

    # The new scan, with the shared variables passed as non_sequences
    values, updates = theano.scan(fn=OneStep,
                                  outputs_info=sample,
                                  non_sequences=[W, bvis, bhid],
                                  n_steps=10)

    gibbs10 = theano.function([sample], values[-1], updates=updates)
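For intuition, one such Gibbs step can be sketched in plain numpy (illustrative sizes and randomly chosen parameters; this is not the compiled Theano graph):

```python
import numpy as np

rng = np.random.default_rng(1234)

# Toy parameters (hypothetical sizes, for illustration only)
n_visible, n_hidden = 6, 4
W = rng.normal(size=(n_visible, n_hidden))
bvis = np.zeros(n_visible)
bhid = np.zeros(n_hidden)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def one_step(vsample):
    """One Gibbs step: sample h given v, then v given h."""
    hmean = sigmoid(vsample @ W + bhid)
    hsample = rng.binomial(n=1, p=hmean)
    vmean = sigmoid(hsample @ W.T + bvis)
    return rng.binomial(n=1, p=vmean).astype(float)

v = rng.binomial(n=1, p=0.5, size=n_visible).astype(float)
for _ in range(10):          # the equivalent of n_steps=10
    v = one_step(v)
print(v.shape)               # (6,)
```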
Using shared variables - the strict flag
----------------------------------------
As we just saw, passing the shared variables to scan may result in a simpler
computational graph, which speeds up the optimization and the execution. A
good way to remember to pass every shared variable used during scan is to use
the ``strict`` flag. When set to true, scan assumes that all the necessary
shared variables in ``fn`` are passed as a part of ``non_sequences``. This has
to be ensured by the user. Otherwise, it will result in an error.
Using the previous Gibbs sampling example:
.. code-block:: python

    # The new scan, using strict=True
    values, updates = theano.scan(fn=OneStep,
                                  outputs_info=sample,
                                  non_sequences=[W, bvis, bhid],
                                  n_steps=10,
                                  strict=True)
If you omit passing ``W``, ``bvis`` or ``bhid`` as a ``non_sequence``, it will
result in an error.
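What ``strict=True`` guards against can be mimicked in plain Python: a check that the step function does not silently capture variables it was not explicitly handed (``check_strict`` is a hypothetical helper for illustration, not part of Theano):

```python
def check_strict(fn, allowed=()):
    """Raise if ``fn`` captures closure variables not explicitly allowed."""
    hidden = [name for name in fn.__code__.co_freevars if name not in allowed]
    if hidden:
        raise ValueError("step function silently uses: %s" % ", ".join(hidden))
    return fn

def make_step(W):
    def bad_step(x):
        return W * x      # ``W`` is captured implicitly from the closure
    def good_step(x, W):
        return W * x      # ``W`` must be passed in, as with non_sequences
    return bad_step, good_step

bad_step, good_step = make_step(3)
check_strict(good_step)   # fine: no hidden captures
try:
    check_strict(bad_step)
except ValueError as e:
    print(e)              # ``W`` was not passed explicitly
```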
Here is a simple RNN example:
.. math::
    x(n) = \tanh(u(n) + W x(n-1))

And the code using ``strict=True``:

.. code-block:: python

    u = T.matrix()  # The input sequence
    x0 = T.vector() # The initial state
    W = theano.shared(W_values)  # we assume that ``W_values`` contains the
                                 # initial values of your weight matrix

    def oneStep(u_t, x_tm1, W):
        return T.tanh(u_t + T.dot(W, x_tm1))

    # Using strict=True, and passing W as a non_sequence
    x_vals, updates = theano.scan(fn=oneStep,
                                  sequences=dict(input=u, taps=[0]),
                                  outputs_info=[dict(initial=x0, taps=[-1])],
                                  non_sequences=[W],  # Don't forget to pass W!
                                  strict=True)

If you omit passing ``W`` as a ``non_sequence``, it will result in an error.
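As a plain-numpy sanity check of the simple recurrence ``x(n) = tanh(u(n) + W x(n-1))`` (illustrative shapes and randomly chosen values; no Theano involved):

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps, dim = 8, 3
u = rng.normal(size=(n_steps, dim))   # the input sequence
x = np.zeros(dim)                     # the initial state x0
W = 0.1 * rng.normal(size=(dim, dim))

xs = []
for u_t in u:                         # the loop scan runs for us
    x = np.tanh(u_t + W @ x)
    xs.append(x)
xs = np.stack(xs)
print(xs.shape)                       # (8, 3)
```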
Conditional ending of Scan