Made the scan doc clearer.

3e0c5bb7 · Cesar Laurent · 5ce67be5 · 3e0c5bb7
--- a/doc/library/scan.txt
+++ b/doc/library/scan.txt
@@ -210,6 +210,138 @@ with all values set to zero except at the provided array indices.
 This demonstrates that you can introduce new Theano variables into a scan function.


+Using shared variables - Gibbs sampling
+---------------------------------------
+
+Another useful feature of scan is that it can handle shared variables.
+For example, if we want to implement a Gibbs chain of length 10 we would do
+the following:
+
+.. code-block:: python
+
+    W = theano.shared(W_values) # we assume that ``W_values`` contains the
+                                # initial values of your weight matrix
+
+    bvis = theano.shared(bvis_values)
+    bhid = theano.shared(bhid_values)
+
+    trng = T.shared_randomstreams.RandomStreams(1234)
+
+    def OneStep(vsample):
+        hmean = T.nnet.sigmoid(theano.dot(vsample, W) + bhid)
+        hsample = trng.binomial(size=hmean.shape, n=1, p=hmean)
+        vmean = T.nnet.sigmoid(theano.dot(hsample, W.T) + bvis)
+        return trng.binomial(size=vsample.shape, n=1, p=vmean,
+                             dtype=theano.config.floatX)
+
+    sample = theano.tensor.vector()
+
+    values, updates = theano.scan(OneStep, outputs_info=sample, n_steps=10)
+
+    gibbs10 = theano.function([sample], values[-1], updates=updates)
+
+
+The first, and probably most crucial, observation is that the updates
+dictionary becomes important in this case. It links each shared variable
+to its updated value after k steps. Here it tells how the random streams
+get updated after 10 iterations. If you do not pass this updates
+dictionary to your function, you will get the same 10 sets of random
+numbers on every call. You can also use the ``updates`` dictionary
+afterwards. Consider this example:
+
+.. code-block:: python
+
+    a = theano.shared(1)
+    values, updates = theano.scan(lambda: {a: a+1}, n_steps=10)
+
+In this case the lambda expression does not require any input parameters
+and returns an update dictionary which tells how ``a`` should be updated
+after each step of scan. If we write:
+
+.. code-block:: python
+
+    b = a + 1
+    c = updates[a] + 1
+    f = theano.function([], [b, c], updates=updates)
+
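+    b_val, c_val = f()  # b and c are symbolic; call f to get their values
+    print("b = %s, c = %s" % (b_val, c_val))
+    print(a.get_value())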
+
+Calling ``f()`` shows that, because ``b`` does not use the updated version
+of ``a``, it is 2, while ``c`` is 12 and the value of ``a`` afterwards is 11.
+If we call the function again, ``b`` becomes 12, ``c`` becomes 22 and the
+value of ``a`` becomes 21. If we do not pass the ``updates`` dictionary to
+the function, then the value of ``a`` always remains 1, ``b`` is always 2
+and ``c`` is always 12.
+
+The second observation is that if we use shared variables (``W``, ``bvis``,
+``bhid``) but do not iterate over them (i.e. scan does not need to know
+anything in particular about them, just that they are used inside the
+function applied at each step), we do not need to pass them as arguments.
+Scan will find them on its own and add them to the graph.
+However, passing them to scan explicitly is good practice, as it avoids
+the Scan Op calling any earlier (external) Op over and over. This results in
+a simpler computational graph, which speeds up the optimization and the
+execution. To pass the shared variables to scan, put them in a list
+and give it to the ``non_sequences`` argument. Here is the Gibbs sampling code
+updated accordingly:
+
+.. code-block:: python
+
+    W = theano.shared(W_values) # we assume that ``W_values`` contains the
+                                # initial values of your weight matrix
+
+    bvis = theano.shared(bvis_values)
+    bhid = theano.shared(bhid_values)
+
+    trng = T.shared_randomstreams.RandomStreams(1234)
+
+    # OneStep, with explicit use of the shared variables (W, bvis, bhid)
+    def OneStep(vsample, W, bvis, bhid):
+        hmean = T.nnet.sigmoid(theano.dot(vsample, W) + bhid)
+        hsample = trng.binomial(size=hmean.shape, n=1, p=hmean)
+        vmean = T.nnet.sigmoid(theano.dot(hsample, W.T) + bvis)
+        return trng.binomial(size=vsample.shape, n=1, p=vmean,
+                             dtype=theano.config.floatX)
+
+    sample = theano.tensor.vector()
+
+    # The new scan, with the shared variables passed as non_sequences
+    values, updates = theano.scan(fn=OneStep,
+                                  outputs_info=sample,
+                                  non_sequences=[W, bvis, bhid],
+                                  n_steps=10)
+
+    gibbs10 = theano.function([sample], values[-1], updates=updates)
+
+
+
+Using shared variables - the strict flag
+----------------------------------------
+
+As we just saw, passing the shared variables to scan may result in a simpler
+computational graph, which speeds up the optimization and the execution. A
+good way to remember to pass every shared variable used during scan is to use
+the ``strict`` flag. When it is set to ``True``, scan assumes that every
+shared variable used in ``fn`` is passed as part of ``non_sequences``. It is
+up to the user to ensure this; otherwise, scan raises an error.
+
+Using the previous Gibbs sampling example:
+
+.. code-block:: python
+
+    # The new scan, using strict=True
+    values, updates = theano.scan(fn=OneStep,
+                                  outputs_info=sample,
+                                  non_sequences=[W, bvis, bhid],
+                                  n_steps=10,
+                                  strict=True)
+
+If you forget to pass ``W``, ``bvis`` or ``bhid`` as a non-sequence, scan
+raises an error.
+
+
 Multiple outputs, several taps values - Recurrent Neural Network with Scan
 --------------------------------------------------------------------------

@@ -238,10 +370,10 @@ construct a function that computes one iteration step :

  def oneStep(u_tm4, u_t, x_tm3, x_tm1, y_tm1, W, W_in_1, W_in_2,  W_feedback, W_out):

-    x_t = T.tanh( theano.dot(x_tm1, W) + \
-                  theano.dot(u_t,   W_in_1) + \
-                  theano.dot(u_tm4, W_in_2) + \
-                  theano.dot(y_tm1, W_feedback))
+    x_t = T.tanh(theano.dot(x_tm1, W) + \
+                 theano.dot(u_t,   W_in_1) + \
+                 theano.dot(u_tm4, W_in_2) + \
+                 theano.dot(y_tm1, W_feedback))
    y_t = theano.dot(x_tm3, W_out)

    return [x_t, y_t]
@@ -266,10 +398,11 @@ the Theano variables needed we construct our RNN as follows :
                   # y[-1]


-   ([x_vals, y_vals],updates) = theano.scan(fn = oneStep, \
-                                sequences    = dict(input = u, taps= [-4,-0]), \
-                                outputs_info = [dict(initial = x0, taps = [-3,-1]),y0], \
-                                non_sequences  = [W,W_in_1,W_in_2,W_feedback, W_out])
+   ([x_vals, y_vals], updates) = theano.scan(fn=oneStep,
+                                             sequences=dict(input=u, taps=[-4,-0]),
+                                             outputs_info=[dict(initial=x0, taps=[-3,-1]), y0],
+                                             non_sequences=[W, W_in_1, W_in_2, W_feedback, W_out],
+                                             strict=True)
        # for second input y, scan adds -1 in output_taps by default


@@ -286,122 +419,6 @@ By abusing notations, scan will consider ``uvals[0]`` as ``u[-4]``, and
 will start scaning from ``uvals[4]`` towards the end.


-Using shared variables - Gibbs sampling
---------------------------------------
-
-Another useful feature of scan, is that it can handle shared variables.
-For example, if we want to implement a Gibbs chain of length 10 we would do
-the following:
-
-.. code-block:: python
-
- W = theano.shared(W_values) # we assume that ``W_values`` contains the
-                             # initial values of your weight matrix
-
- bvis = theano.shared(bvis_values)
- bhid = theano.shared(bhid_values)
-
- trng = T.shared_randomstreams.RandomStreams(1234)
-
- def OneStep(vsample) :
-    hmean = T.nnet.sigmoid(theano.dot(vsample, W) + bhid)
-    hsample = trng.binomial(size=hmean.shape, n=1, p=hmean)
-    vmean = T.nnet.sigmoid(theano.dot(hsample, W.T) + bvis)
-    return trng.binomial(size=vsample.shape, n=1, p=vmean,
-                         dtype=theano.config.floatX)
-
- sample = theano.tensor.vector()
-
- values, updates = theano.scan(OneStep, outputs_info=sample, n_steps=10)
-
- gibbs10 = theano.function([sample], values[-1], updates=updates)
-
-
-Note that if we use shared variables ( ``W``, ``bvis``, ``bhid``) but
-we do not iterate over them (so scan doesn't really need to know
-anything in particular about them, just that they are used inside the
-function applied at each step) you do not need to pass them as
-arguments. Scan will find them on its own and add them to the graph. Of
-course, if you wish to (and it is good practice) you can add them, when
-you call scan (they would be in the list of non-sequence inputs).
-
-The second, and probably most crucial observation is that the updates
-dictionary becomes important in this case. It links a shared variable
-with its updated value after k steps. In this case it tells how the
-random streams get updated after 10 iterations. If you do not pass this
-update dictionary to your function, you will always get the same 10
-sets of random numbers. You can even use the ``updates`` dictionary
-afterwards. Look at this example :
-
-.. code-block:: python
-
- a = theano.shared(1)
- values, updates = theano.scan(lambda: {a: a+1}, n_steps=10)
-
-In this case the lambda expression does not require any input parameters
-and returns an update dictionary which tells how ``a`` should be updated
-after each step of scan. If we write :
-
-.. code-block:: python
-
-  b = a + 1
-  c = updates[a] + 1
-  f = theano.function([], [b, c], updates=updates)
-
-  print b
-  print c
-  print a.value
-
-We will see that because ``b`` does not use the updated version of
-``a``, it will be 2, ``c`` will be 12, while ``a.value`` is ``11``.
-If we call the function again, ``b`` will become 12, ``c`` will be 22
-and ``a.value`` 21.
-
-If we do not pass the ``updates`` dictionary to the function, then
-``a.value`` will always remain 1, ``b`` will always be 2 and ``c``
-will always be ``12``.
-
-
-Using shared variables - the strict flag
----------------------------------------
-
-You also have the possibility to use the ``strict`` flag. When set to true,
-Scan assumes that all the necessary shared variables in ``fn`` are passed as a
-part of ``non_sequences``. This has to be ensured by the user. Otherwise, it
-will result in an error. It avoids Scan Op calling any earlier (external) Op
-over and over. This results in a simpler computational graph, which speeds up
-the optimization and the execution.
-
-Here is a simple RRN example:
-
-.. math::
-  x(n) = \tanh(u(n) + W x(n-1))
-
-
-And the code using ``strict=True``:
-
-.. code-block:: python
-
-    u = T.matrix() # The input sequence
-    x0 = T.vector() # The initial state
-    W = theano.shared(W_values) # we assume that ``W_values`` contains the
-                                # initial values of your weight matrix.
-
-    def oneStep(u_t, x_tm1, W):
-        return T.tanh(u_t + T.dot(W, x_tm1))
-
-    # Using strict=True, and passing W as a non_sequence
-    x_vals, updates = theano.scan(fn=oneStep,
-                                  sequences=dict(input=u, taps=[0]),
-                                  outputs_info=[dict(initial=x0,
-                                                     taps=[-1])],
-                                  non_sequences=[W], # Don't forget to pass W!
-                                  strict=True)
-
-
-If you omit to pass ``W`` as a ``non_sequence``, it will result in an error.
-
-
 Conditional ending of Scan
 --------------------------