Commit edbf47e0 authored by Mehdi Mirza

Merge pull request #2816 from Thrandis/ccw

Added doc for strict flag.
@@ -210,6 +210,138 @@ with all values set to zero except at the provided array indices.
This demonstrates that you can introduce new Theano variables into a scan function.
Using shared variables - Gibbs sampling
---------------------------------------
Another useful feature of scan is that it can handle shared variables.
For example, if we want to implement a Gibbs chain of length 10, we would do
the following:

.. code-block:: python

    W = theano.shared(W_values)  # we assume that ``W_values`` contains the
                                 # initial values of your weight matrix

    bvis = theano.shared(bvis_values)
    bhid = theano.shared(bhid_values)

    trng = T.shared_randomstreams.RandomStreams(1234)

    def OneStep(vsample):
        hmean = T.nnet.sigmoid(theano.dot(vsample, W) + bhid)
        hsample = trng.binomial(size=hmean.shape, n=1, p=hmean)
        vmean = T.nnet.sigmoid(theano.dot(hsample, W.T) + bvis)
        return trng.binomial(size=vsample.shape, n=1, p=vmean,
                             dtype=theano.config.floatX)

    sample = theano.tensor.vector()

    values, updates = theano.scan(OneStep, outputs_info=sample, n_steps=10)

    gibbs10 = theano.function([sample], values[-1], updates=updates)
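
To make this concrete, here is a minimal usage sketch. The shapes (784
visible units, 500 hidden units) and the initial sample are illustrative
assumptions, not part of the original example:

.. code-block:: python

    import numpy as np

    rng = np.random.RandomState(0)

    # hypothetical shapes: 784 visible units, 500 hidden units
    W_values = rng.uniform(-0.1, 0.1, (784, 500)).astype(theano.config.floatX)
    bvis_values = np.zeros(784, dtype=theano.config.floatX)
    bhid_values = np.zeros(500, dtype=theano.config.floatX)

    # ... build W, bvis, bhid, trng, OneStep and gibbs10 as above ...

    v0 = rng.binomial(1, 0.5, 784).astype(theano.config.floatX)
    v10 = gibbs10(v0)  # the visible sample after 10 Gibbs steps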
The first, and probably most crucial, observation is that the updates
dictionary becomes important in this case. It links a shared variable
with its updated value after k steps. In this case it tells how the
random streams get updated after 10 iterations. If you do not pass this
updates dictionary to your function, you will always get the same 10
sets of random numbers. You can even use the ``updates`` dictionary
afterwards. Look at this example:

.. code-block:: python

    a = theano.shared(1)
    values, updates = theano.scan(lambda: {a: a+1}, n_steps=10)
In this case the lambda expression does not require any input parameters
and returns an updates dictionary which tells how ``a`` should be updated
after each step of scan. If we write:

.. code-block:: python

    b = a + 1
    c = updates[a] + 1
    f = theano.function([], [b, c], updates=updates)

    print(f())            # the outputs are computed before the update is applied
    print(a.get_value())  # the shared variable has now been updated
We will see that because ``b`` does not use the updated version of
``a``, the call returns 2 for ``b`` and 12 for ``c``, while ``a.get_value()``
afterwards returns 11. If we call the function again, ``b`` becomes 12,
``c`` becomes 22 and ``a.get_value()`` returns 21. If we do not pass the
``updates`` dictionary to the function, then ``a`` always remains 1, ``b``
is always 2 and ``c`` is always 12.
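
For contrast, here is a small sketch of the no-updates case, assuming a
freshly created ``a`` (names as above):

.. code-block:: python

    a = theano.shared(1)
    values, updates = theano.scan(lambda: {a: a+1}, n_steps=10)

    b = a + 1
    c = updates[a] + 1
    g = theano.function([], [b, c])  # note: updates NOT passed

    print(g())            # always [array(2), array(12)]
    print(a.get_value())  # still 1, on every call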
The second observation is that if we use shared variables (``W``, ``bvis``,
``bhid``) but do not iterate over them (i.e., scan doesn't really need to know
anything in particular about them, just that they are used inside the
function applied at each step), you do not need to pass them as arguments.
Scan will find them on its own and add them to the graph.
However, passing them to the scan function is good practice, as it prevents
the Scan Op from calling any earlier (external) Op over and over. This results
in a simpler computational graph, which speeds up the optimization and the
execution. To pass the shared variables to Scan, you need to put them in a
list and give it to the ``non_sequences`` argument. Here is the Gibbs sampling
code updated:

.. code-block:: python

    W = theano.shared(W_values)  # we assume that ``W_values`` contains the
                                 # initial values of your weight matrix

    bvis = theano.shared(bvis_values)
    bhid = theano.shared(bhid_values)

    trng = T.shared_randomstreams.RandomStreams(1234)

    # OneStep, with explicit use of the shared variables (W, bvis, bhid)
    def OneStep(vsample, W, bvis, bhid):
        hmean = T.nnet.sigmoid(theano.dot(vsample, W) + bhid)
        hsample = trng.binomial(size=hmean.shape, n=1, p=hmean)
        vmean = T.nnet.sigmoid(theano.dot(hsample, W.T) + bvis)
        return trng.binomial(size=vsample.shape, n=1, p=vmean,
                             dtype=theano.config.floatX)

    sample = theano.tensor.vector()

    # The new scan, with the shared variables passed as non_sequences
    values, updates = theano.scan(fn=OneStep,
                                  outputs_info=sample,
                                  non_sequences=[W, bvis, bhid],
                                  n_steps=10)

    gibbs10 = theano.function([sample], values[-1], updates=updates)
Using shared variables - the strict flag
----------------------------------------
As we just saw, passing the shared variables to scan may result in a simpler
computational graph, which speeds up the optimization and the execution. A
good way to remember to pass every shared variable used during scan is to use
the ``strict`` flag. When set to ``True``, scan assumes that all the necessary
shared variables in ``fn`` are passed as part of ``non_sequences``. The user
has to ensure this; otherwise scan raises an error.
Using the previous Gibbs sampling example:

.. code-block:: python

    # The new scan, using strict=True
    values, updates = theano.scan(fn=OneStep,
                                  outputs_info=sample,
                                  non_sequences=[W, bvis, bhid],
                                  n_steps=10,
                                  strict=True)
If you omit ``W``, ``bvis`` or ``bhid`` from ``non_sequences``, scan will
raise an error.
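
To see what ``strict`` protects against, consider the closure-based
``OneStep`` from the first Gibbs example, which reads ``W``, ``bvis`` and
``bhid`` from the enclosing scope. The sketch below (the exact error type and
message depend on the Theano version) fails at graph-construction time instead
of silently pulling the shared variables into the graph:

.. code-block:: python

    # OneStep here is the original closure-based version
    values, updates = theano.scan(fn=OneStep,
                                  outputs_info=sample,
                                  n_steps=10,   # no non_sequences given
                                  strict=True)  # raises a missing-input error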
Multiple outputs, several taps values - Recurrent Neural Network with Scan
--------------------------------------------------------------------------
@@ -238,7 +370,7 @@ construct a function that computes one iteration step:
def oneStep(u_tm4, u_t, x_tm3, x_tm1, y_tm1, W, W_in_1, W_in_2, W_feedback, W_out):

    x_t = T.tanh(theano.dot(x_tm1, W) + \
                 theano.dot(u_t, W_in_1) + \
                 theano.dot(u_tm4, W_in_2) + \
                 theano.dot(y_tm1, W_feedback))
@@ -266,10 +398,11 @@ the Theano variables needed we construct our RNN as follows:

# y[-1]

([x_vals, y_vals], updates) = theano.scan(fn=oneStep,
                                          sequences=dict(input=u, taps=[-4,-0]),
                                          outputs_info=[dict(initial=x0, taps=[-3,-1]), y0],
                                          non_sequences=[W, W_in_1, W_in_2, W_feedback, W_out],
                                          strict=True)

# for second input y, scan adds -1 in output_taps by default
@@ -286,81 +419,6 @@ By abusing notation, scan will consider ``uvals[0]`` as ``u[-4]``, and
will start scanning from ``uvals[4]`` towards the end.
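
As a concrete illustration of that alignment (a toy sketch, not part of the
original text): with ``taps=[-4, 0]``, step ``k`` of scan receives
``uvals[k]`` as ``u_tm4`` and ``uvals[k+4]`` as ``u_t``:

.. code-block:: python

    import numpy as np

    uvals = np.arange(10).astype(theano.config.floatX)  # toy input sequence

    # with taps=[-4, 0], scan performs len(uvals) - 4 = 6 steps:
    #   step 0: u_tm4 = uvals[0], u_t = uvals[4]
    #   step 1: u_tm4 = uvals[1], u_t = uvals[5]
    #   ...
    #   step 5: u_tm4 = uvals[5], u_t = uvals[9]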
Conditional ending of Scan
--------------------------
...