Commit cb7bbb7c authored by Razvan Pascanu

merge

@@ -13,12 +13,11 @@ The scan function provides the basic functionality needed to do loops
in Theano. Scan comes with many bells and whistles, which can be easily
introduced through a few examples:
Basic functionality : Computing :math:`A^k`
--------------------------------------------
Assume that, given *k*, you want to get ``A**k`` using a loop.
More precisely, if *A* is a tensor, you want to compute
``A**k`` elemwise. The python/numpy code would look like:
.. code-block:: python
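The body of this code block is collapsed in the diff; a minimal numpy sketch of the loop it describes (elementwise ``A**k``, with illustrative values for ``A`` and ``k``) could look like:

```python
import numpy as np

# Minimal sketch of the collapsed loop: elementwise A**k.
# The concrete values of A and k here are only for illustration.
A = np.array([[1.0, 2.0], [3.0, 4.0]])
k = 3

result = np.ones_like(A)
for i in range(k):
    result = result * A
```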
@@ -32,17 +31,17 @@ The equivalent Theano code would be
.. code-block:: python
# Symbolic description of the result
result, updates = theano.scan(fn = lambda x_tm1, A: x_tm1*A, \
                              sequences = [], \
                              initial_states = T.ones_like(A), \
                              non_sequences = A, \
                              n_steps = k)

# compiled function that returns A**k
f = theano.function([A,k], result[-1], updates = updates)
Let us go through the example line by line. What we did first is to
construct a function (using a lambda expression) that, given `x_tm1` and
`A`, returns `x_tm1*A`. Given the order of the parameters, `x_tm1`
is the value of our output at time step ``t-1``. Therefore
``x_t`` (the value of the output at time `t`) is `A` times the value of the output
@@ -52,9 +51,14 @@ iterate over anything) and initialize the output as a tensor with same
shape as *A*, filled with ones. We give *A* as a non-sequence parameter and
tell scan to iterate for *k* steps.

Scan will return a tuple containing our result (``result``) and a
dictionary of updates (empty for this example). Note that the result
is not a matrix, but a 3D tensor containing the value of ``A**k`` for
each step. We want the last value (after *k* steps), so we compile
a function to return just that.
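In plain numpy terms (a hypothetical illustration, not the scan API itself), the returned 3D tensor and the final slice can be pictured as:

```python
import numpy as np

# Hypothetical numpy picture of scan's result here: one slice per step,
# stacked along a new leading (time) axis, so result[-1] equals A**k.
A = np.array([[2.0, 3.0], [4.0, 5.0]])
k = 3

x = np.ones_like(A)
steps = []
for _ in range(k):
    x = x * A
    steps.append(x)

result = np.stack(steps)   # shape (k,) + A.shape
last = result[-1]          # the value after k steps, i.e. A**k
```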
Multiple outputs, several tap values - Recurrent Neural Network with Scan
--------------------------------------------------------------------------
A more practical task would be to implement an RNN using scan. Assume
that our RNN is defined as follows:
@@ -65,6 +69,10 @@ that our RNN is defined as follows :
y(n) = W^{out} x(n-3)
Note that this network is far from a classical recurrent neural
network and might in practice be useless. The reason we defined it as such
is to better illustrate the features of scan.
In this case we have a sequence over which we need to iterate, ``u``,
and two outputs, ``x`` and ``y``. To implement this with scan we first
construct a function that computes one iteration step:
@@ -82,9 +90,11 @@ construct a function that computes one iteration step :
return [x_t, y_t]
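The body of ``oneStep`` is collapsed in the hunk above; only its return is visible. Assuming the recurrence ``x(n) = tanh(W x(n-1) + W^{in}_1 u(n) + W^{in}_2 u(n-4) + W^{feedback} y(n-1))`` (an assumption here, since the equation for ``x`` is not visible in the hunk), a plain numpy sketch of one step might be:

```python
import numpy as np

# Hypothetical numpy sketch of ``oneStep``: taps come first, in
# chronological order, followed by the non-sequences. The x(n) update
# rule is an assumption; only y(n) = W^{out} x(n-3) is visible above.
def oneStep(u_tm4, u_t, x_tm3, x_tm1, y_tm1,
            W, W_in_1, W_in_2, W_feedback, W_out):
    x_t = np.tanh(np.dot(x_tm1, W) + np.dot(u_t, W_in_1)
                  + np.dot(u_tm4, W_in_2) + np.dot(y_tm1, W_feedback))
    y_t = np.dot(x_tm3, W_out)
    return [x_t, y_t]
```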
Note the order in which the parameters are given, and in which the
result is returned. Try to respect chronological order among
the taps (time slices of sequences or outputs) used. In practice, what
is crucial for the computation to work is to give the slices
in the same order as provided in the ``sequence_taps``/``outputs_taps``
dictionaries, and to have the same order of inputs here as when
applying scan. Given that we have all
the Theano variables needed, we construct our RNN as follows:
.. code-block:: python
@@ -96,7 +106,7 @@ the Theano variables needed we construct our RNN as follows :
# y[-1]
([x_vals, y_vals], updates) = theano.scan(fn = oneStep, \
                                          sequences = [u], \
                                          initial_states = [x0,y0], \
                                          non_sequences = [W,W_in_1,W_in_2,W_feedback,W_out], \
@@ -107,7 +117,91 @@ the Theano variables needed we construct our RNN as follows :
Now ``x_vals`` and ``y_vals`` are symbolic variables pointing to the
sequence of x and y values generated by iterating over u. The
``sequence_taps`` and ``outputs_taps`` arguments give scan information
about exactly which slices are needed. Note that if we want to use
``x[t-k]`` we do not need to also have ``x[t-(k-1)], x[t-(k-2)], ..``,
but when applying the compiled function, the numpy array given to
represent this sequence should be large enough to cover these values.
Assume that we compile the above function, and we give as ``u`` the
array ``uvals = [0,1,2,3,4,5,6,7,8]``.
By abuse of notation, scan will consider ``uvals[0]`` as ``u[-4]``, and
will start scanning from ``uvals[4]`` towards the end.
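A small plain-Python illustration of this indexing (hypothetical, outside the scan API): with taps ``u[t-4]`` and ``u[t]``, the first usable step is ``t = 4``, so ``uvals[0]`` plays the role of ``u[-4]``:

```python
import numpy as np

# With taps u[t-4] and u[t], iteration can only start once four past
# values are available, i.e. at index 4 of the provided array.
uvals = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8])

pairs = []
for t in range(4, len(uvals)):
    u_tm4, u_t = uvals[t - 4], uvals[t]   # the two taps for this step
    pairs.append((u_tm4, u_t))
# the first step pairs uvals[0] (seen as u[-4]) with uvals[4]
```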
Using shared variables - Gibbs sampling
---------------------------------------
Another useful feature of scan is that it can handle shared variables.
For example, to implement a Gibbs chain of length 10 we would do
the following:
.. code-block:: python
W = theano.shared(W_values) # we assume that ``W_values`` contains the
                            # initial values of your weight matrix
bvis = theano.shared(bvis_values)
bhid = theano.shared(bhid_values)

trng = T.shared_randomstreams.RandomStreams(1234)

def OneStep(vsample):
    hmean = T.nnet.sigmoid(theano.dot(vsample, W) + bhid)
    hsample = trng.binomial(size = hmean.shape, n = 1, prob = hmean)
    vmean = T.nnet.sigmoid(theano.dot(hsample, W.T) + bvis)
    return trng.binomial(size = vsample.shape, n = 1, prob = vmean)

sample = theano.tensor.vector()

values, updates = theano.scan(OneStep, [], sample, [], n_steps = 10)

gibbs10 = theano.function([sample], values[-1], updates = updates)
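As a plain numpy sanity check of what one such Gibbs step computes (hypothetical shapes, with a numpy RNG standing in for the Theano random streams):

```python
import numpy as np

rng = np.random.default_rng(1234)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical numpy analogue of one Gibbs step: sample the hiddens from
# the visibles, then sample a new visible state from vmean.
def one_step(vsample, W, bvis, bhid):
    hmean = sigmoid(vsample @ W + bhid)
    hsample = rng.binomial(n=1, p=hmean)
    vmean = sigmoid(hsample @ W.T + bvis)
    return rng.binomial(n=1, p=vmean)

W = 0.1 * rng.standard_normal((6, 4))
v = rng.binomial(n=1, p=0.5, size=6)
for _ in range(10):            # a chain of length 10, as in the text
    v = one_step(v, W, np.zeros(6), np.zeros(4))
```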
Note that while we use the shared variables ``W``, ``bvis`` and ``bhid``,
we do not iterate over them (so scan doesn't really need to know
anything in particular about them, just that they are used inside the
function applied at each step), so you do not need to pass them as
arguments. Scan will find them on its own and add them to the graph. Of
course, if you wish to (and it is good practice) you can add them when
you call scan (they would be in the list of non-sequence inputs).
The second, and probably most crucial, observation is that the updates
dictionary becomes important in this case. It links a shared variable
with its updated value after k steps. In this case it tells how the
random streams get updated after 10 iterations. If you do not pass this
updates dictionary to your function, you will always get the same 10
sets of random numbers. You can even use the ``updates`` dictionary
afterwards. Look at this example:
.. code-block:: python
a = theano.shared(1)
values, updates = theano.scan(lambda : {a: a+1}, [], [], [], n_steps = 10)
In this case the lambda expression does not require any input parameters
and returns an updates dictionary which tells how ``a`` should be updated
after each step of scan. If we write:
.. code-block:: python
b = a + 1
c = updates[a] + 1
f = theano.function([], [b,c], updates = updates)

print f()
print a.value
We will see that because ``b`` does not use the updated version of
``a``, it will be 2, ``c`` will be 12, while ``a.value`` is ``11``.
If we call the function again, ``b`` will become 12, ``c`` will be 22
and ``a.value`` 21.
If we do not pass the ``updates`` dictionary to the function, then
``a.value`` will always remain 1, ``b`` will always be 2 and ``c``
will always be ``12``.
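The semantics can be mimicked in plain Python (a hypothetical model of the behaviour, not Theano code):

```python
# Plain-Python model of the behaviour described above. After 10 scan
# steps of a -> a + 1, the updates dictionary maps a to a + 10.
a = 1
updated_a = a + 10       # value of updates[a] for the current a
b = a + 1                # built from the pre-update value of a  -> 2
c = updated_a + 1        # built from the post-update value      -> 12
a = updated_a            # applying the updates: a becomes 11
```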
Reference
---------
@@ -82,16 +82,23 @@ def scan(fn, sequences, initial_states, non_sequences, inplace_map={}, \
* all time slices of the second output (as given in the ``initial_states`` list) ordered chronologically
* ...
* all other parameters over which scan doesn't iterate given in the same order as in ``non_sequences``
The outputs of this function should have the same order as in the list ``initial_states``.
If you are using shared variables over which you do not want to iterate, you do not need to provide them as
arguments to ``fn``, though you can if you wish so. The function should return the outputs after each step plus
the updates for any of the shared variables. You can either return only outputs or only updates. If you have
both outputs and updates the function should return them as a tuple: (outputs, updates) or (updates, outputs).
Outputs can be just a theano expression if you have only one output, or a list of theano expressions. Updates
can be given either as a list or as a dictionary. If you have a list of outputs, the order of these should
match that of their ``initial_states``.
:param sequences: list of Theano variables over which scan needs to iterate.
:param initial_states: list of Theano variables containing the initial state used for the output.
Note that if the function applied recursively uses only the previous value of the output or none, this initial state
should have the same shape as one time step of the output; otherwise, the
initial state should have the same number of dimensions as the output. This
can easily be understood through an example. For computing ``y[t]``, let us
assume that we need ``y[t-1]``, ``y[t-2]`` and ``y[t-4]``. By an abuse of notation,
when ``t = 0``, we would need values for ``y[-1]``, ``y[-2]`` and
``y[-4]``. These values are provided by the initial state of ``y``, which
should have the same number of dimensions as ``y``, where the first dimension should
be large enough to cover all past values, which in this case is 4.
@@ -146,6 +153,14 @@ def scan(fn, sequences, initial_states, non_sequences, inplace_map={}, \
classical BPTT, where you only do ``truncate_gradient`` number of steps.
:param go_backwards: Flag indicating if you should go backwards through the sequences.
:rtype: tuple
:return: tuple of the form (outputs, updates);
``outputs`` is either a Theano variable or a list of Theano variables
representing the outputs of scan. ``updates``
is a dictionary specifying the update rules for all shared
variables used in the scan operation; this dictionary should be passed
to ``theano.function``.
'''
# check if inputs are just single variables instead of lists
@@ -220,7 +235,7 @@ def scan(fn, sequences, initial_states, non_sequences, inplace_map={}, \
args += non_seqs
outputs_updates = fn(*args)
outputs = []
updates = {}
# we try now to separate the outputs from the updates
if not type(outputs_updates) in (list,tuple):
@@ -309,6 +324,8 @@ def scan(fn, sequences, initial_states, non_sequences, inplace_map={}, \
+ noshared
+ shared_non_seqs))
if not type(values) in (tuple, list):
    values = [values]
for k in update_map.keys():
    update_map[k] = values[update_map[k]]
@@ -296,7 +296,7 @@ class T_Scan(unittest.TestCase):
vW1 = vW1 + .1
vW2 = vW2 + .05
def test_9(self):
W_vals = numpy.random.rand(20,30) -.5
vis_val = numpy.random.binomial(1,0.5, size=(3,20))
@@ -344,6 +344,20 @@ class T_Scan(unittest.TestCase):
assert (compareArrays(t_res, n_res))
def test_10(self):
    s = theano.shared(1)
    def f_pow2():
        return {s: 2*s}
    n_steps = theano.tensor.dscalar()
    Y, updts = theano.scan(f_pow2, [], [], [], n_steps = n_steps)
    f1 = theano.function([n_steps], Y, updates = updts)
    f1(3)
    assert (compareArrays(s.value, 8))
'''
# test gradient simple network
def test_10(self):
@@ -356,6 +370,7 @@ class T_Scan(unittest.TestCase):
- test gradient (multiple outputs / some uncomputable )
- test gradient (truncate_gradient)
- test_gradient (taps past/future)
- optimization !?
'''