Commit 839eda94 authored by Razvan Pascanu

changes into the tutorials

Parent 0fc7c8ca
......@@ -8,8 +8,8 @@ Baby steps - Adding two numbers together
Adding two scalars
==================
So, to get us started with Theano and get a feel of what we're working with,
let's make a simple function: add two numbers together. Here is how you do
it:
>>> x = T.dscalar('x')
......@@ -26,7 +26,7 @@ array(28.4)
Let's break this down into several steps. The first step is to define
two symbols representing the quantities that you want
to add. Note that from now on, we will use the term :term:`Variable`
to mean "symbol" (in other words, ``x``, ``y``, ``z`` are all Variable
objects). The output of the function ``f`` is a ``numpy.ndarray``
......@@ -36,7 +36,6 @@ If you are following along and typing into an interpreter, you may have
noticed that there was a slight delay in executing the ``function``
instruction. Behind the scenes, ``f`` was being compiled into C code.
.. TODO: help
-------------------------------------------
......@@ -64,8 +63,7 @@ TensorType(float64, scalar)
>>> x.type == T.dscalar
True
You can learn more about the structures in Theano in :ref:`graphstructures`.
By calling ``T.dscalar`` with a string argument, you create a
:term:`Variable` representing a floating-point scalar quantity with the
......
......@@ -137,6 +137,9 @@ with respect to the second. In this way, Theano can be used for
`automatic differentiation`_.
.. note::

    The output of ``T.grad`` has the same dimensions as its
    second argument. This is exactly like the first derivative if the
......
......@@ -10,7 +10,7 @@ Let's start an interactive session and import Theano.
>>> from theano import *
Many of the symbols you will need to use are in the ``tensor`` subpackage
of Theano. Let's import that subpackage under a handy name. I like
``T`` (and many tutorials use this convention).
>>> import theano.tensor as T
......
......@@ -8,10 +8,9 @@ NumPy refresher
Here are some quick guides to NumPy:
* `Numpy quick guide for Matlab users <http://www.scipy.org/NumPy_for_Matlab_Users>`__
* `More detailed table showing the NumPy equivalent of Matlab commands <http://www.scribd.com/doc/26685/Matlab-Python-and-R>`__
* `Numpy User Guide <http://docs.scipy.org/doc/numpy/user/index.html>`__
* `More detailed Numpy tutorial <http://www.scipy.org/Tentative_NumPy_Tutorial>`__
.. TODO [DefineBroadcasting Broadcasting]
.. Broadcastable - Implicitly assume that all previous entries are true.
.. [TODO: More doc, e.g. see _test_tensor.py]
......@@ -20,8 +19,10 @@ Matrix conventions for machine learning
Rows are horizontal and columns are vertical.
Every row is an example. Therefore, inputs[10,5] is a matrix of 10 examples
where each example has dimension 5. If this were the input of a
neural network, then the weights from the input to the first hidden
layer would form a matrix of size (5, #hid).
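As a concrete check of this convention (a sketch; the array values and variable names here are illustrative, not from the tutorial), multiplying a batch of examples by a weight matrix of shape (n_features, n_hid) yields one row of hidden activations per example:

```python
import numpy

n_examples, n_features, n_hid = 10, 5, 3

inputs = numpy.random.randn(n_examples, n_features)   # one example per row
weights = numpy.random.randn(n_features, n_hid)       # input-to-hidden weights

hidden = numpy.dot(inputs, weights)                   # one row of activations per example
print(hidden.shape)
```

The resulting shape is (10, 3): the example dimension is preserved and the feature dimension is mapped to the hidden dimension.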
If I have an array:
......@@ -43,3 +44,22 @@ To access the entry in the 3rd row (row #2) and the 1st column (column #0):
To remember this, keep in mind that we read left-to-right, top-to-bottom,
so each thing that is contiguous is a row. That is, there are 3 rows
and 2 columns.
Broadcasting
============
Numpy does :term:`broadcasting` of numpy arrays of different shapes during
arithmetic operations. What this means in general is that the smaller
array is *broadcast* across the larger array so that they have
compatible shapes. The example below shows an instance of
*broadcasting*:
>>> a = numpy.asarray([1.0, 2.0, 3.0])
>>> b = 2.0
>>> a * b
array([2., 4., 6.])
The smaller array ``b`` in this case is *broadcast* to the same size
as ``a`` during the multiplication. This trick is often useful for
simplifying how expressions are written. More details about *broadcasting*
can be found in the `numpy user guide <http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html>`__ .
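Broadcasting also applies between arrays of different rank, not just between an array and a scalar. A sketch (values chosen only for illustration):

```python
import numpy

a = numpy.array([[1.0], [2.0], [3.0]])   # shape (3, 1)
b = numpy.array([10.0, 20.0])            # shape (2,)

# b is broadcast across the rows of a, and a across the columns of b,
# so the result has shape (3, 2)
c = a + b
print(c)
```

Each size-1 dimension is stretched to match the other operand, which is why a (3, 1) array and a (2,) array combine into a (3, 2) result.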
"""Provide Scan and related functions
Scanning a function over sequential input(s) producing sequential output(s).
Scanning is a general form of recurrence, which can be used for looping.
The idea is that you 'scan' a function along some input sequence, producing
an output at each time-step that can be seen (but not modified) by the
function at the next time-step. (Technically, the function can see the
previous K time-steps.)
So for example, ``sum()`` could be computed by scanning the ``z+x_i``
function over a list, given an initial state of ``z=0``.
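In plain Python, the ``sum()``-by-scanning idea above can be sketched as follows (``scan_sum`` is a hypothetical helper for illustration, not part of this module):

```python
def scan_sum(xs, z=0):
    """Scan the function z + x_i over xs, recording every intermediate state."""
    outputs = []
    for x in xs:
        z = z + x           # the scanned function sees the previous output z
        outputs.append(z)
    return outputs          # the last element equals sum(xs)

print(scan_sum([1, 2, 3, 4]))
```

The scan produces the running sums; keeping only the last one recovers the ordinary `sum()`.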
Special cases:
- A ``reduce()`` operation can be performed by returning only the last
  output of a scan.
- A ``map()`` operation can be performed by applying a function that
  ignores each previous output.
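These two special cases can be sketched with one generic loop (``scan_list`` is an illustrative stand-in, not this module's Scan Op):

```python
def scan_list(fn, xs, z):
    """Apply fn along xs, feeding each output back in as the next state."""
    ys = []
    for x in xs:
        z = fn(z, x)
        ys.append(z)
    return ys

# reduce(): keep only the last output of the scan
total = scan_list(lambda z, x: z + x, [1, 2, 3], 0)[-1]

# map(): apply a function that ignores the previous output
squares = scan_list(lambda z, x: x * x, [1, 2, 3], None)
```

Both operations fall out of the same recurrence; only how the outputs are used differs.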
Often a for loop can be expressed as a scan() operation, and scan is the
closest that Theano comes to looping.
This module provides scanning functionality with the `Scan` Op.
"""
__docformat__ = 'restructuredtext en'
import traceback
import numpy
import theano
import theano.compile
from theano.tensor import opt
from theano import gof
from theano.compile import optdb
'''
TODO : move out of sandbox !
'''
class Scan(theano.Op):
"""Scan a function `fn` over several inputs producing several outputs
# Logging function for sending warning or info
import logging
_logger = logging.getLogger('theano.scan')
def warning(*msg):
_logger.warning('WARNING theano.scan: '+' '.join(msg))
def info(*msg):
_logger.info('INFO theano.scan: '+' '.join(msg))
# Hashing a list; the lists used by scan are lists of numbers, so a list
# can be hashed by XOR-ing all of its elements
def hash_list(values):
    hash_value = 0
    for v in values:
        hash_value ^= v
    return hash_value
# Hashing a dictionary; the dictionary used by scan has as keys numbers and
# as values either numbers or list of numbers
def hash_dict(dictionary):
hash_value = 0
for k, v in dictionary.iteritems():
# hash key
hash_value ^= k
if type(v) in (list,tuple):
hash_value ^= hash_list(v)
else:
hash_value ^= v
return hash_value
This Op implements a generalization of scan in which `fn` may consult several previous
outputs from the past, from positions (taps) relative to the current time. The number of
taps (T_j) to use for each output (y_j) must be provided when creating a Scan Op.
Apply Inputs:
def scan(fn, sequences, non_sequences, seed_values, inplace_map={},
sequences_taps={}, outputs_taps = {},
len = theano.tensor.zero(), force_gradient = False,
truncate_gradient = -1, go_backwards = False, mode = 'FAST_RUN'):
'''The function creates a more intuitive interface to the scan op.
X sequence inputs x_1, x_2, ... x_X
This function first creates a scan op object, and afterwards applies it
to the input data. The scan operation iterates over X sequences producing
Y outputs. The function that is applied recursively may consult several
previous outputs from the past as well as past values and future values
of the input. You can see it as having the inputs:
Y initial states (u_1, u_2, ... u_Y) for our outputs. Each must have appropriate length
(T_1, T_2, ..., T_Y).
X sequence inputs x_1, x_2, .. x_X
W other inputs w_1, w_2, ... w_W
Y seeds/initial values ( u_1, u_2, .. u_Y) for the outputs
Apply Outputs:
W non sequences inputs w_1, w_2, .. w_W
Y sequence outputs y_1, y_2, ... y_Y
Outputs :
Y sequence outputs y_1, y_2, .. y_Y
Each output y_j is computed one time-step at a time according to the
formula:
.. code-block:: python
(y_1[t], y_2[t], ... y_Y[t]) = fn(
    x_1[t-K_1], ... x_1[t], x_1[t+1], ... x_1[t+L_1],  # x_1 past and future values
    x_2[t-K_2], ... x_2[t], x_2[t+1], ... x_2[t+L_2],  # x_2 past and future values
    ...,                                               # ...
    y_1[t-1], y_1[t-2], ... y_1[t-T_1],                # T_1 past values of y_1
    y_2[t-1], y_2[t-2], ... y_2[t-T_2],                # T_2 past values of y_2
    ...,                                               # ...
    y_Y[t-1], y_Y[t-2], ... y_Y[t-T_Y],                # T_Y past values of y_Y
    w_1, w_2, ... w_W)                                 # W 'timeless' inputs
So `fn` must accept X + T_1 + T_2 + ... + T_Y + W arguments.
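As a concrete instance of the formula above with one output and T_1 = 2 past taps (a plain-Python sketch with a hypothetical helper, not the Op itself), consider the Fibonacci recurrence y[t] = y[t-1] + y[t-2]:

```python
def scan_with_taps(fn, seed, n_steps):
    """Scan fn for n_steps, where fn sees the previous two outputs."""
    y = list(seed)                  # the seed covers the largest past tap (here 2)
    for t in range(n_steps):
        y.append(fn(y[-1], y[-2]))  # fn(y[t-1], y[t-2])
    return y[len(seed):]            # drop the seed, keep the produced outputs

fib = scan_with_taps(lambda a, b: a + b, [0, 1], 6)
print(fib)
```

Here `fn` takes exactly T_1 = 2 arguments (the two past taps), matching the X + T_1 + ... + T_Y + W argument count described above with X = W = 0.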
There are two high-level methods (`symbolic`, `compiled`) for creating a Scan Op besides
the low-level `__init__` constructor. ***Why would you call them?***
:param fn: fn is a lambda expression or a function that given a list of
symbolic inputs returns the update list and symbolic outputs list of the
function that shall be applied recursively.
:param sequences: list of sequences over which the scan op should iterate;
    each sequence's length should also cover its past and future taps; for
    example, if a sequence uses the past tap -3 and the future tap +4, its
    total length should be n+7, where the first 3 values correspond to taps
    -3, -2, -1 and the last 4 values correspond to n+1, n+2, n+3 and n+4
:param non_sequences: list of inputs over which it shouldn't iterate
:param seed_values: seeds (initial values) of the outputs; if past taps
    are used, the seeds should contain enough values to cover those past
    values; note that index 0 of a seed corresponds to the largest past tap
:param inplace_map: a dictionary telling which output should be
computed in place of which input sequence ; input sequence has to be
of the same shape as the output
When applying a Scan Op to theano Variables, the order of arguments is very important! When
using the full flexibility of Scan there can be a lot of arguments, but it is essential to
put them in the following order:
:param sequences_taps: a dictionary telling for each sequence what past
    and future taps it should use; past taps should be negative, future
    taps positive; by default 0 (the current value) is added to this
    dictionary if nothing is provided
1. "Ignored inputs" (x_i with i < n_inplace_ignore) that will be overwritten by an inplace scan.
:param outputs_taps: a dictionary telling for each output what past
taps it should use (negative values); by default -1 is added to this
dictionary if nothing is provided
2. Inputs that will be overwritten by an inplace scan (x_i with i < n_inplace)
:param len: a value (or theano scalar) describing for how many steps
the scan should iterate; 0 means that it should iterate over the entire
length of the input sequence(s)
3. Remaining Inputs (x_i with i >= n_inplace)
:param force_gradient: a flag telling the scan op that the gradient can be
    computed even though inplace or updates are used - use this at your own
    risk
3. Output states (u_j) corresponding to the outputs that are computed inplace (j <
n_inplace)
:param truncate_gradient: tells for how many steps scan should go
    back in time on the backward pass of backpropagation through time
4. Remaining output states not given in 3 (u_j with j >= n_inplace)
:param go_backwards: a flag indicating if scan should iterate backwards
    from the end of the sequence to the beginning (if it is true) or from 0
    to the end
5. Other inputs (w_1, w_2, ... w_W)
:param mode: indicates the mode that should be used to compile the
function that will be applied recursively
'''
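The length bookkeeping for sequence taps described in the docstring can be checked with a small sketch (`padded_length` is a hypothetical helper, illustrative only):

```python
def padded_length(n_steps, taps):
    """Length a sequence must have to supply every past and future tap."""
    past = -min(min(taps), 0)      # e.g. tap -3 needs 3 leading values
    future = max(max(taps), 0)     # e.g. tap +4 needs 4 trailing values
    return n_steps + past + future

# a sequence used with taps -3 and +4 over n = 10 steps needs length n + 7
print(padded_length(10, [-3, 0, 4]))
```

This matches the n+7 example in the docstring: 3 extra leading values for the past taps and 4 extra trailing values for the future taps.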
Inplace Operation
=================
The Scan Op supports computing some (`n_inplace`) of the outputs y_j using the memory from
corresponding inputs x_j.
It is not possible to indicate precisely which outputs overwrite which inputs, but without
loss of generality we assume that each of the first `n_inplace` outputs (y_j) overwrites
the corresponding input (x_j).
# check if inputs are just single variables instead of lists
if not (type(sequences) in (list, tuple)):
    seqs = [sequences]
else:
    seqs = sequences
if not (type(seed_values) in (list, tuple)):
    seeds = [seed_values]
else:
    seeds = seed_values
if not (type(non_sequences) in (list, tuple)):
    non_seqs = [non_sequences]
else:
    non_seqs = non_sequences
# compute number of sequences and number of seeds
n_seqs = len(seqs)
# see if there are outputs that do not feed anything back to the function
# applied recursively
outs_tapkeys = outputs_taps.keys()
for k in sorted(outs_tapkeys):
    if outputs_taps[k] == []:
        # add empty lists where you have outputs that do not have past
        # values
        seeds = seeds[:k] + [[]] + seeds[k:]
n_seeds = len(seeds)
# update sequences_taps[idx] to contain 0 if it is not defined
for i in xrange(n_seqs):
if not sequences_taps.has_key(i):
sequences_taps.update({i:[0]})
# if input sequence is not actually used by the recursive function
elif sequences_taps[i] == []:
sequences_taps.__delitem__(i)
    elif not (type(sequences_taps[i]) in (list, tuple)):
        sequences_taps[i] = [sequences_taps[i]]
# update outputs_taps[idx] to contain -1 if it is not defined
for i in xrange(n_seeds):
if not outputs_taps.has_key(i):
outputs_taps.update({i:-1})
# if output sequence is not actually used as input to the recursive
# function
elif outputs_taps[i] == []:
outputs_taps.__delitem__(i)
    elif not (type(outputs_taps[i]) in (list, tuple)):
        outputs_taps[i] = [outputs_taps[i]]
# create theano inputs for the recursive function
args = []
for (i,seq) in enumerate(seqs):
if sequences_taps.has_key(i):
        for k in xrange(len(sequences_taps[i])):
args += [seq[0].type() ]
for (i,seed) in enumerate(seeds):
if outputs_taps.has_key(i):
        for k in xrange(len(outputs_taps[i])):
args += [seed[0].type() ]
args += non_seqs
next_outs, updates = fn(*args)
# Create the Scan op object
local_op = Scan( (args,next_outs, updates), n_seqs,n_seeds,inplace_map,
sequences_taps, outputs_taps, force_gradient, truncate_gradient,
go_backwards, mode)
# Call the object on the input sequences, seeds, and non sequences
return local_op( *( [theano.tensor.as_tensor(len)] \
+ seqs \
+ seeds \
+ non_seqs))
''' The class implementing the scan op
The actual class. I would not recommend using it directly unless you really
know what you are doing.
'''
class Scan(theano.Op):
def __init__(self,(inputs, outputs, updates),n_seqs, n_seeds,
inplace_map={}, seqs_taps={}, outs_taps={},
force_gradient = False, truncate_gradient = -1,
go_backwards = False, inplace=False):
'''
:param inputs: list of symbolic inputs of the function that will
be applied recursively
:param outputs: list of symbolic outputs for the function applied
recursively
Note that using inplace computations destroys information, and may make it
impossible to compute the gradient.
As long as the function 'fn' does not update any of the other
parameters (w_1,..) a gradient of this operation is supported.
***Who will care about this? Someone just using the Op? Someone writing an inplace
optimization?***
:param updates: list of updates for the function applied recursively
Ignored Inputs
==============
:param n_seqs: number of sequences in the input over which it needs
to iterate
**** Behaviour? Rationale? Use case?
:param n_seeds: number of outputs (same as the number of seeds)
"""
@classmethod
def symbolic(cls,(in_args,out_args), n_ins, n_outs,\
n_inplace=0, n_inplace_ignore=0, taps={},
mode = 'FAST_RUN'):
# if in_args is not a list assume it is just a variable and
# convert it to a list (if this is neither the case the code will
# raise an error somewhere else !)
if not( type(in_args) in (list,tuple)):
in_args = [in_args]
# if out_args is not a list assume it is just a variable and
# convert it to a list
if not (type(out_args) in (list,tuple)):
out_args = [out_args]
# Create fn
my_fn = theano.compile.sandbox.pfunc(in_args, out_args, mode = mode)
# Create gradient function
gy_next = [out_args[0].type()]
g_inputs = theano.tensor.grad(out_args[0],in_args,g_cost=gy_next[-1])
for y_next in out_args[1:] :
gy_next +=[y_next.type()]
g_ls = theano.tensor.grad(y_next,in_args,g_cost=gy_next[-1])
for i in xrange(len(in_args)):
g_inputs[i] += g_ls[i]
g_fn=theano.compile.sandbox.pfunc(gy_next+in_args,g_inputs,
mode=mode)
:param inplace_map: dictionary describing which output should be
    computed in place of which input
return cls(my_fn, g_fn, n_ins, n_outs,\
n_inplace,n_inplace_ignore, taps)
:param seqs_taps: dictionary describing which past and future taps
    of the input sequences are used by the recursive function
@classmethod
def compiled(cls,fn,n_ins, n_outs,\
n_inplace=0, n_inplace_ignore=0, taps={}):
"""Return a Scan instance that will scan the callable `fn` over `n_ins` inputs and
`n_outs` outputs.
:param outs_taps: dictionary describing which past taps of the
    outputs the recursive function is using
:param force_gradient: a flag indicating if the gradient is still
computable even though inplace operation or updates are used
"""
return cls(fn, None, n_ins, n_outs, \
n_inplace, n_inplace_ignore, taps= taps)
:param truncate_gradient: if different from -1 it tells after how
many steps in the backward pass of BPTT
'''
# check inplace map
for _out, _in in inplace_map.iteritems():
    if _out > n_seeds:
        raise ValueError(('Inplace map refers to a nonexistent '
            'output %d') % _out)
    if _in > n_seqs:
        raise ValueError(('Inplace map refers to a nonexistent '
            'input sequence %d') % _in)
    if (_in >= 0) and (min(seqs_taps[_in]) < 0):
        raise ValueError(('Input sequence %d uses past values that '
            'will be overwritten by the inplace operation') % _in)
def __init__(self,fn,grad_fn,n_ins,n_outs,
n_inplace=0, n_inplace_ignore=0,
taps={}, inplace=False):
"""Create an instance of the scan class
#check sequences past taps
for k, v in seqs_taps.iteritems():
    if k > n_seqs:
        raise ValueError(('Sequences past taps dictionary refers to '
            'a nonexistent sequence %d') % k)
To use Scan, first you need to create it specifying the number of inputs, outputs,
inplace outputs (see notes below), and inputs to be ignored, a dictionary describing
the time taps used, the function that will be applied recursively and optionally, the
gradient function (or a symbolic definition of the function and the op will compute the
gradient on its own). Secondly you just call the op with a list of parameters.
#check outputs past taps
for k, v in outs_taps.iteritems():
    if k > n_seeds:
        raise ValueError(('Outputs past taps dictionary refers to '
            'a nonexistent output %d') % k)
    if max(v) > -1:
        raise ValueError(('Can not require future value %d of output '
            '%d') % (max(v), k))
:param fn: compiled function that takes you from time step t-1 to t
:param grad_fn: gradient of the function applied recursively
:param n_ins: number of inputs; in the list of arguments
they start from 0 to 'n_ins'
:param n_outs: number of outputs; in the list of arguments you
need to give the initial state of each outputs, this will be from
'n_ins' to 'n_outs'; each initial state should be a matrix where
the first dimension is time and should be sufficiently large to
cover the time taps. The matrix for an initial state should be
ordered such that if you use k delays, index 0 of matrix stands for
the value at time -k, index 1 for value at time 1-k, index 2 for
value at time 2-k and index k-1 for value at time -1
:param n_inplace: indicates the number of outputs that should be
computed inplace; in the list of arguments there will be the first
'n_inplace' outputs in place of the first 'n_inplace' inputs
:param n_inplace_ignore: indicates the number of inputs that are
    given just to be replaced by the inplace computation and which
    should not be given as arguments to the function applied
    recursively
:param taps: a dictionary which for each output index gives
a list of what taps it uses; a tap is given as an int,
where x stands for output(t - x); note that a past trace of 1 makes
no sense, since you get that by default
:param inplace: is used by the optimizer that allows the inplace
computation
"""
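The seed ordering described in the `n_outs` parameter above (with k delays, index 0 holds the value at time -k and index k-1 the value at time -1) can be sketched as follows (`past_value` is an illustrative helper, not the Op's code):

```python
def past_value(seed, tap):
    """Read output(t - tap) from a seed matrix ordered oldest-first.

    With k delays, seed[0] is the value at time -k and seed[k-1] the
    value at time -1, so time -tap lives at index k - tap.
    """
    k = len(seed)
    return seed[k - tap]

seed = ['t-3', 't-2', 't-1']   # k = 3 delays, oldest value first
print(past_value(seed, 1))     # the most recent past value, time -1
```

Reading tap 1 returns the last entry of the seed and reading tap k returns the first, matching the ordering rule in the docstring.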
if n_ins < 1:
raise ValueError('Scan should iterate over at least one input')
if n_outs <1:
raise ValueError('Scan should have at least one output')
if (n_inplace > n_ins):
raise ValueError('Number of inplace outputs should be smaller than '
'the number of inputs.')
if (n_inplace < 0):
raise ValueError('Number of inplace outputs should be larger '
'or equal to 0')
if (n_inplace_ignore > n_inplace):
raise ValueError('Number of inputs to ignore should not be '\
'larger than number of inplace outputs')
if (n_inplace_ignore < 0):
raise ValueError('n_inplace_ignore should be non-negative')
self.destroy_map = {}
if inplace:
for i in xrange(n_inplace):
self.destroy_map.update( {i:[i]} )
for (k,v) in taps.iteritems():
if k < 0 or k > n_outs:
raise ValueError('Taps dictionary contains wrong key!')
for vi in v:
# why is it illegal to specify vi < 2?
# what is special about vi == 1?
#
# Would it be simpler to just leave v alone if it is non-empty (checking that
# all vi are >=1) and set v = [1] for all missing output keys?
if vi < 2:
raise ValueError('Taps dictionary contains wrong values!')
self.taps = taps
self.n_ins = n_ins
self.n_outs = n_outs
self.n_inplace = n_inplace
self.inplace = inplace
self.n_inplace_ignore = n_inplace_ignore
self.fn = fn
self.grad_fn = grad_fn
self.destroy_map = inplace_map
self.seqs_taps = seqs_taps
self.outs_taps = outs_taps
self.n_seqs = n_seqs
self.n_seeds = n_seeds
self.n_args = n_seqs+n_seeds+1
self.inplace_map = inplace_map
self.inplace = inplace
self.inputs = inputs
self.outputs = outputs
self.updates = updates
self.force_gradient = force_gradient
self.truncate_gradient = truncate_gradient
self.go_backwards = go_backwards
def make_node(self, *inputs):
"""Create a node for the Scan operation
self.fn = theano.function(inputs,outputs, \
updates = updates, mode = mode)
:param inputs: list of inputs for the operation; there should be
    at least 'self.n_ins'+'self.n_outs' arguments; the first 'self.n_inplace'
    are inputs that are replaced inplace, followed by other inputs up
    to 'self.n_ins'; the next 'self.n_outs' are outputs followed by other
    arguments that will be given to the function applied recursively
"""
g_y = [outputs[0].type()]
g_args = theano.tensor.grad(outputs[0],inputs, g_cost = g_y[-1])
# for all outputs compute gradients and then sum them up
for y in outputs[1:]:
g_y += [y.type()]
g_args_y = theano.tensor.grad(y,inputs, g_cost=g_y[-1])
for i in xrange(len(g_args)):
g_args[i] += g_args_y[i]
n_args = len(inputs)
min_n_args = self.n_ins+self.n_outs
if n_args < min_n_args:
err = 'There should be at least '+str(min_n_args)+' arguments'
raise ValueError(err)
# Create list of output datatypes
out_types = []
for i in xrange(self.n_ins,self.n_ins+self.n_outs):
out_types += [theano.tensor.Tensor(dtype=inputs[i].dtype,\
broadcastable=(False,)+inputs[i].broadcastable[1:])()]
return theano.Apply(self,inputs, out_types)
self.g_ins = g_y+inputs
self.g_outs = g_args
def make_node(self,*inputs):
n_args = len(inputs)
if n_args < self.n_args :
err = 'There should be at least '+str(self.n_args)+' arguments'
raise ValueError(err)
# Create list of output datatypes
out_types = []
for i in xrange(self.n_seqs+1, self.n_seqs+self.n_seeds+1):
out_types += [theano.tensor.Tensor(dtype=inputs[i].dtype,\
broadcastable=(False,)+inputs[i].broadcastable[1:])()]
return theano.Apply(self,inputs, out_types)
def __eq__(self,other):
rval = type(self) == type(other)
if rval:
rval = (self.fn is other.fn) and \
(self.grad_fn is other.grad_fn) and \
(self.n_ins == other.n_ins) and \
(self.n_outs == other.n_outs) and \
(self.n_inplace == other.n_inplace) and \
(self.n_inplace_ignore == other.n_inplace_ignore) and\
(self.inplace == other.inplace) and\
(self.taps == other.taps)
return rval
rval = type(self) == type(other)
if rval:
rval = (self.inputs == other.inputs) and \
(self.outputs == other.outputs) and \
(self.updates == other.updates) and \
(self.g_ins == other.g_ins) and \
(self.g_outs == other.g_outs) and \
(self.seqs_taps == other.seqs_taps) and \
(self.outs_taps == other.outs_taps) and \
(self.inplace_map == other.inplace_map) and \
(self.n_seqs == other.n_seqs) and\
(self.inplace == other.inplace) and\
(self.go_backwards == other.go_backwards) and\
(self.truncate_gradient == other.truncate_gradient) and\
(self.force_gradient == other.force_gradient) and\
(self.n_seeds == other.n_seeds) and\
(self.n_args == other.n_args)
return rval
def __hash__(self):
# hash the taps dictionary
taps_hash = 0
for k,v in self.taps.iteritems():
taps_hash ^= k
for vi in v :
taps_hash ^= vi
return hash(type(self)) ^ \
hash(self.fn) ^ \
hash(self.grad_fn) ^ \
hash(self.n_ins) ^ \
hash(self.n_outs) ^ \
hash(self.n_inplace) ^ \
hash(self.n_inplace_ignore) ^\
hash(self.inplace) ^\
taps_hash
return hash(type(self)) ^ \
hash(self.n_seqs) ^ \
hash(self.n_seeds) ^ \
hash(self.force_gradient) ^\
hash(self.inplace) ^\
hash(self.go_backwards) ^\
hash(self.truncate_gradient) ^\
hash(self.n_args) ^ \
hash_list(self.outputs) ^ \
hash_list(self.inputs) ^ \
hash_list(self.g_ins) ^ \
hash_list(self.g_outs) ^ \
hash_dict(self.seqs_taps) ^\
hash_dict(self.outs_taps) ^\
hash_dict(self.inplace_map) ^\
hash_dict(self.updates)
def grad(self, inputs, g_outs):
if self.grad_fn == None:
print 'Warning! no gradient for the recursive function was given'
return [None for i in inputs]
else:
y = self(*inputs)
if not( type(y) in (list,tuple)):
y = [y]
for i in xrange(len(y)):
if g_outs[i] == None:
g_outs[i] = theano.tensor.zeros_like(y[i])
# Construct my gradient class:
gradScan = ScanGrad(self.grad_fn,
self.n_ins- self.n_inplace_ignore, self.n_outs,
self.taps)
args = g_outs + y + \
inputs[self.n_inplace_ignore:]
grads = gradScan(*args)
rval = [None for i in inputs[:self.n_inplace_ignore]]+grads
return rval
def perform(self,node,args, outs):
# find the number of timesteps; a precondition is to have at least
# one input to iterate over
n_steps = 0
if (self.n_seqs == 0) and (args[0] == 0):
    raise ValueError('Scan does not know over how many steps it '
        'should iterate! No input sequence or number of steps to '
        'iterate given !')
# check if we deal with a inplace operation
n_inplace = self.n_inplace
n_inplace_ignore = self.n_inplace_ignore
if (args[0] != 0):
n_steps = args[0]
for i in xrange(self.n_seqs):
if self.seqs_taps.has_key(i):
# compute actual length of the sequence ( we need to see what
# past taps this sequence has, and leave room for them
seq_len = args[i+1].shape[0] + min(self.seqs_taps[i])
if max(self.seqs_taps[i]) > 0:
    # using future values, so need to end the sequence earlier
    seq_len -= max(self.seqs_taps[i])
if n_steps == 0 :
# length of the sequences, leaving room for the largest
n_steps = seq_len
if seq_len != n_steps :
warning(('Input sequence %d has a shorter length than the '
    'expected number of steps %d') % (i, n_steps))
n_steps = min(seq_len,n_steps)
# check if we deal with an inplace operation
inplace_map = self.inplace_map
if not self.inplace: #if it was not optimized to work inplace
n_inplace = 0
inplace_map = {}
# check lengths of inputs
for i in xrange(self.n_ins):
if args[i].shape[0] != n_steps:
raise ValueError('All inputs should have n_steps length!')
# check lengths of initial states
for i in xrange(self.n_ins, self.n_ins+self.n_outs):
req_size = 1
if self.taps.has_key(i- self.n_ins):
req_size = max(self.taps[i-self.n_ins])
if len(args[i].shape) == 0:
raise ValueError('Wrong initial state! ')
# check lengths of seeds
for i in xrange(self.n_seqs+1, \
self.n_seqs+self.n_seeds+1):
if self.outs_taps.has_key(i-self.n_seqs-1):
req_size = abs(min(self.outs_taps[i-self.n_seqs-1]))-1
if args[i].shape[0] < req_size:
raise ValueError('Wrong initial state! ')
# allocate space for the outputs
y = []
# inplace outputs
for i in xrange(n_inplace):
y += [args[i]]
# add outputs
for i in xrange(self.n_ins+n_inplace,self.n_ins+self.n_outs):
y_shape = (n_steps,)+args[i].shape[1:]
y += [numpy.empty(y_shape, dtype = args[i].dtype)]
# iterate
for i in xrange(n_steps):
fn_args = []
# get a time slice of inputs
for j in xrange(n_inplace_ignore, self.n_ins):
fn_args += [args[j][i]]
warning(('Initial state for output %d has fewer values than '
    'required by the maximal past value %d. Scan will use 0s '
    'for missing values') % (i-self.n_iterable-1, req_size))
# get past values of outputs (t-1 + taps)
for j in xrange(self.n_outs):
# get list of taps
ls_taps = [1]
if self.taps.has_key(j):
ls_taps += self.taps[j]
maxVal = max(ls_taps)
for tap_value in ls_taps:
if i - tap_value < 0:
fn_args += [args[j+self.n_ins][maxVal-tap_value+i]]
else:
fn_args += [y[j][i-tap_value]]
self.n_steps = n_steps
y = self.scan(self.fn, args[1:],self.n_seqs, self.n_seeds,
self.seqs_taps, self.outs_taps, n_steps, self.go_backwards,
inplace_map)
# get the none iterable parameters
fn_args += list(args[(self.n_ins+self.n_outs):])
# compute output
something = self.fn(*fn_args)
# update y and inplace outputs
for j in xrange(self.n_outs):
y[j][i] = something[j]
# write to storage
for i in xrange(self.n_seeds):
    outs[i][0] = y[i]
def scan(self, fn, args, n_seqs, n_seeds, seqs_taps, outs_taps, n_steps,
        go_backwards, inplace_map):
y = []
for i in xrange(self.n_seeds):
if inplace_map.has_key(i) and (inplace_map[i] >= 0):
y += [args[inplace_map[i]]]
else:
y_shape = (n_steps,)+args[i+self.n_seqs].shape[1:]
y += [numpy.empty(y_shape,
dtype=args[i+self.n_seqs].dtype)]
#iterate
if go_backwards:
the_range = xrange(n_steps-1,-1,-1)
else:
the_range = xrange(n_steps)
seqs_mins = {}
for j in xrange(self.n_seqs):
if seqs_taps.has_key(j):
seqs_mins.update({j: min(seqs_taps[j])})
outs_mins = {}
seed_size = {}
for j in xrange(self.n_seeds):
if outs_taps.has_key(j):
outs_mins.update({j: min(outs_taps[j])})
seed_size.update({j: args[n_seqs+j].shape[0]})
for i in the_range:
fn_args = []
# sequences over which scan iterates
for j in xrange(self.n_seqs):
if seqs_taps.has_key(j):
ls_taps = seqs_taps[j]
min_tap = seqs_mins[j]
for tap_value in ls_taps:
k = i - min_tap + tap_value
fn_args += [args[j][k]]
# seeds or past values of outputs
for j in xrange(self.n_seeds):
if outs_taps.has_key(j):
ls_taps = outs_taps[j]
min_tap = outs_mins[j]
seed_sz = seed_size[j]
for tap_value in ls_taps:
if i + tap_value < 0:
k = i + seed_sz + tap_value
if k < 0:
    # past value not provided; issue a warning and use 0s
    fn_args += [numpy.zeros(args[j][0].shape)]
    warning('Past value %d for output %d not given in seeds' %
        (tap_value, j))
else:
fn_args += [args[j][k]]
else:
fn_args += [y[j][i + tap_value]]
# get the non-iterable sequences
fn_args += list(args[(self.n_seqs+self.n_seeds):])
# compute output
something = fn(*fn_args)
#update outputs
for j in xrange(self.n_seeds):
y[j][i] = something[j]
return y
def grad(self, args, g_outs):
if (not self.force_gradient) and \
((self.updates.keys() != []) or (self.inplace_map.keys() != [])):
warning('Can not compute gradients if inplace or updates ' \
'are used. Use force_gradient if you know for sure '\
'that the gradient can be computed automatically.')
return [None for i in args]
else:
# forward pass
y = self(*args)
if not( type(y) in (list,tuple)):
y = [y]
# backwards pass
for i in xrange(len(y)):
if g_outs[i] == None:
g_outs[i] = theano.tensor.zeros_like(y[i])
g_args = [self.n_steps]+g_outs + y
# check if go_backwards is true
if self.go_backwards:
for seq in args[1:self.n_seqs]:
g_args += [seq[::-1]]
else:
g_args += args[1:self.n_seqs]
g_args += args[1+self.n_seqs: ]
g_scan = ScanGrad((self.g_ins,self.g_outs), self.n_seqs, \
self.n_seeds,self.seqs_taps, self.outs_taps,
self.truncate_gradient)
return g_scan(g_args)
@gof.local_optimizer([None])
def scan_make_inplace(node):
    op = node.op
    if isinstance(op, Scan) and (not op.inplace) \
       and (op.inplace_map.keys() != []):
        return Scan((op.inputs, op.outputs, op.updates), op.n_seqs,
                    op.n_seeds, op.inplace_map, op.seqs_taps, op.outs_taps,
                    op.force_gradient, op.truncate_gradient,
                    op.go_backwards, inplace=True
                    ).make_node(*node.inputs).outputs
    return False

optdb.register('scan_make_inplace', opt.in2out(scan_make_inplace,
               ignore_newtrees=True), 75, 'fast_run', 'inplace')
class ScanGrad(theano.Op):
    """Gradient Op for Scan"""
    def __init__(self, (g_ins, g_outs), n_seqs, n_outs,
                 seqs_taps={}, outs_taps={}, truncate_gradient=-1):
        self.grad_fn = theano.function(g_ins, g_outs)
        self.inputs = g_ins
        self.outputs = g_outs
        self.n_seqs = n_seqs
        self.truncate_gradient = truncate_gradient
        self.n_outs = n_outs
        self.seqs_taps = seqs_taps
        self.outs_taps = outs_taps
        self.destroy_map = {}

    def __eq__(self, other):
        rval = type(self) == type(other)
        if rval:
            rval = (self.inputs == other.inputs) and \
                   (self.outputs == other.outputs) and \
                   (self.n_seqs == other.n_seqs) and \
                   (self.n_outs == other.n_outs) and \
                   (self.truncate_gradient == other.truncate_gradient) and \
                   (self.seqs_taps == other.seqs_taps) and \
                   (self.outs_taps == other.outs_taps)
        return rval

    def __hash__(self):
        return hash(type(self)) ^ \
               hash(self.n_seqs) ^ \
               hash(self.n_outs) ^ \
               hash(self.truncate_gradient) ^ \
               hash_list(self.inputs) ^ \
               hash_list(self.outputs) ^ \
               hash_dict(self.seqs_taps) ^ \
               hash_dict(self.outs_taps)

    def make_node(self, *args):
        # input of the gradient op :
        # | n_steps | g_outs | y      | seqs   | seeds  | non_seqs |
        # | 1       | n_outs | n_outs | n_seqs | n_outs | unknown  |
        # return
        # | grad of seqs | grad of seeds | grad of non_seqs |
        # | n_seqs       | n_outs        | unknown          |
        return theano.Apply(self, list(args),
                            [i.type() for i in args[1 + 2 * self.n_outs:]])
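The slicing that `perform` applies to this argument layout is easy to get wrong, so here is a quick standalone check with made-up sizes (`n_outs = 2`, `n_seqs = 1`); the string labels stand in for arrays and are purely illustrative:

```python
# Check of the argument-layout slicing used by ScanGrad, on labels.
n_outs, n_seqs = 2, 1
args = (['n_steps'] +
        ['g_out_%d' % i for i in range(n_outs)] +
        ['y_%d' % i for i in range(n_outs)] +
        ['seq_%d' % i for i in range(n_seqs)] +
        ['seed_%d' % i for i in range(n_outs)] +
        ['w_0'])
inputs = args[1 + 2 * n_outs:]          # skip n_steps, g_outs and y
seqs = inputs[:n_seqs]
seeds = inputs[n_seqs:n_seqs + n_outs]
non_seqs = inputs[n_seqs + n_outs:]
print(seqs, seeds, non_seqs)
```

Each label ends up in the slice the `make_node` comment promises.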
    def perform(self, node, args, storage):
        # get scan inputs
        n_steps = args[0]
        inputs = args[2 * self.n_outs + 1:]
        seqs = inputs[:self.n_seqs]
        seeds = inputs[self.n_seqs:self.n_seqs + self.n_outs]
        non_seqs = inputs[self.n_outs + self.n_seqs:]
        # generate space for gradient
        g_seqs = [numpy.zeros_like(k) for k in seqs]
        g_seeds = [numpy.zeros_like(k) for k in seeds]
        g_non_seqs = [numpy.zeros_like(k) for k in non_seqs]
        # get gradient from above; it is accumulated into, so work on a copy
        g_outs = [g.copy() for g in args[1:self.n_outs + 1]]
        # get the output of the scan operation
        outs = args[self.n_outs + 1:2 * self.n_outs + 1]
        # go back through time to 0 or n_steps - truncate_gradient
        lower_limit = n_steps - self.truncate_gradient
        if lower_limit > n_steps - 1:
            the_range = xrange(n_steps - 1, -1, -1)
        elif lower_limit < -1:
            the_range = xrange(n_steps - 1, -1, -1)
        else:
            the_range = xrange(n_steps - 1, lower_limit, -1)
        seqs_mins = {}
        for j in xrange(self.n_seqs):
            if self.seqs_taps.has_key(j):
                seqs_mins.update({j: min(self.seqs_taps[j])})
        outs_mins = {}
        seed_size = {}
        for j in xrange(self.n_outs):
            if self.outs_taps.has_key(j):
                outs_mins.update({j: min(self.outs_taps[j])})
                seed_size.update({j: g_seeds[j].shape[0]})
        for i in the_range:
            # time slice of inputs
            _ins = []
            for j in xrange(self.n_seqs):
                if self.seqs_taps.has_key(j):
                    ls_taps = self.seqs_taps[j]
                    min_tap = seqs_mins[j]
                    for tap_value in ls_taps:
                        k = i - min_tap + tap_value
                        _ins += [seqs[j][k]]
            # time slice of outputs + taps
            _outs = []
            for j in xrange(self.n_outs):
                if self.outs_taps.has_key(j):
                    ls_taps = self.outs_taps[j]
                    min_tap = outs_mins[j]
                    seed_sz = seed_size[j]
                    for tap_value in ls_taps:
                        if i + tap_value < 0:
                            k = i + seed_sz + tap_value
                            if k < 0:
                                # past value not provided; issue a warning and use 0
                                _outs += [numpy.zeros(seeds[j][0].shape)]
                                warning('Past value %d for output %d not given'
                                        % (tap_value, j))
                            else:
                                _outs += [seeds[j][k]]
                        else:
                            _outs += [outs[j][i + tap_value]]
            g_out = [arg[i] for arg in g_outs]
            grad_args = g_out + _ins + _outs + non_seqs
            grads = self.grad_fn(*grad_args)
            # get gradient for sequences
            pos = 0
            for j in xrange(self.n_seqs):
                if self.seqs_taps.has_key(j):
                    ls_taps = self.seqs_taps[j]
                    min_tap = seqs_mins[j]
                    for tap_value in ls_taps:
                        k = i - min_tap + tap_value
                        g_seqs[j][k] += grads[pos]
                        pos += 1
            # get gradient for outputs; gradients of taps that reach before
            # t = 0 accumulate into the seeds, the rest flow back into g_outs
            for j in xrange(self.n_outs):
                if self.outs_taps.has_key(j):
                    ls_taps = self.outs_taps[j]
                    seed_sz = seed_size[j]
                    for tap_value in ls_taps:
                        if i + tap_value < 0:
                            k = i + seed_sz + tap_value
                            if k >= 0:
                                g_seeds[j][k] += grads[pos]
                        else:
                            g_outs[j][i + tap_value] += grads[pos]
                        pos += 1
            for j in xrange(len(g_non_seqs)):
                g_non_seqs[j] += grads[j + pos]
        # return the gradient
        for i, v in enumerate(g_seqs + g_seeds + g_non_seqs):
            storage[i][0] = v
import random
import numpy.random
from theano.tests import unittest_tools as utt
def verify_grad(op, pt, n_tests=2, rng=None, eps=None, tol=None,
                mode=None, cast_to_output_type=False):
    pt = [numpy.array(p) for p in pt]
class T_Scan(unittest.TestCase):
    def setUp(self):
        utt.seed_rng()
        x_1 = theano.tensor.dscalar('x_1')
        self.my_f = theano.function([x_1], [x_1])  # dummy function

    # Naming convention :
    #  u_1,u_2,..   -> inputs, arrays to iterate over
    #  x_1,x_2,..   -> outputs at t-1 that are required in the recurrent
    #                  computation
    #  iu_1,iu_2,.. -> inplace inputs, inputs that are being replaced by
    #                  outputs during computation
    #  du_1,du_2,.. -> dummy inputs used to do inplace computation, they
    #                  are not passed to my_f
    #  ix_1,ix_2,.. -> inplace outputs at t-1
    #  x_1_next,..  -> outputs at t
    #  ix_1_next,.. -> inplace outputs at time t
    #  w_1,w_2,..   -> weights, parameters over which scan does not iterate
    #  my_f         -> compiled function that will be applied recurrently
    #  my_op        -> operator class
    #  final_f      -> compiled function that applies the Scan operation
    #  out_1,..     -> outputs of the Scan operation

    ###################################################################
    def test_numberOfIterableInputs(self):
        def t1():
            my_op = Scan.compiled(self.my_f, -1, 1)
        def t2():
            my_op = Scan.compiled(self.my_f, 0, 1)
        self.failUnlessRaises(ValueError, t1)
        self.failUnlessRaises(ValueError, t2)

    ###################################################################
    def test_numberOfOutputs(self):
        def t1():
            my_op = Scan.compiled(self.my_f, 1, -1)
        def t2():
            my_op = Scan.compiled(self.my_f, 1, 0)
        self.failUnlessRaises(ValueError, t1)
        self.failUnlessRaises(ValueError, t2)

    #####################################################################
    def test_numberOfInplaceOutputs(self):
        def t1():
            my_op = Scan.compiled(self.my_f, 1, 1, n_inplace=-1)
        def t2():
            my_op = Scan.compiled(self.my_f, 1, 1, n_inplace=2)
        def t3():
            my_op = Scan.compiled(self.my_f, 2, 1, n_inplace=2)
        def t4():
            my_op = Scan.compiled(self.my_f, 1, 2, n_inplace=2)
        def t5():
            my_op = Scan.compiled(self.my_f, 1, 1, n_inplace=1, n_inplace_ignore=2)
        self.failUnlessRaises(ValueError, t1)
        self.failUnlessRaises(ValueError, t2)
        self.failUnlessRaises(ValueError, t3)
        self.failUnlessRaises(ValueError, t4)
        self.failUnlessRaises(ValueError, t5)

    #####################################################################
    def test_taps(self):
        def t1():
            my_op = Scan.compiled(self.my_f, 1, 1, taps={2: [3]})
        def t2():
            my_op = Scan.compiled(self.my_f, 1, 2, taps={0: [0]})
        def t3():
            my_op = Scan.compiled(self.my_f, 1, 2, taps={0: [1]})
        self.failUnlessRaises(ValueError, t1)
        self.failUnlessRaises(ValueError, t2)
        self.failUnlessRaises(ValueError, t3)

    #####################################################################
    def test_makeNode(self):
        def t1():
            # Test inputs of different lengths
            # define the function that is applied recurrently
            u_1 = theano.tensor.dscalar('u_1')
            u_2 = theano.tensor.dscalar('u_2')
            x_1 = theano.tensor.dscalar('x_1')
            x_1_next = u_1 + u_2 * x_1
            my_f = theano.function([u_1, u_2, x_1], [x_1_next])
            # define the function that applies the scan operation
            my_op = Scan.compiled(my_f, 2, 1)
            u_1 = theano.tensor.dvector('u_1')
            u_2 = theano.tensor.dvector('u_2')
            x_1 = theano.tensor.dvector('x_1')
            x_1_next = my_op(u_1, u_2, x_1)
            final_f = theano.function([u_1, u_2, x_1], [x_1_next])
            # test the function final_f
            u_1 = numpy.random.rand(3)
            u_2 = numpy.random.rand(2)
            x_1 = [numpy.random.rand()]
            out = final_f(u_1, u_2, x_1)
        def t2():
            # Test function does not return correct number of outputs
            # define the function that is applied recurrently
            u_1 = theano.tensor.dscalar('u_1')
            x_1 = theano.tensor.dscalar('x_1')
            x_1_next = u_1 * x_1
            my_f = theano.function([u_1, x_1], [x_1_next])
            # define the function that applies the scan operation
            my_op = Scan.compiled(my_f, 1, 2)
            u_1 = theano.tensor.dvector('u_1')
            x_1 = theano.tensor.dvector('x_1')
            x_2 = theano.tensor.dvector('x_2')
            x_1_next, x_2_next = my_op(u_1, x_1, x_2)
            final_f = theano.function([u_1, x_1, x_2], [x_1_next, x_2_next])
            # generate data
            u_1 = numpy.random.rand(3)
            x_1 = [numpy.random.rand()]
            x_2 = [numpy.random.rand()]
            out_1, out_2 = final_f(u_1, x_1, x_2)
        self.failUnlessRaises(ValueError, t1)
        self.failUnlessRaises(TypeError, t2)
    #####################################################################
    def test_generator(self):
        # compile my_f
        u_1 = theano.tensor.dscalar('u_1')  # dummy input,
                                            # required if no inplace is used!
        x_1 = theano.tensor.dscalar('x_1')
        w_1 = theano.tensor.dscalar('w_1')
        x_1_next = x_1 * w_1
        my_f = theano.function([u_1, x_1, w_1], [x_1_next])
        # create operation
        my_op = Scan.compiled(my_f, 1, 1)
        u_1 = theano.tensor.dvector('u_1')  # dummy input, there is no
        # inplace, so output will not be put in place of this u_1!
        x_1 = theano.tensor.dvector('x_1')
        w_1 = theano.tensor.dscalar('w_1')
        x_1_next = my_op(u_1, x_1, w_1)
        final_f = theano.function([u_1, x_1, w_1], [x_1_next])
        # generate data
        x_1 = numpy.ndarray(3)  # dummy input, just tells for how many time
                                # steps to run recursively
        out_1 = final_f(x_1, [2], 2)
        self.failUnless(numpy.all(out_1 == numpy.asarray([4, 8, 16])))
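The expected values in the assertion above follow from unrolling the recurrence by hand. A plain-Python reference (the name `generator_reference` is invented for the example):

```python
# Plain-Python reproduction of the recurrence in test_generator:
# x(t) = x(t-1) * w, starting from x(-1) = 2 with w = 2, for three steps.
def generator_reference(x0, w, n_steps):
    out, x = [], x0
    for _ in range(n_steps):
        x = x * w          # one step of the recurrence
        out.append(x)
    return out

print(generator_reference(2, 2, 3))
```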
    #####################################################################
    def test_generator_inplace_no_ignore(self):
        # compile my_f
        u_1 = theano.tensor.dscalar('u_1')
        x_1 = theano.tensor.dscalar('x_1')
        w_1 = theano.tensor.dscalar('w_1')
        x_1_next = x_1 * w_1
        my_f = theano.function([u_1, x_1, w_1], [x_1_next])
        # create operation
        my_op = Scan.compiled(my_f, 1, 1, n_inplace=1)
        iu_1 = theano.tensor.dvector('iu_1')
        ix_1 = theano.tensor.dvector('ix_1')
        w_1 = theano.tensor.dscalar('w_1')
        ix_1_next = my_op(iu_1, ix_1, w_1)
        final_f = theano.function([theano.In(iu_1, mutable=True), ix_1, w_1],
                                  [ix_1_next], mode='FAST_RUN')
        # generate data
        iu_1 = numpy.ndarray(3)
        out_1 = final_f(iu_1, [2], 2)
        # not concretely implemented yet ..
        self.failUnless(numpy.all(out_1 == numpy.asarray([4, 8, 16])))
        self.failUnless(numpy.all(out_1 == iu_1))

    #####################################################################
    def test_generator_inplace_no_ignore_2states(self):
        # compile my_f
        u_1 = theano.tensor.dscalar('u_1')
        u_2 = theano.tensor.dscalar('u_2')
        x_1 = theano.tensor.dscalar('x_1')
        x_2 = theano.tensor.dscalar('x_2')
        w_1 = theano.tensor.dscalar('w_1')
        x_1_next = x_1 * w_1
        x_2_next = x_2 * w_1
        my_f = theano.function([u_1, u_2, x_1, x_2, w_1], [x_1_next, x_2_next])
        # create operation
        my_op = Scan.compiled(my_f, 2, 2, n_inplace=2)
        iu_1 = theano.tensor.dvector('iu_1')
        iu_2 = theano.tensor.dvector('iu_2')
        ix_1 = theano.tensor.dvector('ix_1')
        ix_2 = theano.tensor.dvector('ix_2')
        w_1 = theano.tensor.dscalar('w_1')
        ix_1_next, ix_2_next = my_op(iu_1, iu_2, ix_1, ix_2, w_1)
        final_f = theano.function([theano.In(iu_1, mutable=True),
                                   theano.In(iu_2, mutable=True), ix_1, ix_2,
                                   w_1], [ix_1_next, ix_2_next], mode='FAST_RUN')
        # generate data
        iu_1 = numpy.ndarray(3)
        iu_2 = numpy.ndarray(3)
        out_1, out_2 = final_f(iu_1, iu_2, [2], [1], 2)
        # not concretely implemented yet ..
        self.failUnless(numpy.all(out_1 == numpy.asarray([4, 8, 16])))
        self.failUnless(numpy.all(out_1 == iu_1))
        self.failUnless(numpy.all(out_2 == numpy.asarray([2, 4, 8])))
        self.failUnless(numpy.all(out_2 == iu_2))

    #######################################################################
    def test_generator_inplace(self):
        # compile my_f
        u_1 = theano.tensor.dscalar('u_1')
        x_1 = theano.tensor.dscalar('x_1')
        x_2 = theano.tensor.dscalar('x_2')
        x_1_next = u_1 + x_1
        x_2_next = x_1 * x_2
        my_f = theano.function([u_1, x_1, x_2], [x_1_next, x_2_next])
        # create operation
        my_op = Scan.compiled(my_f, 2, 2, n_inplace=2, n_inplace_ignore=1)
        du_1 = theano.tensor.dvector('du_1')
        iu_1 = theano.tensor.dvector('iu_1')
        ix_1 = theano.tensor.dvector('ix_1')
        ix_2 = theano.tensor.dvector('ix_2')
        ix_1_next, ix_2_next = my_op(du_1, iu_1, ix_1, ix_2)
        final_f = theano.function([theano.In(du_1, mutable=True),
                                   theano.In(iu_1, mutable=True),
                                   ix_1, ix_2], [ix_1_next, ix_2_next],
                                  mode='FAST_RUN')
        # generate data
        du_1 = numpy.asarray([0., 0., 0.])
        iu_1 = numpy.asarray([1., 1., 1.])
        ix_1 = [1]
        ix_2 = [1]
        out_1, out_2 = final_f(du_1, iu_1, ix_1, ix_2)
        self.failUnless(numpy.all(out_1 == numpy.asarray([2, 3, 4])))
        self.failUnless(numpy.all(out_2 == numpy.asarray([1, 2, 6])))
        self.failUnless(numpy.all(out_1 == du_1))
        self.failUnless(numpy.all(out_2 == iu_1))

    #####################################################################
    def test_iterateOnlyOverX(self):
        u_1 = theano.tensor.dscalar('u_1')
        x_1 = theano.tensor.dscalar('x_1')
        x_1_next = u_1 * x_1
        my_f = theano.function([u_1, x_1], [x_1_next])
        my_op = Scan.compiled(my_f, 1, 1)
        u_1 = theano.tensor.dvector('u_1')
        x_1 = theano.tensor.dvector('x_1')
        x_1_next = my_op(u_1, x_1)
        final_f = theano.function([x_1, u_1], [x_1_next])
        u_1 = numpy.asarray([2, 2, 2])
        out_1 = final_f([2.], u_1)
        self.failUnless(numpy.all(out_1 == numpy.asarray([4, 8, 16])))
    #####################################################################
    def test_iterateOverSeveralInputs(self):
        u_1 = theano.tensor.dscalar('u_1')  # input 1
        u_2 = theano.tensor.dscalar('u_2')  # input 2
        x_1 = theano.tensor.dscalar('x_1')  # output
        x_1_next = (u_1 + u_2) * x_1
        my_f = theano.function([u_1, u_2, x_1], [x_1_next])
        my_op = Scan.compiled(my_f, 2, 1)
        u_1 = theano.tensor.dvector('u_1')
        u_2 = theano.tensor.dvector('u_2')
        x_1 = theano.tensor.dvector('x_1')
        x_1_next = my_op(u_1, u_2, x_1)
        final_f = theano.function([u_1, u_2, x_1], [x_1_next])
        u_1 = numpy.asarray([1, 1, 1])
        u_2 = numpy.asarray([1, 1, 1])
        x_1 = [2]
        out_1 = final_f(u_1, u_2, x_1)
        self.failUnless(numpy.all(out_1 == numpy.asarray([4, 8, 16])))

    #####################################################################
    def test_iterateOverSeveralInputsSeveralInplace(self):
        iu_1 = theano.tensor.dscalar('iu_1')
        u_1 = theano.tensor.dscalar('u_1')
        u_2 = theano.tensor.dscalar('u_2')
        u_3 = theano.tensor.dscalar('u_3')
        u_4 = theano.tensor.dscalar('u_4')
        ix_1 = theano.tensor.dscalar('ix_1')
        ix_2 = theano.tensor.dscalar('ix_2')
        x_1 = theano.tensor.dscalar('x_1')
        w_1 = theano.tensor.dscalar('w_1')
        ix_1_next = u_3 + u_4
        ix_2_next = ix_1 + ix_2
        x_1_next = x_1 + u_3 + u_4 + ix_1 + ix_2
        my_f = theano.function([iu_1, u_1, u_2, u_3, u_4, ix_1, ix_2, x_1, w_1],
                               [ix_1_next, ix_2_next, x_1_next])
        my_op = Scan.compiled(my_f, 6, 3, n_inplace=2,
                              n_inplace_ignore=1)
        du_1 = theano.tensor.dvector('du_1')
        iu_1 = theano.tensor.dvector('iu_1')
        u_1 = theano.tensor.dvector('u_1')
        u_2 = theano.tensor.dvector('u_2')
        u_3 = theano.tensor.dvector('u_3')
        u_4 = theano.tensor.dvector('u_4')
        x_1 = theano.tensor.dvector('x_1')
        ix_1 = theano.tensor.dvector('ix_1')
        ix_2 = theano.tensor.dvector('ix_2')
        w_1 = theano.tensor.dscalar('w_1')
        [ix_1_next, ix_2_next, x_1_next] = \
            my_op(du_1, iu_1, u_1, u_2, u_3, u_4, x_1, ix_1, ix_2, w_1)
        final_f = theano.function([theano.In(du_1, mutable=True),
                                   theano.In(iu_1, mutable=True),
                                   u_1, u_2, u_3, u_4, ix_1, ix_2, x_1, w_1],
                                  [ix_1_next, ix_2_next,
                                   x_1_next], mode='FAST_RUN')
        # generate data
        du_1 = numpy.asarray([0., 0., 0.])
        iu_1 = numpy.asarray([0., 1., 2.])
        u_1 = numpy.asarray([1., 2., 3.])
        u_2 = numpy.asarray([1., 1., 1.])
        u_3 = numpy.asarray([2., 2., 2.])
        u_4 = numpy.asarray([3., 2., 1.])
        x_1 = [1.]
        ix_1 = [1.]
        ix_2 = [1.]
        w_1 = 2.
        out_1, out_2, out_3 = final_f(du_1, iu_1, u_1, u_2, u_3, u_4,
                                      ix_1, ix_2, x_1, w_1)
        self.failUnless(numpy.all(out_3 == numpy.asarray([8., 19., 33.])))
        self.failUnless(numpy.all(out_1 == numpy.asarray([5., 4., 3.])))
        self.failUnless(numpy.all(out_2 == numpy.asarray([2., 7., 11.])))
        self.failUnless(numpy.all(out_1 == du_1))
        self.failUnless(numpy.all(out_2 == iu_1))

    #####################################################################
    def test_computeInPlaceArguments(self):
        u_1 = theano.tensor.dscalar('u_1')
        x_1 = theano.tensor.dscalar('x_1')
        w_1 = theano.tensor.dscalar('w_1')
        x_1_next = u_1 * w_1 + x_1
        my_f = theano.function([u_1, x_1, theano.In(w_1, update=w_1 * 2)],
                               [x_1_next])
        my_op = Scan.compiled(my_f, 1, 1)
        u_1 = theano.tensor.dvector('u_1')
        x_1 = theano.tensor.dvector('x_1')
        w_1 = theano.tensor.dscalar('w_1')
        x_1_next = my_op(u_1, x_1, w_1)
        final_f = theano.function([u_1, x_1, w_1], [x_1_next])
        u_1 = [1., 1., 1.]
        x_1 = [1.]
        w_1 = 1.
        out_1 = final_f(u_1, x_1, w_1)
        self.failUnless(numpy.all(out_1 == numpy.asarray([2, 4, 8])))

    #####################################################################
    def test_timeTaps(self):
        u_1 = theano.tensor.dscalar('u_1')
        x_1 = theano.tensor.dscalar('x_1')
        x_1_t2 = theano.tensor.dscalar('x_1_t2')
        x_1_t4 = theano.tensor.dscalar('x_1_t4')
        x_1_next = u_1 + x_1 + x_1_t2 + x_1_t4
        my_f = theano.function([u_1, x_1, x_1_t2, x_1_t4], [x_1_next])
        my_op = Scan.compiled(my_f, 1, 1, taps={0: [2, 4]})
        u_1 = theano.tensor.dvector('u_1')
        x_1 = theano.tensor.dvector('x_1')
        x_1_next = my_op(u_1, x_1)
        final_f = theano.function([u_1, x_1], [x_1_next])
        u_1 = [1., 1., 1., 1., 1.]
        x_1 = [1., 2., 3., 4.]
        out_1 = final_f(u_1, x_1)
        self.failUnless(numpy.all(out_1 == numpy.asarray([9., 16., 29., 50., 89.])))
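The expected sequence `[9, 16, 29, 50, 89]` can be reproduced by unrolling the tapped recurrence directly. A plain-Python reference (the helper name `run_taps_reference` is invented for the example):

```python
# Plain-Python reproduction of the tapped recurrence in test_timeTaps:
# y(t) = u(t) + y(t-1) + y(t-2) + y(t-4); the seed vector supplies
# the past values y(-4) .. y(-1).
def run_taps_reference(u, seeds):
    hist = list(seeds)          # hist[-1] is y(t-1), hist[-4] is y(t-4)
    out = []
    for t in range(len(u)):
        y = u[t] + hist[-1] + hist[-2] + hist[-4]
        hist.append(y)
        out.append(y)
    return out

print(run_taps_reference([1.] * 5, [1., 2., 3., 4.]))
```

Note that the implicit t-1 tap and the requested taps at t-2 and t-4 all read from the same growing history, which is why the seed vector must be as long as the largest tap.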
    #####################################################################
    def test_constructFunction(self):
        u_1 = theano.tensor.dscalar('u_1')
        x_1 = theano.tensor.dscalar('x_1')
        x_1_next = u_1 + x_1
        my_op = Scan.symbolic(([u_1, x_1], x_1_next), 1, 1)
        u_1 = theano.tensor.dvector('u_1')
        x_1 = theano.tensor.dvector('x_1')
        x_1_next = my_op(u_1, x_1)
        final_f = theano.function([u_1, x_1], [x_1_next])
        u_1 = [1., 1., 1.]
        x_1 = [1.]
        out_1 = final_f(u_1, x_1)
        self.failUnless(numpy.all(out_1 == numpy.asarray([2., 3., 4.])))

    ######################################################################
    def test_gradOneInputOneOutput(self):
        u_1 = theano.tensor.dscalar('u_1')
        x_1 = theano.tensor.dscalar('x_1')
        x_1_next = u_1 * x_1
        my_op = Scan.symbolic(([u_1, x_1], x_1_next), 1, 1)
        u_1 = [1., 2., 3.]
        x_1 = [1.]
        verify_grad(my_op, [u_1, x_1])

    #######################################################################
    def test_gradManyInputsManyOutputs(self):
        u_1 = theano.tensor.dscalar('u_1')
        u_2 = theano.tensor.dscalar('u_2')
        x_1 = theano.tensor.dscalar('x_1')
        x_2 = theano.tensor.dscalar('x_2')
        x_1_next = x_1 * u_1 + x_2
        x_2_next = x_2 * u_2 + x_1
        my_op = Scan.symbolic(([u_1, u_2, x_1, x_2],
                               [x_1_next, x_2_next]),
                              2, 2)
        u_1 = [1., .2, 3.]
        u_2 = [1.5, 1.25, .35]
        x_1 = [.5]
        x_2 = [.65]
        verify_grad(my_op, [u_1, u_2, x_1, x_2])

    ######################################################################
    def test_gradTimeTaps(self):
        u_1 = theano.tensor.dscalar('u_1')
        x_1 = theano.tensor.dscalar('x_1')
        x_1_t_2 = theano.tensor.dscalar('x_1_t_2')
        x_1_next = x_1_t_2 * x_1 * u_1
        my_op = Scan.symbolic(([u_1, x_1, x_1_t_2],
                               [x_1_next]),
                              1, 1, taps={0: [2]})
        u_1 = [1., 2., 3., 4.]
        x_1 = [2., 3.]
        verify_grad(my_op, [u_1, x_1])

    #######################################################################
    def test_gradManyInputsManyOutputsTimeTaps(self):
        u_1 = theano.tensor.dscalar('u_1')
        u_2 = theano.tensor.dscalar('u_2')
        x_1 = theano.tensor.dscalar('x_1')
        x_1_2 = theano.tensor.dscalar('x_1_2')
        x_2 = theano.tensor.dscalar('x_2')
        x_2_2 = theano.tensor.dscalar('x_2_2')
        x_1_n = x_1 * x_2_2 + u_1 * x_1_2
        x_2_n = x_2 * x_1_2 + u_2 * x_2_2
        my_op = Scan.symbolic(([u_1, u_2, x_1, x_1_2, x_2, x_2_2],
                               [x_1_n, x_2_n]),
                              2, 2, taps={0: [2], 1: [2]})
        u_1 = [1., 2., 3., 4.]
        u_2 = [3., 2., 4., 1.]
        x_1 = [0.1, 0.2]
        x_2 = [1.5, 3.5]
        verify_grad(my_op, [u_1, u_2, x_1, x_2])
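The gradient tests above all reduce to comparing an analytic gradient against finite differences. A self-contained sketch of that check, done by hand for the recurrence of test_gradOneInputOneOutput (the helpers `forward` and `loss` are invented for the example; the analytic formula assumes the loss is the sum of the outputs):

```python
# Finite-difference check in the spirit of verify_grad, for the
# recurrence y(t) = u(t) * y(t-1) with loss L = sum_t y(t).
# Analytically, dL/du_k = sum_{t >= k} y(t) / u_k.
def forward(u, y0):
    ys, y = [], y0
    for ut in u:
        y = ut * y
        ys.append(y)
    return ys

def loss(u, y0):
    return sum(forward(u, y0))

u, y0, eps = [1.0, 2.0, 3.0], 1.0, 1e-6
ys = forward(u, y0)
analytic = [sum(ys[k:]) / u[k] for k in range(len(u))]
numeric = []
for k in range(len(u)):
    up = list(u); up[k] += eps      # perturb one input up ...
    um = list(u); um[k] -= eps      # ... and down
    numeric.append((loss(up, y0) - loss(um, y0)) / (2 * eps))
# the two estimates should agree to within finite-difference error
print(analytic)
```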
    def test_one(self):
        pass


if __name__ == '__main__':
    unittest.main()