Commit 44f47751 authored by nouiz

Merge pull request #317 from pascanur/new_scan

New scan [work in progress]
...@@ -10,3 +10,4 @@ Theano Design and Implementation Documentation
   :maxdepth: 2
   tensor
scan
.. _scan_internals:
Internal documentation of the scan op
=====================================
Top-level description of scan
-----------------------------
The `scan` operation is meant to symbolically describe loops,
recurrence relations or dynamical systems. In general, we will say that the
scan op implements a system of equations of the following form:
.. math::

    \mathbf{x}_1(t) = f_{\mathbf{x}_1}
        (\mathbf{u}_1(t), \mathbf{u}_1(t-1), \ldots, \mathbf{u}_1(t-l_1),
        \mathbf{u}_2(t), \ldots, \mathbf{u}_2(t-l_2),
        \ldots,
        \mathbf{u}_M(t), \ldots, \mathbf{u}_M(t-l_M),
        \mathbf{x}_1(t-1), \ldots, \mathbf{x}_1(t-k_1),
        \ldots,
        \mathbf{x}_N(t-1), \ldots, \mathbf{x}_N(t-k_N),
        \mathbf{w}_1, \ldots, \mathbf{w}_Q)

    \vdots

    \mathbf{x}_N(t) = f_{\mathbf{x}_N}
        (\mathbf{u}_1(t), \mathbf{u}_1(t-1), \ldots, \mathbf{u}_1(t-l_1),
        \mathbf{u}_2(t), \ldots, \mathbf{u}_2(t-l_2),
        \ldots,
        \mathbf{u}_M(t), \ldots, \mathbf{u}_M(t-l_M),
        \mathbf{x}_1(t-1), \ldots, \mathbf{x}_1(t-k_1),
        \ldots,
        \mathbf{x}_N(t-1), \ldots, \mathbf{x}_N(t-k_N),
        \mathbf{w}_1, \ldots, \mathbf{w}_Q)

    \mathbf{y}_1(t) = f_{\mathbf{y}_1}
        (\mathbf{u}_1(t), \mathbf{u}_1(t-1), \ldots, \mathbf{u}_1(t-l_1),
        \mathbf{u}_2(t), \ldots, \mathbf{u}_2(t-l_2),
        \ldots,
        \mathbf{u}_M(t), \ldots, \mathbf{u}_M(t-l_M),
        \mathbf{x}_1(t-1), \ldots, \mathbf{x}_1(t-k_1),
        \ldots,
        \mathbf{x}_N(t-1), \ldots, \mathbf{x}_N(t-k_N),
        \mathbf{w}_1, \ldots, \mathbf{w}_Q)

    \vdots

    \mathbf{y}_M(t) = f_{\mathbf{y}_M}
        (\mathbf{u}_1(t), \mathbf{u}_1(t-1), \ldots, \mathbf{u}_1(t-l_1),
        \mathbf{u}_2(t), \ldots, \mathbf{u}_2(t-l_2),
        \ldots,
        \mathbf{u}_M(t), \ldots, \mathbf{u}_M(t-l_M),
        \mathbf{x}_1(t-1), \ldots, \mathbf{x}_1(t-k_1),
        \ldots,
        \mathbf{x}_N(t-1), \ldots, \mathbf{x}_N(t-k_N),
        \mathbf{w}_1, \ldots, \mathbf{w}_Q)
The equations describe a system evolving in time, where :math:`t` represents
the current step. The system is described by inputs, states, outputs and
parameters.
The inputs, denoted by :math:`\mathbf{u}`, are time-varying quantities,
hence indexed by :math:`t`. They influence the system, but are not
influenced by it.
The states :math:`\mathbf{x}` are time-varying quantities whose value at
time :math:`t` depends on their own (or other states') previous values, as
well as on the inputs and parameters. Note that the first few values of the
states must always be provided; otherwise we could not employ the recurrent
equations to generate the sequence of values, for lack of a starting point.
The outputs :math:`\mathbf{y}` are values produced by the system, i.e. values
that depend on the previous values of the states and inputs. The difference
between outputs and states is that outputs do not feed back into the system.
The parameters :math:`\mathbf{w}` are fixed quantities that are re-used at
every time step of the evolution of the system.
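The system described by these equations can be sketched in plain Python. The names below (``evolve``, ``f_x``, ``f_y``) are purely illustrative, not part of the scan API; for brevity the sketch uses one state and one output, each with a single tap:

```python
# A minimal pure-Python sketch of the kind of system scan implements.
# All names here are illustrative, not part of the Theano API.

def evolve(f_x, f_y, u, x_init, w, T):
    """Evolve a one-state, one-output system for T steps.

    f_x computes the new state x(t) from (u(t), x(t-1), w);
    f_y computes the output y(t) from (u(t), x(t-1), w);
    x_init provides the required starting value x(-1).
    """
    x = [x_init]   # state history; x[0] is the provided initial value
    y = []         # outputs: they never feed back into the system
    for t in range(T):
        y.append(f_y(u[t], x[-1], w))
        x.append(f_x(u[t], x[-1], w))
    return x[1:], y

# Example system: x(t) = w * x(t-1) + u(t),  y(t) = x(t-1) ** 2
states, outputs = evolve(
    lambda u_t, x_tm1, w: w * x_tm1 + u_t,
    lambda u_t, x_tm1, w: x_tm1 ** 2,
    u=[1, 2, 3], x_init=0, w=2, T=3)
# states == [1, 4, 11], outputs == [0, 1, 16]
```

Note how the initial state ``x_init`` plays exactly the role described above: without it, the recurrence would have no starting point.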
Each of the equations above is implemented by the **inner function** of scan. You
can think of the **inner function** as a theano function that gets executed
at each step to compute the new values. This **inner function** should not be
confused with the **constructive function**, which is what the user gives to
the scan function. The **constructive function** is used to construct the
computational graph that is afterwards compiled into the **inner function**.
Naming conventions
------------------
* ``input_state`` will stand for a state :math:`\mathbf{x}`, when it is
provided as an input to the recurrent formula (the inner function) that
will generate the new value of the state
* ``output_state`` will stand for a state :math:`\mathbf{x}` when it refers
to the result of the recurrent formula (the output of the inner function)
* ``output`` will stand for an output :math:`\mathbf{y}`
* ``input`` will be an input :math:`\mathbf{u}`
* ``parameter`` will stand for a parameter tensor :math:`\mathbf{w}` that stays
constant at each step of the inner function
* ``non_numeric_input_state`` will stand for states that are not numeric in nature,
more specifically *random states*, when they are provided as an input. The
same holds for ``non_numeric_output_state``.
* ``t`` is the time index (the current step in the evolution of the system).
* ``T`` is the total number of steps in the evolution of the system.
* the suffix ``_slices`` added to either ``x`` or ``u`` will mean the list of
variables representing slices of states or inputs. These are the arguments
given to the constructive function of scan (see above).
* the suffix ``_inner`` added to ``x``, ``y``, ``xy``, ``u``, ``w`` or ``z``
will mean the variables representing the state/output/input/weights in the
inner function
* the suffix ``_outer`` added to ``x``, ``y``, ``xy``, ``u``, ``w`` or ``z``
will mean the variables representing the state/output/input/weights in the
main computational graph (the one containing the scan op).
Files
-----
The implementation of scan is spread over several files. The different
files, and the sections of the code they deal with, are:
* ``scan.py`` implements the ``scan`` function. The ``scan`` function
arranges the arguments of scan correctly, constructs the scan op and
afterwards calls the constructed scan op on the arguments. This function
takes care of figuring out missing inputs and shared variables.
* ``scan_op.py`` implements the ``ScanOp`` class. ``ScanOp`` respects
the ``Op`` interface, and contains most of the logic of the scan operator.
* ``scan_utils.py`` contains several helper functions, used throughout the
other files, that are specific to the scan operator.
* ``scan_views.py`` contains different views of the scan op that have
simpler and easier signatures to be used in specific cases.
* ``scan_opt.py`` contains the list of all optimizations for the scan
operator.
The logical flow
----------------
First the scan arguments are parsed by the function ``canonical_arguments``,
which wraps them into lists and adds default values for the arguments. One
important step that happens in this function is that the input arguments
are converted so that they all have a single tap, namely 0. For example,
if you have ``[{'input':u, 'taps':[0, 4]}]`` as the list of input arguments
to scan, it gets converted into ``[{'input':u, 'taps':[0]}, {'input':u[4:],
'taps':[0]}]``.
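This normalization can be sketched in plain Python. The helper ``normalize_taps`` below is hypothetical; the real code operates on symbolic Theano variables, and this sketch handles only non-negative taps:

```python
# Hypothetical sketch of the tap normalization performed by
# canonical_arguments; real inputs are Theano variables, sliced
# symbolically. Only non-negative taps are handled here.

def normalize_taps(seq_descr):
    """Split a multi-tap sequence into several single-tap (tap 0) ones."""
    out = []
    for tap in seq_descr.get('taps', [0]):
        inp = seq_descr['input']
        # A tap k > 0 asks for the value k steps ahead, which is tap 0
        # of the sequence shifted left by k.
        out.append({'input': inp[tap:] if tap > 0 else inp, 'taps': [0]})
    return out

u = list(range(10))
converted = normalize_taps({'input': u, 'taps': [0, 4]})
# converted[0]['input'] is u, converted[1]['input'] is u[4:]
```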
The second step is to check whether ``n_steps`` is a constant with the value 1
or -1. If so, the function ``one_step_scan`` is called, which
unwraps the computation of the inner function into the outer graph without
adding any scan op to the graph.
"""
This module provides the Scan Op
Scanning is a general form of recurrence, which can be used for looping.
The idea is that you *scan* a function along some input sequence, producing
an output at each time-step that can be seen (but not modified) by the
function at the next time-step. (Technically, the function can see the
previous K time-steps of your outputs and L time steps (from the past and
future) of your inputs.)
So for example, ``sum()`` could be computed by scanning the ``z+x_i``
function over a list, given an initial state of ``z=0``.
Special cases:
* A *reduce* operation can be performed by returning only the last
output of a ``scan``.
* A *map* operation can be performed by applying a function that
ignores previous steps of the outputs.
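These special cases can be sketched in plain Python (``scan_py`` is a hypothetical stand-in for the symbolic ``scan``):

```python
# Pure-Python sketch of scan and its special cases; scan_py is a
# hypothetical stand-in for the symbolic scan function.

def scan_py(fn, sequence, init):
    """Apply fn along sequence, feeding each result back as the state."""
    outputs, state = [], init
    for x in sequence:
        state = fn(state, x)
        outputs.append(state)
    return outputs

seq = [1, 2, 3, 4]
partial_sums = scan_py(lambda z, x: z + x, seq, 0)  # scan: every step
total = partial_sums[-1]                            # reduce: last step only
squares = scan_py(lambda _, x: x * x, seq, None)    # map: ignores the state
# partial_sums == [1, 3, 6, 10], total == 10, squares == [1, 4, 9, 16]
```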
Often a for-loop can be expressed as a ``scan()`` operation, and ``scan`` is
the closest that theano comes to looping. The advantage of using ``scan``
over for loops is that it allows the number of iterations to be a part of
the symbolic graph.
The Scan Op should typically be used by calling any of the following
functions: ``scan()``, ``map()``, ``reduce()``, ``foldl()``,
``foldr()``.
"""
__docformat__ = 'restructuredtext en'
__authors__ = ("Razvan Pascanu "
"Frederic Bastien "
"James Bergstra "
"Pascal Lamblin "
"Arnaud Bergeron ")
__copyright__ = "(c) 2010, Universite de Montreal"
__contact__ = "Razvan Pascanu <r.pascanu@gmail>"
from scan import scan
"""
This module provides the Scan Op
Scanning is a general form of recurrence, which can be used for looping.
The idea is that you *scan* a function along some input sequence, producing
an output at each time-step that can be seen (but not modified) by the
function at the next time-step. (Technically, the function can see the
previous K time-steps of your outputs and L time steps (from past and
future) of your inputs.)
So for example, ``sum()`` could be computed by scanning the ``z+x_i``
function over a list, given an initial state of ``z=0``.
Special cases:
* A *reduce* operation can be performed by using only the last
output of a ``scan``.
* A *map* operation can be performed by applying a function that
ignores previous steps of the outputs.
Often a for-loop or while-loop can be expressed as a ``scan()`` operation,
and ``scan`` is the closest that theano comes to looping. The advantages
of using ``scan`` over `for` loops in python are (among others):
* it allows the number of iterations to be part of the symbolic graph
* it allows computing gradients through the for loop
* there exist optimizations that help rewrite your loop so that it
uses less memory and runs faster
* it ensures that data is not copied from host to gpu and gpu to
host at each step
The Scan Op should typically be used by calling any of the following
functions: ``scan()``, ``map()``, ``reduce()``, ``foldl()``,
``foldr()``.
"""
__docformat__ = 'restructuredtext en'
__authors__ = ("Razvan Pascanu "
"Frederic Bastien "
"James Bergstra "
"Pascal Lamblin ")
__copyright__ = "(c) 2010, Universite de Montreal"
__contact__ = "Razvan Pascanu <r.pascanu@gmail>"
from itertools import izip
import logging
import numpy
from theano.compile import SharedVariable, function
from theano import compile
from theano import gof
from theano.tensor import opt
from theano import tensor
from theano import config
from theano.updates import Updates
from theano.scalar.sharedvar import shared as scalar_shared
from theano.compile.pfunc import rebuild_collect_shared
import theano
import scan_op
import scan_utils
# Logging function for sending warning or info
_logger = logging.getLogger('theano.scan_module.scan')
def scan(fn,
sequences=None,
outputs_info=None,
non_sequences=None,
n_steps=None,
truncate_gradient=-1,
go_backwards=False,
mode=None,
name=None,
options=None,
profile=False):
"""
This function constructs and applies a Scan op to the provided
arguments.
:param fn:
``fn`` is a function that describes the operations involved in one
step of ``scan``. ``fn`` should construct variables describing the
output of one iteration step. It should expect as input theano
variables representing all the slices of the input sequences
and previous values of the outputs, as well as all other arguments
given to scan as ``non_sequences``. The order in which scan passes
these variables to ``fn`` is the following:
* all time slices of the first sequence
* all time slices of the second sequence
* ...
* all time slices of the last sequence
* all past slices of the first output
* all past slices of the second output
* ...
* all past slices of the last output
* all other arguments (the list given as `non_sequences` to
scan)
The order of the sequences is the same as the one in the list
`sequences` given to scan. The order of the outputs is the same
as the order of ``outputs_info``. For any sequence or output, the
order of the time slices is the same as the one in which they have
been given as taps. For example, if one writes the following:
.. code-block:: python
    scan(fn, sequences=[dict(input=Sequence1, taps=[-3, 2, -1]),
                        Sequence2,
                        dict(input=Sequence3, taps=3)],
         outputs_info=[dict(initial=Output1, taps=[-3, -5]),
                       dict(initial=Output2, taps=None),
                       Output3],
         non_sequences=[Argument1, Argument2])
``fn`` should expect the following arguments in this given order:
#. ``Sequence1[t-3]``
#. ``Sequence1[t+2]``
#. ``Sequence1[t-1]``
#. ``Sequence2[t]``
#. ``Sequence3[t+3]``
#. ``Output1[t-3]``
#. ``Output1[t-5]``
#. ``Output3[t-1]``
#. ``Argument1``
#. ``Argument2``
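The ordering rules above can be checked with a small pure-Python sketch. The helper ``fn_argument_order`` and the name strings are purely illustrative; real slices are symbolic Theano variables:

```python
# Illustrative sketch of the order in which scan passes arguments to fn.
# fn_argument_order is hypothetical; real slices are symbolic variables.

def fn_argument_order(sequences, outputs_info, non_sequences):
    args = []
    for seq in sequences:                # all sequence slices first
        for tap in seq.get('taps', [0]):
            args.append(('%s[t%+d]' % (seq['name'], tap)).replace('+0', ''))
    for out in outputs_info:             # then all past output slices
        for tap in (out.get('taps') or []):
            args.append('%s[t%+d]' % (out['name'], tap))
    return args + list(non_sequences)    # other arguments come last

order = fn_argument_order(
    [{'name': 'Sequence1', 'taps': [-3, 2, -1]},
     {'name': 'Sequence2'},
     {'name': 'Sequence3', 'taps': [3]}],
    [{'name': 'Output1', 'taps': [-3, -5]},
     {'name': 'Output2', 'taps': None},
     {'name': 'Output3', 'taps': [-1]}],
    ['Argument1', 'Argument2'])
# order matches the numbered list above, from Sequence1[t-3] to Argument2
```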
The list of ``non_sequences`` can also contain shared variables
used in the function, though ``scan`` is able to figure those
out on its own, so they can be skipped. For the clarity of the
code we recommend, though, that you provide them to scan. To some extent
``scan`` can also figure out other ``non_sequences`` (not shared),
even if they are not passed to scan (but used by `fn`). A simple example of
this would be:
.. code-block:: python
import theano.tensor as TT
W = TT.matrix()
W_2 = W**2
def f(x):
return TT.dot(x,W_2)
The function is expected to return two things. One is a list of
outputs ordered in the same order as ``outputs_info``, with the
difference that there should be only one output variable per
output initial state (even if no tap value is used). Secondly,
`fn` should return an update dictionary (that tells how to
update any shared variable after each iteration step). The
dictionary can optionally be given as a list of tuples. There is
no constraint on the order of these two lists: ``fn`` can return
either ``(outputs_list, update_dictionary)`` or
``(update_dictionary, outputs_list)``, or just one of the two (in
case the other is empty).
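The accepted return forms can be sketched in plain Python. The helper ``split_outputs_updates`` is hypothetical; the real parsing is done by ``scan_utils.get_updates_and_outputs``:

```python
# Hypothetical sketch of accepting fn's return value in either order;
# the real logic lives in scan_utils.get_updates_and_outputs.

def split_outputs_updates(rval):
    """Return (outputs_list, updates_dict) from fn's return value."""
    if isinstance(rval, dict):                # only updates
        return [], rval
    if isinstance(rval, (list, tuple)) and len(rval) == 2:
        a, b = rval
        if isinstance(a, dict):               # (updates, outputs)
            return list(b), a
        if isinstance(b, dict):               # (outputs, updates)
            return list(a), b
    return list(rval), {}                     # only outputs
```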
To use ``scan`` as a while loop, the user needs to change the
function ``fn`` such that it also returns a stopping condition.
To do so, the condition has to be wrapped in an ``until`` class.
The condition should be returned as a third element, for example:
.. code-block:: python
...
return [y1_t, y2_t], {x:x+1}, theano.scan_module.until(x < 50)
Note that a number of steps (considered here as the maximum
number of steps) is still required even though a condition is
passed (and it is used to allocate memory if needed).
:param sequences:
``sequences`` is the list of Theano variables or dictionaries
describing the sequences ``scan`` has to iterate over. If a
sequence is given as wrapped in a dictionary, then a set of optional
information can be provided about the sequence. The dictionary
should have the following keys:
* ``input`` (*mandatory*) -- Theano variable representing the
sequence.
* ``taps`` -- Temporal taps of the sequence required by ``fn``.
They are provided as a list of integers, where a value ``k``
implies that at iteration step ``t`` scan will pass to ``fn``
the slice ``t+k``. The default value is ``[0]``.
Any Theano variable in the list ``sequences`` is automatically
wrapped into a dictionary where ``taps`` is set to ``[0]``
:param outputs_info:
``outputs_info`` is the list of Theano variables or dictionaries
describing the initial states of the outputs computed
recurrently. When these initial states are given as dictionaries,
optional information can be provided about the outputs corresponding
to them. The dictionary should have the following
keys:
* ``initial`` -- Theano variable that represents the initial
state of a given output. In case the output is not computed
recursively (think of a map) and does not require an initial
state, this field can be skipped. Given that only the previous
time step of the output is used by ``fn``, the initial state
should have the same shape as the output. If multiple time
taps are used, the initial state should have one extra
dimension that covers all the possible taps. For example,
if we use ``-5``, ``-2`` and ``-1`` as past taps, at step 0
``fn`` will require (by an abuse of notation) ``output[-5]``,
``output[-2]`` and ``output[-1]``. These will be given by
the initial state, which in this case should have the shape
``(5,) + output.shape``. If this variable containing the initial
state is called ``init_y``, then ``init_y[0]`` *corresponds to*
``output[-5]``, ``init_y[1]`` *corresponds to* ``output[-4]``,
``init_y[2]`` corresponds to ``output[-3]``, ``init_y[3]``
corresponds to ``output[-2]``, and ``init_y[4]`` corresponds to
``output[-1]``. While this order might seem strange, it comes
naturally from splitting an array at a given point. Assume that
we have an array ``x``, and we choose ``k`` to be time step
``0``. Then our initial state would be ``x[:k]``, while the
output will be ``x[k:]``. Looking at this split, elements in
``x[:k]`` are ordered exactly like those in ``init_y``.
* ``taps`` -- Temporal taps of the output that will be passed to
``fn``. They are provided as a list of *negative* integers,
where a value ``k`` implies that at iteration step ``t`` scan
will pass to ``fn`` the slice ``t+k``.
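The ordering of the initial-state buffer described under ``initial`` can be checked with plain Python slicing (the numeric values are illustrative only):

```python
# Pure-Python illustration of the initial-state buffer ordering for
# multiple past taps; the numeric values are illustrative only.

taps = [-5, -2, -1]
mintap = abs(min(taps))              # 5 past values must be provided

x = [100, 101, 102, 103, 104, 105]   # full history; step 0 is index 5
init_y = x[:mintap]                  # the initial-state buffer

def past_value(tap):
    """Value of output(0 + tap), for a negative tap, read from init_y."""
    return init_y[mintap + tap]

# tap -5 reads init_y[0], tap -1 reads init_y[4]
```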
``scan`` will follow this logic if partial information is given:
* If an output is not wrapped in a dictionary, ``scan`` will wrap
it in one assuming that you use only the last step of the output
(i.e. it makes your tap value list equal to [-1]).
* If you wrap an output in a dictionary and you do not provide any
taps but you provide an initial state it will assume that you are
using only a tap value of -1.
* If you wrap an output in a dictionary but you do not provide any
initial state, it assumes that you are not using any form of
taps.
* If you provide a ``None`` instead of a variable or an empty
dictionary, ``scan`` assumes that you will not use any taps for
this output (as in the case of a map, for example).
If ``outputs_info`` is an empty list or None, ``scan`` assumes
that no tap is used for any of the outputs. If information is
provided just for a subset of the outputs an exception is
raised (because there is no convention on how scan should map
the provided information to the outputs of ``fn``)
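The defaulting rules listed above can be sketched as a small pure-Python helper (``wrap_output_info`` is hypothetical and only mirrors the listed rules):

```python
# Hypothetical sketch of the wrapping rules scan applies to each entry
# of outputs_info when only partial information is given.

def wrap_output_info(info):
    """Normalize one entry of outputs_info into a dictionary."""
    if info is None or info == {}:
        return {'initial': None, 'taps': None}   # no taps (map-like)
    if not isinstance(info, dict):
        return {'initial': info, 'taps': [-1]}   # bare variable: tap -1
    if 'taps' not in info and info.get('initial') is not None:
        return dict(info, taps=[-1])             # initial given, no taps
    if info.get('initial') is None:
        return dict(info, taps=None)             # no initial: no taps
    return info
```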
:param non_sequences:
``non_sequences`` is the list of arguments that are passed to
``fn`` at each steps. One can opt to exclude variable
used in ``fn`` from this list as long as they are part of the
computational graph, though for clarity we encourage not to do so.
:param n_steps:
``n_steps`` is the number of steps to iterate given as an int
or Theano scalar. If any of the input sequences do not have
enough elements, scan will raise an error. If the *value is 0* the
outputs will have *0 rows*. If the value is negative, ``scan``
will run backwards in time. If the ``go_backwards`` flag is already
set and also ``n_steps`` is negative, ``scan`` will run forward
in time. If n stpes is not provided, ``scan`` will figure
out the amount of steps it should run given its input sequences.
:param truncate_gradient:
``truncate_gradient`` is the number of steps to use in truncated
BPTT. If you compute gradients through a scan op, they are
computed using backpropagation through time. By providing a
different value then -1, you choose to use truncated BPTT instead
of classical BPTT, where you go for only ``truncate_gradient``
number of steps back in time.
:param go_backwards:
``go_backwards`` is a flag indicating if ``scan`` should go
backwards through the sequences. If you think of each sequence
as indexed by time, making this flag True would mean that
``scan`` goes back in time, namely that for any sequence it
starts from the end and goes towards 0.
:param name:
When profiling ``scan``, it is crucial to provide a name for any
instance of ``scan``. The profiler will produce an overall
profile of your code as well as profiles for the computation of
one step of each instance of ``scan``. The ``name`` of the instance
appears in those profiles and can greatly help to disambiguate
information.
:param mode:
It is recommended to leave this argument to None, especially
when profiling ``scan`` (otherwise the results are not going to
be accurate). If you prefer the computations of one step of
``scan`` to be done differently then the entire function, you
can use this parameter to describe how the computations in this
loop are done (see ``theano.function`` for details about
possible values and their meaning).
:param profile:
Flag or string. If true, or different from the empty string, a
profile object will be created and attached to the inner graph of
scan. In case ``profile`` is True, the profile object will have the
name of the scan instance, otherwise it will have the passed string.
Profile object collect (and print) information only when running the
inner graph with the new cvm linker ( with default modes,
other linkers this argument is useless)
:rtype: tuple
:return: tuple of the form ``(outputs, updates)``; ``outputs`` is either a
Theano variable or a list of Theano variables representing the
outputs of ``scan`` (in the same order as in
``outputs_info``). ``updates`` is a subclass of dictionary
specifying the update rules for all shared variables used in scan.
This dictionary should be passed to ``theano.function`` when
you compile your function. The difference compared to a normal
dictionary is that we validate that the keys are SharedVariables,
and that the addition of those dictionaries is validated to be
consistent.
"""
# Note : see the internal documentation of the scan op for naming
# conventions and all other details
if options is None:
options = {}
rvals = scan_utils.canonical_arguments(sequences,
outputs_info,
non_sequences,
go_backwards,
n_steps)
inputs, states_and_outputs_info, parameters, T = rvals
# If we were provided a known number of steps (before compilation)
# and that number is 1 or -1, then we can skip the Scan Op
# and just apply the inner function once.
# To do that we check here the nature of n_steps
T_value = None
if isinstance(n_steps, (float, int)):
T_value = int(n_steps)
else:
try:
T_value = opt.get_constant_value(n_steps)
except (TypeError, AttributeError):
T_value = None
if T_value in (1, -1):
return one_step_scan(fn,
inputs,
states_and_outputs_info,
parameters,
truncate_gradient)
# 1. Variable representing the current time step
t = scalar_shared(numpy.int64(0), name='t')
# 2. Allocate memory for the states of scan.
mintaps = []
lengths = []
for pos, arg_info in enumerate(states_and_outputs_info):
if arg_info.get('taps', None) == [-1]:
mintaps.append(1)
lengths.append(scalar_shared(numpy.int64(0),
name='l%d' % pos))
arg_info['initial'] = scan_utils.expand(tensor.unbroadcast(
tensor.shape_padleft(arg_info['initial']), 0), T)
elif arg_info.get('taps', None):
if numpy.any(numpy.array(arg_info.get('taps', [])) > 0):
# Make sure we do not have requests for future values of a
# state; we cannot provide such values
raise ValueError('Can not use future taps of outputs',
arg_info)
mintap = abs(numpy.min(arg_info['taps']))
lengths.append(scalar_shared(numpy.int64(0),
name='l%d' % pos))
mintaps.append(mintap)
arg_info['initial'] = scan_utils.expand(
arg_info['initial'][:mintap], T)
else:
mintaps.append(0)
lengths.append(scalar_shared(numpy.int64(0),
name='l%d' % pos))
# 3. Generate arguments for the function passed to scan. This
# function will return the outputs that need to be computed at every
# time step
inputs_slices = [input[t] for input in inputs]
states_slices = []
for n, state in enumerate(states_and_outputs_info):
# Check if it is actually a state and not an output
if mintaps[n] != 0:
for k in state['taps']:
states_slices.append(
state['initial'][(t + mintaps[n] + k) % lengths[n]])
# 4. Construct outputs that are to be computed by the inner
# function of scan
args = inputs_slices + states_slices + parameters
cond, states_and_outputs, updates = \
scan_utils.get_updates_and_outputs(fn(*args))
# The user is allowed to provide no information if scan only behaves
# like a map
if (len(states_and_outputs) != len(states_and_outputs_info) and
len(states_and_outputs_info) == 0):
mintaps = [0] * len(states_and_outputs)
# 5. Construct the scan op
# 5.1 Construct list of shared variables with updates (those that
# can be treated as states (i.e. of TensorType) and those that can not
# (like Random States)
if cond is not None:
_cond = [cond]
else:
_cond = []
rvals = rebuild_collect_shared(
states_and_outputs + _cond,
updates=updates,
rebuild_strict=True,
copy_inputs_over=True,
no_default_updates=False)
# extracting the arguments
input_variables, cloned_outputs, other_rval = rvals
clone_d, update_d, update_expr, shared_inputs = other_rval
additional_input_states = []
additional_output_states = []
additional_lengths = []
additional_mintaps = []
original_numeric_shared_variables = []
non_numeric_input_states = []
non_numeric_output_states = []
original_non_numeric_shared_variables = []
pos = len(lengths)
for sv in shared_inputs:
if sv in update_d:
if isinstance(sv.type, TensorType):
# We can treat it as a sit sot
nw_state = scan_utils.expand(
tensor.unbroadcast(tensor.shape_padleft(sv), 0), T)
additional_lengths.append(scalar_shared(numpy.int64(0),
name='l%d' % pos))
pos = pos + 1
additional_mintaps.append(1)
additional_input_states.append(nw_state)
additional_output_states.append(
scan_utils.clone(tensor.set_subtensor(
nw_state[(t + 1) % additional_lengths[-1]],
update_d[sv])))
original_numeric_shared_variables.append(sv)
else:
non_numeric_input_states.append(sv)
non_numeric_output_states.append(update_d[sv])
original_non_numeric_shared_variables.append(sv)
# 5.2 Collect inputs/outputs of the inner function
inputs = []
outputs = []
for n, mintap in enumerate(mintaps):
if mintap != 0:
input_state = states_and_outputs_info[n]['initial']
inputs.append(input_state)
outputs.append(
tensor.set_subtensor(
input_state[(t + mintap) % lengths[n]],
states_and_outputs[n]))
else:
mem_buffer = scan_utils.allocate_memory(
T, states_and_outputs_info[n], states_and_outputs[n])
inputs.append(mem_buffer)
outputs.append(
tensor.set_subtensor(mem_buffer[t % lengths[n]],
states_and_outputs[n]))
inputs.extend(additional_input_states)
outputs.extend(additional_output_states)
lengths.extend(additional_lengths)
mintaps.extend(additional_mintaps)
inputs.extend(non_numeric_input_states)
outputs.extend(non_numeric_output_states)
all_other_inputs = gof.graph.inputs(outputs)
parameters = [x for x in all_other_inputs
if (x not in inputs and x not in lengths and x is not t
and isinstance(x, gof.Variable) and
not isinstance(x, gof.Constant))]
inputs.extend(parameters)
# 5.3 Construct the options dictionary
options['name'] = name
options['profile'] = profile
options['mode'] = mode
options['inplace'] = False
options['gpu'] = False
options['truncate_gradient'] = truncate_gradient
options['hash_inner_graph'] = 0
# 5.4 Construct the ScanOp instance
local_op = scan_op.ScanOp(inputs=inputs,
outputs=outputs,
lengths=lengths,
switches=[],
mintaps=mintaps,
index=t,
options=options,
as_repeatUntil=cond)
# Note that we get here all the outputs followed by the update rules to
# the shared variables we had in our scan
# we know that we have (in this given order):
# * len(states_and_outputs) real outputs
# * len(additional_input_states) updates for numeric shared variable
# * len(non_numeric_input_states) updates for non numeric shared
# variables
scan_inputs = [T] + inputs
scan_outputs_update_rules = scan_utils.to_list(local_op(*scan_inputs))
# 5.5 Collect outputs and add permutation object
scan_outputs = []
for pos in xrange(len(states_and_outputs)):
out = scan_utils.ScanPermutation(mintaps[pos])(
scan_outputs_update_rules[pos], t)
scan_outputs.append(out[mintaps[pos]:])
# 5.6 Construct updates dictionary
update_rules = scan_outputs_update_rules[len(states_and_outputs):]
updates = {}
for v, u in izip(original_numeric_shared_variables,
update_rules[:len(additional_input_states)]):
updates[v] = u[-1]
for v, u in izip(original_non_numeric_shared_variables,
update_rules[len(additional_input_states):]):
updates[v] = u
# Step 5.7 We are done and can return everything back to the user
return scan_outputs, updates
def one_step_scan(fn,
inputs,
states_and_outputs_info,
parameters,
truncate_gradient):
"""
This function is evaluated if `n_steps` evaluates to either 1 or -1.
"""
# 1. Grab slices of sequences
inputs_slices = [input[0] for input in inputs]
# 2. Grab slices of states
states_slices = []
for n, arg_info in enumerate(states_and_outputs_info):
if arg_info.get('taps', None) == [-1]:
states_slices.append(arg_info['initial'])
elif arg_info.get('taps', None):
if numpy.any(numpy.array(arg_info.get('taps', [])) > 0):
# Make sure we do not have requests for future values of a
# state; we cannot provide such values
raise ValueError('Can not use future taps of outputs',
arg_info)
# go through the taps
mintap = abs(numpy.min(arg_info['taps']))
for k in arg_info['taps']:
states_slices.append(arg_info['initial'][k + mintap])
# Re-order args
args = (inputs_slices + states_slices + parameters)
cond, states_and_outputs, updates = \
scan_utils.get_updates_and_outputs(fn(*args))
# We do not need to use the scan op anymore, so we can just return
# the outputs and updates we have
if cond is not None:
_logger.warning('When the number of steps is fixed and equal '
'to 1, the provided stopping condition, %s, '
'is ignored', str(cond))
states_and_outputs = [tensor.unbroadcast(
tensor.shape_padleft(arg), 0) for arg in states_and_outputs]
if len(states_and_outputs) == 1:
states_and_outputs = states_and_outputs[0]
return (states_and_outputs, updates)
"""
This module provides the Scan Op
See scan.py for details on scan
"""
__docformat__ = 'restructuredtext en'
__authors__ = ("Razvan Pascanu "
"Frederic Bastien "
"James Bergstra "
"Pascal Lamblin ")
__copyright__ = "(c) 2010, Universite de Montreal"
__contact__ = "Razvan Pascanu <r.pascanu@gmail>"
import itertools
import logging
import time
from itertools import izip
import numpy
import theano
from theano.compile import function, Param, Out
from theano import compile
from theano import gradient
from theano.gof.python25 import any
from theano.gof import PureOp, Apply
from theano import gof
from theano.tensor import TensorType
from theano import tensor
from theano.tensor.opt import Shape_i
#from theano.sandbox import cuda
from theano.compile.profiling import ScanProfileStats
import scan_utils
# Logging function for sending warning or info
_logger = logging.getLogger('theano.scan_module.scan_op')
class ScanOp(PureOp):
def __init__(self,
inputs,
outputs,
lengths,
switches,
mintaps,
index,
options,
as_repeatUntil):
self.inputs = inputs
self.outputs = outputs
self.index = index
self.switches = switches
self.lengths = lengths
self.mintaps = mintaps
self.as_repeatUntil = as_repeatUntil
self.options = options
self.name = options['name']
self.mode = options['mode']
self.inplace = options['inplace']
self.gpu = options['gpu']
self.profile = options['profile']
self.hash_inner_graph = options['hash_inner_graph']
# --Construct the destroy map--
self.destroy_map = {}
if self.inplace:
for idx in xrange(len(outputs)):
self.destroy_map[idx] = [idx + 1]
# --Decide on the default mode--
mode_instance = compile.mode.get_mode(self.mode)
# if the default mode is used, and that mode is ProfileMode
# then we need to copy the mode otherwise the time for a given
# op will be counted multiple times
if (self.mode is None and
isinstance(mode_instance, compile.profilemode.ProfileMode)):
mode_instance = compile.profilemode.ProfileMode(
optimizer=mode_instance.provided_optimizer,
linker=mode_instance.provided_linker)
compile.profilemode.prof_mode_instance_to_print.append(
mode_instance)
self.mode_instance = mode_instance
if self.name:
self.mode_instance.message = self.name + " sub profile"
else:
self.mode_instance.message = "Scan sub profile"
else:
self.mode_instance = mode_instance
# --Adding default name--
if not hasattr(self, 'name') or self.name is None:
self.name = 'scan_fn'
def make_node(self, *inputs):
# Checking if arguments are of the right type is done in the scan
# function
out_types = [out.type() for out in self.outputs]
return Apply(self, inputs, out_types)
def __eq__(self, other):
# Check if we are dealing with same type of objects
if not type(self) == type(other):
return False
if self.options != other.options:
return False
if self.mintaps != other.mintaps:
return False
# Check if the number of different types of arguments is the same
diff_args = ['inputs', 'outputs', 'lengths', 'mintaps', 'switches']
for arg in diff_args:
if len(getattr(self, arg)) != len(getattr(other, arg)):
return False
for x, y in izip(self.inputs, other.inputs):
if x.type != y.type:
return False
for x, y in izip(self.lengths, other.lengths):
if x.type != y.type:
return False
s_ins = [self.index] + self.inputs + self.lengths + self.switches
o_ins = [other.index] + other.inputs + other.lengths + other.switches
givens = dict(izip(s_ins, o_ins))
# This part might be slow
for x, y in izip(self.outputs, other.outputs):
if not gof.graph.is_same_graph(x, y, givens=givens):
return False
return True
def __str__(self):
if self.gpu:
gpu_str = 'gpu'
else:
gpu_str = 'cpu'
if self.as_repeatUntil is not None:
name = 'repeat/until'
else:
name = 'loop'
if self.inplace:
aux_txt = '%s{inplace,%s,%s}' % (name, gpu_str, str(self.name))
else:
aux_txt = '%s{%s,%s}' % (name, gpu_str, str(self.name))
return aux_txt
def __hash__(self):
rval = hash(type(self)) ^ self.hash_inner_graph
        for val in self.options.values():
            if isinstance(val, (list, tuple)):
                for el in val:
                    rval = rval ^ hash(el)
            else:
                rval = rval ^ hash(val)
return rval
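A dependency-free sketch of the xor-folding done by ``__hash__`` above (``combine`` is a hypothetical helper, not part of this module). Because xor is commutative, the resulting hash does not depend on the order in which the option values are visited, which keeps it consistent with an ``__eq__`` that compares the ``options`` dictionary as a whole:

```python
def combine(base, values):
    # xor-fold the hashes of `values` into a base hash;
    # xor is commutative, so visiting order does not matter
    rval = base
    for v in values:
        rval ^= hash(v)
    return rval

a = combine(17, ['inplace', True, 3])
b = combine(17, [3, 'inplace', True])
# a == b, regardless of the order of the values
```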
def infer_shape(self, node, input_shapes):
for inp, inp_shp in izip(node.inputs, input_shapes):
assert inp_shp is None or len(inp_shp) == inp.type.ndim
n_outs = len(self.outputs)
if self.as_repeatUntil is not None:
return [(Shape_i(0)(o),) + x[1:] for o, x
in izip(node.outputs, input_shapes[1: n_outs + 1])]
else:
return input_shapes[1: n_outs + 1]
def make_thunk(self, node, storage_map, compute_map, no_recycling):
"""
:param node: the Apply node returned by the ``make_node`` function
of the scan op class
:param storage_map: dict variable -> one-element-list where a computed
value for this variable may be found.
:param compute_map: dict variable -> one-element-list where a boolean
value will be found. The boolean indicates whether the
variable's storage_map container contains a valid value (True)
or if it has not been computed yet (False).
:param no_recycling: list of variables for which it is forbidden to
reuse memory allocated by a previous call.
:note: If the thunk consults the storage_map on every call, it is safe
for it to ignore the no_recycling argument, because elements of the
no_recycling list will have a value of None in the storage map. If
the thunk can potentially cache return values (like CLinker does),
then it must not do so for variables in the no_recycling list.
"""
# 1. Collect all memory buffers
node_input_storage = [storage_map[r] for r in node.inputs]
node_output_storage = [storage_map[r] for r in node.outputs]
node_input_compute = [compute_map[r] for r in node.inputs]
node_output_compute = [compute_map[r] for r in node.outputs]
# 2. Construct fake shared variables around every argument of scan
givens = {}
base_inputs = self.inputs[:len(self.outputs)]
base_buffers = node_input_storage[1: 1 + len(base_inputs)]
aux_inputs = self.inputs[len(self.outputs):]
aux_membuffers = node_input_storage[1 + len(base_inputs):]
# 2.1 First the auxiliary arguments, those that are parameters or
# input
def fake_shared(var):
val = 0
for dim in xrange(var.ndim):
val = [val]
val = numpy.asarray(val, dtype=var.dtype)
return theano.shared(val, name=var.name)
non_tensor_args = []
non_tensor_buffers = []
aux_buffers = []
for mem_buf, var in izip(aux_membuffers, aux_inputs):
if mem_buf[0] is not None:
givens[var] = theano.shared(mem_buf[0], name=var.name,
borrow=True)
            elif isinstance(var.type, TensorType):
givens[var] = fake_shared(var)
aux_buffers.append((givens[var], mem_buf))
else:
givens[var] = var.type()
non_tensor_args.append(givens[var])
non_tensor_buffers.append(mem_buf)
# 2.2. Next the states (numeric) and the outputs
updates = {}
state_buffers = []
n_numeric_values = len(self.lengths)
for pos in xrange(n_numeric_values):
var = base_inputs[pos]
mem_buf = base_buffers[pos]
expr = self.outputs[pos]
givens[var] = fake_shared(var)
state_buffers.append((givens[var], self.lengths[pos], mem_buf))
updates[givens[var]] = expr
#2.3 Non-numeric states
n_non_numeric = len(self.outputs) - n_numeric_values
fn_outs = self.outputs[n_numeric_values:]
for var in base_inputs[n_numeric_values:]:
givens[var] = var.type()
non_tensor_args.append(givens[var])
non_numeric_states_bufs = base_buffers[n_numeric_values:]
# 2.4 Add the update for the index of scan
updates[self.index] = self.index + numpy.int64(1)
# 3.1 Construct the inner function of scan
if self.as_repeatUntil is not None:
fn_outs = self.as_repeatUntil
self.fn = theano.function(non_tensor_args, fn_outs,
givens=givens,
updates=updates,
mode=self.mode_instance,
name=self.name,
profile=self.profile)
# 3.2 Construct the perform
if self.as_repeatUntil is not None:
# 3.2.1 as a repeat until
def p(node, args, outs):
pos = 0
cont = 1
# copy inputs if not inplace
if not self.inplace:
for _, _, val in state_buffers:
val[0] = val[0].copy()
for buf in non_numeric_states_bufs:
buf[0] = buf[0].copy()
# reset all switches if any
for sw in self.switches:
sw.set_value(numpy.int8(0), borrow=True)
# set aux shared variables
for var, val in aux_buffers:
var.set_value(val[0], borrow=True)
# set state shared variables
for var, length, val in state_buffers:
var.set_value(val[0], borrow=True)
length.set_value(val[0].shape[0], borrow=True)
# grab fixed arguments
fix_args = [x[0] for x in non_tensor_buffers]
while cont and pos < node_input_storage[0][0]:
extra_args = [x[0] for x in non_numeric_states_bufs]
rvals = self.fn(*(fix_args + extra_args))
for buf, rval in izip(non_numeric_states_bufs, rvals):
buf[0] = rval
cont = rvals[-1]
pos = pos + 1
                # We need to trim the outputs if they are longer than the
                # number of steps actually executed
                n_iters = pos
                for idx in xrange(n_numeric_values):
                    buf = state_buffers[idx][2][0]
                    mintap = self.mintaps[idx]
                    if buf.shape[0] > n_iters + mintap:
                        node_output_storage[idx][0] = buf[:n_iters + mintap]
                    else:
                        node_output_storage[idx][0] = buf
for out_buf, in_buf in izip(
node_output_storage[n_numeric_values:],
non_numeric_states_bufs):
out_buf[0] = in_buf[0]
else:
# 3.2.2 as a for
def p(node, args, outs):
# copy inputs if not inplace
if not self.inplace:
for _, _, val in state_buffers:
val[0] = val[0].copy()
for buf in non_numeric_states_bufs:
buf[0] = buf[0].copy()
# reset all switches if any
for sw in self.switches:
sw.set_value(numpy.int8(0), borrow=True)
# set aux shared variables
for var, val in aux_buffers:
var.set_value(val[0], borrow=True)
# set state shared variables
for var, length, val in state_buffers:
var.set_value(val[0], borrow=True)
length.set_value(val[0].shape[0], borrow=True)
# grab fixed arguments
fix_args = [x[0] for x in non_tensor_buffers]
for dx in xrange(node_input_storage[0][0]):
extra_args = [x[0] for x in non_numeric_states_bufs]
rvals = self.fn(*(fix_args + extra_args))
for buf, rval in izip(non_numeric_states_bufs, rvals):
buf[0] = rval
for pos in xrange(n_numeric_values):
buf = state_buffers[pos][2][0]
mintap = self.mintaps[pos]
node_output_storage[pos][0] = buf
for out_buf, in_buf in izip(
node_output_storage[n_numeric_values:],
non_numeric_states_bufs):
out_buf[0] = in_buf[0]
# 3.3 construct the rval function
def rval(p=p, i=node_input_storage, o=node_output_storage, n=node):
r = p(n, [x[0] for x in i], o)
for out in node.outputs:
compute_map[out][0] = True
return r
rval.inputs = node_input_storage
rval.outputs = node_output_storage
rval.perform = p
rval.lazy = False
return rval
def grad(self, args, g_outs):
pass
def R_op(self, inputs, eval_points):
pass
@theano.compile.profilemode.register_profiler_printer
def profile_printer(fct_name, compile_time, fct_call_time, fct_call,
apply_time, apply_cimpl, message, outputs_size,
other_time):
# Scan overhead profile
if any([isinstance(node.op, Scan) and v > 0 for (_, node), v in
apply_time.items()]):
print
print 'Scan overhead:'
print ('<Scan op time(s)> <sub scan fct time(s)> <sub scan op '
'time(s)> <sub scan fct time(% scan op time)> <sub scan '
'op time(% scan op time)> <node>')
total_super_scan_time = 0
total_scan_fct_time = 0
total_scan_op_time = 0
for (_, node), v in apply_time.items():
if isinstance(node.op, Scan):
if v > 0:
scan_fct_time = node.op.mode_instance.fn_time
scan_op_time = node.op.mode_instance.local_time
total_super_scan_time += v
total_scan_fct_time += scan_fct_time
total_scan_op_time += scan_op_time
print ' %5.1fs %5.1fs %5.1fs %5.1f%% %5.1f%%' % (
v, scan_fct_time, scan_op_time,
scan_fct_time / v * 100, scan_op_time / v * 100), node
else:
print (' The node took 0s, so we can not compute the '
'overhead'), node
print ' total %5.1fs %5.1fs %5.1fs %5.1f%% %5.1f%%' % (
total_super_scan_time, total_scan_fct_time, total_scan_op_time,
total_scan_fct_time / total_super_scan_time * 100,
total_scan_op_time / total_super_scan_time * 100)
"""
This module provides utility functions for the Scan Op
See scan.py for details on scan
"""
__docformat__ = 'restructuredtext en'
__authors__ = ("Razvan Pascanu "
"Frederic Bastien "
"James Bergstra "
"Pascal Lamblin "
"Arnaud Bergeron")
__copyright__ = "(c) 2010, Universite de Montreal"
__contact__ = "Razvan Pascanu <r.pascanu@gmail>"
import copy
import logging
from itertools import izip
import numpy
import theano
from theano.compile.pfunc import rebuild_collect_shared
from theano import gof
from theano import tensor, scalar
from theano.gof.python25 import all
from theano.tensor.basic import get_constant_value
# Logging function for sending warning or info
_logger = logging.getLogger('theano.scan_utils')
def expand(tensor_var, size):
    """
    Given ``tensor_var``, a Theano tensor of shape (d1, d2, ...), this
    function returns a Theano tensor of shape (d1 + size, d2, ...)
    filled with 0s, except for the first d1 entries, which are taken from
    ``tensor_var``, namely:
        rval[:d1] = tensor_var
    :param tensor_var: Theano tensor variable
    :param size: int
    """
# Corner case that I might use in an optimization
if size == 0:
return tensor_var
shapes = [tensor_var.shape[x] for x in xrange(tensor_var.ndim)]
zeros_shape = [size + shapes[0]] + shapes[1:]
empty = tensor.zeros(zeros_shape,
dtype=tensor_var.dtype)
return tensor.set_subtensor(empty[:shapes[0]], tensor_var)
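A numpy sketch of what ``expand`` computes, assuming ``tensor_var`` were a concrete array rather than a symbolic one (``expand_np`` is a hypothetical analogue, not part of this module):

```python
import numpy

def expand_np(arr, size):
    # numpy analogue of expand: append `size` rows of zeros along axis 0
    if size == 0:
        return arr
    out = numpy.zeros((arr.shape[0] + size,) + arr.shape[1:],
                      dtype=arr.dtype)
    out[:arr.shape[0]] = arr
    return out

x = numpy.arange(6.).reshape(3, 2)
y = expand_np(x, 2)
# y.shape == (5, 2); y[:3] equals x and y[3:] is all zeros
```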
def to_list(ls):
"""
Converts ``ls`` to list if it is a tuple, or wraps ``ls`` into a list if
it is not a list already
"""
if isinstance(ls, (list, tuple)):
return list(ls)
else:
return [ls]
class until(object):
    """
    A scan loop can end early on a condition. In order to differentiate
    this condition from the other outputs of scan, this class is used to
    wrap the condition.
    """
    def __init__(self, condition):
        self.condition = tensor.as_tensor_variable(condition)
        assert self.condition.ndim == 0
def get_updates_and_outputs(ls):
"""
Parses the list ``ls`` into outputs and updates. The semantics
of ``ls`` is defined by the constructive function of scan.
    The elements of ``ls`` are either a list of expressions representing the
    outputs/states, a dictionary of updates, or a condition.
"""
def is_list_outputs(elem):
if (isinstance(elem, (list, tuple)) and
all([isinstance(x, theano.Variable) for x in elem])):
return True
if isinstance(elem, theano.Variable):
return True
return False
def is_updates(elem):
if isinstance(elem, dict):
return True
# Dictionaries can be given as lists of tuples
if (isinstance(elem, (list, tuple)) and
all([isinstance(x, (list, tuple)) and len(x) == 2
for x in elem])):
return True
return False
def is_condition(elem):
return isinstance(elem, until)
if is_list_outputs(ls):
return None, to_list(ls), {}
if is_updates(ls):
return None, [], dict(ls)
if not isinstance(ls, (list, tuple)):
raise ValueError(('Scan can not parse the return value'
' of your constructive function given to scan'))
ls = list(ls)
    deprecation_msg = ('The return value of the lambda function'
                       ' has been restricted. You have to always return first'
                       ' the outputs (if any), afterwards the updates (if any)'
                       ' and at the end the condition')
    error_msg = ('Scan can not parse the return value of your constructive '
                 'function given to scan')
    if len(ls) == 2:
        if is_list_outputs(ls[0]):
            if is_updates(ls[1]):
                return (None, to_list(ls[0]), dict(ls[1]))
            elif is_condition(ls[1]):
                return (ls[1].condition, to_list(ls[0]), {})
            else:
                raise ValueError(error_msg)
        elif is_updates(ls[0]):
            if is_list_outputs(ls[1]):
                raise ValueError(deprecation_msg)
            elif is_condition(ls[1]):
                return (ls[1].condition, [], dict(ls[0]))
            else:
                raise ValueError(error_msg)
        else:
            raise ValueError(error_msg)
    elif len(ls) == 3:
        if is_list_outputs(ls[0]):
            if is_updates(ls[1]):
                if is_condition(ls[2]):
                    return (ls[2].condition, to_list(ls[0]), dict(ls[1]))
                else:
                    raise ValueError(error_msg)
            else:
                raise ValueError(error_msg)
        else:
            raise ValueError(error_msg)
    else:
        raise ValueError(error_msg)
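The accepted return shapes can be summarized with a small dependency-free sketch; ``Var``, ``Until``, and ``parse`` below are hypothetical stand-ins for ``theano.Variable``, the ``until`` wrapper, and a simplified ``get_updates_and_outputs``, covering only the outputs-alone, updates-alone, and (outputs, updates-or-condition) forms:

```python
class Var(object):
    """Stand-in for a symbolic variable."""

class Until(object):
    """Stand-in for the `until` condition wrapper."""
    def __init__(self, condition):
        self.condition = condition

def parse(ret):
    # returns (condition, outputs, updates), mirroring the contract above
    def is_outputs(e):
        return isinstance(e, Var) or (isinstance(e, (list, tuple)) and
                                      all(isinstance(x, Var) for x in e))
    if is_outputs(ret):
        outs = list(ret) if isinstance(ret, (list, tuple)) else [ret]
        return None, outs, {}
    if isinstance(ret, dict):
        return None, [], dict(ret)
    ret = list(ret)
    if len(ret) == 2 and is_outputs(ret[0]):
        if isinstance(ret[1], dict):
            return None, list(ret[0]), dict(ret[1])
        if isinstance(ret[1], Until):
            return ret[1].condition, list(ret[0]), {}
    raise ValueError('can not parse the return value')

o = Var()
cond, outs, upds = parse(([o], Until('stop')))
# cond == 'stop', outs == [o], upds == {}
```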
def clone(output, replace=None, strict=True, copy_inputs=True):
"""
Function that allows replacing subgraphs of a computational
graph. It returns a copy of the initial subgraph with the corresponding
substitutions.
    :type output: Theano variable (or list of Theano variables)
    :param output: Theano expression that represents the computational
        graph
    :type replace: dict
    :param replace: dictionary describing which subgraphs should be
        replaced by what
"""
inps, outs, other_stuff = rebuild_collect_shared(output,
[],
replace,
[],
strict,
copy_inputs)
return outs
def canonical_arguments(sequences,
outputs_info,
non_sequences,
go_backwards,
n_steps):
"""
This re-writes the arguments obtained from scan into a more friendly
form for the scan_op.
    Mainly it makes sure that arguments are given as lists of dictionaries,
    and that the different fields of a dictionary are set to a default
    value if the user has not provided any.
"""
states_info = to_list(outputs_info)
parameters = [tensor.as_tensor_variable(x) for x in to_list(non_sequences)]
inputs = []
if n_steps is not None:
negative_n_steps = tensor.lt(tensor.as_tensor_variable(n_steps), 0)
for input in to_list(sequences):
if not isinstance(input, dict):
nw_input = tensor.as_tensor_variable(input)
if go_backwards:
nw_input = nw_input[::-1]
if n_steps is not None:
nw_input = tensor.switch(negative_n_steps, nw_input[::-1],
nw_input)
inputs.append(tensor.as_tensor_variable(nw_input))
elif input.get('taps', True) is None:
nw_input = tensor.as_tensor_variable(input['input'])
if go_backwards:
nw_input = nw_input[::-1]
if n_steps is not None:
nw_input = tensor.switch(negative_n_steps, nw_input[::-1],
nw_input)
inputs.append(nw_input)
elif input.get('taps', None):
mintap = numpy.min(input['taps'])
maxtap = numpy.max(input['taps'])
orig_input = tensor.as_tensor_variable(input['input'])
if go_backwards:
orig_input = orig_input[::-1]
if n_steps is not None:
orig_input = tensor.switch(negative_n_steps, orig_input[::-1],
orig_input)
for k in input['taps']:
                # We cut the sequence such that seq[i] corresponds to
                # seq[i - k]
if maxtap < 0:
offset = abs(maxtap)
else:
offset = 0
nw_input = orig_input
if maxtap == mintap and maxtap != 0:
nw_input = nw_input[:abs(maxtap)]
elif maxtap - k != 0:
nw_input = nw_input[offset + k - mintap:\
-(maxtap - k)]
else:
nw_input = nw_input[offset + k - mintap:]
inputs.append(nw_input)
else:
raise ValueError('Provided sequence makes no sense', str(input))
    # Since we've added all sequences, we now need to level them up based
    # on n_steps or their different shapes
if n_steps is None:
if len(inputs) == 0:
# No information about the number of steps
            raise ValueError('You need to provide either at least '
                             'one sequence over which scan should loop '
                             'or a number of steps for scan to loop. '
                             'Neither of the two has been provided!')
T = inputs[0].shape[0]
for input in inputs[1:]:
T = tensor.minimum(T, input.shape[0])
else:
T = abs(tensor.as_tensor(n_steps))
# Level up sequences
inputs = [input[:T] for input in inputs]
# wrap outputs info in a dictionary if they are not already in one
for i, state in enumerate(states_info):
if state is not None and not isinstance(state, dict):
states_info[i] = dict(initial=tensor.as_tensor_variable(state),
taps=[-1])
elif isinstance(state, dict):
if not state.get('initial', None) and state.get('taps', None):
                raise ValueError(('If you are using slices of an output '
                                  'you need to provide an initial state '
                                  'for it'), state)
elif state.get('initial', None) and not state.get('taps', None):
# initial state but taps not provided
if 'taps' in state:
# explicitly provided a None for taps
_logger.warning(
                        ('Output %s (index %d) has an initial '
                         'state but taps is explicitly set to None'),
getattr(states_info[i]['initial'], 'name', 'None'), i)
states_info[i]['taps'] = [-1]
states_info[i]['initial'] = \
tensor.as_tensor_variable(state['initial'])
elif state.get('initial', None):
states_info[i]['initial'] = \
tensor.as_tensor_variable(state['initial'])
else:
# if a None is provided as the output info we replace it
# with an empty dict() to simplify handling
states_info[i] = dict()
return inputs, states_info, parameters, T
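The tap-slicing rule applied to sequences above can be checked numerically: the slice for tap ``k`` starts at ``k - mintap``, so at inner step ``t`` the tap ``k`` view reads element ``t + k - mintap`` of the original sequence. A numpy sketch for taps ``[-2, 0]`` (so no extra offset from a positive ``maxtap``, an assumption of this example):

```python
import numpy

seq = numpy.arange(10)
taps = [-2, 0]
mintap, maxtap = min(taps), max(taps)
views = {}
for k in taps:
    start = k - mintap
    # trim the tail so that all tap views have the same length
    stop = -(maxtap - k) if maxtap - k != 0 else None
    views[k] = seq[start:stop]

# at step t, tap -2 sees seq[t] while tap 0 sees seq[t + 2]:
# views[-2] is [0..7] and views[0] is [2..9]
```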
def infer_shape(outs, inputs, input_shapes):
'''
Compute the shape of the outputs given the shape of the inputs
of a theano graph.
We do it this way to avoid compiling the inner function just to get
the shape. Changes to ShapeFeature could require changes in this function.
'''
# We use a ShapeFeature because it has all the necessary logic
# inside. We don't use the full ShapeFeature interface, but we
# let it initialize itself with an empty env, otherwise we will
# need to do it manually
    for inp, inp_shp in izip(inputs, input_shapes):
        assert inp_shp is None or len(inp_shp) == inp.ndim
shape_feature = tensor.opt.ShapeFeature()
shape_feature.on_attach(theano.gof.Env([], []))
# Initialize shape_of with the input shapes
for inp, inp_shp in izip(inputs, input_shapes):
shape_feature.set_shape(inp, inp_shp)
def local_traverse(out):
'''
Go back in the graph, from out, adding computable shapes to shape_of.
'''
if out in shape_feature.shape_of:
# Its shape is already known
return
elif out.owner is None:
# This is an input of the graph
shape_feature.init_r(out)
else:
# Recurse over inputs
for inp in out.owner.inputs:
if not inp in shape_feature.shape_of:
local_traverse(inp)
# shape_feature.on_import does not actually use an env
# It will call infer_shape and set_shape appropriately
dummy_env = None
shape_feature.on_import(dummy_env, out.owner)
ret = []
for o in outs:
local_traverse(o)
ret.append(shape_feature.shape_of[o])
return ret
def allocate_memory(T, y_info, y):
"""
Allocates memory for an output of scan.
:param T: scalar
Variable representing the number of steps scan will run
    :param y_info: dict
        Dictionary describing the output (more specifically describing
        shape information for the output)
:param y: Tensor variable
        Expression describing the computation resulting in one entry of y.
        It can be used to infer the shape of y
"""
if 'shape' in y_info:
return tensor.zeros([T, ] + list(y_info['shape']),
dtype=y.dtype)
else:
inputs = gof.graph.inputs([y])
ins_shapes = []
for inp in inputs:
in_shape = [inp.shape[k] for k in xrange(inp.ndim)]
ins_shapes.append(in_shape)
shape = infer_shape([y], inputs, ins_shapes)[0]
return tensor.zeros([T, ] + shape, dtype=y.dtype)
class ScanPermutation(gof.Op):
def __init__(self, mintap=0, inplace=False):
self.inplace = inplace
self.mintap = mintap
if inplace:
self.destroy_map = {0: [0]}
    def __eq__(self, other):
        return (type(self) == type(other) and
                self.inplace == other.inplace and
                self.mintap == other.mintap)
    def __hash__(self):
        return hash(type(self)) ^ hash(self.inplace) ^ hash(self.mintap)
def __str__(self):
if self.inplace:
return "scan_permutation{inplace}"
else:
return "scan_permutation"
def make_node(self, membuffer, index):
# index has to be a scalar
assert index.ndim == 0
        # we need at least one dimension
assert membuffer.ndim > 0
return gof.Apply(self, [membuffer, index], [membuffer.type()])
def perform(self, node, inputs, outputs):
membuffer = inputs[0]
index = inputs[1] + self.mintap
out = outputs[0]
if index % membuffer.shape[0] == 0:
if self.inplace:
out[0] = membuffer
else:
out[0] = membuffer.copy()
        else:
            pos = index % membuffer.shape[0]
            if out[0] is membuffer:
                membuffer = membuffer.copy()
            out[0][:membuffer.shape[0] - pos] = membuffer[pos:]
            out[0][membuffer.shape[0] - pos:] = membuffer[:pos]
def R_op(self, inputs, eval_points):
if eval_points[0] is None:
return [None]
return self.make_node(eval_points[0], inputs[1]).outputs
    def grad(self, inputs, grads):
        pos = inputs[0].shape[0] - (inputs[1] % inputs[0].shape[0])
        return [self.make_node(grads[0], pos).outputs[0], None]
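The rotation that ``ScanPermutation.perform`` applies to the circular state buffer can be sketched in plain numpy (``rotate`` is a hypothetical helper mirroring the non-inplace path):

```python
import numpy

def rotate(membuffer, index, mintap=0):
    # bring entry `index + mintap` of the circular buffer to the front
    pos = (index + mintap) % membuffer.shape[0]
    if pos == 0:
        return membuffer.copy()
    out = numpy.empty_like(membuffer)
    out[:membuffer.shape[0] - pos] = membuffer[pos:]
    out[membuffer.shape[0] - pos:] = membuffer[:pos]
    return out

buf = numpy.arange(5)
# rotate(buf, 2) gives [2, 3, 4, 0, 1]
```

Keeping the rotation out-of-place avoids clobbering the buffer when the output storage aliases the input, which is why the op copies ``membuffer`` before writing in that case.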
import os
import shutil
from tempfile import mkdtemp
import time
import unittest
import cPickle
import numpy
from numpy.testing import dec
import theano
import theano.sandbox.rng_mrg
from theano import tensor
from theano.compile.pfunc import rebuild_collect_shared
from theano.gof.python25 import any
from theano.tests import unittest_tools as utt
from numpy.testing.noseclasses import KnownFailureTest
from test_utils import *
import theano.sandbox.scan_module as scan_module
class TestScan(unittest.TestCase):
def setUp(self):
utt.seed_rng()
def new_run(self,
inputs_info,
states_info,
parameters_info,
n_outputs,
n_shared_updates):
"""Generates a test for scan.
:param inputs_info: list of lists of dictionaries
Each list of dictionary represents one input sequence. Each
dictionary is one tap of that sequence. The dictionary has two
keys. ``use`` is either True or False, and it indicates if this
tap should be used in the inner graph or not. ``tap`` is the tap
value.
:param states_info: list of lists of dictionaries
see param ``inputs_info``. ``states_info`` has the same
semantics, just that it is for states and not for inputs
        :param parameters_info: list of dictionaries
Each dictionary is a different parameter. It has only one key,
namely ``use`` which says if the parameter should be used
internally or not
:param n_outputs: int
Number of pure outputs for scan
:param n_shared_updates: int
            Number of shared variables with updates. They are all numeric.
"""
rng = numpy.random.RandomState(utt.fetch_seed())
n_ins = len(inputs_info)
inputs = [tensor.matrix('u%d' % k) for k in xrange(n_ins)]
scan_inputs = []
for inp, info in zip(inputs, inputs_info):
scan_inputs.append(dict(input=inp, taps=[x['tap'] for x in
info]))
        n_states = len(states_info)
        scan_states = []
        states = []
        for k, info in enumerate(states_info):
            if len(info) == 1 and info[0]['tap'] == -1:
                state = tensor.vector('x%d' % k)
                states.append(state)
                scan_states.append(state)
            else:
                state = tensor.matrix('x%d' % k)
                states.append(state)
                scan_states.append(
                    dict(initial=state, taps=[x['tap'] for x in info]))
n_parameters = len(parameters_info)
parameters = [tensor.vector('p%d' % k) for k in xrange(n_parameters)]
original_shared_values = []
shared_vars = []
for k in xrange(n_shared_updates):
data = rng.uniform(size=(4,)).astype(theano.config.floatX)
original_shared_values.append(data)
shared_vars.append(theano.shared(data, name='z%d' % k))
def inner_function(*args):
"""
            Function that constructs the inner graph of scan
"""
arg_pos = 0
to_add = None
for in_info in inputs_info:
for info in in_info:
arg = args[arg_pos]
arg_pos += 1
# Construct dummy graph around input
if info['use']:
if to_add is None:
to_add = arg * 2
else:
to_add = to_add + arg * 2
states_out = [to_add] * n_states
for dx, st_info in enumerate(states_info):
for info in st_info:
                    arg = args[arg_pos]
                    arg_pos += 1
if info['use']:
states_out[dx] = states_out[dx] + arg * 3
            for info in parameters_info:
arg = args[arg_pos]
arg_pos += 1
if info['use']:
if to_add is None:
to_add = arg * 4
else:
to_add = to_add + arg * 4
shared_outs = [sh * 5 + to_add for sh in shared_vars]
states_out = [x + to_add for x in states_out]
            pure_outs = [to_add ** 2 for x in xrange(n_outputs)]
return states_out + pure_outs, dict(zip(shared_vars,
shared_outs))
def execute_inner_graph(*args):
"""
            Function that numerically computes the values that scan
            should return
"""
# Check if you need to go back in time over the sequences (the
# first argument is n_steps, the second is go_backwards)
            n_steps = args[0]
            if n_steps < 0 or args[1]:
                new_ins = [x[::-1] for x in args[2: 2 + n_ins]]
                n_steps = abs(n_steps)
            else:
                new_ins = list(args[2: 2 + n_ins])
# Simplify the inputs by slicing them according to the taps
nw_inputs = []
for inp, info in zip(new_ins, inputs_info):
taps = [x['tap'] for x in info]
nw_inputs += [inp[abs(numpy.min(taps)) + k:] for k in taps]
# Simplify the states by slicing them according to the taps.
# Note that if the memory buffer for the inputs and outputs is
# the same, by changing the outputs we also change the outputs
nw_states_inputs = []
nw_states_outs = []
for st, info in zip(args[2 + n_ins:2 + n_ins + n_states],
states_info):
taps = [x['tap'] for x in info]
membuf = numpy.zeros((n_steps + numpy.max(abs(taps)), 4))
membuf[:numpy.max(abs(taps))] = st[:numpy.max(abs(taps))]
nw_states_inputs += [membuf[numpy.max(abs(taps)) + k:]
for k in taps]
nw_states_outs.append(membuf[numpy.max(abs(taps)):])
paramters = args[2 + n_ins + n_states:]
            out_mem_buffers = [numpy.zeros((n_steps, 4))
                               for k in xrange(n_outputs)]
shared_values = [x.copy() for x in original_shared_values]
for step in xrange(n_steps):
arg_pos = 0
to_add = None
for in_info in inputs_info:
for info in in_info:
arg = nw_inputs[arg_pos][step]
arg_pos += 1
# Construct dummy graph around input
if info['use']:
if to_add is None:
to_add = arg * 2
else:
to_add = to_add + arg * 2
states_out = [to_add] * n_states
arg_pos = 0
for dx, st_info in enumerate(states_info):
nw_states_outs[dx][step] = to_add
for info in st_info:
arg = nw_states_inputs[arg_pos][step]
arg_pos += 1
if info['use']:
nw_states_outs[dx][step] += arg * 3
                for arg, info in zip(paramters, parameters_info):
if info['use']:
if to_add is None:
to_add = arg * 4
else:
to_add = to_add + arg * 4
shared_values = [sh * 5 + to_add for sh in shared_values]
for state in nw_states_outs:
state[step] += to_add
for out in out_mem_buffers:
out[step] = to_add ** 2
return nw_states_outs + out_mem_buffers, shared_values
for n_steps in [-1, 1, 5, -5, None]:
for go_backwards in [True, False]:
outputs, updates = scan_module.scan(
inner_function,
sequences=scan_inputs,
outputs_info=scan_states,
non_sequences=parameters,
n_steps=n_steps,
go_backwards=go_backwards,
truncate_gradient=-1)
my_f = theano.function(inputs + states + parameters,
outputs,
updates=updates,
allow_input_downcast=True)
if n_steps is not None and abs(n_steps) == 1:
assert len([x for x in my_f.maker.env.toposort()
if isinstance(x.op, scan_module.scan_op.ScanOp)]) == 0
# Generating data
# Scenario 1 : Good fit shapes
inputs_values = []
for info in inputs_info:
                    taps = [x['tap'] for x in info]
                    offset = abs(numpy.min([0] + [x for x in taps if x < 0]))
                    offset += numpy.max([0] + [x for x in taps if x > 0])
data = rng.uniform(size=(n_steps + offset, 4))
inputs_values.append(data)
state_values = []
for info in states_info:
taps = [x['tap'] for x in info]
offset = abs(numpy.min(taps))
data = rng.uniform(size=(offset, 4))
state_values.append(data)
param_values = [rng.uniform(size=(4,)) for k in
xrange(n_parameters)]
for var, val in zip(shared_vars, original_shared_values):
var.set_value(val)
theano_outs = my_f(*(inputs_values + state_values +
param_values))
args = ([n_steps, go_backwards] +
                        inputs_values +
state_values +
param_values)
rvals = execute_inner_graph(*args)
numpy_outs, numpy_shared = rvals
assert len(numpy_outs) == len(theano_outs)
assert len(numpy_shared) == len(shared_vars)
for th_out, num_out in zip(theano_outs, numpy_outs):
assert numpy.allclose(th_out, num_out)
                for th_out, num_out in zip(shared_vars, numpy_shared):
assert numpy.allclose(th_out.get_value(), num_out)
# Scenario 2 : Loose fit (sequences longer then required)
inputs_values = []
for pos, info in enumerate(inputs_info):
                    taps = [x['tap'] for x in info]
                    offset = abs(numpy.min([0] + [x for x in taps if x < 0]))
                    offset += numpy.max([0] + [x for x in taps if x > 0])
data = rng.uniform(size=(n_steps + offset + pos + 1, 4))
inputs_values.append(data)
state_values = []
for pos, info in enumerate(states_info):
taps = [x['tap'] for x in info]
offset = abs(numpy.min(taps))
data = rng.uniform(size=(offset + pos + 1, 4))
state_values.append(data)
param_values = [rng.uniform(size=(4,)) for k in
xrange(n_parameters)]
for var, val in zip(shared_vars, original_shared_values):
var.set_value(val)
theano_outs = my_f(*(inputs_values + state_values +
param_values))
args = ([n_steps, go_backwards] +
                        inputs_values +
state_values +
param_values)
rvals = execute_inner_graph(*args)
numpy_outs, numpy_shared = rvals
assert len(numpy_outs) == len(theano_outs)
assert len(numpy_shared) == len(shared_vars)
for th_out, num_out in zip(theano_outs, numpy_outs):
assert numpy.allclose(th_out, num_out)
                for th_out, num_out in zip(shared_vars, numpy_shared):
assert numpy.allclose(th_out.get_value(), num_out)
# Scenario 3 : Less data then required
inputs_values = []
for pos, info in enumerate(inputs_info):
                    taps = [x['tap'] for x in info]
                    offset = abs(numpy.min([0] + [x for x in taps if x < 0]))
                    offset += numpy.max([0] + [x for x in taps if x > 0])
data = rng.uniform(size=(n_steps + offset - 1, 4))
inputs_values.append(data)
state_values = []
for pos, info in enumerate(states_info):
taps = [x['tap'] for x in info]
offset = abs(numpy.min(taps))
data = rng.uniform(size=(offset - 1, 4))
state_values.append(data)
param_values = [rng.uniform(size=(4,)) for k in
xrange(n_parameters)]
for var, val in zip(shared_vars, original_shared_values):
var.set_value(val)
                self.assertRaises(Exception, my_f,
                                  *(inputs_values + state_values +
                                    param_values))
def test000_generate_tests(self):
rng = numpy.random.RandomState(utt.fetch_seed())
all_inputs_info = [[]]
possible_taps_use_pairs = [[dict(tap=0, use=True)],
[dict(tap=0, use=False)],
[dict(tap=-3, use=True),
dict(tap=-1, use=True)],
[dict(tap=-3, use=True),
dict(tap=-1, use=False)],
[dict(tap=-3, use=False),
dict(tap=-1, use=False)],
[dict(tap=-2, use=True),
dict(tap=0, use=True)],
[dict(tap=-2, use=False),
dict(tap=0, use=True)],
[dict(tap=-2, use=False),
dict(tap=0, use=False)],
[dict(tap=0, use=True),
dict(tap=3, use=True)],
[dict(tap=2, use=True),
dict(tap=3, use=True)],
[dict(tap=-2, use=True),
dict(tap=3, use=True)]]
for n_ins in [1,2]:
# Randomly pick up 4*n_ins combinations of arguments
for k in xrange(4*n_ins):
inp = []
for inp_nb in xrange(n_ins):
pos = rng.randint(len(possible_taps_use_pairs))
inp.append(possible_taps_use_pairs[pos])
all_inputs_info.append(inp)
all_states_info = [[]]
possible_taps_use_pairs = [[dict(tap=-1, use=True)],
[dict(tap=-1, use=False)],
[dict(tap=-3, use=True)],
[dict(tap=-3, use=False)],
[dict(tap=-3, use=True),
dict(tap=-1, use=True)],
[dict(tap=-3, use=True),
dict(tap=-1, use=False)],
[dict(tap=-3, use=False),
dict(tap=-1, use=False)],
[dict(tap=-4, use=True),
dict(tap=-2, use=True)],
[dict(tap=-4, use=False),
dict(tap=-2, use=True)]]
for n_ins in [1,2]:
# Randomly pick up 4*n_ins combinations of arguments
for k in xrange(4*n_ins):
state = []
for state_nb in xrange(n_ins):
pos = rng.randint(len(possible_taps_use_pairs))
state.append(possible_taps_use_pairs[pos])
all_states_info.append(state)
all_parameters_info = [[],
[dict(use=False)],
[dict(use=True)],
[dict(use=True), dict(use=True)],
[dict(use=True), dict(use=False)]]
for n_outputs in [0,1,2]:
for n_shared_updates in [0,1,2]:
for n_random_combinations in xrange(14):
pos_inp = rng.randint(len(all_inputs_info))
pos_st = rng.randint(len(all_states_info))
pos_param = rng.randint(len(all_parameters_info))
self.new_run(inputs_info=all_inputs_info[pos_inp],
states_info=all_states_info[pos_st],
parameters_info=all_parameters_info[pos_param],
n_outputs=n_outputs,
n_shared_updates=n_shared_updates)
def test001_generator_one_scalar_output(self):
def f_pow2(x_tm1):
return 2 * x_tm1
for n_steps in [-1, 1, 5, -5]:
state = theano.tensor.scalar('state')
output, updates = scan_module.scan(f_pow2,
[],
state,
[],
n_steps=n_steps,
truncate_gradient=-1,
go_backwards=False)
my_f = theano.function([state],
output,
updates=updates,
allow_input_downcast=True)
if abs(n_steps) == 1:
assert len([x for x in my_f.maker.env.toposort()
if isinstance(x.op, scan_module.scan_op.ScanOp)]) == 0
rng = numpy.random.RandomState(utt.fetch_seed())
state = rng.uniform()
numpy_values = numpy.array([state * (2 ** (k + 1)) for k
in xrange(abs(n_steps))])
theano_values = my_f(state)
assert numpy.allclose(numpy_values, theano_values)
# simple rnn, one input, one state, weights for each; input/state
# are vectors, weights are scalars
def test002_one_sequence_one_output_and_weights(self):
    def f_rnn(u_t, x_tm1, W_in, W):
        return u_t * W_in + x_tm1 * W
    u = theano.tensor.vector('u')
    x0 = theano.tensor.scalar('x0')
    W_in = theano.tensor.scalar('win')
    W = theano.tensor.scalar('w')
    # exercise several step counts, including None (run over the
    # whole sequence)
    for n_steps in [None, -1, 1, 5, -5]:
        output, updates = scan_module.scan(f_rnn,
                                           u,
                                           x0,
                                           [W_in, W],
                                           n_steps=n_steps,
                                           truncate_gradient=-1,
                                           go_backwards=False)
        my_f = theano.function([u, x0, W_in, W],
                               output,
                               updates=updates,
                               allow_input_downcast=True)
        if n_steps is not None and abs(n_steps) == 1:
            assert len([x for x in my_f.maker.env.toposort()
                        if isinstance(x.op, scan_module.scan_op.ScanOp)]) == 0
        # get random initial values
        rng = numpy.random.RandomState(utt.fetch_seed())
        v_u = rng.uniform(size=(8,), low=-5., high=5.)
        v_x0 = rng.uniform()
        v_W = rng.uniform()
        v_W_in = rng.uniform()
        # compute the output in numpy
        if n_steps is not None and n_steps < 0:
            _v_u = v_u[::-1]
        else:
            _v_u = v_u
        steps = 8
        if n_steps is not None:
            steps = abs(n_steps)
        v_out = numpy.zeros((8,))
        v_out[0] = _v_u[0] * v_W_in + v_x0 * v_W
        for step in xrange(1, steps):
            v_out[step] = _v_u[step] * v_W_in + v_out[step - 1] * v_W
        v_out = v_out[:steps]
        theano_values = my_f(v_u, v_x0, v_W_in, v_W)
        assert numpy.allclose(theano_values, v_out)
def test003_multiple_inputs_multiple_outputs(self):
pass
def test004_collect_parameters_outer_graph(self):
pass
def test005_multiple_taps(self):
pass
def test006_updates(self):
pass
import cPickle
import numpy
import unittest
import theano
from theano.compile.pfunc import rebuild_collect_shared
import theano.sandbox.scan_module as scan_module
if theano.config.mode == 'FAST_COMPILE':
mode_with_opt = theano.compile.mode.get_mode('FAST_RUN')
else:
mode_with_opt = theano.compile.mode.get_default_mode()
mode_with_gpu = mode_with_opt.including('gpu', 'scan')
# TODO: this should replace the verify_grad in tensor/tensor_grad.py
class multiple_outputs_numeric_grad:
"""WRITEME"""
type_eps = {'float64': 1e-7,
'float32': 3e-3}
def __init__(self, f, pt, ndarray_mask=None, eps=None):
"""Return the gradient of f at pt.
This function computes the gradient by a one-sided finite differences
of a fixed step size (eps).
It is assumed that f(...) will return a scalar.
:param eps: the stepsize for the finite differencing. None means
input dtype-dependent. See `type_eps`.
"""
def prod(inputs):
rval = 1
for i in inputs:
rval *= i
return rval
packed_pt = False
if not isinstance(pt, (list, tuple)):
pt = [pt]
packed_pt = True
        # This mask tells us whether each entry of pt is an ndarray input
        # or something else (e.g. a random state) that we should not
        # perturb
if not ndarray_mask:
ndarray_mask = [True for x in pt]
dtype_eps = multiple_outputs_numeric_grad.type_eps['float64']
for i, p in enumerate(pt):
if ndarray_mask[i]:
pt[i] = numpy.array(p)
_eps = multiple_outputs_numeric_grad.type_eps[str(
pt[i].dtype)]
if _eps > dtype_eps:
dtype_eps = _eps
self.ndarray_mask = ndarray_mask
# Compute clean output:
f_x = f(*pt)
gx = []
# now iterate over the elements of x and call f on those + delta x
for i in xrange(len(pt)):
if ndarray_mask[i]:
# It is a ndarray that we can tweak
if eps:
_eps = eps
else:
_eps = dtype_eps
if pt[i].ndim:
_g = []
# it has several dimensions:
for pos in xrange(prod(pt[i].shape)):
t = pt[i].copy()
t = t.flatten()
t[pos] += _eps
t = t.reshape(pt[i].shape)
f_eps = f(*(pt[:i] + [t] + pt[i + 1:]))
_g.append(numpy.asarray((f_eps - f_x) / _eps))
gx.append(numpy.asarray(_g).reshape(pt[i].shape))
else:
t = numpy.array(pt[i] + _eps)
f_eps = f(*(pt[:i] + [t] + pt[i + 1:]))
gx.append(numpy.asarray((f_eps - f_x) / _eps))
self.gx = gx
@staticmethod
def abs_rel_err(a, b, eps=1.0e-10):
"""Return a small number when a and b are close, relative to how big
they are"""
return abs(a - b) / (abs(a) + abs(b) + eps)
def max_err(self, _g_pt):
"""Return the biggest relative error between g_pt and self.gx"""
g_pt = []
for i in xrange(len(_g_pt)):
if self.ndarray_mask[i]:
g_pt.append(_g_pt[i])
elif isinstance(_g_pt[i], numpy.ndarray):
assert numpy.all(_g_pt[i] == 0)
if len(g_pt) != len(self.gx):
raise ValueError('argument has wrong number of elements',
len(g_pt))
errs = []
for i, (a, b) in enumerate(zip(g_pt, self.gx)):
if a.shape != b.shape:
raise ValueError('argument element %i has wrong shape %s' % \
(i, str((a.shape, b.shape))))
            errs.append(numpy.max(
                multiple_outputs_numeric_grad.abs_rel_err(a, b)))
if numpy.all(numpy.isfinite(errs)):
return numpy.max(errs), numpy.argmax(errs)
else:
return numpy.inf, 0
def scan_project_sum(*args, **kwargs):
rng = theano.tensor.shared_randomstreams.RandomStreams(123)
scan_outputs, updates = theano.scan(*args, **kwargs)
    if not isinstance(scan_outputs, (list, tuple)):
scan_outputs = [scan_outputs]
# we should ignore the random-state updates so that
# the uniform numbers are the same every evaluation and on every call
rng.add_default_updates = False
factors = [rng.uniform(size=s.shape, low=0.1, high=0.9) for s
in scan_outputs]
return (sum([(s * f).sum() for s, f in zip(scan_outputs, factors)]),
updates)
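`scan_project_sum` reduces the possibly many outputs of a scan to a single scalar by summing random-weighted projections, so a single numeric gradient check exercises every output at once. Below is a pure-numpy sketch of that reduction; the function name is illustrative, not Theano API.

```python
import numpy

def project_to_scalar(outputs, rng):
    # draw one fixed random factor per output, with a matching shape,
    # then sum the weighted projections into a single scalar
    factors = [rng.uniform(low=0.1, high=0.9, size=o.shape)
               for o in outputs]
    return sum((o * f).sum() for o, f in zip(outputs, factors))
```

Keeping the factors fixed (rather than redrawing them on every evaluation) is what makes the projected scalar differentiable with respect to the original outputs, which is why the real helper disables the random-state default updates.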
def asarrayX(value):
return theano._asarray(value, dtype=theano.config.floatX)
def clone_optimized_graph(f):
maker_ins = [x for x in f.maker.env.inputs
if not isinstance(x, theano.tensor.sharedvar.SharedVariable)]
inps, outs, _ = rebuild_collect_shared(f.maker.env.outputs,
maker_ins,
copy_inputs_over=False)
ins = [x for x in inps
if not isinstance(x, theano.tensor.sharedvar.SharedVariable)]
return (ins, outs)
def grab_scan_node(output):
if output.owner is None:
return None
if output.owner.op.__class__.__name__ == 'Scan':
return [output.owner]
rval = []
for i in output.owner.inputs:
ri = grab_scan_node(i)
if ri is not None:
rval += ri
    if rval == []:
return None
else:
return rval
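`grab_scan_node` is one instance of a generic pattern: recursively walk a variable's ownership graph and collect the apply nodes whose op satisfies a predicate. A minimal standalone sketch of that pattern (the name and predicate interface are illustrative, not Theano's API):

```python
def collect_nodes(output, predicate):
    # walk the ownership graph depth-first; a variable with no owner
    # is a graph input, so the recursion stops there
    if output.owner is None:
        return []
    if predicate(output.owner.op):
        return [output.owner]
    rval = []
    for i in output.owner.inputs:
        rval += collect_nodes(i, predicate)
    return rval
```

Returning an empty list instead of None sidesteps the `rval is []` pitfall in the original: callers can simply concatenate results without a sentinel check.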
class TestScanUtils(unittest.TestCase):
def test_cloning_no_replace_strict_copy_inputs(self):
# This has nothing to do with scan, but it refers to the clone
# function that scan uses internally and that pfunc uses now and
# that users might want to use
x = theano.tensor.vector('x')
y = theano.tensor.vector('y')
z = theano.shared(0.25)
f1 = z * (x + y) ** 2 + 5
f2 = scan_module.scan_utils.clone(f1,
replace=None,
strict=True,
copy_inputs=True)
f2_inp = theano.gof.graph.inputs([f2])
assert z in f2_inp
assert x in f2_inp
assert y in f2_inp
def test_cloning_no_replace_strict_not_copy_inputs(self):
# This has nothing to do with scan, but it refers to the clone
# function that scan uses internally and that pfunc uses now and
# that users might want to use
x = theano.tensor.vector('x')
y = theano.tensor.vector('y')
z = theano.shared(0.25)
f1 = z * (x + y) ** 2 + 5
f2 = scan_module.scan_utils.clone(f1,
replace=None,
strict=True,
copy_inputs=False)
f2_inp = theano.gof.graph.inputs([f2])
        assert z not in f2_inp
        assert x not in f2_inp
        assert y not in f2_inp
def test_cloning_replace_strict_copy_inputs(self):
# This has nothing to do with scan, but it refers to the clone
# function that scan uses internally and that pfunc uses now and
# that users might want to use
x = theano.tensor.vector('x')
y = theano.tensor.vector('y')
y2 = theano.tensor.vector('y2')
z = theano.shared(0.25)
f1 = z * (x + y) ** 2 + 5
f2 = scan_module.scan_utils.clone(f1,
replace={y: y2},
strict=True,
copy_inputs=True)
f2_inp = theano.gof.graph.inputs([f2])
assert z in f2_inp
assert x in f2_inp
assert y2 in f2_inp
def test_cloning_replace_not_strict_copy_inputs(self):
# This has nothing to do with scan, but it refers to the clone
# function that scan uses internally and that pfunc uses now and
# that users might want to use
x = theano.tensor.vector('x')
y = theano.tensor.fvector('y')
y2 = theano.tensor.dvector('y2')
z = theano.shared(0.25)
f1 = z * (x + y) ** 2 + 5
f2 = scan_module.scan_utils.clone(f1,
replace={y: y2},
strict=False,
copy_inputs=True)
f2_inp = theano.gof.graph.inputs([f2])
assert z in f2_inp
assert x in f2_inp
assert y2 in f2_inp
def test_cloning_replace_strict_not_copy_inputs(self):
# This has nothing to do with scan, but it refers to the clone
# function that scan uses internally and that pfunc uses now and
# that users might want to use
x = theano.tensor.vector('x')
y = theano.tensor.vector('y')
y2 = theano.tensor.vector('y2')
z = theano.shared(0.25)
f1 = z * (x + y) ** 2 + 5
f2 = scan_module.scan_utils.clone(f1,
replace={y: y2},
strict=True,
copy_inputs=False)
f2_inp = theano.gof.graph.inputs([f2])
        assert z not in f2_inp
        assert x not in f2_inp
        assert y2 not in f2_inp
def test_cloning_replace_not_strict_not_copy_inputs(self):
# This has nothing to do with scan, but it refers to the clone
# function that scan uses internally and that pfunc uses now and
# that users might want to use
x = theano.tensor.vector('x')
y = theano.tensor.fvector('y')
y2 = theano.tensor.dvector('y2')
z = theano.shared(0.25)
f1 = z * (x + y) ** 2 + 5
f2 = scan_module.scan_utils.clone(f1,
replace={y: y2},
strict=False,
copy_inputs=False)
f2_inp = theano.gof.graph.inputs([f2])
        assert z not in f2_inp
        assert x not in f2_inp
        assert y2 not in f2_inp
...@@ -2866,7 +2866,8 @@ def extract_constant(x):
        x = get_constant_value(x)
    except Exception:
        pass
-    if isinstance(x, scal.ScalarVariable):
+    if (isinstance(x, scal.ScalarVariable) or
+            isinstance(x, scal.sharedvar.ScalarSharedVariable)):
        if x.owner and isinstance(x.owner.op, ScalarFromTensor):
            x = x.owner.inputs[0]
        else:
......