[scan][doc][coding-style] re-arranged the documentation of scan parameters

96676ed5 · Razvan Pascanu · b15fadcc · 96676ed5
--- a/theano/scan.py
+++ b/theano/scan.py
@@ -268,161 +268,247 @@ def foldr( fn
 #   Yes, actually it will be exactly 2 ( if there are no other constraints)


-def scan(fn, sequences=[], outputs_info=[], non_sequences=[],
-         n_steps = None, truncate_gradient = -1, go_backwards = False,
-         mode = None, name = None):
-    """Function that constructs and applies a Scan op
+def scan( fn
+         , sequences         = None
+         , outputs_info      = None
+         , non_sequences     = None
+         , n_steps           = None
+         , truncate_gradient = -1
+         , go_backwards      = False
+         , mode              = None
+         , name              = None ):
+    """
+    This function constructs and applies a Scan op to the provided
+    arguments.

    :param fn:
-        Function that describes the operations involved in one step of scan
-        Given variables representing all the slices of input and past values of
-        outputs and other non sequences parameters, ``fn`` should produce
-        variables describing the output of one time step of scan. The order in
-        which the argument to this function are given is very important. You
-        should have the following order:
-
-        * all time slices of the first sequence (as given in the
-          ``sequences`` list) ordered in the same fashion as the time taps provided
-        * all time slices of the second sequence (as given in the
-          ``sequences`` list) ordered in the same fashion as the time taps provided
+        ``fn`` is a function that describes the operations involved in one step
+        of ``scan``. ``fn`` should construct variables describing the output of
+        one iteration step. It should expect as input theano variables
+        representing all the time slices of the input sequences and outputs,
+        and all other arguments given to scan as ``non_sequences``. The order
+        in which scan passes these variables to ``fn`` is the following:
+
+        * all time slices of the first sequence
+        * all time slices of the second sequence
        * ...
-        * all time slices of the first output (as given in the
-          ``initial_state`` list) ordered in the same fashion as the time taps provided
-        * all time slices of the second otuput (as given in the
-          ``initial_state`` list) ordered in the same fashion as the time taps provided
+        * all time slices of the last sequence
+        * all time slices of the first output
+        * all time slices of the second output
        * ...
-        * all other parameters over which scan doesn't iterate ordered accordingly
-
-        If you are using shared variables over which you do not want to iterate, 
-        you do not need to provide them as arguments to ``fn``, though you can if you 
-        wish so. The function should return the outputs after each step plus the updates 
-        for any of the shared variables. You can either return only outputs or only
-        updates. If you have both outputs and updates the function should return
-        them as a tuple : (outputs, updates) or (updates, outputs).
+        * all time slices of the last output
+        * all other arguments (the list given as ``non_sequences`` to
+            scan)
+
+        The order of the sequences is the same as the one in the list
+        ``sequences`` given to scan. The order of the outputs is the same
+        as the order of ``outputs_info``. For any sequence or output the
+        order of the time slices is the same as the order of the time
+        taps provided. For example, if one writes the following:
+
+        .. code-block:: python
+
+            scan(fn, sequences = [ dict( input = Sequence1, taps = [-3,2,-1])
+                                 , Sequence2
+                                 , dict( input = Sequence3, taps = [3]) ]
+                   , outputs_info = [ dict( initial = Output1, taps = [-3,-5])
+                                    , dict( initial = Output2, taps = None)
+                                    , Output3 ]
+                   , non_sequences = [ Argument1, Argument2])
+
+        ``fn`` should expect the following arguments in this given order:
+
+        #. ``Sequence1[t-3]``
+        #. ``Sequence1[t+2]``
+        #. ``Sequence1[t-1]``
+        #. ``Sequence2[t]``
+        #. ``Sequence3[t+3]``
+        #. ``Output1[t-3]``
+        #. ``Output1[t-5]``
+        #. ``Output3[t-1]``
+        #. ``Argument1``
+        #. ``Argument2``
+
+        The list of ``non_sequences`` can also contain shared variables
+        used in the function, though ``scan`` is able to figure those
+        out on its own so they can be skipped. For the clarity of the
+        code we recommend, though, providing them to scan.
+
+        The function is expected to return two things. The first is a list
+        of outputs ordered in the same order as ``outputs_info``, with the
+        difference that there should be only one output variable per
+        output initial state (even if no tap value is used). The second
+        is an update dictionary (that tells how to update any shared
+        variable after each iteration step), which can optionally be
+        given as a list of tuples. There is no constraint on the order
+        of these two items: ``fn`` can return either
+        ``(outputs_list, update_dictionary)`` or
+        ``(update_dictionary, outputs_list)``, or just one of the two
+        (in case the other is empty).

-        Outputs can be just a theano expression if you have only one output or
-        a list of theano expressions. Updates can be given either as a list of tuples or
-        as a dictionary. If you have a list of outputs, the order of these
-        should match that of their ``initial_states``.

    :param sequences:
-        list of Theano variables or dictionaries containing Theano variables over which
-        scan needs to iterate. The reason you might want to wrap a certain Theano
-        variable in a dictionary is to provide auxiliary information about how to iterate
-        over that variable. For example this is how you specify that you want to use
-        several time slices of this sequence at each iteration step. The dictionary
-        should have the following keys :
-
-        * ``input`` -- Theano variable representing the sequence
-        * ``taps`` -- temporal taps to use for this sequence. They are given as a list
-          of ints, where a value ``k`` means that at iteration step ``t`` scan needs to
-          provide also the slice ``t+k`` The order in which you provide these int values
-          here is the same order in which the slices will be provided to ``fn``.
-
-        If you do not wrap a variable around a dictionary, scan will do it for you, under
-        the assumption that you use only one slice, defined as a tap of offset 0. This
-        means that at step ``t`` scan will provide the slice at position ``t``.
+        ``sequences`` is the list of Theano variables or dictionaries
+        describing the sequences ``scan`` has to iterate over. If a
+        sequence is wrapped in a dictionary, optional information about
+        the sequence can be provided. The dictionary
+        should have the following keys:
+
+        * ``input`` (*mandatory*) -- Theano variable representing the
+          sequence.
+
+        * ``taps`` -- Temporal taps of the sequence required by ``fn``.
+          They are provided as a list of integers, where a value ``k`` implies
+          that at iteration step ``t`` scan will pass to ``fn`` the slice
+          ``t+k``. Default value is ``[0]``.
+
+        Any Theano variable in the list ``sequences`` is automatically
+        wrapped into a dictionary where ``taps`` is set to ``[0]``.
+

    :param outputs_info:
-        list of Theano variables or dictionaries containing Theano variables used
-        to initialize the outputs of scan. As before (for ``sequences``) the reason
-        you would wrap a Theano variable in a dictionary is to provide additional
-        information about how scan should deal with that specific output. The dictionary
-        should contain the following keys:
-
-        * ``initial`` -- Theano variable containing the initial state of the output
-        * ``taps`` -- temporal taps to use for this output. The taps are given as a
-          list of ints (only negative .. since you can not use future values of outputs),
-          with the same meaning as for ``sequences`` (see above).
-        * ``inplace`` -- theano variable pointing to one of the input sequences; this
-          flag tells scan that the output should be computed in the memory space occupied
-          by that input sequence. Note that scan will only do this if allowed by the
-          rest of your computational graph and if you are not using past taps of the 
-          input.
-        * ``return_steps`` how many steps to return from your output. If not given, or 
-          0 scan will return all steps, otherwise it will return the last ``return_steps``.
-          Note that if you set this to something else then 0, scan will try to be smart
-          about the amount of memory it allocates for a given input.
-
-        If the function applied recursively uses only the
-        previous value of the output, the initial state should have
-        same shape as one time step of the output; otherwise, the initial state
-        should have the same number of dimension as output. This is easily
-        understood through an example. For computing ``y[t]`` let us assume that we
-        need ``y[t-1]``, ``y[t-2]`` and ``y[t-4]``. Through an abuse of
-        notation, when ``t = 0``, we would need values for ``y[-1]``, ``y[-2]``
-        and ``y[-4]``. These values are provided by the initial state of ``y``,
-        which should have same number  of dimension as ``y``, where the first
-        dimension should be large enough to cover all the required past values, which in 
-        this case is 4.  If ``init_y`` is the variable containing the initial state
-        of ``y``, then ``init_y[0]`` corresponds to ``y[-4]``, ``init_y[1]``
-        corresponds to ``y[-3]``, ``init_y[2]`` corresponds to ``y[-2]``,
-        ``init_y[3]`` corresponds to ``y[-1]``. The default behaviour of scan is
-        the following :
-
-        * if you do not wrap an output in a dictionary, scan will wrap it for you
-          assuming that you use only the last step of the output ( i.e. it makes your tap
-          value list equal to [-1]) and that it is not computed inplace
-        * if you wrap an output in a dictionary and you do not provide any taps but
-          you provide an initial state it will assume that you are using only a tap value
-          of -1
-        * if you wrap an output in a dictionary but you do not provide any initial state,
-          it assumes that you are not using any form of taps
-        * if you provide a ``None`` instead of a variable or a dictionary scan assumes 
-          that you will not use any taps for this output (this would be the case for map)
-
-        If you did not provide any information for your outputs, scan will assume by 
-        default that you are not using any taps for any of the outputs. If you provide 
-        information for just a subset of outputs, scan will not know to which outputs 
-        these correspond and will raise an error.
+        ``outputs_info`` is the list of Theano variables or dictionaries
+        describing the initial state of the outputs computed
+        recurrently. When these initial states are given as dictionaries,
+        optional information can be provided about the output corresponding
+        to these initial states. The dictionary should have the following
+        keys:
+
+        * ``initial`` -- Theano variable that represents the initial
+          state of a given output. In case the output is not computed
+          recursively (think of a map) and does not require an initial
+          state, this field can be skipped. If only the previous
+          time step of the output is used by ``fn``, the initial state
+          should have the same shape as the output. If multiple time
+          taps are used, the initial state should have one extra
+          dimension that should cover all the possible taps. For example
+          if we use ``-5``, ``-2`` and ``-1`` as past taps, at step 0,
+          ``fn`` will require (by an abuse of notation) ``output[-5]``,
+          ``output[-2]`` and ``output[-1]``. These will be given by
+          the initial state, which in this case should have the shape
+          ``(5,)+output.shape``. If the variable containing the initial
+          state is called ``init_y`` then ``init_y[0]`` *corresponds to*
+          ``output[-5]``, ``init_y[1]`` *corresponds to* ``output[-4]``,
+          ``init_y[2]`` corresponds to ``output[-3]``, ``init_y[3]``
+          corresponds to ``output[-2]``, ``init_y[4]`` corresponds to
+          ``output[-1]``. While this order might seem strange, it comes
+          naturally from splitting an array at a given point. Assume that
+          we have an array ``x``, and we choose ``k`` to be time step
+          ``0``. Then our initial state would be ``x[:k]``, while the
+          output will be ``x[k:]``. Looking at this split, elements in
+          ``x[:k]`` are ordered exactly like those in ``init_y``.
+        * ``taps`` -- Temporal taps of the output that will be passed to
+          ``fn``. They are provided as a list of *negative* integers,
+          where a value ``k`` implies that at iteration step ``t`` scan will
+          pass to ``fn`` the slice ``t+k``.
+        * ``inplace`` -- One of the Theano variables provided as
+          ``sequences``. ``scan`` will try to compute this output *in
+          place* of the provided input *iff* it respects the following
+          constraints:
+
+            * No other output is prevented from being computed in
+              place for whatever reason.
+
+            * ``fn`` is not using past taps of the input sequence that
+              will get overwritten by the output.
+
+        * ``return_steps`` -- Integer representing the number of steps
+          to return for the current output. For example, if ``k`` is
+          provided, ``scan`` will return ``output[-k:]``. This is meant as a
+          hint: based on ``k`` and the past taps of the outputs used, scan
+          can be smart about the amount of memory it requires to store
+          intermediate results. If not given, or ``0``, ``scan`` will return
+          all computed steps.
+        * ``store_steps`` -- Integer representing the number of
+          intermediate steps ``scan`` should use for a given output. Use
+          this key only if you really know what you are doing. In general
+          it is recommended to let scan decide for you the amount of memory
+          it should use.
+
+        ``scan`` will follow this logic if partial information is given:
+
+        * If an output is not wrapped in a dictionary, ``scan`` will wrap
+          it in one assuming that you use only the last step of the output
+          (i.e. it makes your tap value list equal to [-1]) and that it is
+          not computed inplace.
+        * If you wrap an output in a dictionary and you do not provide any
+          taps but you provide an initial state it will assume that you are
+          using only a tap value of -1.
+        * If you wrap an output in a dictionary but you do not provide any
+          initial state, it assumes that you are not using any form of
+          taps.
+        * If you provide a ``None`` instead of a variable or a dictionary
+          ``scan`` assumes that you will not use any taps for this output
+          (as, for example, in the case of a map).
+
+        If ``outputs_info`` is an empty list or None, ``scan`` assumes
+        that no tap is used for any of the outputs. If information is
+        provided just for a subset of the outputs an exception is
+        raised (because there is no convention on how scan should map
+        the provided information to the outputs of ``fn``).
+

    :param non_sequences:
-        Parameters over which scan should not iterate.  These parameters are
-        given at each time step to the function applied recursively.
+        ``non_sequences`` is the list of arguments that are passed to
+        ``fn`` at each step. One can opt to exclude shared variables
+        used in ``fn`` from this list.


    :param n_steps:
-        Number of steps to iterate. If the input sequences are not long enough, scan 
-        will produce a warning and run only for the maximal amount of steps allowed by 
-        the input sequences. If the value is 0, the outputs will have 0 rows. If the 
-        value is negative, scan will run backwards (or if the flag go_backwards is 
-        already set to true it will run forward in time). If n_steps is not provided, 
-        or evaluetes to None, inf or nan, scan will figure out the maximal amount of 
-        steps it can run given the input sequences and do that.
+        ``n_steps`` is the number of steps to iterate given as an int
+        or Theano scalar. If any of the input sequences do not have
+        enough elements, scan will produce a warning and run only for
+        the maximal amount of steps it can. If the *value is 0* the
+        outputs will have *0 rows*. If the value is negative, ``scan``
+        will run backwards in time. If the ``go_backwards`` flag is already
+        set and also ``n_steps`` is negative, ``scan`` will run forward
+        in time. If ``n_steps`` is not provided, or evaluates to ``None``,
+        ``inf`` or ``NaN``, ``scan`` will figure out the amount of
+        steps it should run given its input sequences.
+

    :param truncate_gradient:
-        Number of steps to use in truncated BPTT.  If you compute gradients
-        through a scan op, they are computed using backpropagation through time.
-        By providing a different value then -1, you choose to use truncated BPTT
-        instead of classical BPTT, where you only do ``truncate_gradient``
-        number of steps.
+        ``truncate_gradient`` is the number of steps to use in truncated
+        BPTT.  If you compute gradients through a scan op, they are
+        computed using backpropagation through time. By providing a
+        different value than -1, you choose to use truncated BPTT instead
+        of classical BPTT, where you go for only ``truncate_gradient``
+        number of steps back in time.
+

    :param go_backwards:
-        Flag indicating if you should go backwards through the sequences ( if you 
-        think as the sequences being indexed by time, this would mean go backwards 
-        in time)
+        ``go_backwards`` is a flag indicating if ``scan`` should go
+        backwards through the sequences. If you think of each sequence
+        as indexed by time, making this flag True would mean that
+        ``scan`` goes back in time, namely that for any sequence it
+        starts from the end and goes towards 0.
+

    :param name:
-        The name of the theano function compiled by the Scan op. It will show in the 
-        profiler output.
+        When profiling ``scan`` it is crucial to provide a name for any
+        instance of ``scan``. The profiler will produce an overall
+        profile of your code as well as profiles for doing one iteration
+        step for each instance of ``scan``. The ``name`` of the instance is
+        how you differentiate between all these profiles.
+

    :param mode:
-       The mode used when compiling the theano function in the Scan op.
-       If None, it will use the config mode. If None and the config mode is set to 
-       profile mode, it we will create a new instance of the ProfileMode in order 
-       to compute the timming correctly.
-       If no new instance is created the time spend in Scan will show up twice in the 
-       profiling, once as the time taken by scan, and the second time as the time 
-       taken by the ops inside scan. This will be even worse for multiple cascading 
-       scans.
-       The new profiler instance will be printed when python exits.
+        It is recommended to leave this argument as None, especially
+        when profiling ``scan`` (otherwise the results are not going to
+        be accurate). If you prefer the computations of one step of
+        ``scan`` to be done differently than the entire function, set
+        this parameter (see ``theano.function`` for details about
+        possible values and their meaning).
+

    :rtype: tuple
    :return: tuple of the form (outputs, updates); ``outputs`` is either a
             Theano variable or a list of Theano variables representing the
-             outputs of scan. ``updates`` is a dictionary specifying the
+             outputs of ``scan`` (in the same order as in
+             ``outputs_info``). ``updates`` is a dictionary specifying the
             updates rules for all shared variables used in the scan
-             operation; this dictionary should be pass to ``theano.function``
+             operation. This dictionary should be passed to ``theano.function``
+             when you compile your function.
    """
    # General observation : this code is executed only once, at creation
    # of the computational graph, so we don't yet need to be smart about