Commit 96676ed5 authored by Razvan Pascanu

[scan][doc][coding-style] re-arranged the documentation of scan parameters

Parent b15fadcc
......@@ -268,161 +268,247 @@ def foldr( fn
# Yes, actually it will be exactly 2 ( if there are no other constraints)
def scan(fn, sequences=[], outputs_info=[], non_sequences=[],
n_steps = None, truncate_gradient = -1, go_backwards = False,
mode = None, name = None):
"""Function that constructs and applies a Scan op
def scan( fn
, sequences = None
, outputs_info = None
, non_sequences = None
, n_steps = None
, truncate_gradient = -1
, go_backwards = False
, mode = None
, name = None ):
"""
This function constructs and applies a Scan op to the provided
arguments.
:param fn:
Function that describes the operations involved in one step of scan
Given variables representing all the slices of input and past values of
outputs and other non sequences parameters, ``fn`` should produce
variables describing the output of one time step of scan. The order in
which the arguments to this function are given is very important. You
should have the following order:
* all time slices of the first sequence (as given in the
``sequences`` list) ordered in the same fashion as the time taps provided
* all time slices of the second sequence (as given in the
``sequences`` list) ordered in the same fashion as the time taps provided
``fn`` is a function that describes the operations involved in one step
of ``scan``. ``fn`` should construct variables describing the output of
one iteration step. It should expect as input theano variables
representing all the time slices of the input sequences and outputs,
and all other arguments given to scan as ``non_sequences``. The order
in which scan passes these variables to ``fn`` is the following:
* all time slices of the first sequence
* all time slices of the second sequence
* ...
* all time slices of the first output (as given in the
``initial_state`` list) ordered in the same fashion as the time taps provided
* all time slices of the second output (as given in the
``initial_state`` list) ordered in the same fashion as the time taps provided
* all time slices of the last sequence
* all time slices of the first output
* all time slices of the second output
* ...
* all other parameters over which scan doesn't iterate ordered accordingly
If you are using shared variables over which you do not want to iterate,
you do not need to provide them as arguments to ``fn``, though you can if you
wish so. The function should return the outputs after each step plus the updates
for any of the shared variables. You can either return only outputs or only
updates. If you have both outputs and updates the function should return
them as a tuple : (outputs, updates) or (updates, outputs).
* all time slices of the last output
* all other arguments (the list given as ``non_sequences`` to
scan)
The order of the sequences is the same as the one in the list
``sequences`` given to scan. The order of the outputs is the same
as the order of ``outputs_info``. For any sequence or output the
order of the time slices is the same as the order of the time
taps provided. For example if one writes the following:
.. code-block:: python
scan(fn, sequences = [ dict( input = Sequence1, taps = [-3,2,-1])
, Sequence2
, dict( input = Sequence3, taps = 3) ]
, outputs_info = [ dict( initial = Output1, taps = [-3,-5])
, dict( initial = Output2, taps = None)
, Output3 ]
, non_sequences = [ Argument1, Argument2])
``fn`` should expect the following arguments in this given order:
#. ``Sequence1[t-3]``
#. ``Sequence1[t+2]``
#. ``Sequence1[t-1]``
#. ``Sequence2[t]``
#. ``Sequence3[t+3]``
#. ``Output1[t-3]``
#. ``Output1[t-5]``
#. ``Output3[t-1]``
#. ``Argument1``
#. ``Argument2``
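The ordering rule above can be sketched in plain Python. This is only an illustration of the rule as the docstring states it, not Theano code: the helper ``fn_argument_names`` and its ``(name, taps)`` pair format are hypothetical, and a bare variable is written with the default taps the docstring describes (``[0]`` for a sequence, ``[-1]`` for an output).

```python
def fn_argument_names(sequences, outputs_info, non_sequences):
    """List, in order, symbolic names for the arguments ``fn`` would receive.

    ``sequences`` and ``outputs_info`` are lists of (name, taps) pairs.
    ``taps=None`` marks an output that is not fed back to ``fn``.
    This helper is hypothetical and only mirrors the documented ordering.
    """
    def slice_name(name, k):
        # A tap of 0 is the current slice; other taps are offsets from t.
        if k == 0:
            return "%s[t]" % name
        return "%s[t%+d]" % (name, k)

    names = []
    # First all slices of every sequence, in the order of its taps.
    for name, taps in sequences:
        names.extend(slice_name(name, k) for k in taps)
    # Then all slices of every output that has taps.
    for name, taps in outputs_info:
        if taps is not None:
            names.extend(slice_name(name, k) for k in taps)
    # Finally the non-sequence arguments, unchanged.
    names.extend(non_sequences)
    return names


args = fn_argument_names(
    sequences=[("Sequence1", [-3, 2, -1]), ("Sequence2", [0]), ("Sequence3", [3])],
    outputs_info=[("Output1", [-3, -5]), ("Output2", None), ("Output3", [-1])],
    non_sequences=["Argument1", "Argument2"],
)
for position, name in enumerate(args, 1):
    print(position, name)
```

``Output2`` contributes no argument because its taps are ``None``, which is why the list has ten entries rather than eleven.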
The list of ``non_sequences`` can also contain shared variables
used in the function, though ``scan`` is able to figure those
out on its own, so they can be skipped. For clarity of the
code we nevertheless recommend providing them to scan.
The function is expected to return two things. One is a list of
outputs ordered in the same order as ``outputs_info``, with the
difference that there should be only one output variable per
output initial state (even if no tap value is used). Secondly,
``fn`` should return an update dictionary (that tells how to
update any shared variable after each iteration step). The
dictionary can optionally be given as a list of tuples. There is
no constraint on the order of these two lists; ``fn`` can return
either ``(outputs_list, update_dictionary)`` or ``(update_dictionary,
outputs_list)`` or just one of the two (in case the other is
empty).
Outputs can be just a theano expression if you have only one output or
a list of theano expressions. Updates can be given either as a list of tuples or
as a dictionary. If you have a list of outputs, the order of these
should match that of their ``initial_states``.
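As a rough illustration of the return conventions described for ``fn``, the sketch below normalizes a return value into ``(outputs_list, updates_dict)``. The helper ``split_fn_result`` is hypothetical and is not how ``scan`` does it internally. For brevity it assumes updates are given as a dictionary (the list-of-tuples form is left out), and plain strings stand in for Theano expressions.

```python
def split_fn_result(result):
    """Normalize what ``fn`` returned into (outputs_list, updates_dict).

    Hypothetical helper; assumes updates are always a ``dict``.
    """
    def as_list(obj):
        # A single output expression is treated as a one-element list.
        return list(obj) if isinstance(obj, (list, tuple)) else [obj]

    # Only updates were returned.
    if isinstance(result, dict):
        return [], dict(result)
    # A pair in either order: (outputs, updates) or (updates, outputs).
    if isinstance(result, (list, tuple)) and len(result) == 2:
        first, second = result
        if isinstance(second, dict):
            return as_list(first), dict(second)
        if isinstance(first, dict):
            return as_list(second), dict(first)
    # Only outputs were returned.
    return as_list(result), {}


print(split_fn_result("y_t"))
print(split_fn_result(("y_t", {"w": "w + 1"})))
print(split_fn_result(({"w": "w + 1"}, ["y_t", "z_t"])))
```

All three calls yield the same two-part shape, which is what lets ``fn`` return its outputs and updates in either order.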
:param sequences:
list of Theano variables or dictionaries containing Theano variables over which
scan needs to iterate. The reason you might want to wrap a certain Theano
variable in a dictionary is to provide auxiliary information about how to iterate
over that variable. For example this is how you specify that you want to use
several time slices of this sequence at each iteration step. The dictionary
should have the following keys :
* ``input`` -- Theano variable representing the sequence
* ``taps`` -- temporal taps to use for this sequence. They are given as a list
of ints, where a value ``k`` means that at iteration step ``t`` scan needs to
provide also the slice ``t+k``. The order in which you provide these int values
here is the same order in which the slices will be provided to ``fn``.
If you do not wrap a variable around a dictionary, scan will do it for you, under
the assumption that you use only one slice, defined as a tap of offset 0. This
means that at step ``t`` scan will provide the slice at position ``t``.
``sequences`` is the list of Theano variables or dictionaries
describing the sequences ``scan`` has to iterate over. If a
sequence is wrapped in a dictionary, optional information
about the sequence can be provided. The dictionary
should have the following keys:
* ``input`` (*mandatory*) -- Theano variable representing the
sequence.
* ``taps`` -- Temporal taps of the sequence required by ``fn``.
They are provided as a list of integers, where a value ``k`` implies
that at iteration step ``t`` scan will pass to ``fn`` the slice
``t+k``. Default value is ``[0]``.
Any Theano variable in the list ``sequences`` is automatically
wrapped into a dictionary where ``taps`` is set to ``[0]``.
:param outputs_info:
list of Theano variables or dictionaries containing Theano variables used
to initialize the outputs of scan. As before (for ``sequences``) the reason
you would wrap a Theano variable in a dictionary is to provide additional
information about how scan should deal with that specific output. The dictionary
should contain the following keys:
* ``initial`` -- Theano variable containing the initial state of the output
* ``taps`` -- temporal taps to use for this output. The taps are given as a
list of ints (only negative, since you cannot use future values of outputs),
with the same meaning as for ``sequences`` (see above).
* ``inplace`` -- theano variable pointing to one of the input sequences; this
flag tells scan that the output should be computed in the memory space occupied
by that input sequence. Note that scan will only do this if allowed by the
rest of your computational graph and if you are not using past taps of the
input.
* ``return_steps`` how many steps to return from your output. If not given, or
0 scan will return all steps, otherwise it will return the last ``return_steps``.
Note that if you set this to something other than 0, scan will try to be smart
about the amount of memory it allocates for a given input.
If the function applied recursively uses only the
previous value of the output, the initial state should have the
same shape as one time step of the output; otherwise, the initial state
should have the same number of dimensions as the output. This is easily
understood through an example. For computing ``y[t]`` let us assume that we
need ``y[t-1]``, ``y[t-2]`` and ``y[t-4]``. Through an abuse of
notation, when ``t = 0``, we would need values for ``y[-1]``, ``y[-2]``
and ``y[-4]``. These values are provided by the initial state of ``y``,
which should have the same number of dimensions as ``y``, where the first
dimension should be large enough to cover all the required past values, which in
this case is 4. If ``init_y`` is the variable containing the initial state
of ``y``, then ``init_y[0]`` corresponds to ``y[-4]``, ``init_y[1]``
corresponds to ``y[-3]``, ``init_y[2]`` corresponds to ``y[-2]``,
``init_y[3]`` corresponds to ``y[-1]``. The default behaviour of scan is
the following :
* if you do not wrap an output in a dictionary, scan will wrap it for you
assuming that you use only the last step of the output ( i.e. it makes your tap
value list equal to [-1]) and that it is not computed inplace
* if you wrap an output in a dictionary and you do not provide any taps but
you provide an initial state it will assume that you are using only a tap value
of -1
* if you wrap an output in a dictionary but you do not provide any initial state,
it assumes that you are not using any form of taps
* if you provide a ``None`` instead of a variable or a dictionary scan assumes
that you will not use any taps for this output (this would be the case for map)
If you did not provide any information for your outputs, scan will assume by
default that you are not using any taps for any of the outputs. If you provide
information for just a subset of outputs, scan will not know to which outputs
these correspond and will raise an error.
``outputs_info`` is the list of Theano variables or dictionaries
describing the initial state of the outputs computed
recurrently. When these initial states are given as dictionaries,
optional information can be provided about the output corresponding
to these initial states. The dictionary should have the following
keys:
* ``initial`` -- Theano variable that represents the initial
state of a given output. In case the output is not computed
recursively (think of a map) and does not require an initial
state this field can be skipped. If only the previous
time step of the output is used by ``fn``, the initial state
should have the same shape as one time step of the output. If multiple time
taps are used, the initial state should have one extra
dimension that should cover all the possible taps. For example
if we use ``-5``, ``-2`` and ``-1`` as past taps, at step 0,
``fn`` will require (by an abuse of notation) ``output[-5]``,
``output[-2]`` and ``output[-1]``. This will be given by
the initial state, which in this case should have the shape
(5,)+output.shape. If this variable containing the initial
state is called ``init_y`` then ``init_y[0]`` *corresponds to*
``output[-5]``, ``init_y[1]`` *corresponds to* ``output[-4]``,
``init_y[2]`` corresponds to ``output[-3]``, ``init_y[3]``
corresponds to ``output[-2]``, ``init_y[4]`` corresponds to
``output[-1]``. While this order might seem strange, it comes
naturally from splitting an array at a given point. Assume that
we have an array ``x``, and we choose ``k`` to be time step
``0``. Then our initial state would be ``x[:k]``, while the
output will be ``x[k:]``. Looking at this split, elements in
``x[:k]`` are ordered exactly like those in ``init_y``.
* ``taps`` -- Temporal taps of the output that will be passed to
``fn``. They are provided as a list of *negative* integers,
where a value ``k`` implies that at iteration step ``t`` scan will
pass to ``fn`` the slice ``t+k``.
* ``inplace`` -- One of the Theano variables provided as
``sequences``. ``scan`` will try to compute this output *in
place* of the provided input *iff* it respects the following
constraints:
* There is no other output that is denied to be computed in
place for whatever reason.
* ``fn`` is not using past taps of the input sequence that
will get overwritten by the output
* ``return_steps`` -- Integer representing the number of steps
to return for the current output. For example, if ``k`` is
provided, ``scan`` will return ``output[-k:]``. This is meant as a
hint: based on ``k`` and the past taps of the outputs used, scan
can be smart about the amount of memory it requires to store
intermediate results. If not given, or ``0``, ``scan`` will return
all computed steps.
* ``store_steps`` -- Integer representing the number of
intermediate steps ``scan`` should use for a given output. Use
this key only if you really know what you are doing. In general
it is recommended to let scan decide for you the amount of memory
it should use.
``scan`` will follow this logic if partial information is given:
* If an output is not wrapped in a dictionary, ``scan`` will wrap
it in one assuming that you use only the last step of the output
(i.e. it makes your tap value list equal to [-1]) and that it is
not computed inplace.
* If you wrap an output in a dictionary and you do not provide any
taps but you provide an initial state it will assume that you are
using only a tap value of -1.
* If you wrap an output in a dictionary but you do not provide any
initial state, it assumes that you are not using any form of
taps.
* If you provide a ``None`` instead of a variable or a dictionary
``scan`` assumes that you will not use any taps for this output
(for example, in the case of a map).
If ``outputs_info`` is an empty list or None, ``scan`` assumes
that no tap is used for any of the outputs. If information is
provided just for a subset of the outputs an exception is
raised (because there is no convention on how scan should map
the provided information to the outputs of ``fn``).
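The layout of the initial state for taps ``-5``, ``-2`` and ``-1`` can be checked with a small plain-Python sketch. It uses lists of numbers in place of Theano variables, and the helpers ``past_value`` and ``run`` are hypothetical names chosen for illustration. It only mirrors the indexing convention described above, not the implementation of ``scan``.

```python
def past_value(init_y, outputs, t, k):
    """Return output[t + k] for a negative tap ``k``.

    ``init_y`` holds output[-len(init_y)] ... output[-1], in that order.
    """
    index = t + k
    if index >= 0:
        return outputs[index]
    # Negative positions fall into the initial state.
    return init_y[len(init_y) + index]


def run(init_y, taps, n_steps, step):
    """Apply ``step`` recurrently, feeding it the requested past values."""
    outputs = []
    for t in range(n_steps):
        past = [past_value(init_y, outputs, t, k) for k in taps]
        outputs.append(step(*past))
    return outputs


init_y = [10, 20, 30, 40, 50]  # output[-5], output[-4], ..., output[-1]
taps = [-5, -2, -1]
seen = []  # records what ``step`` received at each iteration


def step(y_m5, y_m2, y_m1):
    seen.append((y_m5, y_m2, y_m1))
    return y_m5 + y_m2 + y_m1


outputs = run(init_y, taps, 3, step)

# The split described in the docstring: x[:k] is the initial state,
# x[k:] is the computed output, with k playing the role of time step 0.
x = init_y + outputs
k = len(init_y)
print(seen[0], x[:k] == init_y, x[k:] == outputs)
```

At step 0 the function receives ``init_y[0]``, ``init_y[3]`` and ``init_y[4]``, which matches the correspondence between ``init_y`` and ``output[-5]``, ``output[-2]`` and ``output[-1]``.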
:param non_sequences:
Parameters over which scan should not iterate. These parameters are
given at each time step to the function applied recursively.
``non_sequences`` is the list of arguments that are passed to
``fn`` at each step. One can opt to exclude shared variables
used in ``fn`` from this list.
:param n_steps:
Number of steps to iterate. If the input sequences are not long enough, scan
will produce a warning and run only for the maximal amount of steps allowed by
the input sequences. If the value is 0, the outputs will have 0 rows. If the
value is negative, scan will run backwards (or if the flag go_backwards is
already set to true it will run forward in time). If n_steps is not provided,
or evaluates to None, inf or nan, scan will figure out the maximal amount of
steps it can run given the input sequences and do that.
``n_steps`` is the number of steps to iterate given as an int
or Theano scalar. If any of the input sequences do not have
enough elements, scan will produce a warning and run only for
the maximal amount of steps it can. If the *value is 0* the
outputs will have *0 rows*. If the value is negative, ``scan``
will run backwards in time. If the ``go_backwards`` flag is already
set and ``n_steps`` is also negative, ``scan`` will run forward
in time. If ``n_steps`` is not provided, or evaluates to ``None``,
``inf`` or ``NaN``, ``scan`` will figure out the amount of
steps it should run given its input sequences.
:param truncate_gradient:
Number of steps to use in truncated BPTT. If you compute gradients
through a scan op, they are computed using backpropagation through time.
By providing a value other than -1, you choose to use truncated BPTT
instead of classical BPTT, where you only do ``truncate_gradient``
number of steps.
``truncate_gradient`` is the number of steps to use in truncated
BPTT. If you compute gradients through a scan op, they are
computed using backpropagation through time. By providing a
value other than -1, you choose to use truncated BPTT instead
of classical BPTT, where you go back in time for only
``truncate_gradient`` steps.
:param go_backwards:
Flag indicating if you should go backwards through the sequences (if you
think of the sequences as being indexed by time, this would mean going backwards
in time).
``go_backwards`` is a flag indicating if ``scan`` should go
backwards through the sequences. If you think of each sequence
as indexed by time, making this flag True would mean that
``scan`` goes back in time, namely that for any sequence it
starts from the end and goes towards 0.
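The effect of ``go_backwards`` can be pictured with a minimal plain-Python sketch for a single sequence and one output that uses only its previous value. The function ``scan_sketch`` is a hypothetical stand-in, not the Theano ``scan``. Here the outputs are listed in the order in which they were computed, which is an assumption of this sketch.

```python
def scan_sketch(step, sequence, initial, go_backwards=False):
    """Iterate ``step`` over ``sequence``, carrying the previous output.

    Hypothetical stand-in for illustration; ``step`` receives the current
    sequence slice first and the previous output second, following the
    argument order described for ``fn``.
    """
    order = reversed(sequence) if go_backwards else sequence
    outputs = []
    previous = initial
    for x in order:
        previous = step(x, previous)
        outputs.append(previous)
    return outputs


def add(x, acc):
    return acc + x


forward = scan_sketch(add, [1, 2, 3, 4], 0)
backward = scan_sketch(add, [1, 2, 3, 4], 0, go_backwards=True)
print(forward)
print(backward)
```

Both runs end with the same total, but the intermediate values differ because the backward run starts from the last element and moves towards index 0.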
:param name:
The name of the theano function compiled by the Scan op. It will show in the
profiler output.
When profiling ``scan`` it is crucial to provide a name for any
instance of ``scan``. The profiler will produce an overall
profile of your code as well as profiles for doing one iteration
step for each instance of ``scan``. The ``name`` of the instance is
how you differentiate between all these profiles.
:param mode:
The mode used when compiling the theano function in the Scan op.
If None, it will use the config mode. If None and the config mode is set to
profile mode, we will create a new instance of the ProfileMode in order
to compute the timing correctly.
If no new instance is created the time spent in Scan will show up twice in the
profiling, once as the time taken by scan, and the second time as the time
taken by the ops inside scan. This will be even worse for multiple cascading
scans.
The new profiler instance will be printed when python exits.
It is recommended to leave this argument to None, especially
when profiling ``scan`` (otherwise the results are not going to
be accurate). If you prefer the computations of one step of
``scan`` to be done differently than the entire function, set
this parameter (see ``theano.function`` for details about
possible values and their meaning).
:rtype: tuple
:return: tuple of the form (outputs, updates); ``outputs`` is either a
Theano variable or a list of Theano variables representing the
outputs of scan. ``updates`` is a dictionary specifying the
outputs of ``scan`` (in the same order as in
``outputs_info``). ``updates`` is a dictionary specifying the
update rules for all shared variables used in the scan
operation; this dictionary should be passed to ``theano.function``
operation. This dictionary should be passed to ``theano.function``
when you compile your function.
"""
# General observation : this code is executed only once, at creation
# of the computational graph, so we don't yet need to be smart about
......