Commit 3e0c5bb7 authored by Cesar Laurent

Made the doc more clear.

Parent 5ce67be5
@@ -210,82 +210,6 @@ with all values set to zero except at the provided array indices.
This demonstrates that you can introduce new Theano variables into a scan function.
Multiple outputs, several tap values - Recurrent Neural Network with Scan
--------------------------------------------------------------------------
The examples above showed simple uses of scan. However, scan also supports
referring not only to the prior result and the current sequence value, but
also to results and sequence values further back in time.
This is needed, for example, to implement an RNN using scan. Assume
that our RNN is defined as follows:
.. math::
    x(n) = \tanh( W x(n-1) + W^{in}_1 u(n) + W^{in}_2 u(n-4) +
                  W^{feedback} y(n-1) )

    y(n) = W^{out} x(n - 3)
Note that this network is far from a classical recurrent neural
network and might be useless. We defined it this way
to better illustrate the features of scan.
In this case we have a sequence ``u`` over which we need to iterate,
and two outputs ``x`` and ``y``. To implement this with scan we first
construct a function that computes one iteration step:
.. code-block:: python

    def oneStep(u_tm4, u_t, x_tm3, x_tm1, y_tm1, W, W_in_1, W_in_2, W_feedback, W_out):

        x_t = T.tanh(theano.dot(x_tm1, W) + \
                     theano.dot(u_t, W_in_1) + \
                     theano.dot(u_tm4, W_in_2) + \
                     theano.dot(y_tm1, W_feedback))
        y_t = theano.dot(x_tm3, W_out)

        return [x_t, y_t]
As a naming convention for the variables, we use ``a_tmb`` to mean ``a`` at
``t-b`` and ``a_tpb`` to mean ``a`` at ``t+b``.
Note the order in which the parameters are given and in which the
result is returned. Try to respect the chronological order among
the taps (time slices of sequences or outputs) used. For scan it is crucial
only that the variables representing the different time taps appear in the
same order as the one in which these taps are given. Not only the taps but
also the variables must respect this order, since this is how scan figures
out what should be represented by what. Given that we have all
the Theano variables needed, we construct our RNN as follows:
.. code-block:: python

    u  = T.matrix()  # it is a sequence of vectors
    x0 = T.matrix()  # initial state of x has to be a matrix, since
                     # it has to cover x[-3]
    y0 = T.vector()  # y0 is just a vector since scan has only to provide
                     # y[-1]

    ([x_vals, y_vals], updates) = theano.scan(fn=oneStep,
                                              sequences=dict(input=u, taps=[-4, -0]),
                                              outputs_info=[dict(initial=x0, taps=[-3, -1]), y0],
                                              non_sequences=[W, W_in_1, W_in_2, W_feedback, W_out])
    # for second input y, scan adds -1 in output_taps by default
Now ``x_vals`` and ``y_vals`` are symbolic variables pointing to the
sequence of x and y values generated by iterating over u. The
``sequence_taps`` and ``outputs_taps`` give scan information about exactly
which slices are needed. Note that if we want to use ``x[t-k]`` we do
not need to also have ``x[t-(k-1)], x[t-(k-2)],..``, but when applying
the compiled function, the numpy array given to represent this sequence
should be large enough to cover these values. Assume that we compile the
above function, and we give as ``u`` the array ``uvals = [0,1,2,3,4,5,6,7,8]``.
Abusing notation slightly, scan will consider ``uvals[0]`` as ``u[-4]``, and
will start scanning from ``uvals[4]`` towards the end.
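The index alignment described above can be checked with a small plain-numpy sketch (illustrative only, no Theano involved): with taps ``[-4, 0]`` and the 9-element ``uvals``, scan can take 5 steps, and step ``t`` receives ``(u_tm4, u_t) = (uvals[t], uvals[t+4])``.

```python
import numpy as np

uvals = np.arange(9)      # [0, 1, 2, 3, 4, 5, 6, 7, 8]
taps = [-4, 0]
offset = -min(taps)       # 4: uvals[0] plays the role of u[-4]
n_steps = len(uvals) - offset

# the (u_tm4, u_t) pair handed to oneStep at each step
pairs = [(int(uvals[t]), int(uvals[t + offset])) for t in range(n_steps)]
print(n_steps, pairs)     # 5 [(0, 4), (1, 5), (2, 6), (3, 7), (4, 8)]
```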
Using shared variables - Gibbs sampling
---------------------------------------
@@ -317,15 +241,7 @@ the following:
gibbs10 = theano.function([sample], values[-1], updates=updates)
The first, and probably most crucial observation is that the updates
dictionary becomes important in this case. It links a shared variable
with its updated value after k steps. In this case it tells how the
random streams get updated after 10 iterations. If you do not pass this
@@ -355,51 +271,152 @@ after each step of scan. If we write :
We will see that because ``b`` does not use the updated version of
``a``, it will be 2, ``c`` will be 12, while ``a.value`` is ``11``.
If we call the function again, ``b`` will become 12, ``c`` will be 22
and ``a.value`` 21. If we do not pass the ``updates`` dictionary to the
function, then ``a.value`` will always remain 1, ``b`` will always be 2 and
``c`` will always be ``12``.
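The call-by-call behaviour described above can be emulated in a few lines of plain Python (a toy sketch, not Theano; ``SharedVariable`` and the exact form of ``b`` and ``c`` are hypothetical reconstructions chosen only to match the values quoted above):

```python
class SharedVariable:
    """Toy stand-in for a Theano shared variable."""
    def __init__(self, value):
        self.value = value

def compiled_function(a):
    # Mimics a compiled function where output ``b`` is built from ``a``
    # itself (so it sees the value at call time), while ``c`` is built
    # from updates[a] (so it sees the value after the 10 steps).
    old = a.value
    new = old + 10        # ten scan steps, each adding 1 to ``a``
    b = old + 1           # computed from the pre-update value
    c = new + 1           # computed from the updated value
    a.value = new         # the updates dictionary is applied on return
    return b, c

a = SharedVariable(1)
r1 = compiled_function(a)   # (2, 12); a.value is now 11
r2 = compiled_function(a)   # (12, 22); a.value is now 21
```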
The second observation is that if we use shared variables (``W``, ``bvis``,
``bhid``) but do not iterate over them (i.e., scan doesn't really need to know
anything in particular about them, just that they are used inside the
function applied at each step), we do not need to pass them as arguments:
scan will find them on its own and add them to the graph.
However, passing them to the scan function is good practice, as it avoids
the Scan Op calling any earlier (external) Op over and over. This results in a
simpler computational graph, which speeds up the optimization and the
execution. To pass the shared variables to scan you need to put them in a list
and give it to the ``non_sequences`` argument. Here is the Gibbs sampling code
updated:
.. code-block:: python

    W = theano.shared(W_values)  # we assume that ``W_values`` contains the
                                 # initial values of your weight matrix
    bvis = theano.shared(bvis_values)
    bhid = theano.shared(bhid_values)

    trng = T.shared_randomstreams.RandomStreams(1234)

    # OneStep, with explicit use of the shared variables (W, bvis, bhid)
    def OneStep(vsample, W, bvis, bhid):
        hmean = T.nnet.sigmoid(theano.dot(vsample, W) + bhid)
        hsample = trng.binomial(size=hmean.shape, n=1, p=hmean)
        vmean = T.nnet.sigmoid(theano.dot(hsample, W.T) + bvis)
        return trng.binomial(size=vsample.shape, n=1, p=vmean,
                             dtype=theano.config.floatX)

    sample = theano.tensor.vector()

    # The new scan, with the shared variables passed as non_sequences
    values, updates = theano.scan(fn=OneStep,
                                  outputs_info=sample,
                                  non_sequences=[W, bvis, bhid],
                                  n_steps=10)

    gibbs10 = theano.function([sample], values[-1], updates=updates)
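For intuition, one such Gibbs step can be sketched in plain numpy (illustrative sizes and randomly chosen parameters; this is not the compiled Theano graph):

```python
import numpy as np

rng = np.random.default_rng(1234)

# Toy parameters (hypothetical sizes, for illustration only)
n_visible, n_hidden = 6, 4
W = rng.normal(size=(n_visible, n_hidden))
bvis = np.zeros(n_visible)
bhid = np.zeros(n_hidden)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def one_step(vsample):
    """One Gibbs step: sample h given v, then v given h."""
    hmean = sigmoid(vsample @ W + bhid)
    hsample = rng.binomial(n=1, p=hmean)
    vmean = sigmoid(hsample @ W.T + bvis)
    return rng.binomial(n=1, p=vmean).astype(float)

v = rng.binomial(n=1, p=0.5, size=n_visible).astype(float)
for _ in range(10):          # the equivalent of n_steps=10
    v = one_step(v)
print(v.shape)               # (6,)
```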
Using shared variables - the strict flag
----------------------------------------
As we just saw, passing the shared variables to scan may result in a simpler
computational graph, which speeds up the optimization and the execution. A
good way to remember to pass every shared variable used during scan is to use
the ``strict`` flag. When set to true, scan assumes that all the necessary
shared variables in ``fn`` are passed as a part of ``non_sequences``. This has
to be ensured by the user. Otherwise, it will result in an error.
Using the previous Gibbs sampling example:
.. code-block:: python

    # The new scan, using strict=True
    values, updates = theano.scan(fn=OneStep,
                                  outputs_info=sample,
                                  non_sequences=[W, bvis, bhid],
                                  n_steps=10,
                                  strict=True)
If you omit passing ``W``, ``bvis`` or ``bhid`` as a ``non_sequence``, it will
result in an error.
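What ``strict=True`` guards against can be mimicked in plain Python: a check that the step function does not silently capture variables it was not explicitly handed (``check_strict`` is a hypothetical helper for illustration, not part of Theano):

```python
def check_strict(fn, allowed=()):
    """Raise if ``fn`` captures closure variables not explicitly allowed."""
    hidden = [name for name in fn.__code__.co_freevars if name not in allowed]
    if hidden:
        raise ValueError("step function silently uses: %s" % ", ".join(hidden))
    return fn

def make_step(W):
    def bad_step(x):
        return W * x      # ``W`` is captured implicitly from the closure
    def good_step(x, W):
        return W * x      # ``W`` must be passed in, as with non_sequences
    return bad_step, good_step

bad_step, good_step = make_step(3)
check_strict(good_step)   # fine: no hidden captures
try:
    check_strict(bad_step)
except ValueError as e:
    print(e)              # ``W`` was not passed explicitly
```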
Here is a simple RNN example:
.. math::
    x(n) = \tanh(u(n) + W x(n-1))

And the code using ``strict=True``:

.. code-block:: python

    u = T.matrix()  # The input sequence
    x0 = T.vector() # The initial state
    W = theano.shared(W_values)  # we assume that ``W_values`` contains the
                                 # initial values of your weight matrix

    def oneStep(u_t, x_tm1, W):
        return T.tanh(u_t + T.dot(W, x_tm1))

    # Using strict=True, and passing W as a non_sequence
    x_vals, updates = theano.scan(fn=oneStep,
                                  sequences=dict(input=u, taps=[0]),
                                  outputs_info=[dict(initial=x0, taps=[-1])],
                                  non_sequences=[W],  # Don't forget to pass W!
                                  strict=True)

If you omit passing ``W`` as a ``non_sequence``, it will result in an error.
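As a plain-numpy sanity check of the simple recurrence ``x(n) = tanh(u(n) + W x(n-1))`` (illustrative shapes and randomly chosen values; no Theano involved):

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps, dim = 8, 3
u = rng.normal(size=(n_steps, dim))   # the input sequence
x = np.zeros(dim)                     # the initial state x0
W = 0.1 * rng.normal(size=(dim, dim))

xs = []
for u_t in u:                         # the loop scan runs for us
    x = np.tanh(u_t + W @ x)
    xs.append(x)
xs = np.stack(xs)
print(xs.shape)                       # (8, 3)
```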
Conditional ending of Scan