Commit cb7bbb7c authored by Razvan Pascanu

merge

@@ -13,12 +13,11 @@ The scan function provides the basic functionality needed to do loops
in Theano. Scan comes with many bells and whistles, which can be easily
introduced through a few examples:
Basic functionality : Computing :math:`A^k`
--------------------------------------------
Assume that, given *k*, you want to get ``A**k`` using a loop.
More precisely, if *A* is a tensor, you want to compute
``A**k`` elemwise. The python/numpy code would look like:
.. code-block:: python
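The body of this code block is collapsed in the diff; a minimal numpy sketch of the loop it describes (elementwise ``A**k``, with illustrative values for ``A`` and ``k``) could look like:

```python
import numpy as np

# Minimal sketch of the collapsed loop: elementwise A**k.
# The concrete values of A and k here are only for illustration.
A = np.array([[1.0, 2.0], [3.0, 4.0]])
k = 3

result = np.ones_like(A)
for i in range(k):
    result = result * A
```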
@@ -32,17 +31,17 @@ The equivalent Theano code would be
.. code-block:: python
# Symbolic description of the result
result, updates = theano.scan(fn = lambda x_tm1, A: x_tm1*A, \
                              sequences = [], \
                              initial_states = T.ones_like(A), \
                              non_sequences = A, \
                              n_steps = k)

# compiled function that returns A**k
f = theano.function([A,k], result[-1], updates = updates)
Let us go through the example line by line. What we did first is to
construct a function (using a lambda expression) that, given `x_tm1` and
`A`, returns `x_tm1*A`. Given the order of the parameters, `x_tm1`
is the value of our output at time step ``t-1``. Therefore
``x_t`` (the value of the output at time `t`) is `A` times the value of the output
@@ -52,9 +51,14 @@ iterate over anything) and initialize the output as a tensor with same
shape as *A*, filled with ones. We give *A* as a non-sequence parameter and
tell scan to iterate for *k* steps.

Scan will return a tuple containing our result (``result``) and a
dictionary of updates (empty for this example). Note that the result
is not a matrix, but a 3D tensor containing the value of ``A**k`` for
each step. We want the last value (after *k* steps), so we compile
a function to return just that.
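In plain numpy terms (a hypothetical illustration, not the scan API itself), the returned 3D tensor and the final slice can be pictured as:

```python
import numpy as np

# Hypothetical numpy picture of scan's result here: one slice per step,
# stacked along a new leading (time) axis, so result[-1] equals A**k.
A = np.array([[2.0, 3.0], [4.0, 5.0]])
k = 3

x = np.ones_like(A)
steps = []
for _ in range(k):
    x = x * A
    steps.append(x)

result = np.stack(steps)   # shape (k,) + A.shape
last = result[-1]          # the value after k steps, i.e. A**k
```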
Multiple outputs, several tap values - Recurrent Neural Network with Scan
--------------------------------------------------------------------------
A more practical task would be to implement an RNN using scan. Assume
that our RNN is defined as follows:
@@ -65,6 +69,10 @@ that our RNN is defined as follows :
y(n) = W^{out} x(n-3)
Note that this network is far from a classical recurrent neural
network and might in practice be useless. The reason we defined it as such
is to better illustrate the features of scan.
In this case we have a sequence over which we need to iterate, ``u``,
and two outputs, ``x`` and ``y``. To implement this with scan we first
construct a function that computes one iteration step:
@@ -82,9 +90,11 @@ construct a function that computes one iteration step :
return [x_t, y_t]
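The body of ``oneStep`` is collapsed in the hunk above; only its return is visible. Assuming the recurrence ``x(n) = tanh(W x(n-1) + W^{in}_1 u(n) + W^{in}_2 u(n-4) + W^{feedback} y(n-1))`` (an assumption here, since the equation for ``x`` is not visible in the hunk), a plain numpy sketch of one step might be:

```python
import numpy as np

# Hypothetical numpy sketch of ``oneStep``: taps come first, in
# chronological order, followed by the non-sequences. The x(n) update
# rule is an assumption; only y(n) = W^{out} x(n-3) is visible above.
def oneStep(u_tm4, u_t, x_tm3, x_tm1, y_tm1,
            W, W_in_1, W_in_2, W_feedback, W_out):
    x_t = np.tanh(np.dot(x_tm1, W) + np.dot(u_t, W_in_1)
                  + np.dot(u_tm4, W_in_2) + np.dot(y_tm1, W_feedback))
    y_t = np.dot(x_tm3, W_out)
    return [x_t, y_t]
```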
Note the order in which the parameters are given, and in which the
result is returned. Try to respect chronological order among
the taps (time slices of sequences or outputs) used. In practice, what
is crucial for the computation to work is to give the slices
in the same order as provided in the ``sequence_taps``/``outputs_taps``
dictionaries, and to have the same order of inputs here as when
applying scan. Given that we have all
the Theano variables needed, we construct our RNN as follows:
.. code-block:: python
@@ -96,7 +106,7 @@ the Theano variables needed we construct our RNN as follows :
# y[-1]
([x_vals, y_vals], updates) = theano.scan(fn = oneStep, \
                                          sequences = [u], \
                                          initial_states = [x0,y0], \
                                          non_sequences = [W,W_in_1,W_in_2,W_feedback,W_out], \
@@ -107,7 +117,91 @@ the Theano variables needed we construct our RNN as follows :
Now ``x_vals`` and ``y_vals`` are symbolic variables pointing to the
sequence of x and y values generated by iterating over u. The
``sequence_taps`` and ``outputs_taps`` arguments give scan information
about exactly which slices are needed. Note that if we want to use
``x[t-k]`` we do not need to also have ``x[t-(k-1)], x[t-(k-2)], ..``,
but when applying the compiled function, the numpy array given to
represent this sequence should be large enough to cover these values.
Assume that we compile the above function, and we give as ``u`` the
array ``uvals = [0,1,2,3,4,5,6,7,8]``.
By abuse of notation, scan will consider ``uvals[0]`` as ``u[-4]``, and
will start scanning from ``uvals[4]`` towards the end.
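A small plain-Python illustration of this indexing (hypothetical, outside the scan API): with taps ``u[t-4]`` and ``u[t]``, the first usable step is ``t = 4``, so ``uvals[0]`` plays the role of ``u[-4]``:

```python
import numpy as np

# With taps u[t-4] and u[t], iteration can only start once four past
# values are available, i.e. at index 4 of the provided array.
uvals = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8])

pairs = []
for t in range(4, len(uvals)):
    u_tm4, u_t = uvals[t - 4], uvals[t]   # the two taps for this step
    pairs.append((u_tm4, u_t))
# the first step pairs uvals[0] (seen as u[-4]) with uvals[4]
```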
Using shared variables - Gibbs sampling
---------------------------------------
Another useful feature of scan is that it can handle shared variables.
For example, to implement a Gibbs chain of length 10 we would do
the following:
.. code-block:: python
W = theano.shared(W_values) # we assume that ``W_values`` contains the
                            # initial values of your weight matrix
bvis = theano.shared(bvis_values)
bhid = theano.shared(bhid_values)

trng = T.shared_randomstreams.RandomStreams(1234)

def OneStep(vsample):
    hmean = T.nnet.sigmoid(theano.dot(vsample, W) + bhid)
    hsample = trng.binomial(size = hmean.shape, n = 1, prob = hmean)
    vmean = T.nnet.sigmoid(theano.dot(hsample, W.T) + bvis)
    return trng.binomial(size = vsample.shape, n = 1, prob = vmean)

sample = theano.tensor.vector()

values, updates = theano.scan(OneStep, [], sample, [], n_steps = 10)

gibbs10 = theano.function([sample], values[-1], updates = updates)
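As a plain numpy sanity check of what one such Gibbs step computes (hypothetical shapes, with a numpy RNG standing in for the Theano random streams):

```python
import numpy as np

rng = np.random.default_rng(1234)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical numpy analogue of one Gibbs step: sample the hiddens from
# the visibles, then sample a new visible state from vmean.
def one_step(vsample, W, bvis, bhid):
    hmean = sigmoid(vsample @ W + bhid)
    hsample = rng.binomial(n=1, p=hmean)
    vmean = sigmoid(hsample @ W.T + bvis)
    return rng.binomial(n=1, p=vmean)

W = 0.1 * rng.standard_normal((6, 4))
v = rng.binomial(n=1, p=0.5, size=6)
for _ in range(10):            # a chain of length 10, as in the text
    v = one_step(v, W, np.zeros(6), np.zeros(4))
```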
Note that while we use the shared variables ``W``, ``bvis`` and ``bhid``,
we do not iterate over them (so scan doesn't really need to know
anything in particular about them, just that they are used inside the
function applied at each step), so you do not need to pass them as
arguments. Scan will find them on its own and add them to the graph. Of
course, if you wish to (and it is good practice) you can add them when
you call scan (they would be in the list of non-sequence inputs).
The second, and probably most crucial, observation is that the updates
dictionary becomes important in this case. It links a shared variable
with its updated value after k steps. In this case it tells how the
random streams get updated after 10 iterations. If you do not pass this
updates dictionary to your function, you will always get the same 10
sets of random numbers. You can even use the ``updates`` dictionary
afterwards. Look at this example:
.. code-block:: python
a = theano.shared(1)
values, updates = theano.scan(lambda : {a: a+1}, [], [], [], n_steps = 10)
In this case the lambda expression does not require any input parameters
and returns an updates dictionary which tells how ``a`` should be updated
after each step of scan. If we write:
.. code-block:: python
b = a + 1
c = updates[a] + 1
f = theano.function([], [b,c], updates = updates)

print f()
print a.value
We will see that because ``b`` does not use the updated version of
``a``, it will be 2, ``c`` will be 12, while ``a.value`` is ``11``.
If we call the function again, ``b`` will become 12, ``c`` will be 22
and ``a.value`` 21.
If we do not pass the ``updates`` dictionary to the function, then
``a.value`` will always remain 1, ``b`` will always be 2 and ``c``
will always be ``12``.
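The semantics can be mimicked in plain Python (a hypothetical model of the behaviour, not Theano code):

```python
# Plain-Python model of the behaviour described above. After 10 scan
# steps of a -> a + 1, the updates dictionary maps a to a + 10.
a = 1
updated_a = a + 10       # value of updates[a] for the current a
b = a + 1                # built from the pre-update value of a  -> 2
c = updated_a + 1        # built from the post-update value      -> 12
a = updated_a            # applying the updates: a becomes 11
```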
Reference
---------
@@ -82,16 +82,23 @@ def scan(fn, sequences, initial_states, non_sequences, inplace_map={}, \
* all time slices of the second output (as given in the ``initial_states`` list) ordered chronologically
* ...
* all other parameters over which scan doesn't iterate given in the same order as in ``non_sequences``
The outputs of this function should have the same order as in the list ``initial_states``.
If you are using shared variables over which you do not want to iterate, you do not need to provide them as
arguments to ``fn``, though you can if you wish so. The function should return the outputs after each step plus
the updates for any of the shared variables. You can either return only outputs or only updates. If you have
both outputs and updates the function should return them as a tuple: (outputs, updates) or (updates, outputs).
Outputs can be just a theano expression if you have only one output, or a list of theano expressions. Updates
can be given either as a list or as a dictionary. If you have a list of outputs, the order of these should
match that of their ``initial_states``.
:param sequences: list of Theano variables over which scan needs to iterate.
:param initial_states: list of Theano variables containing the initial state used for the output.
Note that if the function applied recursively uses only the previous value of the output or none, this initial state
should have the same shape as one time step of the output; otherwise, the
initial state should have the same number of dimensions as the output. This
can easily be understood through an example. For computing ``y[t]``, let us
assume that we need ``y[t-1]``, ``y[t-2]`` and ``y[t-4]``. By an abuse of notation,
when ``t = 0``, we would need values for ``y[-1]``, ``y[-2]`` and
``y[-4]``. These values are provided by the initial state of ``y``, which
should have the same number of dimensions as ``y``, where the first dimension should
be large enough to cover all past values, which in this case is 4.
@@ -146,6 +153,14 @@ def scan(fn, sequences, initial_states, non_sequences, inplace_map={}, \
classical BPTT, where you only do ``truncate_gradient`` number of steps.
:param go_backwards: Flag indicating if you should go backwards through the sequences.
:rtype: tuple
:return: tuple of the form (outputs, updates);
``outputs`` is either a Theano variable or a list of Theano variables
representing the outputs of scan. ``updates``
is a dictionary specifying the update rules for all shared
variables used in the scan operation; this dictionary should be passed
to ``theano.function``.
'''
# check if inputs are just single variables instead of lists
@@ -220,7 +235,7 @@ def scan(fn, sequences, initial_states, non_sequences, inplace_map={}, \
args += non_seqs
outputs_updates = fn(*args)
outputs = []
updates = {}
# we try now to separate the outputs from the updates
if not type(outputs_updates) in (list,tuple):
@@ -309,6 +324,8 @@ def scan(fn, sequences, initial_states, non_sequences, inplace_map={}, \
+ noshared
+ shared_non_seqs))
if not type(values) in (tuple, list):
    values = [values]
for k in update_map.keys():
    update_map[k] = values[update_map[k]]
@@ -296,7 +296,7 @@ class T_Scan(unittest.TestCase):
vW1 = vW1 + .1
vW2 = vW2 + .05
def test_9(self):
W_vals = numpy.random.rand(20,30) -.5
vis_val = numpy.random.binomial(1,0.5, size=(3,20))
@@ -344,6 +344,20 @@ class T_Scan(unittest.TestCase):
assert (compareArrays(t_res, n_res))
def test_10(self):
    s = theano.shared(1)
    def f_pow2():
        return {s: 2*s}
    n_steps = theano.tensor.dscalar()
    Y, updts = theano.scan(f_pow2, [], [], [], n_steps = n_steps)
    f1 = theano.function([n_steps], Y, updates = updts)
    f1(3)
    assert (compareArrays(s.value, 8))
'''
# test gradient simple network
def test_10(self):
@@ -356,6 +370,7 @@ class T_Scan(unittest.TestCase):
- test gradient (multiple outputs / some uncomputable )
- test gradient (truncate_gradient)
- test_gradient (taps past/future)
- optimization !?
'''