Commit edbf47e0 authored by Mehdi Mirza

Merge pull request #2816 from Thrandis/ccw

Added doc for strict flag.
@@ -210,6 +210,138 @@ with all values set to zero except at the provided array indices.
This demonstrates that you can introduce new Theano variables into a scan function.
Using shared variables - Gibbs sampling
---------------------------------------
Another useful feature of scan is that it can handle shared variables.
For example, if we want to implement a Gibbs chain of length 10, we would do
the following:

.. code-block:: python

    W = theano.shared(W_values)  # we assume that ``W_values`` contains the
                                 # initial values of your weight matrix

    bvis = theano.shared(bvis_values)
    bhid = theano.shared(bhid_values)

    trng = T.shared_randomstreams.RandomStreams(1234)

    def OneStep(vsample):
        hmean = T.nnet.sigmoid(theano.dot(vsample, W) + bhid)
        hsample = trng.binomial(size=hmean.shape, n=1, p=hmean)
        vmean = T.nnet.sigmoid(theano.dot(hsample, W.T) + bvis)
        return trng.binomial(size=vsample.shape, n=1, p=vmean,
                             dtype=theano.config.floatX)

    sample = theano.tensor.vector()

    values, updates = theano.scan(OneStep, outputs_info=sample, n_steps=10)

    gibbs10 = theano.function([sample], values[-1], updates=updates)
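
To make this concrete, here is a minimal usage sketch. The shapes (784
visible units, 500 hidden units) and the initial sample are illustrative
assumptions, not part of the original example:

.. code-block:: python

    import numpy as np

    rng = np.random.RandomState(0)

    # hypothetical shapes: 784 visible units, 500 hidden units
    W_values = rng.uniform(-0.1, 0.1, (784, 500)).astype(theano.config.floatX)
    bvis_values = np.zeros(784, dtype=theano.config.floatX)
    bhid_values = np.zeros(500, dtype=theano.config.floatX)

    # ... build W, bvis, bhid, trng, OneStep and gibbs10 as above ...

    v0 = rng.binomial(1, 0.5, 784).astype(theano.config.floatX)
    v10 = gibbs10(v0)  # the visible sample after 10 Gibbs steps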
The first, and probably most crucial, observation is that the updates
dictionary becomes important in this case. It links a shared variable
with its updated value after k steps. In this case it tells how the
random streams get updated after 10 iterations. If you do not pass this
updates dictionary to your function, you will always get the same 10
sets of random numbers. You can even use the ``updates`` dictionary
afterwards. Look at this example:

.. code-block:: python

    a = theano.shared(1)
    values, updates = theano.scan(lambda: {a: a+1}, n_steps=10)
In this case the lambda expression does not require any input parameters
and returns an updates dictionary which tells how ``a`` should be updated
after each step of scan. If we write:

.. code-block:: python

    b = a + 1
    c = updates[a] + 1
    f = theano.function([], [b, c], updates=updates)

    print(f())            # the outputs are computed before the update is applied
    print(a.get_value())  # the shared variable has now been updated
We will see that because ``b`` does not use the updated version of
``a``, the call returns 2 for ``b`` and 12 for ``c``, while ``a.get_value()``
afterwards returns 11. If we call the function again, ``b`` becomes 12,
``c`` becomes 22 and ``a.get_value()`` returns 21. If we do not pass the
``updates`` dictionary to the function, then ``a`` always remains 1, ``b``
is always 2 and ``c`` is always 12.
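
For contrast, here is a small sketch of the no-updates case, assuming a
freshly created ``a`` (names as above):

.. code-block:: python

    a = theano.shared(1)
    values, updates = theano.scan(lambda: {a: a+1}, n_steps=10)

    b = a + 1
    c = updates[a] + 1
    g = theano.function([], [b, c])  # note: updates NOT passed

    print(g())            # always [array(2), array(12)]
    print(a.get_value())  # still 1, on every call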
The second observation is that if we use shared variables (``W``, ``bvis``,
``bhid``) but do not iterate over them (i.e., scan doesn't really need to know
anything in particular about them, just that they are used inside the
function applied at each step), you do not need to pass them as arguments.
Scan will find them on its own and add them to the graph.
However, passing them to the scan function is good practice, as it prevents
the Scan Op from calling any earlier (external) Op over and over. This results
in a simpler computational graph, which speeds up the optimization and the
execution. To pass the shared variables to Scan, you need to put them in a
list and give it to the ``non_sequences`` argument. Here is the Gibbs sampling
code updated:

.. code-block:: python

    W = theano.shared(W_values)  # we assume that ``W_values`` contains the
                                 # initial values of your weight matrix

    bvis = theano.shared(bvis_values)
    bhid = theano.shared(bhid_values)

    trng = T.shared_randomstreams.RandomStreams(1234)

    # OneStep, with explicit use of the shared variables (W, bvis, bhid)
    def OneStep(vsample, W, bvis, bhid):
        hmean = T.nnet.sigmoid(theano.dot(vsample, W) + bhid)
        hsample = trng.binomial(size=hmean.shape, n=1, p=hmean)
        vmean = T.nnet.sigmoid(theano.dot(hsample, W.T) + bvis)
        return trng.binomial(size=vsample.shape, n=1, p=vmean,
                             dtype=theano.config.floatX)

    sample = theano.tensor.vector()

    # The new scan, with the shared variables passed as non_sequences
    values, updates = theano.scan(fn=OneStep,
                                  outputs_info=sample,
                                  non_sequences=[W, bvis, bhid],
                                  n_steps=10)

    gibbs10 = theano.function([sample], values[-1], updates=updates)
Using shared variables - the strict flag
----------------------------------------
As we just saw, passing the shared variables to scan may result in a simpler
computational graph, which speeds up the optimization and the execution. A
good way to remember to pass every shared variable used during scan is to use
the ``strict`` flag. When set to ``True``, scan assumes that all the necessary
shared variables in ``fn`` are passed as part of ``non_sequences``. The user
has to ensure this; otherwise scan raises an error.
Using the previous Gibbs sampling example:

.. code-block:: python

    # The new scan, using strict=True
    values, updates = theano.scan(fn=OneStep,
                                  outputs_info=sample,
                                  non_sequences=[W, bvis, bhid],
                                  n_steps=10,
                                  strict=True)
If you omit ``W``, ``bvis`` or ``bhid`` from ``non_sequences``, scan will
raise an error.
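
To see what ``strict`` protects against, consider the closure-based
``OneStep`` from the first Gibbs example, which reads ``W``, ``bvis`` and
``bhid`` from the enclosing scope. The sketch below (the exact error type and
message depend on the Theano version) fails at graph-construction time instead
of silently pulling the shared variables into the graph:

.. code-block:: python

    # OneStep here is the original closure-based version
    values, updates = theano.scan(fn=OneStep,
                                  outputs_info=sample,
                                  n_steps=10,   # no non_sequences given
                                  strict=True)  # raises a missing-input error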
Multiple outputs, several taps values - Recurrent Neural Network with Scan
--------------------------------------------------------------------------
@@ -238,7 +370,7 @@ construct a function that computes one iteration step:
def oneStep(u_tm4, u_t, x_tm3, x_tm1, y_tm1, W, W_in_1, W_in_2, W_feedback, W_out):

    x_t = T.tanh(theano.dot(x_tm1, W) + \
                 theano.dot(u_t, W_in_1) + \
                 theano.dot(u_tm4, W_in_2) + \
                 theano.dot(y_tm1, W_feedback))
@@ -266,10 +398,11 @@ the Theano variables needed we construct our RNN as follows:

# y[-1]

([x_vals, y_vals], updates) = theano.scan(fn=oneStep,
                                          sequences=dict(input=u, taps=[-4,-0]),
                                          outputs_info=[dict(initial=x0, taps=[-3,-1]), y0],
                                          non_sequences=[W, W_in_1, W_in_2, W_feedback, W_out],
                                          strict=True)

# for second input y, scan adds -1 in output_taps by default
@@ -286,81 +419,6 @@ By abusing notation, scan will consider ``uvals[0]`` as ``u[-4]``, and
will start scanning from ``uvals[4]`` towards the end.
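
As a concrete illustration of that alignment (a toy sketch, not part of the
original text): with ``taps=[-4, 0]``, step ``k`` of scan receives
``uvals[k]`` as ``u_tm4`` and ``uvals[k+4]`` as ``u_t``:

.. code-block:: python

    import numpy as np

    uvals = np.arange(10).astype(theano.config.floatX)  # toy input sequence

    # with taps=[-4, 0], scan performs len(uvals) - 4 = 6 steps:
    #   step 0: u_tm4 = uvals[0], u_t = uvals[4]
    #   step 1: u_tm4 = uvals[1], u_t = uvals[5]
    #   ...
    #   step 5: u_tm4 = uvals[5], u_t = uvals[9]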
Conditional ending of Scan
--------------------------
...