testgroup / pytensor · Commits · 3e0c5bb7

Commit 3e0c5bb7, authored Apr 24, 2015 by Cesar Laurent
Made the doc more clear.
Parent: 5ce67be5

Showing 1 changed file, doc/library/scan.txt, with 128 additions and 111 deletions.
...

@@ -210,82 +210,6 @@ with all values set to zero except at the provided array indices.

This demonstrates that you can introduce new Theano variables into a scan function.
Using shared variables - Gibbs sampling
---------------------------------------
...

@@ -317,15 +241,7 @@ the following:

    gibbs10 = theano.function([sample], values[-1], updates=updates)
The first, and probably most crucial, observation is that the updates
dictionary becomes important in this case. It links a shared variable
with its updated value after k steps. In this case it tells how the
random streams get updated after 10 iterations. If you do not pass this
...

@@ -355,51 +271,152 @@ after each step of scan. If we write :
We will see that because ``b`` does not use the updated version of
``a``, it will be 2, ``c`` will be 12, while ``a.value`` is ``11``.
If we call the function again, ``b`` will become 12, ``c`` will be 22
and ``a.value`` 21. If we do not pass the ``updates`` dictionary to the
function, then ``a.value`` will always remain 1, ``b`` will always be 2 and
``c`` will always be ``12``.
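The snippet those numbers refer to sits in the elided part of the diff; a
reconstruction of its shape (the exact code here is an assumption, not taken
from this commit) looks like:

.. code-block:: python

    import theano

    a = theano.shared(1)

    # each step schedules the update a <- a + 1, so after n_steps=10 the
    # updates dictionary maps a to an expression that evaluates to 11
    values, updates = theano.scan(lambda: {a: a + 1}, n_steps=10)

    b = a + 1           # built from the old a: evaluates to 2
    c = updates[a] + 1  # built from the updated a: evaluates to 12
    f = theano.function([], [b, c], updates=updates)

    f()  # returns [2, 12]; afterwards a.get_value() is 11
    f()  # returns [12, 22]; afterwards a.get_value() is 21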
The second observation is that if we use shared variables (``W``, ``bvis``,
``bhid``) but do not iterate over them (i.e., scan does not need to know
anything in particular about them, just that they are used inside the
function applied at each step), you do not need to pass them as arguments.
Scan will find them on its own and add them to the graph.
However, passing them to the scan function is a good practice, as it avoids
the Scan Op calling any earlier (external) Op over and over. This results in a
simpler computational graph, which speeds up the optimization and the
execution. To pass the shared variables to Scan you need to put them in a list
and give it to the ``non_sequences`` argument. Here is the Gibbs sampling code
updated:
.. code-block:: python

    W = theano.shared(W_values) # we assume that ``W_values`` contains the
                                # initial values of your weight matrix

    bvis = theano.shared(bvis_values)
    bhid = theano.shared(bhid_values)

    trng = T.shared_randomstreams.RandomStreams(1234)

    # OneStep, with explicit use of the shared variables (W, bvis, bhid)
    def OneStep(vsample, W, bvis, bhid):
        hmean = T.nnet.sigmoid(theano.dot(vsample, W) + bhid)
        hsample = trng.binomial(size=hmean.shape, n=1, p=hmean)
        vmean = T.nnet.sigmoid(theano.dot(hsample, W.T) + bvis)
        return trng.binomial(size=vsample.shape, n=1, p=vmean,
                             dtype=theano.config.floatX)

    sample = theano.tensor.vector()

    # The new scan, with the shared variables passed as non_sequences
    values, updates = theano.scan(fn=OneStep,
                                  outputs_info=sample,
                                  non_sequences=[W, bvis, bhid],
                                  n_steps=10)

    gibbs10 = theano.function([sample], values[-1], updates=updates)
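As a usage sketch (the shapes are hypothetical: assume ``W_values`` was
created with shape (784, 500), so a visible sample is a binary vector of
length 784):

.. code-block:: python

    import numpy

    # a random binary starting configuration for the visible units
    v0 = numpy.random.binomial(n=1, p=0.5,
                               size=(784,)).astype(theano.config.floatX)
    v10 = gibbs10(v0)  # the visible sample after 10 Gibbs steps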
Using shared variables - the strict flag
----------------------------------------
As we just saw, passing the shared variables to scan may result in a simpler
computational graph, which speeds up the optimization and the execution. A
good way to remember to pass every shared variable used during scan is to use
the ``strict`` flag. When set to true, scan assumes that all the necessary
shared variables in ``fn`` are passed as a part of ``non_sequences``. This has
to be ensured by the user. Otherwise, it will result in an error.
Using the previous Gibbs sampling example:
.. code-block:: python

    # The new scan, using strict=True
    values, updates = theano.scan(fn=OneStep,
                                  outputs_info=sample,
                                  non_sequences=[W, bvis, bhid],
                                  n_steps=10,
                                  strict=True)
If you forget to pass ``W``, ``bvis`` or ``bhid`` as a ``non_sequence``, it
will result in an error.
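To see what the flag protects against, consider a hypothetical variant of
``OneStep`` (not part of this commit) that closes over the shared variables
instead of receiving them as arguments; the call below fails under
``strict=True``, whereas without the flag scan would silently collect ``W``,
``bvis`` and ``bhid`` itself:

.. code-block:: python

    def OneStepImplicit(vsample):
        # W, bvis and bhid are taken from the enclosing scope, not declared
        hmean = T.nnet.sigmoid(theano.dot(vsample, W) + bhid)
        hsample = trng.binomial(size=hmean.shape, n=1, p=hmean)
        vmean = T.nnet.sigmoid(theano.dot(hsample, W.T) + bvis)
        return trng.binomial(size=vsample.shape, n=1, p=vmean,
                             dtype=theano.config.floatX)

    # raises an error: the shared variables are used by fn but were not
    # passed through non_sequences
    values, updates = theano.scan(fn=OneStepImplicit,
                                  outputs_info=sample,
                                  n_steps=10,
                                  strict=True)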
Multiple outputs, several taps values - Recurrent Neural Network with Scan
--------------------------------------------------------------------------

The examples above showed simple uses of scan. However, scan also supports
referring not only to the prior result and the current sequence value, but
also to values further back than one step.

This is needed, for example, to implement an RNN using scan. Assume
that our RNN is defined as follows:

.. math::

  x(n) = \tanh( W x(n-1) + W^{in}_1 u(n) + W^{in}_2 u(n-4) +
                W^{feedback} y(n-1) )

  y(n) = W^{out} x(n-3)

Note that this network is far from a classical recurrent neural
network and might be useless. The reason we define it this way
is to better illustrate the features of scan.

In this case we have a sequence over which we need to iterate, ``u``,
and two outputs, ``x`` and ``y``. To implement this with scan we first
construct a function that computes one iteration step:

.. code-block:: python

    def oneStep(u_tm4, u_t, x_tm3, x_tm1, y_tm1,
                W, W_in_1, W_in_2, W_feedback, W_out):

        x_t = T.tanh(theano.dot(x_tm1, W) +
                     theano.dot(u_t, W_in_1) +
                     theano.dot(u_tm4, W_in_2) +
                     theano.dot(y_tm1, W_feedback))
        y_t = theano.dot(x_tm3, W_out)

        return [x_t, y_t]
As a naming convention for the variables we use ``a_tmb`` to mean ``a`` at
``t-b`` and ``a_tpb`` to mean ``a`` at ``t+b``.

Note the order in which the parameters are given and in which the result is
returned. Try to respect the chronological order among the taps (time slices
of sequences or outputs) used: scan requires the variables representing the
different time taps to appear in the same order as the one in which the taps
are declared. Likewise, the variables themselves must respect an order, since
this is how scan figures out what should be represented by what.
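To make that ordering concrete, here is a schematic (comments only, not
executable scan internals) of the call scan performs at an internal step
``t``, given the taps declared in the scan call below:

.. code-block:: python

    # at each step t, scan passes the requested time slices to oneStep,
    # oldest tap first within each variable, sequences before outputs,
    # and non_sequences last:
    #
    #   x_t, y_t = oneStep(u[t-4], u[t],    # sequence u, taps [-4, 0]
    #                      x[t-3], x[t-1],  # output x, taps [-3, -1]
    #                      y[t-1],          # output y, default tap [-1]
    #                      W, W_in_1, W_in_2, W_feedback, W_out)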
Given that we have all the Theano variables needed, we construct our RNN as
follows:

.. code-block:: python

    u = T.matrix()  # it is a sequence of vectors
    x0 = T.matrix() # initial state of x has to be a matrix, since
                    # it has to cover x[-3]
    y0 = T.vector() # y0 is just a vector since scan has only to provide
                    # y[-1]

    ([x_vals, y_vals], updates) = theano.scan(fn=oneStep,
                                              sequences=dict(input=u, taps=[-4, -0]),
                                              outputs_info=[dict(initial=x0, taps=[-3, -1]), y0],
                                              non_sequences=[W, W_in_1, W_in_2, W_feedback, W_out],
                                              strict=True)
    # for the second output y, scan adds -1 to the output taps by default

Now ``x_vals`` and ``y_vals`` are symbolic variables pointing to the
sequences of x and y values generated by iterating over u. The
``sequence_taps`` and ``outputs_taps`` give scan the information about which
slices exactly are needed. Note that if we want to use ``x[t-k]`` we do
not need to also have ``x[t-(k-1)], x[t-(k-2)], ...``, but when applying
the compiled function, the numpy array given to represent this sequence
should be large enough to cover these values. Assume that we compile the
above function and give as ``u`` the array ``uvals = [0, 1, 2, 3, 4, 5, 6, 7, 8]``.
By abusing notation, scan will consider ``uvals[0]`` as ``u[-4]``, and
will start scanning from ``uvals[4]`` towards the end.
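To close the example, a compilation-and-call sketch under assumed shapes
(length-2 state and input vectors, with ``W``, ``W_in_1``, ``W_in_2``,
``W_feedback`` and ``W_out`` already created as shared variables of
compatible shapes; none of this is pinned down by the doc):

.. code-block:: python

    import numpy

    f = theano.function([u, x0, y0], [x_vals, y_vals], updates=updates)

    uvals = numpy.ones((9, 2), dtype=theano.config.floatX)    # uvals[0] plays u[-4]
    x0vals = numpy.zeros((3, 2), dtype=theano.config.floatX)  # rows for x[-3], x[-2], x[-1]
    y0vals = numpy.zeros((2,), dtype=theano.config.floatX)    # y[-1]

    # 9 input rows minus the 4 consumed by the u[t-4] tap leave 5 scan steps
    xs, ys = f(uvals, x0vals, y0vals)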
Conditional ending of Scan

...