testgroup / pytensor · Commits

Commit 93ca140a, authored May 21, 2015 by --global

    Add section about scan performance in doc

Parent: b40ba487
Showing 1 changed file with 70 additions and 0 deletions.

doc/library/scan.txt (+70, -0)

@@ -210,6 +210,8 @@ with all values set to zero except at the provided array indices.

This demonstrates that you can introduce new Theano variables into a scan function.

.. _lib_scan_shared_variables:

Using shared variables - Gibbs sampling
---------------------------------------

@@ -316,6 +318,7 @@ updated:

    gibbs10 = theano.function([sample], values[-1], updates=updates)

.. _lib_scan_strict:

Using shared variables - the strict flag
----------------------------------------

@@ -454,6 +457,73 @@ As a rule, scan always expects the condition to be the last thing returned
by the inner function, otherwise an error will be raised.

Optimizing Scan's performance
-----------------------------
This section covers some ways to improve the performance of a Theano function
that uses Scan.
Minimizing Scan usage
^^^^^^^^^^^^^^^^^^^^^
Scan makes it possible to define simple and compact graphs that can do the
same work as much larger and more complicated graphs. However, it comes with
a significant overhead. As such, when performance is the objective, a good
rule of thumb is to perform as much of the computation as possible outside of
Scan. This may have the effect of increasing memory usage but can also
reduce the overhead introduced by using Scan.
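As an illustration of the idea (using NumPy arrays as a stand-in for the
symbolic graph; the names here are made up for the example), an elementwise
computation that does not depend on previous steps can be applied to the whole
sequence at once instead of once per step inside the loop:

```python
import numpy as np

x = np.random.RandomState(0).rand(100, 5)  # a "sequence" of 100 steps

# Loop version: the elementwise computation runs once per step,
# analogous to placing it inside the Scan inner function.
out_loop = np.stack([np.tanh(x[t]) * 2.0 for t in range(x.shape[0])])

# Hoisted version: the same computation applied to the full sequence,
# analogous to moving the op outside of Scan.
out_vec = np.tanh(x) * 2.0

assert np.allclose(out_loop, out_vec)
```

Both versions produce the same result; the hoisted one avoids paying the
per-step loop overhead for work that was never step-dependent.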
Passing shared variables as non-sequences
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
It is possible, inside of Scan, to use shared variables previously defined
outside of the Scan without explicitly passing them as inputs to the Scan.
However, it is often more efficient to explicitly pass them as non-sequence
inputs instead. Section :ref:`lib_scan_shared_variables` provides an
explanation for this and section :ref:`lib_scan_strict` describes the *strict*
flag, a tool that Scan provides to help ensure that the shared variables are
correctly passed as non-sequence inputs to Scan.
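The difference between implicit and explicit use can be sketched in plain
Python (this only illustrates the calling convention; in Theano the analogue
of the explicit form is listing the shared variable in the ``non_sequences``
argument of ``theano.scan()``, optionally with ``strict=True``):

```python
import numpy as np

W = np.eye(3) * 2.0  # stands in for a shared variable defined outside the loop

# Implicit use: the step function closes over W, analogous to using a
# shared variable inside Scan without declaring it as an input.
def step_implicit(x_t):
    return W.dot(x_t)

# Explicit use: W is declared as an argument, analogous to passing the
# shared variable through non_sequences, which strict=True enforces.
def step_explicit(x_t, W):
    return W.dot(x_t)

x = np.ones(3)
assert np.allclose(step_implicit(x), step_explicit(x, W))
```

Both forms compute the same thing; declaring the dependency explicitly is what
lets Scan treat the variable as a known input rather than a hidden one.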
Deactivating garbage collecting in Scan
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Deactivating the garbage collection for Scan can allow it to reuse memory
between executions instead of always having to allocate new memory. This can
improve performance at the cost of increased memory usage.
There are two ways to achieve this: setting the Theano flag
``config.scan.allow_gc`` to False, or setting the ``allow_gc`` argument of
``theano.scan()`` to False (when no value is provided for this argument, the
value of the flag ``config.scan.allow_gc`` is used).
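Both ways can be written as follows (a configuration sketch; the inner
function ``step`` and the sequence ``x`` are placeholders standing for
whatever the surrounding Scan actually uses):

```python
# Way 1: the Theano flag, e.g. from the command line:
#   THEANO_FLAGS='scan.allow_gc=False' python my_script.py
# or in code, before building the graph:
import theano
theano.config.scan.allow_gc = False

# Way 2: the per-Scan argument, which overrides the flag for this Scan only:
values, updates = theano.scan(fn=step,
                              sequences=[x],
                              allow_gc=False)
```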
Graph optimizations
^^^^^^^^^^^^^^^^^^^
This one is simple but still worth pointing out. Theano is able to
automatically recognize and optimize many computation patterns. However, it
doesn't catch every case that could be optimized, so it remains important for
performance that the user define an efficient graph in the first place. This
is also the case, and sometimes even more so, for the graph inside of Scan.
This is because it will be executed many times for every execution of the
Theano function that contains it.
The `LSTM tutorial <http://deeplearning.net/tutorial/lstm.html>`_ on
`DeepLearning.net <http://deeplearning.net>`_ provides an example of such
optimization. Instead of performing many matrix multiplications between matrix
:math:`x_t` and each of the matrices :math:`W_i`, :math:`W_c`, :math:`W_f` and
:math:`W_o`, the matrices :math:`W_*` are concatenated into a single matrix
:math:`W` and the graph performs a single larger matrix multiplication
between :math:`W` and :math:`x_t`. The resulting matrix is then sliced to
obtain the results that the individual small matrix multiplications would
have produced. This optimization replaces many small and inefficient matrix
multiplications with a single larger one and thus improves performance at the
cost of potentially higher memory usage.
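The trick can be checked numerically with NumPy (a sketch with made-up
dimensions; in the tutorial the concatenation is done once, on the weight
matrices of the Theano graph):

```python
import numpy as np

rng = np.random.RandomState(0)
dim = 4
x_t = rng.rand(dim)

# Four separate weight matrices, as in the naive formulation.
W_i, W_c, W_f, W_o = [rng.rand(dim, dim) for _ in range(4)]

# Naive version: four small matrix-vector products.
separate = [W_k.dot(x_t) for W_k in (W_i, W_c, W_f, W_o)]

# Optimized version: one product with the concatenated matrix,
# then slicing the result back into its four parts.
W = np.concatenate([W_i, W_c, W_f, W_o], axis=0)  # shape (4 * dim, dim)
combined = W.dot(x_t)                             # one larger product
slices = [combined[k * dim:(k + 1) * dim] for k in range(4)]

for a, b in zip(separate, slices):
    assert np.allclose(a, b)
```

The single large product does the same arithmetic as the four small ones but
amortizes the per-call overhead, which is why it tends to run faster.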
reference
=========

...