Commit bf473a1f, authored by Frédéric Bastien

Merge pull request #2942 from carriepl/add_doc_scan_performance

Add doc on optimizing scan's performance
@@ -210,6 +210,8 @@ with all values set to zero except at the provided array indices.
This demonstrates that you can introduce new Theano variables into a scan function.
.. _lib_scan_shared_variables:
Using shared variables - Gibbs sampling
---------------------------------------
@@ -316,6 +318,7 @@ updated:
gibbs10 = theano.function([sample], values[-1], updates=updates)
.. _lib_scan_strict:
Using shared variables - the strict flag
----------------------------------------
@@ -454,6 +457,78 @@ As a rule, scan always expects the condition to be the last thing returned
by the inner function, otherwise an error will be raised.
Optimizing Scan's performance
-----------------------------
This section covers some ways to improve performance of a Theano function
using Scan.
Minimizing Scan usage
^^^^^^^^^^^^^^^^^^^^^
Scan makes it possible to define simple and compact graphs that can do the
same work as much larger and more complicated graphs. However, it comes with
a significant overhead. As such, when performance is the objective, a good
rule of thumb is to perform as much of the computation as possible outside of
Scan. This may have the effect of increasing memory usage but can also
reduce the overhead introduced by using Scan.
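
As a rough illustration of this rule of thumb, the sketch below uses plain NumPy, with a Python ``for`` loop standing in for Scan (an assumption made so the example runs without Theano). The names ``rnn_all_inside`` and ``rnn_hoisted`` and the toy recurrence are hypothetical. The input projection does not depend on the recurrent state, so it can be computed for all time steps in one matrix multiplication before the loop, at the cost of storing the full projected array.

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps, n_in, n_hid = 6, 4, 3
X = rng.standard_normal((n_steps, n_in))   # one input row per time step
W = rng.standard_normal((n_in, n_hid))     # input-to-hidden weights
U = rng.standard_normal((n_hid, n_hid))    # hidden-to-hidden weights


def rnn_all_inside(X, W, U):
    """Every operation, including x_t @ W, runs inside the loop."""
    h = np.zeros(U.shape[0])
    for x_t in X:
        h = np.tanh(x_t @ W + h @ U)
    return h


def rnn_hoisted(X, W, U):
    """The input projection is hoisted out of the loop."""
    XW = X @ W  # a single larger matmul; stores n_steps * n_hid values
    h = np.zeros(U.shape[0])
    for xw_t in XW:
        # Only the part that depends on the previous state stays in the loop.
        h = np.tanh(xw_t + h @ U)
    return h


h_inside = rnn_all_inside(X, W, U)
h_hoisted = rnn_hoisted(X, W, U)
```

Both functions compute the same final state. In a Theano graph, the analogous change would be to compute the projection before calling ``scan()`` and pass the result in as a sequence.
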
Explicitly passing inputs of the inner function to scan
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
It is possible, inside of Scan, to use variables previously defined outside of
the Scan without explicitly passing them as inputs to the Scan. However, it is
often more efficient to explicitly pass them as non-sequence inputs instead.
Section :ref:`lib_scan_shared_variables` provides an explanation for this and
section :ref:`lib_scan_strict` describes the *strict* flag, a tool that Scan
provides to help ensure that the inputs to the function inside Scan have all
been provided as explicit inputs to the ``scan()`` function.
Deactivating garbage collection in Scan
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Deactivating the garbage collection for Scan can allow it to reuse memory
between executions instead of always having to allocate new memory. This can
improve performance at the cost of increased memory usage. By default, Scan
reuses memory between iterations of the same execution but frees the memory
after the last iteration.
There are two ways to deactivate garbage collection for Scan: set the Theano
flag ``config.scan.allow_gc`` to False, or pass ``allow_gc=False`` to
``theano.scan()``. When no value is provided for this argument, the value of
the flag ``config.scan.allow_gc`` is used.
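
As a sketch of the flag-based approach, assuming Theano's usual convention of mapping a dotted flag name to a section of the ``.theanorc`` configuration file, the setting might look like the fragment below. The same flag can presumably also be set for a single run through the ``THEANO_FLAGS`` environment variable, as ``THEANO_FLAGS='scan.allow_gc=False'``.

```ini
# ~/.theanorc (assumed location of the user configuration file)
[scan]
# Keep Scan's intermediate buffers allocated between function calls.
allow_gc = False
```

Setting the flag affects every Scan in the program, whereas the ``allow_gc`` argument of ``theano.scan()`` applies only to the Scan being built.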
Graph optimizations
^^^^^^^^^^^^^^^^^^^
This one is simple but still worth pointing out. Theano is able to
automatically recognize and optimize many computation patterns. However, there
are patterns that Theano doesn't optimize because doing so would change the
user interface (such as merging shared variables together into a single one,
for instance). Additionally, Theano doesn't catch every case that it could
optimize, so it remains useful for performance that the user define an
efficient graph in the first place. This is also the case, and sometimes even
more so, for the graph inside of Scan, because that graph is executed
many times for every execution of the Theano function that contains it.
The `LSTM tutorial <http://deeplearning.net/tutorial/lstm.html>`_ on
`DeepLearning.net <http://deeplearning.net>`_ provides an example of an
optimization that Theano cannot perform. Instead of performing many matrix
multiplications between matrix :math:`x_t` and each of the shared matrices
:math:`W_i`, :math:`W_c`, :math:`W_f` and :math:`W_o`, the matrices
:math:`W_*` are merged into a single shared matrix :math:`W` and the graph
performs a single larger matrix multiplication between :math:`W` and
:math:`x_t`. The resulting matrix is then sliced to obtain the results that
the small individual matrix multiplications would have produced. This
optimization replaces several small and inefficient matrix multiplications by
a single larger one and thus improves performance at the cost of a potentially
higher memory usage.
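
The equivalence behind this optimization can be checked with a small NumPy sketch. This is not the tutorial's code: the shapes, the stacking order and the helper ``gate_slice`` are assumptions chosen for illustration, and NumPy arrays stand in for Theano shared variables.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 5, 3
x_t = rng.standard_normal(n_in)

# Four separate weight matrices, one per gate (illustrative shapes).
W_i, W_c, W_f, W_o = (rng.standard_normal((n_hid, n_in)) for _ in range(4))

# Baseline: four small matrix-vector products.
separate = [W_k @ x_t for W_k in (W_i, W_c, W_f, W_o)]

# Merged version: stack the matrices once, then do one larger product.
W = np.concatenate([W_i, W_c, W_f, W_o], axis=0)  # shape (4 * n_hid, n_in)
merged = W @ x_t


def gate_slice(v, k, n):
    """Return the k-th block of length n from the merged result."""
    return v[k * n:(k + 1) * n]


sliced = [gate_slice(merged, k, n_hid) for k in range(4)]
```

Each entry of ``sliced`` matches the corresponding entry of ``separate``. In the Theano setting the stacking would be done once, when the shared variable is created, so that only the single large product and the slicing remain inside Scan.
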
reference
=========
......