Commit f7302dfd, authored by --global

Document the types of optimizations that Theano cannot do.

Parent 1045a27a
@@ -507,24 +507,27 @@ Graph optimizations
 ^^^^^^^^^^^^^^^^^^^
 
 This one is simple but still worth pointing out. Theano is able to
-automatically recognize and optimize many computation patterns. However, it
-doesn't catch every case that could be optimized and it remains useful for
-performance that the user defines an efficient graph in the first place. This
-is also the case, and sometimes even more so, for the graph inside of Scan.
-This is because it will be executed many times for every execution of the
-Theano function that contains it.
+automatically recognize and optimize many computation patterns. However, there
+are patterns that Theano doesn't optimize because doing so would change the
+user interface (such as merging shared variables together into a single one,
+for instance). Additionally, Theano doesn't catch every case that it could
+optimize and so it remains useful for performance that the user defines an
+efficient graph in the first place. This is also the case, and sometimes even
+more so, for the graph inside of Scan. This is because it will be executed
+many times for every execution of the Theano function that contains it.
 
 The `LSTM tutorial <http://deeplearning.net/tutorial/lstm.html>`_ on
-`DeepLearning.net <http://deeplearning.net>`_ provides an example of such
-optimization. Instead of performing many matrix multiplications between matrix
-:math:`x_t` and each of the matrices :math:`W_i`, :math:`W_c`, :math:`W_f` and
-:math:`W_o`, the matrices :math:`W_*`, are concatenated into a single matrix
-:math:`W` and the graph performs a single larger matrix multiplication
-between :math:`W` and :math:`x_t`. The resulting matrix is then sliced to
-obtain the results of that the small individual matrix multiplications
-would have produced. This optimization replaces many small and inefficient
-matrix multiplications but a single larger one and thus improves performance
-at the cost of a potentially higher memory usage.
+`DeepLearning.net <http://deeplearning.net>`_ provides an example of an
+optimization that Theano cannot perform. Instead of performing many matrix
+multiplications between matrix :math:`x_t` and each of the shared matrices
+:math:`W_i`, :math:`W_c`, :math:`W_f` and :math:`W_o`, the matrices
+:math:`W_*` are merged into a single shared matrix :math:`W` and the graph
+performs a single larger matrix multiplication between :math:`W` and
+:math:`x_t`. The resulting matrix is then sliced to obtain the results that
+the small individual matrix multiplications would have produced. This
+optimization replaces many small and inefficient matrix multiplications by a
+single larger one and thus improves performance at the cost of a potentially
+higher memory usage.
 
 reference
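
For illustration, the merged-matrix trick described in the added text can be sketched in plain NumPy. This is not Theano code and not the tutorial's implementation: the sizes ``n_in`` and ``n_hid``, the row-vector convention ``x_t @ W``, and the helper ``gate_slice`` are assumptions made only for this example. It checks that one large product followed by slicing gives the same results as four small products.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 4, 3  # hypothetical sizes, chosen only for illustration

x_t = rng.standard_normal((2, n_in))  # a batch of 2 input rows
W_i, W_c, W_f, W_o = (rng.standard_normal((n_in, n_hid)) for _ in range(4))

# Unmerged version: four small matrix multiplications.
separate = [x_t @ W_k for W_k in (W_i, W_c, W_f, W_o)]

# Merged version: concatenate the four matrices column-wise once,
# then do a single larger multiplication.
W = np.concatenate([W_i, W_c, W_f, W_o], axis=1)  # shape (n_in, 4 * n_hid)
merged = x_t @ W                                  # shape (2, 4 * n_hid)


def gate_slice(z, k, n):
    """Return the k-th block of n columns from the merged result."""
    return z[:, k * n:(k + 1) * n]


sliced = [gate_slice(merged, k, n_hid) for k in range(4)]

# Each slice should match the corresponding small product.
matches = all(np.allclose(a, b) for a, b in zip(separate, sliced))
print(matches)
```

In a Theano graph the analogous change has to be made by the user, since it means storing a single shared matrix instead of four, which is the kind of interface change the commit says Theano will not make on its own.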