Document the types of optimizations that Theano cannot do.

--- a/doc/library/scan.txt
+++ b/doc/library/scan.txt
@@ -507,24 +507,27 @@ Graph optimizations
 ^^^^^^^^^^^^^^^^^^^
 This one is simple but still worth pointing out. Theano is able to
-automatically recognize and optimize many computation patterns. However, it
+automatically recognize and optimize many computation patterns. However, there
-doesn't catch every case that could be optimized and it remains useful for
+are patterns that Theano doesn't optimize because doing so would change the
-performance that the user defines an efficient graph in the first place. This
+user interface (for instance, merging several shared variables into a single
-is also the case, and sometimes even more so, for the graph inside of Scan.
+one). Additionally, Theano doesn't catch every case that it could optimize,
-This is because it will be executed many times for every execution of the
+so, for performance, it remains useful for the user to define an efficient
-Theano function that contains it.
+graph in the first place. This is also the case, and sometimes even more so,
+for the graph inside of Scan, because that graph is executed many times for
+every execution of the Theano function that contains it.
 The `LSTM tutorial <http://deeplearning.net/tutorial/lstm.html>`_ on
-`DeepLearning.net <http://deeplearning.net>`_ provides an example of such
+`DeepLearning.net <http://deeplearning.net>`_ provides an example of an
-optimization. Instead of performing many matrix multiplications between matrix
+optimization that Theano cannot perform. Instead of performing many matrix
-:math:`x_t` and each of the matrices :math:`W_i`, :math:`W_c`, :math:`W_f` and
+multiplications between matrix :math:`x_t` and each of the shared matrices
-:math:`W_o`, the matrices :math:`W_*`, are concatenated into a single matrix
+:math:`W_i`, :math:`W_c`, :math:`W_f` and :math:`W_o`, the matrices
-:math:`W` and the graph performs a single larger matrix multiplication
+:math:`W_*` are merged into a single shared matrix :math:`W` and the graph
-between :math:`W` and :math:`x_t`. The resulting matrix is then sliced to
+performs a single larger matrix multiplication between :math:`W` and
-obtain the results of that the small individual matrix multiplications
+:math:`x_t`. The resulting matrix is then sliced to obtain the results that
-would have produced. This optimization replaces many small and inefficient
+the small individual matrix multiplications would have produced. This
-matrix multiplications but a single larger one and thus improves performance
+optimization replaces many small and inefficient matrix multiplications by a
-at the cost of a potentially higher memory usage.
+single larger one and thus improves performance at the cost of a potentially
+higher memory usage.
 reference
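The merged-weight rewrite described in the new LSTM paragraph can be checked numerically. The sketch below uses NumPy in place of Theano so that it runs without a Theano install; in a Theano graph, :math:`W` would instead be one shared variable and the products would be symbolic. The shapes (``x_t`` as a batch of row vectors of size ``n_in``, each gate matrix of shape ``(n_in, n_hid)``) and the helper ``slice_gate`` are assumptions for illustration, not taken from the LSTM tutorial.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, batch = 8, 5, 3  # illustrative sizes (assumed)

# Input at one time step: a batch of row vectors.
x_t = rng.standard_normal((batch, n_in))

# Four separate gate weight matrices, one per LSTM gate.
W_i, W_c, W_f, W_o = (rng.standard_normal((n_in, n_hid)) for _ in range(4))

# Unmerged graph: four small matrix multiplications.
separate = [x_t @ W_g for W_g in (W_i, W_c, W_f, W_o)]

# Merged graph: concatenate the gate matrices along the output axis
# and perform a single larger matrix multiplication.
W = np.concatenate([W_i, W_c, W_f, W_o], axis=1)  # shape (n_in, 4 * n_hid)
merged = x_t @ W                                  # shape (batch, 4 * n_hid)


def slice_gate(m, k, n):
    """Hypothetical helper: return the k-th block of n columns of m."""
    return m[:, k * n:(k + 1) * n]


# Slicing the merged result recovers what each small product would have produced.
sliced = [slice_gate(merged, k, n_hid) for k in range(4)]

for small, block in zip(separate, sliced):
    assert np.allclose(small, block)
```

The larger intermediate ``merged`` holds all four gate pre-activations at once, which is where the potentially higher memory usage comes from.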