Add doc for Eq profiling output

640b4347 · Frederic · 123c8a0e · 640b4347
--- a/doc/extending/optimization.txt
+++ b/doc/extending/optimization.txt
@@ -886,3 +886,107 @@ To understand this profile here is some explanation of how optimization work:
  * 0.028s mean it spent that time in the fgraph.validate()
  * 0.131s mean it is spent that time for callback. This is a mechanism that can trigger other execution when there is a change to the FunctionGraph.
  * ``time      - (name, class, index) - validate time`` Tell how the information for each sub optimizer get printed.
+  * All other SeqOptimizer are described like this. There is some sub optimizer from OPT_FAST_RUN that are SeqOptimizer
+
+
+* The SeqOptimizer will print some stats at the start:
+
+    .. code-block:: none
+
+       0.751816s - ('canonicalize', 'EquilibriumOptimizer', 4) - 0.004s
+         EquilibriumOptimizer      canonicalize
+           time 0.751s for 14 passes
+           nb nodes (start, end,  max) 108 81 117
+           time io_toposort 0.029s
+           time in local optimizers 0.687s
+           time in global optimizers 0.010s
+            0 - 0.050s 27 (0.000s in global opts, 0.002s io_toposort) - 108 nodes - ('local_dimshuffle_lift', 9) ('local_upcast_elemwise_constant_inputs', 5) ('local_shape_to_shape_i', 3) ('local_fill_sink', 3) ('local_fill_to_alloc', 2) ...
+            1 - 0.288s 26 (0.002s in global opts, 0.002s io_toposort) - 117 nodes - ('local_dimshuffle_lift', 8) ('local_fill_sink', 4) ('constant_folding', 4) ('local_useless_elemwise', 3) ('local_subtensor_make_vector', 3) ...
+            2 - 0.044s 13 (0.002s in global opts, 0.003s io_toposort) - 96 nodes - ('constant_folding', 4) ('local_dimshuffle_lift', 3) ('local_fill_sink', 3) ('local_useless_elemwise', 1) ('local_fill_to_alloc', 1) ...
+            3 - 0.045s 11 (0.000s in global opts, 0.002s io_toposort) - 91 nodes - ('constant_folding', 3) ('local_fill_to_alloc', 2) ('local_dimshuffle_lift', 2) ('local_mul_canonizer', 2) ('MergeOptimizer', 1) ...
+            4 - 0.035s 8 (0.002s in global opts, 0.002s io_toposort) - 93 nodes - ('local_fill_sink', 3) ('local_dimshuffle_lift', 2) ('local_fill_to_alloc', 1) ('MergeOptimizer', 1) ('constant_folding', 1)
+            5 - 0.035s 6 (0.000s in global opts, 0.002s io_toposort) - 88 nodes - ('local_fill_sink', 2) ('local_dimshuffle_lift', 2) ('local_fill_to_alloc', 1) ('local_mul_canonizer', 1)
+            6 - 0.038s 10 (0.001s in global opts, 0.002s io_toposort) - 95 nodes - ('local_fill_sink', 3) ('local_dimshuffle_lift', 3) ('constant_folding', 2) ('local_fill_to_alloc', 1) ('MergeOptimizer', 1)
+            7 - 0.032s 5 (0.001s in global opts, 0.002s io_toposort) - 91 nodes - ('local_fill_sink', 3) ('MergeOptimizer', 1) ('local_dimshuffle_lift', 1)
+            8 - 0.034s 5 (0.000s in global opts, 0.002s io_toposort) - 92 nodes - ('local_fill_sink', 3) ('MergeOptimizer', 1) ('local_greedy_distributor', 1)
+            9 - 0.031s 6 (0.001s in global opts, 0.002s io_toposort) - 90 nodes - ('local_fill_sink', 2) ('local_fill_to_alloc', 1) ('MergeOptimizer', 1) ('local_dimshuffle_lift', 1) ('local_greedy_distributor', 1)
+           10 - 0.032s 5 (0.000s in global opts, 0.002s io_toposort) - 89 nodes - ('local_dimshuffle_lift', 2) ('local_fill_to_alloc', 1) ('MergeOptimizer', 1) ('local_fill_sink', 1)
+           11 - 0.030s 5 (0.000s in global opts, 0.002s io_toposort) - 88 nodes - ('local_dimshuffle_lift', 2) ('local_fill_to_alloc', 1) ('MergeOptimizer', 1) ('constant_folding', 1)
+           12 - 0.026s 1 (0.000s in global opts, 0.003s io_toposort) - 81 nodes - ('MergeOptimizer', 1)
+           13 - 0.031s 0 (0.000s in global opts, 0.003s io_toposort) - 81 nodes -
+           times - times applied - nb node created - name:
+           0.263s - 15 - 0 - constant_folding
+           0.096s - 2 - 14 - local_greedy_distributor
+           0.066s - 4 - 19 - local_mul_canonizer
+           0.046s - 28 - 57 - local_fill_sink
+           0.042s - 35 - 78 - local_dimshuffle_lift
+           0.018s - 5 - 15 - local_upcast_elemwise_constant_inputs
+           0.010s - 11 - 4 - MergeOptimizer
+           0.009s - 4 - 0 - local_useless_elemwise
+           0.005s - 11 - 2 - local_fill_to_alloc
+           0.004s - 3 - 6 - local_neg_to_mul
+           0.002s - 1 - 3 - local_lift_transpose_through_dot
+           0.002s - 3 - 4 - local_shape_to_shape_i
+           0.002s - 2 - 4 - local_subtensor_lift
+           0.001s - 3 - 0 - local_subtensor_make_vector
+           0.001s - 1 - 1 - local_sum_all_to_none
+           0.131s - in 62 optimization that where not used (display only those with a runtime > 0)
+             0.050s - local_add_canonizer
+             0.018s - local_mul_zero
+             0.016s - local_one_minus_erf
+             0.010s - local_func_inv
+             0.006s - local_0_dot_x
+             0.005s - local_track_shape_i
+             0.004s - local_mul_switch_sink
+             0.004s - local_fill_cut
+             0.004s - local_one_minus_erf2
+             0.003s - local_remove_switch_const_cond
+             0.003s - local_cast_cast
+             0.002s - local_IncSubtensor_serialize
+             0.001s - local_sum_div_dimshuffle
+             0.001s - local_div_switch_sink
+             0.001s - local_dimshuffle_no_inplace_at_canonicalize
+             0.001s - local_cut_useless_reduce
+             0.001s - local_reduce_join
+             0.000s - local_sum_sum
+             0.000s - local_useless_alloc
+             0.000s - local_reshape_chain
+             0.000s - local_useless_subtensor
+             0.000s - local_reshape_lift
+             0.000s - local_flatten_lift
+             0.000s - local_useless_slice
+             0.000s - local_subtensor_of_alloc
+             0.000s - local_subtensor_of_dot
+             0.000s - local_subtensor_merge
+
+  * ``0.751816s - ('canonicalize', 'EquilibriumOptimizer', 4) - 0.004s``
+    This line is from SeqOptimizer. Is mean that this sub optimizer took
+    a total of .7s. Its name is canonicalize. It is an
+    'EquilibriumOptimizer'. It was executer at index 4 by the
+    SeqOptimizer. It spent 0.004s in the validate phase.
+  * All other lines are from the profiler of the EquilibriumOptimizer.
+
+  * An EquilibriumOptimizer do multiple pass on the Apply nodes from
+    the graph. Conceptually, it try to execute all optimization on all
+    node in the graph. If no optimization got applied in a pass, it
+    stop. So it try to find an equilibrium state for all the
+    optimization. This is useful when we don't know a fixed order for
+    the execution of the optimization.
+  * ``time 0.751s for 14 passes`` mean that it took .7s and did 14 pass over the graph.
+
+  * ``nb nodes (start, end, max) 108 81 117`` mean that at the start,
+    the graph had 108 node, at the end, it had 81 and the maximum size
+    was 177.
+
+  * Then it print some global timming, like is spent 0.029s in
+    io_toposort, all local optimizer took 0.687s together for all
+    passes and global optimizer took a total of 0.010s.
+
+  * Then we prit the timming for each pass and the optimization that
+    got applied and the number of time they got applied. For example,
+    in pass 0, the local_dimshuffle_lift optimizer changed the graph 9
+    time.
+
+  * Then we print the time spend in each optimizer, the number of time
+    they changed the graph and the number of node they introduced in
+    the graph.