Is compiling a Theano function taking too much time? You can get
profiling information about Theano's optimization phase. The normal
:ref:`Theano profiler <tut_profiling>` provides only very high-level
information. In its output, indentation shows the subset relationship
between sections: an indented line is included in the less-indented
line above it. The top of its output looks like this:

.. code-block:: none

    Function profiling
    ==================
      Message: PATH_TO_A_FILE:23
      Time in 0 calls to Function.__call__: 0.000000e+00s
      Total compile time: 1.131874e+01s
        Number of Apply nodes: 50
        Theano Optimizer time: 1.152431e+00s
           Theano validate time: 2.790451e-02s
        Theano Linker time (includes C, CUDA code generation/compiling): 7.893991e-02s
           Import time 1.153541e-02s
      Time in all call to theano.grad() 4.732513e-02s

Explanations:

* ``Total compile time: 1.131874e+01s`` gives the total time spent inside ``theano.function``.
* ``Number of Apply nodes: 50`` means that after optimization, there are 50 Apply nodes in the graph.
* ``Theano Optimizer time: 1.152431e+00s`` means that we spent 1.15s in the optimization phase of ``theano.function``, where the graph is modified to make it faster, more numerically stable, able to run on the GPU, and so on.
* ``Theano validate time: 2.790451e-02s`` means that we spent 2.8e-2s in the *validate* subset of the optimization phase.
* ``Theano Linker time (includes C, CUDA code generation/compiling): 7.893991e-02s`` means that we spent 7.9e-2s in the *linker* phase of ``theano.function``.
* ``Import time 1.153541e-02s`` is the subset of the linker time spent importing the compiled module.
* ``Time in all call to theano.grad() 4.732513e-02s`` means that we spent a total of 4.7e-2s in all calls to ``theano.grad``. This time is outside of the calls to ``theano.function``.
The *linker* phase includes the generation of the C code, the time spent
by g++ to compile it, and the time needed by Theano to build the object we
return. The generated and compiled C code is cached, so the first
compilation of a function usually takes longer than the following ones.
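To put the numbers above in proportion, the summary can be parsed with a few lines of plain Python. The following is only a sketch: ``parse_times`` and ``LABELS`` are hypothetical names, not part of Theano, and the parsing assumes the line layout of the sample output shown in this section.

```python
import re

# Sample summary copied from the profiler output shown in this section.
PROFILE_TEXT = """\
Function profiling
==================
  Message: PATH_TO_A_FILE:23
  Time in 0 calls to Function.__call__: 0.000000e+00s
  Total compile time: 1.131874e+01s
    Number of Apply nodes: 50
    Theano Optimizer time: 1.152431e+00s
       Theano validate time: 2.790451e-02s
    Theano Linker time (includes C, CUDA code generation/compiling): 7.893991e-02s
       Import time 1.153541e-02s
  Time in all call to theano.grad() 4.732513e-02s
"""

# Hypothetical mapping (not a Theano API): short key -> start of the line.
LABELS = {
    "compile": "Total compile time",
    "optimizer": "Theano Optimizer time",
    "validate": "Theano validate time",
    "linker": "Theano Linker time",
    "import": "Import time",
}

# A duration is a float in scientific notation followed by "s" at line end.
DURATION = re.compile(r"([0-9.]+e[+-][0-9]+)s\s*$")


def parse_times(text):
    """Return a dict of the durations (in seconds) found for each label."""
    times = {}
    for line in text.splitlines():
        stripped = line.strip()
        for key, prefix in LABELS.items():
            match = DURATION.search(stripped)
            if stripped.startswith(prefix) and match:
                times[key] = float(match.group(1))
    return times


times = parse_times(PROFILE_TEXT)
optimizer_share = times["optimizer"] / times["compile"]
validate_share = times["validate"] / times["optimizer"]
print("optimizer: {:.1%} of compile time".format(optimizer_share))
print("validate:  {:.1%} of optimizer time".format(validate_share))
```

In this sample the optimizer and linker lines together account for only about 1.23s of the 11.3s total, so the phases listed in the summary do not cover all of the compile time.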
Detailed profiling of Theano optimizer
--------------------------------------
You can get more detailed profiling information about the Theano
optimizer phase by setting the Theano flag
:attr:`config.profile_optimizer` to ``True``.
This will output something like this:

.. code-block:: none

    Optimizer Profile
    -----------------
     SeqOptimizer OPT_FAST_RUN time 1.152s for 123/50 nodes before/after optimization

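One way to enable the flag is through the ``THEANO_FLAGS`` environment variable, which Theano reads when it is first imported. The sketch below builds the flag string in Python; ``theano_flags`` is a hypothetical helper, not a Theano function. It also sets ``profile=True``, on the assumption that the optimizer profile is only produced when general profiling is enabled; check this against the configuration documentation of your Theano version.

```python
import os


def theano_flags(**flags):
    """Format keyword arguments as a THEANO_FLAGS string (hypothetical helper)."""
    return ",".join(
        "{}={}".format(name, value) for name, value in sorted(flags.items())
    )


# This must run before the first "import theano", because the flags are
# read at import time. Note that it overwrites any existing THEANO_FLAGS.
os.environ["THEANO_FLAGS"] = theano_flags(profile=True, profile_optimizer=True)
print(os.environ["THEANO_FLAGS"])
```

The same string can be passed on the command line instead, by setting ``THEANO_FLAGS`` in the shell before launching the script.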
To understand this profile, here is an explanation of how optimizations work:

* Optimizations are organized in a hierarchy. At the top level, there
  is a ``SeqOptimizer`` (Sequence Optimizer). It contains other optimizers,
  and applies them in the order they were specified. Those sub-optimizers can be
  of other types, but are all *global* optimizers.
* Each optimizer in the hierarchy will print some stats about
  itself. The information that it prints depends on the type of the
  optimizer.
* The ``SeqOptimizer`` will print some stats at the start:

  .. code-block:: none

      Optimizer Profile
      -----------------
       SeqOptimizer OPT_FAST_RUN time 1.152s for 123/50 nodes before/after optimization
         0.028s for fgraph.validate()
         0.131s for callback
         time - (name, class, index) - validate time

  Then it will print, with some additional indentation, each sub-optimizer's profile
  information. These sub-profiles are ordered by the time they took to execute,
  not by their execution order.

  * ``OPT_FAST_RUN`` is the name of the optimizer.
  * 1.152s is the total time spent in that optimizer.
  * 123/50 means that before this optimization there were 123 Apply nodes in the function graph, and after it only 50.
  * 0.028s is the time spent in calls to ``fgraph.validate()``.
  * 0.131s is the time spent in callbacks. Callbacks are a mechanism that can trigger other code when the FunctionGraph changes.
  * ``time - (name, class, index) - validate time`` shows how the information for each sub-optimizer is printed.
  * All other instances of ``SeqOptimizer`` are described like this. In particular, some sub-optimizers of ``OPT_FAST_RUN`` are themselves ``SeqOptimizer`` instances.
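The fields of the ``SeqOptimizer`` summary line described above can be extracted with a regular expression. This is only a sketch under the assumption that the line follows the sample layout exactly; ``parse_seq_summary`` is a hypothetical helper and not part of Theano.

```python
import re

# Summary line copied from the sample profile in this section.
SUMMARY = (
    "SeqOptimizer OPT_FAST_RUN time 1.152s for 123/50 nodes "
    "before/after optimization"
)

# Name, total time in seconds, and Apply node counts before/after.
PATTERN = re.compile(
    r"SeqOptimizer\s+(?P<name>\S+)\s+time\s+(?P<time>[0-9.]+)s\s+for\s+"
    r"(?P<before>\d+)/(?P<after>\d+)\s+nodes"
)


def parse_seq_summary(line):
    """Return the summary fields as a dict, or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return {
        "name": match.group("name"),
        "seconds": float(match.group("time")),
        "nodes_before": int(match.group("before")),
        "nodes_after": int(match.group("after")),
    }


info = parse_seq_summary(SUMMARY)
removed = info["nodes_before"] - info["nodes_after"]
print(info["name"], info["seconds"], "s,", removed, "Apply nodes removed")
```

Because nested ``SeqOptimizer`` instances are described in the same way, the same pattern should also match their summary lines, whatever their indentation.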
* The ``SeqOptimizer`` will print some stats at the start: