@@ -623,8 +623,8 @@ Profiling Theano function compilation
...
@@ -623,8 +623,8 @@ Profiling Theano function compilation
You find that compiling a Theano function is taking too much time? You
You find that compiling a Theano function is taking too much time? You
can get profiling information about Theano optimization. The normal
can get profiling information about Theano optimization. The normal
:ref:`Theano profiler <tut_profiling>` will provide you with very high
:ref:`Theano profiler <tut_profiling>` will provide you with very
level information. The indentation show the included in/subset
high-level information. The indentation shows the included in/subset
relationship between sections. The top of its output look like this:
relationship between sections. The top of its output look like this:
.. code-block:: none
.. code-block:: none
...
@@ -643,18 +643,18 @@ relationship between sections. The top of its output look like this:
...
@@ -643,18 +643,18 @@ relationship between sections. The top of its output look like this:
Explanations:
Explanations:
* ``Total compile time: 1.131874e+01s`` give the total time spend in inside `theano.function`.
* ``Total compile time: 1.131874e+01s`` gives the total time spent inside `theano.function`.
* ``Number of Apply nodes: 50`` mean that after optimization, there is 50 apply node in the graph.
* ``Number of Apply nodes: 50`` means that after optimization, there are 50 apply node in the graph.
* ``Theano Optimizer time: 1.152431e+00s`` mean that we spend 1.15s in the ``theano.function`` phase where we optimize(modify) the graph to make it faster/more stable numerically/work on GPU/...
* ``Theano Optimizer time: 1.152431e+00s`` means that we spend 1.15s in the ``theano.function`` phase where we optimize (modify) the graph to make it faster / more stable numerically / work on GPU /...
* ``Theano validate time: 2.790451e-02s`` mean that we spend 2.8e-2s in the validate subset of the optimization phase.
* ``Theano validate time: 2.790451e-02s`` means that we spent 2.8e-2s in the *validate* subset of the optimization phase.
* ``Theano Linker time (includes C, CUDA code generation/compiling): 7.893991e-02s`` mean that we spend 7.9e-2s in the ``theano.function`` linker phase.
* ``Theano Linker time (includes C, CUDA code generation/compiling): 7.893991e-02s`` means that we spent 7.9e-2s in *linker* phase of ``theano.function``.
* ``Import time 1.153541e-02s`` is a subset of the linker time where we import the compiled module.
* ``Import time 1.153541e-02s`` is a subset of the linker time where we import the compiled module.
* ``Time in all call to theano.grad() 4.732513e-02s`` tell that we spent a total of 4.7e-2s in all calls to theano.grad. This is outside call to ``theano.function``.
* ``Time in all call to theano.grad() 4.732513e-02s`` tells that we spent a total of 4.7e-2s in all calls to ``theano.grad``. This is outside of the calls to ``theano.function``.
The linker phase include the generation of the C code, the time spent
The *linker* phase includes the generation of the C code, the time spent
by g++ to compile and the time needed by Theano to build the object we
by g++ to compile and the time needed by Theano to build the object we
return. The C code generation and compilation is cached, so the first
return. The C code generation and compilation is cached, so the first
time you compile a function and the following one could take different
time you compile a function and the following ones could take different
amount of execution time.
amount of execution time.
Detailed profiling of Theano optimizer
Detailed profiling of Theano optimizer
...
@@ -854,15 +854,15 @@ This will output something like this:
...
@@ -854,15 +854,15 @@ This will output something like this:
...
...
To understand this profile here is some explanation of how optimization work:
To understand this profile here is some explanation of how optimizations work:
* Optimization are organized in an hierarchy. At the top level, there
* Optimizations are organized in an hierarchy. At the top level, there
is a SeqOptimizer(Sequence Optimizer). It contain other optimizer
is a ``SeqOptimizer`` (Sequence Optimizer). It contains other optimizers,
and apply them in the order it was told. Those sub optimizer can be
and applies them in the order they were specified. Those sub-optimizers can be
of other Type, but are global optimizer.
of other types, but are all *global* optimizers.
* Each Optimizer in the hierarchy will print some stats about
* Each Optimizer in the hierarchy will print some stats about
itself. The information that it print depend of the type of the
itself. The information that it prints depends of the type of the
optimizer.
optimizer.
* The SeqOptimizer will print some stats at the start:
* The SeqOptimizer will print some stats at the start:
...
@@ -876,20 +876,20 @@ To understand this profile here is some explanation of how optimization work:
...
@@ -876,20 +876,20 @@ To understand this profile here is some explanation of how optimization work:
0.131s for callback
0.131s for callback
time - (name, class, index) - validate time
time - (name, class, index) - validate time
Then it will print with some indentation each sub optimizer profile
Then it will print, with some additional indentation, each sub-optimizer's profile
information. It sort them by the time they took. This isn't the
information. These sub-profiles are ordered by the time they took to execute,
order they where used.
not by their execution order.
* ``OPT_FAST_RUN`` is the name of the optimizer
* ``OPT_FAST_RUN`` is the name of the optimizer
* 1.152s is the total time spent in that optimizer
* 1.152s is the total time spent in that optimizer
* 123/50 mean that before this optimization, there was 123 apply node in the function and after only 50.
* 123/50 means that before this optimization, there were 123 apply node in the function graph, and after only 50.
* 0.028s mean it spent that time in the fgraph.validate()
* 0.028s means it spent that time calls to ``fgraph.validate()``
* 0.131s mean it is spent that time for callback. This is a mechanism that can trigger other execution when there is a change to the FunctionGraph.
* 0.131s means it spent that time for callbacks. This is a mechanism that can trigger other execution when there is a change to the FunctionGraph.
* ``time - (name, class, index) - validate time`` Tell how the information for each sub optimizer get printed.
* ``time - (name, class, index) - validate time`` tells how the information for each sub-optimizer get printed.
* All other SeqOptimizer are described like this. There is some sub optimizer from OPT_FAST_RUN that are SeqOptimizer
* All other instances of ``SeqOptimizer`` are described like this. In particular, some sub-optimizer from OPT_FAST_RUN that are also ``SeqOptimizer``s.
* The SeqOptimizer will print some stats at the start:
* The ``SeqOptimizer`` will print some stats at the start:
.. code-block:: none
.. code-block:: none
...
@@ -960,45 +960,49 @@ To understand this profile here is some explanation of how optimization work:
...
@@ -960,45 +960,49 @@ To understand this profile here is some explanation of how optimization work: