Commit ca112010 authored by Frederic

Add documentation of the new profiler.

fix gh-1420
parent eea0042b
@@ -274,6 +274,8 @@ import theano and print the config variable, as in:
Do the vm/cvm linkers profile the execution time of Theano functions?
See :ref:`tut_profiling` for examples.

.. attribute:: profile_memory

    Bool value: either True or False
...
@@ -41,6 +41,7 @@ you out.
    aliasing
    shape_info
    debug_faq
    profiling
    extending_theano
    faq
    python-memory-management
.. _tut_profiling:

==========================
Profiling Theano functions
==========================

.. note::

    This method replaces the old ProfileMode. Do not use ProfileMode
    anymore.
Besides checking for errors, another important task is to profile your
code. Theano uses Theano flags and/or parameters that can be passed as
arguments to :func:`theano.function <function.function>`.

The simplest way to profile a Theano function is to use the Theano
flags described below. When the process exits, they cause the
requested information to be printed to stdout.
Enabling the profiler is easy: just set the Theano flag
:attr:`config.profile`.

To enable the memory profiler, use the Theano flag
:attr:`config.profile_memory` in addition to :attr:`config.profile`.

To enable the profiler of the Theano optimization phase, use the
Theano flag :attr:`config.profile_optimizer` in addition to
:attr:`config.profile`.
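As a configuration sketch, the flags above can be combined on the
command line (the script name ``my_script.py`` is hypothetical):

```shell
# Time profile only:
THEANO_FLAGS=profile=True python my_script.py

# Time and memory profile:
THEANO_FLAGS=profile=True,profile_memory=True python my_script.py

# Time profile plus the optimization-phase profile:
THEANO_FLAGS=profile=True,profile_optimizer=True python my_script.py
```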
You can use the Theano flags :attr:`profiling.n_apply`,
:attr:`profiling.n_ops` and :attr:`profiling.min_memory_size` to
modify the quantity of information printed.
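For example, to limit the printed tables to the 10 most costly entries
(a configuration sketch; ``my_script.py`` is hypothetical):

```shell
THEANO_FLAGS=profile=True,profiling.n_apply=10,profiling.n_ops=10 python my_script.py
```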
The profiler outputs one profile per Theano function, plus one profile
that is the sum of all printed profiles. Each profile contains four
sections: global info, class info, Ops info and Apply node info.
In the global section, the "Message" is the name of the Theano
function. :func:`theano.function` has an optional ``name`` parameter
that defaults to None; change it to something else to help you profile
many Theano functions. In that section, we also see the number of
times the function was called (1) and the total time spent in all
those calls. The time spent in ``Function.fn.__call__`` and in the
thunks is useful to understand Theano's overhead.
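As an illustration of how these numbers relate (using the timings from
the sample output at the end of this page), the percentages printed in
the global section are plain ratios of the reported times:

```python
# Timings copied from the sample profiler output below.
call_time = 5.698204e-05   # Time in 1 calls to Function.__call__
fn_time = 1.192093e-05     # Time in Function.fn.__call__
thunk_time = 6.198883e-06  # Time in thunks

# Each percentage is a fraction of the total __call__ time;
# everything spent outside the thunks is Theano overhead.
fn_pct = fn_time / call_time * 100
thunk_pct = thunk_time / call_time * 100

print("%.3f%%" % fn_pct)     # the profiler prints 20.921%
print("%.3f%%" % thunk_pct)  # the profiler prints 10.879%
```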
We also see the time spent in compilation, split between the two parts
of the compilation process: optimization (modifying the graph to make
it more stable/faster) and linking (compiling C code and building the
Python callable returned by :func:`theano.function`).
The Class, Ops and Apply node sections present the same kind of
information: statistics about the Apply nodes that ran. The Ops
section takes the information from the Apply section and merges the
Apply nodes that have exactly the same Op. For example, if two Apply
nodes in the graph have two Ops that compare equal, they are merged.
Some Ops, like Elemwise, do not compare equal if their parameters
differ. The Class section goes further and merges all Apply nodes
whose Ops are instances of the same class, so it merges addition and
multiplication Elemwise nodes.
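A minimal sketch of this merging (an illustration only, not Theano's
actual implementation), using the per-Apply timings from the sample
output below:

```python
from collections import defaultdict

# (op name, op class, thunk time) for each Apply node, taken from the
# Apply section of the sample output.
apply_rows = [
    ("Elemwise{add,no_inplace}", "Elemwise", 3.10e-06),
    ("Elemwise{mul,no_inplace}", "Elemwise", 2.15e-06),
    ("Elemwise{add,no_inplace}", "Elemwise", 9.54e-07),
]

per_op = defaultdict(float)
per_class = defaultdict(float)
for op, cls, t in apply_rows:
    per_op[op] += t      # Apply nodes whose Ops compare equal are merged
    per_class[cls] += t  # Ops of the same class are merged further

print(dict(per_op))      # two rows, as in the Ops section
print(dict(per_class))   # one row, as in the Class section
```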
Here is an example of output, with some Theano optimizations disabled
to better show the difference between sections. With Theano
optimizations enabled, the graph would contain only one operation.

To run the example::

    THEANO_FLAGS=optimizer_excluding=fusion:inplace,profile=True python doc/tutorial/profiling_example.py

The output:

.. literalinclude:: profiling_example_out.txt
# doc/tutorial/profiling_example.py
import numpy
import theano

# Build a small symbolic graph: three vectors, summed then doubled.
x, y, z = theano.tensor.vectors('xyz')
f = theano.function([x, y, z], [(x + y + z) * 2])

# Random inputs in Theano's configured float dtype.
xv = numpy.random.rand(10).astype(theano.config.floatX)
yv = numpy.random.rand(10).astype(theano.config.floatX)
zv = numpy.random.rand(10).astype(theano.config.floatX)
f(xv, yv, zv)
Function profiling
==================
Message: None
Time in 1 calls to Function.__call__: 5.698204e-05s
Time in Function.fn.__call__: 1.192093e-05s (20.921%)
Time in thunks: 6.198883e-06s (10.879%)
Total compile time: 3.642474e+00s
Theano Optimizer time: 7.326508e-02s
Theano validate time: 3.712177e-04s
Theano Linker time (includes C, CUDA code generation/compiling): 9.584920e-01s
Class
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
100.0% 100.0% 0.000s 2.07e-06s C 3 3 <class 'theano.tensor.elemwise.Elemwise'>
... (remaining 0 Classes account for 0.00%(0.00s) of the runtime)
Ops
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>
65.4% 65.4% 0.000s 2.03e-06s C 2 2 Elemwise{add,no_inplace}
34.6% 100.0% 0.000s 2.15e-06s C 1 1 Elemwise{mul,no_inplace}
... (remaining 0 Ops account for 0.00%(0.00s) of the runtime)
Apply
------
<% time> <sum %> <apply time> <time per call> <#call> <id> <Apply name>
50.0% 50.0% 0.000s 3.10e-06s 1 0 Elemwise{add,no_inplace}(x, y)
34.6% 84.6% 0.000s 2.15e-06s 1 2 Elemwise{mul,no_inplace}(TensorConstant{(1,) of 2.0}, Elemwise{add,no_inplace}.0)
15.4% 100.0% 0.000s 9.54e-07s 1 1 Elemwise{add,no_inplace}(Elemwise{add,no_inplace}.0, z)
... (remaining 0 Apply instances account for 0.00%(0.00s) of the runtime)