Commit dba9a7b7, authored by Olivier Breuleux

some little additions

Parent 112b3b0c
......@@ -56,8 +56,9 @@ something that you're not seeing.
I wrote a new optimization, but it's not getting used...
---------------------------------------------------------
Remember that you have to register optimizations with the :ref:`optdb`
for them to be used by the standard modes such as FAST_COMPILE,
FAST_RUN, and DEBUG_MODE.
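As a rough mental model, assuming (as a simplification) that a mode asks the database for every optimization carrying its tag, an optimization that was never registered can never be selected. The sketch below uses a hypothetical ``ToyOptDB`` class to show this; it is not Theano's optdb API, and the names ``fold_double_negation`` and ``my_new_opt`` are invented for illustration.

```python
# Hypothetical toy model of a tag-based optimization database.
# This is NOT Theano's optdb API; it only illustrates why an
# optimization that is never registered is never applied by a mode.


class ToyOptDB:
    def __init__(self):
        self._entries = []  # list of (position, name, function, tags)

    def register(self, name, fn, position, *tags):
        """Record an optimization together with the mode tags it belongs to."""
        self._entries.append((position, name, fn, set(tags)))

    def query(self, tag):
        """Return the names of optimizations carrying `tag`, in position order."""
        selected = [e for e in self._entries if tag in e[3]]
        selected.sort(key=lambda e: e[0])
        return [name for _, name, _, _ in selected]


def fold_double_negation(graph):
    return graph  # placeholder body; the rewrite itself is irrelevant here


def my_new_opt(graph):
    return graph  # the optimization we "forgot" to register


db = ToyOptDB()
db.register("fold_double_negation", fold_double_negation, 1,
            "fast_run", "fast_compile")

print(db.query("fast_run"))  # ['fold_double_negation']  (my_new_opt is missing)

db.register("my_new_opt", my_new_opt, 2, "fast_run")
print(db.query("fast_run"))  # ['fold_double_negation', 'my_new_opt']
```

Note that tags matter too: in this toy model ``my_new_opt`` is still not returned for ``"fast_compile"``, because it was not registered with that tag.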
I wrote a new optimization, and it changed my results even though I'm pretty sure it is correct.
......@@ -71,11 +72,13 @@ something that you're not seeing.
The function I compiled is too slow, what's up?
-----------------------------------------------
First, make sure you're running in FAST_RUN mode, by passing
``mode='FAST_RUN'`` to ``theano.function`` or ``theano.make``. Some
operations have excruciatingly slow Python implementations, and that
can negatively affect the performance of FAST_COMPILE.
Second, try the Theano :ref:`profilemode`. This will tell you which
Apply nodes and which Ops are eating up your CPU cycles.
.. _faq_wraplinker:
......
......@@ -14,6 +14,5 @@ Topics
profilemode
debugmode
debug_faq
module_vs_op
randomstreams
......@@ -17,20 +17,31 @@ First create a ProfileMode instance.
>>> import theano
>>> from theano import ProfileMode
>>> profmode = ProfileMode(optimizer='fast_run', linker=theano.gof.OpWiseCLinker())
The ProfileMode constructor takes as input an optimizer and a
linker. Which optimizer and linker to use will depend on the
application. For example, a user wanting to profile the Python
implementation only should use ``gof.PerformLinker`` (or "py" for
short). On the other hand, a user wanting to profile their graph using C
implementations wherever possible should use ``gof.OpWiseCLinker``
(or "c|py").
In the same manner, modifying which optimizer is passed to ProfileMode
will decide which optimizations are applied to the graph, prior to
profiling. Changing the optimizer should be especially useful when
developing new graph optimizations, in order to evaluate their impact
on performance. Also keep in mind that optimizations might change the
computation graph substantially, meaning that you might not recognize
some of the operations that are profiled (you did not use them
explicitly, but an optimizer decided to use them to improve performance
or numerical stability). If you cannot easily relate the output of
ProfileMode to the computations you defined, you might want to try
setting ``optimizer`` to ``None`` (but keep in mind that the
computations will be slower than if they were optimized).
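To see why profiled operations can look unfamiliar, here is a minimal sketch on a hypothetical toy expression tree (nested tuples, not Theano's graph classes). The rewrite imitates the kind of numerical-stability substitution an optimizer might make, replacing ``log(1 + x)`` with ``log1p(x)``; the functions ``optimize`` and ``ops_in`` are invented for illustration.

```python
# Hypothetical toy graph: an expression is either a leaf (number or
# variable name) or a tuple (op_name, arg1, arg2, ...).
# This is an illustration only, not Theano code.


def optimize(expr):
    """Recursively rewrite log(add(1, x)) into log1p(x)."""
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    args = [optimize(a) for a in args]
    if (op == "log" and len(args) == 1
            and isinstance(args[0], tuple)
            and args[0][0] == "add" and args[0][1] == 1):
        return ("log1p", args[0][2])
    return (op, *args)


def ops_in(expr):
    """List the op names in an expression, parents before children."""
    if not isinstance(expr, tuple):
        return []
    op, *args = expr
    found = [op]
    for a in args:
        found.extend(ops_in(a))
    return found


# What the user wrote: 3 * log(1 + x)
user_graph = ("mul", 3, ("log", ("add", 1, "x")))

print(ops_in(user_graph))            # ['mul', 'log', 'add']
print(ops_in(optimize(user_graph)))  # ['mul', 'log1p']
```

A profile of the optimized graph would report ``log1p``, an operation the user never wrote, which is why profiling without optimizations can make the report easier to map back to the original code.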
Note that most users will want to use ProfileMode to optimize their
graph and find where most of the computation time is being spent. In
this context, the 'fast_run' optimizer and ``gof.OpWiseCLinker`` are
the most appropriate choices.
Compiling your Graph with ProfileMode
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
......@@ -107,16 +118,20 @@ generates the following output:
"""
The summary has two components. In the first section, called the
Apply-wise summary, timing information is provided for the worst
offending Apply nodes. These correspond to individual Op applications
within your graph that take the longest to execute (so if you use
``dot`` twice, you will see two entries there). In the second section,
the Op-wise summary, the execution times of all Apply nodes executing
the same Op are added together and the total execution time per Op
is shown (so if you use ``dot`` twice, you will see only one entry
there, corresponding to the sum of the time spent in each of them).
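The difference between the two summaries can be sketched in plain Python. The per-node timings below are made-up numbers for illustration, not real ProfileMode output, and ``apply_wise`` and ``op_wise`` are hypothetical helper names; the point is only that two ``dot`` nodes give two Apply-wise rows but a single Op-wise row.

```python
from collections import defaultdict

# Hypothetical per-Apply-node timings: (node label, Op name, seconds).
# The numbers are invented for illustration only.
apply_times = [
    ("dot_1", "dot", 0.40),
    ("dot_2", "dot", 0.25),
    ("exp_1", "exp", 0.10),
    ("add_1", "add", 0.05),
]


def apply_wise(times):
    """One row per Apply node, slowest first."""
    return sorted(times, key=lambda row: row[2], reverse=True)


def op_wise(times):
    """One row per Op: the summed time of all Apply nodes using that Op."""
    totals = defaultdict(float)
    for _node, op, seconds in times:
        totals[op] += seconds
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


for node, op, seconds in apply_wise(apply_times):
    print(f"{node:8s} {op:5s} {seconds:.2f}s")
print()
for op, seconds in op_wise(apply_times):
    print(f"{op:5s} {seconds:.2f}s")
```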
Note that the ProfileMode also shows which Ops were running a C
implementation.
Developers wishing to optimize the performance of their graph should
focus on the worst offending Ops. If no C implementation exists for
such an Op, consider writing a C implementation yourself, or use the
mailing list to suggest that one be provided.
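One way to apply this advice is to rank the Op-wise rows and keep only the Python-only Ops that account for a meaningful share of the total time. The sketch below assumes you have already copied the relevant rows out of the ProfileMode summary by hand. The rows, the Op names ``MyCustomOp`` and ``MyOtherOp``, the helper ``c_candidates``, and the 5% threshold are all hypothetical.

```python
# Hypothetical Op-wise summary rows: (Op name, total seconds, has C implementation).
# Values are invented for illustration; take the real ones from ProfileMode.
op_summary = [
    ("dot", 0.65, True),
    ("MyCustomOp", 0.30, False),
    ("exp", 0.10, True),
    ("MyOtherOp", 0.02, False),
]


def c_candidates(rows, min_share=0.05):
    """Return (Op name, share of total time) for Python-only Ops whose
    share is at least `min_share`, slowest first."""
    total = sum(seconds for _, seconds, _ in rows)
    picks = [(op, seconds / total) for op, seconds, has_c in rows
             if not has_c and seconds / total >= min_share]
    return sorted(picks, key=lambda pick: pick[1], reverse=True)


for op, share in c_candidates(op_summary):
    print(f"{op}: {share:.0%} of total time, no C implementation")
```

The threshold keeps the list focused: an Op with no C implementation that takes only a tiny fraction of the time is rarely worth the effort of writing one.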