... (remaining 20 Apply account for 171B/171B ((100.00%)) of the Apply with dense outputs sizes)
All Apply nodes have output sizes that take less than 1024B.
<created/inplace/view> is taken from the Op's declaration.
Apply nodes marked 'inplace' or 'view' may actually allocate memory, this is not reported here. If you use DebugMode, warnings will be emitted in those cases.
Here are tips to potentially make your code run faster
(if you think of new ones, suggest them on the mailing list).
Test them first, as they are not guaranteed to always provide a speedup.
Sorry, no tip for today.
"""
Exercise 5
-----------
- In the last exercises, do you see a speed up with the GPU?
- Where does it come from? (Use ProfileMode)
- Where does it come from? (Use profile=True)
- Is there something we can do to speed up the GPU version?
...
...
@@ -427,4 +521,3 @@ Known limitations
- A few hundred nodes is fine
- Disabling a few optimizations can speed up compilation
- Usually too many nodes indicates a problem with the graph
functions using either of the following two options:
1. Use Theano flag :attr:`config.profile` to enable profiling.
- To enable the memory profiler use the Theano flag:
:attr:`config.profile_memory` in addition to :attr:`config.profile`.
- Moreover, to enable profiling of Theano's optimization phase,
...
...
@@ -30,8 +30,8 @@ functions using either of the following two options:
2. Pass the argument :attr:`profile=True` to the function :func:`theano.function <function.function>`, then call :attr:`f.profile.print_summary()` to print the profile of that single function.
- Use this option when you want to profile only one or more
specific functions rather than all of them.
- You can also combine the profile of many functions:
.. testcode::
profile = theano.compile.ProfileStats()
...
...
@@ -68,6 +68,15 @@ compare equal, if their parameters differ (the scalar being
executed). So the class section will merge more Apply nodes than the
Ops section.
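The difference between the two groupings can be sketched in plain Python. This is only an illustration and does not use Theano's profiler: the timing records, the ``aggregate`` helper, and the Op names below are hypothetical. It assumes each Apply node is described by its Op class, the Op's parameter, and the time spent in it.

```python
from collections import defaultdict

# Hypothetical timing records, one per Apply node:
# (Op class, Op parameter, seconds spent). Not real profiler output.
apply_nodes = [
    ("Elemwise", "add", 0.25),
    ("Elemwise", "add", 0.5),
    ("Elemwise", "mul", 1.0),
    ("Gemv", "inplace", 2.0),
]


def aggregate(nodes, key):
    """Sum the time of all nodes that share the same grouping key."""
    totals = defaultdict(float)
    for node in nodes:
        totals[key(node)] += node[2]
    return dict(totals)


# Class-level grouping: every Elemwise node is merged into one row.
by_class = aggregate(apply_nodes, key=lambda n: n[0])

# Op-level grouping: Ops of the same class with different parameters
# do not compare equal, so they stay on separate rows.
by_op = aggregate(apply_nodes, key=lambda n: (n[0], n[1]))

print(by_class)
print(by_op)
```

Grouping by class yields fewer rows than grouping by Op, because the two ``Elemwise`` Ops with different scalar parameters only merge at the class level.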
Note that the profile also shows which Ops were running a C implementation.
Developers wishing to optimize the performance of their graph should
focus on the worst offending Ops and Apply nodes – either by optimizing
an implementation, providing a missing C implementation, or by writing
a graph optimization that eliminates the offending Op altogether.
You should strongly consider emailing one of our lists about your
issue before spending too much time on this.
Here is an example output when we disable some Theano optimizations to
give you a better idea of the difference between sections. With all
optimizations enabled, there would be only one op left in the graph.