Commit e493f9cd authored by Mehdi Mirza, committed by memimo

cleanup ProfileMode deprecation in docs

parent 5a0d273c
@@ -123,6 +123,7 @@ Loops
  .. testcode::

+     import numpy
      import theano
      import theano.tensor as T
@@ -179,96 +180,189 @@ Inplace optimization
  Profiling
  ---------

- - To replace the default mode with this mode, use the Theano flags ``mode=ProfileMode``
- - To enable the memory profiling use the flags ``ProfileMode.profile_memory=True``
+ - To replace the default mode with this mode, use the Theano flags ``profile=True``
+ - To enable the memory profiling use the flags ``profile=True,profile_memory=True``

  Theano output:

  .. code-block:: python

  """
- Time since import 33.456s
- Theano compile time: 1.023s (3.1% since import)
-     Optimization time: 0.789s
-     Linker time: 0.221s
- Theano fct call 30.878s (92.3% since import)
-     Theano Op time 29.411s 87.9%(since import) 95.3%(of fct call)
-     Theano function overhead in ProfileMode 1.466s 4.4%(since import) 4.7%(of fct call)
- 10001 Theano fct call, 0.003s per call
- Rest of the time since import 1.555s 4.6%
-
- Theano fct summary:
- <% total fct time> <total time> <time per call> <nb call> <fct name>
-    100.0%  30.877s  3.09e-03s  10000  train
-      0.0%   0.000s  4.06e-04s      1  predict
-
- Single Op-wise summary:
- <% of local_time spent on this kind of Op> <cumulative %> <self seconds> <cumulative seconds> <time per call> <nb_call> <nb_op> <nb_apply> <Op name>
-    87.3%  87.3%  25.672s  25.672s  2.57e-03s    10000   1   1  <Gemv>
-     9.7%  97.0%   2.843s  28.515s  2.84e-04s    10001   1   2  <Dot>
-     2.4%  99.3%   0.691s  29.206s  7.68e-06s  * 90001  10  10  <Elemwise>
-     0.4%  99.7%   0.127s  29.334s  1.27e-05s    10000   1   1  <Alloc>
-     0.2%  99.9%   0.053s  29.386s  1.75e-06s  * 30001   2   4  <DimShuffle>
-     0.0% 100.0%   0.014s  29.400s  1.40e-06s  * 10000   1   1  <Sum>
-     0.0% 100.0%   0.011s  29.411s  1.10e-06s  * 10000   1   1  <Shape_i>
- (*) Op is running a c implementation
-
- Op-wise summary:
- <% of local_time spent on this kind of Op> <cumulative %> <self seconds> <cumulative seconds> <time per call> <nb_call> <nb apply> <Op name>
-    87.3%  87.3%  25.672s  25.672s  2.57e-03s    10000  1  Gemv{inplace}
-     9.7%  97.0%   2.843s  28.515s  2.84e-04s    10001  2  dot
-     1.3%  98.2%   0.378s  28.893s  3.78e-05s  * 10000  1  Elemwise{Composite{scalar_softplus,{mul,scalar_softplus,{neg,mul,sub}}}}
-     0.4%  98.7%   0.127s  29.021s  1.27e-05s    10000  1  Alloc
-     0.3%  99.0%   0.092s  29.112s  9.16e-06s  * 10000  1  Elemwise{Composite{exp,{mul,{true_div,neg,{add,mul}}}}}[(0, 0)]
-     0.1%  99.3%   0.033s  29.265s  1.66e-06s  * 20001  3  InplaceDimShuffle{x}
-     ... (remaining 11 Apply account for 0.7%(0.00s) of the runtime)
- (*) Op is running a c implementation
-
- Apply-wise summary:
- <% of local_time spent at this position> <cumulative %> <apply time> <cumulative seconds> <time per call> <nb_call> <Apply position> <Apply Op name>
-    87.3%  87.3%  25.672s  25.672s  2.57e-03s  10000  15  Gemv{inplace}(w, TensorConstant{-0.01}, InplaceDimShuffle{1,0}.0, Elemwise{Composite{exp,{mul,{true_div,neg,{add,mul}}}}}[(0, 0)].0, TensorConstant{0.9998})
-     9.7%  97.0%   2.843s  28.515s  2.84e-04s  10000   1  dot(x, w)
-     1.3%  98.2%   0.378s  28.893s  3.78e-05s  10000   9  Elemwise{Composite{scalar_softplus,{mul,scalar_softplus,{neg,mul,sub}}}}(y, Elemwise{Composite{neg,sub}}[(0, 0)].0, Elemwise{sub,no_inplace}.0, Elemwise{neg,no_inplace}.0)
-     0.4%  98.7%   0.127s  29.020s  1.27e-05s  10000  10  Alloc(Elemwise{inv,no_inplace}.0, Shape_i{0}.0)
-     0.3%  99.0%   0.092s  29.112s  9.16e-06s  10000  13  Elemwise{Composite{exp,{mul,{true_div,neg,{add,mul}}}}}[(0, 0)](Elemwise{ScalarSigmoid{output_types_preference=transfer_type{0}, _op_use_c_code=True}}[(0, 0)].0, Alloc.0, y, Elemwise{Composite{neg,sub}}[(0, 0)].0, Elemwise{sub,no_inplace}.0, InplaceDimShuffle{x}.0)
-     0.3%  99.3%   0.080s  29.192s  7.99e-06s  10000  11  Elemwise{ScalarSigmoid{output_types_preference=transfer_type{0}, _op_use_c_code=True}}[(0, 0)](Elemwise{neg,no_inplace}.0)
-     ... (remaining 14 Apply instances account for 0.7%(0.00s) of the runtime)
-
- Profile of Theano functions memory:
- (This check only the output of each apply node. It don't check the temporary memory used by the op in the apply node.)
-    Theano fct: train
-       Max without gc, inplace and view (KB) 2481
-       Max FAST_RUN_NO_GC (KB) 16
-       Max FAST_RUN (KB) 16
-       Memory saved by view (KB) 2450
-       Memory saved by inplace (KB) 15
-       Memory saved by GC (KB) 0
-    <Sum apply outputs (bytes)> <Apply outputs memory size(bytes)> <created/inplace/view> <Apply node>
-    <created/inplace/view> is taked from the op declaration, not ...
-    2508800B [2508800] v InplaceDimShuffle{1,0}(x)
-    6272B [6272] i Gemv{inplace}(w, ...)
-    3200B [3200] c Elemwise{Composite{...}}(y, ...)
-
- Here are tips to potentially make your code run faster (if you think of new ones, suggest them on the mailing list).
- Test them first, as they are not guaranteed to always provide a speedup.
-    - Try the Theano flag floatX=float32
+ Function profiling
+ ==================
+   Message: train.py:17
+   Time in 1 calls to Function.__call__: 5.440712e-04s
+   Time in Function.fn.__call__: 4.799366e-04s (88.212%)
+   Time in thunks: 7.891655e-05s (14.505%)
+   Total compile time: 5.701292e-01s
+     Number of Apply nodes: 20
+     Theano Optimizer time: 2.405829e-01s
+        Theano validate time: 1.702785e-03s
+     Theano Linker time (includes C, CUDA code generation/compiling): 1.597619e-02s
+        Import time 1.968861e-03s
+
+ Time in all call to theano.grad() 0.000000e+00s
+ Time since theano import 1.436s
+ Class
+ ---
+ <% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
+   54.4%    54.4%       0.000s       3.90e-06s     C       11      11   theano.tensor.elemwise.Elemwise
+   17.8%    72.2%       0.000s       1.41e-05s     C        1       1   theano.compile.ops.Shape_i
+   11.5%    83.7%       0.000s       2.26e-06s     C        4       4   theano.tensor.basic.ScalarFromTensor
+    9.1%    92.7%       0.000s       3.58e-06s     C        2       2   theano.tensor.subtensor.Subtensor
+    3.6%    96.4%       0.000s       2.86e-06s     C        1       1   theano.tensor.elemwise.DimShuffle
+    3.6%   100.0%       0.000s       2.86e-06s     C        1       1   theano.tensor.elemwise.Sum
+    ... (remaining 0 Classes account for 0.00%(0.00s) of the runtime)
+
+ Ops
+ ---
+ <% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>
+   17.8%    17.8%       0.000s       1.41e-05s     C        1       1   Shape_i{0}
+   15.1%    32.9%       0.000s       1.19e-05s     C        1       1   Elemwise{Composite{(i0 * (i1 ** i2))}}
+   11.5%    44.4%       0.000s       2.26e-06s     C        4       4   ScalarFromTensor
+    9.1%    53.5%       0.000s       3.58e-06s     C        2       2   Subtensor{int64:int64:int8}
+    8.8%    62.2%       0.000s       3.46e-06s     C        2       2   Elemwise{switch,no_inplace}
+    6.3%    68.6%       0.000s       2.50e-06s     C        2       2   Elemwise{Composite{Switch(i0, i1, minimum(i2, i3))}}[(0, 2)]
+    6.0%    74.6%       0.000s       2.38e-06s     C        2       2   Elemwise{le,no_inplace}
+    5.1%    79.8%       0.000s       4.05e-06s     C        1       1   Elemwise{Composite{Switch(i0, Switch(LT((i1 + i2), i3), i3, (i1 + i2)), Switch(LT(i2, i1), i2, i1))}}[(0, 2)]
+    5.1%    84.9%       0.000s       4.05e-06s     C        1       1   Elemwise{minimum,no_inplace}
+    3.9%    88.8%       0.000s       3.10e-06s     C        1       1   Elemwise{lt,no_inplace}
+    3.9%    92.7%       0.000s       3.10e-06s     C        1       1   Elemwise{Composite{Switch(i0, Switch(LT((i1 + i2), i3), i3, (i1 + i2)), Switch(LT(i1, i2), i1, i2))}}
+    3.6%    96.4%       0.000s       2.86e-06s     C        1       1   Sum{acc_dtype=float64}
+    3.6%   100.0%       0.000s       2.86e-06s     C        1       1   InplaceDimShuffle{x}
+    ... (remaining 0 Ops account for 0.00%(0.00s) of the runtime)
+
+ Apply
+ ------
+ <% time> <sum %> <apply time> <time per call> <#call> <id> <Mflops> <Gflops/s> <Apply name>
+   17.8%    17.8%       0.000s       1.41e-05s      1     0   Shape_i{0}(coefficients)
+     input 0: dtype=float32, shape=(3,), strides=c
+     output 0: dtype=int64, shape=(), strides=c
+   15.1%    32.9%       0.000s       1.19e-05s      1    18   Elemwise{Composite{(i0 * (i1 ** i2))}}(Subtensor{int64:int64:int8}.0, InplaceDimShuffle{x}.0, Subtensor{int64:int64:int8}.0)
+     input 0: dtype=float32, shape=(3,), strides=c
+     input 1: dtype=float32, shape=(1,), strides=c
+     input 2: dtype=int64, shape=(3,), strides=c
+     output 0: dtype=float64, shape=(3,), strides=c
+    5.1%    38.1%       0.000s       4.05e-06s      1    17   Subtensor{int64:int64:int8}(TensorConstant{[ 0 1..9998 9999]}, ScalarFromTensor.0, ScalarFromTensor.0, Constant{1})
+     input 0: dtype=int64, shape=(10000,), strides=c
+     input 1: dtype=int64, shape=8, strides=c
+     input 2: dtype=int64, shape=8, strides=c
+     input 3: dtype=int8, shape=1, strides=c
+     output 0: dtype=int64, shape=(3,), strides=c
+    5.1%    43.2%       0.000s       4.05e-06s      1    11   Elemwise{switch,no_inplace}(Elemwise{le,no_inplace}.0, TensorConstant{0}, TensorConstant{0})
+     input 0: dtype=int8, shape=(), strides=c
+     input 1: dtype=int8, shape=(), strides=c
+     input 2: dtype=int64, shape=(), strides=c
+     output 0: dtype=int64, shape=(), strides=c
+    5.1%    48.3%       0.000s       4.05e-06s      1     5   Elemwise{Composite{Switch(i0, Switch(LT((i1 + i2), i3), i3, (i1 + i2)), Switch(LT(i2, i1), i2, i1))}}[(0, 2)](Elemwise{lt,no_inplace}.0, TensorConstant{10000}, Elemwise{minimum,no_inplace}.0, TensorConstant{0})
+     input 0: dtype=int8, shape=(), strides=c
+     input 1: dtype=int64, shape=(), strides=c
+     input 2: dtype=int64, shape=(), strides=c
+     input 3: dtype=int8, shape=(), strides=c
+     output 0: dtype=int64, shape=(), strides=c
+    5.1%    53.5%       0.000s       4.05e-06s      1     2   Elemwise{minimum,no_inplace}(Shape_i{0}.0, TensorConstant{10000})
+     input 0: dtype=int64, shape=(), strides=c
+     input 1: dtype=int64, shape=(), strides=c
+     output 0: dtype=int64, shape=(), strides=c
+    3.9%    57.4%       0.000s       3.10e-06s      1    16   Subtensor{int64:int64:int8}(coefficients, ScalarFromTensor.0, ScalarFromTensor.0, Constant{1})
+     input 0: dtype=float32, shape=(3,), strides=c
+     input 1: dtype=int64, shape=8, strides=c
+     input 2: dtype=int64, shape=8, strides=c
+     input 3: dtype=int8, shape=1, strides=c
+     output 0: dtype=float32, shape=(3,), strides=c
+    3.9%    61.3%       0.000s       3.10e-06s      1    14   ScalarFromTensor(Elemwise{Composite{Switch(i0, i1, minimum(i2, i3))}}[(0, 2)].0)
+     input 0: dtype=int64, shape=(), strides=c
+     output 0: dtype=int64, shape=8, strides=c
+    3.9%    65.3%       0.000s       3.10e-06s      1    10   Elemwise{Composite{Switch(i0, i1, minimum(i2, i3))}}[(0, 2)](Elemwise{le,no_inplace}.0, TensorConstant{0}, Elemwise{Composite{Switch(i0, Switch(LT((i1 + i2), i3), i3, (i1 + i2)), Switch(LT(i2, i1), i2, i1))}}[(0, 2)].0, TensorConstant{10000})
+     input 0: dtype=int8, shape=(), strides=c
+     input 1: dtype=int8, shape=(), strides=c
+     input 2: dtype=int64, shape=(), strides=c
+     input 3: dtype=int64, shape=(), strides=c
+     output 0: dtype=int64, shape=(), strides=c
+    3.9%    69.2%       0.000s       3.10e-06s      1     4   Elemwise{Composite{Switch(i0, Switch(LT((i1 + i2), i3), i3, (i1 + i2)), Switch(LT(i1, i2), i1, i2))}}(Elemwise{lt,no_inplace}.0, Elemwise{minimum,no_inplace}.0, Shape_i{0}.0, TensorConstant{0})
+     input 0: dtype=int8, shape=(), strides=c
+     input 1: dtype=int64, shape=(), strides=c
+     input 2: dtype=int64, shape=(), strides=c
+     input 3: dtype=int8, shape=(), strides=c
+     output 0: dtype=int64, shape=(), strides=c
+    3.9%    73.1%       0.000s       3.10e-06s      1     3   Elemwise{lt,no_inplace}(Elemwise{minimum,no_inplace}.0, TensorConstant{0})
+     input 0: dtype=int64, shape=(), strides=c
+     input 1: dtype=int8, shape=(), strides=c
+     output 0: dtype=int8, shape=(), strides=c
+    3.6%    76.7%       0.000s       2.86e-06s      1    19   Sum{acc_dtype=float64}(Elemwise{Composite{(i0 * (i1 ** i2))}}.0)
+     input 0: dtype=float64, shape=(3,), strides=c
+     output 0: dtype=float64, shape=(), strides=c
+    3.6%    80.4%       0.000s       2.86e-06s      1     9   Elemwise{switch,no_inplace}(Elemwise{le,no_inplace}.0, TensorConstant{0}, TensorConstant{0})
+     input 0: dtype=int8, shape=(), strides=c
+     input 1: dtype=int8, shape=(), strides=c
+     input 2: dtype=int64, shape=(), strides=c
+     output 0: dtype=int64, shape=(), strides=c
+    3.6%    84.0%       0.000s       2.86e-06s      1     7   Elemwise{le,no_inplace}(Elemwise{Composite{Switch(i0, Switch(LT((i1 + i2), i3), i3, (i1 + i2)), Switch(LT(i2, i1), i2, i1))}}[(0, 2)].0, TensorConstant{0})
+     input 0: dtype=int64, shape=(), strides=c
+     input 1: dtype=int8, shape=(), strides=c
+     output 0: dtype=int8, shape=(), strides=c
+    3.6%    87.6%       0.000s       2.86e-06s      1     1   InplaceDimShuffle{x}(x)
+     input 0: dtype=float32, shape=(), strides=c
+     output 0: dtype=float32, shape=(1,), strides=c
+    2.7%    90.3%       0.000s       2.15e-06s      1    12   ScalarFromTensor(Elemwise{Composite{Switch(i0, i1, minimum(i2, i3))}}[(0, 2)].0)
+     input 0: dtype=int64, shape=(), strides=c
+     output 0: dtype=int64, shape=8, strides=c
+    2.4%    92.7%       0.000s       1.91e-06s      1    15   ScalarFromTensor(Elemwise{switch,no_inplace}.0)
+     input 0: dtype=int64, shape=(), strides=c
+     output 0: dtype=int64, shape=8, strides=c
+    2.4%    95.2%       0.000s       1.91e-06s      1    13   ScalarFromTensor(Elemwise{switch,no_inplace}.0)
+     input 0: dtype=int64, shape=(), strides=c
+     output 0: dtype=int64, shape=8, strides=c
+    2.4%    97.6%       0.000s       1.91e-06s      1     8   Elemwise{Composite{Switch(i0, i1, minimum(i2, i3))}}[(0, 2)](Elemwise{le,no_inplace}.0, TensorConstant{0}, Elemwise{Composite{Switch(i0, Switch(LT((i1 + i2), i3), i3, (i1 + i2)), Switch(LT(i1, i2), i1, i2))}}.0, Shape_i{0}.0)
+     input 0: dtype=int8, shape=(), strides=c
+     input 1: dtype=int8, shape=(), strides=c
+     input 2: dtype=int64, shape=(), strides=c
+     input 3: dtype=int64, shape=(), strides=c
+     output 0: dtype=int64, shape=(), strides=c
+    2.4%   100.0%       0.000s       1.91e-06s      1     6   Elemwise{le,no_inplace}(Elemwise{Composite{Switch(i0, Switch(LT((i1 + i2), i3), i3, (i1 + i2)), Switch(LT(i1, i2), i1, i2))}}.0, TensorConstant{0})
+     input 0: dtype=int64, shape=(), strides=c
+     input 1: dtype=int8, shape=(), strides=c
+     output 0: dtype=int8, shape=(), strides=c
+    ... (remaining 0 Apply instances account for 0.00%(0.00s) of the runtime)
+
+ Memory Profile
+ (Sparse variables are ignored)
+ (For values in brackets, it's for linker = c|py
+ ---
+     Max if no gc (allow_gc=False): 0KB (0KB)
+         CPU: 0KB (0KB)
+         GPU: 0KB (0KB)
+ ---
+     Max if linker=cvm(default): 0KB (0KB)
+         CPU: 0KB (0KB)
+         GPU: 0KB (0KB)
+ ---
+     Memory saved if views are used: 0KB (0KB)
+     Memory saved if inplace ops are used: 0KB (0KB)
+     Memory saved if gc is enabled: 0KB (0KB)
+ ---
+     <Sum apply outputs (bytes)> <Apply outputs shape> <created/inplace/view> <Apply node>
+     ... (remaining 20 Apply account for 171B/171B ((100.00%)) of the Apply with dense outputs sizes)
+     All Apply nodes have output sizes that take less than 1024B.
+     <created/inplace/view> is taken from the Op's declaration.
+     Apply nodes marked 'inplace' or 'view' may actually allocate memory, this is not reported here. If you use DebugMode, warnings will be emitted in those cases.
+
+ Here are tips to potentially make your code run faster
+ (if you think of new ones, suggest them on the mailing list).
+ Test them first, as they are not guaranteed to always provide a speedup.
+   Sorry, no tip for today.
  """
  Exercise 5
  -----------

  - In the last exercises, do you see a speed up with the GPU?
- - Where does it come from? (Use ProfileMode)
+ - Where does it come from? (Use profile=True)
  - Is there something we can do to speed up the GPU version?
@@ -427,4 +521,3 @@ Known limitations
  - A few hundred nodes is fine
  - Disabling a few optimizations can speed up compilation
  - Usually too many nodes indicates a problem with the graph
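One way to trade runtime speed for faster compilation is the ``optimizer`` flag; a sketch (``train.py`` is a hypothetical script name):

```shell
# Use the fast-compiling, less aggressive optimizer to cut compilation time.
THEANO_FLAGS="optimizer=fast_compile" python train.py
```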
@@ -176,7 +176,7 @@ Theano flags
  Theano can be configured with flags. They can be defined in two ways:

- * With an environment variable: ``THEANO_FLAGS="mode=ProfileMode,ProfileMode.profile_memory=True"``
+ * With an environment variable: ``THEANO_FLAGS="profile=True,profile_memory=True"``
  * With a configuration file that defaults to ``~/.theanorc``
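These flags can also be stored persistently in the configuration file; a minimal sketch of a ``~/.theanorc`` (the values shown are illustrative):

```ini
[global]
profile = True
profile_memory = True
```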
......
@@ -104,7 +104,7 @@ Exercise 5
  -----------

  - In the last exercises, do you see a speed up with the GPU?
- - Where does it come from? (Use ProfileMode)
+ - Where does it come from? (Use profile=True)
  - Is there something we can do to speed up the GPU version?
......
@@ -133,7 +133,7 @@ Theano flags
  Theano can be configured with flags. They can be defined in two ways:

- * With an environment variable: ``THEANO_FLAGS="mode=ProfileMode,ProfileMode.profile_memory=True"``
+ * With an environment variable: ``THEANO_FLAGS="profile=True,profile_memory=True"``
  * With a configuration file that defaults to ``~/.theanorc``
......
@@ -23,7 +23,7 @@ Theano defines the following modes by name:
  - ``'DebugMode'``: A mode for debugging. See :ref:`DebugMode <debugmode>` for details.
  - ``'ProfileMode'``: Deprecated, use the Theano flag :attr:`config.profile`.
  - ``'DEBUG_MODE'``: Deprecated. Use the string DebugMode.
- - ``'PROFILE_MODE'``: Deprecated. Use the string ProfileMode.
+ - ``'PROFILE_MODE'``: Deprecated, use the Theano flag :attr:`config.profile`.

  The default mode is typically ``FAST_RUN``, but it can be controlled via the
  configuration variable :attr:`config.mode`, which can be
@@ -70,4 +70,3 @@ Reference
  Return a new Mode instance like this one, but with an
  optimizer modified by requiring the given tags.
@@ -68,6 +68,15 @@ compare equal, if their parameters differ (the scalar being
  executed). So the class section will merge more Apply nodes than the
  Ops section.
+ Note that the profile also shows which Ops were running a C implementation.
+
+ Developers wishing to optimize the performance of their graph should
+ focus on the worst offending Ops and Apply nodes, either by optimizing
+ an implementation, providing a missing C implementation, or by writing
+ a graph optimization that eliminates the offending Op altogether.
+ You should strongly consider emailing one of our lists about your
+ issue before spending too much time on this.
  Here is an example output when we disable some Theano optimizations to
  give you a better idea of the difference between sections. With all
  optimizations enabled, there would be only one op left in the graph.
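A sketch of how such an experiment can be run, using the ``optimizer_excluding`` flag (the excluded tags and the script name are illustrative):

```shell
# Exclude the elemwise fusion and inplace optimizations while profiling,
# so more individual Ops remain visible in the profile output.
THEANO_FLAGS="optimizer_excluding=fusion:inplace,profile=True" python train.py
```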
......
@@ -213,8 +213,8 @@ Tips for Improving Performance on GPU
  frequently-accessed data (see :func:`shared()<shared.shared>`). When using
  the GPU, *float32* tensor ``shared`` variables are stored on the GPU by default to
  eliminate transfer time for GPU ops using those variables.

- * If you aren't happy with the performance you see, try building your functions with
-   ``mode='ProfileMode'``. This should print some timing information at program
+ * If you aren't happy with the performance you see, try running your script with
+   the ``profile=True`` flag. This should print some timing information at program
  termination. Is time being used sensibly? If an op or Apply is
  taking more time than its share, then if you know something about GPU
  programming, have a look at how it's implemented in theano.sandbox.cuda.
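A sketch of a profiled GPU run combining the flags mentioned above (the old-style ``device=gpu`` naming is assumed, and ``train.py`` is a hypothetical script name):

```shell
# Run on the GPU with float32 storage and profiling enabled.
THEANO_FLAGS="device=gpu,floatX=float32,profile=True" python train.py
```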
@@ -339,7 +339,7 @@ to the exercise in section :ref:`Configuration Settings and Compiling Mode<using
  Is there an increase in speed from CPU to GPU?

- Where does it come from? (Use ``ProfileMode``)
+ Where does it come from? (Use the ``profile=True`` flag.)

  What can be done to further increase the speed of the GPU version? Put your ideas to test.
......