1600B [(400,)] i GpuElemwise{Sub}[(0, 1)](CudaNdarrayConstant{[ 1.]}, GpuFromHost.0)
1600B [(400,)] c HostFromGpu(GpuElemwise{Sub}[(0, 1)].0)
1600B [(400,)] c GpuFromHost(y)
... (remaining 7 Apply account for 448B/3800192B (0.01%) of the Apply with dense outputs sizes)
<created/inplace/view> is taken from the Op's declaration.
Apply nodes marked 'inplace' or 'view' may actually allocate memory, this is not reported here. If you use DebugMode, warnings will be emitted in those cases.
Here are tips to potentially make your code run faster
(if you think of new ones, suggest them on the mailing list).
Test them first, as they are not guaranteed to always provide a speedup.
- Try the Theano flag floatX=float32
Sorry, no tip for today.
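The floatX tip above can be applied without editing any code, either through the THEANO_FLAGS environment variable or a persistent `.theanorc` file. A minimal sketch of the config-file route (flag names as documented by Theano; the `device` line is only relevant when a GPU is available):

```ini
# ~/.theanorc -- make float32 the default dtype for shared variables
# and constants, so the graph stays GPU-friendly
[global]
floatX = float32
device = gpu
```

The same effect for a single run: `THEANO_FLAGS='floatX=float32' python script.py`.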
"""
Exercise 5
----------
- In the last exercises, do you see a speedup with the GPU?
- Where does it come from? (Use profile=True)
- Is there something we can do to speed up the GPU version?
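To answer the profiling question, the profiler can be switched on globally through the configuration rather than per function. A sketch, assuming the standard Theano flag names:

```ini
# .theanorc -- profile every compiled Theano function and print
# a per-Op timing report when the process exits
[global]
profile = True
```

Equivalently, a single function can be profiled by passing `profile=True` to `theano.function`.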
...

Known limitations
-----------------
- A few hundred nodes is fine
- Disabling a few optimizations can speed up compilation
- Usually too many nodes indicates a problem with the graph
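The compilation-speed tip above maps directly onto the optimizer flag; a sketch, assuming the standard Theano flag names:

```ini
# .theanorc -- trade some runtime speed for much faster compilation
# by applying fewer graph optimizations (useful while debugging)
[global]
optimizer = fast_compile
```

The same trade-off is available per function via `theano.function(..., mode='FAST_COMPILE')`.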
print('Other time since import %.3fs %.1f%%' % (other_time, other_time / total_time * 100))
print('%i Theano fct calls, %.3fs per call' % (total_fct_call, time_per_call))
print()
print("List of Apply nodes that have float64 outputs but no float64 inputs. Useful to check that no cast was forgotten when using floatX=float32 or GPU code.")
print("Here are tips to potentially make your code run faster (if you think of new ones, suggest them on the mailing list). Test them first as they are not guaranteed to always provide a speedup.")
scal.Sqrt, scal.Abs, scal.Cos, scal.Sin, scal.Tan, scal.Tanh, scal.Cosh, scal.Sinh, T.nnet.sigm.ScalarSigmoid, T.nnet.sigm.ScalarSoftplus]  # Abs, Mod in float{32,64} only
print(" - With the default gcc libm, exp in float32 is slower than in float64! Try Theano flags floatX=float64 or install amdlibm and set the theano flags lib.amdlibm=True")