Commit 666cf404, authored by Frederic

New doc info from @mrocklin

Parent 035ca639
@@ -610,16 +610,11 @@ Modify and execute to support *stride* (i.e. so as not constrain the input to be
GPU Async capabilities
----------------------
-Since Theano 0.6, we started to use the asynchone capability of
-GPU. This allow to be faster, but some errors are raised later, at the
-wrong place. This mess with the profiling of Theano apply node.
-In both case, you can use the NVIDIA driver feature that when
-environment variable CUDA_LAUNCH_BLOCKING=1 is set, all kernal call
-get automatically syncronized. This will restore to the old beavior
-that provide good profiling and error message.
-This feature interact with Theano garbage collector of intermediate
-results. To get the most of this feature, you need to disable the gc
-as it insert synchronization point in the graph. Set the Theano flag
-allow_gc=False to get event faster speed! This will raise the memory
-usage.
+Ever since Theano 0.6 we started to use the asynchronous capability of
+GPUs. This allows us to be faster but with the possibility that some
+errors may be raised later than when they should occur. This can cause
+difficulties when profiling Theano apply nodes. There is a NVIDIA
+driver feature to help with these issues. If you set the environment
+variable CUDA_LAUNCH_BLOCKING=1 then all kernel calls will be
+automatically synchronized. This reduces performance but provides good
+profiling and appropriately placed error messages.
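
For readers of this change, here is a minimal sketch of one way to apply the CUDA_LAUNCH_BLOCKING=1 setting described above. It is not part of the commit. The helper names `blocking_env` and `run_blocking` are hypothetical, and the sketch assumes the variable should be in place before the process that uses the GPU starts, so it launches the script in a fresh interpreter.

```python
import os
import subprocess
import sys


def blocking_env(base=None):
    """Return a copy of an environment mapping with CUDA_LAUNCH_BLOCKING=1.

    Hypothetical helper for illustration; `base` defaults to os.environ.
    The input mapping is copied, not modified.
    """
    env = dict(os.environ if base is None else base)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_blocking(script_path):
    """Run a Python script in a new process with synchronous kernel launches.

    Assumption: starting a fresh process ensures the variable is set
    before the CUDA driver is initialized.
    """
    return subprocess.call([sys.executable, script_path], env=blocking_env())
```

From a shell, the equivalent is to prefix the command with the variable, for example `CUDA_LAUNCH_BLOCKING=1 python my_script.py`, where `my_script.py` is a placeholder name. As the documentation text notes, this trades speed for easier profiling and error placement, so it is best left off outside debugging sessions.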