Commit 6a6e7fc3
Authored September 14, 2012 by Frederic
Re-added the paragraph about allow_gc=False and moved the doc to a more visible space.
Parent: 666cf404
Showing 1 changed file with 20 additions and 13 deletions.
doc/tutorial/using_gpu.txt
@@ -256,13 +256,13 @@ what to expect right now:
 that data. Getting GPU performance largely hinges on making data transfer to
 the device pay off.
 Tips for Improving Performance on GPU
 -------------------------------------
 * Consider
   adding ``floatX=float32`` to your ``.theanorc`` file if you plan to do a lot of
   GPU work.
+* Use the Theano flag ``allow_gc=False``. See :ref:`gpu_async`
 * Prefer
   constructors like ``matrix``, ``vector`` and ``scalar`` to ``dmatrix``, ``dvector`` and
   ``dscalar`` because the former will give you *float32* variables when
@@ -285,6 +285,25 @@ Tips for Improving Performance on GPU
 This can tell you if not enough of your graph is on the GPU or if there
 is too much memory transfer.
+.. _gpu_async:
+GPU Async capabilities
+----------------------
+Ever since Theano 0.6 we started to use the asynchronous capability of
+GPUs. This allows us to be faster but with the possibility that some
+errors may be raised later than when they should occur. This can cause
+difficulties when profiling Theano apply nodes. There is a NVIDIA
+driver feature to help with these issues. If you set the environment
+variable CUDA_LAUNCH_BLOCKING=1 then all kernel calls will be
+automatically synchronized. This reduces performance but provides good
+profiling and appropriately placed error messages.
+This feature interacts with Theano garbage collection of intermediate
+results. To get the most of this feature, you need to disable the gc
+as it inserts synchronization points in the graph. Set the Theano flag
+``allow_gc=False`` to get even faster speed! This will raise the memory
+usage.
 Changing the Value of Shared Variables
 --------------------------------------
@@ -606,15 +625,3 @@ have to be jointly optimized explicitly in the code.)
 Modify and execute to support *stride* (i.e. so as not constrain the input to be *C-contiguous*).
-GPU Async capabilities
-----------------------
-Ever since Theano 0.6 we started to use the asynchronous capability of
-GPUs. This allows us to be faster but with the possibility that some
-errors may be raised later than when they should occur. This can cause
-difficulties when profiling Theano apply nodes. There is a NVIDIA
-driver feature to help with these issues. If you set the environment
-variable CUDA_LAUNCH_BLOCKING=1 then all kernel calls will be
-automatically synchronized. This reduces performance but provides good
-profiling and appropriately placed error messages.
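
The "GPU Async capabilities" section moved by this commit describes two settings that pull in opposite directions: ``CUDA_LAUNCH_BLOCKING=1`` for well-placed errors and profiling, and ``allow_gc=False`` for speed. The sketch below shows one way those settings might be prepared for a child process. It assumes the flags are passed through the ``THEANO_FLAGS`` environment variable as a comma-separated list. The helper name ``theano_gpu_env`` and its parameters are hypothetical and are not part of Theano.

```python
import os


def theano_gpu_env(base_env=None, synchronous=False, allow_gc=True):
    """Return a copy of an environment mapping with the GPU settings applied.

    Hypothetical helper for illustration only; it is not part of Theano.
    """
    env = dict(os.environ if base_env is None else base_env)

    # CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous: slower, but
    # errors and profiling times line up with the call that caused them.
    if synchronous:
        env["CUDA_LAUNCH_BLOCKING"] = "1"
    else:
        env.pop("CUDA_LAUNCH_BLOCKING", None)

    # THEANO_FLAGS is a comma-separated list of key=value pairs. Drop any
    # existing allow_gc entry, then append the requested one.
    flags = [
        flag
        for flag in env.get("THEANO_FLAGS", "").split(",")
        if flag and not flag.startswith("allow_gc=")
    ]
    flags.append("allow_gc=%s" % allow_gc)
    env["THEANO_FLAGS"] = ",".join(flags)
    return env


# Debugging or profiling: synchronous kernels, garbage collection left on.
debug_env = theano_gpu_env({}, synchronous=True, allow_gc=True)

# Speed: asynchronous kernels, no GC of intermediates (uses more memory).
fast_env = theano_gpu_env({"THEANO_FLAGS": "floatX=float32"}, allow_gc=False)

print(debug_env["CUDA_LAUNCH_BLOCKING"], debug_env["THEANO_FLAGS"])
print(fast_env["THEANO_FLAGS"])
```

Either mapping could then be handed to ``subprocess.run(..., env=debug_env)`` to launch a training script (the script itself is left out here). Keeping the two profiles separate reflects the trade-off in the text: synchronization helps diagnosis, while disabling garbage collection trades memory for speed.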