Commit 7d18f6a6 authored by Frederic

Allow passing optimization options to nvcc and document this.

Parent: eab4cada
@@ -284,6 +284,14 @@ Tips for Improving Performance on GPU
   Check the line similar to *Spent Xs(X%) in cpu op, Xs(X%) in gpu op and Xs(X%) in transfer op*.
   This can tell you if not enough of your graph is on the GPU or if there
   is too much memory transfer.
+* Use nvcc options. nvcc supports these options to speed up some
+  computations: ``-ftz=true`` to `flush denormal values to
+  zero <https://developer.nvidia.com/content/cuda-pro-tip-flush-denormals-confidence>`_,
+  and ``--prec-div=false`` and ``--prec-sqrt=false`` to speed up
+  division and square root operations by being less precise. You can
+  enable all of them with the ``nvcc.flags=--use_fast_math`` Theano
+  flag, or enable them individually, as in this example:
+  ``nvcc.flags=-ftz=true --prec-div=false``.
 
 .. _gpu_async:
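To make the effect of ``-ftz=true`` concrete, here is a small CPU-side Python sketch. It is only an illustration of the numeric effect, not how nvcc or the GPU implements it: it rounds a value to float32, checks whether the result is denormal (nonzero but smaller in magnitude than the smallest normal float32, 2**-126), and emulates flushing such a value to zero. The helper names `to_float32`, `is_denormal32` and `flush_to_zero32` are hypothetical, chosen for this example.

```python
import math
import struct

# Smallest positive normal IEEE-754 single-precision (float32) value.
FLT32_MIN_NORMAL = 2.0 ** -126


def to_float32(x):
    """Round a Python float to the nearest float32 by packing it into 4 bytes."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def is_denormal32(x):
    """Return True if x, once rounded to float32, is a nonzero subnormal value."""
    f = to_float32(x)
    return f != 0.0 and abs(f) < FLT32_MIN_NORMAL


def flush_to_zero32(x):
    """Emulate flush-to-zero on one float32 value: subnormals become signed zero."""
    f = to_float32(x)
    if is_denormal32(f):
        return math.copysign(0.0, f)
    return f
```

For example, `flush_to_zero32(1e-40)` returns `0.0`, because 1e-40 can only be stored as a float32 subnormal, while `flush_to_zero32(1e-30)` returns the float32 value unchanged, since 1e-30 is already a normal float32. Skipping the special handling of such tiny values is where the speedup comes from, at the cost of losing them.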
@@ -255,10 +255,15 @@ class NVCC_compiler(object):
         # compute capability? '--gpu-architecture=compute_13',
         # '--gpu-code=compute_13',
         #nvcc argument
-        preargs1 = [pa for pa in preargs
-                    if pa.startswith('-O') or
-                    pa.startswith('--maxrregcount=') or
-                    pa.startswith('-arch=')]
+        preargs1 = []
+        for pa in preargs:
+            for pattern in ['-O', '-arch=',
+                            '--fmad', '--ftz', '--maxrregcount',
+                            '--prec-div', '--prec-sqrt', '--use_fast_math',
+                            '-fmad', '-ftz', '-maxrregcount',
+                            '-prec-div', '-prec-sqrt', '-use_fast_math']:
+                if pa.startswith(pattern):
+                    preargs1.append(pa)
         preargs2 = [pa for pa in preargs
                     if pa not in preargs1] # other arguments
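The patch sorts the user-supplied arguments by prefix: anything that starts with one of the listed nvcc options goes into `preargs1`, and everything else goes into `preargs2`. A minimal standalone sketch of the same split, using a hypothetical helper named `split_nvcc_args` (not part of Theano), can pass a tuple of prefixes to `str.startswith`:

```python
# Prefixes copied from the patch above: arguments starting with any of them
# are treated as nvcc options; all other arguments are kept separately.
NVCC_PREFIXES = ('-O', '-arch=',
                 '--fmad', '--ftz', '--maxrregcount',
                 '--prec-div', '--prec-sqrt', '--use_fast_math',
                 '-fmad', '-ftz', '-maxrregcount',
                 '-prec-div', '-prec-sqrt', '-use_fast_math')


def split_nvcc_args(preargs):
    """Hypothetical helper: return (nvcc_args, other_args), preserving order."""
    nvcc_args = [pa for pa in preargs if pa.startswith(NVCC_PREFIXES)]
    other_args = [pa for pa in preargs if not pa.startswith(NVCC_PREFIXES)]
    return nvcc_args, other_args
```

For instance, `split_nvcc_args(['-O3', '-ftz=true', '-fPIC'])` gives `(['-O3', '-ftz=true'], ['-fPIC'])`. Passing a tuple to `str.startswith` tests every prefix in one call, so each argument is added at most once. The nested loop in the patch would append an argument twice if it matched two prefixes, although none of the current prefixes can both match the same argument.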