Commit 7d18f6a6, authored by Frederic

Allow passing optimization options to nvcc and document this.

Parent eab4cada
@@ -284,6 +284,14 @@ Tips for Improving Performance on GPU
   Check the line similar to *Spent Xs(X%) in cpu op, Xs(X%) in gpu op and Xs(X%) in transfer op*.
   This can tell you if not enough of your graph is on the GPU or if there
   is too much memory transfer.
+* Use nvcc options. nvcc supports the following options to speed up some
+  computations: `-ftz=true` to `flush denormal values to
+  zero <https://developer.nvidia.com/content/cuda-pro-tip-flush-denormals-confidence>`_,
+  and `--prec-div=false` and `--prec-sqrt=false` to speed up
+  division and square root operations at the cost of some precision. You can
+  enable all of them with the `nvcc.flags=--use_fast_math` Theano
+  flag, or you can enable them individually, as in this example:
+  `nvcc.flags=-ftz=true --prec-div=false`.
 .. _gpu_async:
......
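As an illustration of the documentation added above, here is a minimal sketch of setting `nvcc.flags` from Python through the `THEANO_FLAGS` environment variable. The helper name `build_nvcc_flags` is hypothetical and not part of Theano; the option values are the ones quoted in the new documentation text.

```python
import os


def build_nvcc_flags(nvcc_options):
    """Join nvcc options into a single `nvcc.flags=...` entry.

    Hypothetical helper for illustration only; Theano does not provide it.
    """
    return "nvcc.flags=" + " ".join(nvcc_options)


# Enable two of the documented options individually.
flags = build_nvcc_flags(["-ftz=true", "--prec-div=false"])

# Assumption: THEANO_FLAGS is read when Theano is imported, so it has to be
# set before `import theano`. This overwrites any existing value.
os.environ["THEANO_FLAGS"] = flags
print(flags)
```

Passing `["--use_fast_math"]` instead would request all of the documented options at once.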
@@ -255,10 +255,15 @@ class NVCC_compiler(object):
         # compute capability? '--gpu-architecture=compute_13',
         #                     '--gpu-code=compute_13',
         #nvcc argument
-        preargs1 = [pa for pa in preargs
-                    if pa.startswith('-O') or
-                    pa.startswith('--maxrregcount=') or
-                    pa.startswith('-arch=')]
+        preargs1 = []
+        for pa in preargs:
+            for pattern in ['-O', '-arch=',
+                            '--fmad', '--ftz', '--maxrregcount',
+                            '--prec-div', '--prec-sqrt', '--use_fast_math',
+                            '-fmad', '-ftz', '-maxrregcount',
+                            '-prec-div', '-prec-sqrt', '-use_fast_math']:
+                if pa.startswith(pattern):
+                    preargs1.append(pa)
         preargs2 = [pa for pa in preargs
                     if pa not in preargs1]  # other arguments
......
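The filtering step in the hunk above can be tried in isolation. The following is a standalone sketch, not the code from the commit: it swaps the nested loop for `str.startswith` with a tuple of prefixes, which should give the same result as long as no argument matches more than one prefix. The function name `split_preargs` and the sample arguments are made up for illustration.

```python
# Prefixes copied from the diff above.
NVCC_PREFIXES = (
    '-O', '-arch=',
    '--fmad', '--ftz', '--maxrregcount',
    '--prec-div', '--prec-sqrt', '--use_fast_math',
    '-fmad', '-ftz', '-maxrregcount',
    '-prec-div', '-prec-sqrt', '-use_fast_math',
)


def split_preargs(preargs):
    """Partition compiler arguments into (nvcc arguments, other arguments).

    Hypothetical standalone function; in the commit this logic sits inline
    in NVCC_compiler.
    """
    # str.startswith accepts a tuple and returns True if any prefix matches.
    preargs1 = [pa for pa in preargs if pa.startswith(NVCC_PREFIXES)]
    preargs2 = [pa for pa in preargs if pa not in preargs1]
    return preargs1, preargs2


nvcc_args, other_args = split_preargs(
    ['-O3', '--use_fast_math', '-ftz=true', '-fPIC', '-lcudart'])
print(nvcc_args)   # ['-O3', '--use_fast_math', '-ftz=true']
print(other_args)  # ['-fPIC', '-lcudart']
```

Because matching is by prefix, both the single-dash and double-dash spellings are recognized, with or without an `=value` suffix.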