Use the CUDA driver API for CUDA gpuarray operations.
Instead of mixing the CUDA driver API and the runtime API in the generated code,
use only the CUDA driver API.
GPU programs for CUDA gpuarray operations (except conv operations) are now
generated as a string that is passed to the Python interface of libgpuarray.
libgpuarray then generates a cubin bytearray, which is embedded in the
generated code. The generated code then uses the CUDA driver
API via the C++ interface of libgpuarray to load and launch the GPU program.
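
The embedding step can be sketched as follows. This is a minimal illustration,
not the code in this change: the helper name embed_cubin_as_c_array, the symbol
name kernel_cubin, and the placeholder bytes are all hypothetical. It only
shows how a byte array could be rendered as a C array definition inside
generated source, so that the host compiler sees plain data and a loader can
later be handed a pointer to it (the driver API's cuModuleLoadData, for
example, accepts an in-memory image).

```python
def embed_cubin_as_c_array(cubin: bytes, name: str = "kernel_cubin",
                           per_line: int = 12) -> str:
    """Render raw bytes as a C array definition (hypothetical helper)."""
    if not cubin:
        raise ValueError("cubin must not be empty")
    # One hex literal per byte, e.g. 0x7f
    hex_bytes = [f"0x{b:02x}" for b in cubin]
    # Wrap the literals so the generated source stays readable
    rows = [
        "    " + ", ".join(hex_bytes[i:i + per_line])
        for i in range(0, len(hex_bytes), per_line)
    ]
    body = ",\n".join(rows)
    return (
        f"static const unsigned char {name}[] = {{\n"
        f"{body}\n"
        f"}};\n"
        f"static const unsigned long {name}_size = {len(cubin)};\n"
    )


# Placeholder bytes standing in for a real cubin (assumption: not real GPU code)
fake_cubin = bytes([0x7F, 0x45, 0x4C, 0x46, 0x02, 0x01])
snippet = embed_cubin_as_c_array(fake_cubin)
print(snippet)
```

Because the result is an ordinary C data definition, the generated file needs
no device code and no nvcc pass of its own.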
This has at least two benefits:
(1) The generated code is no longer compiled into the shared library with
the nvcc offline compiler. The host compiler is used directly, which is
likely to be faster. Note that libgpuarray still uses nvcc for cubin
generation, but work is under way to use NVRTC and ptxas instead, which
should also be faster.
(2) Mixing the CUDA driver API and the runtime API is typically discouraged.