1. 04 Sep, 2015: 4 commits
    • Force instantiate kernel templates · 0d5cffbe
      Sean Lee committed
    • Use the CUDA driver API for CUDA gpuarray operations. · 89f584bc
      Sean Lee committed
      Instead of mixing the CUDA driver API and the runtime API in the generated code,
      use only the CUDA driver API.
      GPU programs for CUDA gpuarray operations (except conv operations) are now
      generated as a string that is passed to the Python interface of libgpuarray.
      libgpuarray then generates a cubin bytearray, which is embedded in the
      generated code. The generated code then uses the CUDA driver
      API via the C++ interface of libgpuarray to load and launch the GPU program.
      
      This has at least two benefits:
      
      (1) This approach does not use the nvcc offline compiler to compile the
          generated code into the shared library. It uses the host compiler
          directly, which is likely to be faster. Note that libgpuarray still uses
          the nvcc offline compiler for cubin generation, but work is under way to
          use NVRTC and ptxas instead of nvcc, which should again be faster.
      (2) It avoids mixing the CUDA driver API and the runtime API, which is
          typically discouraged.
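The embedding step described in commit 89f584bc (a compiled cubin carried inside the generated source) can be illustrated with a minimal sketch, assuming the cubin is already available as a Python `bytes` object. `embed_binary_as_c_array` is a hypothetical helper written for this illustration; it is not part of Theano or libgpuarray, and the real generated code may lay the data out differently. It renders the binary as a static `unsigned char` array that the host compiler can build directly, with no nvcc step for the host-side code.

```python
import re


def embed_binary_as_c_array(name: str, blob: bytes, per_line: int = 12) -> str:
    """Render `blob` as a C array definition named `name`.

    Hypothetical helper for illustration only; not Theano's or
    libgpuarray's actual code generator.
    """
    # C identifiers: a letter or underscore, then letters, digits, underscores.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
        raise ValueError(f"not a valid C identifier: {name!r}")
    # C (before C23) rejects empty initializer lists and zero-length arrays.
    if not blob:
        raise ValueError("cannot embed an empty binary")
    lines = []
    for start in range(0, len(blob), per_line):
        chunk = blob[start:start + per_line]
        # A trailing comma after the last element is legal in C initializers.
        lines.append("    " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    body = "\n".join(lines)
    # The generated file must include <stddef.h> (or similar) for size_t.
    return (
        f"static const unsigned char {name}[] = {{\n{body}\n}};\n"
        f"static const size_t {name}_len = {len(blob)};\n"
    )


if __name__ == "__main__":
    placeholder = bytes(range(20))  # stand-in bytes, not a real cubin
    print(embed_binary_as_c_array("kernel_cubin", placeholder))
```

With the raw CUDA driver API, an in-memory image like this could be loaded with `cuModuleLoadData` and a kernel looked up with `cuModuleGetFunction`; according to the commit message, the generated code instead performs the load and launch through libgpuarray's interface.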
    • Merge pull request #3357 from Thrandis/gpu_reshape · 7852531c
      Xavier Bouthillier committed
      Gpu reshape opt.
    • Merge pull request #3358 from nouiz/tests · c42a18c6
      abergeron committed
      Fix test and better error message
  2. 03 Sep, 2015: 4 commits
  3. 02 Sep, 2015: 8 commits
  4. 01 Sep, 2015: 7 commits
  5. 31 Aug, 2015: 7 commits
  6. 29 Aug, 2015: 4 commits
  7. 28 Aug, 2015: 6 commits