- 04 Sep, 2015 3 commits
  - Committed by Sean Lee
    Instead of mixing the CUDA driver API and the runtime API in the generated code, use only the CUDA driver API.
    GPU programs for CUDA gpuarray operations (except conv operations) are now generated as a string that is passed to the Python interface of libgpuarray. libgpuarray then generates a cubin bytearray, which is embedded in the generated code. The generated code then uses the CUDA driver API, via the C++ interface of libgpuarray, to load and launch the GPU program.
    This has at least two benefits:
    (1) This approach does not use the nvcc offline compiler to compile the generated code into the shared library. It uses the host compiler directly, which is likely to be faster. Note that libgpuarray still uses the nvcc offline compiler for cubin generation, but work is under way to use NVRTC and ptxas instead of nvcc, which should again be faster.
    (2) Mixing the CUDA driver API and the runtime API is typically discouraged.
  - Committed by Xavier Bouthillier
    Gpu reshape opt.
  - Committed by abergeron
    Fix test and better error message
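Sean Lee's 04 Sep commit message says that the cubin bytearray is embedded in the generated code. The sketch below illustrates one way a binary blob could be rendered as a C array literal for inclusion in generated source. It is only an illustration under stated assumptions: `bytes_to_c_array`, `kernel_bin`, and the placeholder bytes are hypothetical and are not taken from Theano or libgpuarray.

```python
def bytes_to_c_array(name, data, per_line=12):
    """Render `data` (bytes) as a C array definition plus a length constant.

    Hypothetical helper for illustration; not part of Theano or libgpuarray.
    """
    if not data:
        # An empty initializer list is not valid in older C standards.
        raise ValueError("data must contain at least one byte")
    rows = []
    for start in range(0, len(data), per_line):
        chunk = data[start:start + per_line]
        # Iterating over bytes yields integers in Python 3.
        rows.append("    " + ", ".join("0x%02x" % b for b in chunk))
    body = ",\n".join(rows)
    return (
        "static const unsigned char %s[] = {\n%s\n};\n"
        "static const unsigned long %s_len = %d;\n" % (name, body, name, len(data))
    )


# Placeholder bytes standing in for a compiled kernel image (assumption:
# a real cubin would be produced by the GPU toolchain, not written by hand).
fake_cubin = b"\x7fELF" + bytes(range(4))
source = bytes_to_c_array("kernel_bin", fake_cubin)
print(source)
```

With an array like this compiled into the shared library by the host compiler, the generated code would not need nvcc at build time, which matches the first benefit listed in the commit message.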
- 03 Sep, 2015 4 commits
  - Committed by Frederic
  - Committed by Frederic
  - Committed by Frederic
  - Committed by Cesar Laurent
- 02 Sep, 2015 8 commits
- 01 Sep, 2015 7 commits
  - Committed by Frederic
  - Committed by Frederic
  - Committed by Andy Jiang
  - Committed by Frederic Bastien
  - Committed by Frederic Bastien
  - Committed by abergeron
    Deactivate merge of assert as it causes a cycle in the graph
  - Committed by carriepl
    Prod dimshuffle opt
- 31 Aug, 2015 7 commits
  - Committed by Frederic
  - Committed by Mohammad Pezeshki
  - Committed by Mohammad Pezeshki
  - Committed by Mohammad Pezeshki
  - Committed by Mohammad Pezeshki
  - Committed by Mohammad Pezeshki
  - Committed by Mohammad Pezeshki
- 29 Aug, 2015 4 commits
  - Committed by Frédéric Bastien
    Implement batched_tensordot in terms of batched_dot
  - Committed by Frederic Bastien
  - Committed by abergeron
    Delete old stuff
  - Committed by abergeron
    Nouiz mixed
- 28 Aug, 2015 7 commits
  - Committed by Arnaud Bergeron
  - Committed by Arnaud Bergeron
  - Committed by Arnaud Bergeron
  - Committed by Arnaud Bergeron
  - Committed by Frederic
  - Committed by Frederic
  - Committed by Frederic