theano/gpuarray/nnet.py · ef7ce799b05fc8564995e7806b4d8a8fe94b1c21 · testgroup / pytensor

cuda fix · fc36eefb

Committed on June 12, 2017

All tests in test_nnet.py pass with CUDA.
Only fp32 tests in test_nnet.py pass with OpenCL. GpuFromHost doesn't work with fp16 or fp64.
A larger work item size doesn't improve performance.
Add 2 local_barrier() calls. Strangely, the AMD card doesn't need them, but they are necessary for NVIDIA cards.
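
One possible explanation for the barrier difference, offered as an assumption rather than something this commit confirms: AMD GCN hardware runs 64 work items in lockstep (a wavefront), while NVIDIA runs 32 (a warp). A work group of 64 items then fits in a single lockstep group on AMD, so a local-memory reduction can appear to work without barriers, while on NVIDIA the same group spans two warps that need explicit synchronization. The sketch below is a plain-Python simulation of that idea, not the kernel code in nnet.py. The function name `simulate_group_sum` and the "one lockstep group runs to completion before the next starts" schedule are hypothetical choices made for illustration.

```python
def simulate_group_sum(values, lockstep_width, use_barrier):
    """Simulate a local-memory tree reduction (sum) over one work group.

    Hypothetical model, not the nnet.py kernel:
    - work items are split into lockstep groups of `lockstep_width`;
    - inside a lockstep group, all reads of a phase happen before its writes;
    - without barriers, each lockstep group runs every phase to completion
      before the next group starts (one schedule the hardware may pick);
    - with barriers, every group finishes a phase before any group moves on.
    """
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "work-group size must be a power of two"

    local = [0] * n  # stands in for local memory; 0 models "not written yet"

    # Phase None: each work item stores its own value into local memory.
    # Later phases: work item i (for i < stride) adds local[i + stride] to local[i].
    phases = [None]
    stride = n // 2
    while stride >= 1:
        phases.append(stride)
        stride //= 2

    groups = [range(start, min(start + lockstep_width, n))
              for start in range(0, n, lockstep_width)]

    def run_phase(group, phase):
        if phase is None:
            for i in group:
                local[i] = values[i]
        else:
            # Lockstep execution: gather all reads first, then apply the writes.
            reads = [(i, local[i + phase]) for i in group if i < phase]
            for i, v in reads:
                local[i] += v

    if use_barrier:
        for phase in phases:          # a barrier separates consecutive phases
            for group in groups:
                run_phase(group, phase)
    else:
        for group in groups:          # no barrier: groups drift apart
            for phase in phases:
                run_phase(group, phase)

    return local[0]


if __name__ == "__main__":
    data = [1] * 64
    print(simulate_group_sum(data, lockstep_width=64, use_barrier=False))
    print(simulate_group_sum(data, lockstep_width=32, use_barrier=False))
    print(simulate_group_sum(data, lockstep_width=32, use_barrier=True))
```

In this model, a 64-item group with a lockstep width of 64 sums correctly even without barriers, while a width of 32 reads local memory that the second lockstep group has not written yet. Either way, the OpenCL and CUDA programming models only guarantee visibility of local-memory writes across work items after a barrier, so keeping the local_barrier() calls is the portable choice.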


nnet.py 45.3 KB
