Commit fc36eefb, authored by xiaoqie

cuda fix

All tests in test_nnet.py pass with CUDA. With OpenCL, only the fp32 tests in test_nnet.py pass. GpuFromHost doesn't work with fp16 or fp64. A larger work-item size doesn't improve performance. Add two local_barrier() calls. Strangely, the AMD card doesn't need these local_barrier() calls, but they are necessary on NVIDIA cards.
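
As a rough illustration of why barriers matter in a kernel that shares local memory, here is a minimal Python sketch. It is not the code from this commit: `workgroup_sum` is a hypothetical name, Python threads stand in for work items, a plain list stands in for local memory, and `threading.Barrier` stands in for local_barrier(). Where the two barriers sit in this sketch is an assumption and may differ from where the commit actually adds them.

```python
import threading


def workgroup_sum(values):
    """Sum `values` with one thread per element, mimicking a work-group tree reduction.

    Hypothetical illustration only; not the kernel changed by this commit.
    """
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"

    local = [0] * n                  # stands in for local (shared) memory
    barrier = threading.Barrier(n)   # stands in for local_barrier()

    def work_item(tid):
        # Each work item copies its own element into local memory.
        local[tid] = values[tid]
        # Barrier 1: every load must finish before any work item reads
        # a neighbour's slot.
        barrier.wait()

        stride = n // 2
        while stride > 0:
            if tid < stride:
                local[tid] += local[tid + stride]
            # Barrier 2: every work item must finish this step before the
            # next step reads the partial sums it produced.
            barrier.wait()
            stride //= 2

    threads = [threading.Thread(target=work_item, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return local[0]


if __name__ == "__main__":
    print(workgroup_sum([1, 2, 3, 4, 5, 6, 7, 8]))
```

Without either barrier, a work item could read a slot that its neighbour has not written yet. Hardware that happens to run all work items of a small group in lockstep can hide such a race, which may be why one vendor's card appears not to need the barriers; that explanation is a guess and is not confirmed by the commit.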
Parent 2c6d7e6e