Commit e7ed20c6 authored by Frederic

Force a sync before the GPU-to-CPU copy for safety, as there is no clear official documentation on this.

Parent 77d11a8d
@@ -1048,7 +1048,13 @@ CudaNdarray_TakeFrom(CudaNdarray * self, PyObject *args){
}
// -10 could be any value different than 0.
int cpu_err_var=-10;
// We are not 100% sure that cudaMemcpy waits for the async GPU kernels to
// finish before doing the transfer. So we add this explicit sync, as it
// is pretty fast: in a Python loop, 1 000 000 calls ran in 1 second.
// It is better to be safe and not significantly slower than to be unsafe.
cudaThreadSynchronize();
err = cudaMemcpy(&cpu_err_var, err_var, sizeof(int),
                 cudaMemcpyDeviceToHost);
if (cudaSuccess != err) {
......
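
The pattern this commit introduces (an explicit barrier before reading a device-side error flag back to the host) can be sketched in isolation as below. This is a minimal illustration, not the Theano code: the kernel `check_indices`, its arguments, and the sample data are hypothetical. It also assumes a newer toolkit and calls `cudaDeviceSynchronize()`, the replacement for `cudaThreadSynchronize()`, which later CUDA releases mark as deprecated.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel (not from the commit): sets *err_var to a non-zero
// value if any index falls outside [0, limit).
__global__ void check_indices(const int *indices, int n, int limit,
                              int *err_var) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && (indices[i] < 0 || indices[i] >= limit)) {
        *err_var = 1;  // every writer stores the same value
    }
}

int main() {
    const int n = 4;
    const int limit = 4;
    const int host_indices[n] = {0, 2, 7, 1};  // 7 is out of range

    int *dev_indices = nullptr;
    int *err_var = nullptr;
    if (cudaMalloc(&dev_indices, n * sizeof(int)) != cudaSuccess ||
        cudaMalloc(&err_var, sizeof(int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemcpy(dev_indices, host_indices, n * sizeof(int),
               cudaMemcpyHostToDevice);
    cudaMemset(err_var, 0, sizeof(int));

    // Kernel launches are asynchronous with respect to the host.
    check_indices<<<1, n>>>(dev_indices, n, limit, err_var);

    // Explicit barrier, mirroring the commit: wait for all queued GPU work
    // before the device-to-host copy of the error flag.
    cudaError_t err = cudaDeviceSynchronize();
    if (cudaSuccess != err) {
        fprintf(stderr, "sync failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // As in the commit, start from a sentinel different from 0.
    int cpu_err_var = -10;
    err = cudaMemcpy(&cpu_err_var, err_var, sizeof(int),
                     cudaMemcpyDeviceToHost);
    if (cudaSuccess != err) {
        fprintf(stderr, "copy failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("cpu_err_var = %d\n", cpu_err_var);

    cudaFree(dev_indices);
    cudaFree(err_var);
    return 0;
}
```

Checking the return value of the synchronization call has a side benefit: it can surface errors from the preceding asynchronous kernel launch before the flag is read.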