Commit 9cd61627 authored by Frederic

Use the async gpu kernel call by default.

Our transfer calls are the synchronized version, so there is no problem there. The problem we need to work around is that Theano's GC could free the output variable before we are finished with it: cudaFree() is instantaneous, it does not go into the stream of commands to execute.
Parent 00183e72
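The hazard the commit message describes can be sketched as follows. This is an illustrative example, not code from the commit: the kernel and function names are hypothetical, and only the cudaThreadSynchronize()-before-cudaFree() pattern comes from the diff below.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel standing in for any Theano GPU op.
__global__ void scale(float *out, const float *in, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 2.0f;
}

// Hypothetical caller showing why the free must be preceded by a sync.
void demo(float *d_out, float *d_in, int n)
{
    // Kernel launches are asynchronous: this call returns immediately,
    // while the kernel may still be queued or running on the device.
    scale<<<(n + 255) / 256, 256>>>(d_out, d_in, n);

    // cudaFree() does not go into the stream of commands: without a sync
    // it could release d_in while the kernel above is still reading it.
    cudaThreadSynchronize();  // wait for pending work (the API of the era;
                              // cudaDeviceSynchronize() in modern CUDA)
    cudaFree(d_in);
}
```

This is why the commit can make kernel calls asynchronous by default: the one place that must observe kernel completion, device_free(), now synchronizes explicitly.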
@@ -88,6 +88,11 @@ int device_free(void *ptr)
     if(!g_gpu_context_active) {
         return 0;
     }
+    // We need to sync as Theano's GC could remove intermediate variables
+    // that are still needed while the gpu kernels are running or in the queue.
+    cudaThreadSynchronize();
     cudaError_t err = cudaFree(ptr);
     if (cudaSuccess != err)
     {
...
@@ -27,7 +27,7 @@ typedef float real;
 #define NUM_VECTOR_OP_THREADS_PER_BLOCK 256 //Should be read from device properties. (#10)
 #endif
-#if 0
+#if 1
 // Do not wait after every kernel & transfer.
 #define CNDA_THREAD_SYNC
 #else
...