Commit ecf8a165, authored by Pascal Lamblin

Fix bug in GpuOuter, where the previous output memory was not erased.

sger accumulates into its output buffer (it computes A := alpha * x * y' + A), so when a C-contiguous output buffer was reused across several calls, the result was incorrect.
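The failure mode described above can be reproduced without a GPU. The following is a minimal pure-Python sketch, not the Theano implementation: `ger_inplace` and `outer_reusing_buffer` are hypothetical helper names that only mimic the accumulate-into-A behaviour of a BLAS rank-1 update, and nested lists stand in for the device buffer.

```python
def ger_inplace(alpha, x, y, a):
    """Rank-1 update a <- alpha * outer(x, y) + a, in place (ger-style semantics)."""
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            a[i][j] += alpha * xi * yj
    return a


def outer_reusing_buffer(x, y, out, zero_first):
    """Write outer(x, y) into a preallocated buffer `out` (hypothetical helper)."""
    if zero_first:
        # The fix: clear whatever the previous call left in the buffer.
        for row in out:
            for j in range(len(row)):
                row[j] = 0.0
    return ger_inplace(1.0, x, y, out)


x = [1.0, 2.0]
y = [3.0, 4.0, 5.0]
expected = [[3.0, 4.0, 5.0], [6.0, 8.0, 10.0]]

out = [[0.0] * len(y) for _ in x]

# First call: the buffer starts at zero, so the result happens to be right.
outer_reusing_buffer(x, y, out, zero_first=False)
first = [row[:] for row in out]

# Second call on the same buffer: the old values are added in again.
outer_reusing_buffer(x, y, out, zero_first=False)
stale = [row[:] for row in out]

# Zeroing before the update restores the correct outer product.
outer_reusing_buffer(x, y, out, zero_first=True)
fixed = [row[:] for row in out]
```

In this sketch `first` and `fixed` equal `expected`, while `stale` holds twice the expected values, which is the kind of wrong result the commit message attributes to reusing the output memory.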
Parent d3f52989
@@ -265,7 +265,7 @@ class GpuOuter(Op):
         return hash(type(self))
     def c_code_cache_version(self):
-        return (3,)
+        return (4,)
     def c_code(self, node, name, inputs, outputs, sub):
         # A = x * y'
@@ -311,6 +311,20 @@ class GpuOuter(Op):
                 %(fail)s;
             }
         }
+        else
+        {
+            // sger accumulates into A. We need to zero it first.
+            int total_size = (sizeof(real) *
+                              CudaNdarray_HOST_DIMS(%(A)s)[0] *
+                              CudaNdarray_HOST_DIMS(%(A)s)[1]);
+            if (cudaSuccess != cudaMemset(%(A)s->devdata, 0, total_size))
+            {
+                PyErr_Format(PyExc_MemoryError, "GpuOuter: Error memsetting %%d bytes of device memory.", total_size);
+                Py_DECREF(%(name)sy);
+                Py_DECREF(%(name)sx);
+                %(fail)s;
+            }
+        }
         %(name)sres = CudaNdarray_sger(1.0, %(name)sx, %(name)sy, %(A)s);
         Py_DECREF(%(name)sy);
......
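The added branch clears the buffer with a single byte-wise memset whose length is the element size times both dimensions. A host-side sketch of that size computation follows, using `ctypes.memset` as a stand-in for `cudaMemset`. It assumes that `real` is a 32-bit float and that the buffer is one contiguous block; the names `rows`, `cols` and `buf` are illustrative only.

```python
import ctypes

rows, cols = 2, 3

# A contiguous float buffer holding stale data from a previous computation.
buf = (ctypes.c_float * (rows * cols))(*([1.5] * (rows * cols)))

# Same arithmetic as the diff: sizeof(real) * dims[0] * dims[1], in bytes.
total_size = ctypes.sizeof(ctypes.c_float) * rows * cols

# Setting every byte to 0 yields 0.0 for IEEE 754 floats.
ctypes.memset(buf, 0, total_size)

cleared = list(buf)
```

A single memset over `total_size` bytes only covers the whole matrix when the elements are laid out contiguously, which fits the commit message's mention of a C-contiguous output buffer.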