Commit ef25bb73, authored by Frederic Bastien

Bugfix in GpuSum: threadCount multiplied blockDim.y twice instead of using blockDim.z. This caused trouble with reductions of patterns 111 and 1011.

Parent 5e2109c8
@@ -583,7 +583,7 @@ class GpuSum(Op):
     def _k_init(self, *args):
         return """
-                const int threadCount = blockDim.x * blockDim.y * blockDim.y;
+                const int threadCount = blockDim.x * blockDim.y * blockDim.z;
                 const int threadNum = threadIdx.z * blockDim.x * blockDim.y + threadIdx.y * blockDim.x + threadIdx.x;
                 extern __shared__ float buf[];
                 float mysum = 0.0f;
......
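To illustrate why the one-character change matters, here is a minimal Python sketch that mirrors the two index formulas from the kernel preamble on the host. The helper names (`thread_count_fixed`, `thread_count_buggy`, `thread_nums`) and the block shape `(4, 2, 8)` are assumptions made up for this illustration; they are not part of the Theano code. The sketch only shows that the old formula agrees with the real thread count when blockDim.y equals blockDim.z, and disagrees otherwise; it does not model how the rest of the kernel uses `threadCount`.

```python
from itertools import product


def thread_count_fixed(bx, by, bz):
    # Mirrors the corrected line: blockDim.x * blockDim.y * blockDim.z
    return bx * by * bz


def thread_count_buggy(bx, by, bz):
    # Mirrors the old line: blockDim.x * blockDim.y * blockDim.y
    return bx * by * by


def thread_nums(bx, by, bz):
    # Mirrors threadNum for every (threadIdx.z, threadIdx.y, threadIdx.x) in one block
    return [tz * bx * by + ty * bx + tx
            for tz, ty, tx in product(range(bz), range(by), range(bx))]


# Hypothetical block shape, chosen so that blockDim.y != blockDim.z
dims = (4, 2, 8)
nums = thread_nums(*dims)

# Every thread gets a distinct linear index in [0, x*y*z)
assert sorted(nums) == list(range(thread_count_fixed(*dims)))

print("fixed threadCount:", thread_count_fixed(*dims))
print("buggy threadCount:", thread_count_buggy(*dims))
print("largest threadNum:", max(nums))
```

With this shape the old formula reports fewer threads than the block actually contains, so some `threadNum` values are at or above the reported `threadCount`. When blockDim.y and blockDim.z happen to be equal, both formulas give the same value, which may be why the problem only showed up for certain reduction patterns.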