Commit 76c0f3df, authored by Arnaud Bergeron

Make GpuAdvancedIncSubtensor1_dev20 work with doubles.

Parent d3e893e8
......@@ -674,6 +674,22 @@ class GpuAdvancedIncSubtensor1_dev20(GpuKernelBase, GpuAdvancedIncSubtensor1):
kname = "k_vector_add_fast"
k_var = "k_vector_add_fast_" + nodename
code = """
/*
 * This is an atomicAdd that works for doubles, since CUDA does not
 * provide one natively on devices of compute capability below 6.0.
 */
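__device__ double atomicAdd(ga_double* address, ga_double val) {
    /* View the double as a 64-bit integer so atomicCAS can operate on it. */
    unsigned long long int* address_as_ull =
        (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        /* Store the sum only if the word still holds the bits we read. */
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val +
                                             __longlong_as_double(assumed)));
    /* Compare as integers: a NaN never equals itself as a double. */
    } while (assumed != old);
    /* Like the built-in atomicAdd, return the value before the addition. */
    return __longlong_as_double(old);
}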
/*
* This is a version of atomicAdd that works for half-floats. It may
......
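
The compare-and-swap loop in the added `atomicAdd` can be easier to follow outside of CUDA. The following is a minimal host-side sketch in Python, not part of the commit: `Cell`, `compare_and_swap`, and `atomic_add_double` are hypothetical names invented here, and the model is single-threaded, so it only illustrates the bit reinterpretation and the retry logic, not real atomicity.

```python
import struct


def double_as_longlong(x):
    """Reinterpret the 8 bytes of a double as an unsigned 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def longlong_as_double(bits):
    """Reinterpret an unsigned 64-bit integer as a double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


class Cell:
    """Hypothetical stand-in for one 64-bit word of device memory."""

    def __init__(self, value):
        self.bits = double_as_longlong(value)

    def compare_and_swap(self, expected, new):
        # Modeled on atomicCAS: store `new` only if the word still equals
        # `expected`, and always return the word's previous contents.
        old = self.bits
        if old == expected:
            self.bits = new
        return old


def atomic_add_double(cell, val):
    """Add `val` to the double stored in `cell`; return the previous value."""
    old = cell.bits
    while True:
        assumed = old
        new = double_as_longlong(val + longlong_as_double(assumed))
        old = cell.compare_and_swap(assumed, new)
        # Compare bit patterns, not doubles, as the kernel does.
        if old == assumed:
            break
    return longlong_as_double(old)
```

For example, `atomic_add_double(Cell(1.5), 2.25)` returns `1.5` and leaves `3.75` in the cell. Comparing the integer bit patterns rather than the double values matters in the real kernel: a stored NaN would never compare equal to itself as a double, so the loop could spin forever.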