Limit the total block size to 512 threads and the grid size to 65535.
This should let older GPUs run the code at all and let newer GPUs fit
more blocks on one SM.
With this change the code is compatible with cc 2.0+, but it will only
be fast on cc 3.0+ cards (due to atomicAdd).
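
As a rough illustration of the capping described above, the following host-side sketch derives launch dimensions for a problem of n work items. It is not the code from this change: the names LaunchConfig and make_launch_config are hypothetical, a 1D grid is assumed, and the sketch assumes each thread covers any overflow with a stride loop once the grid cap is hit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Caps taken from the commit message; everything else here is illustrative.
constexpr std::size_t kMaxThreadsPerBlock = 512;
constexpr std::size_t kMaxBlocksPerGrid = 65535;

// Hypothetical helper type: the launch shape for a 1D grid.
struct LaunchConfig {
    std::size_t threads_per_block;
    std::size_t blocks_per_grid;
    std::size_t items_per_thread;  // stride-loop iterations each thread must run
};

// Hypothetical helper: clamp block and grid sizes to the caps, then work out
// how many items each thread has to process so that all n items are covered.
LaunchConfig make_launch_config(std::size_t n) {
    assert(n > 0);
    LaunchConfig cfg{};
    cfg.threads_per_block = std::min(n, kMaxThreadsPerBlock);
    // Ceiling division: number of blocks needed if there were no grid cap.
    const std::size_t blocks_needed =
        (n + cfg.threads_per_block - 1) / cfg.threads_per_block;
    cfg.blocks_per_grid = std::min(blocks_needed, kMaxBlocksPerGrid);
    const std::size_t total_threads =
        cfg.threads_per_block * cfg.blocks_per_grid;
    // When the grid is capped, threads loop over more than one item.
    cfg.items_per_thread = (n + total_threads - 1) / total_threads;
    return cfg;
}
```

A kernel launched with these dimensions would step through the data in strides of threads_per_block * blocks_per_grid, so the caps bound the launch shape without bounding the problem size.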