Update after code review:
- use KERNEL macro
- do not use `%(fail)s` on GPU, to avoid returning prematurely from
the kernel
- have special block for y == 0 (and reorder other ones)
- keep executing `// 0` and `% 0` on GPU, even though CUDA will not fail