Detect the GPU pointer size and int size at run time. Use them to allow fusing more GPU elemwise operations together.
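
As a rough illustration of why the detected sizes matter, the sketch below estimates how many inputs a fused elemwise kernel could take before its arguments exceed a fixed kernel-argument budget. It is not the code from this change: the function name, its parameters, the argument layout (a pointer plus one int per dimension for each array) and the 232-byte default budget are all assumptions made for the example.

```python
def max_fusable_inputs(ndim, n_outputs, gpu_ptr_size, int_size, param_limit=232):
    """Hypothetical estimate of how many inputs fit in the kernel argument space.

    All names and the default param_limit are illustrative assumptions.
    """
    # Fixed cost: the element count plus the shared shape (one int per dimension).
    fixed = int_size + int_size * ndim
    # Each output costs one pointer plus one int (stride) per dimension.
    fixed += n_outputs * (gpu_ptr_size + int_size * ndim)
    # Each input costs the same: one pointer plus one int per dimension.
    per_input = gpu_ptr_size + int_size * ndim
    # Never report a negative count when the fixed cost already exceeds the budget.
    return max((param_limit - fixed) // per_input, 0)
```

Under these assumptions, a smaller detected pointer size leaves room for more inputs: `max_fusable_inputs(2, 1, gpu_ptr_size=4, int_size=4)` allows more inputs than the same call with `gpu_ptr_size=8`, which is why measuring the sizes instead of assuming the worst case permits more fusion.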