In local_gpu_careduce, remove dimensions that are now detected as rebroadcastable, but weren't during graph build.