Fix random segfault on exit when the new backend is in use.

The old backend assumes it is alone in the universe and interacts
with the GPU even when it is not in use. Most notably in this case, it
forcibly destroys any primary context on exit. We are also using the
primary context, and our cleanup handlers run after those of the old
backend.

This matters because cublas uses the runtime API, which creates a
context whenever it thinks it needs one (for cudaFree, for example).
All of our allocations, however, belong to the old context that the
old backend destroyed. So when cublas tries to clean up its resources,
the low-level handlers get confused and we end up with a segmentation
fault.
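
The sequence above can be modelled without a GPU. The sketch below is
only an illustration of the ordering problem, not code from either
backend: Context, Allocation, ensure_context(), old_backend_exit(),
model_alloc(), model_free() and late_free_is_stale() are all
hypothetical names, and the "generation" counter is an assumed stand-in
for context identity.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the primary context shared by both backends. */
typedef struct {
    bool alive;
    int generation;   /* bumped each time a fresh context is created */
} Context;

/* Hypothetical allocation record: remembers which context owned it. */
typedef struct {
    int owner_generation;
} Allocation;

Context primary = { true, 1 };

/* Models the runtime API lazily creating a context when none exists. */
void ensure_context(void) {
    if (!primary.alive) {
        primary.alive = true;
        primary.generation++;
    }
}

/* Models the old backend's exit hook (the cudaThreadExit() call). */
void old_backend_exit(void) {
    primary.alive = false;
}

/* Models an allocation made by the new backend in the current context. */
Allocation model_alloc(void) {
    ensure_context();
    Allocation a = { primary.generation };
    return a;
}

/* Models a free; returns true only if the allocation belongs to the
 * context that is current at the time of the call. */
bool model_free(const Allocation *a) {
    ensure_context();
    return a->owner_generation == primary.generation;
}

/* Replays the exit sequence and reports whether the late free was
 * handed an allocation from a context that no longer exists. */
bool late_free_is_stale(bool old_backend_destroys_context) {
    primary.alive = true;
    primary.generation = 1;

    Allocation a = model_alloc();      /* new backend allocates */
    if (old_backend_destroys_context)
        old_backend_exit();            /* old backend's handler runs first */
    return !model_free(&a);            /* our cleanup runs afterwards */
}
```

In this model late_free_is_stale(true) reports a stale free and
late_free_is_stale(false) does not, which mirrors the effect of
removing the cudaThreadExit() call. The real failure is a crash inside
the driver rather than a boolean, so this only shows the ordering.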

As to why it doesn't always happen, my guess is that sometimes we get
lucky and the addresses happen to line up.

In any case, if we stop the old backend from calling cudaThreadExit()
the segfaults go away. This may leak a very small amount of resources
for the few seconds we keep running before we exit, but I don't think
this will be a problem.