Merge pull request #5852 from Faruk-Ahmed/split_elemwise_addmul
Adapt the local_gpu_elemwise optimization of the new gpuarray back-end to avoid overflowing the number of inputs with Elemwise<add,mul>. The existing optimization already split the inputs, but it did not use the split_huge_add_or_mul method because of the new gpuarray lifter signature (see comment
https://github.com/Theano/Theano/pull/5852#discussion_r114145523).
The unit test for a large number of inputs was invalid because it tested the old back-end (theano.sandbox.cuda). It now targets the gpuarray lifter optimization function. Fewer settings are tested to lower the computation time, while still covering at least one case without input-count overflow and at least one case with it.
split_huge_add_or_mul() is made more general so that it can be reused if a case like Elemwise<add,mul> occurs elsewhere.
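
For illustration only, the splitting idea can be sketched in plain Python. This is not the Theano implementation: split_nary_sketch, apply_nary and max_inputs are hypothetical names, and the sketch assumes the operation is associative (as add and mul are), so an oversized n-ary node can be regrouped into nested nodes that each take at most max_inputs arguments.

```python
from functools import reduce
import operator


def apply_nary(binary_op, args):
    """Stand-in for a single n-ary elementwise node: fold binary_op over args."""
    return reduce(binary_op, args)


def split_nary_sketch(binary_op, inputs, max_inputs):
    """Evaluate an n-ary add/mul so that no single node exceeds max_inputs.

    Returns the result and the list of per-node input counts, so a caller
    can check that the limit was respected. Hypothetical helper, not Theano's API.
    """
    if max_inputs < 2:
        raise ValueError("max_inputs must be at least 2")
    inputs = list(inputs)
    node_sizes = []
    # Repeatedly group the inputs into chunks until one node can take them all.
    while len(inputs) > max_inputs:
        chunks = [inputs[i:i + max_inputs]
                  for i in range(0, len(inputs), max_inputs)]
        node_sizes.extend(len(chunk) for chunk in chunks)
        inputs = [apply_nary(binary_op, chunk) for chunk in chunks]
    node_sizes.append(len(inputs))
    return apply_nary(binary_op, inputs), node_sizes


total, sizes = split_nary_sketch(operator.add, range(1, 101), max_inputs=8)
product, _ = split_nary_sketch(operator.mul, [1, 2, 3, 4, 5], max_inputs=2)
```

Here the 100 inputs to add are regrouped over several levels, and every intermediate node stays within the assumed limit of 8 inputs.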