Re-add part of the dtype constraint on output gradients
To avoid increasing memory usage and computation in the part of the
graph that computes gradients, I propose the following conventions,
which reinstate some of the constraints that previously existed on the
dtype of gradients:
- When calling some_op.grad(inputs, output_grads), each variable in the
  "output_grads" list that is an actual numeric variable (and not, for
  instance, a DisconnectedType or NullType) should have the same dtype
  as the corresponding output variable.
- Moreover, if one of the output variables has a discrete dtype (int
  or uint), then the corresponding output gradient (unless it is a
  special case like NullType) should be filled with zeros.
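The two conventions above can be sketched as a small coercion step
applied to each output gradient before some_op.grad() is called. This
is an illustrative sketch, not Theano's actual code: the function name
coerce_output_grad and the use of None to stand in for
DisconnectedType/NullType are assumptions, and plain NumPy arrays stand
in for symbolic variables.

```python
import numpy as np

# Hypothetical helper illustrating the proposed conventions.
# Discrete dtypes carry no meaningful gradient information.
DISCRETE_DTYPES = {"int8", "int16", "int32", "int64",
                   "uint8", "uint16", "uint32", "uint64"}

def coerce_output_grad(output_dtype, output_grad):
    """Make output_grad conform to the conventions for one output."""
    if output_grad is None:
        # Stand-in for special cases (DisconnectedType, NullType):
        # these are passed through unchanged.
        return None
    if output_dtype in DISCRETE_DTYPES:
        # Discrete output: the gradient becomes zeros of that dtype.
        return np.zeros_like(output_grad, dtype=output_dtype)
    # Numeric output: the gradient must match the output's dtype.
    return np.asarray(output_grad, dtype=output_dtype)
```

Under this scheme, an Op's grad method never sees a numeric output
gradient whose dtype differs from the corresponding output.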
This is implemented in theano.grad, so the Op's grad method does not
have to be changed, but it can once again rely on the fact that, if an
output gradient has a dtype, that dtype will be the same as that of the
corresponding output variable.
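To show what an Op's grad method gains from this guarantee, here is a
hedged toy example (plain Python/NumPy, not a real Theano Op; the names
square_perform and square_grad are made up): the grad body uses the
output gradient directly, with no defensive cast, because the invariant
ensures its dtype already matches the output's.

```python
import numpy as np

def square_perform(x):
    # Forward computation of a toy "square" op.
    return x * x

def square_grad(inputs, output_grads):
    # Gradient of the toy op. Under the restored convention,
    # gz.dtype == (x * x).dtype, so no cast or dtype check is needed.
    (x,) = inputs
    (gz,) = output_grads
    return [2 * x * gz]
```

Without the convention, this body would have to cast gz (or check its
dtype) to avoid silently upcasting the gradient graph.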