Submitted by Pascal Lamblin
In order to avoid expanding memory usage and computation in the part of the graph that computes gradients, I propose the following conventions, which reinstate some of the constraints that previously existed on the dtype of gradients:

- When calling some_op.grad(inputs, output_grads), each variable in the "output_grads" list, if it is an actual numeric variable (and not, for instance, DisconnectedType or NullType), should have the same dtype as the corresponding output variable.
- Moreover, if one of the output variables is of a discrete dtype (int or uint), then the corresponding output gradient (unless it is a special case like NullType) should be zeros.

This is implemented in theano.grad, so an Op's grad method does not have to change, but it can once again rely on the fact that, if an output gradient has a dtype, that dtype is the same as that of the corresponding output variable.
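The conventions above can be illustrated with a minimal NumPy sketch. Note this is an assumption-laden illustration, not Theano's actual implementation: `coerce_output_grad` is a hypothetical helper standing in for the coercion that theano.grad performs before calling an Op's grad method.

```python
import numpy as np

def coerce_output_grad(output, output_grad):
    """Hypothetical helper sketching the convention: the gradient handed
    to an Op's grad() method is made to match the dtype of the
    corresponding output, and outputs with a discrete dtype get an
    all-zero gradient. Not Theano's real API."""
    if np.issubdtype(output.dtype, np.integer):
        # Discrete (int/uint) output: the output gradient is zeros.
        return np.zeros(output.shape, dtype=output.dtype)
    # Numeric, non-discrete output: cast the incoming gradient so its
    # dtype matches the output variable's dtype.
    return np.asarray(output_grad, dtype=output.dtype)

# A float32 output keeps its gradient values, cast to float32.
out_f = np.ones(3, dtype=np.float32)
g = coerce_output_grad(out_f, np.array([0.5, 1.5, 2.5], dtype=np.float64))

# An int64 output receives a zero gradient of the same dtype.
out_i = np.arange(3, dtype=np.int64)
gz = coerce_output_grad(out_i, np.ones(3, dtype=np.float64))
```

With this coercion done centrally, each Op's grad method can again assume that any numeric entry of output_grads matches the dtype of its output.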
3bd9ffde