Commit 3e5ae1b2 authored by Ian Goodfellow

Merge remote-tracking branch 'origin/int_grad' into int_grad

......@@ -266,6 +266,17 @@ following methods:
Finally, in Theano, integer-valued inputs often do not affect the elements of
the output, only its shape.
If your function f has both an integer-valued input and an
integer-valued output, then both rules have to be combined:
- If f is defined at (x+epsilon), then the input gradient is
defined. Since f(x+epsilon) would be equal to f(x) almost
everywhere, the gradient should be 0 (first rule).
- If f is only defined where x is an integer, then the gradient
is undefined, regardless of what the gradient with respect to the
output is.
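The two cases above can be illustrated numerically with a minimal sketch in plain Python (no Theano). The helper names `central_difference`, `floor_as_int` and `element_at` are hypothetical and exist only for this illustration; they are not part of the Theano API.

```python
import math


def central_difference(f, x, eps=1e-3):
    """Estimate df/dx at x with a symmetric finite difference."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


def floor_as_int(x):
    # Integer-valued output, but defined for any real x (first case).
    return math.floor(x)


def element_at(values, index):
    # Only defined for integer `index` (second case): Python raises
    # TypeError when a list is indexed with a float.
    return values[index]


# Case 1: f is defined at x + epsilon and is piecewise constant, so the
# estimate is 0 at any point that does not sit on a step.
grad_floor = central_difference(floor_as_int, 2.5)

# Case 2: f cannot be evaluated at index + epsilon, so no finite
# difference (and hence no gradient) can be formed.
try:
    element_at([10.0, 20.0, 30.0], 1 + 1e-3)
    index_gradient_defined = True
except TypeError:
    index_gradient_defined = False
```

Here `grad_floor` comes out as zero, while the attempt to perturb the index fails outright, which is the sense in which that gradient is undefined rather than zero.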
Examples:
1) f(x,y) = dot product between x and y. x and y are integers.
......@@ -278,11 +289,18 @@ following methods:
same as if y were floating point.
3) f(x,y) = argmax of x along axis y.
The gradient with respect to y is undefined, because f(x,y) is not defined for
floating point y. How could you take an argmax along a fractional axis?
The gradient with respect to x is 0, because f(x+epsilon, y) = f(x) almost
everywhere.
4) f(x,y) = a vector with y elements, each of which takes on the value x
The grad method should return DisconnectedType()() for y, because the elements of
f don't depend on y. Only the shape of f depends on y. You probably also want to
implement a connection_pattern method to encode this.
5) f(x) = int(x) converts float x into an int. g(y) = float(y) converts an integer y into a float.
If the final cost C = 0.5 * g(y) = 0.5 * g(f(x)), then the
gradient with respect to y will be 0.5, even if y is an
integer. However, the gradient with respect to x will be 0,
because the output of f is integer-valued.
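Example 5 can be checked with finite differences. The following is a rough sketch in plain Python rather than Theano; `cost_from_y`, `cost_from_x`, `eps`, `x0` and `y0` are names assumed for this illustration only.

```python
def f(x):
    # int(x): float -> integer (truncation); piecewise constant in x.
    return int(x)


def g(y):
    # float(y): integer -> float; acts like the identity on y.
    return float(y)


def cost_from_y(y):
    return 0.5 * g(y)


def cost_from_x(x):
    return cost_from_y(f(x))


eps = 1e-3
x0 = 3.7
y0 = f(x0)

# Treat y as if it could move by eps: g acts like the identity, so the
# slope of the cost with respect to y is 0.5.
dC_dy = (cost_from_y(y0 + eps) - cost_from_y(y0 - eps)) / (2 * eps)

# Perturbing x does not change int(x) away from integer boundaries, so
# the slope of the cost with respect to x is 0.
dC_dx = (cost_from_x(x0 + eps) - cost_from_x(x0 - eps)) / (2 * eps)
```

The estimate for `dC_dy` is approximately 0.5 and `dC_dx` is exactly 0, matching the reasoning in the example.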
.. function:: infer_shape(node, shapes)
......
......@@ -1305,7 +1305,13 @@ class Scan(PureOp):
# 7.3. compute gradients of the inputs given one output
for dx, out in enumerate(clean_outputs):
    if g_outs[dx] != None:
        inner_g_out = safe_new(g_outs[dx][0])
    else:
        # We do not have a gradient on this output so we need a
        # placeholder, which for now has the same dtype as the
        # output
        inner_g_out = safe_new(out)
    ###
    #### I need to clip the gradient HERE !!
......