Commit 66be96ea, authored by goodfeli

Merge pull request #7 from lamblin/int_grad_doc

Try to clarify points raised during code review
@@ -266,6 +266,17 @@ following methods:
Finally, in Theano, integer-valued inputs often don't affect the elements of
the output, only its shape.
If your function f has both an integer-valued input and an
integer-valued output, then both rules have to be combined:
- If f is defined at (x+epsilon), then the input gradient is
defined. Since f(x+epsilon) would be equal to f(x) almost
everywhere, the gradient should be 0 (first rule).
- If f is only defined where x is an integer, then the gradient
is undefined, regardless of what the gradient with respect to the
output is.
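The two cases above can be checked numerically. The following is a minimal sketch in plain Python, not Theano code: `central_difference`, `round_to_int`, and `nth_square` are hypothetical helpers introduced only for illustration. The first function can be evaluated at x + epsilon and is locally constant, so its finite-difference gradient is 0. The second rejects non-integer inputs, so no gradient can be formed at all.

```python
def central_difference(f, x, eps=1e-6):
    """Estimate df/dx at x with a symmetric finite difference."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


def round_to_int(x):
    # Defined for every real x, with an integer-valued, piecewise-constant output.
    return round(x)


def nth_square(n):
    # Only defined where n is an integer, like an axis or an index.
    if int(n) != n:
        raise ValueError("nth_square is only defined for integer n")
    return [0, 1, 4, 9, 16][int(n)]


# Case 1: f is defined at x + epsilon and f(x + epsilon) == f(x),
# so the gradient is 0.
grad_round = central_difference(round_to_int, 3)

# Case 2: f cannot be evaluated at n + epsilon, so the gradient is undefined.
try:
    central_difference(nth_square, 2)
    gradient_defined = True
except ValueError:
    gradient_defined = False
```

In an actual Op, the first case would correspond to returning zeros for that input and the second to marking the gradient as undefined. How that is expressed depends on the Theano API and is not shown here.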
Examples:
1) f(x,y) = dot product between x and y. x and y are integers.
@@ -278,11 +289,18 @@ following methods:
same as if y were floating point.
3) f(x,y) = argmax of x along axis y.
The gradient with respect to y is undefined, because f(x,y) is not defined for
floating point y. How could you take an argmax along a fractional axis?
The gradient with respect to x is 0, because f(x+epsilon, y) = f(x, y) almost
everywhere.
4) f(x,y) = a vector with y elements, each of which takes the value x.
The grad method should return DisconnectedType()() for y, because the elements of
f don't depend on y. Only the shape of f depends on y. You probably also want to
implement a connection_pattern method to encode this.
5) f(x) = int(x) converts float x into an int. g(y) = float(y) converts an integer y into a float.
If the final cost C = 0.5 * g(y) = 0.5 * g(f(x)), then the
gradient with respect to y will be 0.5, even if y is an
integer. However, the gradient with respect to x will be 0,
because the output of f is integer-valued.
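Examples 3 and 5 can be sanity-checked with finite differences. This is a rough sketch in plain Python rather than Theano: `central_difference`, `argmax`, `cost_from_y`, and `cost_from_x` are hypothetical names chosen for illustration, and y is treated as if it could vary continuously when differentiating the cost with respect to it.

```python
def central_difference(f, x, eps=1e-6):
    """Estimate df/dx at x with a symmetric finite difference."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


def argmax(values):
    # Index of the largest element of a list.
    return max(range(len(values)), key=lambda i: values[i])


# Example 3: perturbing any single element of x leaves the argmax unchanged.
x_vec = [1.0, 3.0, 2.0]
argmax_grads = []
for i in range(len(x_vec)):
    def argmax_of_bumped(v, i=i):
        bumped = list(x_vec)
        bumped[i] = v
        return argmax(bumped)
    argmax_grads.append(central_difference(argmax_of_bumped, x_vec[i]))


# Example 5: f(x) = int(x), g(y) = float(y), C = 0.5 * g(f(x)).
def f(x):
    return int(x)


def g(y):
    return float(y)


def cost_from_y(y):
    return 0.5 * g(y)


def cost_from_x(x):
    return cost_from_y(f(x))


grad_wrt_y = central_difference(cost_from_y, 2)    # close to 0.5
grad_wrt_x = central_difference(cost_from_x, 2.5)  # 0: int(x) is locally constant
```

The gradient with respect to y is nonzero even though y holds an integer value, because g is a meaningful function of a continuously varying y. The gradient with respect to x vanishes because f's integer-valued output does not change under a small perturbation of x.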
.. function:: infer_shape(node, shapes)
......