Try to clarify points raised during code review

56c21e8e · Pascal Lamblin · 748116e8 · 56c21e8e
--- a/doc/extending/op.txt
+++ b/doc/extending/op.txt
@@ -266,6 +266,17 @@ following methods:
  Finally, many times in theano, integer valued inputs don't actually affect the elements of
  the output, only its shape.

+  If your function f has both an integer-valued input and an
+  integer-valued output, then both rules have to be combined:
+
+  - If f is defined at (x+epsilon), then the input gradient is
+    defined. Since f(x+epsilon) would be equal to f(x) almost
+    everywhere, the gradient should be 0 (first rule).
+
+  - If f is only defined where x is an integer, then the gradient
+    is undefined, regardless of what the gradient with respect to the
+    output is.
+
  Examples:

  1) f(x,y) = dot product between x and y. x and y are integers.
@@ -278,11 +289,18 @@ following methods:
        same as if y were floating point.
  3) f(x,y) = argmax of x along axis y.
        The gradient with respect to y is undefined, because f(x,y) is not defined for
-	 floating point y. How could you take an argmax along a fractional axis?
+        floating point y. How could you take an argmax along a fractional axis?
+        The gradient with respect to x is 0, because f(x+epsilon, y) = f(x, y) almost
+        everywhere.
  4) f(x,y) = a vector with y elements, each of which taking on the value x
        The grad method should return DisconnectedType()() for y, because the elements of
        f don't depend on y. Only the shape of f depends on y. You probably also want to
        implement a connection_pattern method to encode this.
+  5) f(x) = int(x) converts float x into an int. g(y) = float(y) converts an integer y into a float.
+        If the final cost C = 0.5 * g(y) = 0.5 * g(f(x)), then the
+        gradient with respect to y will be 0.5, even if y is an
+        integer. However, the gradient with respect to x will be 0,
+        because the output of f is integer-valued.


 .. function:: infer_shape(node, shapes)