Merge remote-tracking branch 'origin/int_grad' into int_grad

3e5ae1b2 · Ian Goodfellow · 6d7ad4d6 · 66be96ea · 3e5ae1b2 · 3e5ae1b2
--- a/doc/extending/op.txt
+++ b/doc/extending/op.txt
@@ -266,6 +266,17 @@ following methods:
  Finally, many times in theano, integer valued inputs don't actually affect the elements of
  the output, only its shape.

+  If your function f has both an integer-valued input and an
+  integer-valued output, then both rules have to be combined:
+
+  - If f is defined at (x+epsilon), then the input gradient is
+    defined. Since f(x+epsilon) would be equal to f(x) almost
+    everywhere, the gradient should be 0 (first rule).
+
+  - If f is only defined where x is an integer, then the gradient
+    is undefined, regardless of what the gradient with respect to the
+    output is.
+
  Examples:

  1) f(x,y) = dot product between x and y. x and y are integers.
@@ -278,11 +289,18 @@ following methods:
        same as if y were floating point.
  3) f(x,y) = argmax of x along axis y.
        The gradient with respect to y is undefined, because f(x,y) is not defined for
-	 floating point y. How could you take an argmax along a fractional axis?
+        floating point y. How could you take an argmax along a fractional axis?
+        The gradient with respect to x is 0, because f(x+epsilon, y) = f(x, y) almost
+        everywhere.
  4) f(x,y) = a vector with y elements, each of which taking on the value x
        The grad method should return DisconnectedType()() for y, because the elements of
        f don't depend on y. Only the shape of f depends on y. You probably also want to
        implement a connection_pattern method to encode this.
+  5) f(x) = int(x) converts float x into an int. g(y) = float(y) converts an integer y into a float.
+        If the final cost C = 0.5 * g(y) = 0.5 * g(f(x)), then the
+        gradient with respect to y will be 0.5, even if y is an
+        integer. However, the gradient with respect to x will be 0,
+        because the output of f is integer-valued.


 .. function:: infer_shape(node, shapes)
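
Note on example 5 in the op.txt hunk above: the two gradients can be checked numerically without Theano. The following is only a minimal sketch in plain Python; `cost_from_y`, `cost_from_x`, and `finite_difference` are hypothetical helper names introduced for illustration and are not part of Theano. It assumes g can be evaluated at non-integer points near y, which is what makes the gradient with respect to y well defined.

```python
def f(x):
    # Cast a float to an int (truncates toward zero), as in example 5.
    return int(x)


def g(y):
    # Cast back to float; also defined for non-integer y.
    return float(y)


def cost_from_y(y):
    # C = 0.5 * g(y)
    return 0.5 * g(y)


def cost_from_x(x):
    # C = 0.5 * g(f(x))
    return cost_from_y(f(x))


def finite_difference(fn, point, eps=1e-3):
    # Central difference approximation of d fn / d point (hypothetical helper).
    return (fn(point + eps) - fn(point - eps)) / (2 * eps)


# Gradient with respect to y is about 0.5, even though y holds an integer value.
grad_y = finite_difference(cost_from_y, 3.0)

# Gradient with respect to x is 0 away from integer boundaries,
# because f(x + eps) == f(x) almost everywhere.
grad_x = finite_difference(cost_from_x, 2.7)
```

The evaluation point 2.7 is chosen away from an integer boundary; exactly at a boundary the finite difference would pick up the jump in f, which is the measure-zero set the "almost everywhere" wording excludes.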

--- a/theano/scan_module/scan_op.py
+++ b/theano/scan_module/scan_op.py
@@ -1305,7 +1305,13 @@ class Scan(PureOp):

        # 7.3. compute gradients of the inputs given one output
        for dx, out in enumerate(clean_outputs):
+            if g_outs[dx] is not None:
                inner_g_out = safe_new(g_outs[dx][0])
+            else:
+                # We do not have a gradient on this output so we need a
+                # placeholder, which for now has the same dtype as the
+                # output
+                inner_g_out = safe_new(out)
            ###
            #### I need to clip the gradient HERE !!