Commit 02528764 authored by Ian Goodfellow

documented new op contract

Parent 4ca430a8
...@@ -239,6 +239,52 @@ following methods:
Both the partial differentiation and the multiplication have to be performed by
:func:`grad`.
Theano currently imposes the following constraints on the values returned by the grad method:

1) They must be Variable instances.
2) When they have a dtype, that dtype must never be an integer dtype.
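As a rough illustration of this contract (a hypothetical plain-Python sketch, not real Theano code; `Variable` and `grad_of_scale_op` here are stand-ins invented for this example), a grad method returns one gradient per input, each wrapped as a Variable with a floating-point dtype, even when the corresponding input has an integer dtype:

```python
# Hypothetical stand-in for a symbolic variable, used only to
# illustrate the contract: gradients are Variable instances and
# never carry an integer dtype.
class Variable:
    def __init__(self, value, dtype):
        self.value, self.dtype = value, dtype

def grad_of_scale_op(inputs, output_gradients):
    """Sketch of a grad method for f(x, c) = c * x, where c is an
    integer constant treated as a real-valued quantity.

    Even though c has an integer dtype, both returned gradients
    must have a floating-point dtype.
    """
    x, c = inputs
    (g_out,) = output_gradients
    g_x = Variable(float(c.value) * g_out.value, dtype='float64')
    g_c = Variable(float(x.value) * g_out.value, dtype='float64')
    return [g_x, g_c]

gx, gc = grad_of_scale_op(
    [Variable(2.0, 'float64'), Variable(3, 'int64')],
    [Variable(1.0, 'float64')])
print(gx.value, gx.dtype)  # 3.0 float64
```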
Integers are a tricky subject. Integers are the main reason for having DisconnectedType,
NullType, or a zero gradient. When you have an integer as an argument to your grad method,
recall the definition of a derivative to help you decide what value to return:

:math:`\frac{d f}{d x} = \lim_{\epsilon \rightarrow 0} \frac{f(x+\epsilon)-f(x)}{\epsilon}`

Suppose your function f has an integer-valued output. For most functions you're likely
to implement in Theano, this means your gradient should be zero, because f(x+epsilon)
= f(x) for almost all x. (The only other option is that the gradient could be undefined,
if your function is discontinuous everywhere, like the rational indicator function.)
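The zero-almost-everywhere claim is easy to check numerically. The following sketch (plain Python, not Theano) applies the finite-difference definition above to `floor`, whose output is integer-valued:

```python
# f(x) = floor(x) has an integer-valued output, so f(x + eps) == f(x)
# for almost every x, and the finite-difference derivative is zero
# everywhere except at the jumps.
import math

def f(x):
    return math.floor(x)

eps = 1e-8
xs = [0.25, 1.5, 2.75, 3.1, 7.9]  # points away from the jumps at integers
derivs = [(f(x + eps) - f(x)) / eps for x in xs]
print(derivs)  # [0.0, 0.0, 0.0, 0.0, 0.0]
```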
Suppose your function f has an integer-valued input. This is a little trickier, because
you need to think about what you mean mathematically when you make a variable integer-valued
in Theano. Most of the time in machine learning we mean "f is a function of a real-valued
x, but we are only going to pass in integer values of x". In this case, f(x+epsilon) exists,
so the gradient through f should be the same whether x is an integer or a floating point
variable. Sometimes what we mean is "f is a function of an integer-valued x, and f is only
defined where x is an integer." Since f(x+epsilon) doesn't exist, the gradient is undefined.

Finally, integer-valued inputs in Theano often don't affect the elements of
the output at all, only its shape.
Examples:

1) f(x, y) = dot product between x and y. x and y are integers.
   Since the output is also an integer, f is a step function.
   Its gradient is zero almost everywhere, so Op.grad should return
   zeros in the shape of x and y.
2) f(x, y) = dot product between x and y. x is floating point and y is an integer.
   In this case the output is floating point. It doesn't matter that y is an integer.
   We consider f to still be defined at f(x, y+epsilon). The gradient is exactly the
   same as if y were floating point.
3) f(x, y) = argmax of x along axis y.
   The gradient with respect to y is undefined, because f(x, y) is not defined for
   floating point y. How could you take an argmax along a fractional axis?
4) f(x, y) = a vector with y elements, each of which takes on the value x.
   The grad method should return DisconnectedType()() for y, because the elements of
   f don't depend on y. Only the shape of f depends on y. You probably also want to
   implement a connection_pattern method to encode this.
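A plain-Python analogue of example 4 (hypothetical, for illustration only; not the Theano API) makes the disconnection concrete: every element of the output depends on x, while y controls only the length:

```python
# f(x, y) = a vector of y copies of x.  Each element depends on x
# (d f_i / d x = 1), but y only determines the shape, so the
# gradient with respect to y is "disconnected".
def f(x, y):
    return [x] * y

eps = 1e-6
x, y = 3.0, 4
g_x = [(b - a) / eps for a, b in zip(f(x, y), f(x + eps, y))]
print(len(f(x, y)), g_x)  # 4 elements, each with derivative ~1.0
```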
.. function:: infer_shape(node, shapes)

   Optional.