testgroup / pytensor · Commit 229b876f
Authored Aug 29, 2014 by Arnaud Bergeron

Rework the Op contract documentation and add a bit about __props__.

Parent: cfc493d1

1 changed file: doc/extending/op.txt (+324, -250)
...
@@ -6,28 +6,26 @@ Making arithmetic Ops on double

Now that we have a ``double`` type, we have yet to use it to perform
computations. We'll start by defining multiplication.

.. _op_contract:

Op's contract
=============

An Op is any object which inherits from :class:`gof.Op`. It has to
define the following methods.

.. function:: make_node(*inputs)

    This method is responsible for creating output Variables of a
    suitable symbolic Type to serve as the outputs of this Op's
    application. The Variables found in ``*inputs`` must be operated on
    using Theano's symbolic language to compute the symbolic output
    Variables. This method should put these outputs into an Apply
    instance, and return the Apply instance.

    This method creates an Apply node representing the application of
    the Op on the inputs provided. If the Op cannot be applied to these
    inputs, it must raise an appropriate exception.

    The inputs of the Apply instance returned by this call must be
    ordered correctly: a subsequent ``self.make_node(*apply.inputs)``
...
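The contract above can be illustrated without Theano. The following is a minimal sketch that uses hypothetical stand-in classes for ``Variable`` and ``Apply`` (the real ones live in ``theano.gof``) to show the shape of a ``make_node`` for a multiplication op on the ``double`` type:

```python
# Hypothetical stand-ins for theano.gof.Variable and theano.gof.Apply,
# just to illustrate the make_node contract; not Theano's real classes.
class Variable:
    def __init__(self, type_name):
        self.type = type_name

class Apply:
    def __init__(self, op, inputs, outputs):
        self.op, self.inputs, self.outputs = op, inputs, outputs

class Mul:
    def make_node(self, x, y):
        # Validate the inputs, then create an output Variable of a
        # suitable Type and wrap everything in an Apply instance.
        if x.type != 'double' or y.type != 'double':
            raise TypeError('Mul only works on doubles')
        return Apply(self, [x, y], [Variable('double')])

node = Mul().make_node(Variable('double'), Variable('double'))
```

The returned ``node`` carries the correctly ordered inputs and the freshly created output Variable.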
@@ -35,9 +33,11 @@ following methods:

.. function:: perform(node, inputs, output_storage)

    This method computes the function associated to this Op. ``node`` is
    an Apply node created by the Op's ``make_node`` method. ``inputs``
    is a list of references to data to operate on using non-symbolic
    statements (i.e., statements in Python, Numpy and C
    languages). ``output_storage`` is a list of storage cells where the
    variables of the computation must be put.

    More specifically:
@@ -52,20 +52,20 @@ following methods:

    - ``output_storage``: This is a list of storage cells where the output is to be stored.
      A storage cell is a one-element list. It is forbidden to change
      the length of the list(s) contained in ``output_storage``. There is
      one storage cell for each output of the Op.

      The data put in ``output_storage`` must match the type of the
      symbolic output. This is a situation where the ``node`` argument
      can come in handy.

      A function Mode may allow ``output_storage`` elements to persist
      between evaluations, or it may reset ``output_storage`` cells to
      hold a value of ``None``. It can also pre-allocate some memory
      for the Op to use. This feature can allow ``perform`` to reuse
      memory between calls, for example. If there is something
      preallocated in the ``output_storage``, it will be of the correct
      dtype, but can have the wrong shape and have any stride pattern.

    This method must be determined by the inputs. That is to say, if
    it is evaluated once on inputs A and returned B, then if ever
...
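The storage-cell mechanics described above can be sketched without any Theano dependency. This hypothetical ``perform`` for a multiplication op fills the one-element list in place rather than replacing it:

```python
# Hypothetical perform for a multiplication op (no Theano dependency).
# Each storage cell is a one-element list that acts as a shared pointer.
def perform(node, inputs, output_storage):
    x, y = inputs
    z = output_storage[0]   # the cell for the op's single output
    z[0] = x * y            # fill the cell; never change the list's length

storage = [[None]]          # one pre-made cell per output
perform(None, (3.0, 4.0), storage)
# storage[0][0] is now 12.0
```

Because the caller keeps its own reference to the same one-element list, mutating ``z[0]`` makes the result visible outside ``perform``.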
@@ -77,6 +77,10 @@ following methods:

    operations <views_and_inplace>` before writing a ``perform``
    implementation that does either of these things.

    Instead of (or in addition to) ``perform()`` you can also provide a
    :ref:`C implementation <cop>`. For more details, refer to the
    documentation for :ref:`op`.

.. function:: __eq__(other)

    ``other`` is also an Op.
...
@@ -89,6 +93,10 @@ following methods:

    inputs (same view_map). For more details, see
    :ref:`views_and_inplace`.

    .. note::
        If you set `__props__`, this will be automatically generated.

.. function:: __hash__()

    If two Op instances compare equal, then they **must** return the
...
@@ -98,179 +106,281 @@ following methods:
    lifetime of self. Op instances should be immutable in this
    sense.

    .. note::
        If you set `__props__`, this will be automatically generated.

.. _op_optional:

Optional methods or attributes
==============================

.. attribute:: __props__

    *Default:* Undefined

    Must be a tuple. Lists the names of the attributes which influence
    the computation performed. This will also enable the automatic
    generation of appropriate __eq__, __hash__ and __str__ methods.
    Should be set to `()` if you have no attributes that are relevant to
    the computation, in order to still generate the methods.

.. attribute:: default_output

    *Default:* None

    If this member variable is an integer, then the default
    implementation of ``__call__`` will return
    ``node.outputs[self.default_output]``, where ``node`` was returned
    by ``make_node``. Otherwise, the entire list of outputs will be
    returned.
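The behaviour that ``__props__`` enables can be approximated in plain Python. This is only a sketch of the kind of methods that get generated, not Theano's actual implementation:

```python
# Sketch of the __eq__/__hash__/__str__ methods that __props__ implies:
# two instances compare equal iff they have the same class and the same
# values for every attribute listed in __props__.
class PropsMixin:
    __props__ = ()

    def _props(self):
        return tuple(getattr(self, name) for name in self.__props__)

    def __eq__(self, other):
        return type(self) == type(other) and self._props() == other._props()

    def __hash__(self):
        return hash((type(self), self._props()))

    def __str__(self):
        args = ", ".join("%s=%r" % (n, v)
                         for n, v in zip(self.__props__, self._props()))
        return "%s{%s}" % (type(self).__name__, args)

class ScalarMul(PropsMixin):
    __props__ = ('factor',)   # 'factor' influences the computation

    def __init__(self, factor):
        self.factor = factor
```

With this, ``ScalarMul(2) == ScalarMul(2)`` holds and equal ops hash identically, which is exactly what the Op contract's ``__eq__``/``__hash__`` requirements demand.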
.. function:: make_thunk(node, storage_map, compute_map, no_recycling)

    This function must return a thunk, that is, a zero-argument
    function that encapsulates the computation to be performed by this
    op on the arguments of the node.

    :param node: Apply instance
        The node for which a thunk is requested.
    :param storage_map: dict of lists
        This maps variables to one-element lists holding the variable's
        current value. The one-element list acts as a pointer to the value
        and allows sharing that "pointer" with other nodes and instances.
    :param compute_map: dict of lists
        This maps variables to one-element lists holding booleans. If
        the value is 0 then the variable has not been computed and the
        value should not be considered valid. If the value is 1 the
        variable has been computed and the value is valid. If the value
        is 2 the variable has been garbage-collected and is no longer
        valid, but shouldn't be required anymore for this call.
    :param no_recycling: WRITEME
        WRITEME

    The returned function must ensure that it sets the computed
    variables as computed in the `compute_map`.

    If you make your op class inherit from :class:`gof.Op`, then you
    can use the much easier :ref:`perform_meth` method below.
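A library-free sketch of the thunk contract, using strings as stand-ins for Variables and addition as a stand-in for the op's real computation:

```python
# Hypothetical make_thunk: the thunk reads input cells from storage_map,
# writes its output cell, and flags the output as computed in compute_map.
def make_thunk(inputs, outputs, storage_map, compute_map):
    def thunk():
        values = [storage_map[v][0] for v in inputs]
        result = sum(values)                 # stand-in for the op's computation
        storage_map[outputs[0]][0] = result  # write through the shared cell
        compute_map[outputs[0]][0] = 1       # mark the output as computed
    return thunk

storage_map = {'x': [2.0], 'y': [3.0], 'z': [None]}
compute_map = {'z': [0]}
thunk = make_thunk(['x', 'y'], ['z'], storage_map, compute_map)
thunk()
# storage_map['z'][0] is now 5.0 and compute_map['z'][0] is 1
```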
.. function:: __call__(*inputs, **kwargs)

    By default this is a convenience function which calls
    :meth:`make_node` with the supplied arguments and returns the
    result indexed by `default_output`. This can be overridden by
    subclasses to do anything else, but must return an Apply node
    representing the computation to be performed.

    In cases where the returned graph may differ based on the arguments
    or their types, it is recommended to create a helper function
    rather than overriding `__call__` on an Op.
.. function:: infer_shape(node, shapes)

    This function is needed for shape optimization. ``shapes`` is a
    list with one tuple for each input of the Apply node (which corresponds
    to the inputs of the op). Each tuple contains as many elements as the
    number of dimensions of the corresponding input. The value of each element
    is the shape (number of items) along the corresponding dimension of that
    specific input.

    While this might sound complicated, it is nothing more than the shape
    of each input as symbolic variables (one per dimension).

    The function should return a list with one tuple for each output.
    Each tuple should contain the corresponding output's computed shape.

    Implementing this method will allow Theano to compute the output's
    shape without computing the output itself, potentially sparing you
    a costly recomputation.
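For instance, a hypothetical outer-product op taking two 1-d inputs could implement it as follows (plain Python with tuples standing in for the symbolic shapes):

```python
# Hypothetical infer_shape for an outer-product-like op: the output of
# outer(x, y) has shape (len(x), len(y)).
def infer_shape(node, shapes):
    (x_len,), (y_len,) = shapes   # one tuple per input, one entry per dimension
    return [(x_len, y_len)]       # one tuple per output

# infer_shape(None, [(3,), (4,)]) == [(3, 4)]
```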
.. function:: flops(inputs, outputs)

    It is only used to have more information printed by the memory
    profiler. It makes it print the mega flops and giga flops per
    second for each apply node. It takes as inputs two lists: one for the
    inputs and one for the outputs. They contain tuples that are the
    shapes of the corresponding inputs/outputs.
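As a sketch, a hypothetical dense matrix-multiply op could count its operations like this (the n*m*(2k - 1) count is the usual one for an (n, k) by (k, m) product: k multiplies and k - 1 adds per output element):

```python
# Hypothetical flops count for a dense matrix multiply: an (n, k) by
# (k, m) product performs n*m*(2k - 1) floating point operations.
def flops(input_shapes, output_shapes):
    (n, k), (_, m) = input_shapes
    return n * m * (2 * k - 1)

# flops([(2, 3), (3, 4)], [(2, 4)]) == 40
```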
.. function:: __str__()

    This allows you to specify a more informative string representation of your
    Op. If an Op has parameters, it is highly recommended to have the
    ``__str__`` method include the name of the op and the Op's parameters'
    values.

    .. note::
        If you set `__props__`, this will be automatically generated.
        You can still override it for custom output.
.. function:: do_constant_folding(node)

    *Default:* Return True

    By default when optimizations are enabled, we remove during
    function compilation Apply nodes whose inputs are all constants.
    We replace the Apply node with a Theano constant variable.
    This way, the Apply node is not executed at each function
    call. If you want to force the execution of an op during the
    function call, make do_constant_folding return False.

    As done in the Alloc op, you can return False only in some cases by
    analyzing the graph from the node parameter.

If you want your op to work with gradient.grad() you also need to
implement the functions described below.

Gradient
========

These are the functions required to work with gradient.grad().
.. function:: grad(inputs, output_gradients)

    If the Op being defined is differentiable, its gradient may be
    specified symbolically in this method. Both ``inputs`` and
    ``output_gradients`` are lists of symbolic Theano Variables and
    those must be operated on using Theano's symbolic language. The grad
    method must return a list containing one Variable for each
    input. Each returned Variable represents the gradient with respect
    to that input computed based on the symbolic gradients with respect
    to each output.

    If the output is not differentiable with respect to an input
    then this method should be defined to return a variable of type
    NullType for that input. Likewise, if you have not implemented the
    grad computation for some input, you may return a variable of type
    NullType for that input. theano.gradient contains convenience
    methods that can construct the variable for you:
    :func:`theano.gradient.grad_undefined` and
    :func:`theano.gradient.grad_not_implemented`, respectively.
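The shape of the contract can be sketched numerically (a real implementation returns symbolic Variables, not floats). For a hypothetical elementwise multiply op z = x * y:

```python
# Hypothetical grad for z = x * y: one gradient term per input, each
# computed from the output gradient supplied by the caller.
def grad(inputs, output_gradients):
    x, y = inputs
    (gz,) = output_gradients      # d cost / d z, supplied by the caller
    return [gz * y, gz * x]       # [d cost / d x, d cost / d y]

gx, gy = grad([3.0, 4.0], [1.0])
# gx == 4.0 and gy == 3.0
```

Note that the list returned has exactly one entry per input, in the same order as ``inputs``.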
    If an element of output_gradient is of type
    theano.gradient.DisconnectedType, it means that the cost is not a
    function of this output. If any of the op's inputs participate in
    the computation of only disconnected outputs, then Op.grad should
    return DisconnectedType variables for those inputs.

    If the grad method is not defined, then Theano assumes it has been
    forgotten. Symbolic differentiation will fail on a graph that
    includes this Op.

    It must be understood that the Op's grad method is not meant to
    return the gradient of the Op's output. theano.tensor.grad computes
    gradients; Op.grad is a helper function that computes terms that
    appear in gradients.
    If an Op has a single vector-valued output y and a single
    vector-valued input x, then the grad method will be passed x and a
    second vector z. Define J to be the Jacobian of y with respect to
    x. The Op's grad method should return dot(J.T,z). When
    theano.tensor.grad calls the grad method, it will set z to be the
    gradient of the cost C with respect to y. If this op is the only op
    that acts on x, then dot(J.T,z) is the gradient of C with respect to
    x. If there are other ops that act on x, theano.tensor.grad will
    have to add up the terms of x's gradient contributed by the other
    op's grad method.

    In practice, an op's input and output are rarely implemented as
    single vectors. Even if an op's output consists of a list
    containing a scalar, a sparse matrix, and a 4D tensor, you can think
    of these objects as being formed by rearranging a vector. Likewise
    for the input. In this view, the values computed by the grad method
    still represent a Jacobian-vector product.

    In practice, it is probably not a good idea to explicitly construct
    the Jacobian, which might be very large and very sparse. However,
    the returned value should be equal to the Jacobian-vector product.
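A numerical illustration (plain Python, not Theano code) of the rule above, for the linear map y = A x whose Jacobian J is A itself:

```python
# For y = A x the Jacobian J is A, so grad must return dot(J.T, z).
# The product is computed directly, without ever materializing J.T.
A = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]   # 3x2 Jacobian: x has 2 elements, y has 3

def grad(x, z):
    n_in = len(A[0])
    # (J.T z)_j = sum_i A[i][j] * z[i]
    return [sum(A[i][j] * z[i] for i in range(len(A)))
            for j in range(n_in)]

z = [1.0, 0.0, -1.0]       # d cost / d y, supplied by theano.tensor.grad
g = grad([0.0, 0.0], z)
# g == [-4.0, -4.0], i.e. dot(J.T, z)
```

The returned vector has the same length as x, matching the requirement below that each returned gradient have a shape similar to the corresponding input.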
    So long as you implement this product correctly, you need not
    understand what theano.tensor.grad is doing, but for the curious the
    mathematical justification is as follows:

    In essence, the grad method must simply implement through symbolic
    Variables and operations the chain rule of differential
    calculus. The chain rule is the mathematical procedure that allows
    one to calculate the total derivative :math:`\frac{d C}{d x}` of the
    final scalar symbolic Variable C with respect to a primitive
    symbolic Variable x found in the list ``inputs``. The grad method
    does this using ``output_gradients`` which provides the total
    derivative :math:`\frac{d C}{d f}` of C with respect to a symbolic
    Variable that is returned by the Op (this is provided in
    ``output_gradients``), as well as the knowledge of the total
    derivative :math:`\frac{d f}{d x}` of the latter with respect to the
    primitive Variable (this has to be computed).

    In mathematics, the total derivative of a scalar variable (C) with
    respect to a vector of scalar variables (x), i.e. the gradient, is
    customarily represented as the row vector of the partial
    derivatives, whereas the total derivative of a vector of scalar
    variables (f) with respect to another (x) is customarily
    represented by the matrix of the partial derivatives, i.e. the
    Jacobian matrix. In this convenient setting, the chain rule
    instructs that the gradient of the final scalar variable C with
    respect to the primitive scalar variables in x through those in f is
    simply given by the matrix product:
    :math:`\frac{d C}{d x} = \frac{d C}{d f} * \frac{d f}{d x}`.

    Here, the chain rule must be implemented in a similar but slightly
    more complex setting: Theano provides in the list
    ``output_gradients`` one gradient for each of the Variables returned
    by the Op. Where f is one such particular Variable, the
    corresponding gradient found in ``output_gradients`` and
    representing :math:`\frac{d C}{d f}` is provided with a shape
    similar to f and thus not necessarily as a row vector of scalars.
    Furthermore, for each Variable x of the Op's list of input variables
    ``inputs``, the returned gradient representing
    :math:`\frac{d C}{d x}` must have a shape similar to that of
    Variable x.

    If the output list of the op is :math:`[f_1, ..., f_n]`, then the
    list ``output_gradients`` is :math:`[grad_{f_1}(C), grad_{f_2}(C),
    ..., grad_{f_n}(C)]`. If ``inputs`` consists of the list
    :math:`[x_1, ..., x_m]`, then Op.grad should return the list
    :math:`[grad_{x_1}(C), grad_{x_2}(C), ..., grad_{x_m}(C)]`, where
    :math:`(grad_{y}(Z))_i = \frac{\partial Z}{\partial y_i}` (and
    :math:`i` can stand for multiple dimensions).

    In other words, :func:`grad` does not return
    :math:`\frac{d f_i}{d x_j}`, but instead the appropriate dot
    product specified by the chain rule:
    :math:`\frac{d C}{d x_j} = \frac{d C}{d f_i} \cdot \frac{d f_i}{d x_j}`.
    Both the partial differentiation and the multiplication have to be
    performed by :func:`grad`.
    Theano currently imposes the following constraints on the values
    returned by the grad method:

    1) They must be Variable instances.
    2) When they are types that have dtypes, they must never have an
       integer dtype.

    The output gradients passed *to* Op.grad will also obey these
    constraints.
    Integers are a tricky subject. Integers are the main reason for
    having DisconnectedType, NullType or zero gradient. When you have an
    integer as an argument to your grad method, recall the definition of
    a derivative to help you decide what value to return:

    :math:`\frac{d f}{d x} = \lim_{\epsilon \rightarrow 0} (f(x+\epsilon)-f(x))/\epsilon`.

    Suppose your function f has an integer-valued output. For most
    functions you're likely to implement in theano, this means your
    gradient should be zero, because f(x+epsilon) = f(x) for almost all
    x. (The only other option is that the gradient could be undefined,
    if your function is discontinuous everywhere, like the rational
    indicator function.)

    Suppose your function f has an integer-valued input. This is a
    little trickier, because you need to think about what you mean
    mathematically when you make a variable integer-valued in
    theano. Most of the time in machine learning we mean "f is a
    function of a real-valued x, but we are only going to pass in
    integer values of x". In this case, f(x+epsilon) exists, so the
    gradient through f should be the same whether x is an integer or a
    floating point variable. Sometimes what we mean is "f is a function
    of an integer-valued x, and f is only defined where x is an
    integer." Since f(x+epsilon) doesn't exist, the gradient is
    undefined. Finally, many times in theano, integer valued inputs
    don't actually affect the elements of the output, only its shape.

    If your function f has both an integer-valued input and an
    integer-valued output, then both rules have to be combined:
...
@@ -290,63 +400,75 @@ following methods:
       Its gradient is zero almost everywhere, so Op.grad should return
       zeros in the shape of x and y.

    2) f(x,y) = dot product between x and y. x is floating point and y is an integer.

       In this case the output is floating point. It doesn't matter
       that y is an integer. We consider f to still be defined at
       f(x,y+epsilon). The gradient is exactly the same as if y were
       floating point.

    3) f(x,y) = argmax of x along axis y.

       The gradient with respect to y is undefined, because f(x,y) is
       not defined for floating point y. How could you take an argmax
       along a fractional axis? The gradient with respect to x is
       0, because f(x+epsilon, y) = f(x) almost everywhere.

    4) f(x,y) = a vector with y elements, each of which taking on the value x

       The grad method should return DisconnectedType()() for y,
       because the elements of f don't depend on y. Only the shape of
       f depends on y. You probably also want to implement a
       connection_pattern method to encode this.

    5) f(x) = int(x) converts float x into an int. g(y) = float(y) converts an integer y into a float.

       If the final cost C = 0.5 * g(y) = 0.5 g(f(x)), then the
       gradient with respect to y will be 0.5, even if y is an
       integer. However, the gradient with respect to x will be 0,
       because the output of f is integer-valued.
.. function:: infer_shape(node, shapes)

   Optional.

   This function is needed for shape optimization. ``shapes`` is a
   list with one tuple for each input of the Apply node (which
   corresponds to the inputs of the op). Each tuple contains as many
   elements as the number of dimensions of the corresponding input.
   The value of each element is the shape (number of items) along the
   corresponding dimension of that specific input.

   While this might sound complicated, it is nothing more than the
   shape of each input as symbolic variables (one per dimension).

   The function should return a list with one tuple for each output.
   Each tuple should contain the corresponding output's computed shape.

   Implementing this method will allow Theano to compute the output's
   shape without computing the output itself, potentially sparing you
   a costly recomputation.

.. function:: connection_pattern(node)

   Sometimes needed for proper operation of gradient.grad().

   Returns a list of lists of bools.
   ``Op.connection_pattern[input_idx][output_idx]`` is true if the
   elements of ``inputs[input_idx]`` have an effect on the elements of
   ``outputs[output_idx]``.

   The ``node`` parameter is needed to determine the number of
   inputs. Some ops such as Subtensor take a variable number of
   inputs.

   If no connection_pattern is specified, gradient.grad will assume
   that all inputs have some elements connected to some elements of
   all outputs.
   This method conveys two pieces of information that are otherwise
   not part of the theano graph:

   1) Which of the op's inputs are truly ancestors of each of the
      op's outputs. Suppose an op has two inputs, x and y, and
      outputs f(x) and g(y). y is not really an ancestor of f, but
      it appears to be so in the theano graph.
   2) Whether the actual elements of each input/output are relevant
      to a computation. For example, the shape op does not read its
      input's elements, only its shape metadata; d shape(x) / dx
      should thus raise a disconnected input exception (if these
      exceptions are enabled). As another example, the elements of
      the Alloc op's outputs are not affected by the shape arguments
      to the Alloc op.

   Failing to implement this function for an op that needs it can
   result in two types of incorrect behavior:

   1) gradient.grad erroneously raising a TypeError reporting that
      a gradient is undefined.
   2) gradient.grad failing to raise a ValueError reporting that
      an input is disconnected.

   Even if connection_pattern is not implemented correctly, if
   gradient.grad returns an expression, that expression will be
   numerically correct.

.. function:: flops(inputs, outputs)

   Optional.

   It is only used to have more information printed by the memory
   profiler. It makes it print the mega flops and giga flops per
   second for each apply node. It takes as inputs two lists: one for
   the inputs and one for the outputs. They contain tuples that are
   the shapes of the corresponding inputs/outputs.

.. function:: make_thunk(node, storage_map, compute_map, no_recycling)

   TODO
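A small stand-alone sketch can tie ``infer_shape``, ``connection_pattern`` and ``flops`` together. ``MatMulOp`` below is a hypothetical pseudo-op (not ``theano.tensor.dot``), and plain ints and tuples stand in for Theano's symbolic shapes:

```python
class MatMulOp:
    """Pseudo-op for an (m, k) x (k, n) matrix product (illustration only)."""

    def infer_shape(self, node, shapes):
        # shapes has one tuple per input; with real Theano the entries
        # would be symbolic scalars, but the arithmetic is the same.
        (m, k), (_k, n) = shapes
        return [(m, n)]

    def connection_pattern(self, node):
        # Every element of each input can affect every output element.
        return [[True], [True]]

    def flops(self, inputs, outputs):
        # inputs/outputs are lists of shape tuples; an (m, k) x (k, n)
        # product costs about 2 * m * k * n floating point operations.
        (m, k), (_, n) = inputs
        return 2 * m * k * n
```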
.. function:: R_op(inputs, eval_points)

   Optional, to work with gradient.R_op().
   This function implements the application of the R-operator on the
   function represented by your op. Let us assume that function is :math:`f`,
...
   the outputs) back to their corresponding shapes and return them as the
   output of the :func:`R_op` method.
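As a concrete illustration of what ``R_op`` computes, here is a non-Theano sketch for an elementwise product f(x, y) = x * y, with plain floats standing in for symbolic variables; the result is the Jacobian applied to the evaluation points:

```python
class MulOp:
    """Pseudo-op for f(x, y) = x * y (illustration only)."""

    def R_op(self, inputs, eval_points):
        x, y = inputs
        ex, ey = eval_points
        # R_op returns (df/dx) * ex + (df/dy) * ey; eval_points may
        # contain None for inputs we do not differentiate along.
        rop = 0.0
        if ex is not None:
            rop += y * ex   # df/dx = y
        if ey is not None:
            rop += x * ey   # df/dy = x
        return [rop]        # one entry per output
```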
.. attribute:: default_output
*Default:* None
If this member variable is an integer, then the default
implementation of ``__call__`` will return
``node.outputs[self.default_output]``, where ``node`` was returned
by ``make_node``. Otherwise, the entire list of outputs will be
returned.
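The interaction between ``default_output`` and ``__call__`` can be mimicked in plain Python. ``SortWithIndicesOp`` is a made-up example, and ``make_node`` here returns the list of outputs directly instead of an Apply node:

```python
class BaseOp:
    """Minimal stand-in for the part of gof.Op described above."""
    default_output = None

    def __call__(self, *inputs):
        outputs = self.make_node(*inputs)  # stands in for node.outputs
        if self.default_output is not None:
            return outputs[self.default_output]
        return outputs


class SortWithIndicesOp(BaseOp):
    """Two outputs (sorted values, argsort); callers usually want values."""
    default_output = 0  # __call__ returns only the sorted values

    def make_node(self, x):
        order = sorted(range(len(x)), key=lambda i: x[i])
        return [sorted(x), order]
```

Setting ``default_output = None`` instead would make calls return the full list of outputs.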
.. function:: __call__(*inputs)
Syntactic shortcut to make_node which returns the output
Variables of the Op.
*Default:* this is implemented in the parent class and you do not need to change it.
.. function:: __str__()
*Default:* python default: module_path_to_your_class.CLASSNAME
This allows you to specify a more informative string representation of your
Op. If an Op has parameters, it is highly recommended to have the
``__str__`` method include the name of the op and the Op's parameters'
values.
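For example, a hypothetical op parametrized by an exponent could implement the recommendation like this:

```python
class ScalarPowOp:
    """Hypothetical op raising its input to a fixed exponent."""

    def __init__(self, exponent):
        self.exponent = exponent

    def __str__(self):
        # Include both the op's name and its parameter values.
        return "ScalarPowOp{exponent=%s}" % self.exponent
```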
.. function:: do_constant_folding(node)
*Default:* Return True
   By default, when optimizations are enabled, Apply nodes whose
   inputs are all constants are removed during function compilation
   and replaced with a Theano constant variable, so that they are not
   executed at each function call. If you want to force the execution
   of an op during the function call, make do_constant_folding return
   False. As done in the Alloc op, you can return False only in some
   cases, by analyzing the graph from the node parameter.
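In the spirit of the Alloc example, a sketch of a selective do_constant_folding might look as follows. ``FakeNode`` and its ``out_size`` attribute are stand-ins for a real Apply node and whatever graph analysis the op performs:

```python
class BigAllocLikeOp:
    """Hypothetical op whose output can be very large."""

    def do_constant_folding(self, node):
        # Folding a huge constant output would bake a big block of
        # memory into the compiled function; keep large cases lazy.
        return node.out_size <= 1024


class FakeNode:
    """Stand-in for an Apply node, exposing only what the sketch needs."""

    def __init__(self, out_size):
        self.out_size = out_size
```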
At a bare minimum, a new Op must define ``make_node`` and ``perform``, which
have no defaults.
You can also provide a :ref:`C implementation <cop>` of
``perform()``. For more details, refer to the documentation for
:ref:`op`.
Defining an Op: ``mul``
=======================

...