Commit 48c63a85 authored by Frédéric Bastien

Merge pull request #2069 from abergeron/doc

Doc
@@ -6,28 +6,26 @@ Making arithmetic Ops on double

Now that we have a ``double`` type, we have yet to use it to perform
computations. We'll start by defining multiplication.

.. _op_contract:

Op's contract
=============

An Op is any object which inherits from :class:`gof.Op`. It has to
define the following methods.
.. function:: make_node(*inputs)

This method is responsible for creating output Variables of a
suitable symbolic Type to serve as the outputs of this Op's
application. The Variables found in ``*inputs`` must be operated on
using Theano's symbolic language to compute the symbolic output
Variables. This method should put these outputs into an Apply
instance, and return the Apply instance.

This method creates an Apply node representing the application of
the Op on the inputs provided. If the Op cannot be applied to these
inputs, it must raise an appropriate exception.
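For instance, a minimal sketch of ``make_node`` for an Op with one
tensor input whose output has the same type (mirroring the
``DoubleOp`` example later in this document):

.. code-block:: python

    import theano

    def make_node(self, x):
        # raise if x cannot be converted to a tensor Variable
        x = theano.tensor.as_tensor_variable(x)
        return theano.Apply(self, [x], [x.type()])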
The inputs of the Apply instance returned by this call must be
ordered correctly: a subsequent ``self.make_node(*apply.inputs)``

@@ -35,10 +33,12 @@ following methods:
.. function:: perform(node, inputs, output_storage)

This method computes the function associated to this Op. ``node`` is
an Apply node created by the Op's ``make_node`` method. ``inputs``
is a list of references to data to operate on using non-symbolic
statements (i.e., statements in Python or Numpy). ``output_storage``
is a list of storage cells where the variables of the computation
must be put.

More specifically:
@@ -52,20 +52,20 @@ following methods:

- ``output_storage``: This is a list of storage cells where the output
  is to be stored. A storage cell is a one-element list. It is
  forbidden to change the length of the list(s) contained in
  ``output_storage``. There is one storage cell for each output of
  the Op.

  The data put in ``output_storage`` must match the type of the
  symbolic output. This is a situation where the ``node`` argument
  can come in handy.

  A function Mode may allow ``output_storage`` elements to persist
  between evaluations, or it may reset ``output_storage`` cells to
  hold a value of ``None``. It can also pre-allocate some memory
  for the Op to use. This feature can allow ``perform`` to reuse
  memory between calls, for example. If there is something
  preallocated in the ``output_storage``, it will be of the right
  dtype, but can have the wrong shape and have any stride pattern.
This method must be determined by the inputs. That is to say, if
it is evaluated once on inputs A and returned B, then if ever

@@ -77,6 +77,10 @@ following methods:
operations <views_and_inplace>` before writing a ``perform``
implementation that does either of these things.

Instead of (or in addition to) ``perform()``, you can also provide a
:ref:`C implementation <cop>`. For more details, refer to the
documentation for :ref:`op`.
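Putting the pieces above together, a minimal sketch of ``perform``
for a hypothetical Op that doubles its single input:

.. code-block:: python

    def perform(self, node, inputs, output_storage):
        x, = inputs             # the input's current value
        z, = output_storage     # a one-element list for the output
        z[0] = x * 2            # store the result in the storage cell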
.. function:: __eq__(other)

``other`` is also an Op.

@@ -89,6 +93,10 @@ following methods:
inputs (same view_map). For more details, see
:ref:`views_and_inplace`.

.. note::
    If you set `__props__`, this will be automatically generated.
.. function:: __hash__()

If two Op instances compare equal, then they **must** return the

@@ -98,179 +106,286 @@ following methods:

lifetime of self. Op instances should be immutable in this
sense.
.. note::
    If you set `__props__`, this will be automatically generated.

.. _op_optional:

Optional methods or attributes
==============================

.. attribute:: __props__

*Default:* Undefined

Must be a tuple. Lists the names of the attributes which influence
the computation performed. This will also enable the automatic
generation of appropriate __eq__, __hash__ and __str__ methods.
Set it to `()` if you have no attributes that are relevant to
the computation.

.. versionadded:: 0.7
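For instance, a sketch of an Op parametrized by an ``axis``
attribute (the class and attribute names here are illustrative, not
part of Theano's API):

.. code-block:: python

    import theano

    class MyAxisOp(theano.Op):
        # __eq__, __hash__ and __str__ are generated from this tuple:
        # two instances compare equal iff their ``axis`` values match.
        __props__ = ('axis',)

        def __init__(self, axis):
            self.axis = axis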
.. attribute:: default_output

*Default:* None

If this member variable is an integer, then the default
implementation of ``__call__`` will return
``node.outputs[self.default_output]``, where ``node`` was returned
by ``make_node``. Otherwise, the entire list of outputs will be
returned, unless it is of length 1, where the single element will be
returned by itself.
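As a hedged sketch, an Op whose ``make_node`` creates two outputs
could make ``__call__`` hand back only the first one (the class name
is hypothetical):

.. code-block:: python

    class TwoOutputsOp(theano.Op):
        default_output = 0  # __call__ returns node.outputs[0]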
.. function:: make_thunk(node, storage_map, compute_map, no_recycling)
This function must return a thunk, that is, a zero-argument
function that encapsulates the computation to be performed by this
op on the arguments of the node.
:param node: Apply instance
The node for which a thunk is requested.
:param storage_map: dict of lists
This maps variables to one-element lists holding the variable's
current value. The one-element list acts as pointer to the value
and allows sharing that "pointer" with other nodes and instances.
:param compute_map: dict of lists
This maps variables to one-element lists holding booleans. If
the value is 0 then the variable has not been computed and the
value should not be considered valid. If the value is 1 the
variable has been computed and the value is valid. If the value
is 2 the variable has been garbage-collected and is no longer
valid, but shouldn't be required anymore for this call.
:param no_recycling: WRITEME
WRITEME
The returned function must ensure that it sets the computed
variables as computed in the `compute_map`.
Defining this function removes the requirement for :meth:`perform`
or C code, as you will define the thunk for the computation
yourself.
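A minimal sketch, assuming a one-input, one-output node whose
(illustrative) computation doubles its input:

.. code-block:: python

    def make_thunk(self, node, storage_map, compute_map, no_recycling):
        # one-element lists acting as pointers to the values
        x_cell = storage_map[node.inputs[0]]
        out_cell = storage_map[node.outputs[0]]
        out_computed = compute_map[node.outputs[0]]

        def thunk():
            out_cell[0] = x_cell[0] * 2  # the actual computation
            out_computed[0] = 1          # mark the output as computed

        return thunk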
.. function:: __call__(*inputs, **kwargs)
By default this is a convenience function which calls
:meth:`make_node` with the supplied arguments and returns the
result indexed by `default_output`. This can be overridden by
subclasses to do anything else, but must return either a theano
Variable or a list of Variables.
If you feel the need to override `__call__` to change the graph
based on the arguments, you should instead create a function that
will use your Op and build the graphs that you want and call that
instead of the Op instance directly.
.. function:: infer_shape(node, shapes)
This function is needed for shape optimization. ``shapes`` is a
list with one tuple for each input of the Apply node (which corresponds
to the inputs of the op). Each tuple contains as many elements as the
number of dimensions of the corresponding input. The value of each element
is the shape (number of items) along the corresponding dimension of that
specific input.
While this might sound complicated, it is nothing more than the shape
of each input as symbolic variables (one per dimension).
The function should return a list with one tuple for each output.
Each tuple should contain the corresponding output's computed shape.
Implementing this method will allow Theano to compute the output's
shape without computing the output itself, potentially sparing you
a costly recomputation.
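For example, a sketch for an elementwise Op whose output has the
same shape as its (single) input:

.. code-block:: python

    def infer_shape(self, node, shapes):
        # shapes[0] is a tuple of symbolic dimensions, one per axis
        return [shapes[0]]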
.. function:: flops(inputs, outputs)
This method is only used to print more information in the memory
profiler: it reports the mega flops and giga flops per second for
each apply node. It takes as inputs two lists: one for the inputs
and one for the outputs. They contain tuples that are the shapes of
the corresponding inputs/outputs.
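As a sketch, an elementwise multiplication of two inputs performs
one floating point operation per output element:

.. code-block:: python

    import numpy

    def flops(self, inputs, outputs):
        # outputs[0] is the shape tuple of the single output
        return numpy.prod(outputs[0])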
.. function:: __str__()
This allows you to specify a more informative string representation of your
Op. If an Op has parameters, it is highly recommended to have the
``__str__`` method include the name of the op and the Op's parameters'
values.
.. note::
    If you set `__props__`, this will be automatically generated.
    You can still override it for custom output.
.. function:: do_constant_folding(node)
*Default:* Return True
By default, when optimizations are enabled, we remove Apply nodes
whose inputs are all constants during function compilation,
replacing them with a Theano constant variable. This way, the Apply
node is not executed at each function call. If you want to force the
execution of an op during the function call, make
do_constant_folding return False.
As done in the Alloc op, you can return False only in some cases by
analyzing the graph from the node parameter.
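For instance, a minimal sketch that unconditionally disables
constant folding for an Op:

.. code-block:: python

    def do_constant_folding(self, node):
        # always execute this Op at function call time,
        # even when all of its inputs are constants
        return False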
Gradient
========

These are the functions required to work with gradient.grad().

.. function:: grad(inputs, output_gradients)
If the Op being defined is differentiable, its gradient may be
specified symbolically in this method. Both ``inputs`` and
``output_gradients`` are lists of symbolic Theano Variables and
those must be operated on using Theano's symbolic language. The grad
method must return a list containing one Variable for each
input. Each returned Variable represents the gradient with respect
to that input computed based on the symbolic gradients with respect
to each output.

If the output is not differentiable with respect to an input then
this method should be defined to return a variable of type NullType
for that input. Likewise, if you have not implemented the grad
computation for some input, you may return a variable of type
NullType for that input. theano.gradient contains convenience
methods that can construct the variable for you:
:func:`theano.gradient.grad_undefined` and
:func:`theano.gradient.grad_not_implemented`, respectively.

If an element of output_gradient is of type
theano.gradient.DisconnectedType, it means that the cost is not a
function of this output. If any of the op's inputs participate in
the computation of only disconnected outputs, then Op.grad should
return DisconnectedType variables for those inputs.

If the grad method is not defined, then Theano assumes it has been
forgotten. Symbolic differentiation will fail on a graph that
includes this Op.

It must be understood that the Op's grad method is not meant to
return the gradient of the Op's output. theano.tensor.grad computes
gradients; Op.grad is a helper function that computes terms that
appear in gradients.
If an Op has a single vector-valued output y and a single
vector-valued input x, then the grad method will be passed x and a
second vector z. Define J to be the Jacobian of y with respect to
x. The Op's grad method should return dot(J.T,z). When
theano.tensor.grad calls the grad method, it will set z to be the
gradient of the cost C with respect to y. If this op is the only op
that acts on x, then dot(J.T,z) is the gradient of C with respect to
x. If there are other ops that act on x, theano.tensor.grad will
have to add up the terms of x's gradient contributed by the other
op's grad method.

In practice, an op's input and output are rarely implemented as
single vectors. Even if an op's output consists of a list
containing a scalar, a sparse matrix, and a 4D tensor, you can think
of these objects as being formed by rearranging a vector. Likewise
for the input. In this view, the values computed by the grad method
still represent a Jacobian-vector product.

In practice, it is probably not a good idea to explicitly construct
the Jacobian, which might be very large and very sparse. However,
the returned value should be equal to the Jacobian-vector product.
So long as you implement this product correctly, you need not
understand what theano.tensor.grad is doing, but for the curious the
mathematical justification is as follows:

In essence, the grad method must simply implement through symbolic
Variables and operations the chain rule of differential
calculus. The chain rule is the mathematical procedure that allows
one to calculate the total derivative :math:`\frac{d C}{d x}` of the
final scalar symbolic Variable C with respect to a primitive
symbolic Variable x found in the list ``inputs``. The grad method
does this using ``output_gradients`` which provides the total
derivative :math:`\frac{d C}{d f}` of C with respect to a symbolic
Variable that is returned by the Op (this is provided in
``output_gradients``), as well as the knowledge of the total
derivative :math:`\frac{d f}{d x}` of the latter with respect to the
primitive Variable (this has to be computed).

In mathematics, the total derivative of a scalar variable (C) with
respect to a vector of scalar variables (x), i.e. the gradient, is
customarily represented as the row vector of the partial
derivatives, whereas the total derivative of a vector of scalar
variables (f) with respect to another (x), is customarily
represented by the matrix of the partial derivatives, i.e. the
Jacobian matrix. In this convenient setting, the chain rule
instructs that the gradient of the final scalar variable C with
respect to the primitive scalar variables in x through those in f is
simply given by the matrix product:
:math:`\frac{d C}{d x} = \frac{d C}{d f} * \frac{d f}{d x}`.

Here, the chain rule must be implemented in a similar but slightly
more complex setting: Theano provides in the list
``output_gradients`` one gradient for each of the Variables returned
by the Op. Where f is one such particular Variable, the
corresponding gradient found in ``output_gradients`` and
representing :math:`\frac{d C}{d f}` is provided with a shape
similar to f and thus not necessarily as a row vector of scalars.
Furthermore, for each Variable x of the Op's list of input variables
``inputs``, the returned gradient representing
:math:`\frac{d C}{d x}` must have a shape similar to that of
Variable x.

If the output list of the op is :math:`[f_1, ... f_n]`, then the
list ``output_gradients`` is :math:`[grad_{f_1}(C), grad_{f_2}(C),
..., grad_{f_n}(C)]`. If ``inputs`` consists of the list
:math:`[x_1, ..., x_m]`, then Op.grad should return the list
:math:`[grad_{x_1}(C), grad_{x_2}(C), ..., grad_{x_m}(C)]`, where
:math:`(grad_{y}(Z))_i = \frac{\partial Z}{\partial y_i}` (and
:math:`i` can stand for multiple dimensions).

In other words, :func:`grad` does not return
:math:`\frac{d f_i}{d x_j}`, but instead the appropriate dot product
specified by the chain rule:
:math:`\frac{d C}{d x_j} = \frac{d C}{d f_i} \cdot \frac{d f_i}{d x_j}`.
Both the partial differentiation and the multiplication have to be
performed by :func:`grad`.

Theano currently imposes the following constraints on the values
returned by the grad method:
1) They must be Variable instances.

2) When they are types that have dtypes, they must never have an integer dtype.

The output gradients passed *to* Op.grad will also obey these constraints.
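As a minimal sketch, consider an Op computing y = 2 * x: its
Jacobian is 2 * I, so grad returns dot(J.T, z) = 2 * z for the
single input:

.. code-block:: python

    def grad(self, inputs, output_gradients):
        x, = inputs
        z, = output_gradients    # z represents dC/dy
        return [2 * z]           # dot(J.T, z) with J = 2 * I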
Integers are a tricky subject. Integers are the main reason for
having DisconnectedType, NullType or zero gradient. When you have an
integer as an argument to your grad method, recall the definition of
a derivative to help you decide what value to return:

:math:`\frac{d f}{d x} = \lim_{\epsilon \rightarrow 0} (f(x+\epsilon)-f(x))/\epsilon`.

Suppose your function f has an integer-valued output. For most
functions you're likely to implement in theano, this means your
gradient should be zero, because f(x+epsilon) = f(x) for almost all
x. (The only other option is that the gradient could be undefined,
if your function is discontinuous everywhere, like the rational
indicator function.)

Suppose your function f has an integer-valued input. This is a
little trickier, because you need to think about what you mean
mathematically when you make a variable integer-valued in
theano. Most of the time in machine learning we mean "f is a
function of a real-valued x, but we are only going to pass in
integer values of x". In this case, f(x+epsilon) exists, so the
gradient through f should be the same whether x is an integer or a
floating point variable. Sometimes what we mean is "f is a function
of an integer-valued x, and f is only defined where x is an
integer." Since f(x+epsilon) doesn't exist, the gradient is
undefined. Finally, many times in theano, integer-valued inputs
don't actually affect the elements of the output, only its shape.
If your function f has both an integer-valued input and an
integer-valued output, then both rules have to be combined:

@@ -290,63 +405,75 @@ following methods:
Its gradient is zero almost everywhere, so Op.grad should return
zeros in the shape of x and y.

2) f(x,y) = dot product between x and y. x is floating point and y is an integer.

   In this case the output is floating point. It doesn't matter
   that y is an integer. We consider f to still be defined at
   f(x,y+epsilon). The gradient is exactly the same as if y were
   floating point.

3) f(x,y) = argmax of x along axis y.

   The gradient with respect to y is undefined, because f(x,y) is
   not defined for floating point y. How could you take an argmax
   along a fractional axis? The gradient with respect to x is
   0, because f(x+epsilon, y) = f(x) almost everywhere.

4) f(x,y) = a vector with y elements, each of which taking on the value x.

   The grad method should return DisconnectedType()() for y,
   because the elements of f don't depend on y. Only the shape of
   f depends on y. You probably also want to implement a
   connection_pattern method to encode this.

5) f(x) = int(x) converts float x into an int. g(y) = float(y) converts an integer y into a float.

   If the final cost C = 0.5 * g(y) = 0.5 g(f(x)), then the
   gradient with respect to y will be 0.5, even if y is an
   integer. However, the gradient with respect to x will be 0,
   because the output of f is integer-valued.
.. function:: connection_pattern(node)

Sometimes needed for proper operation of gradient.grad().
Optional.

Returns a list of list of bools.

Op.connection_pattern[input_idx][output_idx] is true if the
elements of inputs[input_idx] have an effect on the elements of
outputs[output_idx].

The ``node`` parameter is needed to determine the number of
inputs. Some ops such as Subtensor take a variable number of
inputs.

If no connection_pattern is specified, gradient.grad will
assume that all inputs have some elements connected to some
elements of all outputs.

This method conveys two pieces of information that are otherwise
not part of the theano graph:

1) Which of the op's inputs are truly ancestors of each of the
   op's outputs. Suppose an op has two inputs, x and y, and
   outputs f(x) and g(y). y is not really an ancestor of f, but
   it appears to be so in the theano graph.

2) Whether the actual elements of each input/output are relevant
   to a computation. For example, the shape op does not read its
   input's elements, only its shape metadata. d shape(x) / dx should
   thus raise a disconnected input exception (if these exceptions
   are enabled). As another example, the elements of the Alloc op's
   outputs are not affected by the shape arguments to the Alloc op.

Failing to implement this function for an op that needs it can
result in two types of incorrect behavior:

1) gradient.grad erroneously raising a TypeError reporting that
   a gradient is undefined.

2) gradient.grad failing to raise a ValueError reporting that
   an input is disconnected.

Even if connection_pattern is not implemented correctly, if
gradient.grad returns an expression, that expression will be
numerically correct.
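A sketch for an Alloc-like Op whose first input provides the
output's values while the remaining inputs only specify its shape:

.. code-block:: python

    def connection_pattern(self, node):
        # one row per input; the single column is the only output
        return [[True]] + [[False] for _ in node.inputs[1:]]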
.. function:: R_op(inputs, eval_points)

Optional, to work with gradient.R_op().

This function implements the application of the R-operator on the
function represented by your op. Let us assume that function is :math:`f`,

@@ -373,54 +500,6 @@ following methods:

the outputs) back to their corresponding shapes and return them as the
output of the :func:`R_op` method.
At a bare minimum, a new Op must define ``make_node`` and ``perform``, which
have no defaults.
Defining an Op: ``mul``
=======================

@@ -442,12 +521,9 @@ First, we'll instantiate a ``mul`` Op:

This function must take as many arguments as the operation we are
defining is supposed to take as inputs---in this example that would be
two. This function ensures that both inputs have the ``double`` type.
Since multiplying two doubles yields a double, this function makes an
Apply node with an output Variable of type ``double``.

.. If you modify this code, also change :
.. theano/tests/test_tutorial.py:T_extending.test_extending_1

...
@@ -67,16 +67,17 @@ installation and configuration, see :ref:`installing Theano <install>`.

Status
======

.. raw:: html

    <a href="http://travis-ci.org/Theano/Theano/builds"><img src="https://secure.travis-ci.org/Theano/Theano.png?branch=master" /></a>&nbsp;

.. raw:: html

    <a href="https://crate.io/packages/Theano/"><img src="https://pypip.in/v/Theano/badge.png" alt="Latest PyPI version" /></a>&nbsp;

.. raw:: html

    <a href="https://crate.io/packages/Theano/"><img src="https://pypip.in/d/Theano/badge.png" alt="Number of PyPI downloads" /></a>&nbsp;

.. _available on PyPI: http://pypi.python.org/pypi/Theano
.. _Related Projects: https://github.com/Theano/Theano/wiki/Related-projects

...
.. ../../../../theano/sandbox/linalg/ops.py
.. ../../../../theano/sandbox/linalg

.. _libdoc_sandbox_linalg:

===================================================================
:mod:`sandbox.linalg` -- Linear Algebra Ops
===================================================================

...
@@ -32,18 +32,20 @@ TODO: Give examples on how to use these things! They are pretty complicated.

Most of the more efficient GPU implementations listed below can be used
as an automatic replacement for nnet.conv2d by enabling specific graph
optimizations.

- :func:`conv2d_fft <theano.sandbox.cuda.fftconv.conv2d_fft>`: This
  is a GPU-only version of nnet.conv2d that uses an FFT transform
  to perform the work. conv2d_fft should not be used directly as
  it does not provide a gradient. Instead, use nnet.conv2d and
  allow Theano's graph optimizer to replace it by the FFT version
  by setting
  ``THEANO_FLAGS=optimizer_including=conv_fft_valid:conv_fft_full``
  in your environment. This is not enabled by default because it
  has some restrictions on input and uses a lot more memory. Also
  note that it requires CUDA >= 5.0, scikits.cuda >= 0.5.0 and
  PyCUDA to run. To deactivate the FFT optimization on a specific
  nnet.conv2d while the optimization flags are active, you can set
  its ``version`` parameter to ``'no_fft'``. To enable it for just
  one Theano function:

.. code-block:: python

...
.. ../../../../theano/sandbox/slinalg.py

.. _libdoc_slinalg:

===================================================================
:mod:`tensor.slinalg` -- Linear Algebra Ops Using Scipy
===================================================================

...
@@ -42,25 +42,22 @@ Inputs and Outputs are lists of Theano variables.

how to make a quality contribution.

Op Structure
============

This is an overview of the methods you typically have to implement to
make a new op. It does not provide extensive coverage of all the
possibilities you may encounter or need. For that, refer to
:ref:`op_contract`.
.. code-block:: python

    import theano

    class MyOp(theano.Op):
        __props__ = ()

        def make_node(self, *inputs):
            pass

        # Python implementation:

@@ -72,11 +69,13 @@ Op Contract

            # ...
            pass

        # Other implementations (pycuda, ...):
        def make_thunk(self, node, storage_map, _, _2):
            pass

        # optional:
        check_input = True

        def __init__(self, ...):
            pass

@@ -89,43 +88,47 @@ Op Contract

        def infer_shape(node, (i0_shapes, ...)):
            pass

        def flops(self, inputs, outputs):
            pass
.. ../extending/op.txt

There are two mandatory methods that one needs to implement. The
first one is :func:`make_node`. The second one would describe the
computations that are required to be done at run time. Currently there
are 2 different possibilities: implement the :func:`perform` and/or
:func:`c_code <Op.c_code>` methods (and other related :ref:`c methods
<cop>`), or the :func:`make_thunk` method. ``perform`` allows you to
easily wrap an existing Python function into Theano. ``c_code`` and
the related methods allow the op to generate C code that will be
compiled and linked by Theano. On the other hand, ``make_thunk`` will
be called only once during compilation and should generate a
``thunk``: a standalone function that when called will do the wanted
computations. This is useful if you want to generate code and compile
it yourself. For example, this allows you to use PyCUDA to compile GPU
code.
The :attr:`__props__` attribute serves to make the Op generate an
appropriate :func:`__eq__` and :func:`__hash__` for your Op. It must
be a tuple that lists the properties that influence how the
computation is performed (usually these are those that you set in
:func:`__init__`). If you don't have any properties, then you should
set this attribute to the empty tuple `()`. This requires a
development version after September 1st, 2014, or version 0.7.

:func:`__eq__` and :func:`__hash__` will be used by the optimization
phase to merge nodes that are doing an equivalent computation (same
inputs, same operation). It is especially important that two Ops that
compare equal (have the same values for all the properties listed in
__props__ and the same type) compute the same thing when presented
with the same inputs.

This attribute will also generate a suitable :func:`__str__` method
for your Op. You may override this default with a custom one if you
want another format for the output.
The :func:`infer_shape` method lets Theano infer the shape of a
variable somewhere in the middle of the computational graph without
actually computing the outputs (when possible). This can be helpful
if one only needs the shape of the output instead of the actual
outputs.
The :func:`flops` method lets the memory profiler print the number of
mega flops and giga flops per second. It takes as inputs two lists:
one for the inputs and one for the outputs. They contain tuples that
are the shapes of the corresponding inputs/outputs.
The :func:`grad` method is required if you want to differentiate some
cost whose expression includes your op.

@@ -135,8 +138,9 @@ string representation of your op.

The :func:`R_op` method is needed if you want ``theano.tensor.Rop`` to
work with your op.
The optional boolean :attr:`check_input` attribute is used to specify
if you want the types used in your op to check their inputs in their
c_code. It can be used to speed up compilation, reduce overhead
(particularly for scalars) and reduce the number of generated C files.
Op Example
----------

@@ -147,16 +151,11 @@ Op Example
import theano

class DoubleOp(theano.Op):
    __props__ = ()

    def make_node(self, x):
        # check that the theano version has support for __props__
        assert hasattr(self, '_props')
        x = theano.tensor.as_tensor_variable(x)
        return theano.Apply(self, [x], [x.type()])
@@ -327,24 +326,27 @@ For instance, to verify the Rop method of the DoubleOp, you can use this:
Testing GPU Ops
---------------

Ops to be executed on the GPU should inherit from the
``theano.sandbox.cuda.GpuOp`` and not ``theano.Op``. This allows
Theano to distinguish them. Currently, we use this to test if the
NVIDIA driver works correctly with our sum reduction code on the GPU.

Running Your Tests
==================

To perform your tests, you may select either one of the three
following methods:

theano-nose
-----------

The method of choice to conduct tests is to run the file
``theano-nose``. In a regular Theano installation, the latter will be
on the operating system's path and directly accessible from any
folder. Otherwise, it can be accessed in the ``Theano/bin``
folder. The following command lines may be used for the corresponding
purposes:
* ``theano-nose --theano``: Run every test found in Theano's path.

@@ -352,23 +354,25 @@ lines may be used for the corresponding purposes:

* ``theano-nose test_file.py``: Run every test found in the file *test_file.py*.

The following are particularly useful for development purposes since
they call for particular classes or even for particular tests:

* ``theano-nose test_file.py:test_DoubleRop``: Run every test found inside the class *test_DoubleRop*.

* ``theano-nose test_file.py:test_DoubleRop.test_double_op``: Run only the test *test_double_op*
  in the class *test_DoubleRop*.

Help with the use and functionalities of ``theano-nose`` may be
obtained by running it with the command line parameter ``--help
(-h)``.
nosetests
---------

The command ``nosetests`` can also be used. Although it lacks the
useful functionalities that ``theano-nose`` provides, ``nosetests``
can be called similarly to ``theano-nose`` from any folder in Python's
path like so:

``nosetests [suffix similar to the above]``.
@@ -378,9 +382,10 @@ More documentation on ``nosetests`` is available here:

In-file
-------

One may also add a block of code similar to the following at the end
of the file containing a specific test of interest and run the
file. In this example, the test *test_DoubleRop* in the class
*test_double_op* would be performed.

.. code-block:: python
@@ -407,7 +412,8 @@ Modify and execute to compute: x * y.

Modify and execute the example to return two outputs: x + y and x - y.

You can omit the Rop functions. Try to implement the testing apparatus
described above.

(Notice that Theano's current *elemwise fusion* optimization is
only applicable to computations involving a single output. Hence, to gain
@@ -453,6 +459,7 @@ signature:

It converts the python function to a callable object that takes as
inputs Theano variables that were declared.

as_op Example
-------------

...
@@ -7,8 +7,8 @@ from theano import tensor

from theano.compat.six import StringIO
from theano.sandbox.cuda.type import CudaNdarrayType
from theano.sandbox.cuda import GpuOp
from theano.sandbox.cuda.basic_ops import (as_cuda_ndarray_variable,
                                           gpu_contiguous)

class GpuDot22(GpuOp):

...
from theano import Op, Apply
from theano.compat.six import StringIO
from theano.sandbox.cuda import GpuOp
from theano.sandbox.cuda.basic_ops import as_cuda_ndarray_variable
from theano.sandbox.cuda.kernel_codegen import (nvcc_kernel,
                                                inline_softmax,

...
@@ -1143,11 +1143,12 @@ class GetItem2Lists(gof.op.Op):

get_item_2lists = GetItem2Lists()
"""Select elements of sparse matrix, returning them in a vector.

:param x: Sparse matrix.

:param index: List of two lists, first list indicating the row of
    each element and second list indicating its column.

:return: The corresponding elements in `x`.

"""
@@ -1737,13 +1738,14 @@ class Diag(gof.op.Op):

diag = Diag()
"""Extract the diagonal of a square sparse matrix as a dense vector.

:param x: A square sparse matrix in csc format.

:return: A dense vector representing the diagonal elements.

.. note::

    The grad implemented is regular, i.e. not structured, since the
    output is a dense vector.

"""

...
...@@ -863,18 +863,21 @@ class FillDiagonalOffset(gof.Op): ...@@ -863,18 +863,21 @@ class FillDiagonalOffset(gof.Op):
return [wr_a, wr_val,wr_offset] return [wr_a, wr_val,wr_offset]
fill_diagonal_offset_ = FillDiagonalOffset()

def fill_diagonal_offset(a, val, offset):
    """
    Returns a copy of an array with all elements of the main diagonal
    set to a specified scalar value.

    :param a: Rectangular array of two dimensions.
    :param val: Scalar value to fill the diagonal whose type must be
        compatible with that of array 'a' (i.e. 'val' cannot be viewed
        as an upcast of 'a').
    :param offset: Scalar value: offset of the diagonal from the main
        diagonal. Can be a positive or negative integer.
    :return: An array identical to 'a' except that its offset diagonal
        is filled with scalar 'val'. The output is unwrapped.
    """
    return fill_diagonal_offset_(a, val, offset)
...@@ -496,20 +496,35 @@ def qr(a, mode="full"):
Factor the matrix a as qr, where q
is orthonormal and r is upper-triangular.

:type a:
    array_like, shape (M, N)
:param a:
    Matrix to be factored.

:type mode:
    one of 'reduced', 'complete', 'r', 'raw', 'full' and
    'economic', optional
:keyword mode:
    If K = min(M, N), then

    'reduced'
        returns q, r with dimensions (M, K), (K, N)
    'complete'
        returns q, r with dimensions (M, M), (M, N)
    'r'
        returns r only with dimensions (K, N)
    'raw'
        returns h, tau with dimensions (N, M), (K,)
    'full'
        alias of 'reduced', deprecated (default)
    'economic'
        returns h from 'raw', deprecated. The options 'reduced',
        'complete', and 'raw' are new in numpy 1.8, see the notes for more
        information. The default is 'reduced' and to maintain backward
        compatibility with earlier versions of numpy both it and the old
...@@ -518,21 +533,25 @@ def qr(a, mode="full"):
    deprecated. The modes 'full' and 'economic' may be passed using only
    the first letter for backwards compatibility, but all others
    must be spelled out.
    Default mode is 'full', which is also the default for numpy 1.6.1.

:note: Default mode was left to 'full' as 'full' and 'reduced' do
    the same thing in the new numpy version, but only 'full' works
    with previous numpy versions.

:rtype q:
    matrix of float or complex, optional
:return q:
    A matrix with orthonormal columns. When mode = 'complete' the
    result is an orthogonal/unitary matrix depending on whether or
    not a is real/complex. The determinant may be either +/- 1 in
    that case.
:rtype r:
    matrix of float or complex, optional
:return r:
    The upper-triangular matrix.
"""
x = [[2, 1], [3, 4]]
if isinstance(numpy.linalg.qr(x, mode), tuple):
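A minimal usage sketch (assuming ``qr`` is exposed as
``theano.tensor.nlinalg.qr``; 'reduced' is passed explicitly since the
default here is 'full')::

    import numpy as np
    import theano
    from theano import tensor
    from theano.tensor.nlinalg import qr

    A = tensor.dmatrix('A')
    q, r = qr(A, mode='reduced')          # two symbolic outputs
    f = theano.function([A], [q, r])
    Q, R = f(np.random.rand(4, 3))        # Q: (4, 3), R: (3, 3)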
...@@ -549,8 +568,6 @@ class SVD(Op):
def __init__(self, full_matrices=True, compute_uv=True):
""" """
inputs :
--------
full_matrices : bool, optional full_matrices : bool, optional
If True (default), u and v have the shapes (M, M) and (N, N), If True (default), u and v have the shapes (M, M) and (N, N),
respectively. respectively.
...@@ -582,21 +599,18 @@ def svd(a, full_matrices=1, compute_uv=1):
""" """
This function performs the SVD on CPU. This function performs the SVD on CPU.
Parameters : :type full_matrices: bool, optional
------------ :param full_matrices:
full_matrices : bool, optional
If True (default), u and v have the shapes (M, M) and (N, N), If True (default), u and v have the shapes (M, M) and (N, N),
respectively. respectively.
Otherwise, the shapes are (M, K) and (K, N), respectively, Otherwise, the shapes are (M, K) and (K, N), respectively,
where K = min(M, N). where K = min(M, N).
compute_uv : bool, optional :type compute_uv: bool, optional
:param compute_uv:
Whether or not to compute u and v in addition to s. Whether or not to compute u and v in addition to s.
True by default. True by default.
Returns : :returns: U, V and D matrices.
-------
U, V and D matrices.
""" """
return SVD(full_matrices, compute_uv)(a) return SVD(full_matrices, compute_uv)(a)
...
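A minimal usage sketch (assuming ``svd`` is exposed as
``theano.tensor.nlinalg.svd`` and, like numpy, returns the three factors
when ``compute_uv`` is true)::

    import numpy as np
    import theano
    from theano import tensor
    from theano.tensor.nlinalg import svd

    A = tensor.dmatrix('A')
    u, s, v = svd(A)                      # defaults: full_matrices=1, compute_uv=1
    f = theano.function([A], [u, s, v])
    U, S, V = f(np.random.rand(4, 3))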
...@@ -533,31 +533,33 @@ class Conv3D(theano.Op):
return strutil.render_string(codeSource, locals())
_conv3D = Conv3D()

def conv3D(V, W, b, d):
    """
    3D "convolution" of multiple filters on a minibatch
    (does not flip the kernel, moves kernel with a user specified stride)

    :param V: Visible unit, input.
        dimensions: (batch, row, column, time, in channel)
    :param W: Weights, filter.
        dimensions: (out channel, row, column, time, in channel)
    :param b: bias, shape == (W.shape[0],)
    :param d: strides when moving the filter over the input (dx, dy, dt)

    :note: The order of dimensions does not correspond to the one in `conv2d`.
        This is for optimization.

    :note: The GPU implementation is very slow. You should use
        :func:`conv3d2d <theano.tensor.nnet.conv3d2d.conv3d>` for a
        GPU graph instead.

    :see: Someone made a script that shows how to swap the axes
        between both 3d convolution implementations in Theano. See
        the last `attachment
        <https://groups.google.com/d/msg/theano-users/1S9_bZgHxVw/0cQR9a4riFUJ>`_.
    """
    return _conv3D(V, W, b, d)
def computeH(V, W, b, d):
    assert len(W.shape) == 5
...
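A minimal usage sketch of the new ``conv3D`` wrapper (assuming the import
path ``theano.tensor.nnet.Conv3D``; all names and shapes below are
illustrative)::

    import numpy as np
    import theano
    from theano import tensor
    from theano.tensor.nnet.Conv3D import conv3D

    # V: (batch, row, column, time, in channel)
    # W: (out channel, row, column, time, in channel)
    V = tensor.TensorType('float64', (False,) * 5)('V')
    W = tensor.TensorType('float64', (False,) * 5)('W')
    b = tensor.dvector('b')               # shape == (W.shape[0],)
    d = tensor.ivector('d')               # strides (dx, dy, dt)
    H = conv3D(V, W, b, d)
    f = theano.function([V, W, b, d], H)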