Merge pull request #401 from nouiz/doc_op

Doc op

d6ed790a · lamblin · 41103b5d · 14a0070c · d6ed790a · d6ed790a
--- a/NEWS.txt
+++ b/NEWS.txt
-Modifications in the trunk since the 0.4.1 release (August 12th, 2011) up to December 5th, 2011
+UPDATED THIS FILE UP TO: 41103b5d158739e4147428ce776fb5716062d4a8

+ * fix subtensor bug(report RP, fix PL) TODO  BETTER DESCRIPTION! a7be89231eb26f7a39ab5448ef4abf90a6c0d529

-Upgrading to Theano 0.5 is recommended for everyone, but you should first make
+If you have updated to 0.5rc1, you are highly encouraged to update to
+0.5rc2. There are more bug fixes and speed optimizations! But there is
+also a small new interface change about the sum of [u]int* dtypes.
+
+
+Modifications in the trunk since the 0.4.1 release (August 12th, 2011)
+
+
+Upgrading to Theano 0.5rc2 is recommended for everyone, but you should first make
 sure that your code does not raise deprecation warnings with Theano 0.4.1.
 Otherwise, in one case the results can change. In other cases, the warnings are
 turned into errors (see below for details).


-Important changes:
+Highlights:
 * Moved to github: http://github.com/Theano/Theano/
 * Old trac ticket moved to assembla ticket: http://www.assembla.com/spaces/theano/tickets
 * Theano vision: http://deeplearning.net/software/theano/introduction.html#theano-vision (Many people)
 * Theano with GPU works in some cases on Windows now. Still experimental. (Sebastian Urban)
+ * Faster dot() call: New/Better direct call to cpu and gpu ger, gemv, gemm and dot(vector, vector). (James, Frédéric, Pascal)
+ * C implementation of Alloc. (James, Pascal)
+ * theano.grad() now also works with sparse variables. (Arnaud)
+ * Macro to implement the Jacobian/Hessian with theano.tensor.{jacobian,hessian} (Razvan)
 * See the Interface changes.


 Interface Behavior Change (was deprecated and generated a warning since Theano 0.3 released Nov. 23rd, 2010):
-    * The current default value of the parameter axis of
-      theano.{max,min,argmax,argmin,max_and_argmax} is now the same as
-      numpy: None. i.e. operate on all dimensions of the tensor. (Frédéric Bastien, Olivier Delalleau)
+ * The current default value of the parameter axis of
+   theano.{max,min,argmax,argmin,max_and_argmax} is now the same as
+   numpy: None. i.e. operate on all dimensions of the tensor. (Frédéric Bastien, Olivier Delalleau)
+ * The output dtype of sum with input dtype [u]int* is now always [u]int64.
+   You can specify the output dtype with a new dtype parameter to sum.
+   The output dtype is the one used for the summation.
+   There was no warning about this in previous Theano versions.
+   The consequence is that the sum is done in a dtype with more precision than before.
+   So the sum could be slower, but will be more resistant to overflow.
+   This new behavior is the same as numpy. (Olivier, Pascal)


 Interface Features Removed (most were deprecated):
@@ -32,10 +52,10 @@ Interface Features Removed (most were deprecated):
 * Theano config option "home" is not used anymore as it was redundant with "base_compiledir".
   If you use it, Theano will now raise an error. (Olivier D.)
 * scan interface changes: (Razvan Pascanu)
-    - The use of `return_steps` for specifying how many entries of the output
+    * The use of `return_steps` for specifying how many entries of the output
      to return has been removed. Instead, apply a subtensor to the output
      returned by scan to select a certain slice.
-    - The inner function (that scan receives) should return its outputs and
+    * The inner function (that scan receives) should return its outputs and
      updates following this order:
        [outputs], [updates], [condition].
      One can skip any of the three if not used, but the order has to stay unchanged.
@@ -46,8 +66,30 @@ Interface bug fixes:
 New deprecation (will be removed in Theano 0.6, warning generated if you use them):
 * tensor.shared() renamed to tensor._shared(). You probably want to call theano.shared() instead! (Olivier D.)

+Scan fix:
+ * Computing the grad of a function of the grad of scan (reported by ?, fix by Razvan)
+   before: crashed most of the time, but could return wrong values with a bad number of dimensions (so a visible bug)
+   now: does the right thing.
+ * Gradient with respect to outputs using multiple taps (reported by Timotty, fix by Razvan)
+   before: it used to return wrong values
+   now: does the right thing.
+   Note: The reported case of this bug happened in conjunction with the
+         save memory optimization of scan, which gives run time errors. So if you didn't
+         manually disable the save memory optimization (item 4 in this list),
+         you are fine if you didn't manually request multiple taps.
+ * Rop of gradient of scan (reported by Timotty and Justin Bayer, fix by Razvan)
+   before: compilation error when computing R-op
+   now: does the right thing.
+ * Save memory optimization of scan (reported by Timotty and Nicolas BL, fix by Razvan)
+   before: certain corner cases used to result in a runtime shape error
+   now: does the right thing.
+ * Scan grad when the input of scan has sequences of different lengths. (Razvan, reported by Michael Forbes)
+ * Scan.infer_shape now works correctly when working with a condition for the number of loops.
+   In the past, it returned n_steps as the length, which is not always true. (Razvan)
+ * Scan.infer_shape crash fix. (Reported by ?, Razvan)

 New features:
+ * AdvancedIncSubtensor grad defined and tested (Justin Bayer)
 * Adding 1D advanced indexing support to inc_subtensor and set_subtensor (James Bergstra)
 * tensor.{zeros,ones}_like now support the dtype param as numpy (Frederic)
 * Added configuration flag "exception_verbosity" to control the verbosity of exceptions (Ian)
@@ -68,6 +110,18 @@ New features:
     * Note: theano.dot and theano.sparse.structured_dot() always had a gradient with the same sparsity pattern as the inputs.
       The new theano.sparse.dot() has a dense gradient for all inputs.
 * GpuAdvancedSubtensor1 supports broadcasted dimensions. (Frederic)
+ * TensorVariable.zeros_like() and SparseVariable.zeros_like()
+ * theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.device_properties() (Frederic)
+ * theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info() returns the free and total gpu memory (Frederic)
+ * New Theano flag compiledir_format. It keeps the same default as before: compiledir_%(platform)s-%(processor)s-%(python_version)s. (Josh Bleecher Snyder)
+     * We also support the "theano_version" substitution.
+ * IntDiv c code (faster, and allows this elemwise to be fused with other elemwise) (Pascal)
+ * Internal filter_variable mechanism in Type. (Pascal, Ian)
+    * Ifelse now works on sparse.
+    * Makes the use of gpu shared variables more transparent with the theano.function updates and givens parameters.
+ * Added a_tensor.transpose(axes); axes is optional. (James)
+    * theano.tensor.transpose(a_tensor, kwargs): we were ignoring kwargs; now it is used as the axes.
+ * a_CudaNdarray_object[*] = int now works (Frederic)


 New optimizations:
@@ -88,9 +142,6 @@ New optimizations:
 Bug fixes (the result changed):
 * On CPU, if the convolution had received explicit shape information, they where not checked at runtime.
   This caused wrong result if the input shape was not the one expected. (Frederic, reported by Sander Dieleman)
- * Scan grad when the input of scan has sequences of different lengths. (Razvan, reported by Michael Forbes)
- * Scan.infer_shape now works correctly when working with a condition for the number of loops.
-   In the past, it returned n_steps as the length, which is not always true. (Razvan)
 * Theoretical bug: in some case we could have GPUSum return bad value.
   We were not able to reproduce this problem
     * patterns affected ({0,1}*nb dim, 0 no reduction on this dim, 1 reduction on this dim):
@@ -119,19 +170,21 @@ Crashes fixed:
 * Support for OSX Enthought Python Distribution 7.x. (Graham Taylor, Olivier)
 * When the subtensor inputs had 0 dimensions and the outputs 0 dimensions. (Frederic)
 * Crash when the step to subtensor was not 1 in conjunction with some optimization. (Frederic, reported by Olivier Chapelle)
- * fix dot22scalar cast of integer scalars (Justin Bayer, Frédéric, Olivier)
+ * Fix dot22scalar cast of integer scalars (Justin Bayer, Frédéric, Olivier)
+ * Fix runtime crash in gemm, dot22. (Frederic)
+ * Fix on 32-bit computers: make sure all shapes are int64. (Olivier)
+ * Fix to deque on python 2.4 (Olivier)
+ * Fix crash when not using c code (or when using DebugMode) (not used by default) with numpy 1.6*. Numpy has a bug in the reduction code (ufunc.reduce) that makes it crash. (Pascal)


 Known bugs:
 * CAReduce with nan in inputs don't return the good output (`Ticket <https://www.assembla.com/spaces/theano/tickets/763>`_).
     * This is used in tensor.{max,mean,prod,sum} and in the grad of PermuteRowElements.
- * If you take the grad of a grad of scan, now we raise an error during the construction of the graph. In the past, you could have wrong results in some cases or an error at run time.
- * Scan can raise an IncSubtensor error at run time (no wrong result possible). The current workaround is to disable an optimization with this Theano flag: "optimizer_excluding=scanOp_save_mem".
-   * If you have multiple optimizations to disable, you must separate them with ":".


 Sandbox:
 * cvm interface more consistent with current linker. (James)
+   * Now all tests pass with the linker=cvm flag.
 * vm linker has a callback parameter. (James)
 * review/finish/doc: diag/extract_diag. (Arnaud Bergeron, Frederic, Olivier)
 * review/finish/doc: AllocDiag/diag. (Arnaud, Frederic, Guillaume)
@@ -142,24 +195,30 @@ Sandbox:
 * review/finish/doc: ensure_sorted_indices. (Li Yao)
 * review/finish/doc: spectral_radius_boud. (Xavier Glorot)
 * review/finish/doc: sparse sum. (Valentin Bisson)
+ * review/finish/doc: Remove0 (Valentin)
+ * review/finish/doc: SquareDiagonal (Eric)


 Sandbox New features (not enabled by default):
 * CURAND_RandomStreams for uniform and normal (not picklable, GPU only) (James)
+ * New sandbox.linalg.ops.pinv (pseudo-inverse) op (Razvan)


 Documentation:
 * Many updates. (Many people)
 * Updates to install doc on MacOS. (Olivier)
 * Updates to install doc on Windows. (David, Olivier)
+ * Doc on the Rop function (Ian)
 * Added how to use scan to loop with a condition as the number of iteration. (Razvan)
 * Added how to wrap in Theano an existing python function (in numpy, scipy, ...). (Frederic)
 * Refactored GPU installation of Theano. (Olivier)


 Others:
- * Better error messages in many places. (David, Ian, Frederic, Olivier)
+ * Better error messages in many places. (Many people)
 * PEP8 fixes. (Many people)
+ * Add a warning about a numpy bug with subtensors with more than 2**32 elements (TODO, more explicit)
+ * Added Scalar.ndim=0 and ScalarSharedVariable.ndim=0 (simplifies code) (Razvan)
 * New min_informative_str() function to print graph. (Ian)
 * Fix catching of exception. (Sometimes we catched interupt) (Frederic, David, Ian, Olivier)
 * Better support for uft string. (David)
@@ -168,13 +227,18 @@ Others:
 * Warning when people have old cache entries. (Olivier)
 * More tests for join on the GPU and CPU. (Frederic)
 * Don't request to load the GPU module by default in scan module. (Razvan)
- * Fixed some import problems.
+ * Fixed some import problems. (Frederic and others)
 * Filtering update. (James)
 * On Windows, the default compiledir changed to be local to the computer/user and not transferred with roaming profile. (Sebastian Urban)
 * New theano flag "on_shape_error". Defaults to "warn" (same as previous behavior):
   it prints a warning when an error occurs when inferring the shape of some apply node.
   The other accepted value is "raise" to raise an error when this happens. (Frederic)
 * The buidbot now raises optimization/shape errors instead of just printing a warning. (Frederic)
+ * Better pycuda tests (Frederic)
+ * check_blas.py now accepts the shape and the number of iterations as parameters (Frederic)
+ * Fix opt warning when the opt ShapeOpt is disabled (enabled by default) (Frederic)
+ * More internal verification of what each op.infer_shape returns. (Frederic, James)
+ * Argmax dtype to int64 (Olivier)

 Reviewers (alphabetical order):
 * David, Frederic, Ian, James, Olivier, Razvan
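
Reviewer note on the sum dtype change in NEWS.txt: the entry says that accumulating in a wider integer type is more resistant to overflow. The following is a minimal pure-Python sketch of that effect, not Theano code. `wrap_signed` and `accumulate` are hypothetical helpers that emulate two's-complement wraparound of a fixed-width accumulator.

```python
def wrap_signed(value, bits):
    """Wrap an integer into the range of a signed `bits`-bit integer."""
    mask = (1 << bits) - 1
    value &= mask
    # Values with the top bit set represent negative numbers.
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


def accumulate(values, bits):
    """Sum `values`, wrapping the running total to `bits` bits at each step."""
    total = 0
    for v in values:
        total = wrap_signed(total + v, bits)
    return total


data = [100, 100, 100]  # each element fits in int8 (range -128..127)

narrow = accumulate(data, 8)   # accumulator as narrow as the input
wide = accumulate(data, 64)    # accumulator widened to 64 bits

print(narrow)  # 44, i.e. 300 wrapped into the int8 range
print(wide)    # 300
```

With the narrow accumulator the true total (300) does not fit and wraps around; with the 64-bit accumulator it is preserved, at the cost of working in a larger type.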
--- a/doc/library/gradient.txt
+++ b/doc/library/gradient.txt
@@ -16,57 +16,5 @@ function does the underlying work, and is more flexible, but is also more
 awkward to use when :func:`tensor.grad` can do the job.


-.. function:: grad_sources_inputs(sources, graph_inputs, warn_type=True)
-
-    A gradient source is a pair (``v``, ``g_v``), in which ``v`` is
-    a `Variable`, and ``g_v`` is a `Variable` that is a gradient wrt
-    ``v``. More specifically, ``g_v`` is the gradient of an external
-    scalar cost, ``cost`` (that is not explicitly used), wrt ``v``.
-
-    This function traverses the graph backward from the ``r`` sources,
-    calling ``op.grad(...)`` for all ops with some non-None gradient
-    on an output, to compute gradients of ``cost`` wrt intermediate
-    variables and ``graph_inputs``.
-
-    The ``op.grad(...)`` functions are called like this:
-
-    .. code-block:: python
-
-        op.grad(op.inputs[:], [total_gradient(v) for v in op.outputs])
-
-    This call to ``op.grad`` should return a list or tuple: one symbolic
-    gradient per input. These gradients represent the gradients of
-    the same implicit ``cost`` mentionned above, wrt ``op.inputs``.  Note
-    that this is **not** the same as the gradient of ``op.outputs`` wrt
-    ``op.inputs``.
-
-    If ``op`` has a single input, then ``op.grad`` should return a list
-    or tuple of length 1.
-    For each input wrt to which ``op`` is not differentiable, it should
-    return ``None`` instead of a `Variable` instance.
-
-    If a source ``r`` receives a gradient from another source ``r2``,
-    then the effective gradient on ``r`` is the sum of both gradients.
-
-
-    :type sources: list of pairs of Variable: (v, gradient-on-v) to 
-                   initialize the total_gradient dictionary
-
-    :param sources: gradients to back-propagate using chain rule
-
-    :param warn_type: True will trigger warnings via the logging module when
-       the gradient on an expression has a different type than the original
-       expression
-
-    :type warn_type: bool
-
-    :type graph_inputs: list of Variable
-
-    :param graph_inputs: variables considered to be constant 
-                         (do not backpropagate through them)
-
-    :rtype: dictionary whose keys and values are of type `Variable`
-
-    :returns: mapping from each Variable encountered in the backward traversal to its [total] gradient.
-
-
+.. automodule:: theano.gradient
+    :members:
--- a/doc/library/sandbox/cuda/index.txt
+++ b/doc/library/sandbox/cuda/index.txt
@@ -15,6 +15,4 @@

    var
    type
-
-
-
+    op
--- a/doc/library/sandbox/cuda/op.txt
+++ b/doc/library/sandbox/cuda/op.txt
+.. _libdoc_cuda_op:
+
+=======================================================
+:mod:`sandbox.cuda` -- List of implemented CUDA GPU Ops
+=======================================================
+
+.. moduleauthor:: LISA
+
+Normally you should not call these Ops directly! Theano should automatically transform cpu ops to their gpu equivalents. So this list is mainly useful to let people know what is implemented on the gpu.
+
+Basic Op
+========
+
+.. automodule:: theano.sandbox.cuda.basic_ops
+    :members:
+
+Blas Op
+=======
+
+.. automodule:: theano.sandbox.cuda.blas
+    :members:
+
+Nnet Op
+=======
+
+.. automodule:: theano.sandbox.cuda.nnet
+    :members:
+
+Curand Op
+=========
+
+Random generator based on the CURAND libraries. It is not inserted automatically.
+
+.. automodule:: theano.sandbox.cuda.rng_curand
+    :members:
--- a/doc/tutorial/gradients.txt
+++ b/doc/tutorial/gradients.txt
@@ -94,9 +94,14 @@ of symbolic differentiation).
 Computing the Jacobian
 ======================

-In order to compute the Jacobian of some function ``y`` with respect to some
-parameter ``x`` we need to use the ``scan``. What we do is to loop over the
-entries in ``y`` and compute the gradient of ``y[i]`` with respect to ``x``.
+Theano implements the :func:`theano.gradient.jacobian` macro that does all
+that is needed to compute the Jacobian. The following text explains how
+to do it manually.
+
+In order to manually compute the Jacobian of some function ``y`` with
+respect to some parameter ``x`` we need to use ``scan``. What we
+do is to loop over the entries in ``y`` and compute the gradient of
+``y[i]`` with respect to ``x``.

 .. note::
    
@@ -129,12 +134,17 @@ matrix, which corresponds to the Jacobian.
    seems possible. The reason is that ``y_i`` will not be a function of
    ``x`` anymore, while ``y[i]`` still is. 

+
 Computing the Hessian
 =====================

-Similar to computing the Jacobian we can also compute the Hessian. The only
+Theano implements the :func:`theano.gradient.hessian` macro that does all
+that is needed to compute the Hessian. The following text explains how
+to do it manually.
+
+You can compute the Hessian manually in the same way as the Jacobian. The only
 difference is that now, instead of computing the Jacobian of some expression
-``y``, we compute the jacobian of ``T.grad(cost,x)``, where ``cost`` is some
+``y``, we compute the Jacobian of ``T.grad(cost,x)``, where ``cost`` is some
 scalar. 
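
Reviewer note on the tutorial text: the manual recipe has a simple structure (loop over the entries of ``y`` and take the gradient of ``y[i]``; for the Hessian, take the Jacobian of the gradient). The sketch below mirrors that structure numerically, using central finite differences in place of the symbolic ``T.grad`` and a Python loop in place of ``scan``. It is not Theano code, and the helper names `grad`, `jacobian`, and `hessian` are hypothetical stand-ins defined here only for illustration.

```python
def grad(f, x, eps=1e-4):
    """Approximate the gradient of a scalar function f at point x (a list)."""
    g = []
    for k in range(len(x)):
        up = list(x)
        down = list(x)
        up[k] += eps
        down[k] -= eps
        g.append((f(up) - f(down)) / (2 * eps))
    return g


def jacobian(y, x, n_out):
    """Loop over the entries of y; row i is the gradient of y[i] wrt x."""
    return [grad(lambda p, i=i: y(p)[i], x) for i in range(n_out)]


def hessian(cost, x):
    """The Hessian is the Jacobian of the gradient of a scalar cost."""
    return jacobian(lambda p: grad(cost, p), x, len(x))


x0 = [1.0, 3.0]

# y = x ** 2 elementwise, so analytically the Jacobian is diag(2 * x).
J = jacobian(lambda p: [v ** 2 for v in p], x0, 2)

# cost = x[0]**2 * x[1], so analytically the Hessian is
# [[2*x[1], 2*x[0]], [2*x[0], 0]].
H = hessian(lambda p: p[0] ** 2 * p[1], x0)
```

The results agree with the analytic values only up to floating-point error, which is why a symbolic implementation such as the macros mentioned above is preferable in practice.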



--- a/theano/gradient.py
+++ b/theano/gradient.py
@@ -58,14 +58,50 @@ def format_as(use_list, use_tuple, outputs):

 def grad_sources_inputs(sources, graph_inputs, warn_type=True):
    """
-    :type sources: list of pairs of Variable: (v, gradient-on-v)
+    A gradient source is a pair (``v``, ``g_v``), in which ``v`` is
+    a `Variable`, and ``g_v`` is a `Variable` that is a gradient wrt
+    ``v``. More specifically, ``g_v`` is the gradient of an external
+    scalar cost, ``cost`` (that is not explicitly used), wrt ``v``.
+
+    This function traverses the graph backward from the ``r`` sources,
+    calling ``op.grad(...)`` for all ops with some non-None gradient
+    on an output, to compute gradients of ``cost`` wrt intermediate
+    variables and ``graph_inputs``.
+
+    The ``op.grad(...)`` functions are called like this:
+
+    .. code-block:: python
+
+        op.grad(op.inputs[:], [total_gradient(v) for v in op.outputs])
+
+    This call to ``op.grad`` should return a list or tuple: one symbolic
+    gradient per input. These gradients represent the gradients of
+    the same implicit ``cost`` mentioned above, wrt ``op.inputs``.  Note
+    that this is **not** the same as the gradient of ``op.outputs`` wrt
+    ``op.inputs``.
+
+    If ``op`` has a single input, then ``op.grad`` should return a list
+    or tuple of length 1.
+    For each input wrt which ``op`` is not differentiable, it should
+    return ``None`` instead of a `Variable` instance.
+
+    If a source ``r`` receives a gradient from another source ``r2``,
+    then the effective gradient on ``r`` is the sum of both gradients.
+
+
+
+    :type sources: list of pairs of Variable: (v, gradient-on-v) to
+                   initialize the total_gradient dictionary
    :param sources: gradients to back-propagate using chain rule
    :type graph_inputs: list of Variable
    :param graph_inputs: variables considered to be constant
        (do not backpropagate through them)
+    :type warn_type: bool
+    :param warn_type: True will trigger warnings via the logging module when
+       the gradient on an expression has a different type than the original
+       expression

-    :rtype: dictionary whose keys and values are of type `Variable`
-
+    :rtype: dictionary whose keys and values are of type Variable
    :return: mapping from each Variable encountered in the backward
        traversal to the gradient with respect to that Variable.

@@ -73,9 +109,6 @@ def grad_sources_inputs(sources, graph_inputs, warn_type=True):
    sources, so that for each v, gradient-on-v is the gradient of J with
    respect to v

-
-
-
    """
    gmap = {}
    for (r, g_r) in sources:
@@ -182,23 +215,22 @@ def Rop(f, wrt, eval_points):
    in `eval_points`. Mathematically this stands for the jacobian of `f` wrt
    to `wrt` right muliplied by the eval points.

-    :type f: `Variable` or list of `Variable`s
-        `f` stands for the output of the computational graph to which you
-        want to apply the R operator
-    :type wrt: `Variable` or list of `Variables`s
-        variables for which you compute the R operator of the expression
-        described by `f`
-    :type eval_points: `Variable` or list of `Variable`s
-        evalutation points for each of the variables in `wrt`
-
-    :rtype: `Variable` or list/tuple of `Variable`s depending on type of f
+    :type f: Variable or list of Variables
+             `f` stands for the output of the computational graph to which you
+             want to apply the R operator
+    :type wrt: Variable or list of Variables
+               variables for which you compute the R operator of the expression
+               described by `f`
+    :type eval_points: Variable or list of Variables
+                       evaluation points for each of the variables in `wrt`
+    :rtype: Variable or list/tuple of Variables depending on type of f
    :return: symbolic expression such that
        R_op[i] = sum_j ( d f[i] / d wrt[j]) eval_point[j]
        where the indices in that expression are magic multidimensional
        indices that specify both the position within a list and all
        coordinates of the tensor element in the last.
        If `wrt` is a list/tuple, then return a list/tuple with the results.
-        """
+    """
    from theano.tensor import as_tensor_variable
    using_list = isinstance(f, list)
    using_tuple = isinstance(f, tuple)
@@ -295,16 +327,16 @@ def Lop(f, wrt, eval_points, consider_constant=None, warn_type=False,
    in `eval_points`. Mathematically this stands for the jacobian of `f` wrt
    to `wrt` left muliplied by the eval points.

-    :type f: `Variable` or list of `Variable`s
+    :type f: Variable or list of Variables
        `f` stands for the output of the computational graph to which you
        want to apply the L operator
-    :type wrt: `Variable` or list of `Variables`s
+    :type wrt: Variable or list of Variables
        variables for which you compute the L operator of the expression
        described by `f`
-    :type eval_points: `Variable` or list of `Variable`s
-        evalutation points for each of the variables in `f`
+    :type eval_points: Variable or list of Variables
+                        evaluation points for each of the variables in `f`

-    :rtype: `Variable` or list/tuple of `Variable`s depending on type of f
+    :rtype: Variable or list/tuple of Variables depending on type of f
    :return: symbolic expression such that
        L_op[i] = sum_i ( d f[i] / d wrt[j]) eval_point[i]
        where the indices in that expression are magic multidimensional
@@ -374,9 +406,9 @@ def Lop(f, wrt, eval_points, consider_constant=None, warn_type=False,
 def grad(cost, wrt, g_cost=None, consider_constant=None, warn_type=False,
         disconnected_inputs='raise'):
    """
-    :type cost: Scalar (0-dimensional) `Variable`
-    :type wrt: `Variable` or list of `Variable`s.
-    :type g_cost: Scalar `Variable`, or None
+    :type cost: Scalar (0-dimensional) Variable.
+    :type wrt: Variable or list of Variables.
+    :type g_cost: Scalar Variable, or None.
    :param g_cost: an expression for the gradient through cost.  The default is
        ``ones_like(cost)``.
    :param consider_constant: a list of expressions not to backpropagate
@@ -393,7 +425,7 @@ def grad(cost, wrt, g_cost=None, consider_constant=None, warn_type=False,
        - 'warn': consider the gradient zero, and print a warning.
        - 'raise': raise an exception.

-    :rtype: `Variable` or list/tuple of `Variable`s (depending upon `wrt`)
+    :rtype: Variable or list/tuple of Variables (depending upon `wrt`)

    :return: symbolic expression of gradient of `cost` with respect to `wrt`.
             If an element of `wrt` is not differentiable with respect
@@ -672,9 +704,9 @@ def verify_grad(fun, pt, n_tests=2, rng=None, eps=None, abs_tol=None,
    """ Test a gradient by Finite Difference Method. Raise error on failure.

    Example:
-    >>> verify_grad(theano.tensor.tanh,
-                    (numpy.asarray([[2,3,4], [-1, 3.3, 9.9]]),),
-                    rng=numpy.random)
+        >>> verify_grad(theano.tensor.tanh,
+                        (numpy.asarray([[2,3,4], [-1, 3.3, 9.9]]),),
+                        rng=numpy.random)

    Raises an Exception if the difference between the analytic gradient and
    numerical gradient (computed through the Finite Difference Method) of a
@@ -841,8 +873,8 @@ verify_grad.E_grad = GradientError
 def jacobian(expression, wrt, consider_constant=None, warn_type=False,
             disconnected_inputs='raise'):
    """
-    :type expression: Vector (1-dimensional) `Variable`
-    :type wrt: 'Variable' or list of `Variables`s
+    :type expression: Vector (1-dimensional) Variable
+    :type wrt: Variable or list of Variables

    :param consider_constant: a list of expressions not to backpropagate
        through
@@ -858,7 +890,7 @@ def jacobian(expression, wrt, consider_constant=None, warn_type=False,
        - 'warn': consider the gradient zero, and print a warning.
        - 'raise': raise an exception.

-    :return: either a instance of `Variable` or list/tuple of `Variable`s
+    :return: either an instance of Variable or list/tuple of Variables
            (depending upon `wrt`) repesenting the jacobian of `expression`
            with respect to (elements of) `wrt`. If an element of `wrt` is not
            differentiable with respect to the output, then a zero
@@ -914,9 +946,9 @@ def jacobian(expression, wrt, consider_constant=None, warn_type=False,
 def hessian(cost, wrt, consider_constant=None, warn_type=False,
             disconnected_inputs='raise'):
    """
-    :type cost: Scalar (0-dimensional) `Variable`
+    :type cost: Scalar (0-dimensional) Variable.
    :type wrt: Vector (1-dimensional tensor) 'Variable' or list of
-            vectors (1-dimensional tensors) `Variable`s
+               vectors (1-dimensional tensors) Variables

    :param consider_constant: a list of expressions not to backpropagate
        through
@@ -932,7 +964,7 @@ def hessian(cost, wrt, consider_constant=None, warn_type=False,
        - 'warn': consider the gradient zero, and print a warning.
        - 'raise': raise an exception.

-    :return: either a instance of `Variable` or list/tuple of `Variable`s
+    :return: either an instance of Variable or list/tuple of Variables
            (depending upon `wrt`) repressenting the Hessian of the `cost`
            with respect to (elements of) `wrt`. If an element of `wrt` is not
            differentiable with respect to the output, then a zero
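
Reviewer note on the `Rop` and `Lop` docstrings: both describe a product with the Jacobian of `f` wrt `wrt`, multiplied on the right or on the left by the evaluation points. The sketch below shows the two products on an explicit Jacobian stored as a list of rows. It forms the Jacobian only for clarity and is not how the functions in this file are implemented; `rop` and `lop` are hypothetical helpers. Note that for the left product the result has one entry per element of `wrt`, so it is naturally indexed by `j`.

```python
def rop(jac, v):
    """Jacobian right-multiplied by v: R[i] = sum_j jac[i][j] * v[j]."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in jac]


def lop(jac, u):
    """Jacobian left-multiplied by u: L[j] = sum_i u[i] * jac[i][j]."""
    n_cols = len(jac[0])
    return [sum(u[i] * jac[i][j] for i in range(len(u)))
            for j in range(n_cols)]


# f(x) = [x0 * x1, x0 + x1, x1 ** 2] evaluated at x = (2, 3) has this
# 3x2 Jacobian (rows: outputs of f, columns: elements of x).
jac = [[3.0, 2.0],
       [1.0, 1.0],
       [0.0, 6.0]]

r_val = rop(jac, [1.0, 0.5])       # one entry per output of f
l_val = lop(jac, [1.0, 2.0, 0.0])  # one entry per element of x

print(r_val)  # [4.0, 1.5, 3.0]
print(l_val)  # [5.0, 4.0]
```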

--- a/theano/sandbox/cuda/basic_ops.py
+++ b/theano/sandbox/cuda/basic_ops.py
@@ -34,6 +34,9 @@ def as_cuda_array(obj):
        raise TypeError("Don't know how to cast to a CudaNdarray object")

 class HostFromGpu(Op):
+    """
+    Implement the transfer from gpu to the cpu.
+    """
    def __eq__(self, other):
        return type(self) == type(other)
    def __hash__(self):
@@ -63,6 +66,9 @@ class HostFromGpu(Op):
 host_from_gpu = HostFromGpu()

 class GpuFromHost(Op):
+    """
+    Implement the transfer from cpu to the gpu.
+    """
    def __eq__(self, other):
        return type(self) == type(other)
    def __hash__(self):
@@ -93,6 +99,9 @@ class GpuFromHost(Op):
 gpu_from_host = GpuFromHost()

 class GpuElemwise(Op):
+    """
+    Implement a generic elemwise on the gpu.
+    """
    nin = property(lambda self: self.scalar_op.nin)
    nout = property(lambda self: self.scalar_op.nout)

@@ -200,6 +209,9 @@ class GpuElemwise(Op):
        return self.src_generator.cache_version

 class GpuDimShuffle(Op):
+    """
+    Implement DimShuffle on the gpu.
+    """
    def __init__(self, input_broadcastable, new_order):
        input_broadcastable = tuple(input_broadcastable)
        self.input_broadcastable = input_broadcastable
@@ -403,7 +415,7 @@ class GpuSum(Op):
      - reduce_mask == (1,1,1) computes the sum of all elements in a 3-tensor.

    :note: any reduce_mask of all zeros is a sort of 'copy', and may be removed during graph
-    optimization
+           optimization

    """
    def __init__(self, reduce_mask):
@@ -1706,6 +1718,9 @@ class GpuSum(Op):
        return sio.getvalue()

 class GpuReshape(tensor.Reshape):
+    """
+    Implement Reshape on the gpu.
+    """
    # __hash__, __eq__, __str__ come from tensor.Subtensor
    def make_node(self, x, shp):
        host_reshaped = host_from_gpu(x).reshape(shp,ndim=self.ndim)
@@ -1719,6 +1734,9 @@ class GpuReshape(tensor.Reshape):
        out[0] = x.reshape(tuple(shp))

 class GpuSubtensor(tensor.Subtensor):
+    """
+    Implement subtensor on the gpu.
+    """
    # __hash__, __eq__, __str__ come from tensor.Subtensor
    def make_node(self, x, *inputs):
        assert isinstance(x.type, CudaNdarrayType)
@@ -1747,6 +1765,9 @@ class GpuSubtensor(tensor.Subtensor):
        out[0] = x.__getitem__(cdata)

 class GpuAdvancedSubtensor1(tensor.AdvancedSubtensor1):
+    """
+    Implement AdvancedSubtensor1 on the gpu.
+    """
    def make_node(self, x, ilist):
        x_ = as_cuda_ndarray_variable(x)
        ilist_ = tensor.as_tensor_variable(ilist)
@@ -1770,6 +1791,9 @@ class GpuAdvancedSubtensor1(tensor.AdvancedSubtensor1):
        out[0] = o

 class GpuAdvancedIncSubtensor1(tensor.AdvancedIncSubtensor1):
+    """
+    Implement AdvancedIncSubtensor1 on the gpu.
+    """
    def make_node(self, x, y, ilist):
        x_ = as_cuda_ndarray_variable(x)
        y_ = as_cuda_ndarray_variable(y)
@@ -1795,6 +1819,9 @@ class GpuAdvancedIncSubtensor1(tensor.AdvancedIncSubtensor1):
        # so we use the parent version that loop on each indices.

 class GpuIncSubtensor(tensor.IncSubtensor):
+    """
+    Implement IncSubtensor on the gpu.
+    """
    def make_node(self, x, y, *inputs):
        assert isinstance(x.type, CudaNdarrayType)
        assert isinstance(y.type, CudaNdarrayType)
@@ -1802,6 +1829,9 @@ class GpuIncSubtensor(tensor.IncSubtensor):
        return Apply(self, [x,y]+rval.inputs[2:], [x.type()])

 class GpuFlatten(tensor.Flatten):
+    """
+    Implement Flatten on the gpu.
+    """
    def make_node(self, x ):
        assert isinstance(x.type, CudaNdarrayType)
        rval = tensor.Flatten.make_node(self, x)
@@ -1810,11 +1840,17 @@ class GpuFlatten(tensor.Flatten):
        return Apply(self, [x], [out_type()])

 class GpuShape(tensor.Shape):
+    """
+    Implement Shape on the gpu.
+    """
    def make_node(self, x):
        return Apply(self, [x], [tensor.lvector()])
 gpu_shape = GpuShape()

 class GpuJoin(tensor.Join):
+    """
+    Implement Join on the gpu.
+    """
    def make_node(self, *axis_and_tensors):
        axis, tensors = axis_and_tensors[0], axis_and_tensors[1:]
        if not tensors:
@@ -1889,6 +1925,9 @@ class GpuJoin(tensor.Join):
 gpu_join = GpuJoin()

 class GpuAlloc(Op):
+    """
+    Implement Alloc on the gpu.
+    """
    def __init__(self):
        pass

@@ -1967,7 +2006,12 @@ class GpuAlloc(Op):

 gpu_alloc = GpuAlloc()

+
 class GpuContiguous(Op):
+    """
+    Always return a c contiguous output. Copy the input only if it is
+    not already c contiguous.
+    """
    view_map = {0: [0]}

    def __eq__(self, other):

--- a/theano/sandbox/cuda/blas.py
+++ b/theano/sandbox/cuda/blas.py
@@ -6,6 +6,9 @@ import cuda_ndarray.cuda_ndarray as cuda
 from theano.sandbox.cuda.type import CudaNdarrayType

 class GpuDot22(Op):
+    """
+    Implement dot(2d, 2d) on the gpu.
+    """
    def __str__(self):
        return 'GpuDot22'
    def __eq__(self, other):
@@ -74,6 +77,9 @@ class GpuDot22(Op):
 gpu_dot22 = GpuDot22()

 class GpuDot22Scalar(Op):
+    """
+    Implement dot(2d, 2d) * scalar on the gpu.
+    """
    def __str__(self):
        return 'GpuDot22Scalar'
    def __eq__(self, other):
@@ -434,6 +440,7 @@ gpu_ger_no_inplace = GpuGer(inplace=False)
 gpu_ger_inplace = GpuGer(inplace=True)

 class GpuOuter(Op):
+    """ Implement outer on the gpu."""
    def make_node(self, x, y):
        # we suppose type checking has been done, but make sure.
        assert (x.type.ndim == 1 and y.type.ndim == 1 and
@@ -526,6 +533,9 @@ gpu_outer = GpuOuter()
 # Not really a BLAS operation, but whatever.
 #
 class GpuConv(Op):
+    """
+    Implement the batched and stacked 2d convolution on the gpu.
+    """
    @staticmethod
    def logical_output_shape_2d(imshp, kshp, mode):
        if mode == 'valid':
@@ -689,6 +699,9 @@ class GpuConv(Op):


 class GpuDownsampleFactorMax(Op):
+    """
+    Implement downsample with max on the gpu.
+    """
    def __init__(self, ds, ignore_border=False):
        self.ds = tuple(ds)
        self.ignore_border = ignore_border
@@ -846,6 +859,9 @@ class GpuDownsampleFactorMax(Op):
        """ % locals()

 class GpuDownsampleFactorMaxGrad(Op):
+    """
+    Implement the grad of downsample with max on the gpu.
+    """
    def __init__(self, ds, ignore_border):
        self.ds = tuple(ds)
        self.ignore_border = ignore_border

--- a/theano/sandbox/cuda/nnet.py
+++ b/theano/sandbox/cuda/nnet.py
@@ -6,7 +6,11 @@ from theano.sandbox.cuda.type import CudaNdarrayType

 from theano.sandbox.cuda.kernel_codegen import nvcc_kernel, inline_reduce_max, inline_reduce_sum, inline_softmax

+
 class GpuCrossentropySoftmaxArgmax1HotWithBias (Op):
+    """
+    Implement CrossentropySoftmaxArgmax1HotWithBias on the gpu.
+    """
    nin=3
    nout=3
    def __eq__(self, other):
@@ -177,6 +181,9 @@ class GpuCrossentropySoftmaxArgmax1HotWithBias (Op):
 gpu_crossentropy_softmax_argmax_1hot_with_bias = GpuCrossentropySoftmaxArgmax1HotWithBias()

 class GpuCrossentropySoftmax1HotWithBiasDx (Op):
+    """
+    Implement CrossentropySoftmax1HotWithBiasDx on the gpu.
+    """
    nin=3
    nout=1
    """Gradient wrt x of the CrossentropySoftmax1Hot Op"""
@@ -296,7 +303,9 @@ class GpuCrossentropySoftmax1HotWithBiasDx (Op):
 gpu_crossentropy_softmax_1hot_with_bias_dx = GpuCrossentropySoftmax1HotWithBiasDx()

 class GpuSoftmax (Op):
-    """Writeme"""
+    """
+    Implement Softmax on the gpu.
+    """
    def __eq__(self, other):
        return type(self) == type(other)
    def __hash__(self):
@@ -392,7 +401,9 @@ class GpuSoftmax (Op):
 gpu_softmax = GpuSoftmax()

 class GpuSoftmaxWithBias (Op):
-    """Writeme"""
+    """
+    Implement SoftmaxWithBias on the gpu.
+    """
    nin = 2
    nout = 1
    def __eq__(self, other):

--- a/theano/sandbox/cuda/rng_curand.py
+++ b/theano/sandbox/cuda/rng_curand.py
@@ -247,7 +247,8 @@ class CURAND_Uniform(CURAND_Base):


 class CURAND_RandomStreams(object):
-    """RandomStreams instance that creates CURAND-based random variables.
+    """
+    RandomStreams instance that creates CURAND-based random variables.

    One caveat is that generators are not serializable.
    """

--- a/theano/sandbox/linalg/ops.py
+++ b/theano/sandbox/linalg/ops.py
@@ -535,7 +535,7 @@ class MatrixInverse(Op):
    and :math:`A_{inv} \cdot A` equals the identity matrix :math:`I`.

    :note: When possible, the call to this op will be optimized to the call
-    of ``solve``.
+           of ``solve``.
    """

    def __init__(self):