Commit 4cc35522 authored by nouiz

Merge pull request #816 from bouchnic/docstring

Docstring NEWS: Make a good sparse tutorial (NB)
@@ -37,7 +37,7 @@ computation graph in the compilation phase:
Step 1 - Create a FunctionGraph
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The subgraph given by the end user is wrapped in a structure called
*FunctionGraph*. That structure defines several hooks on adding and
......
@@ -15,17 +15,17 @@ Guide
=====
FunctionGraph
-------------
.. _libdoc_gof_fgraphfeature:
FunctionGraph Features
----------------------
.. _libdoc_gof_fgraphfeaturelist:
FunctionGraph Feature List
^^^^^^^^^^^^^^^^^^^^^^^^^^
* ReplaceValidate
* DestroyHandler
......
.. _libdoc_sparse:

=========================================
:mod:`sparse` -- Symbolic Sparse Matrices
=========================================

In the tutorial section, you can find a :ref:`sparse tutorial
<tutsparse>`.

The sparse submodule is not loaded when we import Theano. You must
import ``theano.sparse`` to enable it.

The sparse module provides the same functionality as the tensor
module. The difference lies under the cover because sparse matrices
do not store data in a contiguous array. Note that there are no GPU
implementations for sparse matrices in Theano. The sparse
module has been used in:

- NLP: Dense linear transformations of sparse vectors.
- Audio: Filterbank in Fourier domain.

Compressed Sparse Format
========================

This section tries to explain how information is stored for the two
sparse formats of SciPy supported by Theano. There are more formats
that can be used with SciPy and some documentation about them may be
found `here
<http://deeplearning.net/software/theano/sandbox/sparse.html>`_.
.. Changes to this section should also result in changes to tutorial/sparse.txt.
Theano supports two *compressed sparse formats*, ``csc`` and ``csr``,
respectively based on columns and rows. They both have the same
attributes: ``data``, ``indices``, ``indptr`` and ``shape``.

* The ``data`` attribute is a one-dimensional ``ndarray`` which
  contains all the non-zero elements of the sparse matrix.

* The ``indices`` and ``indptr`` attributes are used to store the
  position of the data in the sparse matrix.

* The ``shape`` attribute is exactly the same as the ``shape``
  attribute of a dense (i.e. generic) matrix. It can be explicitly
  specified at the creation of a sparse matrix if it cannot be
  inferred from the first three attributes.
CSC Matrix
----------
In the *Compressed Sparse Column* format, ``indices`` stands for indexes
inside the column vectors of the matrix and ``indptr`` tells where the
column starts in the ``data`` and in the ``indices``
attributes. ``indptr`` can be thought of as giving the slice which must be
applied to the other attributes in order to get each column of the
matrix. In other words, ``slice(indptr[i], indptr[i+1])`` corresponds
to the slice needed to find the i-th column of the matrix in the
``data`` and ``indices`` fields.
The following example builds a matrix and returns its columns. It
prints the i-th column, i.e. a list of indices in the column and their
corresponding value in the second list.
>>> data = np.asarray([7, 8, 9])
>>> indices = np.asarray([0, 1, 2])
>>> indptr = np.asarray([0, 2, 3, 3])
>>> m = sp.csc_matrix((data, indices, indptr), shape=(3, 3))
>>> print m.toarray()
[[7 0 0]
[8 0 0]
[0 9 0]]
>>> i = 0
>>> print m.indices[m.indptr[i]:m.indptr[i+1]], m.data[m.indptr[i]:m.indptr[i+1]]
[0 1] [7 8]
>>> i = 1
>>> print m.indices[m.indptr[i]:m.indptr[i+1]], m.data[m.indptr[i]:m.indptr[i+1]]
[2] [9]
>>> i = 2
>>> print m.indices[m.indptr[i]:m.indptr[i+1]], m.data[m.indptr[i]:m.indptr[i+1]]
[] []
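The slicing rule described above can be checked programmatically. The following sketch (an addition for illustration, reusing the matrix from the example) rebuilds the dense matrix column by column from the three attributes:

```python
import numpy as np
import scipy.sparse as sp

data = np.asarray([7, 8, 9])
indices = np.asarray([0, 1, 2])
indptr = np.asarray([0, 2, 3, 3])
m = sp.csc_matrix((data, indices, indptr), shape=(3, 3))

# Rebuild the dense matrix using only data, indices and indptr:
# slice(indptr[j], indptr[j+1]) selects column j in data and indices.
dense = np.zeros((3, 3), dtype=data.dtype)
for j in range(3):
    col = slice(m.indptr[j], m.indptr[j + 1])
    dense[m.indices[col], j] = m.data[col]

assert (dense == m.toarray()).all()
```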
CSR Matrix
----------
In the *Compressed Sparse Row* format, ``indices`` stands for indexes
inside the row vectors of the matrix and ``indptr`` tells where the
row starts in the ``data`` and in the ``indices``
attributes. ``indptr`` can be thought of as giving the slice which must be
applied to the other attributes in order to get each row of the
matrix. In other words, ``slice(indptr[i], indptr[i+1])`` corresponds
to the slice needed to find the i-th row of the matrix in the ``data``
and ``indices`` fields.
The following example builds a matrix and returns its rows. It prints
the i-th row, i.e. a list of indices in the row and their corresponding value
in the second list.
>>> data = np.asarray([7, 8, 9])
>>> indices = np.asarray([0, 1, 2])
>>> indptr = np.asarray([0, 2, 3, 3])
>>> m = sp.csr_matrix((data, indices, indptr), shape=(3, 3))
>>> print m.toarray()
[[7 8 0]
[0 0 9]
[0 0 0]]
>>> i = 0
>>> print m.indices[m.indptr[i]:m.indptr[i+1]], m.data[m.indptr[i]:m.indptr[i+1]]
[0 1] [7 8]
>>> i = 1
>>> print m.indices[m.indptr[i]:m.indptr[i+1]], m.data[m.indptr[i]:m.indptr[i+1]]
[2] [9]
>>> i = 2
>>> print m.indices[m.indptr[i]:m.indptr[i+1]], m.data[m.indptr[i]:m.indptr[i+1]]
[] []
List of Implemented Operations
==============================
- Moving from and to sparse
- :class:`DenseFromSparse <theano.sparse.basic.DenseFromSparse>` and ``dense_from_sparse``.
Both grads are implemented. Structured by default.
- :class:`SparseFromDense <theano.sparse.basic.SparseFromDense>` and ``csr_from_dense``, ``csc_from_dense``.
The grad implemented is structured.
- Construction of Sparses and their Properties
- :class:`CSM <theano.sparse.basic.CSM>` and ``CSC``, ``CSR`` to construct a matrix.
The grad implemented is regular.
- :class:`CSMProperties <theano.sparse.basic.CSMProperties>` to get the properties of a sparse matrix.
The grad implemented is regular.
- :func:`sp_ones_like <theano.sparse.basic.sp_ones_like>`.
The grad implemented is regular.
- :func:`sp_zeros_like <theano.sparse.basic.sp_zeros_like>`.
The grad implemented is regular.
- :class:`SquareDiagonal <theano.sparse.basic.SquareDiagonal>` and ``square_diagonal``.
The grad implemented is regular.
- Cast
- :class:`Cast <theano.sparse.basic.Cast>` with ``bcast``, ``wcast``, ``icast``, ``lcast``,
``fcast``, ``dcast``, ``ccast``, and ``zcast``.
The grad implemented is regular.
- Transpose
- :class:`Transpose <theano.sparse.basic.Transpose>` and ``transpose``.
The grad implemented is regular.
- Basic Arithmetic
- :class:`Neg <theano.sparse.basic.Neg>`.
The grad implemented is regular.
- :func:`add <theano.sparse.basic.add>`.
The grad implemented is regular.
- :func:`sub <theano.sparse.basic.sub>`.
The grad implemented is regular.
- :func:`mul <theano.sparse.basic.mul>`.
The grad implemented is regular.
- :func:`col_scale <theano.sparse.basic.col_scale>` to multiply by a vector along the columns.
The grad implemented is structured.
- :func:`row_scale <theano.sparse.basic.row_scale>` to multiply by a vector along the rows.
The grad implemented is structured.
- Monoid (Element-wise operation with only one sparse input).
`They all have a structured grad.`
- ``structured_sigmoid``
- ``structured_exp``
- ``structured_log``
- ``structured_pow``
- ``structured_minimum``
- ``structured_maximum``
- ``structured_add``
- ``sin``
- ``arcsin``
- ``tan``
- ``arctan``
- ``sinh``
- ``arcsinh``
- ``tanh``
- ``arctanh``
- ``rint``
- ``ceil``
- ``floor``
- ``sgn``
- ``log1p``
- ``sqr``
- ``sqrt``
- Dot Product
- :class:`Dot <theano.sparse.basic.Dot>` and ``dot``.
The grad implemented is regular.
- :class:`StructuredDot <theano.sparse.basic.StructuredDot>`
and :func:`structured_dot <theano.sparse.basic.structured_dot>`.
The grad implemented is structured.
- :class:`SamplingDot <theano.sparse.basic.SamplingDot>` and ``sampling_dot``.
The grad implemented is structured for `p`.
- :class:`Usmm <theano.sparse.basic.Usmm>` and ``usmm``.
There is no grad implemented for this op.
- Slice Operations
- sparse_variable[N, N], return a tensor scalar.
There is no grad implemented for this operation.
- sparse_variable[M:N, O:P], return a sparse matrix
There is no grad implemented for this operation.
- Sparse variables don't support [M, N:O] and [M:N, O] as we don't support sparse vectors
and returning a sparse matrix would break the numpy interface.
Use [M:M+1, N:O] and [M:N, O:O+1] instead.
- :class:`Diag <theano.sparse.basic.Diag>` and ``diag``.
The grad implemented is regular.
- Concatenation
- :class:`HStack <theano.sparse.basic.HStack>` and ``hstack``.
The grad implemented is regular.
- :class:`VStack <theano.sparse.basic.VStack>` and ``vstack``.
The grad implemented is regular.
- Probability
`There is no grad implemented for these operations.`
- :class:`Poisson <theano.sparse.basic.Poisson>` and ``poisson``
- :class:`Binomial <theano.sparse.basic.Binomial>` and ``csc_fbinomial``, ``csc_dbinomial``
``csr_fbinomial``, ``csr_dbinomial``
- :class:`Multinomial <theano.sparse.basic.Multinomial>` and ``multinomial``
- Internal Representation
`They all have a regular grad implemented.`
- :class:`EnsureSortedIndices <theano.sparse.basic.EnsureSortedIndices>` and ``ensure_sorted_indices``
- :class:`Remove0 <theano.sparse.basic.Remove0>` and ``remove0``
- :func:`clean <theano.sparse.basic.clean>` to resort indices and remove zeros
===================================================================
:mod:`sparse` -- Sparse Op
......
@@ -4,35 +4,186 @@
Sparse
======

Sparse Matrices
===============

In general, *sparse* matrices provide the same functionality as regular
matrices. The difference lies in the way the elements of *sparse* matrices are
represented and stored in memory. Only the non-zero elements of the latter are stored.
This has some potential advantages: first, this
may obviously lead to reduced memory usage and, second, clever
storage methods may lead to reduced computation time through the use of
sparse-specific algorithms. We usually refer to the generically stored matrices
as *dense* matrices.

Theano's sparse package provides efficient algorithms, but its use is not recommended
in all cases or for all matrices. As an obvious example, consider the case where
the *sparsity proportion* is very low. The *sparsity proportion* refers to the
ratio of the number of zero elements to the number of all elements in a matrix.
A low sparsity proportion may result in the use of more space in memory
since not only the actual data is stored, but also the position of nearly every
element of the matrix. This would also require more computation
time whereas a dense matrix representation along with regular optimized algorithms might do a
better job. Other examples may be found at the nexus of the specific purpose and structure
of the matrices. More documentation may be found in the
`SciPy Sparse Reference <http://docs.scipy.org/doc/scipy/reference/sparse.html>`_.

Since sparse matrices are not stored in contiguous arrays, there are several
ways to represent them in memory. This is usually designated by the so-called ``format``
of the matrix. Since Theano's sparse matrix package is based on the SciPy
sparse package, complete information about sparse matrices can be found
in the SciPy documentation. Like SciPy, Theano does not implement sparse formats for
arrays with a number of dimensions different from two.

So far, Theano implements two ``formats`` of sparse matrix: ``csc`` and ``csr``.
Those are almost identical except that ``csc`` is based on the *columns* of the
matrix and ``csr`` is based on its *rows*. They both have the same purpose:
to provide for the use of efficient algorithms performing linear algebra operations.
A disadvantage is that they fail to give an efficient way to modify the sparsity structure
of the underlying matrix, i.e. adding new elements. This means that if you are
planning to add new elements in a sparse matrix very often in your computational graph,
perhaps a tensor variable could be a better choice.

More documentation may be found in the :ref:`Sparse Library Reference <libdoc_sparse>`.

Before going further, here are the ``import`` statements that are assumed for the rest of the
tutorial:

>>> import theano
>>> import numpy as np
>>> import scipy.sparse as sp
>>> from theano import sparse

Compressed Sparse Format
========================
.. Changes to this section should also result in changes to library/sparse/index.txt.
Theano supports two *compressed sparse formats*, ``csc`` and ``csr``, respectively based on columns
and rows. They both have the same attributes: ``data``, ``indices``, ``indptr`` and ``shape``.

* The ``data`` attribute is a one-dimensional ``ndarray`` which contains all the non-zero
  elements of the sparse matrix.

* The ``indices`` and ``indptr`` attributes are used to store the position of the data in the
  sparse matrix.

* The ``shape`` attribute is exactly the same as the ``shape`` attribute of a dense (i.e. generic)
  matrix. It can be explicitly specified at the creation of a sparse matrix if it cannot be inferred
  from the first three attributes.
Which format should I use?
--------------------------
In the end, the format does not affect the length of the ``data`` and ``indices`` attributes. They are both
completely fixed by the number of elements you want to store. The only thing that changes with the format
is ``indptr``. In ``csc`` format, the matrix is compressed along columns, so a lower number of columns will
result in less memory use. On the other hand, with the ``csr`` format, the matrix is compressed along
the rows, so with a matrix that has a lower number of rows, the ``csr`` format is a better choice. So here is the rule:

.. note::
    If shape[0] > shape[1], use ``csc`` format. Otherwise, use ``csr``.

Sometimes, since the sparse module is young, ops do not exist for both formats. So here is
what may be the most relevant rule:

.. note::
    Use the format compatible with the ops in your computation graph.
The documentation about the ops and their supported format may be found in
the :ref:`Sparse Library Reference <libdoc_sparse>`.
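The memory rule above can be checked with SciPy alone: only ``indptr`` changes size with the format. The following sketch (an illustration added here, not part of the original text) builds a tall, skinny matrix where ``shape[0] > shape[1]``:

```python
import numpy as np
import scipy.sparse as sp

# A tall, skinny matrix: many rows, few columns (shape[0] > shape[1]).
dense = np.zeros((1000, 10))
dense[::7, 3] = 1.0

csc = sp.csc_matrix(dense)
csr = sp.csr_matrix(dense)

# data and indices have the same length in both formats...
assert csc.data.size == csr.data.size
assert csc.indices.size == csr.indices.size

# ...but indptr has one entry per column (csc) vs one per row (csr),
# so csc is the smaller representation here.
assert csc.indptr.size == 10 + 1
assert csr.indptr.size == 1000 + 1
```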
Handling Sparse in Theano
=========================
Most of the ops in Theano depend on the ``format`` of the sparse matrix.
That is why there are two kinds of constructors of sparse variables:
``csc_matrix`` and ``csr_matrix``. These can be called with the usual
``name`` and ``dtype`` parameters, but no ``broadcastable`` flags are
allowed. This is forbidden since the sparse package, like the SciPy sparse module,
does not provide any way to handle a number of dimensions different from two.
The set of all accepted ``dtype`` for the sparse matrices can be found in
``sparse.all_dtypes``.
>>> sparse.all_dtypes
set(['int8', 'int16', 'int32', 'int64', 'float32', 'float64', 'complex64', 'complex128'])
To and Fro
----------
To move back and forth from a dense matrix to a sparse matrix representation, Theano
provides the ``dense_from_sparse``, ``csr_from_dense`` and
``csc_from_dense`` functions. No additional detail needs to be provided. Here is
an example that performs a full cycle from sparse to sparse:
>>> x = sparse.csc_matrix(name='x', dtype='float32')
>>> y = sparse.dense_from_sparse(x)
>>> z = sparse.csc_from_dense(y)
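The same round trip can be sketched with SciPy alone; this is an illustration of the conversion semantics (an addition, not the Theano API itself):

```python
import numpy as np
import scipy.sparse as sp

dense = np.asarray([[0, 1], [2, 0]], dtype='float32')

# Roughly analogous to csc_from_dense followed by dense_from_sparse.
s = sp.csc_matrix(dense)
back = s.toarray()

assert (back == dense).all()
```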
Properties and Construction
---------------------------
Although sparse variables do not allow direct access to their properties,
this can be accomplished using the ``csm_properties`` function. This will return
a tuple of one-dimensional ``tensor`` variables that represent the internal characteristics
of the sparse matrix.
In order to reconstruct a sparse matrix from some properties, the functions ``CSC``
and ``CSR`` can be used. This will create the sparse matrix in the desired
format. As an example, the following code reconstructs a ``csc`` matrix into
a ``csr`` one.
>>> x = sparse.csc_matrix(name='x', dtype='int64')
>>> data, indices, indptr, shape = sparse.csm_properties(x)
>>> y = sparse.CSR(data, indices, indptr, shape)
>>> f = theano.function([x], y)
>>> a = sp.csc_matrix(np.asarray([[0, 1, 1], [0, 0, 0], [1, 0, 0]]))
>>> print a.toarray()
[[0 1 1]
[0 0 0]
[1 0 0]]
>>> print f(a).toarray()
[[0 0 1]
[1 0 0]
[1 0 0]]
The last example shows that one format can be obtained from the transpose of
the other. Indeed, when calling the ``transpose`` function,
the sparse format of the resulting matrix cannot be the same as the one
provided as input.
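This observation can be verified directly with SciPy (a sketch added for illustration): feeding the raw ``csc`` attributes to a ``csr`` constructor swaps the roles of rows and columns, which yields the transpose.

```python
import numpy as np
import scipy.sparse as sp

a = sp.csc_matrix(np.asarray([[0, 1, 1], [0, 0, 0], [1, 0, 0]]))

# Reinterpret the csc attributes (data, indices, indptr) as csr:
# column i of `a` becomes row i of `b`, i.e. b == a.T.
b = sp.csr_matrix((a.data, a.indices, a.indptr), shape=(3, 3))

assert (b.toarray() == a.toarray().T).all()
```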
Structured Operation
--------------------
Several ops are set to make use of the very peculiar structure of the sparse
matrices. These ops are said to be *structured* and simply do not perform any
computations on the zero elements of the sparse matrix. They can be thought of as being
applied only to the data attribute of the latter. Note that these structured ops
provide a structured gradient. More explanation below.
>>> x = sparse.csc_matrix(name='x', dtype='float32')
>>> y = sparse.structured_add(x, 2)
>>> f = theano.function([x], y)
>>> a = sp.csc_matrix(np.asarray([[0, 0, -1], [0, -2, 1], [3, 0, 0]], dtype='float32'))
>>> print a.toarray()
[[ 0. 0. -1.]
[ 0. -2. 1.]
[ 3. 0. 0.]]
>>> print f(a).toarray()
[[ 0. 0. 1.]
[ 0. 0. 3.]
[ 5. 0. 0.]]
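What a structured op does can be mimicked with SciPy by operating on the ``data`` attribute only. This sketch (an illustration of the semantics, not Theano's implementation) reproduces the ``structured_add`` example above:

```python
import numpy as np
import scipy.sparse as sp

a = sp.csc_matrix(np.asarray([[0, 0, -1], [0, -2, 1], [3, 0, 0]],
                             dtype='float32'))

# A "structured add": touch only the stored (non-zero) elements,
# leaving the zero entries of the matrix untouched.
b = a.copy()
b.data = b.data + 2

expected = np.asarray([[0, 0, 1], [0, 0, 3], [5, 0, 0]], dtype='float32')
assert (b.toarray() == expected).all()
```

Note that the entry `-2` becomes an explicitly stored zero; `clean` (see the library reference) can remove such entries.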
Gradient
--------
The gradients of the ops in the sparse module can also be structured. Some ops provide
a *flag* to indicate if the gradient is to be structured or not. The documentation can
be used to determine if the gradient of an op is regular or structured or if its
implementation can be modified. Similarly to structured ops, when a structured gradient is calculated, the
computation is done only for the non-zero elements of the sparse matrix.
More documentation regarding the gradients of specific ops can be found in the
:ref:`Sparse Library Reference <libdoc_sparse>`.
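The difference between the two kinds of gradient can be sketched in NumPy: a structured grad is the regular (full) grad masked by the sparsity pattern of the input. This is an illustration of the semantics only, added here as an assumption about how the two notions relate:

```python
import numpy as np

x = np.asarray([[0.0, 2.0], [3.0, 0.0]])   # dense view of a sparse matrix

# Regular grad of sum(x) w.r.t. x is 1 everywhere...
full_grad = np.ones_like(x)

# ...while a structured grad keeps only the positions that store data.
structured_grad = full_grad * (x != 0)

assert (structured_grad == np.asarray([[0.0, 1.0], [1.0, 0.0]])).all()
```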
@@ -159,8 +159,8 @@ def verify_grad_sparse(op, pt, structured=False, *args, **kwargs):
:param pt: List of inputs to realize the tests.
:param structured: True to test with a structured grad,
False otherwise.
:param args: Other `verify_grad` parameters if any.
:param kwargs: Other `verify_grad` keywords if any.
:return: None
"""
@@ -233,12 +233,29 @@ def constant(x, name=None):
def sp_ones_like(x):
"""Construct a sparse matrix of ones
with the same sparsity pattern.
:param x: Sparse matrix to take
the sparsity pattern.
:return: The same as `x` with data
changed to ones.
"""
# TODO: don't restrict to CSM formats
data, indices, indptr, shape = csm_properties(x)
return CSM(format=x.format)(tensor.ones_like(data), indices, indptr, shape)
def sp_zeros_like(x):
"""Construct a sparse matrix of zeros.
:param x: Sparse matrix to take
the shape.
:return: The same as `x` with zero entries
for all elements.
"""
#TODO: don't restrict to CSM formats
_, _, indptr, shape = csm_properties(x)
return CSM(format=x.format)(numpy.array([], dtype=x.type.dtype),
@@ -545,9 +562,8 @@ class CSMProperties(gof.Op):
:return: (data, indices, indptr, shape), the properties
of `csm`.
:note: The grad implemented is regular, i.e. not structured.
`infer_shape` method is not available for this op.
"""
# NOTE
@@ -655,8 +671,7 @@ class CSM(gof.Op):
:return: A sparse matrix having the properties
specified by the inputs.
:note: The grad method returns a dense vector, so it provides
a regular grad.
"""
@@ -970,8 +985,8 @@ class Cast(gof.op.Op):
:return: Same as `x` but having `out_type` as dtype.
:note: The grad implemented is regular, i.e. not
structured.
"""
def __init__(self, out_type):
@@ -1028,8 +1043,7 @@ class DenseFromSparse(gof.op.Op):
:return: A dense matrix, the same as `x`.
:note: The grad implementation can be controlled
through the constructor via the `structured`
parameter. `True` will provide a structured
grad while `False` will provide a regular
@@ -1092,10 +1106,9 @@ class SparseFromDense(gof.op.Op):
:return: The same as `x` in a sparse matrix
format.
:note: The grad implementation is regular, i.e.
not structured.
:note: The output sparse format can also be controlled
via the `format` parameter in the constructor.
"""
@@ -1173,8 +1186,7 @@ class GetItem2d(gof.op.Op):
:return: The corresponding slice in `x`.
:note: The grad is not implemented for this op.
"""
def __eq__(self, other):
@@ -1271,8 +1283,7 @@ class GetItemScalar(gof.op.Op):
:return: The corresponding item in `x`.
:note: The grad is not implemented for this op.
"""
def __eq__(self, other):
@@ -1326,11 +1337,11 @@ class Transpose(gof.op.Op):
:return: `x` transposed.
:note: The returned matrix will not be in the
same format. A `csc` matrix will be changed
into a `csr` matrix and a `csr` matrix into a
`csc` matrix.
:note: The grad is regular, i.e. not structured.
"""
format_map = {'csr': 'csc',
@@ -1373,8 +1384,7 @@ class Neg(gof.op.Op):
:return: -`x`.
:note: The grad is regular, i.e. not structured.
"""
def __eq__(self, other):
@@ -1415,8 +1425,7 @@ class ColScaleCSC(gof.op.Op):
# each column has been multiplied by the corresponding
# element of `s`.
# :note: The grad implemented is structured.
def __eq__(self, other):
return type(self) == type(other)
@@ -1463,8 +1472,7 @@ class RowScaleCSC(gof.op.Op):
# each row has been multiplied by the corresponding
# element of `s`.
# :note: The grad implemented is structured.
def __eq__(self, other):
return type(self) == type(other)
@@ -1513,8 +1521,7 @@ def col_scale(x, s):
each column has been multiplied by the corresponding
element of `s`.
:note: The grad implemented is structured.
"""
if x.format == 'csc':
@@ -1537,8 +1544,7 @@ def row_scale(x, s):
each row has been multiplied by the corresponding
element of `s`.
:note: The grad implemented is structured.
"""
return col_scale(x.T, s).T
@@ -1556,12 +1562,11 @@ class SpSum(gof.op.Op):
:return: The sum of `x` in a dense format.
:note: The grad implementation is controlled with the `sparse_grad`
parameter. `True` will provide a structured grad and `False`
will provide a regular grad. For both choices, the grad
returns a sparse matrix having the same format as `x`.
:note: This op does not return a sparse matrix, but a dense tensor
matrix.
"""
@@ -1660,8 +1665,7 @@ class Diag(gof.op.Op):
:return: A dense vector representing the diagonal elements.
:note: The grad implemented is regular, i.e. not structured, since
the output is a dense vector.
"""
@@ -1700,8 +1704,7 @@ class SquareDiagonal(gof.op.Op):
:return: A sparse matrix having `x` as diagonal.
:note: The grad implemented is regular, i.e. not structured.
"""
def __eq__(self, other):
@@ -1752,8 +1755,7 @@ class EnsureSortedIndices(gof.op.Op):
:return: The same as `x` with indices sorted.
:note: The grad implemented is regular, i.e. not structured.
"""
def __init__(self, inplace):
@@ -1804,8 +1806,7 @@ def clean(x):
:return: The same as `x` with indices sorted and zeros
removed.
:note: The grad implemented is regular, i.e. not structured.
"""
return ensure_sorted_indices(remove0(x))
@@ -1818,8 +1819,7 @@ class AddSS(gof.op.Op):
:return: `x` + `y`
:note: The grad implemented is regular, i.e. not structured.
"""
def __eq__(self, other):
@@ -1868,9 +1868,9 @@ class AddSSData(gof.op.Op):
:return: The sum of the two sparse matrices, element-wise.
:note: `x` and `y` are assumed to have the same
sparsity pattern.
:note: The grad implemented is structured.
"""
def __eq__(self, other):
@@ -1919,8 +1919,7 @@ class AddSD(gof.op.Op):
:return: `x` + `y`
:note: The grad implemented is structured on `x`.
"""
def __eq__(self, other):
...@@ -2028,10 +2027,9 @@ class StructuredAddSVCSR(gof.Op): ...@@ -2028,10 +2027,9 @@ class StructuredAddSVCSR(gof.Op):
# :return: A sparse matrix containing the addition of the vector to # :return: A sparse matrix containing the addition of the vector to
# the data of the sparse matrix. # the data of the sparse matrix.
# :note: # :note: The a_* are the properties of a sparse matrix in csr
# - The a_* are the properties of a sparse matrix in csr
# format. # format.
# - This op is used as an optimization for StructuredAddSV. # :note: This op is used as an optimization for StructuredAddSV.
def __eq__(self, other): def __eq__(self, other):
return (type(self) == type(other)) return (type(self) == type(other))
...@@ -2147,10 +2145,9 @@ def add(x, y): ...@@ -2147,10 +2145,9 @@ def add(x, y):
:return: `x` + `y` :return: `x` + `y`
:note: :note: At least one of `x` and `y` must be a sparse matrix.
- At least one of `x` and `y` must be a sparse matrix. :note: The grad will be structured only when one of the
- The grad will be structured only when one of the variable variable will be a dense matrix.
will be a dense matrix.
""" """
if hasattr(x, 'getnnz'): if hasattr(x, 'getnnz'):
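The AddSS/AddSD distinction (sparse + sparse yields sparse, sparse + dense yields dense) mirrors what scipy.sparse does at the value level. A quick sketch, scipy only, not the Theano graph:

```python
import numpy as np
import scipy.sparse as sp

x = sp.csr_matrix(np.array([[1.0, 0.0], [0.0, 2.0]]))
y = sp.csr_matrix(np.array([[0.0, 3.0], [0.0, 4.0]]))
d = np.array([[1.0, 1.0], [1.0, 1.0]])

s = x + y                      # sparse + sparse -> sparse (AddSS)
assert sp.issparse(s)
assert np.allclose(s.toarray(), [[1.0, 3.0], [0.0, 6.0]])

out = x + d                    # sparse + dense -> dense (AddSD)
assert not sp.issparse(out)
assert np.allclose(np.asarray(out), [[2.0, 1.0], [1.0, 3.0]])
```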
@@ -2183,9 +2180,8 @@ def sub(x, y):
     :return: `x` - `y`

-    :note:
-    - At least one of `x` and `y` must be a sparse matrix.
-    - The grad will be structured only when one of the variables
-      will be a dense matrix.
+    :note: At least one of `x` and `y` must be a sparse matrix.
+    :note: The grad will be structured only when one of the
+           variables is a dense matrix.
     """

@@ -2200,8 +2196,8 @@ class MulSS(gof.op.Op):
     :return: `x` * `y`

-    :note:
-    - At least one of `x` and `y` must be a sparse matrix.
+    :note: At least one of `x` and `y` must be a sparse matrix.
+    :note: The grad implemented is regular, i.e. not structured.
     """
     def __eq__(self, other):

@@ -2244,8 +2240,7 @@ class MulSD(gof.op.Op):
     :return: `x` * `y`

-    :note:
-    - The grad is regular, i.e. not structured..
+    :note: The grad is regular, i.e. not structured.
     """
     def __eq__(self, other):

@@ -2338,12 +2333,11 @@ class MulSDCSC(gof.Op):
    # :return: The element-wise multiplication of the two matrices.

-   # :note:
-   # - `a_data`, `a_indices` and `a_indptr` must be the properties
-   #   of a sparse matrix in csc format.
-   # - The dtype of `a_data`, i.e. the dtype of the sparse matrix,
-   #   cannot be a complex type.
-   # - This op is used as an optimization of mul_s_d.
+   # :note: `a_data`, `a_indices` and `a_indptr` must be the properties
+   #        of a sparse matrix in csc format.
+   # :note: The dtype of `a_data`, i.e. the dtype of the sparse matrix,
+   #        cannot be a complex type.
+   # :note: This op is used as an optimization of mul_s_d.
    def __eq__(self, other):
        return (type(self) == type(other))

@@ -2452,12 +2446,11 @@ class MulSDCSR(gof.Op):
    # :return: The element-wise multiplication of the two matrices.

-   # :note:
-   # - `a_data`, `a_indices` and `a_indptr` must be the properties
-   #   of a sparse matrix in csr format.
-   # - The dtype of `a_data`, i.e. the dtype of the sparse matrix,
-   #   cannot be a complex type.
-   # - This op is used as an optimization of mul_s_d.
+   # :note: `a_data`, `a_indices` and `a_indptr` must be the properties
+   #        of a sparse matrix in csr format.
+   # :note: The dtype of `a_data`, i.e. the dtype of the sparse matrix,
+   #        cannot be a complex type.
+   # :note: This op is used as an optimization of mul_s_d.
    def __eq__(self, other):
        return (type(self) == type(other))

@@ -2564,8 +2557,7 @@ class MulSV(gof.op.Op):
     :return: The product `x` * `y`, element-wise.

-    :note:
-    - The grad implemented is regular, i.e. not structured.
+    :note: The grad implemented is regular, i.e. not structured.
     """
     def __eq__(self, other):

@@ -2616,12 +2608,11 @@ class MulSVCSR(gof.Op):
    # :return: The element-wise multiplication of the two matrices.

-   # :note:
-   # - `a_data`, `a_indices` and `a_indptr` must be the properties
-   #   of a sparse matrix in csr format.
-   # - The dtype of `a_data`, i.e. the dtype of the sparse matrix,
-   #   cannot be a complex type.
-   # - This op is used as an optimization of MulSV.
+   # :note: `a_data`, `a_indices` and `a_indptr` must be the properties
+   #        of a sparse matrix in csr format.
+   # :note: The dtype of `a_data`, i.e. the dtype of the sparse matrix,
+   #        cannot be a complex type.
+   # :note: This op is used as an optimization of MulSV.
    def __eq__(self, other):
        return (type(self) == type(other))

@@ -2726,9 +2717,8 @@ def mul(x, y):
     :return: `x` * `y`

-    :note:
-    - At least one of `x` and `y` must be a sparse matrix.
-    - The grad is regular, i.e. not structured.
+    :note: At least one of `x` and `y` must be a sparse matrix.
+    :note: The grad is regular, i.e. not structured.
     """
     x = as_sparse_or_tensor_variable(x)
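`mul` is element-wise, not matrix multiplication. At the value level the scipy.sparse equivalent is `.multiply` (note that `*` on scipy sparse matrices is matrix multiplication); again a sketch, not the symbolic op:

```python
import numpy as np
import scipy.sparse as sp

x = sp.csr_matrix(np.array([[1.0, 0.0], [0.0, 2.0]]))
y = np.array([[3.0, 3.0], [3.0, 3.0]])

# Element-wise product of a sparse matrix with a dense one.
p = x.multiply(y)
# The result type varies across scipy versions; normalize to ndarray.
p = p.toarray() if sp.issparse(p) else np.asarray(p)
assert np.allclose(p, [[3.0, 0.0], [0.0, 6.0]])
```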
@@ -2758,9 +2748,8 @@ class HStack(gof.op.Op):
     :return: The concatenation of the sparse arrays, column-wise.

-    :note:
-    - The number of line of the sparse matrix must agree.
-    - The grad implemented is regular, i.e. not structured.
+    :note: The number of rows of the sparse matrices must agree.
+    :note: The grad implemented is regular, i.e. not structured.
     """
     def __init__(self, format=None, dtype=None):

@@ -2840,9 +2829,8 @@ def hstack(blocks, format=None, dtype=None):
     :return: The concatenation of the sparse arrays, column-wise.

-    :note:
-    - The number of line of the sparse matrix must agree.
-    - The grad implemented is regular, i.e. not structured.
+    :note: The number of rows of the sparse matrices must agree.
+    :note: The grad implemented is regular, i.e. not structured.
     """
     blocks = [as_sparse_variable(i) for i in blocks]

@@ -2861,9 +2849,8 @@ class VStack(HStack):
     :return: The concatenation of the sparse arrays, row-wise.

-    :note:
-    - The number of column of the sparse matrix must agree.
-    - The grad implemented is regular, i.e. not structured.
+    :note: The number of columns of the sparse matrices must agree.
+    :note: The grad implemented is regular, i.e. not structured.
     """
     def perform(self, node, block, (out, )):

@@ -2914,9 +2901,8 @@ def vstack(blocks, format=None, dtype=None):
     :return: The concatenation of the sparse arrays, row-wise.

-    :note:
-    - The number of column of the sparse matrix must agree.
-    - The grad implemented is regular, i.e. not structured.
+    :note: The number of columns of the sparse matrices must agree.
+    :note: The grad implemented is regular, i.e. not structured.
     """
     blocks = [as_sparse_variable(i) for i in blocks]
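The row/column agreement constraints in the `hstack`/`vstack` docstrings are the same as scipy's; a value-level sketch:

```python
import numpy as np
import scipy.sparse as sp

a = sp.csr_matrix(np.array([[1.0, 0.0], [0.0, 2.0]]))
b = sp.csr_matrix(np.array([[0.0, 3.0], [4.0, 0.0]]))

h = sp.hstack([a, b])   # row counts must agree -> shape (2, 4)
v = sp.vstack([a, b])   # column counts must agree -> shape (4, 2)
assert h.shape == (2, 4)
assert v.shape == (4, 2)
```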
@@ -2926,8 +2912,14 @@ class Remove0(gof.Op):
 class Remove0(gof.Op):
-    """
-    Remove explicit zeros from a sparse matrix, and resort indices
+    """Remove explicit zeros from a sparse matrix, and
+    resort indices.
+
+    :param x: Sparse matrix.
+
+    :return: Exactly `x`, but with a data attribute
+             free of explicit zeros.
+
+    :note: The grad implemented is regular, i.e. not structured.
     """
     def __init__(self, inplace=False, *args, **kwargs):
@@ -3316,8 +3308,7 @@ class StructuredDot(gof.Op):
     :return: The dot product of `a` and `b`.

-    :note:
-    - The grad implemented is structured.
+    :note: The grad implemented is structured.
     """
     def __eq__(self, other):

@@ -3412,8 +3403,7 @@ def structured_dot(x, y):
     :return: The dot product of `a` and `b`.

-    :note:
-    - The grad implemented is structured.
+    :note: The grad implemented is structured.
     """
     # @todo: Maybe the triple-transposition formulation (when x is dense)

@@ -3451,9 +3441,8 @@ class StructuredDotCSC(gof.Op):
    # :return: The dot product of `a` and `b`.

-   # :note:
-   # - The grad implemented is structured.
-   # - This op is used as an optimization for StructuredDot.
+   # :note: The grad implemented is structured.
+   # :note: This op is used as an optimization for StructuredDot.
    def __eq__(self, other):
        return (type(self) == type(other))

@@ -3641,9 +3630,8 @@ class StructuredDotCSR(gof.Op):
    # :return: The dot product of `a` and `b`.

-   # :note:
-   # - The grad implemented is structured.
-   # - This op is used as an optimization for StructuredDot.
+   # :note: The grad implemented is structured.
+   # :note: This op is used as an optimization for StructuredDot.
    def __eq__(self, other):
        return (type(self) == type(other))
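A "structured" grad means the gradient with respect to the sparse operand is only computed on its nonzero pattern. The forward value of `structured_dot` is an ordinary sparse-dense product, which can be sketched with scipy:

```python
import numpy as np
import scipy.sparse as sp

a = sp.csr_matrix(np.array([[1.0, 0.0, 2.0],
                            [0.0, 3.0, 0.0]]))
b = np.array([[1.0], [1.0], [1.0]])

# Sparse x dense dot product; the result is dense.
out = a.dot(b)
assert np.allclose(np.asarray(out), [[3.0], [3.0]])
```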
@@ -3813,8 +3801,7 @@ class SamplingDot(gof.op.Op):
     :return: A dense matrix containing the dot product of `x` by `y`.T only
              where `p` is 1.

-    :note:
-    - The grad implemented is regular, i.e. not structured.
+    :note: The grad implemented is regular, i.e. not structured.
     """
     def __eq__(self, other):

@@ -3893,11 +3880,10 @@ class SamplingDotCSR(gof.Op):
    # :return: A dense matrix containing the dot product of `x` by `y`.T only
    #          where `p` is 1.

-   # :note:
-   # - If we have the input of mixed dtype, we insert cast elemwise
-   #   in the graph to be able to call blas function as they don't
-   #   allow mixed dtype.
-   # - This op is used as an optimization for SamplingDot.
+   # :note: If the inputs have mixed dtypes, we insert elemwise casts
+   #        in the graph so that the BLAS functions, which do not
+   #        allow mixed dtypes, can be called.
+   # :note: This op is used as an optimization for SamplingDot.
    def __eq__(self, other):
        return type(self) == type(other)
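The semantics of `SamplingDot` (keep `dot(x, y.T)` only where the pattern `p` is 1) can be emulated at the value level by masking the full product with `p`; a rough sketch, not the optimized op, and scipy keeps the masked result sparse rather than dense:

```python
import numpy as np
import scipy.sparse as sp

x = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([[1.0, 0.0], [0.0, 1.0]])
p = sp.csr_matrix(np.array([[1.0, 0.0], [0.0, 1.0]]))  # sampling pattern

full = x.dot(y.T)
sampled = p.multiply(full)     # keep the dot product only where p is 1
sampled = sampled.toarray() if sp.issparse(sampled) else np.asarray(sampled)
assert np.allclose(sampled, [[1.0, 0.0], [0.0, 4.0]])
```

The point of the real op is to avoid computing `full` at all when `p` is very sparse; the sketch above computes it densely for clarity.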
@@ -4137,9 +4123,8 @@ class StructuredDotGradCSC(gof.Op):
    # :return: The grad of `a`.`b` for `a` accumulated
    #          with g_ab.

-   # :note:
-   # - The grad implemented is structured.
-   # - a_* are the corresponding properties of a sparse
-   #   matrix in csc format.
+   # :note: The grad implemented is structured.
+   # :note: a_* are the corresponding properties of a sparse
+   #        matrix in csc format.
    def __eq__(self, other):

@@ -4273,9 +4258,8 @@ class StructuredDotGradCSR(gof.Op):
    # :return: The grad of `a`.`b` for `a` accumulated
    #          with g_ab.

-   # :note:
-   # - The grad implemented is structured.
-   # - a_* are the corresponding properties of a sparse
-   #   matrix in csr format.
+   # :note: The grad implemented is structured.
+   # :note: a_* are the corresponding properties of a sparse
+   #        matrix in csr format.
    def __eq__(self, other):
@@ -4411,9 +4395,11 @@ class Dot(gof.op.Op):
     :return: The dot product `x`.`y` in a dense format.

-    :note:
-    - The grad implemented is regular, i.e. not structured.
-    - At least one of `x` or `y` must be a sparse matrix.
+    :note: The grad implemented is regular, i.e. not structured.
+    :note: At least one of `x` or `y` must be a sparse matrix.
+    :note: When the operation has the form dot(csr_matrix, dense)
+           the gradient of this operation can be performed inplace
+           by UsmmCscDense. This leads to significant speed-ups.
     """
     def __eq__(self, other):

@@ -4490,9 +4476,8 @@ def dot(x, y):
     :return: The dot product `x`.`y` in a dense format.

-    :note:
-    - The grad implemented is regular, i.e. not structured.
-    - At least one of `x` or `y` must be a sparse matrix.
+    :note: The grad implemented is regular, i.e. not structured.
+    :note: At least one of `x` or `y` must be a sparse matrix.
     """
     if hasattr(x, 'getnnz'):

@@ -4519,9 +4504,8 @@ class Usmm(gof.op.Op):
     :return: The dense matrix resulting from `alpha` * `x` `y` + `z`.

-    :note:
-    - The grad is not implemented for this op.
-    - At least one of `x` or `y` must be a sparse matrix.
+    :note: The grad is not implemented for this op.
+    :note: At least one of `x` or `y` must be a sparse matrix.
     """
     # We don't implement the infer_shape as it is
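`Usmm` fuses `alpha * x.dot(y) + z` into one op. The value it computes can be reproduced step by step with scipy and numpy; a sketch of the semantics only, the Theano op exists to fuse and optimize this expression:

```python
import numpy as np
import scipy.sparse as sp

alpha = 2.0
x = sp.csr_matrix(np.array([[1.0, 0.0], [0.0, 1.0]]))  # sparse identity
y = np.array([[1.0, 2.0], [3.0, 4.0]])
z = np.ones((2, 2))

# usmm(alpha, x, y, z) computes alpha * (x . y) + z as a dense matrix.
out = alpha * x.dot(y) + z
assert np.allclose(np.asarray(out), [[3.0, 5.0], [7.0, 9.0]])
```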
@@ -4593,9 +4577,8 @@ class UsmmCscDense(gof.Op):
    # :return: The dense matrix resulting from `alpha` * `x` `y` + `z`.

-   # :note:
-   # - The grad is not implemented for this op.
-   # - Optimized version os Usmm when `x` is in csc format and
-   #   `y` is dense.
+   # :note: The grad is not implemented for this op.
+   # :note: Optimized version of Usmm when `x` is in csc format and
+   #        `y` is dense.
    def __init__(self, inplace):