testgroup / pytensor · Commits

Commit 4cc35522, authored Aug 10, 2012 by nouiz

Merge pull request #816 from bouchnic/docstring

Docstring NEWS: Make a good sparse tutorial (NB)

Parents: 3a2029b7, 2081af30
Showing 5 changed files with 179 additions and 28 deletions
doc/extending/pipeline.txt      +1 -1
doc/library/gof/fg.txt          +3 -3
doc/library/sparse/index.txt    +0 -0
doc/tutorial/sparse.txt         +175 -24
theano/sparse/basic.py          +0 -0
doc/extending/pipeline.txt
@@ -37,7 +37,7 @@ computation graph in the compilation phase:
Step 1 - Create a FunctionGraph
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The subgraph given by the end user is wrapped in a structure called
*FunctionGraph*. That structure defines several hooks on adding and
doc/library/gof/fg.txt
@@ -15,17 +15,17 @@ Guide
=====
FunctionGraph
-------------
.. _libdoc_gof_fgraphfeature:
FunctionGraph Features
----------------------
.. _libdoc_gof_fgraphfeaturelist:
FunctionGraph Feature List
^^^^^^^^^^^^^^^^^^^^^^^^^^
* ReplaceValidate
* DestroyHandler
doc/library/sparse/index.txt
Diff collapsed. Click to expand.
doc/tutorial/sparse.txt
@@ -4,35 +4,186 @@
Sparse
======
In general, *sparse* matrices provide the same functionality as regular
matrices. The difference lies in the way the elements of *sparse* matrices
are represented and stored in memory: only the non-zero elements are stored.
This has some potential advantages: first, it may obviously lead to reduced
memory usage and, second, clever storage methods may lead to reduced
computation time through the use of sparse-specific algorithms. We usually
refer to the generically stored matrices as *dense* matrices.

Theano's sparse package provides efficient algorithms, but its use is not
recommended in all cases or for all matrices. As an obvious example, consider
the case where the *sparsity proportion* is very low. The *sparsity
proportion* refers to the ratio of the number of zero elements to the total
number of elements in a matrix. A low sparsity proportion may result in the
use of more memory, since not only the actual data is stored, but also the
position of nearly every element of the matrix. It may also require more
computation time, whereas a dense matrix representation along with regular
optimized algorithms might do a better job. Other examples may be found at
the nexus of the specific purpose and structure of the matrices. More
documentation may be found in the
`SciPy Sparse Reference <http://docs.scipy.org/doc/scipy/reference/sparse.html>`_.

Since sparse matrices are not stored in contiguous arrays, there are several
ways to represent them in memory. This is usually designated by the
so-called ``format`` of the matrix. Since Theano's sparse matrix package is
based on the SciPy sparse package, complete information about sparse
matrices can be found in the SciPy documentation. Like SciPy, Theano does
not implement sparse formats for arrays with a number of dimensions
different from two.

So far, Theano implements two sparse matrix ``formats``: ``csc`` and
``csr``. These are almost identical, except that ``csc`` is based on the
*columns* of the matrix while ``csr`` is based on its *rows*. They both have
the same purpose: to provide for the use of efficient algorithms performing
linear algebra operations. A disadvantage is that they fail to give an
efficient way to modify the sparsity structure of the underlying matrix,
i.e. to add new elements. This means that if you are planning to add new
elements to a sparse matrix very often in your computational graph, a tensor
variable may be a better choice.

More documentation may be found in the :ref:`Sparse Library Reference <libdoc_sparse>`.

Before going further, here are the ``import`` statements that are assumed
for the rest of the tutorial:

>>> import theano
>>> import numpy as np
>>> import scipy.sparse as sp
>>> from theano import sparse
Compressed Sparse Format
========================
.. Changes to this section should also result in changes to library/sparse/index.txt.
Theano supports two *compressed sparse formats*, ``csc`` and ``csr``,
respectively based on columns and rows. They both have the same attributes:
``data``, ``indices``, ``indptr`` and ``shape``.

* The ``data`` attribute is a one-dimensional ``ndarray`` which contains all
  the non-zero elements of the sparse matrix.
* The ``indices`` and ``indptr`` attributes are used to store the position
  of the data in the sparse matrix.
* The ``shape`` attribute is exactly the same as the ``shape`` attribute of
  a dense (i.e. generic) matrix. It can be explicitly specified at the
  creation of a sparse matrix if it cannot be inferred from the first three
  attributes.
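Since Theano's sparse types follow the SciPy layout, these attributes can be
inspected directly on a SciPy matrix. A small sketch (using SciPy only, not
part of the original tutorial):

```python
import numpy as np
import scipy.sparse as sp

m = sp.csr_matrix(np.array([[0, 1, 0],
                            [2, 0, 3]]))
print(m.data)     # the non-zero values: [1 2 3]
print(m.indices)  # column index of each stored value: [1 0 2]
print(m.indptr)   # row i's values are data[indptr[i]:indptr[i+1]]: [0 1 3]
print(m.shape)    # (2, 3)
```

In ``csc`` format the roles are exchanged: ``indices`` holds row indices and
``indptr`` delimits columns.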
Which format should I use?
--------------------------
In the end, the format does not affect the length of the ``data`` and
``indices`` attributes. They are both completely fixed by the number of
elements you want to store. The only thing that changes with the format is
``indptr``. In ``csc`` format, the matrix is compressed along columns, so a
lower number of columns will result in less memory use. On the other hand,
with the ``csr`` format, the matrix is compressed along the rows, so for a
matrix with a lower number of rows, ``csr`` format is a better choice. So
here is the rule:

.. note::
    If shape[0] > shape[1], use ``csc`` format. Otherwise, use ``csr``.

Sometimes, since the sparse module is young, ops do not exist for both
formats. So here is what may be the most relevant rule:

.. note::
    Use the format compatible with the ops in your computation graph.

The documentation about the ops and their supported formats may be found in
the :ref:`Sparse Library Reference <libdoc_sparse>`.
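The memory argument above can be checked directly with SciPy: ``indptr`` has
one entry per compressed dimension, plus one. A sketch with a tall matrix
(many rows, few columns; the matrix content is an arbitrary illustration):

```python
import numpy as np
import scipy.sparse as sp

# A tall matrix: 1000 rows, 3 columns, one non-zero element.
dense = np.zeros((1000, 3))
dense[0, 0] = 1

csc = sp.csc_matrix(dense)
csr = sp.csr_matrix(dense)

print(len(csc.indptr))  # 4    (number of columns + 1)
print(len(csr.indptr))  # 1001 (number of rows + 1)
```

For this shape (shape[0] > shape[1]), ``csc`` stores the much shorter
``indptr``, matching the rule above.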
Handling Sparse in Theano
=========================
Most of the ops in Theano depend on the ``format`` of the sparse matrix.
That is why there are two kinds of constructors for sparse variables:
``csc_matrix`` and ``csr_matrix``. These can be called with the usual
``name`` and ``dtype`` parameters, but no ``broadcastable`` flags are
allowed. This is forbidden since the sparse package, like the SciPy sparse
module, does not provide any way to handle a number of dimensions different
from two. The set of all accepted ``dtype`` for the sparse matrices can be
found in ``sparse.all_dtypes``.
>>> sparse.all_dtypes
set(['int8', 'int16', 'int32', 'int64', 'float32', 'float64', 'complex64', 'complex128'])
To and Fro
----------
To move back and forth between dense and sparse matrix representations,
Theano provides the ``dense_from_sparse``, ``csr_from_dense`` and
``csc_from_dense`` functions. No additional detail needs to be provided.
Here is an example that performs a full cycle from sparse to sparse:
>>> x = sparse.csc_matrix(name='x', dtype='float32')
>>> y = sparse.dense_from_sparse(x)
>>> z = sparse.csc_from_dense(y)
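The same round trip can be sketched with SciPy alone, which is what these
ops compute element-wise (a sketch, not part of the original tutorial; the
matrix content is arbitrary):

```python
import numpy as np
import scipy.sparse as sp

x = sp.csc_matrix(np.array([[0, 1], [2, 0]], dtype='float32'))
y = x.toarray()       # analog of dense_from_sparse: back to a dense ndarray
z = sp.csc_matrix(y)  # analog of csc_from_dense: back to csc

print((z != x).nnz)   # 0: the round trip preserves every element
```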
Properties and Construction
---------------------------
Although sparse variables do not allow direct access to their properties,
this can be accomplished using the ``csm_properties`` function. It returns
a tuple of one-dimensional ``tensor`` variables that represent the internal
characteristics of the sparse matrix.

In order to reconstruct a sparse matrix from some properties, the functions
``CSC`` and ``CSR`` can be used. They create the sparse matrix in the
desired format. As an example, the following code reconstructs a ``csc``
matrix as a ``csr`` one.
>>> x = sparse.csc_matrix(name='x', dtype='int64')
>>> data, indices, indptr, shape = sparse.csm_properties(x)
>>> y = sparse.CSR(data, indices, indptr, shape)
>>> f = theano.function([x], y)
>>> a = sp.csc_matrix(np.asarray([[0, 1, 1], [0, 0, 0], [1, 0, 0]]))
>>> print a.toarray()
[[0 1 1]
[0 0 0]
[1 0 0]]
>>> print f(a).toarray()
[[0 0 1]
[1 0 0]
[1 0 0]]
The last example shows that interpreting the internal characteristics of one
format as the other yields the transpose of the original matrix. Indeed,
when calling the ``transpose`` function, the sparse characteristics of the
resulting matrix cannot be the same as those of the input.
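The same reinterpretation can be sketched with SciPy alone, building a
``csr`` matrix directly from the internals of a ``csc`` one (a sketch, not
part of the original tutorial):

```python
import numpy as np
import scipy.sparse as sp

a = sp.csc_matrix(np.array([[0, 1, 1],
                            [0, 0, 0],
                            [1, 0, 0]]))

# Reinterpret the csc internals (data, indices, indptr) as csr:
# column lists become row lists, so the result is the transpose.
b = sp.csr_matrix((a.data, a.indices, a.indptr), shape=a.shape)

print(b.toarray())  # equal to a.toarray().T
```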
Structured Operation
--------------------
Several ops are designed to make use of the particular structure of sparse
matrices. These ops are said to be *structured* and simply do not perform
any computation on the zero elements of the sparse matrix. They can be
thought of as being applied only to the ``data`` attribute of the matrix.
Note that these structured ops provide a structured gradient. More on this
below.
>>> x = sparse.csc_matrix(name='x', dtype='float32')
>>> y = sparse.structured_add(x, 2)
>>> f = theano.function([x], y)
>>> a = sp.csc_matrix(np.asarray([[0, 0, -1], [0, -2, 1], [3, 0, 0]], dtype='float32'))
>>> print a.toarray()
[[ 0. 0. -1.]
[ 0. -2. 1.]
[ 3. 0. 0.]]
>>> print f(a).toarray()
[[ 0. 0. 1.]
[ 0. 0. 3.]
[ 5. 0. 0.]]
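What ``structured_add`` computes can be mimicked in SciPy by operating on
the ``data`` attribute only (a sketch, not part of the original tutorial):

```python
import numpy as np
import scipy.sparse as sp

a = sp.csc_matrix(np.array([[0, 0, -1],
                            [0, -2, 1],
                            [3, 0, 0]], dtype='float32'))

b = a.copy()
b.data = b.data + 2  # only the stored (non-zero) elements are affected

print(b.toarray())
# [[ 0.  0.  1.]
#  [ 0.  0.  3.]
#  [ 5.  0.  0.]]
```

Note that ``-2 + 2`` becomes an explicitly stored zero; the zero elements
that were never stored stay untouched.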
Gradient
--------
The gradients of the ops in the sparse module can also be structured. Some ops provide
a *flag* to indicate if the gradient is to be structured or not. The documentation can
be used to determine if the gradient of an op is regular or structured or if its
implementation can be modified. Similarly to structured ops, when a structured gradient is calculated, the
computation is done only for the non-zero elements of the sparse matrix.
More documentation regarding the gradients of specific ops can be found in the
:ref:`Sparse Library Reference <libdoc_sparse>`.
theano/sparse/basic.py
Diff collapsed. Click to expand.