First draft.

2ad3e6a0 · Nicolas Bouchard · 555af254 · 2ad3e6a0
--- a/doc/tutorial/sparse.txt
+++ b/doc/tutorial/sparse.txt
@@ -4,35 +4,206 @@
 Sparse
 ======

-This is a very short tutorial on sparse matrices with Theano. There is still
-some not well documented behavior like how we take care of the
-gradient. There are 2 types of gradient for sparse operations. ``full
-gradient`` that compute a gradient for values even if they were 0 and
-the ``structured gradient`` that returns a gradient only for values
-that were not 0. You need to check the code to know which gradient an
-Op implements.
+Sparse matrices
+===============
+
+In general, sparse matrices provide the same functionality as regular
+matrices. The difference lies in the way the elements of the matrix are
+stored. Sparse matrices do not store the zero elements, so they take
+advantage of a high number of zeros. This reduces computation
+time through specialized algorithms and reduces memory usage,
+since zeros are not stored in memory. Matrices stored in the usual way
+are referred to as dense matrices.
+
+The sparse package provides efficient algorithms, but it is not recommended
+in all cases or for all matrices. An obvious example is a matrix whose
+sparsity proportion is very low. The sparsity proportion refers to the
+ratio of the number of zero elements to the total number of elements.
+A low sparsity proportion may result in the use of more memory,
+since not only the actual data is stored, but also the position of each
+stored element. It would also take more computation
+time, and a dense matrix with regular optimized algorithms may do a
+better job. Other examples depend on what will be done with the
+matrix and on the structure of the matrix itself.
+
+Since sparse matrices are not stored in a single contiguous array, there are many
+ways to represent them in memory. The representation is usually called the ``format``
+of the matrix. Theano's sparse matrix package is based on the SciPy
+sparse package, so detailed information about sparse matrices can be found
+in the SciPy documentation. Like SciPy, Theano does not implement sparse formats for
+arrays with a number of dimensions other than two.
+
+So far, Theano implements two ``formats`` of sparse matrix: ``csc`` and ``csr``.
+They are almost the same format, except that ``csc`` is based on the columns of the
+matrix and ``csr`` is based on the rows. They both have the same purpose:
+to provide efficient algorithms for linear algebra operations. On the other
+hand, accessing individual elements is inefficient. This means that if you plan
+to access elements of a sparse matrix a lot in your computation graph, a
+tensor variable may be a better choice.

 More documentation in the :ref:`Sparse Library Reference <libdoc_sparse>`.

-A small example:
+Compressed sparse format
+========================
+
+Compressed sparse formats are the ``formats`` supported by Theano. There are two
+of them: one based on columns and one based on rows. They both have the same attributes:
+``data``, ``indices``, ``indptr`` and ``shape``. The ``shape`` attribute is exactly
+the same as the shape attribute of a dense matrix. It can be specified at the
+creation of a sparse matrix if the shape cannot be inferred from the first
+three attributes.
+
+The ``data`` attribute is a one-dimensional ndarray which contains all the
+nonzero elements of the sparse matrix. The ``indices`` and ``indptr``
+attributes are used to store the position of the data in the sparse matrix.
+
+Before going further, here are the imports that are assumed for the rest of the
+tutorial.
+
+>>> import theano
+>>> import numpy as np
+>>> import scipy.sparse as sp
+>>> from theano import tensor as T
+>>> from theano import sparse as S
+
+CSC matrix
+----------
+
+For Compressed Sparse Column matrices, ``indices`` holds the row index of
+each stored value and ``indptr`` gives, for each column, where its values start in
+``data`` and ``indices``. The following example retrieves the i-th column of the matrix.
+
+>>> data = np.asarray([7, 8, 9])
+>>> indices = np.asarray([0, 1, 2])
+>>> indptr = np.asarray([0, 2, 3, 3])
+>>> m = sp.csc_matrix((data, indices, indptr), shape=(3, 3))
+>>> print m.toarray()
+[[7 0 0]
+ [8 0 0]
+ [0 9 0]]
+>>> i = 0
+>>> print m.indices[m.indptr[i]:m.indptr[i+1]], m.data[m.indptr[i]:m.indptr[i+1]]
+[0 1] [7 8]
+>>> i = 1
+>>> print m.indices[m.indptr[i]:m.indptr[i+1]], m.data[m.indptr[i]:m.indptr[i+1]]
+[2] [9]
+>>> i = 2
+>>> print m.indices[m.indptr[i]:m.indptr[i+1]], m.data[m.indptr[i]:m.indptr[i+1]]
+[] []
+
+CSR matrix
+----------
+
+For Compressed Sparse Row matrices, ``indices`` holds the column index of
+each stored value and ``indptr`` gives, for each row, where its values start in
+``data`` and ``indices``. The following example retrieves the i-th row of the matrix.
+
+>>> data = np.asarray([7, 8, 9])
+>>> indices = np.asarray([0, 1, 2])
+>>> indptr = np.asarray([0, 2, 3, 3])
+>>> m = sp.csr_matrix((data, indices, indptr), shape=(3, 3))
+>>> print m.toarray()
+[[7 8 0]
+ [0 0 9]
+ [0 0 0]]
+>>> i = 0
+>>> print m.indices[m.indptr[i]:m.indptr[i+1]], m.data[m.indptr[i]:m.indptr[i+1]]
+[0 1] [7 8]
+>>> i = 1
+>>> print m.indices[m.indptr[i]:m.indptr[i+1]], m.data[m.indptr[i]:m.indptr[i+1]]
+[2] [9]
+>>> i = 2
+>>> print m.indices[m.indptr[i]:m.indptr[i+1]], m.data[m.indptr[i]:m.indptr[i+1]]
+[] []
+
+
+Handling sparse in Theano
+=========================
+
+Most of the ops in Theano depend on the ``format`` of the sparse matrix.
+That is why there are two kinds of sparse variables: ``csc_matrix`` and
+``csr_matrix``. These two constructors can be called with the usual name and
+a specific dtype, but no broadcastable flags are allowed. They are forbidden
+since the sparse package does not provide any way to handle a number of
+dimensions other than two. The set of all accepted dtypes for sparse
+matrices can be found in ``sparse.all_dtypes``.
+
+>>> S.all_dtypes
+set(['int32', 'int16', 'float64', 'complex128', 'complex64', 'int64', 'int8', 'float32'])
+
+Properties and construction
+---------------------------
+
+Sparse variables do not provide direct access to their properties, but
+these can be obtained using the ``csm_properties`` function. It returns
+a tuple of one-dimensional tensor variables that represent the internal
+attributes of the sparse matrix.
+
+In order to reconstruct a sparse matrix from some properties, ``CSC``
+and ``CSR`` can be used. They create the sparse matrix in the desired
+format. As an example, the following code reconstructs a ``csc`` matrix as
+a ``csr`` one.
+
+>>> x = S.csc_matrix(name='x', dtype='int64')
+>>> data, indices, indptr, shape = S.csm_properties(x)
+>>> y = S.CSR(data, indices, indptr, shape)
+>>> f = theano.function([x], y)
+>>> a = sp.csc_matrix(np.asarray([[0, 1, 1], [0, 0, 0], [1, 0, 0]]))
+>>> print a.toarray()
+[[0 1 1]
+ [0 0 0]
+ [1 0 0]]
+>>> print f(a).toarray()
+[[0 0 1]
+ [1 0 0]
+ [1 0 0]]
+
+The last example shows that one format can be obtained by transposing
+the other. In fact, when calling the ``transpose`` function,
+the format of the resulting matrix is not the same as that of
+the input.
+
+To and fro
+----------
+
+To move back and forth between dense and sparse matrices, Theano
+provides the ``dense_from_sparse``, ``csr_from_dense`` and
+``csc_from_dense`` functions. No further details are needed; here is
+an example that does absolutely nothing.
+
+>>> x = S.csc_matrix(name='x', dtype='float32')
+>>> y = S.dense_from_sparse(x)
+>>> z = S.csc_from_dense(y)

-.. code-block:: python
+Structured operation
+--------------------

-    import theano
-    import theano.tensor as T
-    import scipy.sparse as sp
-    import theano.sparse as S
-    import numpy as np
+Many ops are designed to make use of the very peculiar structure of sparse
+matrices. These ops are said to be structured: they simply do not perform any
+computation on the zero elements of the matrix. They can be thought of as being
+applied only to the data attribute of the sparse matrix.

-    x = S.csr_matrix ('x')
-    #x = T.matrix ('x')
-    y = T.matrix ('y')
-    z = S.dot (x, y)
-    f = theano.function ([x, y], z)
+>>> x = S.csc_matrix(name='x', dtype='float32')
+>>> y = S.structured_add(x, 2)
+>>> f = theano.function([x], y)
+>>> a = sp.csc_matrix(np.asarray([[0, 0, -1], [0, -2, 1], [3, 0, 0]], dtype='float32'))
+>>> print a.toarray()
+[[ 0.  0. -1.]
+ [ 0. -2.  1.]
+ [ 3.  0.  0.]]
+>>> print f(a).toarray()
+[[ 0.  0.  1.]
+ [ 0.  0.  3.]
+ [ 5.  0.  0.]]

-    #a = np.array ([[0, 1], [1, 0], [1, 0], [0, 1]], dtype=np.float32)
-    a = sp.coo_matrix (([1] * 4, (range (4), [0, 1, 1, 0])), dtype=np.float32)
+Gradient
+--------

-    b = np.array ([[10, 11], [12, 13]], dtype=np.float32)
+The gradient of the sparse ops can also be structured. Some ops provide
+a way to choose whether the grad is structured or not. The documentation can
+be used to determine whether the grad of an op is regular or structured;
+otherwise, the implementation can be modified. When a structured grad is calculated, the
+computation is done only for the nonzero elements of the sparse matrix.

-    print f (a, b)
+More documentation about the grad of specific ops is in the
+:ref:`Sparse Library Reference <libdoc_sparse>`.