Commit 2ad3e6a0 authored by Nicolas Bouchard

First draft.

Parent 555af254
@@ -4,35 +4,206 @@
Sparse
======
This is a very short tutorial on sparse matrices with Theano. Some
behavior is still not well documented, such as how the gradient is
handled. There are two types of gradient for sparse operations: the
``full gradient``, which computes a gradient for values even if they were 0,
and the ``structured gradient``, which returns a gradient only for values
that were not 0. You need to check the code to know which gradient an
Op implements.
Sparse matrices
===============
In general, sparse matrices provide the same functionality as regular
matrices. The difference lies in the way the elements of the matrix are
stored. Sparse matrices do not store the zero elements, so they take
advantage of a high number of zeros. This reduces computation time by
allowing specialized algorithms and, since zeros are not stored in
memory, it also reduces memory usage. Matrices stored the usual way are
referred to as dense matrices.
The sparse package provides efficient algorithms, but it is not recommended
in all cases or for all matrices. An obvious example is a matrix whose
sparsity proportion is very low. The sparsity proportion refers to the
ratio of the number of zero elements to the total number of elements.
A low sparsity proportion may result in more memory being used,
since not only the actual data is stored but also the position of every
stored element. It would also take more computation time, so a dense
matrix with regular optimized algorithms may do a better job. Other
examples depend on what will be done with the matrix and on the
structure of the matrix itself.
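The memory trade-off described above can be sketched with a rough estimate. The helper names and the byte sizes (8-byte values, 4-byte indices) are assumptions made for illustration, not Theano settings, and the layout counted here is a row-compressed one: one value and one index per stored element, plus one pointer per row and a final one.

```python
# Hypothetical helpers for a back-of-the-envelope estimate; the byte sizes
# are assumptions chosen for illustration.
def dense_bytes(rows, cols, value_bytes=8):
    """Bytes needed to store every element of a rows x cols matrix."""
    return rows * cols * value_bytes


def compressed_bytes(rows, nnz, value_bytes=8, index_bytes=4):
    """Bytes for nnz stored values, nnz indices and rows + 1 pointers."""
    return nnz * (value_bytes + index_bytes) + (rows + 1) * index_bytes


rows, cols = 1000, 1000
estimates = {}
for sparsity in (0.99, 0.50, 0.10):
    # Number of stored (nonzero) elements for this sparsity proportion.
    nnz = int(round(rows * cols * (1 - sparsity)))
    estimates[sparsity] = compressed_bytes(rows, nnz)
    print("sparsity %.2f: compressed %d bytes, dense %d bytes"
          % (sparsity, estimates[sparsity], dense_bytes(rows, cols)))
```

Under these assumptions the compressed layout becomes larger than the dense one once fewer than roughly a third of the elements are zero.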
Since sparse matrices are not stored in contiguous arrays, there are many
ways to represent them in memory. The representation is usually designated
by the ``format`` of the matrix. Theano's sparse matrix package is based on
the scipy sparse package, so all information about sparse matrices can be
found in the scipy documentation. Like scipy, Theano does not implement
sparse formats for arrays with a number of dimensions other than two.
So far, Theano implements two ``formats`` of sparse matrix: ``csc`` and ``csr``.
They are almost the same format, except that ``csc`` is based on the columns
of the matrix and ``csr`` is based on the rows. They both have the same
purpose: to provide efficient algorithms for linear algebra operations. On the
other hand, accessing individual elements is not easy in either format. This
means that if you plan to access the elements of a sparse matrix a lot in your
computation graph, a tensor variable may be a better choice.
More documentation in the :ref:`Sparse Library Reference <libdoc_sparse>`.
A small example:
Compressed sparse format
========================
Compressed sparse formats are the ``formats`` supported by Theano. There are two
of them: one based on columns and one based on rows. They both have the same attributes:
``data``, ``indices``, ``indptr`` and ``shape``. The ``shape`` attribute is exactly
the same as the shape attribute of a dense matrix. It can be specified when
the sparse matrix is created if the shape cannot be inferred from the first
three attributes.
The ``data`` attribute is a one-dimensional ndarray which contains all the
nonzero elements of the sparse matrix. The ``indices`` and ``indptr``
attributes are used to store the position of the data in the sparse matrix.
Before going further, here are the imports that are assumed for the rest of the
tutorial.
>>> import theano
>>> import numpy as np
>>> import scipy.sparse as sp
>>> from theano import tensor as T
>>> from theano import sparse as S
CSC matrix
----------
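For Compressed Sparse Column matrices, ``indices`` holds the row index of each
element in ``data``, and ``indptr`` tells where each column starts in ``data``
and ``indices``: the elements of column ``i`` are stored at positions
``indptr[i]`` up to, but not including, ``indptr[i+1]``. The following example
returns the i-th column of the matrix.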
>>> data = np.asarray([7, 8, 9])
>>> indices = np.asarray([0, 1, 2])
>>> indptr = np.asarray([0, 2, 3, 3])
>>> m = sp.csc_matrix((data, indices, indptr), shape=(3, 3))
>>> print m.toarray()
[[7 0 0]
 [8 0 0]
 [0 9 0]]
>>> i = 0
>>> print m.indices[m.indptr[i]:m.indptr[i+1]], m.data[m.indptr[i]:m.indptr[i+1]]
[0 1] [7 8]
>>> i = 1
>>> print m.indices[m.indptr[i]:m.indptr[i+1]], m.data[m.indptr[i]:m.indptr[i+1]]
[2] [9]
>>> i = 2
>>> print m.indices[m.indptr[i]:m.indptr[i+1]], m.data[m.indptr[i]:m.indptr[i+1]]
[] []
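To make the role of ``indices`` and ``indptr`` concrete, here is a minimal sketch in plain Python that rebuilds the dense matrix from the three CSC arrays used in this example. The function ``csc_to_dense`` is a hypothetical helper written for this illustration; it is not part of Theano or scipy.

```python
def csc_to_dense(data, indices, indptr, shape):
    """Rebuild a dense matrix (list of rows) from CSC arrays.

    Hypothetical helper for illustration only.
    """
    rows, cols = shape
    dense = [[0] * cols for _ in range(rows)]
    for col in range(cols):
        # Column `col` owns positions indptr[col] .. indptr[col + 1] - 1.
        for k in range(indptr[col], indptr[col + 1]):
            # indices[k] is the row of the k-th stored value.
            dense[indices[k]][col] = data[k]
    return dense


dense = csc_to_dense([7, 8, 9], [0, 1, 2], [0, 2, 3, 3], (3, 3))
for row in dense:
    print(row)
```

The result has the same entries as ``m.toarray()`` in the example: 7 and 8 in the first column, 9 in the second, and an empty third column.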
CSR matrix
----------
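For Compressed Sparse Row matrices, ``indices`` holds the column index of each
element in ``data``, and ``indptr`` tells where each row starts in ``data``
and ``indices``: the elements of row ``i`` are stored at positions
``indptr[i]`` up to, but not including, ``indptr[i+1]``. The following example
returns the i-th row of the matrix.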
>>> data = np.asarray([7, 8, 9])
>>> indices = np.asarray([0, 1, 2])
>>> indptr = np.asarray([0, 2, 3, 3])
>>> m = sp.csr_matrix((data, indices, indptr), shape=(3, 3))
>>> print m.toarray()
[[7 8 0]
 [0 0 9]
 [0 0 0]]
>>> i = 0
>>> print m.indices[m.indptr[i]:m.indptr[i+1]], m.data[m.indptr[i]:m.indptr[i+1]]
[0 1] [7 8]
>>> i = 1
>>> print m.indices[m.indptr[i]:m.indptr[i+1]], m.data[m.indptr[i]:m.indptr[i+1]]
[2] [9]
>>> i = 2
>>> print m.indices[m.indptr[i]:m.indptr[i+1]], m.data[m.indptr[i]:m.indptr[i+1]]
[] []
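The compressed layout also explains why reading a single element is less direct than in a dense matrix: the row has to be located through ``indptr`` and then searched for the requested column. The sketch below uses plain Python lists and a hypothetical helper, ``csr_get``, that is not part of Theano or scipy.

```python
def csr_get(data, indices, indptr, i, j):
    """Return element (i, j) of a CSR matrix given as plain lists.

    Hypothetical helper for illustration only.
    """
    # Row i owns positions indptr[i] .. indptr[i + 1] - 1.
    for k in range(indptr[i], indptr[i + 1]):
        if indices[k] == j:
            return data[k]
    # Column j is not stored in row i, so the element is zero.
    return 0


data, indices, indptr = [7, 8, 9], [0, 1, 2], [0, 2, 3, 3]
print(csr_get(data, indices, indptr, 0, 1))
print(csr_get(data, indices, indptr, 2, 0))
```

The first call finds the stored value 8, while the second scans an empty row and falls back to 0. A dense matrix would answer both with a single indexing operation.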
Handling sparse in Theano
=========================
Most of the ops in Theano depend on the ``format`` of the sparse matrix.
That is why there are two kinds of sparse variables: ``csc_matrix`` and
``csr_matrix``. These two constructors can be called with the usual name and
a specific dtype, but no broadcastable flags are allowed. They are forbidden
because the sparse package does not provide any way to handle a number of
dimensions other than two. The set of all accepted dtypes for sparse
matrices can be found in ``sparse.all_dtypes``.
>>> S.all_dtypes
set(['int32', 'int16', 'float64', 'complex128', 'complex64', 'int64', 'int8', 'float32'])
Properties and construction
---------------------------
A sparse variable does not provide direct access to its properties, but
they can be obtained with the ``csm_properties`` function. It returns
a tuple of one-dimensional tensor variables that represent the internals
of the sparse matrix.
To reconstruct a sparse matrix from these properties, ``CSC``
and ``CSR`` can be used. They create the sparse matrix in the desired
format. As an example, the following code reconstructs a ``csc`` matrix as
a ``csr`` one.
>>> x = S.csc_matrix(name='x', dtype='int64')
>>> data, indices, indptr, shape = S.csm_properties(x)
>>> y = S.CSR(data, indices, indptr, shape)
>>> f = theano.function([x], y)
>>> a = sp.csc_matrix(np.asarray([[0, 1, 1], [0, 0, 0], [1, 0, 0]]))
>>> print a.toarray()
[[0 1 1]
 [0 0 0]
 [1 0 0]]
>>> print f(a).toarray()
[[0 0 1]
 [1 0 0]
 [1 0 0]]
The last example shows that one format can be obtained by transposing
the other. In fact, when the ``transpose`` function is called,
the format of the resulting matrix is not the same as that of
the input.
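The relation between the two formats can be checked without Theano. The sketch below, in plain Python, reads the same three arrays once as ``csc`` and once as ``csr`` and compares the results. The helper ``to_dense`` is hypothetical and written only for this illustration; the arrays are the ``csc`` representation of the matrix ``a`` from the example, worked out by hand.

```python
def to_dense(data, indices, indptr, shape, fmt):
    """Rebuild a dense matrix from compressed arrays read as 'csc' or 'csr'.

    Hypothetical helper for illustration only.
    """
    rows, cols = shape
    dense = [[0] * cols for _ in range(rows)]
    n_major = cols if fmt == 'csc' else rows
    for major in range(n_major):
        for k in range(indptr[major], indptr[major + 1]):
            if fmt == 'csc':
                dense[indices[k]][major] = data[k]  # major axis = column
            else:
                dense[major][indices[k]] = data[k]  # major axis = row
    return dense


# csc arrays of [[0, 1, 1], [0, 0, 0], [1, 0, 0]]
data, indices, indptr, shape = [1, 1, 1], [2, 0, 0], [0, 1, 2, 3], (3, 3)
as_csc = to_dense(data, indices, indptr, shape, 'csc')
as_csr = to_dense(data, indices, indptr, shape, 'csr')

# Reading the csc arrays as csr yields the transpose.
transposed = [list(column) for column in zip(*as_csc)]
print(as_csr == transposed)
```

This is why a transpose can be implemented by relabeling the format instead of moving any data, at least for the square case sketched here; for a non-square matrix the shape would also have to be swapped.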
To and fro
----------
To move back and forth between dense and sparse matrices, Theano
provides the ``dense_from_sparse``, ``csr_from_dense`` and
``csc_from_dense`` functions. No further explanation is needed; here is
an example that does nothing at all.
>>> x = S.csc_matrix(name='x', dtype='float32')
>>> y = S.dense_from_sparse(x)
>>> z = S.csc_from_dense(y)
.. code-block:: python
Structured operation
--------------------
import theano
import theano.tensor as T
import scipy.sparse as sp
import theano.sparse as S
import numpy as np
Many ops are designed to make use of the very peculiar structure of sparse
matrices. These ops are said to be structured: they simply do not perform any
computation on the zero elements of the matrix. They can be thought of as being
applied only to the data attribute of the sparse matrix.
x = S.csr_matrix ('x')
#x = T.matrix ('x')
y = T.matrix ('y')
z = S.dot (x, y)
f = theano.function ([x, y], z)
>>> x = S.csc_matrix(name='x', dtype='float32')
>>> y = S.structured_add(x, 2)
>>> f = theano.function([x], y)
>>> a = sp.csc_matrix(np.asarray([[0, 0, -1], [0, -2, 1], [3, 0, 0]], dtype='float32'))
>>> print a.toarray()
[[ 0.  0. -1.]
 [ 0. -2.  1.]
 [ 3.  0.  0.]]
>>> print f(a).toarray()
[[ 0.  0.  1.]
 [ 0.  0.  3.]
 [ 5.  0.  0.]]
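The idea that a structured op touches only the ``data`` attribute can be sketched in plain Python. The function ``add_to_stored`` below is a hypothetical stand-in written for this illustration and is not Theano's implementation; the arrays are the ``csc`` representation of the matrix ``a`` above, worked out by hand.

```python
def add_to_stored(data, indices, indptr, scalar):
    """Add `scalar` to the stored values only; positions are left untouched.

    Hypothetical stand-in for a structured addition.
    """
    return [value + scalar for value in data], indices, indptr


# csc arrays of [[0, 0, -1], [0, -2, 1], [3, 0, 0]]
data = [3.0, -2.0, -1.0, 1.0]
indices = [2, 1, 0, 1]
indptr = [0, 1, 2, 4]

new_data, new_indices, new_indptr = add_to_stored(data, indices, indptr, 2)
print(new_data)
```

In this sketch the element that was -2 becomes 0 but keeps its slot in ``data``, and the positions that were never stored are not changed to 2, which matches the output of the example.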
#a = np.array ([[0, 1], [1, 0], [1, 0], [0, 1]], dtype=np.float32)
a = sp.coo_matrix (([1] * 4, (range (4), [0, 1, 1, 0])), dtype=np.float32)
Gradient
--------
b = np.array ([[10, 11], [12, 13]], dtype=np.float32)
The gradient of a sparse op can also be structured. Some ops provide
a way to choose whether the grad is structured or not. The documentation can
be used to determine whether the grad of an op is regular or structured; if
needed, the implementation can be modified. When a structured grad is calculated,
the computation is done only for the nonzero elements of the sparse matrix.
print f (a, b)
More documentation about the grad of specific ops is in the
:ref:`Sparse Library Reference <libdoc_sparse>`.
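As a rough illustration of the difference between a regular and a structured grad, consider the cost ``sum(w * x)`` taken elementwise, whose derivative with respect to ``x[i][j]`` is ``w[i][j]``. The sketch below uses plain Python lists; the cost function and the variable names are assumptions chosen for this illustration and do not describe how any particular Theano op computes its gradient.

```python
x = [[0.0, 0.0, -1.0],
     [0.0, -2.0, 1.0],
     [3.0, 0.0, 0.0]]
w = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]]

# Regular (full) grad: defined for every position, including the zeros of x.
full_grad = [row[:] for row in w]

# Structured grad: computed only where x is nonzero; other positions stay 0,
# so the result has the same sparsity pattern as x.
structured_grad = [
    [w_ij if x_ij != 0 else 0.0 for w_ij, x_ij in zip(w_row, x_row)]
    for w_row, x_row in zip(w, x)
]

for row in structured_grad:
    print(row)
```

The structured result can be stored in the same sparse layout as ``x``, while the full grad is in general dense.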