Unverified commit 367351f3, authored by Pham Nguyen Hung, committed by GitHub

Fixed dead wiki links (#950)

* Fixed dead wiki links
* Deleted old documentation at doc/sandbox.

Parent d4554608
@@ -157,9 +157,9 @@ to extend PyTensor, please feel free to ask.
   install
   tutorial/index

.. _LISA: https://mila.quebec/en
.. _Greek mathematician: http://en.wikipedia.org/wiki/Theano_(mathematician)
.. _numpy: https://numpy.org/
.. _BLAS: http://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms
.. _sympy: http://www.sympy.org/
@@ -39,18 +39,18 @@ This is a sort of memo for developers and would-be developers.
.. _git: http://git-scm.com/
.. _pytest: http://docs.pytest.org/en/latest/
.. _numpy: https://numpy.org/
.. _python: http://www.python.org
.. _scipy: http://scipy.org/
.. _autodiff: http://www.autodiff.org
.. _boost.python: https://www.boost.org/doc/libs/1_85_0/libs/python/doc/html/index.html
.. _cython: http://www.cython.org/
.. _liboil: http://liboil.freedesktop.org/wiki/
.. _llvm: http://llvm.org/
.. _networkx: https://networkx.org/
.. _pypy: https://doc.pypy.org/en/latest/
.. _swig: http://www.swig.org/
.. _unpython: https://code.google.com/archive/p/unpython/
.. _pycppad: https://github.com/Simple-Robotics/pycppad
.. _shedskin: https://shedskin.github.io/shedskin/
.. _compilation:

=======================
Compilation and Linking
=======================

.. index::
   single: Linker

.. _linker:

Linker
======

WRITEME
.. _sandbox_debugging_step_mode:

Debugging with a customized so-called StepMode
==============================================

One convenient trick I've found for debugging programs that run with PyTensor is to
use what I call a 'StepMode'. There is no such StepMode in the standard library, because its
purpose is to be hacked up to investigate what your own particular program is doing.
.. code-block:: python

    from pytensor.link import WrapLinkerMany
    from pytensor.configdefaults import config
    from pytensor.compile.mode import (Mode, register_mode, predefined_modes,
                                       predefined_linkers, predefined_optimizers)

    class StepMode(Mode):
        def __init__(self, linker=None, optimizer='default'):
            if linker is None:
                linker = config.linker
            if optimizer == 'default':
                optimizer = config.optimizer

            def blah(i, node, th):
                # This function will be run for each node in your compiled program.
                # Here you can inspect all the values as they are computed,
                # ... you can even change them!
                # 'i' is the execution position in the serialized graph.
                # 'node' is the symbolic Apply instance.
                # 'th' is a callable thing that will compute the node.
                print(i, node, len(th.inputs))
                # The symbolic inputs of the node are in node.inputs.
                # The j'th non-symbolic input of the node is in th.inputs[j][0].
                th()  # call the function to actually 'run' the graph
                # The symbolic outputs of the node are in node.outputs.
                # The j'th non-symbolic output of the node is in th.outputs[j][0].
                print(type(th.outputs[0][0]))
                if i == 39:
                    print('this node is weird...', th.outputs[0][0])

            self.provided_linker = linker
            self.provided_optimizer = optimizer
            if isinstance(linker, str) or linker is None:
                linker = predefined_linkers[linker]
            self.linker = WrapLinkerMany([linker], [blah])
            if isinstance(optimizer, str) or optimizer is None:
                optimizer = predefined_optimizers[optimizer]
            self._optimizer = optimizer
The way to use it is like this:

.. code-block:: python

    fn = function(inputs, outputs, mode=StepMode())
When you call fn, your function in the StepMode will be called for each node in the compiled
program. You can print out some or all of the values, and you can change them in mid-execution.
You can see where bizarre values first occur in your computations. It's a very
powerful way to understand your program's execution.

Remember, if you give names to your variables, then printing nodes will give you a better idea of
where in the calculations you are.
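For readers without a full PyTensor setup at hand, the core idea can be sketched in a few lines of plain Python. Everything below (``run_with_monitor``, the thunk list, the ``storage`` dict) is a hypothetical stand-in written for illustration, not the PyTensor API:

```python
def run_with_monitor(thunks, monitor):
    # Execute per-node thunks in order, invoking monitor(i, node, thunk)
    # before each one so values can be inspected (or even patched).
    # Hypothetical sketch of the wrap-linker idea, not the real API.
    for i, (node, thunk) in enumerate(thunks):
        monitor(i, node, thunk)
        thunk()  # actually compute this node


# Toy "graph": e = a + (b * c), with results kept in shared storage.
storage = {"a": 10, "b": 20, "c": 30}
thunks = [
    ("Mul(b, c)", lambda: storage.__setitem__("d", storage["b"] * storage["c"])),
    ("Add(a, d)", lambda: storage.__setitem__("e", storage["a"] + storage["d"])),
]
run_with_monitor(thunks, lambda i, node, th: print("step", i, node))
```

The wrapped linker in the snippet above plays the role of ``run_with_monitor``: it owns the serialized list of node thunks and hands your callback one node at a time.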
.. _sandbox_elemwise:

==========================
:class:`Elemwise` compiler
==========================

.. todo:: Stale specification page. Upgrade this to provide useful developer doc. 2008.09.04

Definitions
===========
The element-wise compiler takes inputs ``(in0, in1, in2, ...)``, outputs ``(out0, out1, out2, ...)``, and broadcast modes ``(mod0, mod1, mod2, ...)``, where each mode corresponds to an output, as well as ``order``, which determines whether we broadcast/accumulate over the first or last dimensions (the looping order, basically, but some operations are only valid for one particular order!).

The broadcast mode serves to calculate the rank of the corresponding output and how to map each input element to an output element:
* ``broadcast``

  * output.rank = max(input.rank)
  * the inputs of lesser rank are broadcast over the missing dimensions
  * if ``order == f``: ([3, 5], [5]) => [3, 5] or ([7, 8, 9], [8, 9]) => [7, 8, 9]
  * if ``order == c``: ([3, 5], [3]) => [3, 5] or ([7, 8, 9], [7, 8]) => [7, 8, 9]

* ``(accumulate, Accumulator)``

  * output.rank = min(input.rank)
  * for the inputs of greater rank, we use Accumulator (sum, product, etc.) to accumulate over the extra dimensions (the first or last, depending on ``order``)
  * e.g. if ``Accumulator == sum``, ``order == c``, ``x.rank == 2``, ``y.rank == 1`` and ``z = f(x, y)``, then ``z[i] = f(sum_j(x[i, j]), y[i])``
  * if ``order == f``: ([3, 5], [5]) => [5] or ([7, 8, 9], [8, 9]) => [8, 9]
  * if ``order == c``: ([3, 5], [3]) => [3] or ([7, 8, 9], [7, 8]) => [7, 8]
``order == c`` is equivalent to transposing the outputs of an ``order == f`` operation on transposed inputs.

This does not cover all cases of broadcasting, but I believe it covers enough. Other cases of broadcasting can be emulated with proper transposition and/or slicing.

* Could you give some examples of what kinds of broadcasting are and are not covered by your proposed implementation?

  * For rank <= 2, I think only operations of the form ``add(ones(3, 1), ones(1, 3))`` are missing. I actually didn't think of that one before now.
  * In general, it only handles ``f(shape(head, ...), shape(head, ...), ...)`` and ``f(shape(..., tail), shape(..., tail), ...)``.
  * Maybe I could add a general case later... the thing is that I think the ones I am considering here are easier to streamline.

Point of clarification: the order discussed here corresponds to a set of broadcasting rules, and is independent from the storage order. The 'f' order corresponds to numpy's broadcasting rules, while the 'c' order is something new and different (TODO VERIFY!).

Question: does it make sense to apply the order to the loop, or is this broadcast order something that will be local to each input argument? What happens when the elemwise compiler deals with more complex subgraphs with multiple inputs and outputs?
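To make the two orders concrete, here is a small pure-Python sketch that computes the output shape under the ``broadcast`` mode. It aligns lesser-rank inputs to the *last* dimensions for ``order == f`` (numpy-style) and to the *first* dimensions for ``order == c``. The helper name and signature are made up for illustration; this is not part of PyTensor:

```python
def broadcast_shape(shapes, order="f"):
    # Output shape under the 'broadcast' mode described above.
    # order == 'f': lesser-rank inputs align to the trailing dimensions
    # (numpy's rule); order == 'c': they align to the leading dimensions.
    rank = max(len(s) for s in shapes)
    out = [1] * rank
    for s in shapes:
        pad = rank - len(s)
        # where the missing dimensions go depends on the order
        aligned = ([1] * pad + list(s)) if order == "f" else (list(s) + [1] * pad)
        for i, d in enumerate(aligned):
            if d != 1:
                if out[i] != 1 and out[i] != d:
                    raise ValueError("incompatible shapes")
                out[i] = d
    return tuple(out)
```

With this sketch, ``broadcast_shape([(3, 5), (5,)], order="f")`` and ``broadcast_shape([(3, 5), (3,)], order="c")`` both give ``(3, 5)``, matching the examples in the list above.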
The loop
========

Here is the loop for ``order == c``. Check for errors!
.. code-block:: cpp

    <initialize iterators>
    i1 = -1
    while (++i1 < dim1) {
        i2 = -1
        rank_N-1_accumulator = init
        while (++i2 < dim2) {
            ...
            iN = -1
            while (++iN < dimN) {
                <accumulate rank N input>
                <SET rank N output using broadcasted inputs>
                <NEXT rank N iterator>
            }
            ...
        }
        <SET rank 1 output using accumulated inputs>
        <NEXT rank 1 iterator>
    }
When ``order == f``, the iterators *ideally* (but not necessarily) iterate in FORTRAN order, i.e. the while loops are on ``dimN..dim1`` instead of ``dim1..dimN``.

``order`` does **not** represent the ``C/F_CONTIGUOUS`` flags of the inputs or outputs. Depending on combinations of those parameters, different loops will be used. If ``order == f and C_CONTIGUOUS(array)``, for example, the loop will be on ``dim1..dimN`` and the matrices of lesser rank will need to be looped over several times.

A rewrite should look at the operations in the graph and figure out whether to allocate C_CONTIGUOUS (ideal for ``order == c``) or F_CONTIGUOUS (ideal for ``order == f``) arrays.
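The same control flow can be sketched in pure Python for the rank-2/rank-1 example from the definitions above, with sum as the accumulator. The function is hypothetical, written only to make the loop structure concrete:

```python
def elemwise_accumulate_c(f, x, y):
    # order == 'c' with mode (accumulate, sum): x has rank 2, y has rank 1,
    # and x's trailing dimension is summed away before f is applied
    # elementwise, i.e. z[i] = f(sum_j(x[i][j]), y[i]).
    assert len(x) == len(y)
    z = []
    for row, yi in zip(x, y):    # outer loop over dim1
        acc = 0                  # the accumulator, init for sum
        for v in row:            # inner loop over dim2
            acc += v             # <accumulate rank N input>
        z.append(f(acc, yi))     # <SET rank 1 output using accumulated inputs>
    return z
```

For example, ``elemwise_accumulate_c(lambda a, b: a + b, [[1, 2], [3, 4]], [10, 20])`` reduces the ``[3, 5]``-style input down to rank 1 before combining it with ``y``.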
Gradient
========

The input ranks become the output ranks, and gradients of the same rank as the outputs are added to the input list. If an output was given mode ``broadcast``, then all inputs used to calculate it had to be broadcast to that shape, so we must sum over the broadcast dimensions on the gradient. The mode that we give to those inputs is therefore ``(accumulate, sum)``. Inversely, if an output was given mode ``(accumulate, sum)``, then all inputs used to calculate it had to be summed over those dimensions. Therefore, we give them mode ``broadcast`` in grad. Other accumulators than sum might prove more difficult. For example, the i-th gradient for product is ``grad * product / x_i``. Not sure how to handle that automatically.
* I don't exactly follow this paragraph, but I think I catch the general idea and it seems to me like it will work very well.

  * In a nutshell, for ``broadcast`` I calculate the gradient as normal, assuming the shape is broadcast, and then I sum over what I had to broadcast.

* Could you explain why the accumulator gradient (e.g. product) can be trickier?

  * I thought about it and I figured that the general case is ``g_accum[N-i+1], g_m[i] = grad_fn(accum[i-1], m[i], g_accum[N-i])``, where ``g_accum`` is the accumulated gradient wrt the accumulator ``accum``. It can be short-circuited in sum's and product's cases: for sum, grad_fn is the identity on its last argument, so ``g_m[i] == g_accum[i] == g_accum[0] == g_z`` for all i. In product's case, ``accum[i-1] == product(m[1:i-1])`` and ``g_accum[N-i] == g_z * product(m[i+1:N])``; multiply them together and you obtain ``g_z * product(m) / m[i]``, where obviously we only need to compute ``product(m)`` once. It's worth handling those two special cases; for the general case I don't know.
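The product short-circuit can be checked numerically with a small pure-Python sketch. The helper names are made up, and it assumes ``g_z == 1`` and no zero entries in ``m``:

```python
import math


def grad_product(m):
    # Short-circuited gradient of z = product(m) wrt each m[i]:
    # g_m[i] = g_z * product(m) / m[i], computing product(m) only once.
    p = math.prod(m)
    return [p / mi for mi in m]


def grad_product_chain(m):
    # The general accumulator recurrence from the discussion above:
    # accum[i] = accum[i-1] * m[i], the gradient flows backwards, and
    # g_m[i] = g_accum * accum[i] at step i.
    n = len(m)
    accum = [1.0]
    for mi in m:
        accum.append(accum[-1] * mi)   # accum[i] = product(m[:i])
    g = [0.0] * n
    g_accum = 1.0                      # gradient wrt the final accumulator
    for i in reversed(range(n)):
        g[i] = g_accum * accum[i]      # d(accum[i+1]) / d(m[i]) = accum[i]
        g_accum *= m[i]                # d(accum[i+1]) / d(accum[i]) = m[i]
    return g
```

Both routes give the same answer, e.g. ``[12.0, 8.0, 6.0]`` for ``m = [2.0, 3.0, 4.0]``, which is exactly ``product(m) / m[i]``.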
.. _function:

==================
function interface
==================

WRITEME

==========
Functional
==========

Want to know about PyTensor's `function design
<http://groups.google.com/group/theano-dev/browse_thread/thread/fd4c6947d8a20510>`_?
:orphan:

=========================================================
Sandbox, this documentation may or may not be out-of-date
=========================================================

.. toctree::
    :glob:

    *

.. _advanced:

====================================
Advanced Topics (under construction)
====================================

.. toctree::
    :maxdepth: 2

    compilation
    ccodegen
    function
    debugging_with_stepmode

====================
Interactive Debugger
====================
**Seed of discussion for what an interactive debugging tool might look like. 2009.03.27.**

Interactive debugger (#352)
===========================

The interactive debugger should allow the user to go step by step through a graph to debug it. It should allow setting breakpoints on arbitrary Ops or subgraphs. If we can group Ops by the user's function that defined them, we could have a logical grouping of the graph into subgraphs.

The debugger should save the inputs at each step so the user loses no info through inplace operations. Ideally, the debugger should be a normal Python shell enriched with commands to control the flow, and all the inputs should be made available so the user can use numpy interactively on them.
Command wishlist:

* py_perform (perform the current operation using the Python implementation)
* c_perform (perform the current operation using the C implementation)
* perform (use the Linker's preference)
* get_inputs (get the inputs of the current Op)
* set_inputs (set the inputs of the current Op)
* get_outputs (get the outputs of the current Op)
* set_outputs (set the outputs of the current Op, bypassing its perform)
* next (perform and go to the next breakpoint)
* breakpoint (set a breakpoint on the current Op or subgraph)
* step (perform and go to the next Op or subgraph)
* step_in (go to the first Op inside the current subgraph)
* step_out (exit the subgraph containing this Op)
* Of course, normal Python shell functionality!
* The global context where the debugger was called (so the user can define their own helper functions, etc.)
A good, simple way to do it would be to have those commands as methods of a structure that would be returned by a DebugLinker. This would allow an interactive session like the following:

.. code-block:: python

    >>> a, b, c = Tensor(), Tensor(), Tensor()
    >>> d = b * c
    >>> e = a + d
    >>> debug = make_function(DebugLinker(FunctionGraph([a, b, c], [e])))
    >>> debug.set_breakpoint(d)
    >>> debug.debug(10, 20, 30)  # a, b, c = 10, 20, 30
    Now at: Mul(b, c)
    Context: d = b * c
    >>> debug.get_inputs()  # we are at the node d = b * c
    [20, 30]
    >>> debug.get_outputs()
    [None]
    >>> debug.py_perform()
    >>> debug.get_outputs()
    [600]
    >>> debug.step()
    Now at: Add(a, Mul)
    Context: e = a + d
    >>> debug.get_inputs()
    [10, 600]
    >>> debug.step()
    Finished.
    [610]
.. _logistic_regression_example:

State example
=============

In this example, we'll look at a complete logistic regression model, with
training by gradient descent.

BUT, YOU GOTTA RUN THIS CODE AND MAKE SURE IT STILL WORKS NICELY, HEY?
.. code-block:: python

    import numpy

    import pytensor
    import pytensor.tensor as pt
    from pytensor.compile.io import In


    def build_logistic_regression_model(n_in, n_out, l2_coef=30.0):
        # DECLARE SOME VARIABLES
        x = pt.matrix()  # our points, one point per row
        y = pt.matrix()  # store our labels as place codes (label 3 of 5 is vector [00100])
        w = pt.matrix()  # the linear transform to apply to our input points
        b = pt.vector()  # a vector of biases, which make our transform affine instead of linear
        stepsize = pt.scalar('stepsize')  # a stepsize for gradient descent

        # REGRESSION MODEL AND COSTS TO MINIMIZE
        prediction = pt.softmax(pt.dot(x, w) + b)
        cross_entropy = pt.sum(y * pt.log(prediction), axis=1)
        cost = pt.sum(cross_entropy) + l2_coef * pt.sum(pt.sum(w * w))

        # GET THE GRADIENTS NECESSARY TO FIT OUR PARAMETERS
        grad_w, grad_b = pt.grad(cost, [w, b])

        # COMPILE THE TRAINING FUNCTION (UPDATES w AND b IN PLACE)
        update_fn = pytensor.function(
            inputs=[x, y, stepsize,
                    In(w,
                       name='w',
                       value=numpy.zeros((n_in, n_out)),
                       update=w - stepsize * grad_w,
                       mutable=True,
                       strict=True),
                    In(b,
                       name='b',
                       value=numpy.zeros(n_out),
                       update=b - stepsize * grad_b,
                       mutable=True,
                       strict=True)
                    ],
            outputs=cost,
            mode='EXPENSIVE_OPTIMIZATIONS')

        apply_fn = pytensor.function(
            inputs=[x, In(w, value=update_fn.storage[w]), In(b, value=update_fn.storage[b])],
            outputs=[prediction])

        return update_fn, apply_fn


    # USUALLY THIS WOULD BE IN A DIFFERENT FUNCTION/CLASS
    # FIT SOME DUMMY DATA: 100 points with 10 attributes and 3 potential labels
    update_fn, apply_fn = build_logistic_regression_model(n_in=10, n_out=3, l2_coef=30.0)
    x_data = numpy.random.standard_normal((100, 10))
    y_data = numpy.random.standard_normal((100, 3))
    # _asarray is PyTensor's dtype-safe asarray helper (see the numpy notes below)
    y_data = _asarray(y_data == numpy.max(y_data, axis=1), dtype='int64')

    print("Model Training ...")
    for iteration in range(1000):
        print(" iter", iteration, "cost", update_fn(x_data, y_data, stepsize=0.0001))

    print("Model Predictions")
    print(apply_fn(x_data))
===========
Performance
===========

PyTensor uses several tricks to obtain good performance:

* common sub-expression elimination
* [custom generated] C code for many operations
* pre-allocation of temporary storage
* loop fusion (which gcc normally can't do)

On my neural net experiments for my course projects, I was getting around 10x
speed improvements over basic numpy by using pytensor.
[More specific speed tests would be nice.]
With a little work, PyTensor could also implement more sophisticated
rewrites:

* automatic ordering of matrix multiplications
* profile-based memory layout decisions (e.g. row-major vs. col-major)
* gcc intrinsics to use MMX, SSE2 parallelism for faster element-wise arithmetic
* conditional expressions
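As an illustration of the first rewrite, choosing the cheapest parenthesization of a chain of dots is the classic matrix-chain dynamic program. A pure-Python sketch (hypothetical helper, not part of PyTensor):

```python
def matmul_chain_cost(dims):
    # Minimum scalar-multiplication count for multiplying a chain of
    # matrices, where matrix i has shape (dims[i], dims[i+1]).
    # A rewrite could use this DP to pick the cheapest nesting of dots.
    n = len(dims) - 1
    cost = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            cost[i][j] = min(
                cost[i][k] + cost[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j)
            )
    return cost[0][n - 1]


# A(10x100) @ B(100x5) @ C(5x50): (A @ B) @ C costs 7500 multiplies,
# while A @ (B @ C) costs 75000; the DP picks 7500.
```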
**An open proposal. This is still relevant. 20080904**

======================
New C code generation?
======================

Issues
======
There are several issues with the current way C code is generated:

* Ops cannot declare their own persistent variables.
* Reliance on weave, but most of weave's features go unused.
* There could easily be conflicts between support code from different Ops/Results.
* It is currently impossible to specialize support code based on the Op instance (``self``).
* Caching of the generated code for graphs is greatly suboptimal.
Structure
=========

Currently, the general structure of the generated C code is approximately as follows:
.. code-block:: c

    <imports>
    <weave type converters>
    <op/result support code>

    struct my_computation {
        <input/output storage>
        <persistent fields>
        init(<input/output storage>) { <initialize persistent fields> }
        cleanup { <clean up persistent fields> }
        run { <run the computation> }
    };

    <runner for the struct>

    PyObject* instantiate(PyObject* args) {
        <weave stuff>
        <make up a CObject out of the runner and a my_computation instance>
        <weave stuff>
    }

    <python exports for instantiate>
The module produced via that method then has to be used as such::

    obj = module.instantiate(error_storage, input_storage, output_storage, orphan_storage)
    cutils.run_cthunk(obj)
We would like to get rid of weave dependencies, avoid name conflicts with the support code and have a nicer user interface for the produced module. The proposed new structure is as follows:
.. code-block:: c

    <imports>

    struct op1 {
        <persistent variables>
        <support code>
        init() { <initialize persistent fields> }
        cleanup { <clean up persistent fields> }
        run(<inputs>) { <run the computation for op1> }
    };
    struct op2 { <same> };
    ...
    struct opN { <ditto> };

    struct driver {
        op1 o1; op2 o2; ... opN oN;
        <input storage>
        <output storage>
        init(<storage>) { <initialize ops, storage> }
        cleanup() { <free storage?> }
        run() {
            <extract inputs>
            o1.run(input1, input2);
            o2.run(o1.output1);
            ...
            oN.run(...);
            <sync outputs>
        }
    }

    PyObject* <name>(PyObject* inputs) {
        <init driver, input/output storage>
        <put inputs in input storage>
        driver.run()
        <free input storage>
        <return output storage>
    }

    PyObject* <name>_driver(PyObject* storage) {
        <init driver with storage>
        <return driver>
    }

    <export <name> and <name>_driver>
Gains:

* support code can be put inside a struct and become private to the Op
* we can export several functions that can be used directly, e.g. ``z = module.add(1, 2)``

  * this won't do filtering like ``Result.filter``, so the usefulness is limited by that

* the sequence of operations might be clearer to read
* we can use more descriptive names in each Op struct representing its input names (if we can find them using the inspect module) without worrying about name conflicts

Losses:

* maybe gcc can't optimize it as well?

  * make functions static and inline as much as possible
Caching
=======

The current way of caching is from a hash of the generated code. That is inefficient, because code has to be generated each time, which might be a costly process. Furthermore, the use of hashing in sets makes it difficult to ensure a consistent ordering of Ops in graphs where several orderings are valid, so the generated C code is potentially different each time. Here is a proposal for a better way to compute the hash:

* Result_hash = Result version + Result desc
* Op_hash = Op version + Op desc + input/output hashes
* FunctionGraph_hash = FunctionGraph version + combination of the Op hashes and their traversal order wrt a consistent traversal method

The version could be set explicitly via a ``__version__`` field, or it could simply be equal to the file's last modification date. We could also have a ``__nocache__`` field indicating that code produced by the Op or Result cannot be cached.

It should also be easier to bypass the cache (e.g. an option to CLinker to regenerate the code).
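A minimal sketch of the proposed hashing scheme, in pure Python with hypothetical names (node descriptions stand in for Op desc, and sorting node names gives the consistent traversal order):

```python
import hashlib


def node_hash(op_name, version, input_hashes):
    # Op_hash = Op version + Op desc + input hashes, per the proposal above.
    h = hashlib.sha256()
    h.update(f"{op_name}:{version}".encode())
    for ih in input_hashes:
        h.update(ih.encode())
    return h.hexdigest()


def graph_hash(nodes, graph_version="1"):
    # Combine per-node hashes in a deterministic order so the same graph
    # always yields the same cache key, regardless of set iteration order.
    # `nodes` maps a node name to (op_name, version, input names); the
    # structure is made up for illustration.
    memo = {}

    def visit(name):
        if name not in memo:
            op_name, version, inputs = nodes[name]
            memo[name] = node_hash(op_name, version, [visit(i) for i in inputs])
        return memo[name]

    h = hashlib.sha256(graph_version.encode())
    for name in sorted(nodes):          # consistent traversal order
        h.update(visit(name).encode())
    return h.hexdigest()
```

Two dicts describing the same graph hash identically even when built in different orders, while bumping any ``version`` field changes the key, which is the property the proposal is after.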
Basically, this file contains stuff that should be documented, but is not.
Feel free to contribute things that you want documented, as well as to add
or correct documentation.
======================================
How do you define the grad function?
======================================
Let's talk about defining the :meth:`Op.grad` function in an :class:`Op`, using an
illustrative example.
In Poisson regression (Ranzato and Szummer, 2008), the target *t* is
integer data, which we predict using a continuous output *o*.
In the negative log likelihood of the Poisson regressor, there is a term:

.. math::

    \log(t!)

Let's say we write a logfactorial :class:`Op`. We then need to compute the gradient.
You should define the gradient, even if it is undefined.
[give log factorial example]

If an :class:`Op` does not define ``grad``, but this :class:`Op` does not appear in the path when
you compute the gradient, then there is no problem.

If an :class:`Op` does not define ``grad``, and this :class:`Op` *does* appear in the path when
you compute the gradient, **WRITEME**.
Gradients for a particular variable can be one of four kinds:

1) forgot to implement it

   You will get an exception of the following form::

       pytensor.graph.utils.MethodNotDefined: ('grad', <class 'pylearn.algorithms.sandbox.cost.LogFactorial'>, 'LogFactorial')

2) a symbolic variable
3) None / zero
4) undefined mathematically

Currently, there is no way for a ``grad()`` method to distinguish between cases 3
and 4, but the distinction is important, because graphs with type-3 gradients are ok
to run, whereas graphs with type-4 gradients are not.
So I suggested that Joseph return a type-4 gradient by defining an :class:`Op` with no
perform method. The idea is that this would suit the graph-construction phase, but would
prevent linking. How does that sound to you?
**This documentation is useful when we show users how to write :class:`Op`\s.**
======================================
What is staticmethod, st_impl?
======================================
``st_impl`` is an optional method in an :class:`Op`.
``@staticmethod`` is a Python decorator for a class method that does not
implicitly take the class instance as a first argument. Hence, st_impl
can be used for :class:`Op` implementations when no information from the :class:`Op`
instance is needed. This can be useful for testing an implementation.
See the ``XlogX`` class below for an example.
**This documentation is useful when we show users how to write :class:`Op`\s.
Olivier says this behavior should be discouraged but I feel that st_impl
should be encouraged where possible.**
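The pattern itself is plain Python and can be shown without PyTensor. A hypothetical sketch (the class below is illustrative only, not a real :class:`Op`):

```python
import math


class LogFactorialLike:
    # The st_impl pattern: a @staticmethod holds the raw computation, so it
    # can be called and tested without constructing an instance and without
    # using any information from `self`.
    @staticmethod
    def st_impl(x):
        return math.lgamma(x + 1)  # log(x!) via the log-gamma function

    def impl(self, x):
        # The instance method simply defers to the static implementation.
        return LogFactorialLike.st_impl(x)
```

A test can call ``LogFactorialLike.st_impl(5)`` directly, with no instance, and check that it equals ``log(120)``.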
============================================================
how do we write scalar ops and upgrade them to tensor ops?
============================================================
**Olivier says that** :class:`~pytensor.tensor.xlogx.XlogX` **gives a good example. In fact, I would
like to beef xlogx up into our running example for demonstrating how to
write an :class:`Op`:**
.. code-block:: python

    import numpy

    import pytensor.scalar as scalar
    import pytensor.tensor.elemwise


    class XlogX(scalar.UnaryScalarOp):
        """
        Compute X * log(X), with special case 0 log(0) = 0.
        """
        @staticmethod
        def st_impl(x):
            if x == 0.0:
                return 0.0
            return x * numpy.log(x)

        def impl(self, x):
            return XlogX.st_impl(x)

        def grad(self, inp, grads):
            x, = inp
            gz, = grads
            return [gz * (1 + scalar.log(x))]

        def c_code(self, node, name, inp, out, sub):
            x, = inp
            z, = out
            if node.inputs[0].type in [scalar.float32, scalar.float64]:
                return """%(z)s =
                    %(x)s == 0.0
                    ? 0.0
                    : %(x)s * log(%(x)s);""" % locals()
            raise NotImplementedError('only floating point is implemented')

    scalar_xlogx = XlogX(scalar.upgrade_to_float, name='scalar_xlogx')
    xlogx = pytensor.tensor.elemwise.Elemwise(scalar_xlogx, name='xlogx')
**It is also necessary to talk about UnaryScalarOp vs. BinaryOp.**
UnaryScalarOp is the same as scalar.ScalarOp with member variable nin=1.
**give an example of this**
=======================================================
How to use the `PrintOp`
=======================================================
**This is also useful in the How to write an :class:`Op` tutorial.**
=======================================================
Mammouth
=======================================================
**This is internal documentation. Guillaume can you make sure to hit these points:**

::

    export PYTENSOR_BLAS_LDFLAGS='-lmkl -liomp5 -fopenmp'

**Do we want the following?**

::

    export OMP_NUM_THREADS=2
=======================================================
Type checking
=======================================================
* Are there functions for doing type checking?
  e.g. checking that the dtype of this matrix is an int-type (not just int32
  or int64).
  ``if isinstance(item, int):`` is the preferred way to do it in
  Python now, so mimic this.
  If the type is wrong, what exception should be raised?
======================================
More simple numpy stuff
======================================
* If we have a matrix with only one row, how do we convert it to a vector?

  ``x.reshape(x.size)``

  You can also use ``resize``, but there is no reason to *resize*.

* How do you convert the type of a numpy array?

  ``pytensor._asarray(x, dtype='int32')``

  Note that using ``numpy.asarray`` is potentially dangerous, due to
  a problem in numpy where the type may not be properly set (see
  numpy's Trac ticket #870).
=========================================
How to reuse (overwrite) a storage tensor
=========================================
Use ``pytensor.compile.io.Out(gw1, borrow=True)`` for that value in
``pytensor.compile.function.function``.
==============
Other software
==============

Other software to look at and maybe recommend to users:

* `PyTables <http://www.pytables.org/moin>`_ - This is looking really
  promising for dataset storage and experiment logging... This might
  actually be useful for large data sets.
* `MatPlotLib <http://matplotlib.sourceforge.net/>`_ - visualization tools
  (plot curves interactively, like matlab's figure window)
* `PIL <http://www.pythonware.com/products/pil/>`_ - Python Image Library:
  write your matrices out in png! (Kinda a weird recommendation, I think)
* `pylint <http://www.logilab.org/857>`_ - Syntax checker for Python to
  help beautify your code. (We'd be hypocrites to recommend this :)
* `Winpdb <http://www.winpdb.org/>`_ - A Platform Independent Python
  Debugger. (Except it doesn't really help you debug PyTensor graphs)
* `Python Integrated Development Environments
  <http://wiki.python.org/moin/IntegratedDevelopmentEnvironments>`_ - for all
  your coding needs
.. _sparse:

===============
Sparse matrices
===============

scipy.sparse
------------

Note that you want SciPy >= 0.7.2.

.. warning::

    In SciPy 0.6, `scipy.csc_matrix.dot` has a bug with singleton
    dimensions. There may be more bugs. It also has an inconsistent
    implementation of sparse matrices.

    We do not test against SciPy versions below 0.7.2.
We describe the details of the compressed sparse matrix types.

`scipy.sparse.csc_matrix`
    should be used if there are more rows than columns (``shape[0] > shape[1]``).
`scipy.sparse.csr_matrix`
    should be used if there are more columns than rows (``shape[0] < shape[1]``).
`scipy.sparse.lil_matrix`
    is faster if we are modifying the array. After the initial inserts,
    we can then convert to the appropriate sparse matrix format.
The following types also exist:

`dok_matrix`
    Dictionary of Keys format. From their doc: This is an efficient structure
    for constructing sparse matrices incrementally.
`coo_matrix`
    Coordinate format. From the lil doc: consider using the COO format when
    constructing large matrices.

There seems to be a new format planned for SciPy 0.7.x:

`bsr_matrix`
    Block Compressed Row (BSR). From their doc: The Block Compressed Row
    (BSR) format is very similar to the Compressed Sparse Row (CSR)
    format. BSR is appropriate for sparse matrices with dense sub matrices
    like the last example below. Block matrices often arise in vector-valued
    finite element discretizations. In such cases, BSR is considerably more
    efficient than CSR and CSC for many sparse arithmetic operations.
`dia_matrix`
    Sparse matrix with DIAgonal storage.
There are four member variables that comprise a compressed matrix ``sp`` (for at least csc, csr and bsr):

``sp.shape``
    gives the shape of the matrix.
``sp.data``
    gives the values of the non-zero entries. For CSC, these should
    be in order from (I think, not sure) reading down in columns,
    starting at the leftmost column until we reach the rightmost
    column.
``sp.indices``
    gives the location of each non-zero entry. For CSC, this is the
    row location.
``sp.indptr``
    gives the other location of the non-zero entries. For CSC, there are
    as many values of indptr as there are ``columns + 1`` in the matrix.
    ``sp.indptr[k] = x`` and ``sp.indptr[k+1] = y`` means that column
    ``k`` contains ``sp.data[x:y]``, i.e. the ``x``-th through the
    ``(y-1)``-th non-zero values.

See the example below for details.
.. code-block:: python

    >>> import scipy.sparse
    >>> sp = scipy.sparse.csc_matrix((5, 10))
    >>> sp[4, 0] = 20
    SparseEfficiencyWarning: changing the sparsity structure of a csc_matrix is expensive. lil_matrix is more efficient.
    >>> sp[0, 0] = 10
    >>> sp[2, 3] = 30
    >>> sp.todense()
    matrix([[ 10.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.],
            [  0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.],
            [  0.,   0.,   0.,  30.,   0.,   0.,   0.,   0.,   0.,   0.],
            [  0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.],
            [ 20.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.]])
    >>> print(sp)
      (0, 0)    10.0
      (4, 0)    20.0
      (2, 3)    30.0
    >>> sp.shape
    (5, 10)
    >>> sp.data
    array([ 10.,  20.,  30.])
    >>> sp.indices
    array([0, 4, 2], dtype=int32)
    >>> sp.indptr
    array([0, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3], dtype=int32)
Several things should be learned from the above example:

* We actually used the wrong sparse matrix type. In fact, it is the
  *rows* that are sparse, not the columns. So, it would have been
  better to use ``sp = scipy.sparse.csr_matrix((5, 10))``.
* We should have actually created the matrix as a `lil_matrix`,
  which is more efficient for inserts. Afterwards, we should convert
  to the appropriate compressed format.
* ``sp.indptr[0] = 0`` and ``sp.indptr[1] = 2``, which means that
  column 0 contains ``sp.data[0:2]``, i.e. the first two non-zero values.
* ``sp.indptr[3] = 2`` and ``sp.indptr[4] = 3``, which means that column
  three contains ``sp.data[2:3]``, i.e. the third non-zero value.
TODO: Rewrite this documentation to do things in a smarter way.
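The indptr arithmetic above can be verified with a short pure-Python sketch (a hypothetical helper, stdlib only) that rebuilds a dense matrix from the three CSC members:

```python
def csc_to_dense(shape, data, indices, indptr):
    # Rebuild a dense matrix from CSC members: indptr[k]:indptr[k+1]
    # slices data/indices for column k, and indices holds row numbers.
    n_rows, n_cols = shape
    dense = [[0.0] * n_cols for _ in range(n_rows)]
    for col in range(n_cols):
        for pos in range(indptr[col], indptr[col + 1]):
            dense[indices[pos]][col] = data[pos]
    return dense


# The example above: data [10, 20, 30], row indices [0, 4, 2], and
# indptr [0, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3] for a 5 x 10 matrix.
dense = csc_to_dense((5, 10), [10.0, 20.0, 30.0], [0, 4, 2],
                     [0, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3])
assert dense[0][0] == 10.0 and dense[4][0] == 20.0 and dense[2][3] == 30.0
```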
Speed
-----

For faster sparse code:

* Construction: the LIL format is fast for many inserts.
* Operators: "Since conversions to and from the COO format are
  quite fast, you can use this approach to efficiently implement lots
  [of] computations on sparse matrices." (Nathan Bell on the scipy mailing list)
Misc
----

The sparse equivalents of `dmatrix` are `csc_matrix` and `csr_matrix`.
:class:`~pytensor.sparse.basic.Dot` vs. :class:`~pytensor.sparse.basic.StructuredDot`
-------------------------------------------------------------------------------------
Often when you use a sparse matrix, it is because there is a meaning to the
structure of the non-zeros. The gradient on terms outside that structure
has no meaning, so it is computationally efficient not to compute them.

`StructuredDot` is for when you want the gradient to have zeros corresponding to
the sparse entries in the matrix.

`TrueDot` and `StructuredDot` have different gradients,
but their perform functions should be the same.
The gradient of `TrueDot` can have non-zeros where the sparse matrix had zeros.
The gradient of `StructuredDot` can't.
Suppose you have ``dot(x, w)``, where ``x`` and ``w`` are square matrices.
If ``w`` is dense, like ``standard_normal((5, 5))``, and ``x`` is of full rank (though
potentially sparse, like a diagonal matrix of ones), then the output will
be dense too.

What's important is the density of the gradient on the output.
If the gradient on the output is dense, and ``w`` is dense (as we said it was),
then the *true* gradient on ``x`` will be dense.
If our dot is a `TrueDot`, then it will say that the gradient on ``x`` is dense.
If our dot is a `StructuredDot`, then it will say the gradient on ``x`` is only
defined on the diagonal and ignore the gradients on the off-diagonal.
.. _tensoroptools:

================
Tensor Op Tools
================

WRITEME - describe how to use Elemwise here