Commit ef629114 authored by Joseph Turian

Moved trac wiki into sphinx

Parent fb9edf78
......@@ -20,4 +20,3 @@ Contents
glossary
links
internal/index
sandbox/index
......@@ -64,4 +64,33 @@ old changes by hand.
For more info, check out the `homepage <http://www.selenic.com/mercurial/wiki/>`_ and `hg book <http://hgbook.red-bean.com/hgbook.html>`_.
Tip: Commit before pull
------------------------
"This is the general rule of thumb when using Mercurial: finish your work
and commit it before you start pulling in stuff from the outside world."
-Martin Geisler
(http://www.selenic.com/pipermail/mercurial/2008-April/018817.html)
Tip: Graph logs
---------------
Update your ``.hgrc``::

    [ui]
    username = Foo Bar <barfoo@iro.umontreal.ca>
    [extensions]
    hgext.graphlog =

Now try::

    hg glog
Troubleshooting
---------------
If you get the message "abort: push
creates new remote heads!", read `this thread
<http://www.selenic.com/pipermail/mercurial/2008-April/018804.html>`_
to understand why it happens.
......@@ -5,6 +5,9 @@
Internal documentation
======================
If you're feeling ambitious, go fix some `pylint
<http://lgcm.iro.umontreal.ca/auto_theano_pylint/pylint_global.html>`_ errors!
Structure
=========
......@@ -12,6 +15,7 @@ Structure
:maxdepth: 2
dev_start_guide
python
hg_primer
mammouth
metadocumentation
......@@ -6,13 +6,13 @@ LISA Labo specific instructions
===============================
Tips for running at LISA
++++++++++++++++++++++++
------------------------
Use the fast BLAS library that Fred installed, by setting
``THEANO_BLAS_LDFLAGS=-lgoto``.
Tips for running on a cluster
+++++++++++++++++++++++++++++
-----------------------------
OUTDATED (was for mammouth1; should be updated for mammouth2)
......
......@@ -51,6 +51,16 @@ API documentation is processed by `epydoc
<http://epydoc.sourceforge.net/manual-othermarkup.html#restructuredtext>`__
for details about how to use reST with epydoc documentation.
Use ReST for API and Sphinx documentation
-----------------------------------------
* ReST is standardized; epydoc markup and trac wiki markup are not.
This means that ReST can be cut-and-pasted between epydoc, code, other
docs, and TRAC. This is a huge win!
* ReST is extensible: we can write our own roles and directives, for example to link automatically to the wiki.
* ReST has figure and table directives, and can be converted (using a standard tool) to LaTeX documents.
* No text documentation format has good support for math rendering, but ReST comes closest: it has three renderer-specific solutions (render LaTeX, use LaTeX to build images for HTML, use itex2mml to generate MathML).
How to build documentation
---------------------------------------
......
.. _python:
================
Python booster
================
`This page
<http://wordaligned.org/articles/essential-python-reading-list>`_ will
give you a warm feeling in your stomach.
Non-Basic Python features
-------------------------
Theano doesn't use your grandfather's python.
* properties
a managed attribute whose get and set methods Python invokes automatically on attribute access.
See `New style classes <http://www.python.org/doc/newstyle/>`_.
* static methods vs. class methods vs. instance methods
* Decorators:
.. code-block:: python

    @f
    def g():
        ...

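is equivalent to ``g = f(g)``: ``f`` is called once, when ``g`` is defined, and the name ``g`` is rebound to whatever ``f`` returns (typically a wrapper that runs extra code around each call to the original ``g``).
See `PEP 0318 <http://www.python.org/dev/peps/pep-0318/>`_.
``staticmethod`` is a built-in (since Python 2.2) that is usually applied as a decorator; the ``@`` decorator syntax itself appeared in Python 2.4.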
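* ``__metaclass__`` is a bit like a decorator for classes: the metaclass is called to build the class when the ``class`` statement runs, so its ``__new__`` and ``__init__`` execute right after the class body has been evaluated.
* ``setattr`` + ``getattr`` + ``hasattr``
* ``*args`` collects extra positional arguments into a tuple (a bit like ``argv`` in C/C++); ``**kwargs`` collects extra keyword arguments into a dict.
* ``pass`` is a no-op.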
* functions (function objects) can have attributes too. This technique
is often used to define a function's error messages.
.. code-block:: python

    def f():
        return f.a

    f.a = 5
    f()  # returns 5

* Warning about mutual imports:
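* script a.py defined a class A.
* script a.py imported file b.py
* file b.py imported a, and instantiated a.A()
* script a.py instantiated its own A(), and passed it to a function in b.py
* that function saw its argument as being of type ``__main__.A``, not ``a.A``: a file run as a script is loaded as the module ``__main__``, so the ``import a`` in b.py loads a.py a second time, as a separate module ``a`` with its own, distinct class ``A``.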
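The scenario above can be reproduced with a small, self-contained sketch. The file contents are a hypothetical minimal ``a.py``/``b.py`` pair written to a temporary directory, and ``run_demo`` is a made-up helper name; it assumes Python 3 (for ``tempfile.TemporaryDirectory``).

.. code-block:: python

    import os
    import subprocess
    import sys
    import tempfile
    import textwrap

    # Hypothetical script a.py: imports b, defines A, passes an A() to b.
    A_PY = textwrap.dedent("""
        import b

        class A(object):
            pass

        if __name__ == "__main__":
            b.inspect(A())
    """)

    # Hypothetical module b.py: its "import a" loads a.py again, as module "a".
    B_PY = textwrap.dedent("""
        import a

        def inspect(obj):
            cls = type(obj)
            print("%s.%s" % (cls.__module__, cls.__name__))
            print(isinstance(obj, a.A))
    """)


    def run_demo():
        """Write a.py and b.py to a temp dir, run a.py as a script, return its output words."""
        with tempfile.TemporaryDirectory() as tmp:
            for name, source in (("a.py", A_PY), ("b.py", B_PY)):
                with open(os.path.join(tmp, name), "w") as handle:
                    handle.write(source)
            output = subprocess.check_output(
                [sys.executable, "a.py"], cwd=tmp, universal_newlines=True)
        return output.split()

``run_demo()`` returns ``['__main__.A', 'False']``: the object was built from ``__main__.A``, and it is not an instance of ``a.A``.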
Incidentally, this behaviour is one of the big reasons to put autotests in
different files from the classes they test!
If all the test cases were put into <file>.py directly, then during the test
cases, all <file>.py classes instantiated by unit tests would have type
``__main__.<classname>``, instead of type ``<file>.<classname>``. This should never
happen under normal usage, and can cause problems (like the one you are/were
experiencing).
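Several of the features listed above fit in one small sketch. The ``Temperature`` class, the ``logged`` decorator and the ``call_log`` list are hypothetical names chosen for illustration (plain Python, not Theano code); the ``@prop.setter`` form assumes Python 2.6 or later.

.. code-block:: python

    import functools


    class Temperature(object):
        """Hypothetical class showing a property and the three kinds of methods."""

        def __init__(self, celsius=0.0):
            self._celsius = celsius

        @property
        def celsius(self):
            # getter: runs automatically on "t.celsius"
            return self._celsius

        @celsius.setter
        def celsius(self, value):
            # setter: runs automatically on "t.celsius = value"
            if value < -273.15:
                raise ValueError("below absolute zero")
            self._celsius = value

        def fahrenheit(self):
            # instance method: receives the instance as "self"
            return self._celsius * 9.0 / 5.0 + 32.0

        @classmethod
        def from_fahrenheit(cls, degrees):
            # class method: receives the class as "cls" (handy for alternate constructors)
            return cls((degrees - 32.0) * 5.0 / 9.0)

        @staticmethod
        def is_freezing(celsius):
            # static method: receives neither the instance nor the class
            return celsius <= 0.0


    call_log = []


    def logged(fn):
        """Hypothetical decorator: this body runs once, when the function is defined."""
        call_log.append("decorating " + fn.__name__)

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # args is a tuple of positional arguments, kwargs a dict of keyword ones
            call_log.append("calling %s with %r %r" % (fn.__name__, args, kwargs))
            return fn(*args, **kwargs)

        return wrapper


    @logged  # equivalent to: add = logged(add)
    def add(x, y=0):
        return x + y

After the definition, ``call_log`` already holds ``'decorating add'``; each later call such as ``add(1, y=2)`` appends one ``'calling ...'`` entry before running the original function.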
......@@ -7,17 +7,23 @@ Introduction
Theano is a Python library that allows you to define, optimize, and
efficiently evaluate mathematical expressions involving multi-dimensional
arrays. Theano was written at the LISA_ lab to support the development
of efficient machine learning algorithms while minimizing human time. We
use it especially in gradient-based learning techniques.
arrays. Using Theano, it is not uncommon to see speed improvements of
ten-fold over using pure NumPy.
The term "mathematical expressions" is used broadly to mean a
computation with some inputs, some of which may be updated
in place. Neural net forward propagation is an expression, as is fprop +
bprop + a weight update implementing stochastic gradient descent. Feature
extraction of zero-crossing, 16 MFCC, and 128 rceps coefficients is also
an expression.
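To make "fprop + bprop + weight update" concrete, here is a hedged sketch in plain NumPy (not Theano's API) of one gradient step for logistic regression. The function names, the learning rate and the full-batch update are assumptions for illustration; stochastic gradient descent would apply the same step to a random minibatch. The weight vector ``w`` is one of the inputs and is updated in place.

.. code-block:: python

    import numpy as np


    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))


    def mean_cross_entropy(w, b, x, y):
        """Average logistic loss; used only to check that the step helps."""
        p = sigmoid(x.dot(w) + b)
        return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


    def gradient_step(w, b, x, y, lr=0.1):
        """fprop + bprop + weight update; w (a float array) is modified in place.

        The bias is returned instead of updated in place because a Python
        float is immutable (a convention chosen for this sketch).
        """
        p = sigmoid(x.dot(w) + b)      # fprop
        grad_z = (p - y) / len(y)      # bprop: d(mean loss) / d(pre-activation)
        w -= lr * x.T.dot(grad_z)      # in-place update of an input
        return b - lr * grad_z.sum()

Calling ``b = gradient_step(w, b, x, y)`` repeatedly on a float array ``w`` lowers ``mean_cross_entropy(w, b, x, y)`` for a small enough learning rate.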
Theano melds some aspects of a computer algebra system (CAS) with
aspects of an optimizing compiler. It can even transform some or all
of the expression into C code and compile it into native machine
instructions. This combination of CAS with optimizing compilation
is particularly useful for computational fields in which complicated
mathematical expressions are evaluated numerous times over large data
sets.
aspects of an optimizing compiler. It can even transform some or
all of the mathematical expression into C code and compile it into
native machine instructions. This combination of CAS with optimizing
compilation is particularly useful for computational fields in which
complicated mathematical expressions are evaluated numerous times over
large data sets.
Theano supports a range of numerical types in multiple dimensions and
a number of well-tested operations. It also allows you to compute the
......@@ -36,10 +42,12 @@ not limited to:
* using inplace operations wherever it is safe to do so.
Theano defines several optimizations which improve the numerical
stability of computations. It also provides a framework to add and test
new optimizers.
stability of computations.
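The text does not list which stability rewrites Theano performs. As a hedged illustration of the kind of rewrite meant, here is a classic one in plain Python (not Theano code): computing softplus, ``log(1 + exp(x))``, in a form that neither overflows for large ``x`` nor loses precision for very negative ``x``. The function names are hypothetical.

.. code-block:: python

    import math


    def softplus_naive(x):
        """log(1 + exp(x)) written directly; math.exp overflows for x above ~709."""
        return math.log(1.0 + math.exp(x))


    def softplus_stable(x):
        """Same value, rewritten as max(x, 0) + log1p(exp(-|x|)).

        exp(-|x|) is at most 1, so it never overflows, and log1p keeps
        precision when its argument is tiny.
        """
        return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

``softplus_naive(1000.0)`` raises ``OverflowError`` while ``softplus_stable(1000.0)`` returns ``1000.0``; at ``x = -40.0`` the naive form rounds ``1 + exp(x)`` to ``1.0`` and returns ``0.0``, whereas the stable form keeps the tiny positive value.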
Theano was named after the `Greek mathematician`_, who may have
Theano was written at the LISA_ lab to support the development
of efficient machine learning algorithms while minimizing human time. We
use it especially in gradient-based learning techniques.
Theano is named after the `Greek mathematician`_, who may have
been Pythagoras' wife.
Theano is released under a BSD license (:ref:`link <license>`).
......
......@@ -6,6 +6,15 @@
NumPy refresher
===============
Here are some quick guides to NumPy:
* `NumPy quick guide for Matlab users <http://www.scipy.org/NumPy_for_Matlab_Users>`__
* `More detailed table showing the NumPy equivalent of Matlab commands <http://www.scribd.com/doc/26685/Matlab-Python-and-R>`__
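As a quick taste of the guides above, here is a minimal sketch of a few NumPy/Matlab correspondences; the variable names are arbitrary and the Matlab equivalents are given in comments. The guides remain the reference for the full tables.

.. code-block:: python

    import numpy as np

    a = np.array([[1.0, 2.0], [3.0, 4.0]])
    b = np.array([[5.0, 6.0], [7.0, 8.0]])

    product = np.dot(a, b)       # Matlab: a * b    (matrix product)
    elementwise = a * b          # Matlab: a .* b   (element-wise product)
    transposed = a.T             # Matlab: a.'      (plain transpose)
    first_row = a[0, :]          # Matlab: a(1, :)  (NumPy indices start at 0)
    column_sums = a.sum(axis=0)  # Matlab: sum(a)   (sums down each column)

The most common surprise for Matlab users is that ``*`` on NumPy arrays is element-wise, so matrix products need ``np.dot``.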
.. TODO [DefineBroadcasting Broadcasting]
.. Broadcastable - Implicitly assume that all previous entries are true.
.. [TODO: More doc, e.g. see _test_tensor.py]
---------------------------------------
Matrix conventions for machine learning
......
The following may go either in:
a) numpy refresher.
b) more details of broadcasting in the types section.
broadcastable
-------------
The ``broadcastable`` field of a ``Tensor`` must be a tuple of boolean values. Each value corresponds to a dimension of the ``Tensor`` and specifies whether the ``Tensor`` can be "broadcast" along that dimension.
A value of ``True`` means two things:
* The size of the corresponding dimension will necessarily be 1.
* If needed, the ``Tensor`` can be *broadcast* (*replicated*) along the corresponding dimension to emulate a larger ``Tensor``.
A value of ``False`` means that the corresponding dimension can take any nonnegative size and that the ``Tensor`` cannot be replicated along it (regardless of whether its size is 1 or not).
Example: to define a *row* type, set broadcastable to ``(True, False)``: this means the shape must be like ``(1, n)``. If you add a row of shape ``(1, n)`` to a matrix of shape ``(m, n)``, the row will be broadcast (replicated) ``m`` times along the first dimension, producing a virtual matrix of the correct size ``(m, n)``. Therefore, adding a row to a matrix adds the row to each row of the matrix. If the ``broadcastable`` value for the first dimension of the row were ``False``, the operation would instead raise an exception complaining that the dimensions do not match.
Similarly, the broadcastable pattern for a column is ``(False, True)``: this means the shape must be like ``(m, 1)``, so adding a column to a matrix adds that column to each column of the matrix. Several Ops, such as ``DimShuffle``, can add or remove broadcastable dimensions.
The length of ``broadcastable`` is the number of dimensions of the ``Tensor``.
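The row and column examples above can be checked with NumPy, with one caveat: NumPy decides at run time and broadcasts any dimension whose size is 1, whereas a Theano ``broadcastable`` pattern is fixed in the type ahead of time. The helper ``matches_broadcastable`` is hypothetical and only encodes the rules stated above; it is not part of Theano's API.

.. code-block:: python

    import numpy as np


    def matches_broadcastable(shape, broadcastable):
        """Hypothetical helper: does a concrete shape fit a broadcastable pattern?

        True entries force size 1; False entries allow any nonnegative size.
        """
        if len(shape) != len(broadcastable):
            return False  # the pattern's length is the number of dimensions
        return all(size == 1 for size, bcast in zip(shape, broadcastable) if bcast)


    def adds_cleanly(a, b):
        """Return False when NumPy refuses to broadcast the two operands."""
        try:
            a + b
        except ValueError:
            return False
        return True


    m, n = 3, 4
    matrix = np.zeros((m, n))
    row = np.arange(n, dtype=float).reshape(1, n)  # fits (True, False)
    col = np.arange(m, dtype=float).reshape(m, 1)  # fits (False, True)

    row_sum = matrix + row  # row replicated m times along the first dimension
    col_sum = matrix + col  # column replicated n times along the second dimension

Adding a ``(2, n)`` array to the ``(3, n)`` matrix fails with ``ValueError``, the NumPy counterpart of the exception described above.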
Want to know about Theano's `function design
<http://groups.google.com/group/theano-dev/browse_thread/thread/fd4c6947d8a20510>`_?
**Historical interest. This has been addressed for now. 20080904**
There are several `project hosting services <http://en.wikipedia.org/wiki/Comparison_of_free_software_hosting_facilities>`_ online, but none is perfect for Theano.
Wishlist:
- version control (mercurial)
- bugtracker (TRAC, ideally)
- wiki
- release file hosting
- mailing list
- reliability of hosting service
Currently, `sharesource <http://sharesource.org/>`_ and `assembla <http://www.assembla.com/>`_ are the only hosting services that support mercurial that I know of. Sharesource is young, but supports all the required features. I'll make an account, and see what I can do with it...
Should we get a domain name? To my dismay, theano.org, theano.com and theano.net are all taken. The first two seem legit, but theano.net doesn't look like it has anything on it and expires on May 29, so maybe there's a chance we can snag it? -ob
We could also get http://www.theano.io. -jpt
--------
On Fri, May 09, 2008 at 03:49:31PM -0400, Joseph Turian wrote:
> Another option for backup:
>
> Since we have access to LGCM, there is a single SQLite db file (AFAIK)
> that we can back up periodically.
> e.g. cron job to gzip and email it to us once a week.
There are instructions for how to backup a Trac site, i just haven't gotten
around to it. Currently, the whole directory is rsynced to the lisa account,
which is close to ok, but not quite.
> Besides mailing list, is there anything else we need? Besides figuring
> out how to administer trac? :}
Writing scripts to update p-omega1/.ssh/authorized_keys2 automatically from
certain user accounts' authorized_keys2 file. I've written this script, but not
really tested it.
Hooking up mercurial to trac would be nice, so we can associate commits and
tickets.
lgcm's uptime is usually about a week or two at max, so there's the pain in the
ass of having to re-log in, start up a screen session, find the directories,
restart trac, restart hg serve. We should be restarting hg serve for tlearn too
soon.
Even if I do set up the authorized_keys2 script to do the right thing, the users
on TRAC and the users on the system are totally independent, so adding a new
user is non-standard and only I can do it right now.
My choices seem to be:
- document all these hoops and good ideas
- fix them so they are easier to use and document
- replace them with hosting service
All of these options take time, mental effort, and the support of our
development group (look at the large number of messages today on the topic)... so
i'm trying to find the least of all evils. The Right Thing doesn't seem to have
appeared yet.
Theano uses several tricks to obtain good performance:
* common sub-expression elimination
* [custom generated] C code for many operations
* pre-allocation of temporary storage
* loop fusion (which gcc normally can't do)
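Common sub-expression elimination can be sketched on a toy expression tree. This is a hedged illustration of the general technique (sharing structurally equal sub-trees), not Theano's optimizer; the tuple representation and the helpers ``mul``, ``add``, ``cse`` and ``evaluate`` are hypothetical.

.. code-block:: python

    import operator

    # Hypothetical operator table for the toy expression language.
    OPS = {"add": operator.add, "mul": operator.mul}


    def mul(a, b):
        return ("mul", a, b)


    def add(a, b):
        return ("add", a, b)


    def cse(expr, table):
        """Map structurally equal sub-expressions to one shared node."""
        if isinstance(expr, tuple):
            key = (expr[0],) + tuple(cse(arg, table) for arg in expr[1:])
        else:
            key = expr  # leaf: a variable name
        return table.setdefault(key, key)


    def evaluate(node, env, cache, counter):
        """Evaluate a graph, computing each node object at most once."""
        if not isinstance(node, tuple):
            return env[node]
        if id(node) not in cache:
            args = [evaluate(arg, env, cache, counter) for arg in node[1:]]
            counter[0] += 1  # one arithmetic operation performed
            cache[id(node)] = OPS[node[0]](*args)
        return cache[id(node)]


    # (x * y) + (x * y): the two products are separate tuples before CSE.
    expr = add(mul("x", "y"), mul("x", "y"))

Without CSE the toy graph performs three operations; after ``cse(expr, {})`` the two products are the same node, so ``evaluate`` performs two.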
In my neural net experiments for my course projects, I got roughly 10x
speed improvements over basic NumPy by using Theano.
[More specific speed tests would be nice.]
With a little work, Theano could also implement more sophisticated
optimizations:
* automatic ordering of matrix multiplications
* profile-based memory layout decisions (e.g. row-major vs. col-major)
* gcc intrinsics that use MMX/SSE2 parallelism for faster element-wise arithmetic
* conditional expressions
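Automatic ordering of matrix multiplications is classically solved with the matrix-chain dynamic program sketched below. This is a hedged illustration of the textbook algorithm, not something Theano is claimed to implement; ``dims`` follows the usual convention that matrix ``i`` has shape ``dims[i] x dims[i + 1]``, and the function names are hypothetical.

.. code-block:: python

    def matrix_chain_order(dims):
        """Return (cost, split) tables for multiplying len(dims) - 1 matrices.

        cost[i][j] is the minimum number of scalar multiplications for
        M_i ... M_j; split[i][j] is where the last multiplication happens.
        """
        n = len(dims) - 1
        cost = [[0] * n for _ in range(n)]
        split = [[0] * n for _ in range(n)]
        for length in range(2, n + 1):
            for i in range(n - length + 1):
                j = i + length - 1
                cost[i][j] = float("inf")
                for k in range(i, j):
                    c = cost[i][k] + cost[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                    if c < cost[i][j]:
                        cost[i][j] = c
                        split[i][j] = k
        return cost, split


    def parenthesize(split, i, j):
        """Render the optimal multiplication order as a string."""
        if i == j:
            return "M%d" % i
        k = split[i][j]
        return "(%s %s)" % (parenthesize(split, i, k), parenthesize(split, k + 1, j))

For three matrices of shapes 10x30, 30x5 and 5x60, ``matrix_chain_order([10, 30, 5, 60])`` picks ``((M0 M1) M2)`` at 4500 scalar multiplications, versus 27000 for ``(M0 (M1 M2))``.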
Other software to look at and maybe recommend to users:
* `PyTables <http://www.pytables.org/moin>`_ - This is looking really
promising for dataset storage and experiment logging... This might
actually be useful for large data sets.
* `MatPlotLib <http://matplotlib.sourceforge.net/>`_ - visualization tools
(plot curves interactively, like matlab's figure window)
* `PIL <http://www.pythonware.com/products/pil/>`_ - Python Imaging Library:
write your matrices out in png! (Kinda a weird recommendation, I think)
* `pylint <http://www.logilab.org/857>`_ - Syntax checker for python to
help beautify your code. (We'd be hypocrites to recommend this :)
* `Winpdb <http://www.winpdb.org/>`_ - A Platform Independent Python
Debugger. (Except it doesn't really help you debug Theano graphs)
* `Python Integrated Development Environments <http://wiki.python.org/moin/IntegratedDevelopmentEnvironments>`_ - for all your coding needs
......@@ -87,8 +87,17 @@ Several things should be learned from the above example:
TODO: Rewrite this documentation to do things in a smarter way.
Speed
-----
For faster sparse code:
* Construction: the LIL format (``lil_matrix``) is fast for many inserts.
* Operators: "Since conversions to and from the COO format are
quite fast, you can use this approach to efficiently implement lots
computations on sparse matrices." (Nathan Bell on scipy mailing list)
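The two tips above can be combined in a short sketch with ``scipy.sparse``: build with ``lil_matrix``, convert once with ``tocsr()``, and go through a COO view to apply an element-wise operation to the stored entries. The helper ``square_nonzeros`` is hypothetical, and this trick is only valid for operations that map zero to zero (squaring is one example).

.. code-block:: python

    import numpy as np
    import scipy.sparse as sp

    # Construction: LIL handles many incremental inserts cheaply.
    lil = sp.lil_matrix((3, 4))
    lil[0, 1] = 2.0
    lil[1, 3] = -1.0
    lil[2, 0] = 5.0
    csr = lil.tocsr()  # convert once construction is done, for fast arithmetic


    def square_nonzeros(matrix):
        """Square every stored entry via COO (valid because 0 ** 2 == 0)."""
        coo = matrix.tocoo()
        squared = sp.coo_matrix((coo.data ** 2, (coo.row, coo.col)), shape=coo.shape)
        return squared.tocsr()


    squared = square_nonzeros(csr)

The result keeps the same sparsity pattern as ``csr``, with each stored value squared.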
Misc
----------------------------------------
----
The sparse equivalents of ``dmatrix`` are ``csc_matrix`` and ``csr_matrix``.
:api:`TrueDot` vs. :api:`StructuredDot`
......
......@@ -13,7 +13,7 @@ from setuptools import setup, find_packages
setup(name="Theano",
version="0.1",
description="Optimizing compiler for mathematical expressions",
long_description="""Theano is a Python library that allows you to define, optimize, and efficiently evaluate mathematical expressions involving multi-dimensional arrays. Theano was written at the LISA lab to support the development of efficient machine learning algorithms while minimizing human time. We use it especially in gradient-based learning techniques.""",
long_description="""Theano is a Python library that allows you to define, optimize, and efficiently evaluate mathematical expressions involving multi-dimensional arrays. Using Theano, it is not uncommon to see speed improvements of ten-fold over using pure NumPy.""",
author="LISA laboratory, University of Montreal",
author_email="theano-dev@googlegroups.com",
packages=find_packages(exclude=["*.tests", "*.tests.*", "tests.*", "tests"]),
......