Merged

5fb197cc · delallea@valhalla.apstat.com · 3b1dc271 · 6e463193 · 5fb197cc · 3b1dc271
--- a/.hgignore
+++ b/.hgignore
@@ -3,6 +3,7 @@ syntax: glob
 *~
 \#*\#
 doc/oplist.txt
+doc/typelist.txt
 compiled/*.cpp
 cutils_ext.cpp
 html

--- a/README.1st
+++ b/README.1st
-THEANO
-
-Documentation et al is in Trac:
-   http://lgcm.iro.umontreal.ca:8000/theano/wiki/WikiStart
-
-The lisa twiki is deprecated for documenting Theano.
-
-Requirements:
-    scipy [version?]
-    numpy [version?]
-    Python >=2.5 (for function all)
-
-
-
--- a/README.txt
+++ b/README.txt
+==============
+README: theano
+==============
+
+.. contents::
+
+
+Project Description
+===================
+
+Theano is a python library for manipulating and evaluating expressions, especially matrix-valued ones.
+What does Theano do that Python and numpy do not?
+
+- *execution speed optimizations*: Theano can use `g++` to compile parts of your expression graph into native machine code, which runs much faster than Python.
+
+- *symbolic differentiation*: Theano can build symbolic graphs for computing gradients.
+
+- *stability optimizations*: Theano can recognize numerically unstable expressions and compute them with more stable algorithms.
+
+
+Here's a very simple example of how to use Theano.  It doesn't show off many of Theano's features, but it illustrates concretely what Theano is.
+
+.. code-block:: python
+
+    import theano
+    from theano import tensor
+
+    a = tensor.fscalar()            # declare a symbolic floating-point scalar.
+    b = tensor.fscalar()            # declare a symbolic floating-point scalar.
+
+    c = a + b                       # create a simple expression
+
+    f = theano.function([a,b], c)   # convert the expression into a callable object
+                                    # that takes (a,b) values as input and computes a value for c
+
+    assert 4.0 == f(1.5, 2.5)       # bind 1.5 to 'a', 2.5 to 'b', and evaluate 'c'
+
+Theano is not a programming language in the normal sense because you write a program in Python that builds expressions for Theano.  Still, it is like a programming language in the sense that, to use Theano, you have to:
+
+- declare variables (`a`, `b`) and give their types
+
+- build expressions for how to put those variables together
+
+- compile expression graphs to functions in order to use them for computation.
+
+It is good to think of `theano.function` as the interface to a compiler which builds a callable object from a purely symbolic graph.
+
+
+License
+-------
+
+Theano is licensed under a BSD-like license.  See the LICENSE file in the project root folder.
+
+
+Installation
+============
+
+(See also the :wiki:`InstallationNotes` on the wiki.)
+
+
+Software Requirements
+---------------------
+
+- linux or OS-X operating system
+
+- python 2.5
+
+- SciPy (specifically numpy, sparse, weave).  NumPy version 1.1 or later fixes a memory leak.
+
+- docutils, pygments (optional, to build documentation)
+
+- mercurial (optional, to download the source)
+
+- g++, python-dev (optional, to compile generated C code)
+
+-  `psyco <http://psyco.sourceforge.net/>`__ can make your python code much faster, if you are on a 32-bit x86 architecture.  If you use compiled C code, this can be less important.
+
+Downloading Theano
+------------------
+
+There are two ways to get the source: mercurial (required for library developers) and unix tar.
+There are no stable releases yet.
+
+*To get the source via mercurial,* you must have `mercurial <http://www.selenic.com/mercurial/wiki/>`__ installed.
+
+Get the source and run the auto-tests like this:
+
+.. code-block:: bash
+    
+    hg clone http://pylearn.org/hg/theano theano
+    cd theano
+    python autotest.py
+
+To update your library to the latest version on pylearn.org, change directory (`cd`) to this `theano` folder and type
+
+.. code-block:: bash
+
+    hg pull -u
+
+*To get the source via unix tar*, you can download the latest source directly as a gzip'd tar file:
+`<http://pylearn.org/hg/theano/archive/tip.tar.gz>`__.
+
+Two environment variables are used to control automatic code generation.
+(It is possible to use Theano in a way that avoids all automatic code generation, but the functions you make using `theano.function` will execute more slowly.)
+
+- `THEANO_BLAS_LDFLAGS`: 
+    a space-separated list of library names to link against for BLAS functions. Default: `-lblas`
+
+- `THEANO_COMPILEDIR`:
+    a directory with read/write access permissions, where theano will store
+    autogenerated code and c modules.  Default: `$HOME/.theano`.  If this
+    directory does not exist, or does not have the correct permissions, then
+    theano will try to create it with the correct permissions.  If that fails,
+    an exception will be raised and no C code will be compiled.
+
+Setup on Linux
++++++++++++++
+
+
+Setup on OS-X
+++++++++++++
+
+- Install `MacPorts <http://www.macports.org/>`__
+
+- `sudo port install gcc42 py25-zlib py25-numpy py25-scipy mercurial`.
+    Note that compiling gcc42 takes a significant time (hours) so it's probably
+    not the best solution if you're in a rush! In my (Doomie) experience, scipy
+    failed to compile the first time I tried the command, but the second time
+    it compiled just fine. Same thing with py25-zlib.
+
+
+- Install some kind of BLAS library (TODO: how?)
+
+- Set THEANO_BLAS_LDFLAGS to something which will link against said BLAS
+  library.  (e.g., `THEANO_BLAS_LDFLAGS='-lcblas -latlas -lgfortran'`).
+
+
+
+Setup on Windows
++++++++++++++++
+
+No one has done this yet. WRITEME.
+
+
+Tips for running at LISA
++++++++++++++++++++++++
+
+Use the fast BLAS library that Fred installed, by setting
+`THEANO_BLAS_LDFLAGS=-lgoto`.
+
+Tips for running on a cluster
+++++++++++++++++++++++++++++
+
+Use something like the following in your .bashrc:
+
+.. code-block:: bash
+
+    #use the intel math-kernel library for BLAS routines
+    THEANO_BLAS_LDFLAGS=-lmkl
+
+    # use up to two threads in the MKL routines
+    OMP_NUM_THREADS=2
+
+    # IMPORTANT!
+    # Use the local-temporary directory as a cache.
+    # If several jobs start simultaneously and use a common
+    # cache, then the cache may be corrupted.
+    # Theano is not process-safe or thread-safe in this sense.
+    THEANO_COMPILEDIR=/ltmp/<username>_theano
+
+
+Running the Test Suite
+======================
+
+Test your installation by running the autotests.  Type at the shell:
+
+.. code-block:: bash
+
+    cd theano
+    python2.5 autotest.py
+
+All tests should pass.
+
+
+Using Theano
+============
+
+Now that you've got theano installed and running, check out the `n00b tutorial <doc/n00b.html>`__ for how to use it.
+
+
+Getting Help
+============
+
+If these installation instructions don't work, search the theano-users archive for similar cases.  If you don't find a solution, write to theano-users and explain the situation.
+
+
+.. header:: |THEANO| - README_ - Download_ - Documentation_ - Wiki_ - `Task List`_
+
+.. _README: README.html
+.. _Download: README.html#downloading-theano
+.. _Documentation: doc/index.html
+.. _Wiki: http://pylearn.org/theano
+.. _task list: http://lgcm.iro.umontreal.ca/theano/query?status=accepted&status=assigned&status=new&status=reopened&group=milestone&max=200&col=id&col=summary&col=status&col=owner&col=type&col=priority&col=component&col=time&report=9&order=priority
+
+.. |THEANO| image:: http://lgcm.iro.umontreal.ca/theano/chrome/site/theano_logo.png
+   :target: http://pylearn.org/auto_theano
+   :alt: THEANO
+   :align: top
+   :class: borderless
+   :width: 60
+   :height: 18
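
Note on the cluster tip in README.txt: the `.bashrc` fragment warns that jobs sharing one compile cache may corrupt it. One way to follow that advice from Python, as a minimal sketch, is to give each process its own `THEANO_COMPILEDIR` before importing theano. Only the variable name `THEANO_COMPILEDIR` comes from the README; the helper name `per_job_compiledir`, the process-id suffix, and the temp-directory base are assumptions made for illustration. The README does not say when Theano reads the variable, so setting it before the import is the cautious choice. The sketch targets a current Python 3 interpreter rather than the Python 2.5 this changeset was written for.

```python
import os
import tempfile


def per_job_compiledir(base=None, user=None):
    """Create and export a compile directory unique to this process.

    Hypothetical helper: the README documents only the THEANO_COMPILEDIR
    variable; the naming scheme used here is an assumption.
    """
    base = base or tempfile.gettempdir()
    user = user or os.environ.get("USER", "unknown")
    # Suffix with the process id so simultaneous jobs never share a cache.
    path = os.path.join(base, "%s_theano_%d" % (user, os.getpid()))
    os.makedirs(path, exist_ok=True)
    os.environ["THEANO_COMPILEDIR"] = path
    return path


if __name__ == "__main__":
    # Call this before `import theano` so the setting is already in place.
    print(per_job_compiledir())
```

Passing a node-local scratch directory as `base` mirrors the `/ltmp/<username>_theano` suggestion in the README.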
--- a/__init__.py
+++ b/__init__.py
@@ -27,6 +27,7 @@ __docformat__ = "restructuredtext en"

 from gof import \
     CLinker, OpWiseCLinker, DualLinker, Linker, LocalLinker, PerformLinker, Profiler, \
+     Container, \
     InconsistencyError, Env, \
     Apply, Result, Constant, Value, \
     Op, \
@@ -35,7 +36,12 @@ from gof import \
     Type, Generic, generic, \
     object2, utils

-from compile import function, eval_outputs, fast_compute, OpFromGraph
+from compile import \
+    SymbolicInput, SymbolicInputKit, In, \
+    SymbolicOutput, Out, \
+    Mode, \
+    predefined_modes, predefined_linkers, predefined_optimizers, \
+    FunctionMaker, function, OpFromGraph #, eval_outputs, fast_compute

 import tensor
 import tensor_random

--- a/_test_compile.py
+++ b/_test_compile.py
@@ -9,137 +9,151 @@ import tensor

 PatternOptimizer = lambda p1, p2, ign=True: gof.OpKeyOptimizer(gof.PatternSub(p1, p2), ignore_newtrees=ign)

+def checkfor(testcase, fn, E):
+    try:
+        fn()
+    except Exception, e:
+        if isinstance(e, E):
+            # we got the exception we wanted
+            return
+        else:
+            # we did not get the exception we wanted
+            raise
+    # fn worked, but it shouldn't have
+    testcase.fail()

-def graph1(): # (x+y) * (x/z)
-    x, y, z = floats('xyz')
-    o = mul(add(x, y), div(x, z))
-    return [x,y,z], [o]

+# def graph1(): # (x+y) * (x/z)
+#     x, y, z = floats('xyz')
+#     o = mul(add(x, y), div(x, z))
+#     return [x,y,z], [o]

-class T_Function(unittest.TestCase):
+
+# class T_Function(unittest.TestCase):
    
-    def test_noopt(self):
-        gi, go = graph1()
-        p = function(gi, go, optimizer = None, linker = 'py')
-        self.failUnless(p(1.0,3.0,4.0) == 1.0)
-
-    def test_opt(self):
-        opt = PatternOptimizer((div, '1', '2'), (div, '2', '1'))
-        gi, go = graph1()
-        p = function(gi,go, optimizer=opt.optimize, linker = 'py')
-        self.failUnless(p(1.,3.,4.) == 16.0)
-
-    def test_multiout(self):
-        def graph2():
-            x, y, z = floats('xyz')
-            o = mul(add(x, y), div(x, z))
-            return [x,y,z], [o, o.owner.inputs[1]]
-        opt = PatternOptimizer((div, '1', '2'), (div, '2', '1'))
-        gi, go = graph2()
-        p = function(gi,go, optimizer=opt.optimize)
-        a,b = p(1.,3.,4.)
-        self.failUnless(a == 16.0)
-        self.failUnless(b == 4.0)
-
-    def test_make_many_functions(self):
-        x, y, z = tensor.scalars('xyz')
-        e0, e1, e2 = x+y+z, x*y-z, z*z+x*x+y*y
-        f1 = function([x, y, z], [e0])
-        f2 = function([x, y, z], [e0])
-        f3 = function([x, y, z], [e1])
-        f4 = function([x, y, z], [e2])
-        f5 = function([e0], [e0 * e0])
-        ff = FunctionFactory([x, y, z], [e0])
-        f6 = ff.create()
-        f7 = ff.create()
-        f8 = ff.create()
-        f9 = ff.partial(1.0, 2.0)
-        assert f1(1.0, 2.0, 3.0) == 6.0
-        assert f2(1.0, 2.0, 3.0) == 6.0
-        assert f3(1.0, 2.0, 3.0) == -1.0
-        assert f4(1.0, 2.0, 3.0) == 14.0
-        assert f5(7.0) == 49.0
-        assert f6(1.0, 2.0, 3.0) == 6.0
-        assert f7(1.0, 2.0, 3.0) == 6.0
-        assert f8(1.0, 2.0, 3.0) == 6.0
-        assert f9(3.0) == 6.0
-
-    def test_no_inputs(self):
-        x, y, z = tensor.value(1.0), tensor.value(2.0), tensor.value(3.0)
-        e = x*x + y*y + z*z
-        assert function([], [e], linker = 'py')() == 14.0
-        assert function([], [e], linker = 'c')() == 14.0
-        assert function([], [e], linker = 'c|py')() == 14.0
-        assert function([], [e], linker = 'c&py')() == 14.0
-        assert eval_outputs([e]) == 14.0
-        assert fast_compute(e) == 14.0
-
-    def test_closure(self):
-        x, y, z = tensor.scalars('xyz')
-        v = tensor.value(numpy.zeros(()))
-        e = x + tensor._add_inplace(v, 1)
-        f = function([x], [e])
-        assert f(1.) == 2.
-        assert f(1.) == 3.
-        assert f(1.) == 4.
-
-    def test_borrow_true(self):
-        x, y, z = tensor.scalars('xyz')
-        e = x + y + z
-        f = function([x, y, z], [e], borrow_outputs = True)
-        res1 = f(1.0, 2.0, 3.0)
-        assert res1 == 6.0
-        res2 = f(1.0, 3.0, 5.0)
-        assert res1 is res2
-        assert res1 == 9.0
-        assert res2 == 9.0
-
-    def test_borrow_false(self):
-        x, y, z = tensor.scalars('xyz')
-        e = x + y + z
-        for linker in 'py c c|py c&py'.split():
-            f = function([x, y, z], [e], borrow_outputs = False, linker = linker)
-            res1 = f(1.0, 2.0, 3.0)
-            self.failUnless(res1 == 6.0, (res1, linker))
-            res2 = f(1.0, 3.0, 5.0)
-            self.failUnless(res1 is not res2, (res1, res2, linker))
-            self.failUnless(res1 == 6.0, (res1, linker))
-            self.failUnless(res2 == 9.0, (res2, linker))
-
-    def test_borrow_false_through_inplace(self):
-        x, y, z = tensor.scalars('xyz')
-        # if borrow_outputs is False, we must not reuse the temporary created for x+y
-        e = tensor._add_inplace(x + y, z)
-        for linker in 'py c c|py c&py'.split():
-            f = function([x, y, z], [e], borrow_outputs = False, linker = linker)
-            res1 = f(1.0, 2.0, 3.0)
-            self.failUnless(res1 == 6.0, (res1, linker))
-            res2 = f(1.0, 3.0, 5.0)
-            self.failUnless(res1 is not res2, (res1, res2, linker))
-            self.failUnless(res1 == 6.0, (res1, linker))
-            self.failUnless(res2 == 9.0, (res2, linker))
-
-
-class T_fast_compute(unittest.TestCase):
+#     def test_noopt(self):
+#         gi, go = graph1()
+#         p = function(gi, go, optimizer = None, linker = 'py')
+#         self.failUnless(p(1.0,3.0,4.0) == 1.0)

-    def test_straightforward(self):
-        x, y, z = tensor.value(1.0), tensor.value(2.0), tensor.value(3.0)
-        e = x*x + y*y + z*z
-        assert fast_compute(e) == 14.0
-        assert compile._fcache[(e, )]() == 14.0
+#     def test_opt(self):
+#         opt = PatternOptimizer((div, '1', '2'), (div, '2', '1'))
+#         gi, go = graph1()
+#         p = function(gi,go, optimizer=opt.optimize, linker = 'py')
+#         self.failUnless(p(1.,3.,4.) == 16.0)
+
+#     def test_multiout(self):
+#         def graph2():
+#             x, y, z = floats('xyz')
+#             o = mul(add(x, y), div(x, z))
+#             return [x,y,z], [o, o.owner.inputs[1]]
+#         opt = PatternOptimizer((div, '1', '2'), (div, '2', '1'))
+#         gi, go = graph2()
+#         p = function(gi,go, optimizer=opt.optimize)
+#         a,b = p(1.,3.,4.)
+#         self.failUnless(a == 16.0)
+#         self.failUnless(b == 4.0)
+
+#     def test_make_many_functions(self):
+#         x, y, z = tensor.scalars('xyz')
+#         e0, e1, e2 = x+y+z, x*y-z, z*z+x*x+y*y
+#         f1 = function([x, y, z], [e0])
+#         f2 = function([x, y, z], [e0])
+#         f3 = function([x, y, z], [e1])
+#         f4 = function([x, y, z], [e2])
+#         f5 = function([e0], [e0 * e0])
+#         ff = FunctionFactory([x, y, z], [e0])
+#         f6 = ff.create()
+#         f7 = ff.create()
+#         f8 = ff.create()
+#         f9 = ff.partial(1.0, 2.0)
+#         assert f1(1.0, 2.0, 3.0) == 6.0
+#         assert f2(1.0, 2.0, 3.0) == 6.0
+#         assert f3(1.0, 2.0, 3.0) == -1.0
+#         assert f4(1.0, 2.0, 3.0) == 14.0
+#         assert f5(7.0) == 49.0
+#         assert f6(1.0, 2.0, 3.0) == 6.0
+#         assert f7(1.0, 2.0, 3.0) == 6.0
+#         assert f8(1.0, 2.0, 3.0) == 6.0
+#         assert f9(3.0) == 6.0
+
+#     def test_no_inputs(self):
+#         x, y, z = tensor.value(1.0), tensor.value(2.0), tensor.value(3.0)
+#         e = x*x + y*y + z*z
+#         assert function([], [e], linker = 'py')() == 14.0
+#         assert function([], [e], linker = 'c')() == 14.0
+#         assert function([], [e], linker = 'c|py')() == 14.0
+#         assert function([], [e], linker = 'c&py')() == 14.0
+#         assert eval_outputs([e]) == 14.0
+#         assert fast_compute(e) == 14.0
+
+#     def test_closure(self):
+#         x, y, z = tensor.scalars('xyz')
+#         v = tensor.value(numpy.zeros(()))
+#         e = x + tensor.add_inplace(v, 1)
+#         f = function([x], [e])
+#         assert f(1.) == 2.
+#         assert f(1.) == 3.
+#         assert f(1.) == 4.
+
+#     def test_borrow_true(self):
+#         x, y, z = tensor.scalars('xyz')
+#         e = x + y + z
+#         f = function([x, y, z], [e], borrow_outputs = True)
+#         res1 = f(1.0, 2.0, 3.0)
+#         assert res1 == 6.0
+#         res2 = f(1.0, 3.0, 5.0)
+#         assert res1 is res2
+#         assert res1 == 9.0
+#         assert res2 == 9.0
+
+#     def test_borrow_false(self):
+#         x, y, z = tensor.scalars('xyz')
+#         e = x + y + z
+#         for linker in 'py c c|py c&py'.split():
+#             f = function([x, y, z], [e], borrow_outputs = False, linker = linker)
+#             res1 = f(1.0, 2.0, 3.0)
+#             self.failUnless(res1 == 6.0, (res1, linker))
+#             res2 = f(1.0, 3.0, 5.0)
+#             self.failUnless(res1 is not res2, (res1, res2, linker))
+#             self.failUnless(res1 == 6.0, (res1, linker))
+#             self.failUnless(res2 == 9.0, (res2, linker))
+
+#     def test_borrow_false_through_inplace(self):
+#         x, y, z = tensor.scalars('xyz')
+#         # if borrow_outputs is False, we must not reuse the temporary created for x+y
+#         e = tensor.add_inplace(x + y, z)
+#         for linker in 'py c c|py c&py'.split():
+#             f = function([x, y, z], [e], borrow_outputs = False, linker = linker)
+#             res1 = f(1.0, 2.0, 3.0)
+#             self.failUnless(res1 == 6.0, (res1, linker))
+#             res2 = f(1.0, 3.0, 5.0)
+#             self.failUnless(res1 is not res2, (res1, res2, linker))
+#             self.failUnless(res1 == 6.0, (res1, linker))
+#             self.failUnless(res2 == 9.0, (res2, linker))
+
+
+# class T_fast_compute(unittest.TestCase):
+
+#     def test_straightforward(self):
+#         x, y, z = tensor.value(1.0), tensor.value(2.0), tensor.value(3.0)
+#         e = x*x + y*y + z*z
+#         assert fast_compute(e) == 14.0
+#         assert compile._fcache[(e, )]() == 14.0


 import tensor as T
 import random
 import numpy as N
+
 class T_OpFromGraph(unittest.TestCase):

    def test_straightforward(self):
        x, y, z = T.matrices('xyz')
        e = x + y * z
-        op = OpFromGraph([x, y, z], [e], linker='c|py')
+        op = OpFromGraph([x, y, z], [e], mode='FAST_RUN')
        f = op(x, y, z) - op(y, z, x)
-        fn = function([x, y, z], [f])
+        fn = function([x, y, z], f)
        xv, yv, zv = N.ones((2, 2)), N.ones((2, 2))*3, N.ones((2, 2))*5
        assert numpy.all(8.0 == fn(xv, yv, zv))
        assert numpy.all(8.0 == fn(xv, yv, zv))
@@ -147,9 +161,9 @@ class T_OpFromGraph(unittest.TestCase):
    def test_size_changes(self):
        x, y, z = T.matrices('xyz')
        e = T.dot(x, y)
-        op = OpFromGraph([x, y], [e], linker='c|py')
+        op = OpFromGraph([x, y], [e], mode='FAST_RUN')
        f = op(x, op(y, z))
-        fn = function([x, y, z], [f])
+        fn = function([x, y, z], f)
        xv, yv, zv = N.ones((2, 3)), N.ones((3, 4))*3, N.ones((4, 5))*5
        res = fn(xv, yv, zv)
        assert res.shape == (2, 5)
@@ -161,20 +175,371 @@ class T_OpFromGraph(unittest.TestCase):
    def test_grad(self):
        x, y, z = T.matrices('xyz')
        e = x + y * z
-        op = OpFromGraph([x, y, z], [e], linker='c|py', grad_depth = 2)
+        op = OpFromGraph([x, y, z], [e], mode='FAST_RUN', grad_depth = 2)
        f = op(x, y, z)
        f = f - T.grad(f, y)
-        fn = function([x, y, z], [f])
+        fn = function([x, y, z], f)
        xv, yv, zv = N.ones((2, 2)), N.ones((2, 2))*3, N.ones((2, 2))*5
        assert numpy.all(11.0 == fn(xv, yv, zv))


+class T_function(unittest.TestCase):
+    def test_empty(self):
+        fn = function([], []) #ok
+        self.failUnless(fn() == [])
+
+    def test_missing_inputs(self):
+
+        MissingInputException = TypeError
+
+        def fn():
+            x,s = T.scalars('xs')
+            fn = function([], [x])
+        checkfor(self, fn, MissingInputException)
+
+        def fn():
+            x,s = T.scalars('xs')
+            fn = function([s], [x])
+        checkfor(self, fn, MissingInputException)
+
+        def fn():
+            x,s = T.scalars('xs')
+            fn = function([s], x)
+        checkfor(self, fn, MissingInputException)
+
+        def fn():
+            x,s = T.scalars('xs')
+            fn = function([s], Out(x))
+        checkfor(self, fn, MissingInputException)
+
+        def fn():
+            x,s = T.scalars('xs')
+            fn = function([In(x, update=s+x)], x)
+        checkfor(self, fn, MissingInputException)
+
+        def fn():
+            x,s = T.scalars('xs')
+            fn = function([In(x, update=mul(s,s)+x)], x)
+        checkfor(self, fn, MissingInputException)
+
+    def test_input_anon_singleton(self):
+        x,s = T.scalars('xs')
+        fn = function([s,x], [x+s])
+        self.failUnless(fn(2,3) == [5])
+        # no state
+        self.failUnless(fn(2,3) == [5])
+
+    def test_input_anon_unpack(self):
+        x,s = T.scalars('xs')
+        fn = function([s,x], x+s)
+        self.failUnless(fn(2,3) == 5)
+
+    def test_naming_rule0(self):
+        x,s = T.scalars('xs')
+        f = function([x,s], x/s)
+        self.failUnless(f(1,2) == 0.5)
+        self.failUnless(f(2,1) == 2.0)
+        self.failUnless(f(s=2,x=1) == 0.5)
+        self.failUnless(f(x=2,s=1) == 2.0)
+        self.failUnless(f(2, s=1) == 2.0)
+        checkfor(self, lambda :f(2, x=2.0), TypeError) #got multiple values for keyword argument 'x'
+        checkfor(self, lambda :f(x=1), TypeError) #takes exactly 2 non-keyword arguments (1 given)
+        checkfor(self, lambda :f(s=1), TypeError) #takes exactly 2 non-keyword arguments (0 given)
+
+    def test_naming_rule1(self):
+        a = T.scalar() # the a is for 'anonymous' (un-named).
+        x,s = T.scalars('xs')
+        f = function([a, s], a/s)
+        self.failUnless(f(1,2) == 0.5)
+        self.failUnless(f(2,1) == 2.0)
+        self.failUnless(f(2, s=1) == 2.0)
+        checkfor(self, lambda:f(q=2,s=1), TypeError) #got unexpected keyword argument 'q'
+        checkfor(self, lambda:f(a=2,s=1), TypeError) #got unexpected keyword argument 'a'
+
+    def test_naming_rule2(self):
+        a = T.scalar() # the a is for 'anonymous' (un-named).
+        x,s = T.scalars('xs')
+
+        #x's name is ignored because it is followed by anonymous parameter a.
+        f = function([x, a, s], a/s)
+        self.failUnless(f(9,1,2) == 0.5)
+        self.failUnless(f(9,2,1) == 2.0)
+        self.failUnless(f(9,2, s=1) == 2.0)
+        checkfor(self, lambda:f(x=9,a=2,s=1), TypeError) #got unexpected keyword argument 'x'
+        checkfor(self, lambda:f(5.0,x=9), TypeError) #got unexpected keyword argument 'x'
+
+    def test_naming_rule3(self):
+        a = T.scalar() # the a is for 'anonymous' (un-named).
+        x,s = T.scalars('xs')
+
+        #x's name is not ignored (as in test_naming_rule2) because a has a default value.
+        f = function([x, In(a, value=1.0), s], a/s+x)
+        self.failUnless(f(9,2,4) == 9.5) #can specify all args in order
+        self.failUnless(f(9,2,s=4) == 9.5) # can give s as kwarg
+        self.failUnless(f(9,s=4) == 9.25) # can give s as kwarg, get default a
+        self.failUnless(f(x=9,s=4) == 9.25) # can give s as kwarg, omit a, x as kw
+        checkfor(self, lambda:f(x=9,a=2,s=4), TypeError) #got unexpected keyword argument 'a'
+        checkfor(self, lambda:f(), TypeError) #takes exactly 3 non-keyword arguments (0 given)
+        checkfor(self, lambda:f(x=9), TypeError) #takes exactly 3 non-keyword arguments (1 given)
+
+    def test_naming_rule4(self):
+        a = T.scalar() # the a is for 'anonymous' (un-named).
+        x,s = T.scalars('xs')
+
+        f = function([x, In(a, value=1.0,name='a'), s], a/s+x)
+
+        self.failUnless(f(9,2,4) == 9.5) #can specify all args in order
+        self.failUnless(f(9,2,s=4) == 9.5) # can give s as kwarg
+        self.failUnless(f(9,s=4) == 9.25) # can give s as kwarg, get default a
+        self.failUnless(f(9,a=2,s=4) == 9.5) # can give s as kwarg, a as kwarg
+        self.failUnless(f(x=9,a=2, s=4) == 9.5) # can give all kwargs
+        self.failUnless(f(x=9,s=4) == 9.25) # can give x and s as kwargs, get default a
+        checkfor(self, lambda:f(), TypeError) #takes exactly 3 non-keyword arguments (0 given)
+        checkfor(self, lambda:f(5.0,x=9), TypeError) #got multiple values for keyword argument 'x'
+
+    def test_state_access(self):
+        a = T.scalar() # the a is for 'anonymous' (un-named).
+        x,s = T.scalars('xs')
+
+        f = function([x, In(a, value=1.0,name='a'), In(s, value=0.0, update=s+a*x)], s+a*x)
+
+        self.failUnless(f[a] == 1.0)
+        self.failUnless(f[s] == 0.0)
+
+        self.failUnless(f(3.0) == 3.0)
+        self.failUnless(f(3.0,a=2.0) == 9.0) #3.0 + 2*3.0
+
+        self.failUnless(f[a] == 1.0) #state hasn't changed permanently, we just overrode it last line
+        self.failUnless(f[s] == 9.0)
+
+        f[a] = 5.0
+        self.failUnless(f[a] == 5.0)
+        self.failUnless(f(3.0) == 24.0) #9 + 3*5
+        self.failUnless(f[s] == 24.0)
+
+    def test_same_names(self):
+        a,x,s = T.scalars('xxx')
+        #implicit names would cause error.  What do we do?
+        f = function([a, x, s], a+x+s)
+        self.failUnless(f(1,2,3) == 6)
+        checkfor(self, lambda:f(1,2,x=3), TypeError)
+
+    def test_weird_names(self):
+        a,x,s = T.scalars('xxx')
+        
+        checkfor(self, lambda:function([In(a,name=[])],[]), TypeError)
+
+        def t():
+            f = function([In(a,name=set(['adsf',()]), value=1.0),
+                          In(x,name=(), value=2.0),
+                          In(s,name=T.scalar(), value=3.0)], a+x+s)
+        checkfor(self, t, TypeError)
+
+    def test_copy(self):
+        a = T.scalar() # the a is for 'anonymous' (un-named).
+        x,s = T.scalars('xs')
+
+        f = function([x, In(a, value=1.0,name='a'), In(s, value=0.0, update=s+a*x, mutable=True)], s+a*x)
+
+        g = copy(f)
+        #if they both return, assume  that they return equivalent things.
+
+        self.failIf(g.container[x].storage is f.container[x].storage)
+        self.failIf(g.container[a].storage is f.container[a].storage)
+        self.failIf(g.container[s].storage is f.container[s].storage)
+
+        self.failIf(g.value[a] is not f.value[a]) # should not have been copied
+        self.failIf(g.value[s] is f.value[s]) # should have been copied because it is mutable.
+        self.failIf((g.value[s] != f.value[s]).any()) # its contents should be identical
+
+        self.failUnless(f(2, 1) == g(2)) #they should be in sync, default value should be copied.
+        self.failUnless(f(2, 1) == g(2)) #they should be in sync, default value should be copied.
+        f(1,2) # put them out of sync
+        self.failIf(f(1, 2) == g(1, 2)) #they should not be equal anymore.
+
+    def test_shared_state0(self):
+        a = T.scalar() # the a is for 'anonymous' (un-named).
+        x,s = T.scalars('xs')
+
+        f = function([x, In(a, value=1.0,name='a'), In(s, value=0.0, update=s+a*x, mutable=True)], s+a*x)
+        g = function([x, In(a, value=1.0,name='a'), In(s, value=f.container[s], update=s-a*x, mutable=True)], s+a*x)
+
+        f(1, 2)
+        self.failUnless(f[s] == 2)
+        self.failUnless(g[s] == 2)
+        g(1, 2)
+        self.failUnless(f[s] == 0)
+        self.failUnless(g[s] == 0)
+
+
+# class T_function_examples(unittest.TestCase):
+#     def test_accumulator(self):
+#         """Test low-level interface with state."""
+#         x = T.scalar('x')
+#         s = T.scalar('s')
+
+#         fn, states = program_states(inputs = [x], outputs = [], states = [(s, 0, s+x)])
+
+#         sum = 0
+#         for inc in [1, 4, 5,23, -324]:
+#             sum += inc
+#             fn.run([inc], states)
+#             assert sum == states[0].value
+
+
+#     def test_misc0(self):
+
+#         fn_inc, states_inc = function_states(\
+#                 inputs = [x], outputs = [], states = [(s, 0, s+x)])
+
+#         fn_inc2, states_inc2 = function_states(\
+#                 inputs = [x], outputs = [], states = [(s, 0, s+x)])
+
+#         fn_inc_copy = copy.copy(fn_inc) #USE fn copy
+
+#         # run() is like __call__, but requires an explicit state argument
+
+#         fn_inc.run([5], states_inc) #run on own state object
+#         fn_inc2.run([3], states_inc) #run on compatible state object
+#         assert states_inc[0].value == 8
+
+#         states_inc_copy = copy.copy(states_inc) #USE state copy
+#         fn_inc_copy.run([2], states_inc_copy)
+#         assert states_inc[0].value == 10   #compatible
+
+#         fn_dec, states_dec = function_states(\
+#                 inputs = [x], outputs = [], states = [((s, s-x), states_inc[0])])
+
+#         try:
+#             fn_inc.run([5], states_dec) # wrong kind of state for given program
+#             self.fail("fn accepted an invalid state argument")
+#         except SpecificException:
+#             raise NotImplementedError() #TODO
+#         except Exception:
+#             self.fail("fn accepted an invalid state argument")
+
+#     def test_perceptron(self):
+#         """Test high-level state interface."""
+
+#         mu0 = numpy.array([1.0,0.0])
+#         mu1 = numpy.array([0.0,0.1])
+#         si0 = numpy.ones_like(mu0) #unit variance
+#         si1 = numpy.ones_like(mu1) #unit variance
+
+#         #implicit internal state
+#         r_state = random.random_state()
+#         label = r_state.bernoulli(0.5) 
+
+#         #implicit internal state for each DiagGaussian
+#         x = label * DiagGaussian(mu0, si0, state=r_state) \
+#                 + (1 - label) * random.DiagGaussian(mu1, si1, state=r_state)
+
+#         w = T.tensor.dvector()
+#         b = T.tensor.dscalar()
+#         lr = 0.01
+
+#         decision = dot(x,w) + b > 0
+#         new_w = w + neq(label, decision) * lr * x
+#         new_b = b + neq(label, decision) * (label * (-lr) + (1-label)*lr)
+
+#         init_w = numpy.array([0.0, 0.0])
+#         init_b = 0.0
+
+#         io_stream = T.function([], [label, x], state={'seed':(r_state, 42)})
+
+#         perceptron_learn = T.function([x, label], [decision], 
+#                 state={
+#                     'w':((w, update_w), init_w),
+#                     'b':((b, update_b), init_b),
+#                     'lr':(lr, 0.01)})
+
+#         perceptron_use = T.function([x], [decision],
+#                 state={
+#                     'w':(w, perceptron_learn.shared['w']),
+#                     'b':(b, perceptron_learn.shared['b'])})
+
+#         errs = 0
+#         for i in xrange(100):
+#             il, ix = io_stream()
+
+#             d0 = perceptron_use(ix)
+#             d1 = perceptron_learn(ix, il)
+
+#             assert d0 == d1
+
+#             errs += (d0 != d1)
+
+#             print d0
+#         print 'errs =', errs 
+
+
+# class T_dict_interface(unittest.TestCase):
+
+#     def test_keyword(self):
+#         x = T.scalar('x')
+#         y = T.scalar('y')
+#         s = T.scalar('s')
+
+#         fn = function(input_kw = {'a':x, 'b':y}, outputs = [], state = {'s':(s, 0, s+x/y)})
+
+#         try:
+#             fn(1, 1)
+#             self.fail("non-keyword call accepted!")
+#         except SpecificException:
+#             raise NotImplementedError()
+#         except Exception:
+#             self.fail("non-keyword call accepted!")
+
+#         try:
+#             fn(a=1)
+#             self.fail("incomplete call accepted!")
+#         except SpecificException:
+#             raise NotImplementedError()
+#         except Exception:
+#             self.fail("incomplete call accepted!")
+
+#         try:
+#             fn(a=1, b=1, c=1)
+#             self.fail("overcomplete call accepted!")
+#         except SpecificException:
+#             raise NotImplementedError()
+#         except Exception:
+#             self.fail("overcomplete call accepted!")
+
+#     def test_aliased_state(self):
+#         """Test keyword input and copy."""
+#         x = T.scalar('x')
+#         y = T.scalar('y')
+#         s = T.scalar('s')
+
+#         fn = function(input_kw = {'a':x, 'b':y}, outputs = [], state = {'s':(s, 0, s+x/y)})
+#         fn2 = fn.copy()
+#         fn3 = fn.copy()
+
+#         fn(a=2, b=5)
+#         fn2(a=5, b=2)
+#         fn3(b=2, a=5)
+#         assert fn.state['s'] == 2.0/5
+#         assert fn2.state['s'] == 5.0/2 
+#         assert fn3.state['s'] == 5.0/2
+
+#         #fn and fn3 use the same sort of state, so this is OK.
+#         fn3.state = fn.state 
+
+#         fn.state['s'] = 0
+#         fn(a=1, b=1)   #increment the shared state
+#         assert fn3.state['s'] == 1
+#         fn3(a=-1, b=1) #decrement the shared state
+#         assert fn.state['s'] == 0
+
+
 if __name__ == '__main__':

    if 1:
        unittest.main()
    else:
-        testcases = [T_dict_interface, T_state]
+        testcases = []
+        testcases.append(T_function)

        #<testsuite boilerplate>
        testloader = unittest.TestLoader()

--- a/_test_sparse.py
+++ b/_test_sparse.py
@@ -8,6 +8,10 @@ from sparse import _is_dense, _is_sparse, _is_dense_result, _is_sparse_result
 from sparse import _mtypes, _mtype_to_str

 import random
+import gof
+
+def eval_outputs(outputs):
+    return compile.function([], outputs)()[0]

 class T_transpose(unittest.TestCase):
    def setUp(self):
@@ -23,7 +27,7 @@ class T_transpose(unittest.TestCase):
        self.failUnless(ta.type.dtype == 'float64', ta.type.dtype)
        self.failUnless(ta.type.format == 'csr', ta.type.format)

-        vta = compile.eval_outputs([ta])
+        vta = eval_outputs([ta])
        self.failUnless(vta.shape == (3,5))
    def test_transpose_csr(self):
        a = as_sparse(sparse.csr_matrix(sparse.speye(5,3)))
@@ -34,7 +38,7 @@ class T_transpose(unittest.TestCase):
        self.failUnless(ta.type.dtype == 'float64', ta.type.dtype)
        self.failUnless(ta.type.format == 'csc', ta.type.format)

-        vta = compile.eval_outputs([ta])
+        vta = eval_outputs([ta])
        self.failUnless(vta.shape == (3,5))

 class T_Add(unittest.TestCase):
@@ -60,7 +64,7 @@ class T_Add(unittest.TestCase):
            self.failUnless(apb.type.format == aR.type.format, apb.type.format)
            self.failUnless(apb.type.format == bR.type.format, apb.type.format)

-            val = compile.eval_outputs([apb])
+            val = eval_outputs([apb])
            self.failUnless(val.shape == (3,2))
            self.failUnless(numpy.all(val.todense() == (a + b).todense()))
            self.failUnless(numpy.all(val.todense() == numpy.array([[1., 2], [3, 4], [5, 6]])))
@@ -85,7 +89,7 @@ class T_Add(unittest.TestCase):
            self.failUnless(apb.type.dtype == aR.type.dtype, apb.type.dtype)
            self.failUnless(apb.type.dtype == bR.type.dtype, apb.type.dtype)

-            val = compile.eval_outputs([apb])
+            val = eval_outputs([apb])
            self.failUnless(val.shape == (3, 2))
            self.failUnless(numpy.all(val == (a + b)))
            self.failUnless(numpy.all(val == numpy.array([[1., 2], [3, 4], [5, 6]])))
@@ -110,7 +114,7 @@ class T_Add(unittest.TestCase):
            self.failUnless(apb.type.dtype == aR.type.dtype, apb.type.dtype)
            self.failUnless(apb.type.dtype == bR.type.dtype, apb.type.dtype)

-            val = compile.eval_outputs([apb])
+            val = eval_outputs([apb])
            self.failUnless(val.shape == (3, 2))
            self.failUnless(numpy.all(val == (a + b)))
            self.failUnless(numpy.all(val == numpy.array([[1., 2], [3, 4], [5, 6]])))
@@ -122,14 +126,14 @@ class T_conversion(unittest.TestCase):
    def test0(self):
        a = tensor.as_tensor(numpy.random.rand(5))
        s = csc_from_dense(a)
-        val = compile.eval_outputs([s])
+        val = eval_outputs([s])
        self.failUnless(str(val.dtype)=='float64')
        self.failUnless(val.format == 'csc')

    def test1(self):
        a = tensor.as_tensor(numpy.random.rand(5))
        s = csr_from_dense(a)
-        val = compile.eval_outputs([s])
+        val = eval_outputs([s])
        self.failUnless(str(val.dtype)=='float64')
        self.failUnless(val.format == 'csr')

@@ -138,7 +142,7 @@ class T_conversion(unittest.TestCase):
            s = t((2,5))
            d = dense_from_sparse(s)
            s[0,0] = 1.0
-            val = compile.eval_outputs([d])
+            val = eval_outputs([d])
            self.failUnless(str(val.dtype)=='float64')
            self.failUnless(numpy.all(val[0] == [1,0,0,0,0]))

@@ -159,7 +163,7 @@ class _testCase_dot(unittest.TestCase):

            zop = dot(x,xT)
            self.failUnless(_is_sparse_result(zop))
-            z = compile.eval_outputs([zop])
+            z = eval_outputs([zop])
            self.failUnless(_is_sparse(z))
            self.failUnless(z.shape == (500,500))
            self.failUnless(type(z) is mtype)
@@ -190,7 +194,7 @@ class _testCase_dot(unittest.TestCase):

            zop = dot(x,y)
            self.failUnless(_is_sparse_result(zop))
-            z = compile.eval_outputs([zop])
+            z = eval_outputs([zop])
            self.failUnless(_is_sparse(z))
            self.failUnless(z.shape == (500,2))
            self.failUnless(type(z) is mtype)
@@ -227,7 +231,7 @@ class _testCase_dot(unittest.TestCase):
 #            zop = dot(y, x)
            zop = transpose(dot(y, x))
            self.failUnless(_is_sparse_result(zop))
-            z = compile.eval_outputs([zop])
+            z = eval_outputs([zop])
            self.failUnless(_is_sparse(z))
            self.failUnless(z.shape == (500,2))
 #            self.failUnless(type(z) is mtype)

--- a/_test_tensor.py
+++ b/_test_tensor.py
@@ -6,7 +6,7 @@ import tensor # for hidden symbols

 import unittest
 from copy import copy
-from compile import function, FunctionFactory, eval_outputs
+import compile
 import gradient
 import gof, gof.graph
 from gof.python25 import any
@@ -15,6 +15,21 @@ from gof.utils import AbstractFunctionError

 from elemwise import DimShuffle

+
+default_mode = compile.Mode(optimizer = None,
+                            linker = 'c&py')
+
+def function(inputs, outputs, mode = default_mode):
+    return compile.function(inputs, outputs, mode = mode, accept_inplace = True)
+
+
+def eval_outputs(outputs, mode = default_mode):
+    results = function([], outputs, mode = mode)()
+    if len(results) == 1:
+        return results[0]
+    return results
+
+
 def _numpy_checker(x, y):
    """
    Checks if x.data and y.data have the same contents.
@@ -64,9 +79,8 @@ def make_tester(name, op, expected, checks = {}, good = {}, bad_build = {}, bad_

                try:
                    f = function(inputrs, node.outputs,
-                                 linker = 'c&py', ##lambda env, **kwargs: gof.DualLinker(env, checker = _numpy_checker, **kwargs),
-                                 unpack_single = False,
-                                 optimizer = None)
+                                 mode = default_mode, ##lambda env, **kwargs: gof.DualLinker(env, checker = _numpy_checker, **kwargs),
+                                 )
                except:
                    type, exc_value, traceback = sys.exc_info()
                    err_msg = "Test %s::%s: Error occurred while trying to make a Function" \
@@ -124,9 +138,8 @@ def make_tester(name, op, expected, checks = {}, good = {}, bad_build = {}, bad_

                try:
                    f = function(inputrs, node.outputs,
-                                 linker = 'c&py', #lambda env, **kwargs: gof.DualLinker(env, checker = _numpy_checker, **kwargs),
-                                 unpack_single = False,
-                                 optimizer = None)
+                                 mode = default_mode, #lambda env, **kwargs: gof.DualLinker(env, checker = _numpy_checker, **kwargs),
+                                 )
                except:
                    type, exc_value, traceback = sys.exc_info()
                    err_msg = "Test %s::%s: Error occurred while trying to make a Function" \
@@ -520,64 +533,6 @@ DotTester = make_tester(name = 'DotTester',
 #      rationale: it's tricky, and necessary everytime you want to verify
 #      gradient numerically

-def verify_grad(testcase, op, pt, n_tests=1, rng=numpy.random, eps=0.0000001, tol=0.0001,
-        linker='c&py'):
-    """testcase.failUnless(analytic gradient matches finite-diff gradient)"""
-    pt = [numpy.asarray(p) for p in pt]
-
-    for test_num in xrange(n_tests):
-#        tensor_pt = [as_tensor(p,name='input %i'%i) for i,p in enumerate(pt)]
-        tensor_pt = [constant(p).type('input %i'%i) for i,p in enumerate(pt)]
-        #o = op.make_node(*[tpt.copy() for tpt in tensor_pt])
-        o = safe_make_node(op, *[tpt.copy() for tpt in tensor_pt])
-        
-        if hasattr(o, 'outputs'):
-            o_outputs = o.outputs
-        else:
-            o_outputs = o
-
-        if len(o_outputs) > 1:
-            raise NotImplementedError('cant (yet) autotest gradient of op with multiple outputs')
-            # we could make loop over outputs making random projections R for each,
-            # but this doesn't handle the case where not all the outputs are
-            # differentiable... so I leave this as TODO for now -JB.
-        o_fn = function(tensor_pt, o_outputs, linker=linker)
-        o_fn_out = o_fn(*pt)
-        random_projection = rng.rand(*o_fn_out.shape)
-        t_r = as_tensor(random_projection)
-
-        #random projection of o onto t_r
-        cost = sum(t_r * o_outputs[0])
-        cost_fn = function(tensor_pt, [cost], linker=linker)
-
-        num_grad = gradient.numeric_grad(cost_fn, pt)
-
-        symbolic_grad = grad(cost, tensor_pt,as_tensor(1.0,name='g_cost'))
-
-        if 0:
-            print '-------'
-            print '----------'
-            for op in gof.graph.io_toposort(tensor_pt, symbolic_grad):
-                print op
-
-        grad_fn = function(tensor_pt, symbolic_grad,linker=linker)
-
-        analytic_grad = grad_fn(*pt)
-        if not isinstance(analytic_grad, (list, tuple)):
-            analytic_grad = [analytic_grad]
-
-#         if num_grad.max_err(analytic_grad) > 1.0e-4:
-#             print "aaaaaaaaaa"
-#             print gof.Env(tensor_pt, [cost])
-#             print gof.Env(tensor_pt, symbolic_grad)
-#             print analytic_grad
-#             print num_grad.gf
-#             print num_grad.max_err(analytic_grad)
-#             print "bbbbbbbbbb"
-
-        if num_grad.max_err(analytic_grad) > 1.0e-4:
-            raise Exception(verify_grad.E_grad)
-verify_grad.E_grad = 'gradient error exceeded tolerance'


 #useful mostly for unit tests
@@ -595,24 +550,24 @@ def _approx_eq(a,b,eps=1.0e-9):
    return  True
 _approx_eq.debug = 0

-def check_eq(self, node_in, node_out, arg_in, arg_out):
-    fn = Function([node_in], [node_out])
-    self.failUnless( numpy.all(fn(arg_in) == arg_out), (arg_in, arg_out))
+# def check_eq(self, node_in, node_out, arg_in, arg_out):
+#     fn = Function([node_in], node_out)
+#     self.failUnless( numpy.all(fn(arg_in) == arg_out), (arg_in, arg_out))

-def check_eq2(self, inputs, output, args_in, arg_out):
-    fn = Function(inputs, [output])
-    val = fn(*args_in)
-    self.failUnless( numpy.all(val == arg_out), (val, arg_out))
+# def check_eq2(self, inputs, output, args_in, arg_out):
+#     fn = Function(inputs, output)
+#     val = fn(*args_in)
+#     self.failUnless( numpy.all(val == arg_out), (val, arg_out))

-def check_eq2_c(self, inputs, output, args_in, arg_out):
-    fn = Function(inputs, [output], linker_cls = gof.CLinker)
-    val = fn(*args_in)
-    self.failUnless( numpy.all(val == arg_out), (val, arg_out))
+# def check_eq2_c(self, inputs, output, args_in, arg_out):
+#     fn = Function(inputs, [output], linker_cls = gof.CLinker)
+#     val = fn(*args_in)
+#     self.failUnless( numpy.all(val == arg_out), (val, arg_out))

-def check_eq2_both(self, inputs, output, args_in, arg_out):
-    fn = Function(inputs, [output], linker_cls = lambda env: gof.DualLinker(env, _numpy_checker))
-    val = fn(*args_in)
-    self.failUnless( numpy.all(val == arg_out), (val, arg_out))
+# def check_eq2_both(self, inputs, output, args_in, arg_out):
+#     fn = Function(inputs, [output], linker_cls = lambda env: gof.DualLinker(env, _numpy_checker))
+#     val = fn(*args_in)
+#     self.failUnless( numpy.all(val == arg_out), (val, arg_out))

 class T_Shape(unittest.TestCase):
    def test_basic0(self):
@@ -633,7 +588,7 @@ class T_Cast(unittest.TestCase):
                                        [convert_to_int8, convert_to_int16, convert_to_int32, convert_to_int64,
                                         convert_to_float32, convert_to_float64]):
                y = converter(x)
-                f = function([x], [y], strict = True, linker = 'c&py')
+                f = function([compile.In(x, strict = True)], y, mode = default_mode)
                a = numpy.arange(10, dtype = type1)
                b = f(a)
                self.failUnless(numpy.all(b == numpy.arange(10, dtype = type2)))
@@ -701,7 +656,7 @@ class T_transpose(unittest.TestCase):
        n = as_tensor(numpy.ones(()))
        t = transpose(n)
        self.failUnless(t.owner.op == tensor._transpose_inplace)
-        f = function([n], [t])
+        f = function([n], t)
        tval = f(n.data)
        self.failUnless(tval.shape == n.data.shape)

@@ -713,7 +668,7 @@ class T_transpose(unittest.TestCase):
        n = as_tensor(numpy.ones(5))
        t = transpose(n)
        self.failUnless(t.owner.op == tensor._transpose_inplace)
-        f = function([n], [t])
+        f = function([n], t)
        tval = f(n.data)
        self.failUnless(tval.shape == n.data.shape)
        #test aliasing
@@ -724,7 +679,7 @@ class T_transpose(unittest.TestCase):
        n = as_tensor(numpy.ones((5,3)))
        t = transpose(n)
        self.failUnless(t.owner.op == tensor._transpose_inplace)
-        f = function([n], [t])
+        f = function([n], t)
        tval = f(n.data)
        self.failUnless(tval.shape == (3,5))
        #test aliasing
@@ -736,7 +691,7 @@ class T_transpose(unittest.TestCase):
        n = as_tensor(numpy.ones((5,3,2)))
        t = tensor._transpose_inplace(n)
        self.failUnless(t.owner.op == tensor._transpose_inplace)
-        f = function([n], [t])
+        f = function([n], t)
        tval = f(n.data)
        self.failUnless(tval.shape == (2,3,5))
        #test aliasing
@@ -932,29 +887,101 @@ class T_subtensor(unittest.TestCase):



-class T_Stack(unittest.TestCase):
-    def test_hstack(self):
-        a = as_tensor(numpy.array([[1, 2, 3], [4, 5, 6]]))
-        b = as_tensor(numpy.array([[7], [8]]))
-        s = horizontal_stack(a, b)
-        c = numpy.array([[1, 2, 3, 7], [4, 5, 6, 8]])
-        self.failUnless((eval_outputs([s]) == c).all())
-    def test_vstack(self):
-        a = as_tensor(numpy.array([[1, 2, 3], [4, 5, 6]]))
-        b = as_tensor(numpy.array([[7, 8, 9]]))
-        s = vertical_stack(a, b)
-        c = numpy.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
-        self.failUnless((eval_outputs([s]) == c).all())
+class T_Join_and_Split(unittest.TestCase):
+    """
+    Split is tested by each verify_grad method.
+    """
+
+    class Join1(Op):
+        def make_node(self, *inputs):
+            inputs = [as_tensor(t) for t in inputs]
+            outputs = [lscalar()] + [i.type() for i in inputs]
+            return Apply(self, inputs, outputs)
+        def perform(self, node, inputs, outputs):
+            outputs[0][0] = 1
+            for i,o in zip(inputs, outputs[1:]):
+                o[0] = i.copy()
+        def grad(self, inputs, g_outputs):
+            return g_outputs[1:]
+
+    def setUp(self):
+        Join.debug = False
+
+    def test_join_scalar(self):
+        a = as_tensor(1)
+        b = as_tensor(2)
+        try:
+            s = join(0, a, b)
+        except:
+            return
+        self.fail()
+
+    def test_stack_scalar(self):
+        a = as_tensor(1)
+        b = as_tensor(2)
+        c = as_tensor(3)
+        s = stack(a, b, c)
+
+        want = numpy.array([1, 2, 3])
+        self.failUnless((eval_outputs([s]) == want).all())
+
+
+    def test_join_vector(self):
+        a = as_tensor(numpy.array([1, 2, 3]))
+        b = as_tensor(numpy.array([7, 8, 9]))

-    def test_vstack_grad(self):
+        s = join(0, a, b)
+        want = numpy.array([1, 2, 3, 7, 8, 9])
+        self.failUnless((eval_outputs([s]) == want).all())
+
+    def test_stack_vector(self):
+        a = as_tensor(numpy.array([1, 2, 3]))
+        b = as_tensor(numpy.array([7, 8, 9]))
+
+        s = stack(a, b)
+        want = numpy.array([[1, 2, 3],[ 7, 8, 9]])
+        self.failUnless((eval_outputs([s]) == want).all())
+
+    def test_join_matrix0(self):
        a = as_tensor(numpy.array([[1, 2, 3], [4, 5, 6]]))
        b = as_tensor(numpy.array([[7, 8, 9]]))
-        s = vertical_stack(a, b)
-        ga,gb = grad(sum(vertical_stack(a,b)), [a,b])
+        s = join(0, a, b)
+
+        want = numpy.array([[1, 2, 3],[4,5,6],[7, 8, 9]])
+        self.failUnless((eval_outputs([s]) == want).all())
+
+    def test_join_matrix1(self):
+        av=numpy.array([[1, 2, 3], [4, 5, 6]], dtype='float32')
+        bv= numpy.array([[7], [8]],dtype='float32')
+        a = as_tensor(av)
+        b = as_tensor(bv)
+        s = join(1, a, b)
+        want = numpy.array([[1, 2, 3, 7], [4, 5, 6, 8]], dtype='float32')
+        self.failUnless((eval_outputs([s]) == want).all())
+
+        verify_grad(self, lambda a, b: join(1,a,b), [av, bv], eps=1.0e-4, tol=1.0e-3)
+
+    def test_join_matrixV(self):
+        """variable join axis"""
+        v = numpy.array([[1., 2., 3.], [4., 5., 6.]])
+        a = as_tensor(v.copy())
+        b = as_tensor(v.copy())
+        ax = lscalar()
+        s = join(ax, a, b)
+
+        f = function([ax], [s])
+
+        want = numpy.array([[1, 2, 3], [4, 5, 6] ,[1, 2, 3], [4, 5, 6]])
+        got = f(0)
+        self.failUnless((got == want).all(), (got, want))
+
+        want = numpy.array([[ 1, 2, 3, 1, 2, 3], [4, 5, 6, 4, 5, 6]])
+        got = f(1)
+        self.failUnless((got == want).all(), (got, want))
+
+        verify_grad(self, lambda a, b: join(0,a,b), [v, 2*v])
+        verify_grad(self, lambda a, b: join(1,a,b), [v, 2*v])

-        gval = eval_outputs([ga, gb])
-        self.failUnless(numpy.all(gval[0] == 1.0))
-        self.failUnless(numpy.all(gval[1] == 1.0))

 class T_Concatenate(unittest.TestCase):
    def test_concatenate(self):
@@ -979,7 +1006,7 @@ class T_Concatenate(unittest.TestCase):
 class _test_comparison(unittest.TestCase):
    def test_gt(self):
        x, y = fvector(), fvector()
-        fn = function([x,y], [x > y])
+        fn = function([x,y], x > y)
        l = numpy.asarray([0.,-1.,1.])
        r = numpy.asarray([0.,1.,-1.])
        v = fn(l, r)
@@ -987,7 +1014,7 @@ class _test_comparison(unittest.TestCase):

    def test_lt(self):
        x, y = fvector(), fvector()
-        fn = function([x,y], [x < y])
+        fn = function([x,y], x < y)
        l = numpy.asarray([0.,-1.,1.])
        r = numpy.asarray([0.,1.,-1.])
        v = fn(l, r)
@@ -995,7 +1022,7 @@ class _test_comparison(unittest.TestCase):

    def test_le(self):
        x, y = fvector(), fvector()
-        fn = function([x,y], [x <= y])
+        fn = function([x,y], x <= y)
        l = numpy.asarray([0.,-1.,1.])
        r = numpy.asarray([0.,1.,-1.])
        v = fn(l, r)
@@ -1003,7 +1030,7 @@ class _test_comparison(unittest.TestCase):

    def test_ge(self):
        x, y = fvector(), fvector()
-        fn = function([x,y], [x >= y])
+        fn = function([x,y], x >= y)
        l = numpy.asarray([0.,-1.,1.])
        r = numpy.asarray([0.,1.,-1.])
        v = fn(l, r)
@@ -1011,7 +1038,7 @@ class _test_comparison(unittest.TestCase):

    def test_eq(self):
        x, y = fvector(), fvector()
-        fn = function([x,y], [eq(x,y)])
+        fn = function([x,y], eq(x,y))
        l = numpy.asarray([0.,-1.,1.])
        r = numpy.asarray([0.,1.,-1.])
        v = fn(l, r)
@@ -1019,7 +1046,7 @@ class _test_comparison(unittest.TestCase):

    def test_neq(self):
        x, y = fvector(), fvector()
-        fn = function([x,y], [neq(x, y)])
+        fn = function([x,y], neq(x, y))
        l = numpy.asarray([0.,-1.,1.])
        r = numpy.asarray([0.,1.,-1.])
        v = fn(l, r)
@@ -1028,7 +1055,7 @@ class _test_comparison(unittest.TestCase):
 class _test_bitwise(unittest.TestCase):
    def test_or(self):
        x, y = bvector(), bvector()
-        fn = function([x,y], [x|y])
+        fn = function([x,y], x|y)
        l = numpy.asarray([0,0,1,1], dtype = 'int8')
        r = numpy.asarray([0,1,0,1], dtype = 'int8')
        v = fn(l, r)
@@ -1036,10 +1063,10 @@ class _test_bitwise(unittest.TestCase):

    def test_xor(self):
        x, y = bvector(), bvector()
-        fn = function([x,y], [x^y])
+        fn = function([x,y], x^y)
        ix = x
        ix ^= y
-        gn = function([x,y], [ix])
+        gn = function([x,y], ix)
        l = numpy.asarray([0,0,1,1], dtype = 'int8')
        r = numpy.asarray([0,1,0,1], dtype = 'int8')
        v = fn(l, r)
@@ -1050,7 +1077,7 @@ class _test_bitwise(unittest.TestCase):

    def test_and(self):
        x, y = bvector(), bvector()
-        fn = function([x,y], [x&y])
+        fn = function([x,y], x&y)
        l = numpy.asarray([0,0,1,1], dtype = 'int8')
        r = numpy.asarray([0,1,0,1], dtype = 'int8')
        v = fn(l, r)
@@ -1058,7 +1085,7 @@ class _test_bitwise(unittest.TestCase):

    def test_inv(self):
        x, y = bvector(), bvector()
-        fn = function([x,y], [~x])
+        fn = function([x,y], ~x)
        l = numpy.asarray([0,0,1,1], dtype = 'int8')
        r = numpy.asarray([0,1,0,1], dtype = 'int8')
        v = fn(l, r)
@@ -1077,7 +1104,7 @@ class T_add(unittest.TestCase):
                     ("*", lambda x,y: x*y),
                     ("/", lambda x,y: x/y))
            for s, fn in tests:
-                f = function([a,b], [fn(a, b)], linker = 'c')
+                f = function([a,b], fn(a, b), mode = compile.Mode(optimizer = None, linker = 'c'))
                self.failUnless(numpy.all(fn(a.data, b.data) == f(a.data, b.data)))

    def test_grad_scalar_l(self):
@@ -1343,7 +1370,7 @@ class t_dot(unittest.TestCase):
    def not_aligned(self, x, y):
        z = dot(x,y)
        try:
-            tz = eval_outputs([z])
+            tz = eval_outputs([z], mode = compile.Mode(optimizer = None, linker = 'py'))
        except ValueError, e:
            self.failUnless(e[0].split()[1:4] == ['are', 'not', 'aligned'], e)
            return
@@ -1389,7 +1416,7 @@ class t_gemm(unittest.TestCase):
            z_orig = z.copy()
            tz,ta,tx,ty,tb = [as_tensor(p).type() for p in z,a,x,y,b]

-            f = function([tz,ta,tx,ty,tb], [gemm(tz,ta,tx,ty,tb)], linker=l)
+            f = function([tz,ta,tx,ty,tb], gemm(tz,ta,tx,ty,tb), mode=compile.Mode(optimizer = None, linker = l))
            new_z = f(z,a,x,y,b)
            z_after = self._gemm(z_orig, a, x, y, b)

@@ -1511,7 +1538,7 @@ class t_gemm(unittest.TestCase):

            tz,ta,tx,ty,tb = [value(p) for p in z,a,x,y,b]

-            f = function([tz,ta,tx,ty,tb], [gemm(tz,ta,tx,ty,tb)], linker=l)
+            f = function([tz,ta,tx,ty,tb], gemm(tz,ta,tx,ty,tb), mode = compile.Mode(optimizer = None, linker=l))
            f(z, a, x, y, b)
            self.failUnless(_approx_eq(z_after, z), (z_orig, z_after, z))
            f(z.T, a, y.T, x.T, b)
@@ -1760,17 +1787,17 @@ class T_op_cache(unittest.TestCase):
        v = matrix()
        v.name = 'v'
        gv = fill(v/v, 1.0)/v - (fill(v/v, 1.0) * v) / (v*v)
-        fn_py = function([v], [gv], linker = 'py')
-        fn_c_or_py = function([v], [gv], linker = 'c|py')
+        fn_py = function([v], gv, mode = compile.Mode(optimizer = None, linker = 'py'))
+        fn_c_or_py = function([v], gv, mode = compile.Mode(optimizer = None, linker = 'c|py'))

        a = numpy.random.rand(5,2)
        self.failUnless(numpy.all(fn_py(a) == fn_c_or_py(a)))

 if __name__ == '__main__':
-    if 1:
+    if 0:
        unittest.main()
    else:
-        testcase =  T_Concatenate
+        testcase =  AbsInplaceTester

        suite = unittest.TestLoader()
        suite = suite.loadTestsFromTestCase(testcase)

--- a/_test_tensor_opt.py
+++ b/_test_tensor_opt.py
@@ -100,18 +100,18 @@ class _test_dimshuffle_lift(unittest.TestCase):

 from tensor import *

-from sandbox  import pprint
+#from sandbox import pprint

 class _test_greedy_distribute(unittest.TestCase):
    def test_main(self):
        a, b, c, d, x, y, z = matrices('abcdxyz')
        e = (a/z + b/x) * x * z
        g = Env([a,b,c,d,x,y,z], [e])
-        print pprint.pp.process(g.outputs[0])
+        ##print pprint.pp.process(g.outputs[0])
        mul_canonizer.optimize(g)
        gof.TopoOptimizer(gof.LocalOptGroup(local_fill_cut, local_fill_lift), order = 'out_to_in').optimize(g)
        gof.TopoOptimizer(gof.LocalOptGroup(local_greedy_distributor), order = 'out_to_in').optimize(g)
-        print pprint.pp.process(g.outputs[0])
+        ##print pprint.pp.process(g.outputs[0])
        


@@ -131,10 +131,10 @@ class _test_canonize(unittest.TestCase):
 #        e = x / y / x
        e = (x / x) * (y / y)
        g = Env([x, y, z, a, b, c, d], [e])
-        print pprint.pp.process(g.outputs[0])
+        ##print pprint.pp.process(g.outputs[0])
        mul_canonizer.optimize(g)
        gof.TopoOptimizer(gof.LocalOptGroup(local_fill_cut, local_fill_lift), order = 'out_to_in').optimize(g)
-        print pprint.pp.process(g.outputs[0])
+        ##print pprint.pp.process(g.outputs[0])
        
 #     def test_plusmin(self):
 #         x, y, z = inputs()

--- a/_test_tensor_random.py
+++ b/_test_tensor_random.py
+## TODO: REDO THESE TESTS
+
 import unittest

 from tensor_random import *
@@ -7,7 +9,7 @@ import compile
 def Uniform(s, n):
    return NumpyGenerator(s, n, numpy.random.RandomState.uniform)

-class T_Random(unittest.TestCase):
+class T_Random:#(unittest.TestCase):
    def test0(self):

        rng = Uniform(12345, 2)

--- a/compile.py
+++ b/compile.py
 """Convenient driver of graph construction, optimization, and linking."""

+import copy_reg
+import cPickle
+
+from functools import partial
+
+
 import numpy
 import gof
 import sys
 from copy import copy

-#TODO: put together some default optimizations (TRAC #67)
-
-def exec_py_opt(inputs, outputs, features=[]):
-    """Return an optimized graph running purely python implementations"""
-    return Function(intputs, outputs, features, exec_py_opt.optimizer, gof.link.PerformLinker(), False)
-exec_py_opt.optimizer = None
-
-def exec_opt(inputs, outputs, features=[]):
-    """Return a fast implementation"""
-    return Function(intputs, outputs, features, exec_opt.optimizer, gof.link.PerformLinker(), False)
-exec_opt.optimizer = None
-
-class _DefaultOptimizer(object):
-    #const = gof.opt.ConstantFinder()
-    merge = gof.opt.MergeOptimizer()
-    def __call__(self, env):
-        #self.const(env)
-        self.merge(env)
-default_optimizer = _DefaultOptimizer()
-        
-def _mark_indestructible(results):
-    for r in results:
-        r.tag.indestructible = True
-
-# def linker_cls_python_and_c(env, **kwargs):
-#     """Use this as the linker_cls argument to Function.__init__ to compare
-#     python and C implementations"""
-
 def check_equal(x, y):
+    """
+    Raises an Exception unless x[0] and y[0] are equal (arrays must match in
+    dtype and shape, and agree elementwise within 1e-10). Used internally.
+    """
    x, y = x[0], y[0]
    if isinstance(x, numpy.ndarray) or isinstance(y, numpy.ndarray):
        if x.dtype != y.dtype or x.shape != y.shape or numpy.any(abs(x - y) > 1e-10):
            raise Exception("Output mismatch.", {'performlinker': x, 'clinker': y})
    else:
        if x != y:
-                raise Exception("Output mismatch.", {'performlinker': x, 'clinker': y})
-
-#     return gof.DualLinker(checker, **kwargs).accept(env)
-
+            raise Exception("Output mismatch.", {'performlinker': x, 'clinker': y})

 def infer_reuse_pattern(env, outputs_to_disown):
+    """
+    Given an env and a list of results, returns the list of all
+    results which may share the same underlying data storage as any of
+    the specified results. Used internally by function, FunctionMaker.
+    """
    do_not_reuse = list()
    seen = set()
    def walk(r):
@@ -64,32 +48,9 @@ def infer_reuse_pattern(env, outputs_to_disown):
        walk(output)
    return do_not_reuse

-
-def cloned_env(inputs, outputs):
-    inputs, outputs = gof.graph.clone(inputs, outputs)
-    env = gof.env.Env(inputs, outputs)
-    return env
-
-def std_env(inputs, outputs, disown_inputs = False,
-            use_destroy_handler = True):
-    inputs, outputs = gof.graph.clone(inputs, outputs)
-    _mark_indestructible(outputs)
-    env = gof.env.Env(inputs, outputs)
-    if use_destroy_handler:
-        env.extend(gof.DestroyHandler())
-    env.extend(gof.ReplaceValidate())
-    env.validate()
-    for input in inputs:
-        input.destroyed_by_user = use_destroy_handler and len(env.destroyers(input)) != 0
-        if not input.destroyed_by_user and not disown_inputs:
-            # prevent optimizations from destroying the inputs
-            input.tag.indestructible = True
-    return env
-
-def std_opt(env):
-    pass
-
-
+# If a string is passed as the linker argument in the constructor for
+# Mode, it will be used as the key to retrieve the real linker in this
+# dictionary
 predefined_linkers = {
    'py'   : gof.PerformLinker(),
    'c'    : gof.CLinker(),
@@ -97,84 +58,790 @@ predefined_linkers = {
    'c&py' : gof.DualLinker(checker = check_equal)
    }

-class FunctionFactory:
-
-    def __init__(self, inputs, outputs, linker = 'py', optimizer = std_opt, borrow_outputs = False, disown_inputs = False,
-                 use_destroy_handler = True):
-        if len(inputs) != len(set(inputs)):
-            print >>sys.stderr, "Warning: duplicate inputs"
-        for r in list(inputs) + list(outputs):
-            if not isinstance(r, gof.Result):
-                raise TypeError("All inputs and outputs to FunctionFactory should be Result instances. Received:", type(r), r)
-        env = std_env(inputs, outputs, disown_inputs = disown_inputs,
-                      use_destroy_handler = use_destroy_handler)
-        if None is not optimizer:
-            optimizer(env)
-        env.validate()
+default_linker = 'c|py'
+
+def register_linker(name, linker):
+    """Add a `Linker` which can be referred to by `name` in `Mode`."""
+    if name in predefined_linkers:
+        raise ValueError('Linker name already taken: %s' % name)
+    predefined_linkers[name] = linker
+
+
+# If a string is passed as the optimizer argument in the constructor
+# for Mode, it will be used as the key to retrieve the real optimizer
+# in this dictionary
+predefined_optimizers = {
+    None    : lambda env: None,
+    'merge' : gof.MergeOptimizer(),
+    }
+default_optimizer = 'merge'
+
+def register_optimizer(name, opt):
+    """Add an `Optimizer` which can be referred to by `name` in `Mode`."""
+    if name in predefined_optimizers:
+        raise ValueError('Optimizer name already taken: %s' % name)
+    predefined_optimizers[name] = opt
+
+
+class Mode(object):
+    """
+    The Mode represents a way to optimize and then link a computation
+    graph.
+
+     * optimizer -> a structure of type Optimizer. An Optimizer may
+       simplify the math, put similar computations together, improve
+       numerical stability and make various other improvements.
+     * linker -> a structure of type Linker. A Linker decides which
+       implementations to use (C or Python, for example) and how to
+       string them together to perform the computation.
+
+    See predefined_linkers, predefined_optimizers and also
+    predefined_modes.
+    """
+    
+    def __init__(self, linker = default_linker, optimizer = default_optimizer):
+        self.__setstate__((linker, optimizer))
+
+    def __getstate__(self):
+        return (self.provided_linker, self.provided_optimizer)
+
+    def __setstate__(self, (linker, optimizer)):
+        self.provided_linker = linker
+        self.provided_optimizer = optimizer
+        if isinstance(linker, str) or linker is None:
+            linker = predefined_linkers[linker]
+        self.linker = linker
+        if isinstance(optimizer, str) or optimizer is None:
+            optimizer = predefined_optimizers[optimizer]
+        self.optimizer = optimizer
+
+    def __str__(self):
+        return "Mode(linker = %s, optimizer = %s)" % (self.provided_linker, self.provided_optimizer)
+
+# If a string is passed as the mode argument in function or
+# FunctionMaker, the Mode will be taken from this dictionary using the
+# string as the key
+predefined_modes = {'FAST_COMPILE': Mode('py', 'merge')} 
+default_mode = 'FAST_COMPILE'
+
+def register_mode(name, mode):
+    """Add a `Mode` which can be referred to by `name` in `function`."""
+    if name in predefined_modes:
+        raise ValueError('Mode name already taken: %s' % name)
+    predefined_modes[name] = mode
+
+
+
+class SymbolicInput(object):
+    """
+    Represents a symbolic input for use with function or FunctionMaker.
+
+    result: a Result instance. 
+        This will be assigned a value before running the function,
+        not computed from its owner.
+
+    name: Any type. (If autoname=True, defaults to result.name). 
+        If name is a valid Python identifier, this input can be set by kwarg, and its value
+        can be accessed by self.<name>.
+
+    update: Result instance (default: None)
+        value (see previous) will be replaced with this expression result after each function call.
+        If update is None, the update will be the default value of the input.
+
+    mutable: Bool (default: False if update is None, True if update is not None)
+        True: permit the compiled function to modify the python object being passed as the input
+        False: do not permit the compiled function to modify the python object being passed as the input.
+
+    strict: Bool (default: False)
+        True: the value you pass for this input must have exactly the right type
+        False: the value you pass for this input may be cast automatically to the proper type
+
+    autoname: Bool (default: True)
+        See the name option.
+    """
+
+    def __init__(self, result, name=None, update=None, mutable=None, strict=False, autoname=True):
+        self.result = result
+        self.name = result.name if (autoname and name is None) else name
+        if self.name is not None and not isinstance(self.name, str):
+            raise TypeError("name must be a string! (got: %s)" % self.name)
+        self.update = update
+        self.mutable = mutable if (mutable is not None) else (update is not None)
+        self.strict = strict
+
+    def __str__(self):
+        if self.update:
+            return "In(%s -> %s)" % (self.result, self.update)
+        else:
+            return "In(%s)" % self.result
+
+    def __repr__(self):
+        return str(self)
+
+
+class SymbolicInputKit(object):
+    """
+    Represents a group ("kit") of SymbolicInputs. If fed into function or
+    FunctionMaker, only the inputs which are needed to compile the function
+    properly will be taken.
+
+    A SymbolicInputKit provides the distribute function in order to set or
+    initialize several inputs from a single value. Specialized Kits should
+    override it.
+    """
+
+    def __init__(self, name):
+        if not isinstance(name, str):
+            raise TypeError('name must be a string (got: %s)' % name)
+        self.name = name
+        self.sinputs = []
+        self.results = []
+
+    def add_input(self, sinput):
+        """
+        Add a SymbolicInput to this SymbolicInputKit. It will be given the
+        next available index.
+        """
+        self.sinputs.append(sinput)
+        self.results.append(sinput.result)
+
+    def distribute(self, value, indices, containers):
+        """
+        Given a list of indices corresponding to SymbolicInputs in this kit
+        as well as a corresponding list of containers, initialize all the
+        containers using the provided value.
+        """
+        raise NotImplementedError
+
+    def complete(self, inputs):
+        """
+        Given inputs (a list of Result instances), checks through all
+        the SymbolicInputs in the kit and returns a sorted list of
+        indices and a list of their corresponding SymbolicInputs such
+        that each of them represents some result in the inputs list.
+
+        Not all the provided inputs will have a corresponding
+        SymbolicInput in the kit.
+        """
+        ret = []
+        for input in inputs:
+            try:
+                i = self.results.index(input)
+                ret.append((i, self.sinputs[i]))
+            except ValueError:
+                pass
+        ret.sort()
+        return zip(*ret)
+
+
+class In(SymbolicInput):
+    """
+    Represents a symbolic input for use with function or FunctionMaker.
+
+    result: a Result instance. 
+        This will be assigned a value before running the function,
+        not computed from its owner.
+
+    name: Any type. (If autoname=True, defaults to result.name). 
+        If name is a valid Python identifier, this input can be set by kwarg, and its value
+        can be accessed by self.<name>.
+
+    value: Any type.
+        The initial/default value for this input. If update is None, this input acts just like
+        an argument with a default value in Python. If update is not None, changes to this
+        value will "stick around", whether due to an update or a user's explicit action.
+
+    update: Result instance (default: None)
+        value (see previous) will be replaced with this expression result after each function call.
+        If update is None, the update will be the default value of the input.
+
+    mutable: Bool (default: False if update is None, True if update is not None)
+        True: permit the compiled function to modify the python object being passed as the input
+        False: do not permit the compiled function to modify the python object being passed as the input.
+
+    strict: Bool (default: False)
+        True: the value you pass for this input must have exactly the right type
+        False: the value you pass for this input may be cast automatically to the proper type
+
+    autoname: Bool (default: True)
+        See the name option.
+    """
+    def __init__(self, result, name=None, value=None, update=None, mutable=None, strict=False, autoname=True):
+        super(In, self).__init__(result, name, update, mutable, strict, autoname)
+        self.value = value
+
+
+class SymbolicOutput(object):
+    """
+    Represents a symbolic output for use with function or FunctionMaker.
+
+    borrow: set this to True to indicate that a reference to
+            function's internal storage may be returned. A value
+            returned for this output might be clobbered by running
+            the function again, but the function might be faster.
+    """
+    
+    def __init__(self, result, borrow=False):
+        self.result = result
+        self.borrow = borrow
+
+Out = SymbolicOutput
+
+
+
+class Supervisor:
+    """
+    Listener for Env events which makes sure that no operation overwrites the
+    contents of protected Results. The outputs of the Env are protected by default.
+    """
+
+    def __init__(self, protected):
+        self.protected = list(protected)
+
+    def validate(self, env):
+        if not hasattr(env, 'destroyers'):
+            return True
+        for r in self.protected + list(env.outputs):
+            if env.destroyers(r):
+                raise gof.InconsistencyError("Trying to destroy a protected Result.")
+
+
+def std_env(input_specs, output_specs, accept_inplace = False):
+    """
+    Makes an Env corresponding to the input specs and the output
+    specs.  Any SymbolicInput in the input_specs, if its update field
+    is not None, will add an output to the Env corresponding to that
+    update. The return value is the Env as well as a list of
+    SymbolicOutput instances corresponding to the updates.
+
+    If accept_inplace is False, the graph will be checked for inplace
+    operations and an exception will be raised if it has any. If
+    accept_inplace is True, a DestroyHandler will be added to the Env
+    if there are any inplace operations.
+
+    The returned Env is a clone of the graph between the provided
+    inputs and outputs.
+    """
+    orig_inputs = [spec.result for spec in input_specs]
+    updates = [spec.update for spec in input_specs if spec.update]
+    orig_outputs = [spec.result for spec in output_specs] + updates
+
+    inputs, outputs = gof.graph.clone(orig_inputs, orig_outputs)
+    env = gof.env.Env(inputs, outputs)
+
+    for node in env.nodes:
+        if getattr(node.op, 'destroy_map', None):
+            if not accept_inplace:
+                raise TypeError("Graph must not contain inplace operations", node)
+            else:
+                env.extend(gof.DestroyHandler())
+                break
+
+    # We need to protect all immutable inputs from inplace operations.
+    env.extend(Supervisor(input for spec, input in zip(input_specs, inputs) if not spec.mutable))
+    return env, map(SymbolicOutput, updates)
+
+
+class FunctionMaker(object):
+
+    @staticmethod
+    def wrap_in(input):
+        if isinstance(input, (SymbolicInput, SymbolicInputKit)):
+            return input
+        elif isinstance(input, gof.Result):
+            # r -> SymbolicInput(result=r)
+            return SymbolicInput(input)
+        elif isinstance(input, (list, tuple)):
+            # (r, u) -> SymbolicInput(result=r, update=u)
+            if len(input) == 2:
+                return SymbolicInput(input[0], update = input[1])
+            else:
+                raise TypeError("Expected two elements in the list or tuple.", input)
+        else:
+            raise TypeError("Unknown input type:", type(input), input)
+
+    @staticmethod
+    def expand_in(sinput, rinputs):
+        # For SymbolicInputKits, this extracts a list of SymbolicInput instances
+        # and corresponding indices such that these SymbolicInputs are representative
+        # of some of the Result instances in inputs.
+        # For SymbolicInput, this returns None as the list of indices and a list with
+        # just the SymbolicInput.
+        if isinstance(sinput, SymbolicInputKit):
+            return sinput.complete(rinputs)
+        elif isinstance(sinput, SymbolicInput):
+            return [None, [sinput]]
+
+    @staticmethod
+    def wrap_out(output):
+        if isinstance(output, SymbolicOutput):
+            return output
+        elif isinstance(output, gof.Result):
+            return SymbolicOutput(output)
+        else:
+            raise TypeError("Unknown output type:", type(output), output)
+
+    def __init__(self, inputs, outputs, mode = 'FAST_RUN', accept_inplace = False):
+        """
+        Create a FunctionMaker for the specified inputs, outputs and mode.
+
+        @param inputs: a list of SymbolicInput instances
+        @param outputs: a list of SymbolicOutput instances
+                   outputs may also be a single Result (not a list), in which
+                   case the functions produced by FunctionMaker will return
+                   their output value directly
+        @param mode: a Mode instance (or the name of a predefined mode) telling FunctionMaker how to optimize and link
+        @param accept_inplace: True iff it is acceptable to have inplace operations
+                          in the graph from the inputs to the outputs
+        """
+
+        # Handle the case where inputs and/or outputs is a single Result (not in a list)
+        unpack_single = False
+        if not isinstance(outputs, (list, tuple)):
+            unpack_single = True
+            outputs = [outputs]
+        if not isinstance(inputs, (list, tuple)):
+            inputs = [inputs]
+
+        # Wrap them in In or Out instances if needed.
+        inputs, outputs =  map(self.wrap_in, inputs), map(self.wrap_out, outputs)
+        _inputs = gof.graph.inputs([o.result for o in outputs])
+        indices = [[input] + self.expand_in(input, _inputs) for input in inputs]
+        expanded_inputs = reduce(list.__add__, [list(z) for x, y, z in indices], [])
+
+        # make the env
+        env, additional_outputs = std_env(expanded_inputs, outputs, accept_inplace)
        self.env = env
-        linker = copy(predefined_linkers.get(linker, linker))
+
+        # Fetch the mode and then the optimizer and linker
+        mode = predefined_modes.get(mode, mode)
+        optimizer, linker = mode.optimizer, copy(mode.linker)
+
+        # optimize the env
+        optimizer(env)
+
+        # initialize the linker
        if not hasattr(linker, 'accept'):
            raise ValueError("'linker' parameter of FunctionFactory should be a Linker with an accept method " \
-                             "or one of ['py', 'c', 'c|py', 'c&py']")
-        if borrow_outputs:
+                             "or one of %s" % predefined_linkers.keys())
+
+        no_borrow = [output for output, spec in zip(env.outputs, outputs+additional_outputs) if not spec.borrow]
+        if not no_borrow:
            self.linker = linker.accept(env)
        else:
-            self.linker = linker.accept(env, no_recycling = infer_reuse_pattern(env, env.outputs))
-            
-            
-    def create(self, profiler = None, unpack_single = True, strict = 'if_destroyed'):
-        if strict not in [True, False, 'if_destroyed']:
-            raise ValueError("'strict' parameter of create should be one of [True, False, 'if_destroyed']")
-        if profiler is None:
-            fn = self.linker.make_function(unpack_single=unpack_single)
-        else:
-            fn  = self.linker.make_function(unpack_single=unpack_single,
-                                            profiler=profiler)
-        for env_input, fn_input in zip(self.env.inputs, fn.inputs):
-            if strict is True or (env_input.destroyed_by_user and strict == 'if_destroyed'):
-                fn_input.strict = True
+            self.linker = linker.accept(env, no_recycling = infer_reuse_pattern(env, no_borrow))
+        
+        self.indices = indices
+        self.inputs = inputs
+        self.expanded_inputs = expanded_inputs
+        self.outputs = outputs
+        self.unpack_single = unpack_single
+        self.mode = mode
+        self.accept_inplace = accept_inplace
+
+    def create(self, defaults = None, trustme = False):
+        """
+        Create a function.
+
+        defaults -> a list matching the inputs list and providing default values
+                    if the default for an input is None, then that input is a
+                    required input. For an input with an update, the default
+                    acts as initialization.
+        trustme -> disables some exceptions, used internally
+        """
+        if defaults is None:
+            defaults = [None]*len(self.inputs)
+        input_storage = [] # list of independent one-element lists, will be passed to the linker
+        _defaults = []
+
+        # The following loop is to fill in the input_storage and _defaults lists.
+        for (input, indices, subinputs), default in zip(self.indices, defaults):
+            __default = default
+
+            # If the default is a gof.Container, this means we want to share
+            # the same storage. This is done by appending default.storage
+            # to input_storage
+            if isinstance(default, gof.Container):
+                if indices is not None:
+                    raise TypeError("Cannot take a Container instance as default for a SymbolicInputKit.")
+                input_storage.append(default.storage)
+                default = None
+            # If the input is a SymbolicInputKit, it represents more than
+            # one storage unit. The indices and subinputs lists represent which
+            # of the kit's inputs are active in this graph, so we make as many
+            # storage units as needed
+            elif isinstance(input, SymbolicInputKit):
+                input_storage += [[None] for i in indices]
+            # Normal case: one new, independent storage unit
+            else:
+                input_storage.append([None])
+
+            # Filling _defaults. Each entry is a tuple of three elements:
+            # (required, refeed, value)
+            # - required means that the user must provide a value when calling the function
+            # - refeed means that we want to put the default back in the storage after each function call
+            # - value is the value that will be put in the storage initially
+
+            # Even though a SymbolicInputKit represents more than one input,
+            # we still only have one entry for the defaults list.
+            if isinstance(input, SymbolicInputKit):
+                if default is None:
+                    _defaults.append((True, True, None))
+                else:
+                    _defaults.append((False, False, default))
+            elif input.update is not None:
+                # If the input has an update, then (logically) it is not required, since
+                # it is just a parameter, and we do not want to refeed the default
+                # back into the storage, as that would defeat the point of updating it.
+                # This policy is always applied.
+                if default is None:
+                    if trustme or isinstance(__default, gof.Container):
+                        _defaults.append((False, False, default))
+                    else:
+                        # This might catch some bugs early
+                        raise ValueError("A default (initial) value is required for an input which can update itself.", input)
+                else:
+                    _defaults.append((False, False, default))
+            else:
+                if default is None:
+                    # No default, so this is a required input. Nothing to feed back, initial value is None.
+                    _defaults.append((True, False, None))
+                else:
+                    # Default value. It is not required, but we want to put it back into the storage
+                    # every time so it behaves like most programming languages' default values
+                    _defaults.append((False, True, default))
+        defaults = _defaults
+
+        # Get a function instance
+        _fn, _i, _o = self.linker.make_thunk(input_storage = input_storage)
+        fn = Function(_fn, _i, _o, self.indices, self.outputs, defaults, self.unpack_single, self)
        return fn

-    def partial(self, *first, **kwargs):
-        fn = self.create(**kwargs)
-        return lambda *last: fn(*(first + last))
-
-
-def function(inputs,
-             outputs,
-             linker = 'py',
-             optimizer = std_opt,
-             borrow_outputs = False,
-             disown_inputs = False,
-             profiler = None,
-             unpack_single = True,
-             strict = 'if_destroyed',
-             use_destroy_handler = True):
-    ff = FunctionFactory(inputs,
-                         outputs,
-                         linker = linker,
-                         optimizer = optimizer,
-                         borrow_outputs = borrow_outputs,
-                         disown_inputs = disown_inputs,
-                         use_destroy_handler = use_destroy_handler)
-    return ff.create(profiler = profiler,
-                     unpack_single = unpack_single,
-                     strict = strict)
-
-
-def eval_outputs(outputs, **kwargs):
-    return function([], outputs, **kwargs)()
-
-
-_fcache = {} # it would be nice to use weakref.WeakKeyDictionary()
-
-def fast_compute(*outputs):
-    if outputs in _fcache:
-        f = _fcache[outputs]
-    else:
-        f = function([], outputs, linker = 'c')
-        _fcache[outputs] = f
-    return f()
+
+def _pickle_FunctionMaker(fm):
+    return (_constructor_FunctionMaker, (fm.inputs, fm.outputs, fm.mode, fm.accept_inplace))
+
+def _constructor_FunctionMaker(*args):
+    return FunctionMaker(*args)
+
+copy_reg.pickle(FunctionMaker, _pickle_FunctionMaker)
+
+
+def _pickle_slice(s):
+    return (slice, (s.start, s.stop, s.step))
+
+copy_reg.pickle(slice, _pickle_slice)
+
+
+
+
+DUPLICATE = ['DUPLICATE'] # unique id object used as a placeholder for duplicate entries
+class Function(object):
+    """
+    Type of the functions returned by theano.function or theano.FunctionMaker.create.
+    """
+
+    def __init__(self, fn, input_storage, output_storage, indices, outputs, defaults, unpack_single, maker):
+        """
+        fn -> a function returned by some linker's make_thunk method
+        input_storage -> list of Container instances used by fn to fetch the inputs
+        output_storage -> list of Container instances used by fn to store the outputs in
+        indices -> list of (SymbolicInput|SymbolicInputKit, indices, [SymbolicInput,...]), one tuple for each input
+        defaults -> list of (required (bool), refeed (bool), value), one tuple for each input
+            required -> whether this input is required or optional
+            refeed -> whether this input's contents must be reverted to value after each call or not
+            value -> the initial or default value of the input
+        unpack_single -> if the function has one output and unpack_single is True, return that output. Else,
+            return [output].
+        maker -> FunctionMaker instance used to make this Function (used for copy)
+        """
+
+        self.fn = fn
+        self.input_storage = input_storage
+        self.output_storage = output_storage
+        self.indices = indices
+
+        containers = list(self.input_storage)
+        finder = {}
+        inv_finder = {}
+
+        def distribute(indices, cs, value):
+            input.distribute(value, indices, cs)
+            for c in cs:
+                c.provided += 1
+        def set(c, v):
+            c.data = v
+
+        setters = []
+        # Initialize the storage
+        for i, ((input, indices, sinputs), (required, refeed, value)) in enumerate(zip(self.indices, defaults)):
+            if indices is None: # this is true iff input is not a SymbolicInputKit
+                c = containers[0]
+                if input.strict:
+                    c.strict = True
+                if value is not None:
+                    # always initialize the storage
+                    c.data = value
+                c.required = required
+                c.provided = 0 # this is a count of how many times the input has been provided (reinitialized to 0 on __call__)
+                # We set an entry in finder for:
+                # - the index of the input
+                # - the result instance the input is based on
+                # - the name of the input
+                # All entries map to the container or to DUPLICATE if an ambiguity is detected
+                finder[i] = c
+                finder[input.result] = c
+                finder[input.name] = c if input.name not in finder else DUPLICATE
+                # inv_finder maps the container to the input (useful for one error message)
+                inv_finder[c] = input
+                setters.append(partial(set, c))
+                containers[:1] = []
+            else:
+                # The input is a SymbolicInputKit, so we take as many containers as the Kit provides inputs
+                cs = containers[:len(indices)]
+                # distribute does the initialization of the containers
+                input.distribute(value, indices, cs)
+                f = partial(distribute, indices, cs)
+                # Like before, we set a finder entry for the kit. Note that
+                # we are not mapping to a container but to a function which
+                # can reinitialize all the containers
+                finder[i] = f
+                finder[input] = f
+                finder[input.name] = f if input.name not in finder else DUPLICATE
+                setters.append(f)
+                # For each input in the kit and its corresponding container, we put an entry in finder.
+                # This allows the user to micro-manage elements of the kit if need be.
+                # All containers inherit the required field and have their own "provided" counter
+                for c, sin in zip(cs, sinputs):
+                    finder[sin.result] = c
+                    # Map the name to the container, or to DUPLICATE if the name is already taken
+                    finder[sin.name] = c if sin.name not in finder else DUPLICATE
+                    inv_finder[c] = input
+                    c.required = required
+                    c.provided = 0
+                containers[:len(indices)] = []
+
+        self.finder = finder
+        self.inv_finder = inv_finder
+        self.outputs = outputs
+        self.defaults = defaults
+        self.unpack_single = unpack_single
+        self.maker = maker
+
+        # this class is important in overriding the square-bracket notation:
+        #     fn.value[x]
+        # self reference is available via the closure on the class
+        class ValueAttribute(object):
+            def __getitem__(self, item):
+                try:
+                    s = finder[item]
+                except KeyError:
+                    raise TypeError("Unknown input or state: %s" % item)
+                if s is DUPLICATE:
+                    raise TypeError("Ambiguous name: %s - please check the names of the inputs of your function for duplicates." % item)
+                if isinstance(s, gof.Container):
+                    return s.value
+                else:
+                    raise NotImplementedError
+            def __setitem__(self, item, value):
+                try:
+                    s = finder[item]
+                except KeyError:
+                    raise TypeError("Unknown input or state: %s" % item)
+                if s is DUPLICATE:
+                    raise TypeError("Ambiguous name: %s - please check the names of the inputs of your function for duplicates." % item)
+                if isinstance(s, gof.Container):
+                    s.value = value
+                    s.provided += 1
+                else:
+                    s(value)
+
+        # this class is important in overriding the square-bracket notation:
+        #     fn.container[x]
+        # self reference is available via the closure on the class
+        class ContainerAttribute(object):
+            def __getitem__(self, item):
+                return finder[item]
+            # You cannot set the container
+
+        self._value = ValueAttribute()
+        self._container = ContainerAttribute()
+
+    def __getitem__(self, item):
+        return self.value[item]
+
+    def __setitem__(self, item, value):
+        self.value[item] = value
+        
+    
+    def __copy__(self):
+        defaults = [default for _1, _2, default in self.defaults]
+        cpy = self.maker.create(defaults, trustme = True)
+        for (input,_1,_2), here, there in zip(self.indices, self.input_storage, cpy.input_storage):
+            if input.mutable and here is not None:
+                there.data = copy(here.data)
+            else:
+                there.data = here.data
+        return cpy
+
+    def __call__(self, *args, **kwargs):
+        # Reinitialize each container's 'provided' counter
+        for c in self.input_storage:
+            c.provided = 0
+        # Set positional arguments
+        for i, arg in enumerate(args):
+            self[i] = arg
+        # Set keyword arguments
+        for k, arg in kwargs.iteritems():
+            self[k] = arg
+        # Check if inputs are missing or if inputs were set more than once
+        for c in self.input_storage:
+            if c.required and not c.provided:
+                raise TypeError("Missing required input: %s" % self.inv_finder[c].result)
+            if c.provided > 1:
+                raise TypeError("Multiple values for input: %s" % self.inv_finder[c].result)
+        # Do the actual work
+        self.fn()
+        outputs = [x.data for x in self.output_storage]
+        # Update the inputs that have an update function
+        for input, storage in reversed(zip(self.maker.expanded_inputs, self.input_storage)):
+            if input.update:
+                storage.data = outputs.pop()
+        # Put default values back in the storage
+        for i, (required, refeed, value) in enumerate(self.defaults):
+            if refeed:
+                self[i] = value
+        if self.unpack_single and len(outputs) == 1:
+            return outputs[0]
+        else:
+            return outputs
+
+    value = property(
+        lambda self: self._value,
+        None, #not settable
+        doc="""TODOC""")
+    container = property(
+        lambda self: self._container,
+        None,
+        doc="""TODOC""")
+
+
+def _pickle_Function(f):
+    ins = list(f.input_storage)
+    defaults = []
+    for (input, indices, inputs), (required, refeed, default) in zip(f.indices, f.defaults):
+        if isinstance(input, SymbolicInputKit):
+            defaults.append(default)
+            ins[:len(indices)] = []
+        else:
+            defaults.append(ins[0])
+            del ins[0]
+    return (_constructor_Function, (f.maker, defaults, [x.data for x in f.input_storage]))
+
+def _constructor_Function(maker, defaults, data):
+    f = maker.create(defaults, trustme = True)
+    for container, x in zip(f.input_storage, data):
+        container.data = x
+    return f
+
+copy_reg.pickle(Function, _pickle_Function)
+
+
+def function(inputs, outputs, mode='FAST_RUN', accept_inplace = False):
+    """
+    Return a function calculating the outputs from the inputs.
+
+    inputs -> list of SymbolicInput or In instances
+    outputs -> a SymbolicOutput or a list of SymbolicOutput or Out instances
+      The return value of the returned function will match the format of this
+      argument (either the value itself or a list of one or more return values)
+    mode -> a descriptive string or a Mode instance; descriptive strings can be one of:
+      * SANITY_CHECK
+      * FAST_COMPILE
+      * FAST_RUN (default)
+      * EXPENSIVE_OPTIMIZATION
+    accept_inplace -> True iff the graph can contain inplace operations
+      prior to the optimization phase (default is False)
+
+    Every element of the input list will be upgraded to an In instance if necessary,
+    using the following rules:
+
+    * a Result instance r will be upgraded like In(r)
+    * a tuple (name, r) will be In(r, name=name)
+    * a tuple (r, val) will be In(r, value=val, autoname=True)
+    * a tuple ((r,up), val) will be In(r, value=val, update=up, autoname=True)
+    * a tuple (name, r, val) will be In(r, name=name, value=val)
+    * a tuple (name, (r,up), val) will be In(r, name=name, value=val, update=up, autoname=True)
+
+    Similarly, every element of the output list will be upgraded to an
+    Out instance if necessary:
+
+    * a Result instance r will be upgraded like Out(r)
+    """
+
+    def wrap_in(input):
+        if isinstance(input, (SymbolicInput, SymbolicInputKit)):
+            return input
+        elif isinstance(input, gof.Result):
+            return In(input)
+        elif isinstance(input, (list, tuple)):
+            orig = input
+            if not input:
+                raise TypeError("Nonsensical input specification: %s" % input)
+            if isinstance(input[0], str):
+                name = input[0]
+                input = input[1:]
+            else:
+                name = None
+            if isinstance(input[0], (list, tuple)):
+                if len(input[0]) != 2 or len(input) != 2:
+                    raise TypeError("Invalid input syntax: %s (check documentation or use an In instance)" % (orig,))
+                (result, update), value = input
+            elif isinstance(input[0], gof.Result):
+                if len(input) == 1:
+                    result, update, value = input[0], None, None
+                elif len(input) == 2:
+                    (result, value), update = input, None
+                else:
+                    raise TypeError("Invalid input syntax: %s (check documentation or use an In instance)" % (orig,))
+            elif isinstance(input[0], (SymbolicInput, SymbolicInputKit)):
+                if len(input) == 1:
+                    return input[0]
+                elif len(input) == 2:
+                    input, value = input
+                    if name is not None: input.name = name
+                    input.value = value
+                    return input
+            else:
+                raise TypeError("The input specification is not valid: %s" % (orig,))
+
+            if not isinstance(result, gof.Result):
+                raise TypeError("Unknown input type: %s, expected Result instance" % type(result), result)
+            if update is not None and not isinstance(update, gof.Result):
+                raise TypeError("Unknown update type: %s, expected Result instance" % type(update), update)
+            if value is not None and isinstance(value, (gof.Result, SymbolicInput)):
+                raise TypeError("The value for input %s should not be a Result or SymbolicInput instance (got: %s)" % (result, value))
+
+            return In(result, name=name, value=value, update=update)
+        else:
+            raise TypeError("Unknown input type: %s, expected Result instance" % type(input), input)
+
+    def wrap_out(output):
+        if isinstance(output, SymbolicOutput):
+            return output
+        elif isinstance(output, gof.Result):
+            return SymbolicOutput(output)
+        else:
+            raise TypeError("Unknown output type: %s (%s)" % (type(output), output))
+
+    inputs = map(wrap_in, inputs)
+    outputs = map(wrap_out, outputs) if isinstance(outputs, (list, tuple)) else wrap_out(outputs)
+
+    fn = FunctionMaker(inputs, outputs, mode, accept_inplace = accept_inplace).create([getattr(input, 'value', None) for input in inputs])
+
+    return fn
+
+



@@ -210,10 +877,6 @@ class OpFromGraph(gof.Op):
    """
    
    def __init__(self, inputs, outputs, grad_depth = 1, **kwargs):
-        if kwargs.get('borrow_outputs') or kwargs.get('unpack_single'):
-            raise ValueError('The borrow_outputs and unpack_single options cannot be True')
-        kwargs['unpack_single'] = False
-        kwargs['borrow_outputs'] = False
        self.fn = function(inputs, outputs, **kwargs)
        self.inputs = inputs
        self.outputs = outputs
@@ -252,263 +915,3 @@ class OpFromGraph(gof.Op):
        else:
            raise NotImplementedError

-
-
-
-# class State:
-#     def __init__(self, init, next = None):
-#         self.init = init
-#         self.next = next
-
-
-# class StateFunctionFactory(Function):
-
-#     def __init__(self, inputs, outputs, states, **kwargs):
-#         states_
-        
-#         inputs = [state.init for state in states] + inputs
-#         outputs = [state.next for ]
-
-    
-
-
-# class Function:
-#     """
-#     An 'executable' compiled from a graph
-
-#     This class is meant to be used as a function: the idea is to use
-#     __call__(*args) and it will compute your graph's function on the args and
-#     return the value(s) corresponding to the output(s).
-    
-#     @ivar fn: the return value of L{linker.make_function}(False)
-
-#     Additional Attributes if keep_locals == True
-#     inputs - inputs in the env
-#     outputs - outputs in the env
-#     features - features to add to the env
-#     linker_cls - the linker class
-#     linker - the linker allocated from env
-#     env - The env passed to the linker
-
-#     @note: B{Re: Memory ownership, aliasing, re-use:}
-#     That the objects returned by L{Function.__call__}(self, *args) are owned
-#     by self, and that in general these outputs might be overwritten (in-place)
-#     by subsequent calls to L{self.__call__}(*args).  Why?  This behaviour is
-#     necessary for inplace operations to work, and L{Function}'s linker might re-use
-#     memory from one execution to the next in order to make each execution faster.
-
-#     """
-#     def __init__(self, inputs, outputs,
-#             features = [],
-#             optimizer = default_optimizer,
-#             linker_cls = gof.link.PerformLinker,
-#             profiler = None,
-#             unpack_single = True,
-#             except_unreachable_input = True,
-#             keep_locals = True):
-#         """
-#         Copy the graph, optimize, and link it.
-
-#         @param inputs: a list of results to be this function's inputs
-#         @param outputs: a list of results to be this function's outputs
-#         @param features: features to add to the env
-#         @param optimizer: an optimizer to apply to the copied graph, before linking
-#         @param linker_cls: a callable that takes an env and returns a Linker
-#         @param profiler: a L{Profiler} for the produced function (only valid if the
-#                    linker_cls's make_function takes a profiler argument)
-#         @param unpack_single: unpack return value lists of length 1. @see: L{Linker.make_function}
-#         @param keep_locals: add the local variables from __init__ to the class
-#         """
-
-#         _mark_indestructible(outputs)
-
-#         if len(inputs) != len(set(inputs)):
-#             raise Exception('duplicate inputs')
-#         if len(outputs) != len(set(outputs)):
-#             raise Exception('duplicate outputs')
-
-#         #evaluate the orphans, and put these values into the clone of the env
-
-#         orphans = list(gof.graph.results_and_orphans(inputs, outputs,
-#             except_unreachable_input=except_unreachable_input)[1])
-#         orphan_data = eval_outputs(orphans, unpack_single=False)
-
-#         #print 'orphans', orphans
-
-#         #print 'ops', gof.graph.ops(inputs, outputs)
-#         env = gof.env.Env(inputs, outputs)
-
-#         #print 'orphans in env', env.orphans()
-
-#         env, equiv = env.clone_get_equiv(clone_inputs=True)
-#         for feature in features:
-#             env.extend(feature(env))
-#         env.extend(gof.DestroyHandler(env))
-
-#         #print 'orphans after clone', env.orphans()
-
-#         for d, o in zip(orphan_data, [equiv[orphan] for orphan in orphans]):
-#             #print 'assigning orphan value', d
-#             #o.data = d
-#             new_o = gof.Constant(o.type, d)
-#             env.replace(o, new_o)
-#             assert new_o in env.orphans
-
-#         # optimize and link the cloned env
-#         if None is not optimizer:
-#             optimizer(env)
-
-#         linker = linker_cls(env)
-
-#         if keep_locals:# useful flag for debugging!
-#             self.__dict__.update(locals())
-
-#         if profiler is None:
-#             self.fn  = linker.make_function(unpack_single=unpack_single)
-#         else:
-#             self.fn  = linker.make_function(unpack_single=unpack_single,
-#                                             profiler=profiler)
-#         self.inputs = env.inputs
-#         self.outputs = env.outputs
-#         self.features = features
-#         self.optimizer = optimizer
-#         self.linker_cls = linker_cls
-#         self.profiler = profiler
-#         self.unpack_single = unpack_single
-#         self.except_unreachable_input = except_unreachable_input
-#         self.keep_locals = keep_locals
-
-#     def __call__(self, *args):
-#         return self.fn(*args)
-
-
-# def eval_outputs(outputs,
-#         features = [],
-#         optimizer = None,
-#         linker_cls = gof.link.PerformLinker,
-#         unpack_single = True,
-#         keep_locals = True):
-
-#     if len(outputs) == 0:
-#         #print 'returning with no inputs'
-#         if unpack_single:
-#             return None
-#         else:
-#             return []
-
-#     inputs = gof.graph.inputs(outputs)
-#     if any(not isinstance(input, gof.Constant) for input in inputs):
-#         raise TypeError("Cannot evaluate outputs because some of the leaves are not Constant.", outputs)
-#     in_data = [i.data for i in inputs]
-#     #print 'in_data = ', in_data
-#     if len(inputs) != len(in_data):
-#         raise Exception('some input data is unknown')
-
-#     env = gof.env.Env(inputs, outputs)
-#     env.replace_all(dict([(i, i.type()) for i in inputs]))
-#     env = env.clone(clone_inputs=True)
-
-#     _mark_indestructible(env.outputs)
-#     if None is not optimizer:
-#         optimizer(env)
-#     linker = linker_cls(env)
-#     fn = linker.make_function(unpack_single=unpack_single)
-#     rval = fn(*in_data)
-#     return rval
-
-
-# StateFunction([x, y], [e], (w, w + lr * bla()))
-
-
-
-
-# class _Function:
-
-#     def __init__(self,
-#                  inputs,
-#                  outputs,
-#                  optimizer,
-#                  linker_type = 'py',
-#                  unpack_single = True,
-#                  except_unreachable_input = True,
-#                  disposable_inputs = [],
-#                  borrow_outputs = []):
-
-
-
-
-#         _mark_indestructible(outputs)
-
-#         if len(inputs) != len(set(inputs)):
-#             raise Exception('duplicate inputs')
-#         if len(outputs) != len(set(outputs)):
-#             raise Exception('duplicate outputs')
-
-#         orphans = list(gof.graph.results_and_orphans(inputs, outputs,
-#             except_unreachable_input=except_unreachable_input)[1])
-#         orphan_data = eval_outputs(orphans, unpack_single=False)
-
-#         env = gof.env.Env(inputs, outputs, features + [gof.EquivTool], consistency_check = True)
-
-#         env = env.clone(clone_inputs=True)
-
-#         for d, o in zip(orphan_data, [env.equiv(orphan) for orphan in orphans]):
-#             o.data = d
-
-#         # optimize and link the cloned env
-#         if None is not optimizer:
-#             optimizer(env)
-
-#         linker = linker_cls(env)
-
-#         if keep_locals:# useful flag for debugging!
-#             self.__dict__.update(locals())
-
-#         if profiler is None:
-#             self.fn  = linker.make_function(inplace=True,
-#                                             unpack_single=unpack_single)
-#         else:
-#             self.fn  = linker.make_function(inplace=True,
-#                                             unpack_single=unpack_single,
-#                                             profiler=profiler)
-#         self.inputs = env.inputs
-#         self.outputs = env.outputs
-#         self.features = features
-#         self.optimizer = optimizer
-#         self.linker_cls = linker_cls
-#         self.profiler = profiler
-#         self.unpack_single = unpack_single
-#         self.except_unreachable_input = except_unreachable_input
-#         self.keep_locals = keep_locals
-
-#     def __call__(self, *args):
-#         return self.fn(*args)
-
-#     def __copy__(self):
-#         return Function(self.inputs, self.outputs,
-#                         features = self.features,
-#                         optimizer = self.optimizer,
-#                         linker_cls = self.linker_cls,
-#                         profiler = self.profiler,
-#                         unpack_single = self.unpack_single,
-#                         except_unreachable_input = self.except_unreachable_input,
-#                         keep_locals = self.keep_locals)
-
-
-
-
-
-
-
-
-
-
-
-
-# class StateFunction:
-
-#     def __init__(self, inputs, outputs, *states):
-#         in_states, out_states = zip(*states)
-#         env = 
-
-    
--- a/doc/DevStartGuide.txt
+++ b/doc/DevStartGuide.txt
+=====================
+Developer Start Guide
+=====================
+
+
+- Learn about the basics of using mercurial.
+
+- Learn some `non-basic python`_ to understand what's going on in some of the
+  trickier files (like tensor.py).
+
+- BasicNumpy_ essential things to know about numpy.
+
+- Learn to write reStructuredText_ for epydoc_.
+
+- ExternalTools - packages that play well with Numpy
+
+- EssentialUnitTest - essential usage of python.unittest
+
+
+Accounts
+========
+
+To obtain developer access, send an email to an admin with a username and
+temporary password. Pending approval, this will give you access to both the
+repository and Trac. You should then change your password in the
+`preferences <http://pylearn.org/theano/prefs>`__ tab. Do *NOT* use a
+password that you use anywhere else! We are using plain-text HTTP, which is not secure.
+
+
+Theano code
+===========
+
+The code that makes up Theano is in a single repository available in
+`<http://pylearn.org/hg/theano>`__.
+
+As a developer, you should clone this repository like this:
+
+- ``hg clone 'http://username:password@pylearn.org/hg/theano' theano``
+
+
+Setting up your environment
+===========================
+
+Some notes on the environment variable $PYTHONPATH.
+
+If theano lives in $DEV/theano, you should have $DEV in your $PYTHONPATH. You should **not** have $DEV/theano in your $PYTHONPATH.
+
+Olivier Breuleux explains:
+
+$PYTHONPATH should contain a ":"-separated list of paths, each of which contains one or several Python packages, in the order in which you would like Python to search for them. If a package has sub-packages of interest to you, do *not* add them in the path: it is not portable, might shadow other packages or short-circuit important things in its ``__init__``.
+
+I advise to never import theano's files from outside theano itself (and I think that is good advice for Python packages in general). Use "from theano import tensor" instead of "import tensor". ... $PYTHONPATH ... should only contain paths to complete packages, so you don't get surprises if I add files that enter in conflict with other packages.
+
+When you install a package, only the package name can be imported directly. If you want a sub-package, you must import it from the main package. That's how it will work in 99.9% of installs because it is the default. Therefore, if you stray from this practice, your code will not be portable. Also, some ways to circumvent circular dependencies might make it so you have to import files in a certain order, which is best handled by the package's own ``__init__.py``.
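The advice above can be tried out on a throwaway package. This is only a sketch and not part of Theano: ``demo_pkg`` and its ``tensor`` sub-module are hypothetical stand-ins, created in a temporary directory that plays the role of $DEV. It is written for Python 3 and uses only the standard library.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway layout: <dev>/demo_pkg/__init__.py and <dev>/demo_pkg/tensor.py
# (demo_pkg and its contents are hypothetical stand-ins for theano).
dev = tempfile.mkdtemp()
pkg_dir = os.path.join(dev, "demo_pkg")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as fh:
    fh.write("initialized = True\n")
with open(os.path.join(pkg_dir, "tensor.py"), "w") as fh:
    fh.write("kind = 'submodule'\n")

# Only the directory that CONTAINS the package goes on the search path,
# which is what putting $DEV (and not $DEV/theano) in $PYTHONPATH does.
sys.path.insert(0, dev)
importlib.invalidate_caches()
tensor = importlib.import_module("demo_pkg.tensor")

# The package's own __init__ ran first, and the module knows its full name.
package_initialized = sys.modules["demo_pkg"].initialized
module_name = tensor.__name__
```

Because the sub-module is reached through its package, the package's ``__init__.py`` runs before it, and the module is registered under its fully qualified name rather than as a top-level ``tensor`` that could shadow something else.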
+
+
+.. _non-basic python: http://lgcm.iro.umontreal.ca/theano/wiki/NonbasicPython
+.. _reStructuredText: http://docutils.sourceforge.net/rst.html
+.. _epydoc: http://epydoc.sourceforge.net/
+.. _basicnumpy: http://lgcm.iro.umontreal.ca/theano/wiki/BasicNumpy
+
+
+.. header:: |THEANO| - README_ - Download_ - Documentation_ - Wiki_ - `Task List`_
+
+.. |THEANO| image:: http://lgcm.iro.umontreal.ca/theano/chrome/site/theano_logo.png
+   :target: http://pylearn.org/auto_theano
+   :alt: THEANO
+   :align: top
+   :class: borderless
+   :width: 60
+   :height: 18
+
+.. _README: ../README.html
+.. _Download: ../README.html#downloading-theano
+.. _Documentation: index.html
+.. _Wiki: http://pylearn.org/theano
+.. _TRAC: http://trac.edgewall.org/
+.. _task list: http://lgcm.iro.umontreal.ca/theano/query?status=accepted&status=assigned&status=new&status=reopened&group=milestone&max=200&col=id&col=summary&col=status&col=owner&col=type&col=priority&col=component&col=time&report=9&order=priority
+
+
--- a/doc/__init__.py
+++ b/doc/__init__.py
--- a/doc/apirst2html.py
+++ b/doc/apirst2html.py
@@ -29,12 +29,21 @@ except:
 # real ``epydoc`` package.  So remove ``sys.path[0]``, which contains the
 # directory of the script.
 import sys, os.path
-script_path = os.path.abspath(sys.path[0])
-sys.path = [p for p in sys.path if os.path.abspath(p) != script_path]
+
+# I leave this in place actually, so that I can import pygments_code_block_directive
+
+#script_path = os.path.abspath(sys.path[0])
+#sys.path = [p for p in sys.path if os.path.abspath(p) != script_path]

 import epydoc.docwriter.xlink as xlink

 from docutils.core import publish_cmdline, default_description
+try:
+    # .. code-block:: python should look nice with this
+    import pygments_code_block_directive
+except Exception, e:
+    print >> sys.stderr, "Failed to import pygments", e
+
 description = ('Generates (X)HTML documents with API documentation links.  '
                + default_description)
 publish_cmdline(reader=xlink.ApiLinkReader(), writer_name='html',

--- a/doc/build_html.sh
+++ b/doc/build_html.sh
-#!/bin/bash
-
-APIRST2HTML=apirst2html.py
-
-EPYDOC_ARGS='--external-api=api --external-api-file=api:../html/api/api-objects.txt --external-api-root=api:epydoc/'
-
-mkdir html 2> /dev/null
-
-for RST in graph ;  do
-    $APIRST2HTML $EPYDOC_ARGS $RST.txt html/$RST.html
-done
--- a/doc/colorful.css
+++ b/doc/colorful.css
+td.linenos { background-color: #f0f0f0; padding-right: 10px; }
+span.lineno { background-color: #f0f0f0; padding: 0 5px 0 5px; }
+pre { line-height: 125%; }
+body  { background: #ffffff; }
+body .c { color: #808080 } /* Comment */
+body .err { color: #F00000; background-color: #F0A0A0 } /* Error */
+body .k { color: #008000; font-weight: bold } /* Keyword */
+body .o { color: #303030 } /* Operator */
+body .cm { color: #808080 } /* Comment.Multiline */
+body .cp { color: #507090 } /* Comment.Preproc */
+body .c1 { color: #808080 } /* Comment.Single */
+body .cs { color: #cc0000; font-weight: bold } /* Comment.Special */
+body .gd { color: #A00000 } /* Generic.Deleted */
+body .ge { font-style: italic } /* Generic.Emph */
+body .gr { color: #FF0000 } /* Generic.Error */
+body .gh { color: #000080; font-weight: bold } /* Generic.Heading */
+body .gi { color: #00A000 } /* Generic.Inserted */
+body .go { color: #808080 } /* Generic.Output */
+body .gp { color: #c65d09; font-weight: bold } /* Generic.Prompt */
+body .gs { font-weight: bold } /* Generic.Strong */
+body .gu { color: #800080; font-weight: bold } /* Generic.Subheading */
+body .gt { color: #0040D0 } /* Generic.Traceback */
+body .kc { color: #008000; font-weight: bold } /* Keyword.Constant */
+body .kd { color: #008000; font-weight: bold } /* Keyword.Declaration */
+body .kp { color: #003080; font-weight: bold } /* Keyword.Pseudo */
+body .kr { color: #008000; font-weight: bold } /* Keyword.Reserved */
+body .kt { color: #303090; font-weight: bold } /* Keyword.Type */
+body .m { color: #6000E0; font-weight: bold } /* Literal.Number */
+body .s { background-color: #fff0f0 } /* Literal.String */
+body .na { color: #0000C0 } /* Name.Attribute */
+body .nb { color: #007020 } /* Name.Builtin */
+body .nc { color: #B00060; font-weight: bold } /* Name.Class */
+body .no { color: #003060; font-weight: bold } /* Name.Constant */
+body .nd { color: #505050; font-weight: bold } /* Name.Decorator */
+body .ni { color: #800000; font-weight: bold } /* Name.Entity */
+body .ne { color: #F00000; font-weight: bold } /* Name.Exception */
+body .nf { color: #0060B0; font-weight: bold } /* Name.Function */
+body .nl { color: #907000; font-weight: bold } /* Name.Label */
+body .nn { color: #0e84b5; font-weight: bold } /* Name.Namespace */
+body .nt { color: #007000 } /* Name.Tag */
+body .nv { color: #906030 } /* Name.Variable */
+body .ow { color: #000000; font-weight: bold } /* Operator.Word */
+body .w { color: #bbbbbb } /* Text.Whitespace */
+body .mf { color: #6000E0; font-weight: bold } /* Literal.Number.Float */
+body .mh { color: #005080; font-weight: bold } /* Literal.Number.Hex */
+body .mi { color: #0000D0; font-weight: bold } /* Literal.Number.Integer */
+body .mo { color: #4000E0; font-weight: bold } /* Literal.Number.Oct */
+body .sb { background-color: #fff0f0 } /* Literal.String.Backtick */
+body .sc { color: #0040D0 } /* Literal.String.Char */
+body .sd { color: #D04020 } /* Literal.String.Doc */
+body .s2 { background-color: #fff0f0 } /* Literal.String.Double */
+body .se { color: #606060; font-weight: bold; background-color: #fff0f0 } /* Literal.String.Escape */
+body .sh { background-color: #fff0f0 } /* Literal.String.Heredoc */
+body .si { background-color: #e0e0e0 } /* Literal.String.Interpol */
+body .sx { color: #D02000; background-color: #fff0f0 } /* Literal.String.Other */
+body .sr { color: #000000; background-color: #fff0ff } /* Literal.String.Regex */
+body .s1 { background-color: #fff0f0 } /* Literal.String.Single */
+body .ss { color: #A06000 } /* Literal.String.Symbol */
+body .bp { color: #007020 } /* Name.Builtin.Pseudo */
+body .vc { color: #306090 } /* Name.Variable.Class */
+body .vg { color: #d07000; font-weight: bold } /* Name.Variable.Global */
+body .vi { color: #3030B0 } /* Name.Variable.Instance */
+body .il { color: #0000D0; font-weight: bold } /* Literal.Number.Integer.Long */
--- a/doc/graph.txt
+++ b/doc/graph.txt
@@ -12,18 +12,12 @@ Subtitle

 Here is some stuff.

- .. code-block::
-  def fib(n):
-     if n == 0:
-       return 1
-     if n == 1:
-       return 1
-     return fib(n-1) + fib(n-1)
-
- .. python::
-  def fib(n):
-     if n == 0:
-       return 1
-     if n == 1:
-       return 1
-     return fib(n-1) + fib(n-1)
+.. code-block:: python
+
+    def fib(n):
+        if n == 0:
+            return 1
+        if n == 1:
+            return 1
+        return fib(n-1) + fib(n-2)
+
--- a/doc/header.txt
+++ b/doc/header.txt
+.. header:: |THEANO| - README_ - Download_ - Documentation_ - Wiki_ - `Task List`_
+
+.. |THEANO| image:: http://lgcm.iro.umontreal.ca/theano/chrome/site/theano_logo.png
+   :target: http://pylearn.org/auto_theano
+   :alt: THEANO
+   :align: top
+   :class: borderless
+   :width: 60
+   :height: 18
+
+.. _README: ../README.html
+.. _Download: ../README.html#downloading-theano
+.. _Documentation: index.html
+.. _Wiki: http://pylearn.org/theano
+.. _task list: http://lgcm.iro.umontreal.ca/theano/query?status=accepted&status=assigned&status=new&status=reopened&group=milestone&max=200&col=id&col=summary&col=status&col=owner&col=type&col=priority&col=component&col=time&report=9&order=priority
+
+
--- a/doc/index.txt
+++ b/doc/index.txt
+=====================================
+Theano Project Documentation Overview
+=====================================
+
+Documentation is divided broadly into two kinds: user documentation and
+developer documentation.
+`Using Theano`_ covers how to *use* what is already in the Theano library to
+build graphs and evaluate them.
+`Hacking Theano`_ introduces you to what's under the hood.  If you want to extend Theano
+to handle new data and expression types, this documentation is for you.
+
+Using Theano
+============
+
+- First of all, read the `n00b guide`_.  It is a cut-and-paste, tutorial-style intro to what Theano can do.
+
+- Familiarize yourself with the `glossary of terminology`_.
+
+- Join `theano-users`_.
+
+- Learn to use the typelist_, and the oplist_.  These are the building blocks
+  of theano expression graphs.
+
+- Browse through some of the `Howto`_ recipes on the wiki.
+
+.. _Howto: 
+.. _theano-users: http://groups.google.com/group/theano-users?pli=1
+.. _theano-dev: http://groups.google.com/group/theano-dev?pli=1
+.. _n00b guide: n00b.html
+.. _glossary of terminology: glossary.html
+.. _typelist: typelist.html
+.. _oplist: oplist.html
+
+Hacking Theano
+==============
+
+- `Get Started as a Developer <DevStartGuide.html>`__ by setting up mercurial, getting a few accounts,
+  setting up your environment, and getting some background in mercurial, python,
+  and numpy.
+
+- Join `theano-dev`_ to participate in development discussion.
+
+- Pick a task from the `task list`_, or suggest one on `theano-users`_.
+  Features/ideas are generally discussed on `theano-users`_.   Technical
+  discussions of how to actually implement something should be on
+  `theano-dev`_.
+
+- Read about `How Theano Works <UserAdvanced.html>`__.
+
+- Browse `Theano's API <../api/>`__.
+
+- Keep an eye on the `Mercurial Changelog <http://pylearn.org/hg/theano>`__.
+
+- Send us your work as a patch to `theano-dev`_ or commit directly to the trunk.
+
+.. _reStructuredText: http://docutils.sourceforge.net/rst.html
+
+
+.. header:: |THEANO| - README_ - Download_ - Documentation_ - Wiki_ - `Task List`_
+
+.. |THEANO| image:: http://lgcm.iro.umontreal.ca/theano/chrome/site/theano_logo.png
+   :target: http://pylearn.org/auto_theano
+   :alt: THEANO
+   :align: top
+   :class: borderless
+   :width: 60
+   :height: 18
+
+.. _README: ../README.html
+.. _Download: ../README.html#downloading-theano
+.. _Documentation: index.html
+.. _Wiki: http://pylearn.org/theano
+.. _TRAC: http://trac.edgewall.org/
+.. _task list: http://lgcm.iro.umontreal.ca/theano/query?status=accepted&status=assigned&status=new&status=reopened&group=milestone&max=200&col=id&col=summary&col=status&col=owner&col=type&col=priority&col=component&col=time&report=9&order=priority
+
+
+
--- a/doc/n00b.txt
+++ b/doc/n00b.txt
+=============
+n00b Tutorial
+=============
+
+.. contents::
+
+*This documentation is still in-progress. 20080919*
+
+Introduction
+============
+
+Great. You know `What theano is`_, and you've even `installed it`_.
+But how do you use it?
+
+.. _`What theano is`: http://lgcm.iro.umontreal.ca/theano/wiki/WhatIsTheano
+.. _`installed it`: http://lgcm.iro.umontreal.ca/theano/wiki/InstallationNotes
+
+If you have never used Theano before, we recommend you read over this tutorial start-to-finish. This will give you a sense of what you can do with Theano, and how.
+Afterwards, we encourage you to read the documentation in accompanying links, which will allow you to understand the underlying concepts behind Theano better.
+
+Scalar example
+==============
+
+In the following example, we will build a function `f(x) = x + 1.5` and then evaluate it.
+
+.. code-block:: python
+
+    import theano
+    import theano.tensor as tensor
+
+    # Declare a symbolic constant
+    c = tensor.constant(1.5)
+
+    # Declare a symbolic floating-point scalar
+    x = tensor.fscalar()
+
+    # The symbolic result y is computed by adding x to c
+    y = x + c
+
+    # f is a function we build to compute output y given input x.
+    # f(x) = y
+    #      = x + c
+    #      = x + 1.5
+    f = theano.function([x], [y])
+
+    # We now bind 2.5 to an internal copy of x and evaluate an internal y,
+    # which we return.
+    # We assert that 4.0 == f(2.5) = 2.5 + 1.5
+    assert 4.0 == f(2.5)
+
+In the example above, `c`, `x`, and `y` are each a *symbolic* result_. They
+are symbolic because they stand for variables and have a type_, but
+do not necessarily store actual values.  Not yet, at least.  (To give them
+values, we will have to `evaluate` them.  More on this below.)
+
+.. _result: glossary.html#result
+.. _type: glossary.html#type
+
+Since we are using the addition operator (`x + c`) here on symbolic results, the
+output `y` is also symbolic.  The `+` corresponds to an *operation* in theano
+terminology, or *op* for short.
+
+We use these results and ops to construct a `symbolic graph`_.  The graph is
+symbolic because we declare what it computes, but do not actually perform any
+computation.  Some type-checking is done while we build our graphs, so if you
+try to do something really crazy you'll see an exception right away.
+
+.. _symbolic graph: glossary.html#symbolicgraph
+
+To actually use our graph for computation, we have to compile (or build) it into
+a function `f`.   The compiled function is actually capable of performing
+computation.  So after we have built f, we use it to compute the value of y from
+a `value input` x.  Some argument checking is only possible at run-time, so if
+you ask for impossible things (e.g. logarithm of a negative number, sum of
+matrices with different shapes) then you will get exceptions from the compiled
+function.  These exceptions can be tricky to understand, but we feel your pain
+and we are working hard to make these errors easier to fix.
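The split between building a graph and compiling it can be imitated in a few lines of plain Python. The sketch below is not how Theano is implemented; ``Sym`` and ``compile_graph`` are hypothetical names invented for this illustration, and the only operation supported is addition.

```python
class Sym(object):
    """A node in a toy expression graph: a named input, a constant, or a sum."""

    def __init__(self, kind, payload):
        self.kind = kind        # 'input', 'const' or 'add'
        self.payload = payload  # a name, a value, or a (left, right) pair

    def __add__(self, other):
        # Adding two symbols only records the operation; nothing is computed yet.
        return Sym('add', (self, other))


def compile_graph(inputs, output):
    """Return a plain Python callable that evaluates `output` given `inputs`."""
    def evaluate(node, env):
        if node.kind == 'input':
            return env[node]
        if node.kind == 'const':
            return node.payload
        left, right = node.payload
        return evaluate(left, env) + evaluate(right, env)

    def fn(*values):
        # Argument checking that can only happen when the function is called.
        if len(values) != len(inputs):
            raise TypeError("expected %d values, got %d" % (len(inputs), len(values)))
        return evaluate(output, dict(zip(inputs, values)))
    return fn


x = Sym('input', 'x')
c = Sym('const', 1.5)
y = x + c                    # builds a graph node, computes nothing
f = compile_graph([x], y)    # turn the graph into something callable
result = f(2.5)              # binds 2.5 to x and evaluates y
```

Here ``y`` is only a description of the sum; a number appears only once ``f`` is called with a value for ``x``.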
+
+
+*TODO: Is concrete the opposite of symbolic? Do we actually have a term for this?*
+
+*TODO: Go over TerminologyGlossary and make sure we touch on / link to most basic concepts in the above.*
+
+*It would be worth thinking through the order in which these terms should be introduced.
+Can we inline the text?*
+
+*Note: Theano has two types of [DefineScalar scalar].*
+
+Matrix example
+==============
+
+In the following example, we will build a function to evaluate the dot product `f(x) = dot(x, w)`.
+
+*TODO: Are there ways we can nicely format the matrix math?*
+
+.. code-block:: python
+
+    import theano
+    import theano.tensor as tensor
+
+    # Define the symbolic results
+    x_sym = tensor.matrix()
+    w_sym = tensor.matrix()
+    y_sym = tensor.dot(x_sym, w_sym)
+
+    f = theano.function([x_sym, w_sym], [y_sym])
+
+    from numpy import asarray
+
+    # Now, choose concrete x and w values.
+
+    # x = [[1 2 3]
+    #      [4 5 6]]
+    x = asarray([[1, 2, 3], [4, 5, 6]])
+
+    # w = [[ 1  2]
+    #      [-1 -2]
+    #      [ 3  3]]
+    w = asarray([[1, 2], [-1, -2], [3, 3]])
+
+    # f(x, w) = [[  8.   7.]
+    #             [ 17.  16.]]
+    # .all() checks the equality over all matrix entries.
+    assert (f(x, w) == asarray([[8, 7], [17, 16]])).all()
+
+*TODO: Explain the matrix and other interesting things going on here.*
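As a check on the numbers in the example above, the same product can be worked out with plain Python lists. This sketch does not use Theano or numpy; the helper ``dot`` is a hypothetical stand-in written out with explicit loops.

```python
def dot(a, b):
    """Matrix product of two lists-of-rows, written out with plain loops."""
    n_inner = len(b)
    assert all(len(row) == n_inner for row in a), "inner dimensions must agree"
    return [[sum(a[i][k] * b[k][j] for k in range(n_inner))
             for j in range(len(b[0]))]
            for i in range(len(a))]


x = [[1, 2, 3], [4, 5, 6]]          # 2 rows, 3 columns
w = [[1, 2], [-1, -2], [3, 3]]      # 3 rows, 2 columns
y = dot(x, w)                       # 2 rows, 2 columns

# First entry by hand: 1*1 + 2*(-1) + 3*3 = 8
```

Each entry of the result pairs one row of ``x`` with one column of ``w``, which is why the number of columns of ``x`` has to equal the number of rows of ``w``.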
+
+
+*TODO: Explain that we have a lot of numpy functionality reimplemented. Link to
+numpy docs and say familiarity won't hurt. Also link to list of available ops.*
+
+Broadcasting example
+====================
+
+Broadcasting is a subtle and important concept in numpy, which I don't
+completely understand.  Regardless, here is an example of how broadcasting
+works.
+
+*WRITEME: Extend the above example to add a vector.*
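Until that example is written, the shape rule that numpy applies when broadcasting can be sketched in plain Python. ``broadcast_shape`` is a hypothetical helper for illustration, not a numpy or Theano function: shapes are aligned from the right, and two dimensions are compatible when they are equal or one of them is 1.

```python
def broadcast_shape(shape_a, shape_b):
    """Return the shape that results from broadcasting two shapes together."""
    # Left-pad the shorter shape with 1s so both have the same rank.
    rank = max(len(shape_a), len(shape_b))
    a = (1,) * (rank - len(shape_a)) + tuple(shape_a)
    b = (1,) * (rank - len(shape_b)) + tuple(shape_b)
    out = []
    for dim_a, dim_b in zip(a, b):
        if dim_a == dim_b or dim_b == 1:
            out.append(dim_a)
        elif dim_a == 1:
            out.append(dim_b)
        else:
            raise ValueError("shapes %r and %r cannot be broadcast" % (shape_a, shape_b))
    return tuple(out)


# Adding a length-2 vector to a 2x2 matrix: the vector is reused for every row.
matrix_plus_vector = broadcast_shape((2, 2), (2,))
# A column combined with a row expands to a full grid.
column_times_row = broadcast_shape((3, 1), (1, 4))
```

Adding a bias vector to every row of a matrix of predictions is the typical use of this rule.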
+
+Gradient example
+================
+
+We are going to write some gradient-based learning code.
+You may now wish to review some
+`matrix conventions <http://pylearn.org/pylearn/wiki/MatrixConventions>`__.
+(Hint: Each row is a training instance, each column is a feature dimension.)
+
+*WRITEME: A simple logistic regression example.*
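While that example is pending, here is a rough sketch of gradient-based fitting in plain Python, with the gradient written by hand instead of derived symbolically. It is not Theano code: the one-feature model, the toy data, and the names ``cost`` and ``grad`` are assumptions made for this illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cost(w, b, xs, ys):
    """Negative log-likelihood of labels ys (0 or 1) under a 1-D logistic model."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def grad(w, b, xs, ys):
    """Hand-derived gradient of cost with respect to (w, b)."""
    gw = gb = 0.0
    for x, y in zip(xs, ys):
        err = sigmoid(w * x + b) - y
        gw += err * x
        gb += err
    return gw, gb


# Toy data: one feature per training instance, labels that are not perfectly separable.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]

w, b = 0.0, 0.0
stepsize = 0.1
initial_cost = cost(w, b, xs, ys)
for _ in range(200):
    gw, gb = grad(w, b, xs, ys)
    w -= stepsize * gw     # step against the gradient
    b -= stepsize * gb
final_cost = cost(w, b, xs, ys)
```

With symbolic differentiation the ``grad`` function would not have to be written by hand; the update rule itself stays the same.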
+
+State example
+=============
+
+In this example, we'll look at a complete logistic regression model, with
+training by simple gradient descent.
+
+.. code-block:: python
+
+    import numpy
+    import theano
+    from theano import tensor as T
+
+    def build_logistic_regression_model(n_in, n_out, l2_coef=30.0):
+        # DECLARE SOME VARIABLES
+
+        x = T.matrix()  #our points, one point per row
+        y = T.matrix()  #store our labels as place codes (label 3 of 5 is vector [00100])
+
+        w = T.matrix()  #the linear transform to apply to our input points
+        b = T.vector()  #a vector of biases, which make our transform affine instead of linear
+
+        stepsize = T.scalar('stepsize')  # a stepsize for gradient descent
+
+        # REGRESSION MODEL AND COSTS TO MINIMIZE
+
+        prediction = T.softmax(T.dot(x, w) + b)
+        cross_entropy = -T.sum(y * T.log(prediction) + (1-y) * T.log(1.0 - prediction), axis=1)
+        cost = T.sum(cross_entropy) + l2_coef * T.sum(T.sum(w*w))
+
+        # GET THE GRADIENTS NECESSARY TO FIT OUR PARAMETERS
+
+        grad_w, grad_b = T.grad(cost, [w, b])
+
+        # COMPILE THE TRAINING (UPDATE) AND PREDICTION FUNCTIONS
+
+        update_fn = theano.function(
+            inputs = [x, y, stepsize,
+                In(w, 
+                    name='w', 
+                    value=numpy.zeros((n_in, n_out)),
+                    update=w - stepsize * grad_w,
+                    mutable=True,
+                    strict=True),
+                In(b, 
+                    name='b', 
+                    value=numpy.zeros(n_out),
+                    update=b - stepsize * grad_b,
+                    mutable=True,
+                    strict=True)
+            ],
+            outputs = cost,
+            mode = 'EXPENSIVE_OPTIMIZATIONS')
+
+        apply_fn = theano.function(
+            inputs = [x, In(w, value=update_fn.storage[w]), In(b, value=update_fn.storage[b])],
+            outputs = [prediction])
+
+        return update_fn, apply_fn
+
+    #USUALLY THIS WOULD BE IN A DIFFERENT FUNCTION/CLASS
+    #FIT SOME DUMMY DATA: 100 points with 10 attributes and 3 potential labels
+
+    update_fn, apply_fn = build_logistic_regression_model(n_in=10, n_out=3, l2_coef=30.0)
+
+    x_data = numpy.random.randn(100, 10)
+    y_data = numpy.random.randn(100, 3)
+    # one-hot labels: compare each row against its own maximum, kept as a column
+    y_data = numpy.asarray(y_data == numpy.max(y_data, axis=1)[:, None], dtype='int64')
+
+    print "Model Training ..."
+    for iteration in xrange(1000):
+        print "  iter", iteration, "cost", update_fn(x_data, y_data, stepsize=0.0001)
+
+    print "Model Predictions"
+    print apply_fn(x_data)
+
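Review note: to make the cost in the model concrete, here is a plain-Python sketch of a softmax followed by a cross-entropy for a single point. It does not use Theano. The helper names `softmax` and `cross_entropy` are hypothetical, and the sign convention (a negated log-likelihood, so that lower is better) is an assumption about what the example means to minimize.

```python
import math

def softmax(scores):
    """Turn a list of real scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, probs):
    """Cross-entropy of one place-coded target against predicted probabilities."""
    return -sum(t * math.log(p) + (1 - t) * math.log(1.0 - p)
                for t, p in zip(target, probs))

probs = softmax([2.0, 1.0, 0.1])
good = cross_entropy([1, 0, 0], probs)  # label is the highest-scoring class
bad = cross_entropy([0, 0, 1], probs)   # label is the lowest-scoring class
```

With this convention the cost is non-negative and shrinks as the prediction moves toward the label, which is what gradient descent on `cost` relies on.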
+
+Summary
+=======
+
+
+*TODO: Rewrite above examples to use doctest strings?*
+
+*TODO: Go through above and link all terms, either to wiki documentation or to
+epydoc documentation.*
+
+*TODO: It would be useful to actually have example files like this in the source
+code. The question is how to automatically extract the source files and inline
+them into this documentation.*
+
+
+.. header:: |THEANO| - README_ - Download_ - Documentation_ - Wiki_ - `Task List`_
+
+.. |THEANO| image:: http://lgcm.iro.umontreal.ca/theano/chrome/site/theano_logo.png
+   :target: http://pylearn.org/auto_theano
+   :alt: THEANO
+   :align: top
+   :class: borderless
+   :width: 60
+   :height: 18
+
+.. _README: ../README.html
+.. _Download: ../README.html#downloading-theano
+.. _Documentation: index.html
+.. _Wiki: http://pylearn.org/theano
+.. _task list: http://lgcm.iro.umontreal.ca/theano/query?status=accepted&status=assigned&status=new&status=reopened&group=milestone&max=200&col=id&col=summary&col=status&col=owner&col=type&col=priority&col=component&col=time&report=9&order=priority
+
+
--- a/doc/pygments_code_block_directive.py
+++ b/doc/pygments_code_block_directive.py
+#!/usr/bin/python
+
+# :Author: a Pygments author|contributor; Felix Wiemann; Guenter Milde
+# :Date: $Date: 2007-06-13 12:20:42 +0200 (Wed, 13 Jun 2007) $
+# :Copyright: This module has been placed in the public domain.
+# 
+# This is a merge of `Using Pygments in ReST documents`_ from the pygments_
+# documentation, and a `proof of concept`_ by Felix Wiemann.
+# 
+# ========== ===========================================================
+# 2007-06-01 Removed redundancy from class values.
+# 2007-06-04 Merge of successive tokens of same type
+#            (code taken from pygments.formatters.others).
+# 2007-06-05 Separate docutils formatter script
+#            Use pygments' CSS class names (like the html formatter)
+#            allowing the use of pygments-produced style sheets.
+# 2007-06-07 Merge in the formatting of the parsed tokens
+#            (misnamed as docutils_formatter) as class DocutilsInterface
+# 2007-06-08 Failsafe implementation (fallback to a standard literal block
+#            if pygments not found)
+# ========== ===========================================================
+# 
+# ::
+
+"""Define and register a code-block directive using pygments
+"""
+
+# Requirements
+# ------------
+# ::
+
+from docutils import nodes
+from docutils.parsers.rst import directives
+
+try:
+    import pygments
+    from pygments import highlight
+    from pygments.lexers import get_lexer_by_name
+    from pygments.formatters.html import _get_ttype_class
+    from pygments.styles import get_style_by_name
+    from pygments.lexers import PythonLexer
+    from pygments.formatters import HtmlFormatter
+
+    # Customisation
+    # -------------
+    # 
+    # Do not insert inline nodes for the following tokens.
+    # (You could add e.g. Token.Punctuation like ``['', 'p']``.) ::
+
+    unstyled_tokens = ['']
+
+    # DocutilsInterface
+    # -----------------
+    # 
+    # This interface class combines code from
+    # pygments.formatters.html and pygments.formatters.others.
+    # 
+    # It does not require anything of docutils and could also become a part of
+    # pygments::
+
+    class DocutilsInterface(object):
+        """Parse `code` string and yield "classified" tokens.
+        
+        Arguments
+        
+          code     -- string of source code to parse
+          language -- formal language the code is written in.
+        
+        Merge subsequent tokens of the same token-type. 
+        
+        Yields the tokens as ``(ttype_class, value)`` tuples, 
+        where ttype_class is taken from pygments.token.STANDARD_TYPES and 
+        corresponds to the class argument used in pygments html output.
+
+        """
+
+        def __init__(self, code, language):
+            self.code = code
+            self.language = language
+            
+        def lex(self):
+            # Get lexer for language (use text as fallback)
+            try:
+                lexer = get_lexer_by_name(self.language)
+            except ValueError:
+                # info: "no pygments lexer for %s, using 'text'"%self.language
+                lexer = get_lexer_by_name('text')
+            return pygments.lex(self.code, lexer)
+            
+                
+        def join(self, tokens):
+            """join subsequent tokens of same token-type
+            """
+            tokens = iter(tokens)
+            (lasttype, lastval) = tokens.next()
+            for ttype, value in tokens:
+                if ttype is lasttype:
+                    lastval += value
+                else:
+                    yield(lasttype, lastval)
+                    (lasttype, lastval) = (ttype, value)
+            yield(lasttype, lastval)
+
+        def __iter__(self):
+            """parse code string and yield "classified" tokens
+            """
+            try:
+                tokens = self.lex()
+            except IOError:
+                print "INFO: Pygments lexer not found, using fallback"
+                # TODO: write message to INFO 
+                yield ('', self.code)
+                return
+
+            for ttype, value in self.join(tokens):
+                yield (_get_ttype_class(ttype), value)
+
+
+
+    # code_block_directive
+    # --------------------
+    # ::
+
+    def code_block_directive(name, arguments, options, content, lineno,
+                           content_offset, block_text, state, state_machine):
+        """parse and classify content of a code_block
+        """
+        language = arguments[0]
+        # create a literal block element and set class argument
+        if 0:
+            code_block = nodes.literal_block(classes=["code-block", language])
+            code_block += nodes.raw('<b>hello</b> one', 'hello two')
+        else:
+            code_block = nodes.literal_block(classes=["code-block", language])
+            
+            # parse content with pygments and add to code_block element
+            for cls, value in DocutilsInterface(u'\n'.join(content), language):
+                if cls in unstyled_tokens:
+                    # insert as Text to decrease the verbosity of the output.
+                    code_block += nodes.Text(value, value)
+                else:
+                    code_block += nodes.inline(value, value, classes=[cls])
+
+            if 0:
+                v = highlight(u'\n'.join(content), PythonLexer(), 
+                        HtmlFormatter(style='colorful', full=True, cssfile='blah.css'))
+                print help(nodes.Inline)
+
+        return [code_block]
+
+
+    # Register Directive
+    # ------------------
+    # ::
+
+    code_block_directive.arguments = (1, 0, 1)
+    code_block_directive.content = 1
+    directives.register_directive('code-block', code_block_directive)
+
+    # .. _docutils: http://docutils.sf.net/
+    # .. _pygments: http://pygments.org/
+    # .. _Using Pygments in ReST documents: http://pygments.org/docs/rstdirective/
+    # .. _proof of concept:
+    #      http://article.gmane.org/gmane.text.docutils.user/3689
+    # 
+    # Test output
+    # -----------
+    # 
+    # If called from the command line, call the docutils publisher to render the
+    # input::
+
+except ImportError:
+    # sys is not imported at module level, so import it before printing
+    import sys
+    print >> sys.stderr, "Failed to import pygments"
+
+
+
+if __name__ == '__main__':
+    from docutils.core import publish_cmdline, default_description
+    description = "code-block directive test output" + default_description
+    try:
+        import locale
+        locale.setlocale(locale.LC_ALL, '')
+    except:
+        pass
+    # Uncomment the desired output format:
+    publish_cmdline(writer_name='pseudoxml', description=description)
+    # publish_cmdline(writer_name='xml', description=description)
+    # publish_cmdline(writer_name='html', description=description)
+    # publish_cmdline(writer_name='latex', description=description)
+    # publish_cmdline(writer_name='newlatex2e', description=description)
+    
+
+
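Review note: the `join` method of `DocutilsInterface` merges runs of tokens that share a token type. The sketch below restates that idea as a standalone generator, under two assumptions that differ from the patch: it compares types with `==` rather than `is`, and it yields nothing for an empty token stream instead of letting `StopIteration` escape. The name `join_tokens` and the sample token types are made up.

```python
def join_tokens(tokens):
    """Merge consecutive (ttype, value) pairs that share the same ttype."""
    tokens = iter(tokens)
    try:
        lasttype, lastval = next(tokens)
    except StopIteration:
        return  # empty input: yield nothing
    for ttype, value in tokens:
        if ttype == lasttype:
            lastval += value
        else:
            yield (lasttype, lastval)
            lasttype, lastval = ttype, value
    yield (lasttype, lastval)

merged = list(join_tokens([("kw", "de"), ("kw", "f"), ("ws", " "), ("name", "f")]))
```

Merging before emitting nodes keeps the generated document tree small, since each run becomes one inline node instead of several.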
--- a/doc/style.css
+++ b/doc/style.css
+/*
+ * :Author: Your Name
+ * :Contact: Your Email Address
+ * :Copyright: This stylesheet has been placed in the public domain.
+ *
+ * Stylesheet for use with Docutils.  [Optionally place a more
+ * detailed description here.]
+ * */
+
+@import url(html4css1.css); /* for basic rst stuff */
+@import url(colorful.css); /* for source highlighting */
+
+/* Your customizations go here.  For example: */
+
+/*
+h1, h2, h3, h4, h5, h6, p.topic-title {
+      font-family: sans-serif }
+      */
+
--- a/doc/test0.txt
+++ b/doc/test0.txt
-Title
-=====
-
-Some text.
-
-
-Subtitle
--------
-
-
-More stuff_.
-
-
- .. _stuff:: http://www.google.com
-
--- a/elemwise.py
+++ b/elemwise.py
@@ -7,6 +7,7 @@ import scalar
 from scalar import Scalar
 import gof
 from gof.python25 import all
+from copy import copy


 # tensor depends on elemwise to provide definitions for several ops
@@ -231,6 +232,15 @@ class Elemwise(Op):
        else:
            self.ufunc = None

+    def __getstate__(self):
+        d = copy(self.__dict__)
+        d.pop('ufunc')
+        return d
+    
+    def __setstate__(self, d):
+        self.__dict__.update(d)
+        self.ufunc = numpy.frompyfunc(self.scalar_op.impl, self.scalar_op.nin, self.scalar_op.nout)
+
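Review note: the two methods added here follow the usual pattern for objects that cache something that cannot be serialized: drop it in `__getstate__` and rebuild it in `__setstate__`. A self-contained sketch of that pattern follows. `CachedOp` is a hypothetical stand-in, not the `Elemwise` class, and a lambda plays the role of the cached ufunc.

```python
import copy
import operator

class CachedOp(object):
    """Hypothetical op that caches a callable derived from its name."""

    def __init__(self, name):
        self.name = name
        self._rebuild()

    def _rebuild(self):
        fn = getattr(operator, self.name)
        # Stand-in for the cached ufunc: derived data that can be recreated.
        self.fast = lambda *args: fn(*args)

    def __getstate__(self):
        state = dict(self.__dict__)
        state.pop('fast')  # leave the derived callable out of the saved state
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._rebuild()    # recreate the callable from the restored fields

op = CachedOp('add')
clone = copy.copy(op)      # copy goes through the same state hooks as pickle
```

Because the cached attribute is rebuilt from the saved fields, the restored object behaves like the original without the callable ever being stored.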
    def make_node(self, *inputs):
        """
        If the inputs have different number of dimensions, their shape

--- a/gen_oplist.py
+++ b/gen_oplist.py
@@ -3,100 +3,164 @@ __docformat__ = "restructuredtext en"
 import sys
 import gof

-def isOpClass(thing):
-    return hasattr(thing, 'perform') and not isinstance(thing, gof.Op)
+def print_title(title_string, under_char, over_char=''):
+    l = len(title_string)
+    if over_char:
+        print over_char * l

-def isOpConstructor(thing, module):
-    return hasattr(thing, 'perform') and isinstance(thing, gof.Op)\
-            or thing in getattr(module, '_constructor_list', [])
-
-def print_title(title_string, under_char):
    print title_string
-    print under_char * len(title_string)
-    print ""

-def chomp(s):
-    """interpret and left-align a docstring"""
+    if under_char:
+        print under_char * l

-    if 'subtensor' in s: 
-        debug = 0
-    else:
-        debug = 0
+    print ""

-    r = []
-    leadspace = True
-    for c in s:
-        if leadspace and c in ' \n\t':
-            continue
+def print_hline():
+    print '-' * 80
+
+class Entry:
+    """Structure for generating the oplist file"""
+    symbol = None
+    name = None
+    module = None
+    docstring = None
+    tags = []
+
+    def __init__(self, symbol, name, current_module):
+        self.symbol = symbol
+        self.name = name
+        self.module = symbol.__module__ #current_module.__name__ # symbol.__module__
+        self.docstring = symbol.__doc__
+        self.tags = ['module:%s' % current_module.__name__] + getattr(symbol, '__oplist_tags', [])
+
+    def mini_desc(self, maxlen=50):
+        """Return a short description of the op"""
+        def chomp(s):
+            """interpret and left-align a docstring"""
+
+            if 'subtensor' in s: 
+                debug = 0
+            else:
+                debug = 0
+
+            r = []
+            leadspace = True
+            for c in s:
+                if leadspace and c in ' \n\t':
+                    continue
+                else:
+                    leadspace = False
+
+                if c == '\n':
+                    if debug:
+                        print >> sys.stderr, 'breaking'
+                    break
+                if c in '\t*`': 
+                    c = ' ';
+                r.append(c)
+
+            if debug: 
+                print >> sys.stderr, r
+
+            return "".join(r)
+
+        minmax = 5
+        assert maxlen >= minmax
+        if not self.docstring: 
+            return "" #+ '(no doc)'
+        elif len(self.docstring) < maxlen:
+            return chomp(self.docstring)
        else:
-            leadspace = False
-
-        if c == '\n':
-            if debug:
-                print >> sys.stderr, 'breaking'
-            break
-        if c == '\t': 
-            c = ' ';
-        r.append(c)
-
-    if debug: 
-        print >> sys.stderr, r
+            return "%s ..."% chomp(self.docstring[:maxlen-minmax])
+
+    apilink = property(lambda self: ":api:`%s.%s`"% (self.module, self.name))
+    """Return the ReST link into the epydoc of this symbol"""
+
+class EntryOp(Entry):
+    def __init__(self, symbol, *args):
+        has_perform = hasattr(symbol, 'perform')
+        if symbol is gof.Op:
+            raise TypeError('not an Op subclass')
+        if not issubclass(symbol, gof.Op):
+            raise TypeError('not an Op subclass')
+        Entry.__init__(self, symbol, *args)
+
+class EntryConstructor(Entry):
+    def __init__(self, symbol, name, module):
+        is_op = isinstance(symbol, gof.Op)
+        is_ctor = symbol in getattr(module, '__oplist_constructor_list', [])
+        if not (is_op or is_ctor):
+            raise TypeError('not a constructor', symbol)
+        Entry.__init__(self, symbol, name, module)
+
+
+def search_entries(module_list):
+    ops = []
+    constructors = []
+
+    for module in module_list:
+        symbol_name_list = [s for s in dir(module) if not s[0] == '_']

-    return "".join(r)
+        for symbol_name in symbol_name_list:
+            symbol = getattr(module, symbol_name)
+            try:
+                ops.append(EntryOp(symbol, symbol_name, module))
+            except TypeError:
+                try:
+                    constructors.append(EntryConstructor(symbol, symbol_name, module))
+                except TypeError:
+                    pass
+
+    return ops, constructors
+
+def print_entries(ops, constructors):
+    tags = {}
+    for o in ops + constructors:
+        for t in o.tags:
+            tags.setdefault(t, []).append(o)
+
+    for t in tags:
+        print_title(t, '=')
+
+        tagged_ops = [op for op in tags[t] if isinstance(op, EntryOp)]
+        if len(tagged_ops):
+            print_title('Op Classes', '-')
+            for op in tagged_ops:
+                print "- %s" % op.apilink
+                print "  %s" % op.mini_desc()
+                print ""
+
+        tagged_ops = [op for op in tags[t] if isinstance(op, EntryConstructor)]
+        if len(tagged_ops):
+            print_title('Op Constructors', '-')
+            for op in tagged_ops:
+                print "- %s" % op.apilink
+                print "  %s" % op.mini_desc()
+                print ""


-def generate():
+if __name__ == "__main__":
    """Generate the op list"""
    import scalar, sparse, tensor

-    print_title("Theano Op List", "~")
+    print_title("Op List", "~", "~")
+    print """
+This page lists the `Op Classes` and `constructors` that are provided by the Theano library.
+`Op Classes` derive from :api:`Op`, whereas `constructors` are typically `Op Class` instances, but may be true Python functions.
+
+In the future, this list may distinguish `constructors` that are Op instances from true Python functions.
+
+"""
+    print_hline()
    print ""
    print ".. contents:: "
    print ""

-    for module in [scalar, sparse, tensor]:
-	print_title('module: `%s`' % module.__name__, '=')
-
-	print_title('Op Classes', '-')
+    ops, constructors = search_entries([scalar, sparse, tensor])

-        symbol_name_list = [s for s in dir(module) if not s[0] == '_']
-
-	for symbol_name in symbol_name_list:
-
-	    symbol = getattr(module, symbol_name)
-
-	    if isOpClass(symbol):
-		print ""
-		print "- :api:`%s.%s`" % (symbol.__module__, symbol_name)
-		docstring = getattr(symbol, '__doc__', "")
-
-		if not docstring: 
-		    print " ", '(no doc)'
-		elif len(docstring) < 50:
-		    print " ", chomp(docstring)
-		else:
-		    print " ", chomp(docstring[:40]), "..."
-	# a little trailing whitespace
-	print ""
-
-	print_title('Op Constructors', '-')
-	for symbol_name in symbol_name_list:
-
-	    symbol = getattr(module, symbol_name)
+    print_entries(ops, constructors)

-	    if isOpConstructor(symbol, module):
-		print ""
-		print "- :api:`%s.%s`" % (symbol.__module__, symbol_name)
-		docstring = getattr(symbol, '__doc__', "")
-
-		if not docstring: 
-		    print " ", 'No documentation'
-		elif len(docstring) < 50:
-		    print " ", chomp(docstring)
-		else:
-		    print " ", chomp(docstring[:40]), "..."
-	# a little trailing whitespace
-	print ""
+    print ""

-if __name__ == "__main__":
-    generate()
+    for line in open("doc/header.txt"):
+        print line[:-1]
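Review note: `print_entries` groups every entry under each of its tags with `dict.setdefault` before printing one section per tag. The sketch below isolates that grouping step. The function name `group_by_tag` and the sample entries are hypothetical, and entries are reduced to `(name, tags)` pairs.

```python
def group_by_tag(entries):
    """Map each tag to the list of entry names carrying it, in input order."""
    groups = {}
    for name, tags in entries:
        for tag in tags:
            groups.setdefault(tag, []).append(name)
    return groups

groups = group_by_tag([
    ("Add", ["module:scalar", "arithmetic"]),
    ("Dot", ["module:tensor"]),
    ("Mul", ["module:scalar", "arithmetic"]),
])
```

An entry with several tags appears once under each of them, which is why the generated op list can show the same op in more than one section.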
--- a/gen_typelist.py
+++ b/gen_typelist.py
+
+from gen_oplist import print_title, print_hline
+
+
+if __name__ == '__main__':
+    print_title("Type List", "~", "~")
+
+    print "*THIS PAGE IS A PLACEHOLDER: WRITEME*"
+    print ""
+    print_hline()
+
+    print ""
+    print ".. contents::"
+    print ""
+
+    print_title("Type Classes", '=')
+
+    print "- scalar.Scalar\n"
+    print "- tensor.Tensor\n"
+    print "- sparse.Sparse\n"
+
+    print_title("Type Instances", '=')
+
+    print "- scalar.int8\n"
+    print "- tensor.lvector\n"
+    print "- sparse.??\n"
+
+    print ""
+
+    for line in open("doc/header.txt"):
+        print line[:-1]
--- a/gof/__init__.py
+++ b/gof/__init__.py
@@ -12,7 +12,7 @@ from graph import \
    Apply, Result, Constant, Value, view_roots

 from link import \
-    Linker, LocalLinker, PerformLinker, WrapLinker, Profiler
+    Container, Linker, LocalLinker, PerformLinker, WrapLinker, Profiler

 from op import \
    Op
@@ -22,7 +22,8 @@ from opt import \
    MergeOptimizer, MergeOptMerge, \
    LocalOptimizer, local_optimizer, LocalOptGroup, LocalOpKeyOptGroup, \
    OpSub, OpRemove, PatternSub, \
-    NavigatorOptimizer, TopoOptimizer, OpKeyOptimizer
+    NavigatorOptimizer, TopoOptimizer, OpKeyOptimizer, \
+    PureThenInplaceOptimizer

 from toolbox import \
    Bookkeeper, History, Validator, ReplaceValidate, NodeFinder, PrintListener

--- a/gof/cc.py
+++ b/gof/cc.py
@@ -631,8 +631,8 @@ class CLinker(link.Linker):
                                    input_storage,
                                    output_storage)
        return thunk, \
-            [link.Filter(input, storage) for input, storage in zip(self.env.inputs, input_storage)], \
-            [link.Filter(output, storage, True) for output, storage in zip(self.env.outputs, output_storage)], \
+            [link.Container(input, storage) for input, storage in zip(self.env.inputs, input_storage)], \
+            [link.Container(output, storage, True) for output, storage in zip(self.env.outputs, output_storage)], \
            error_storage

    def make_thunk(self, input_storage = None, output_storage = None):
@@ -881,8 +881,8 @@ class OpWiseCLinker(link.LocalLinker):

        f = link.streamline(env, thunks, order, no_recycling = no_recycling, profiler = profiler)

-        return f, [link.Filter(input, storage) for input, storage in zip(env.inputs, input_storage)], \
-            [link.Filter(output, storage, True) for output, storage in zip(env.outputs, output_storage)], \
+        return f, [link.Container(input, storage) for input, storage in zip(env.inputs, input_storage)], \
+            [link.Container(output, storage, True) for output, storage in zip(env.outputs, output_storage)], \
            thunks, order


@@ -948,6 +948,7 @@ class DualLinker(link.Linker):
        no_recycling = self.no_recycling

        _f, i1, o1, thunks1, order1 = link.PerformLinker().accept(env, no_recycling = no_recycling).make_all(**kwargs)
+        kwargs.pop('input_storage', None)
        _f, i2, o2, thunks2, order2 =      OpWiseCLinker().accept(env, no_recycling = no_recycling).make_all(**kwargs)

        def f():

--- a/gof/graph.py
+++ b/gof/graph.py
@@ -184,7 +184,7 @@ class Result(utils.object2):
            else:
                return str(self.owner.op) + "." + str(self.index)
        else:
-            return "<?>::" + str(self.type)
+            return "<%s>" % str(self.type)
    def __repr__(self):
        return str(self)
    def clone(self):
@@ -422,8 +422,6 @@ def clone_get_equiv(i, o, copy_inputs_and_orphans = True):
        else:
            d[input] = input

-
-
    for apply in io_toposort(i, o):
        for input in apply.inputs:
            if input not in d:
@@ -438,6 +436,10 @@ def clone_get_equiv(i, o, copy_inputs_and_orphans = True):
        for output, new_output in zip(apply.outputs, new_apply.outputs):
            d[output] = new_output

+    for output in o:
+        if output not in d:
+            d[output] = output.clone()
+
    return d

 def general_toposort(r_out, deps, debug_print = False):

--- a/gof/link.py
+++ b/gof/link.py
 """WRITEME"""
 import utils
 import graph
+from type import Type

 import sys, traceback
 from copy import copy
@@ -109,27 +110,32 @@ class Linker(object):
        return execute


-class Filter(object):
-    """WRITEME"""
-    def __init__(self, r, storage, readonly = False, strict = False, trace = ()):
-        self.r = r
-        self.type = r.type
+class Container(object):
+    def __init__(self, r, storage, readonly = False, strict = False, name = None):
+        #self.r = r
+        if isinstance(r, Type):
+            self.type = r
+        else:
+            self.type = r.type
+        self.name = name or r.name
        self.storage = storage
        self.readonly = readonly
        self.strict = strict
    def __get(self):
        return self.storage[0]
    def __set(self, value):
+        if self.readonly:
+            raise Exception("Cannot set readonly storage: %s" % self.name)
        try:
-            if self.readonly:
-                raise Exception("Cannot set readonly storage.")
            if self.strict:
                self.storage[0] = self.type.filter(value, strict = True)
            else:
                self.storage[0] = self.type.filter(value)
-        except:
-            raise_with_op(self.r)
+        except Exception, e:
+            e.args = e.args + (self.name,)
+            raise
    data = property(__get, __set)
+    value = property(__get, __set)
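Review note: `Container` wraps a one-element storage list behind a property, filtering values on the way in and refusing writes when read-only. A reduced sketch of that shape is below. `Cell` and its `convert` argument are hypothetical; `convert` stands in for `Type.filter`, and a `ValueError` is used where the patch raises a plain `Exception`.

```python
class Cell(object):
    """Hypothetical one-slot container over a shared one-element list."""

    def __init__(self, convert, storage, readonly=False, name=None):
        self.convert = convert
        self.storage = storage  # shared with whoever else holds the list
        self.readonly = readonly
        self.name = name

    def _get(self):
        return self.storage[0]

    def _set(self, value):
        if self.readonly:
            raise ValueError("Cannot set readonly storage: %s" % self.name)
        self.storage[0] = self.convert(value)

    value = property(_get, _set)

shared = [None]
cell = Cell(float, shared, name='stepsize')
cell.value = "0.5"  # converted on assignment, visible through the shared list
```

Since the list is shared, a write through one container is seen by any other holder of the same storage.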
    def __str__(self):
        return "<" + str(self.storage[0]) + ">"
    def __repr__(self):
@@ -260,8 +266,8 @@ class PerformLinker(LocalLinker):

        f = streamline(env, thunks, order, no_recycling = no_recycling, profiler = profiler)
  
-        return f, [Filter(input, storage) for input, storage in zip(env.inputs, input_storage)], \
-            [Filter(output, storage, True) for output, storage in zip(env.outputs, output_storage)], \
+        return f, [Container(input, storage) for input, storage in zip(env.inputs, input_storage)], \
+            [Container(output, storage, True) for output, storage in zip(env.outputs, output_storage)], \
            thunks, order


@@ -333,7 +339,9 @@ class WrapLinker(Linker):
    def make_thunk(self, **kwargs):
        no_recycling = self.no_recycling

-        make_all = [l.make_all(**kwargs) for l in self.linkers]
+        make_all = [self.linkers[0].make_all(**kwargs)]
+        kwargs.pop('input_storage', None)
+        make_all += [l.make_all(**kwargs) for l in self.linkers[1:]]

        fns, input_lists, output_lists, thunk_lists, order_lists \
                = zip(*make_all)

--- a/gof/opt.py
+++ b/gof/opt.py
@@ -12,6 +12,7 @@ import toolbox
 import op
 from copy import copy
 from collections import deque
+import destroyhandler as dh


 class Optimizer:
@@ -61,8 +62,7 @@ class FromFunctionOptimizer(Optimizer):
    def __init__(self, fn):
        self.apply = fn
    def add_requirements(self, env):
-        """WRITEME"""
-        env.extend(gof.toolbox.ReplaceValidate)
+        env.extend(toolbox.ReplaceValidate())

 def optimizer(f):
    """WRITEME"""
@@ -215,7 +215,7 @@ class FromFunctionLocalOptimizer(LocalOptimizer):
    def __init__(self, fn):
        self.transform = fn
    def add_requirements(self, env):
-        env.extend(gof.toolbox.ReplaceValidate)
+        env.extend(toolbox.ReplaceValidate())

 def local_optimizer(f):
    """WRITEME"""
@@ -624,6 +624,21 @@ def check_chain(r, *chain):



+############
+### Misc ###
+############
+
+class PureThenInplaceOptimizer(Optimizer):
+
+    def __init__(self, pure, inplace):
+        self.pure = pure
+        self.inplace = inplace
+
+    def apply(self, env):
+        self.pure(env)
+        env.extend(dh.DestroyHandler())
+        self.inplace(env)
+




--- a/gof/type.py
+++ b/gof/type.py
@@ -63,6 +63,9 @@ class CLinkerType(object):
        """
        raise AbstractFunctionError()

+    def c_init(self, name, sub):
+        raise AbstractFunctionError()
+
    def c_extract(self, name, sub):
        """Required: Return c code to extract a PyObject * instance.


--- a/gradient.py
+++ b/gradient.py
@@ -110,62 +110,4 @@ def grad_sources_inputs(sources, graph_inputs):
                    gmap[r] = g_r
    return gmap

-class numeric_grad:
-    def __init__(self, f, pt, eps=1.0e-7):
-        """Return the gradient of f at pt.
-        
-        This function computes the gradient by a one-sided finite differences of a
-        fixed step size (eps).
-        
-        It is assumed that f(...) will return a scalar.
-        It is assumed that all f's inputs are numpy.ndarray objects.
-        """
-        gf = [numpy.ndarray(x.shape) for x in pt]
-        f_pt = f(*pt)
-        if isinstance(f, (list, tuple)):
-            f_pt = [numpy.copy(x) for x in f_pt]
-        else:
-            f_pt = numpy.copy(f_pt)
-
-        for idx in xrange(len(gf)):
-            if len(pt[idx].shape) == 0:
-                orig = pt[idx]
-                pt[idx] = numpy.asarray(pt[idx] + eps)
-                f_eps = f(*pt)
-                gf[idx] = numpy.asarray((f_eps - f_pt)/eps)
-                pt[idx] = orig
-
-            elif len(pt[idx].shape) == 1:
-                for i in xrange(pt[idx].shape[0]):
-                    orig = pt[idx][i]
-                    pt[idx][i] = pt[idx][i] + eps
-                    f_eps = f(*pt)
-                    gf[idx][i] = numpy.asarray((f_eps - f_pt)/eps)
-                    pt[idx][i] = orig
-            elif len(pt[idx].shape) == 2:
-                for i in xrange(pt[idx].shape[0]):
-                    for j in xrange(pt[idx].shape[1]):
-                        orig = pt[idx][i,j]
-                        pt[idx][i,j] = pt[idx][i,j] + eps
-                        f_eps = f(*pt)
-                        gf[idx][i,j] = numpy.asarray((f_eps - f_pt)/eps)
-                        pt[idx][i,j] = orig
-            else:
-                raise NotImplementedError()
-
-        self.gf = gf
-
-    @staticmethod
-    def abs_rel_err(a,b,eps=1.0e-10):
-        """Return a small number when a and b are close, relative to how big they are"""
-        return abs(a-b) / (abs(a)+abs(b)+eps)
-
-    def max_err(self, g_pt):
-        """Return the biggest relative error between g_pt and self.gf"""
-        assert len(g_pt) == len(self.gf)
-        errs = []
-        for a, b in zip(g_pt, self.gf):
-            errs.append(numpy.max(numeric_grad.abs_rel_err(a,b)))
-        return max(errs)
-

--- a/index.txt
+++ b/index.txt
+======
+Theano
+======
+---------------------------------------------------------------
+An optimizing compiler for matrix-valued expressions in Python
+---------------------------------------------------------------
+
+Theano is an optimizing compiler in Python, built to evaluate complicated
+expressions (especially matrix-valued ones) as quickly as possible.
+It was written at LISA_ to explore techniques for machine learning.
+Our project uses the name to honour the ancient Greek mathematician. 
+
+--------------------------------------------------------------------------------
+
+.. _not in the normal sense: :wiki:`WhatIsTheano`
+
+Overview
+========
+
+**To get up & running quickly** see README_.
+
+All **documentation** can be reached from the `Theano Project Documentation Overview`_.
+
+As developers of an open source project, we rely on **feedback** for
+determining what features to implement, and what documentation needs to be
+improved.  The best forum for feedback is the theano-users_ mailing list.
+
+All **discussion** about Theano also takes place on the theano-users_ mailing list.
+
+If you find a **bug**, please file a `bug report`_ or send email to
+the theano-users_ mailing list.  **Patch** submissions should be
+sent to theano-dev_.
+
+We welcome all kinds of **contributions**.  Our `task list`_ is
+full of interesting ideas awaiting a champion.  If you have any
+questions regarding how to extend Theano, please feel free to ask on
+the theano-dev_ mailing list.
+
+Theano is in active development and should be considered **experimental**.
+APIs are subject to change at any time.
+
+
+Download
+========
+
+We recommend that you use the `latest snapshot`_.
+Better yet, use `mercurial`_ to keep your installation fresh.
+The snapshots usually contain *more features* and *fewer bugs* than the
+"official" releases |---| they're not only for developers!
+
+.. class:: credits
+
+    Docs by docutils_ and epydoc_.
+    Project by Mercurial_ and TRAC_.
+    Powered by Python_ and SciPy_.
+    Coded at the LISA_ lab.
+
+.. class:: hidden
+
+    Google should index the mailing lists: 
+    `theano-users <http://groups.google.com/group/theano-users?pli=1>`__,
+    and
+    `theano-dev <http://groups.google.com/group/theano-dev?pli=1>`__.
+
+.. |---| unicode:: U+02014 .. em dash
+   :trim:
+.. _latest snapshot: http://pylearn.org/hg/theano/archive/tip.tar.gz
+.. _bug report: http://lgcm.iro.umontreal.ca/theano/newticket
+.. _theano-users: http://groups.google.com/group/theano-users?pli=1
+.. _theano-dev: http://groups.google.com/group/theano-dev?pli=1
+.. _reStructuredText: rst.html
+.. _task list: http://lgcm.iro.umontreal.ca/theano/query?status=accepted&status=assigned&status=new&status=reopened&group=milestone&max=200&col=id&col=summary&col=status&col=owner&col=type&col=priority&col=component&col=time&report=9&order=priority
+.. _README: README.html
+.. _Quick-Start: README.html#quick-start
+
+.. _Theano Project Documentation Overview: doc/index.html
+.. _Mercurial: http://www.selenic.com/mercurial/wiki/
+.. _docutils: http://docutils.sourceforge.net
+.. _epydoc: http://epydoc.sourceforge.net/
+.. _scipy: http://scipy.org/
+.. _Python: http://www.python.org/
+.. _TRAC: http://trac.edgewall.org/
+.. _LISA:  http://www.iro.umontreal.ca/rubrique.php3?id_rubrique=27
+
+.. |TRAC| image:: http://www.edgewall.org/gfx/trac_logo.png
+   :target: http://www.edgewall.org/
+   :alt: Trac Logo
+   :align: middle
+   :class: borderless
+   :width: 193
+   :height: 32
+.. |Python| image:: python.png
+   :alt: Python Logo
+   :align: middle
+   :class: borderless
+   :width: 193
+   :height: 32
+.. |LISA| image:: http://www.iro.umontreal.ca/images/neurone_chip2.jpg
+   :target: http://www.iro.umontreal.ca/rubrique.php3?id_rubrique=27
+   :width: 193
+   :height: 32
+   :alt: LISA Logo
+   :align: middle
+   :class: borderless
+
+.. header:: |THEANO| - README_ - Download_ - Documentation_ - Wiki_ - `Task List`_
+
+.. _Download: README.html#downloading-theano
+.. _Documentation: doc/index.html
+.. _Wiki: http://pylearn.org/theano
+
+.. |THEANO| image:: http://lgcm.iro.umontreal.ca/theano/chrome/site/theano_logo.png
+   :target: http://pylearn.org/auto_theano
+   :alt: THEANO
+   :align: top
+   :class: borderless
+   :width: 60
+   :height: 18
+
+..
+   Local Variables:
+   mode: indented-text
+   indent-tabs-mode: nil
+   sentence-end-double-space: t
+   fill-column: 70
+   End:
+
--- a/local.build_html.sh
+++ b/local.build_html.sh
 #!/bin/bash

-APIRST2HTML=doc/apirst2html.py
-EPYDOC_ARGS='--external-api=api --external-api-file=api:html/api/api-objects.txt --external-api-root=api:../api/'
-
-
 mkdir -p html/api && mkdir -p html/doc

 # this builds some stuff or something... basically makes the rest work properly
 # for a reason I don't understand.  -JB 20080924
 python __init__.py

+# runs unless you called ./local.build_html.sh rst
 if [ " $1" != " rst" ]; then
 ./epydoc --config local.epydoc
 fi

+# runs unless you called ./local.build_html.sh epydoc
 if [ " $1" != " epydoc" ]; then
-python gen_oplist.py > doc/oplist.txt
-for RST in graph oplist ;  do
-
-    $APIRST2HTML $EPYDOC_ARGS doc/$RST.txt html/doc/$RST.html
-done
+    APIRST2HTML=doc/apirst2html.py
+    EPYDOC_ARGS='--external-api=api --external-api-file=api:html/api/api-objects.txt --external-api-root=api:../api/ --link-stylesheet'
+
+    # install the stylesheets
+    HTML4CSS1='/usr/lib/python2.5/site-packages/docutils/writers/html4css1/html4css1.css'
+    cp $HTML4CSS1 html/html4css1.css
+    cp doc/colorful.css html/colorful.css
+    cp doc/style.css html/style.css
+
+    #generate the index & readme files
+    echo "$APIRST2HTML $EPYDOC_ARGS index.txt html/index.html..."
+    $APIRST2HTML -stg $EPYDOC_ARGS --stylesheet=style.css index.txt html/index.html
+    echo "$APIRST2HTML $EPYDOC_ARGS README.txt html/README.html..."
+    $APIRST2HTML -stg $EPYDOC_ARGS --stylesheet=style.css README.txt html/README.html
+
+    #generate the oplist in ReST format
+    echo "gen oplist..."
+    python gen_oplist.py > doc/oplist.txt
+    python gen_typelist.py > doc/typelist.txt
+
+    #generate html files for all the ReST documents in doc/
+    echo "gen doc/*.txt..."
+    for RST in doc/*.txt;  do
+        BASENAME=$(basename $RST .txt)
+        echo "gen doc/$BASENAME.txt..."
+        $APIRST2HTML -stg $EPYDOC_ARGS --stylesheet=../style.css doc/$BASENAME.txt html/doc/$BASENAME.html
+    done
 fi

--- a/scalar.py
+++ b/scalar.py
@@ -86,7 +86,7 @@ class Scalar(Type):
        return str(self.dtype)

    def __repr__(self):
-        return "Scalar{%s}" % self.dtype
+        return "Scalar(%s)" % self.dtype

    def c_literal(self, data):
        if 'complex' in self.dtype:
@@ -252,16 +252,17 @@ def upcast_out(*types):
    return Scalar(dtype = Scalar.upcast(*types)),
 def same_out(type):
    return type,
-def transfer_type(i):
-    assert type(i) == int
-    def f(*types):
-        return types[i],
-    f.__name__ = "transfer_type_%i" % i
-    return f
-def specific_out(*spec):
-    def f(*types):
-        return spec
-    return f
+class transfer_type:
+    def __init__(self, i):
+        assert type(i) == int
+        self.i = i
+    def __call__(self, *types):
+        return types[self.i],
+class specific_out:
+    def __init__(self, *spec):
+        self.spec = spec
+    def __call__(self, *types):
+        return self.spec
 def int_out(*types):
    return int64,
 def float_out(*types):
@@ -283,7 +284,7 @@ class ScalarOp(Op):
        self.name = name
        if output_types_preference is not None:
            if not callable(output_types_preference):
-                raise TypeError("Expected a callable for the 'output_types_preference' argument to %s." % self.__class__)
+                raise TypeError("Expected a callable for the 'output_types_preference' argument to %s. (got: %s)" % (self.__class__, output_types_preference))
            self.output_types_preference = output_types_preference

    def make_node(self, *inputs):

--- a/tensor.py
+++ b/tensor.py
@@ -2,6 +2,7 @@

 __docformat__ = "restructuredtext en"

+import __builtin__
 import sys # for sys.maxint
 import inspect
 import functools
@@ -20,18 +21,26 @@ import elemwise
 import scalar as scal
 from gof.python25 import partial

+import compile
+

 ### set up the external interface
 from elemwise import Elemwise, DimShuffle, CAReduce, Sum
-import tensor_random as random


-_constructor_list = []
+__oplist_constructor_list = []
 """List of functions to be listed as op constructors in the oplist (`gen_oplist`, doc/oplist.txt)."""
 def constructor(f):
-    """Make `f` appear as a constructor in the oplist (`gen_oplist`, doc/oplist.txt)."""
-    _constructor_list.append(f)
+    """Add `f` to :doc:`oplist`.
+    
+    Make `f` appear as a constructor in the oplist (`gen_oplist`, doc/oplist.txt).
+    """
+    __oplist_constructor_list.append(f)
    return f
+def __oplist_tag(thing, tag):
+    tags = getattr(thing, '__oplist_tags', [])
+    tags.append(tag)
+    thing.__oplist_tags = tags


 def as_tensor(x, name = None):
@@ -92,7 +101,7 @@ def constant(x):
    except:
        raise TypeError("Could not convert %s to Tensor" % x, type(x))

-def value(x):
+def value(x, name=None):
    """Return a symbolic `Value` with default value `x`
    
    :Exceptions:
@@ -103,8 +112,12 @@ def value(x):
    else:
        x_ = numpy.asarray(x)
    try:
-        return TensorValue(Tensor(dtype = x_.dtype,
+        if name is None:
+            return TensorValue(Tensor(dtype = x_.dtype,
                                  broadcastable = [d == 1 for d in x_.shape]), x_)
+        else:
+            return TensorValue(Tensor(dtype = x_.dtype,
+                                  broadcastable = [d == 1 for d in x_.shape]), x_, name=name)
    except:
        raise TypeError("Could not convert %s to Tensor" % x, type(x))

@@ -113,7 +126,7 @@ def value(x):
 class Tensor(Type):
    """Symbolic `Type` representing a numpy.ndarray value."""

-    def __init__(self, dtype, broadcastable):
+    def __init__(self, dtype, broadcastable, name = None):
        """Initialize self.dtype and self.broadcastable.

        :Parameters:
@@ -126,11 +139,13 @@ class Tensor(Type):
           must be 1.  Secondly, the length of this list is the number of
           dimensions that an associated value must have.  See
           :doc:`broadcasting` for an explanation of how this list is used.
-
+         - `name`: str
+           Optional name for this type.
        """
        self.dtype = str(dtype)
        self.broadcastable = tuple(broadcastable)
        self.dtype_specs() # error checking is done there
+        self.name = name
    
    def filter(self, data, strict = False):
        """Convert `data` to something which can be associated to a `TensorResult`.
@@ -202,14 +217,25 @@ class Tensor(Type):
         - `name`: str
           A pretty name to identify this `Result` when printing and debugging

-       """
+        """
        return TensorResult(self, name = name)

    def __str__(self):
-        return "%s(%s)" % (str(self.dtype), str(self.broadcastable))
+        if self.name:
+            return self.name
+        else:
+            b = self.broadcastable
+            #bcast = str(self.broadcastable)
+            bcast = {(): 'scalar',
+                     (False,): 'vector',
+                     (False, True): 'col',
+                     (True, False): 'row',
+                     (False, False): 'matrix'}.get(b, "%iD" % len(b) if not any(b) else str(b))
+            return "Tensor(%s, %s)" % (str(self.dtype), bcast)

    def __repr__(self):
-        return "Tensor{%s, %s}" % (str(self.dtype), str(self.broadcastable))
+        return str(self)
+        #"Tensor{%s, %s}" % (str(self.dtype), str(self.broadcastable))

    def c_declare(self, name, sub):
        """Override `CLinkerOp.c_declare` """
@@ -466,6 +492,18 @@ class _tensor_py_operators:
        raise TypeError('Tensor does not support iteration. '
        'Maybe you are using builtin.sum instead of theano.tensor.sum? (Maybe .max?)')
        
+
+    # CONVENIENT ACCESS TO TYPE PROPERTIES
+    ndim = property(lambda self: self.type.ndim)
+    """The rank of this tensor."""
+    broadcastable = property(lambda self: self.type.broadcastable)
+    """The broadcastable signature of this tensor.
+
+    See :doc:`broadcasting` for details.
+    
+    """
+    dtype = property(lambda self: self.type.dtype)
+    """ The dtype of this tensor.  """
    

 class TensorResult(Result, _tensor_py_operators):
@@ -508,11 +546,12 @@ def _elemwise(scalar_op, name, doc_prefix=''):

    return straight, inplace

-def _redefine(real_symbol_value):
+def _redefine(real_symbol_value, module='tensor'):
    """Replace the value associated with a function symbol.
    
    This is useful to trick epydoc into doing what we want.  It's a hack.
    """
+    real_symbol_value.__module__ = module
    def decorator(f):
        return real_symbol_value
    return decorator
@@ -542,6 +581,7 @@ def _scal_elemwise(symbol):
    #for the meaning of this see the ./epydoc script
    # it makes epydoc display rval as if it were a function, not an object
    rval.__epydoc_asRoutine = symbol
+    rval.__module__ = 'tensor'

    return rval

@@ -591,6 +631,8 @@ def cast(t, dtype):

 #to be removed as we get the epydoc routine-documenting thing going -JB 20080924
 def _conversion(real_value):
+    __oplist_tag(real_value, 'casting')
+    real_value.__module__='tensor'
    return real_value

 convert_to_int8  = _conversion(elemwise.Elemwise(scal.Identity(scal.specific_out(scal.int8))))
@@ -1299,31 +1341,101 @@ class SetSubtensor(Subtensor):
        x.__setitem__(cdata, y)
        out[0] = x

+class Split(Op):
+    """Partition a `TensorResult` along some axis.

-class MakeVector(Op):
-    """WRITEME"""
-    def __init__(self, stype):
-        self.stype = stype
-    def make_node(self, *inputs):
-        assert all(a.type == self.stype for a in inputs)
-        return Apply(self, inputs, [Tensor(broadcastable = (False,),
-                                           dtype = self.stype.dtype)()])
-    def perform(self, inputs, (out,)):
-        return numpy.asarray([i[0] for i in inputs])
-    def grad(self, inputs, (gout,)):
-        return [None]*len(inputs)
-
-make_lvector = MakeVector(lscalar)
-"""WRITEME"""
+    .. python::
+        
+        x = vector()
+        splits = lvector()
+        # you have to declare right away how many split_points there will be.
+        ra, rb, rc = split(x, axis=0, points=splits, n_splits=3)  
+
+        f = compile([x, splits], [ra, rb, rc])
+
+        a, b, c = f([0,1,2,3,4,5,6], [3, 2, 1])
+
+        #a == [0,1,2]
+        #b == [3, 4]
+        #c == [5]
+
+    """
+
+    len_splits = None
+    """A Split instance will have this many outputs, and require that the splits argument to
+    `perform` have exactly this many elements.
+    """
+
+    def __init__(self, len_splits):
+        self.len_splits = int(len_splits)
+    
+    def make_node(self, x, axis, splits):
+        """WRITEME"""
+        x = as_tensor(x)
+        axis = as_tensor(axis)
+        splits = as_tensor(splits)

-class Concatenate(Op):
+        if splits.type != lvector: 
+            raise TypeError('splits must have type tensor.lvector', splits.type)
+        if axis.type != lscalar: 
+            raise TypeError('axis must have type lscalar', axis.type)
+
+        inputs = [x, axis, splits]
+        outputs = [x.type() for i in xrange(self.len_splits)]
+
+        return Apply(self, inputs, outputs)
+
+
+    def perform(self, node, (x, axis, splits), outputs):
+        """WRITEME"""
+        try:
+            len_along_axis = x.shape[axis]
+        except :
+            raise ValueError('Split.perform() with axis=(%s) is invalid for x.shape==(%s)'
+                    %(axis, x.shape))
+        if len(splits) != self.len_splits:
+            raise ValueError('In Split.perform(), len(splits) != len_splits.', 
+                    (len(splits), self.len_splits))
+         
+        # Checking is done, let's roll the splitting algorithm!
+        # Basically we step along the given axis of x, extracting subtensors of size splits[i]
+        # as we go along.
+
+        general_key = [slice(None, None, None) for s in x.shape]
+        lower_idx = 0
+        for i in xrange(self.len_splits):
+            upper_idx = lower_idx + splits[i]
+            general_key[axis] = slice(lower_idx, upper_idx, None)
+            outputs[i][0] = x.__getitem__(general_key).copy()
+            lower_idx = upper_idx
+
+    def grad(self, (x, axis, splits), g_outputs):
+        """Join the gradients along the axis that was used to split x."""
+        return [join(axis, *g_outputs), None, None]
+
+class Join(Op):
    """
-    Concatenate two L{Tensor}s along the given axis.
-    These L{Tensor}s must have the same shape along all dimensions other than
-    this axis.
+    Concatenate two or more `TensorResult`s along some axis.
+
+    These tensors must have the same shape along all dimensions other than this axis.
+    Of course, TensorResult instances don't have a shape, so this error can't be caught until
+    runtime.  See `perform()`.
+
+    .. python::
+        
+        x, y, z = tensor.matrix(), tensor.matrix(), tensor.matrix()
+        u = tensor.vector()
+
+        r = join(0, x, y, z)
+        c = join(1, x, y, z)
+        join(2, x, y, z)     # WRONG: the axis has to be an index into the shape
+        join(0, x, u)        # WRONG: tensors have to have the same rank to be joined
    """

    def make_node(self, *axis_and_tensors):
+        """
+        WRITEME
+        """
        axis, tensors = axis_and_tensors[0], axis_and_tensors[1:]
        as_tensor_args= [as_tensor(x) for x in tensors]
        dtypes = [x.type.dtype for x in as_tensor_args]
@@ -1351,18 +1463,36 @@ class Concatenate(Op):
            bcastable[:] = as_tensor_args[0].type.broadcastable
            bcastable[axis] = False

-        inputs = [scal.as_scalar(axis)] + as_tensor_args
+        inputs = [as_tensor(axis)] + as_tensor_args
+        if inputs[0].type != lscalar: 
+            raise TypeError('Axis could not be cast to lscalar', axis)

        outputs = [tensor(dtype = dtypes[0],
                          broadcastable = bcastable)]
        return Apply(self, inputs, outputs)

    def perform(self, node, axis_and_tensors, (out, )):
+        """
+        WRITEME
+        """
        axis, tensors = axis_and_tensors[0], axis_and_tensors[1:]
        out[0] = numpy.concatenate(tensors, axis = axis)

    def grad(self, axis_and_tensors, (gz,)):
-        raise RuntimeError, 'Not working yet'
+        """ The gradient wrt a join op is a `Split`, used to partition the gradient along the
+        `axis` which was used for joining.
+        """
+        axis, tensors = axis_and_tensors[0], axis_and_tensors[1:]
+        if 'float' in tensors[0].dtype or 'complex' in tensors[0].dtype:
+            # float and complex tensors are differentiable: split gz along the join axis
+            split = Split(len(tensors))
+            return [None] + split(gz, axis, stack(*[shape(x)[axis] for x in tensors]))
+        else:
+            # assume that this isn't differentiable
+            return [None] * (1 + len(tensors)) 
+
+    def _native_grad(self, axis_and_tensors, (gz,)):
+        """WRITEME"""
        axis, tensors = axis_and_tensors[0], axis_and_tensors[1:]
        sizes_along_axis = [shape(x)[axis] for x in tensors]
        n_dims = len(shape(tensors[0]))
@@ -1375,81 +1505,172 @@ class Concatenate(Op):
                [slice(None)] * (n_dims - axis - 1)] \
                for k in range(len(sizes_along_axis))]

-def concatenate(tensors, axis=0):
+    def vec_length(self, node):
+        assert isinstance(node.owner.op, Join)
+        if node.ndim != 1:
+            raise TypeError('argument must be symbolic vector')
+        inputs = node.owner.inputs
+        axis, tensors = inputs[0], inputs[1:]
+        # if v is a vector, axis must be 0
+        # the question is whether all the inputs are broadcastable.
+        if all(i.broadcastable[0] for i in tensors):
+            return len(tensors)
+
+@_redefine_asRoutine(Join())
+def join(axis, *tensors):
    """
    Convenience function to concatenate `Tensor`s along the given axis.
-    The `axis` parameter may either be an integer or an object that can be
-    converted to a scalar using `as_scalar`(`axis`). In the former case,
-    the axis is fixed at construction, while in the latter it may vary over
-    time depending on the value of the `axis` variable.
+
+    :Parameters:
+     - `tensors` : list of tensors (or list-like)
+       A list of tensors to be concatenated along the given axis.
+     - `axis` : int (symbolic or literal)
+       On which dimension should the tensors be joined?  The `axis` must be a valid index into
+       the shape of the tensors to be concatenated.
+       The `axis` parameter may either be an integer or an object that can be converted to a
+       scalar using `as_scalar`(`axis`). In the former case, the axis is fixed at construction,
+       while in the latter it may vary over time depending on the value of the `axis` variable.
+
+    The shapes of the tensors to be concatenated must all be identical, except in the dimension
+    (`axis`) on which they are to be joined.
+
+    """
+
+@constructor
+def leftpad_shape(tensor, n_ones):
+    """Reshape `tensor` by left-padding the shape with `n_ones` 1s"""
+    pattern = ['x']*n_ones + [i for i in range(tensor.type.ndim)]
+    return DimShuffle(tensor.broadcastable, pattern)(tensor)
+
+@constructor
+def stack(*tensors):
+    """Insert the arguments as slices into a tensor of 1 rank greater.
+    EXAMPLE
+    """
+    return join(0, *[leftpad_shape(t, 1) for t in tensors])
+
+@constructor
+def concatenate(tensor_list, axis=0):
+    """Alias for `join`(axis, *tensor_list).
+    
+    This function is similar to `join`, but uses the signature of numpy's concatenate function,
+    and checks that `tensor_list` is a tuple or list before joining.
+
+    :Exceptions:
+     - `TypeError` : the tensor_list must be a tuple or list
+
    """
    # Check someone did not make the common mistake to do something like:
    #   c = concatenate(x, y)
    # instead of
    #   c = concatenate((x, y))
-    if not isinstance(tensors, (tuple, list)):
+    if not isinstance(tensor_list, (tuple, list)):
-        raise TypeError("The 'tensors' argument must be either a tuple "
+        raise TypeError("The 'tensor_list' argument must be either a tuple "
                "or a list, make sure you did not forget () or [] around "
-                "arguments of concatenate.", tensors)
+                "arguments of concatenate.", tensor_list)
-    # Ensure we only create one instance of 'Concatenate', to simplify the
-    # merging job.
-    if not hasattr(concatenate, 'obj'):
-        concatenate.obj = Concatenate()
-    return concatenate.obj(axis, *tensors)
+    return join(axis, *tensor_list)

-class VerticalStack(Op):
-    """
-    Vertically stack two L{Tensor}s.
-    Stack two L{Tensor}s along the first axis (row wise). These
-    L{Tensor}s must have the same shape along all dimensions but the
-    first.
+def get_vector_length(v):
+    """Return the run-time length of a symbolic vector.
+
+    :Parameters:
+     - `v` : A rank-1 Tensor result.
+
+    :Exceptions:
+     - `TypeError` : `v` does not have the proper type.
+     - `ValueError` : No special case applies, the length is not known.
+    
+    In general this is not possible, but for a number of special cases the length can be
+    determined at compile / graph-construction time.  This function implements these special
+    cases.

-    @attention: Because we use vstack as the implementation, if the
-    inputs have 1-dimension, the output will have 2-dimensions.
    """
-    def make_node(self, x, y):
-        x = as_tensor(x)
-        y = as_tensor(y)
-        assert x.type.dtype == y.type.dtype
-        if x.type.broadcastable[1:] != y.type.broadcastable[1:]:
-            raise NotImplementedError
-        inputs = [x, y]
-        bcastable = (False, ) + x.type.broadcastable[1:]
-        outputs = [tensor(dtype = x.type.dtype,
-                          broadcastable = bcastable)]
-        return Apply(self, inputs, outputs)
-    def perform(self, node, (x, y), (out, )):
-        assert x.ndim == y.ndim
-        # Make sure every dimension (save the first) is the same
-        for i in range(x.ndim): assert i == 0 or x.shape[i] == y.shape[i]
-        out[0] = numpy.vstack([x, y])
-    def grad(self, (x, y), (gz,)):
+    if v.ndim != 1:
+        raise TypeError('argument must be symbolic vector')
+    if isinstance(v, gof.Constant) and v.type.ndim == 1:
+        return len(v.data)
+    if v.owner and isinstance(v.owner.op, Join):
+        try:
+            return join.vec_length(v)
+        except:
+            pass
+    if v.owner and v.owner.op == shape:
+        return v.owner.inputs[0].type.ndim
+    raise ValueError("length not known")
+
+if 0: #vertical and horizontal stacking are deprecated.  Better to use stack() and join().
+    class VerticalStack(Op):
+        """
+        Vertically stack two L{Tensor}s.
+        Stack two L{Tensor}s along the first axis (row wise). These
+        L{Tensor}s must have the same shape along all dimensions but the
+        first.
+
+        @attention: Because we use vstack as the implementation, if the
+        inputs have 1-dimension, the output will have 2-dimensions.
        """
-        @todo: Make VSplit (or this grad implementation) its own L{Op},
-        that way we can do more sanity-checking::
+        def make_node(self, x, y):
+            x = as_tensor(x)
+            y = as_tensor(y)
+            assert x.type.dtype == y.type.dtype
+            if x.type.broadcastable[1:] != y.type.broadcastable[1:]:
+                raise NotImplementedError
+            inputs = [x, y]
+            bcastable = (False, ) + x.type.broadcastable[1:]
+            outputs = [tensor(dtype = x.type.dtype,
+                              broadcastable = bcastable)]
+            return Apply(self, inputs, outputs)
+        def perform(self, node, (x, y), (out, )):
            assert x.ndim == y.ndim
            # Make sure every dimension (save the first) is the same
-            for i in range(x.data.ndim): assert i == 0 or x.data.shape[i] == y.shape[i]
-            etc...
+            for i in range(x.ndim): assert i == 0 or x.shape[i] == y.shape[i]
+            out[0] = numpy.vstack([x, y])
+        def grad(self, (x, y), (gz,)):
+            """
+            @todo: Make VSplit (or this grad implementation) its own L{Op},
+            that way we can do more sanity-checking::
+                assert x.ndim == y.ndim
+                # Make sure every dimension (save the first) is the same
+                for i in range(x.data.ndim): assert i == 0 or x.data.shape[i] == y.shape[i]
+                etc...
+            """
+            xs = shape(x)
+            ys = shape(y)
+            return gz[:xs[0]], gz[xs[0]:]
+    vertical_stack = VerticalStack()
+
+    def horizontal_stack(x, y):
        """
-        xs = shape(x)
-        ys = shape(y)
-        return gz[:xs[0]], gz[xs[0]:]
-vertical_stack = VerticalStack()
+        Horizontally stack two L{Tensor}s.
+        Stack two L{Tensor}s along the second axis (column wise). These
+        L{Tensor}s must have the same shape along all dimensions but the
+        second.

-def horizontal_stack(x, y):
-    """
-    Horizontally stack two L{Tensor}s.
-    Stack two L{Tensor}s along the second axis (column wise). These
-    L{Tensor}s must have the same shape along all dimensions but the
-    second.
+        @note: Unlike VerticalStack, we assume that the L{Tensor}s have
+        two dimensions.
+        """
+        assert x.type.ndim == 2
+        assert y.type.ndim == 2
+        return transpose(vertical_stack(x.T, y.T))
+    class MakeVector(Op):
+        """WRITEME"""
+        def __init__(self, stype):
+            self.stype = stype
+        def make_node(self, *inputs):
+            inputs = map(as_tensor, inputs)
+            assert all(a.type == self.stype for a in inputs)
+            return Apply(self, inputs, [Tensor(broadcastable = (False,),
+                                               dtype = self.stype.dtype)()])
+        def perform(self, node, inputs, (out,)):
+            out[0] = numpy.asarray(inputs)
+        def grad(self, inputs, (gout,)):
+            return [None]*len(inputs)
+
+    make_lvector = MakeVector(lscalar)
+    """WRITEME"""

-    @note: Unlike VerticalStack, we assume that the L{Tensor}s have
-    two dimensions.
-    """
-    assert x.type.ndim == 2
-    assert y.type.ndim == 2
-    return transpose(vertical_stack(x.T, y.T))
+else:
+    pass


 #########################
@@ -1820,3 +2041,146 @@ def grad(cost, wrt, g_cost=None):
    else:
        return gmap.get(wrt, zero(wrt))

+class numeric_grad:
+    """WRITEME"""
+    def __init__(self, f, pt, eps=1.0e-7):
+        """Compute the gradient of f at pt and store it in `self.gf`.
+        
+        This function computes the gradient by one-sided finite differences with a
+        fixed step size (eps).
+        
+        It is assumed that f(...) will return a scalar.
+        It is assumed that all f's inputs are numpy.ndarray objects.
+        """
+
+        def prod(inputs):
+            rval = 1
+            for i in inputs:
+                rval *= i
+            return rval
+
+        packed_pt = False
+        if not isinstance(pt, (list, tuple)):
+            pt = [pt]
+            packed_pt = True
+
+        apt = [numpy.array(p) for p in pt]
+
+        shapes = [p.shape for p in apt]
+        dtypes = [str(p.dtype) for p in apt]
+
+        if not dtypes == [dtypes[0]] * len(apt):
+            raise TypeError('All function arguments must have same dtype')
+
+        total_size = __builtin__.sum(prod(sh) for sh in shapes)
+
+        #create un-initialized memory
+        x = numpy.ndarray((total_size,), dtype=dtypes[0])
+        gx = numpy.ndarray((total_size,), dtype=dtypes[0])
+
+        #set up aliases so that apt[i] is backed by memory in x
+        # and self.gf is backed by memory in gx
+        cur_pos = 0
+        self.gf = []
+        for i,p in enumerate(apt):
+            p_size = prod(p.shape)
+            # set up alias
+            apt[i] = x[cur_pos:cur_pos+p_size].reshape(p.shape)
+            self.gf.append(gx[cur_pos:cur_pos+p_size].reshape(p.shape))
+            # initialize with p's value
+            apt[i][:] = p
+            cur_pos += p_size
+
+        f_x = f(*[p.copy() for p in apt])
+
+        # now iterate over the elements of x, and call f on apt.
+        x_copy = x.copy()
+        for i in xrange(total_size):
+            x[:] = x_copy
+
+            x[i] += eps
+            f_eps = f(*apt)
+            gx[i] = numpy.asarray((f_eps - f_x)/eps)
+
+        if packed_pt:
+            self.gf = self.gf[0]
+
+    @staticmethod
+    def abs_rel_err(a,b,eps=1.0e-10):
+        """Return a small number when a and b are close, relative to how big they are"""
+        return abs(a-b) / (abs(a)+abs(b)+eps)
+
+    def max_err(self, g_pt):
+        """Return the biggest relative error between g_pt and self.gf"""
+        assert len(g_pt) == len(self.gf)
+        errs = []
+        for a, b in zip(g_pt, self.gf):
+            errs.append(numpy.max(numeric_grad.abs_rel_err(a,b)))
+        return numpy.max(errs)
+
+def verify_grad(testcase, op, pt, n_tests=1, rng=numpy.random, eps=1.0e-7, tol=0.0001,
+        linker='c&py'):
+    """ WRITEME
+    
+    testcase.failUnless(analytic gradient matches finite-diff gradient)
+    
+    """
+    pt = [numpy.array(p) for p in pt]
+
+    #print "PT", pt
+
+    def function(inputs, output):
+        return compile.function(inputs, output, 
+                mode=compile.Mode(optimizer = None, linker = linker), 
+                accept_inplace=True)
+
+    for test_num in xrange(n_tests):
+        tensor_pt = [value(p.copy(), name='input %i'%i) for i,p in enumerate(pt)]
+        
+        #op can be either a function or an actual Op instance
+        #print "OP", op
+        #print "TENSOR PT", tensor_pt
+        o_output = op(*tensor_pt) 
+
+        if isinstance(o_output, list) and len(o_output) > 1:
+            raise NotImplementedError('cant (yet) autotest gradient of op with multiple outputs')
+            # we could make loop over outputs making random projections R for each,
+            # but this doesn't handle the case where not all the outputs are
+            # differentiable... so I leave this as TODO for now -JB.
+        o_fn = function(tensor_pt, o_output)
+        #print "PT B", pt
+        o_fn_out = o_fn(*[p.copy() for p in pt])
+        #print "PT C", pt
+        random_projection = rng.rand(*o_fn_out.shape)
+        t_r = as_tensor(random_projection)
+
+        #random projection of o onto t_r
+        cost = sum(t_r * o_output)
+        cost_fn = function(tensor_pt, cost)
+
+        num_grad = numeric_grad(cost_fn, [p.copy() for p in pt], eps)
+
+        symbolic_grad = grad(cost, tensor_pt,as_tensor(1.0,name='g_cost'))
+
+        if 0:
+            print '----------'
+            for op in gof.graph.io_toposort(tensor_pt, symbolic_grad):
+                print op
+
+        grad_fn = function(tensor_pt, symbolic_grad)
+
+        #print "PT D", pt
+        analytic_grad = grad_fn(*pt)
+        
+        #print "PT Z", pt
+        if not isinstance(analytic_grad, (list, tuple)):
+            analytic_grad = [analytic_grad]
+
+        max_err = num_grad.max_err(analytic_grad)
+        if  max_err > tol:
+            #print 'analytic grad', analytic_grad
+            #print 'numeric grad', num_grad.gf
+            raise Exception(verify_grad.E_grad, (max_err, tol))
+verify_grad.E_grad = 'gradient error exceeded tolerance'
+"""This error is raised when a gradient is calculated, but incorrect."""
+
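Note on the gradient check added to tensor.py above: `numeric_grad` and `verify_grad` compare an analytic gradient against a one-sided finite-difference estimate and fail when the largest relative error exceeds a tolerance. The sketch below shows the same idea in plain Python, without Theano or numpy. It is an illustration under stated assumptions, not the code from this changeset: the `cost` and `analytic_grad` functions are hypothetical stand-ins for a compiled cost function and its symbolic gradient, and points are flat lists of floats rather than ndarrays.

```python
def numeric_grad(f, pt, eps=1.0e-7):
    """One-sided finite-difference gradient of a scalar function f at pt."""
    f_x = f(pt)
    grad = []
    for i in range(len(pt)):
        shifted = list(pt)      # copy so that pt itself is never modified
        shifted[i] += eps       # perturb one coordinate at a time
        grad.append((f(shifted) - f_x) / eps)
    return grad


def abs_rel_err(a, b, eps=1.0e-10):
    """Small when a and b are close, relative to how big they are."""
    return abs(a - b) / (abs(a) + abs(b) + eps)


def max_err(analytic, numeric):
    """Largest relative error between two gradients of the same length."""
    assert len(analytic) == len(numeric)
    return max(abs_rel_err(a, b) for a, b in zip(analytic, numeric))


# Hypothetical example function and its hand-derived gradient.
def cost(v):
    return v[0] ** 2 + 3.0 * v[0] * v[1]


def analytic_grad(v):
    return [2.0 * v[0] + 3.0 * v[1], 3.0 * v[0]]


pt = [1.5, -0.5]
tol = 0.0001  # same default tolerance as verify_grad in the diff above
err = max_err(analytic_grad(pt), numeric_grad(cost, pt))
if err > tol:
    raise Exception('gradient error exceeded tolerance', (err, tol))
```

A deliberately wrong gradient, for example `[1.5, 0.0]` at the same point, gives a relative error close to 1 and would trip the tolerance check. The one-sided difference carries an error proportional to `eps`, which is why the comparison uses a relative tolerance rather than exact equality.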
--- a/tensor_opt.py
+++ b/tensor_opt.py
@@ -7,6 +7,8 @@ import tensor as T
 import numpy as N
 import operator
 import itertools
+import sys
+import compile  #to register the optimizer built by this file


 # Utilities
@@ -31,17 +33,17 @@ gemm_pattern_1 = gof.PatternSub((T._sub_inplace,

 # gemm: (d,a,b,c,s) -> d = d*s + a*dot(b,c)
 # Transforms dot(a, b) into gemm(zeros(2)(hstack(shape(a)[:1], shape(b)[1:])), 1.0, a, b, 1.0)
+# The construction of the 'gemm' node may fail if, for example, a and b are not both matrices.
 dot_to_gemm = gof.PatternSub((T.dot, 'a', 'b'),
                             (T.gemm, (T.Zeros(2),
-                                       (T.vertical_stack,
+                                       (T.stack,
                                        (T.Subtensor([slice(0, 1)]), (T.shape, 'a')),
                                        (T.Subtensor([slice(1, 2)]), (T.shape, 'b')))),
                              T.constant(1.0), 'a', 'b', T.constant(1.0)),
                             allow_multiple_clients = False)


-@gof.optimizer
-def insert_inplace_optimizer(self, env):
+def _insert_inplace_optimizer(env):
    """
    Usage: inplace_optimizer.optimize(env)
    
@@ -66,14 +68,16 @@ def insert_inplace_optimizer(self, env):
            for candidate_input in candidate_inputs:
                inplace_pattern = dict(baseline, **{candidate_output: candidate_input})
                try:
-                    new = Elemwise(op.scalar_op, inplace_pattern).make_node(op.inputs)
-                    env.replace_all_validate(dict(zip(node.outputs, new.outputs)))
-                except:
+                    new = Elemwise(op.scalar_op, inplace_pattern).make_node(*node.inputs)
+                    env.replace_all_validate(zip(node.outputs, new.outputs))
+                except Exception, e:
                    continue
                candidate_inputs.remove(candidate_input)
                node = new
                baseline = inplace_pattern
                break
+insert_inplace_optimizer = gof.optimizer(_insert_inplace_optimizer)
+

 inplace_optimizer = gof.SeqOptimizer(out2in(gemm_pattern_1),
                                     out2in(dot_to_gemm),
@@ -229,7 +233,15 @@ def local_subtensor_make_vector(node):

    If the index or slice is constant.
    """
-    if not opt.check_chain(node, T.Subtensor, T.MakeVector):
+    if not opt.check_chain(node, T.Subtensor, T.Join):
+        return False
+    
+    joined_r = node.inputs[0]
+
+    try: 
+        #check that join is being used to join scalars
+        veclen = T.join.vec_length(joined_r)
+    except:
        return False

    idxlist = node.op.idx_list
@@ -642,6 +654,16 @@ def _math_optimizer():

 math_optimizer = _math_optimizer()

+compile.register_optimizer('math', 
+        gof.MergeOptMerge(
+            gof.PureThenInplaceOptimizer(
+                math_optimizer,
+                inplace_optimizer)))
+
+
+compile.register_mode('SANITY_CHECK', compile.Mode('c&py', 'math'))
+compile.register_mode('FAST_RUN', compile.Mode('c|py', 'math'))
+compile.register_mode('EXPENSIVE_OPTIMIZATIONS', compile.Mode('c|py', 'math'))


 # @gof.local_optimizer

--- a/tensor_random.py
+++ b/tensor_random.py
@@ -4,146 +4,153 @@ import tensor
 import numpy
 import functools

-class RandomState(object):
-    """The Theano version of numpy.RandomState
+from compile import SymbolicInputKit, SymbolicInput
+from copy import copy

-    This class generates a sequence of L{Op} instances via the gen() and
-    gen_like() methods.
+class RandomFunction(gof.Op):

-    @ivar seed: an integer which determines the initial state of the L{Op}
-    instances returned by gen(), gen_like()
-    @type seed: int
+    def __init__(self, fn, outtype, *args, **kwargs):
+        """
+        fn: a random function with the same signature as functions in numpy.random.RandomState
+        outtype: the type of the output
+        args: a list of default arguments for the function
+        kwargs: if the 'inplace' key is present, its value determines whether the op operates in place (default: False)
+        """
+        self.fn = fn
+        self.outtype = outtype
+        self.args = tuple(tensor.as_tensor(arg) for arg in args)
+        self.inplace = kwargs.pop('inplace', False)
+        if self.inplace:
+            self.destroy_map = {0: [0]}
+
+    def make_node(self, r, shape, *args):
+        """
+        in: r -> RandomState (gof.generic),
+            shape -> lvector
+            args -> the arguments expected by the numpy function
+        out: r2 -> the new RandomState (gof.generic)
+             out -> the random numbers we generated
+        """
+        args = map(tensor.as_tensor, args)
+        shape = tensor.as_tensor(shape)
+        assert shape.type == tensor.lvector
+        assert len(args) <= len(self.args)
+        args += (None,) * (len(self.args) - len(args))
+        inputs = []
+        for arg, default in zip(args, self.args):
+            assert arg is None or default.type.dtype == arg.type.dtype
+            input = default if arg is None else arg
+            inputs.append(input)
+        return gof.Apply(self,
+                         [r, shape] + inputs,
+                         [r.type(), self.outtype()])
+
+    def perform(self, node, inputs, (rout, out)):
+        r, shape, args = inputs[0], inputs[1], inputs[2:]
+        assert self.outtype.ndim == len(shape)
+        if not self.inplace:
+            r = copy(r)
+        rout[0] = r
+        out[0] = self.fn(r, *(args + [shape]))
+
+    def __eq__(self, other):
+        return type(self) == type(other) \
+            and self.fn == other.fn\
+            and self.outtype == other.outtype\
+            and self.args == other.args\
+            and self.inplace == other.inplace
+
+    def __hash__(self):
+        return hash(self.fn) ^ hash(self.outtype) ^ hash(self.args) ^ hash(self.inplace)
+
+
+def random_function(fn, dtype, *rfargs, **rfkwargs):
    """
+    Returns a wrapper around RandomFunction which automatically infers the number
+    of dimensions of the output from the given shape. If the number of dimensions
+    cannot be inferred, the user can give an integer as the first argument, which
+    will be interpreted as the number of dimensions.
+
+    The number of dimensions for the following shape arguments can be inferred:
+    - shape(x)
+    - make_lvector(x, y, z, ...)
+    - constants
+    """
+    def f(ndim, *args, **kwargs):
+        if isinstance(ndim, int):
+            r, shape, args = args[0], args[1], args[2:]
+        else:
+            r, shape, args = ndim, args[0], args[1:]
+            shape = tensor.as_tensor(shape)
+            ndim = tensor.get_vector_length(shape)
+            if ndim is None:
+                raise ValueError('Cannot infer the number of dimensions from the shape argument.')
+        # note: rf should probably be cached for future use
+        rf = RandomFunction(fn, tensor.Tensor(dtype = dtype, broadcastable = (False,)*ndim), *rfargs, **rfkwargs)
+        return rf(r, shape, *args, **kwargs)
+    return f

-    def __init__(self, seed):
-        self.seed = seed

-    def gen(self, dist, shape=(), ndim=None):
-        """
-        @param dist: identifier of a sampling distribution. See L{_fn_from_dist}.
-        @param shape: tuple
+RS = numpy.random.RandomState

-        @return: A tensor of random numbers, with given shape.
-        @rtype: L{Result} (output of L{Apply} of L{NumpyGenerator} instance)
-        """
-        self.seed += 1
-        fn = RandomState._fn_from_dist(dist)
-        if isinstance(shape, tuple):
-            return NumpyGenerator(self.seed-1, len(shape),fn) (shape)
-        return NumpyGenerator(self.seed - 1, ndim, fn)(shape)
+# we need to provide defaults for all the functions in order to infer the argument types...
+uniform = random_function(RS.uniform, 'float64', 0.0, 1.0)
+binomial = random_function(RS.binomial, 'int64', 1, 0.5)
+normal = random_function(RS.normal, 'float64', 0.0, 1.0)
+random_integers = random_function(RS.random_integers, 'int64', 0, 1)

-    def gen_like(self, dist, x):
-        """
-        @param dist: identifier of a sampling distribution. See L{_fn_from_dist}.
-        @param x: L{Result} of type L{Tensor}

-        @return: A tensor of random numbers, with the same shape as x.
-        @rtype: L{Result} (output of L{Apply} of L{NumpyGenerator} instance)
-        """
-        self.seed += 1
-        fn = RandomState._fn_from_dist(dist)
-        return NumpyGenerator(self.seed-1, x.type.ndim, fn)(tensor.shape(x))
+@gof.local_optimizer
+def random_make_inplace(node):
+    op = node.op
+    if isinstance(op, RandomFunction) and not op.inplace:
+        return RandomFunction(op.fn, op.outtype, *op.args, **dict(inplace=True)).make_node(*node.inputs).outputs

-    def uniform_like(self, template, low=0.,high=1.):
-        """
-        Return a multivariate uniform(low,high)
-        random variable in a tensor of the same shape as template
-        (template can either be a tensor or a shape tuple). Each element of the
-        resulting tensor is sampled independently. low and high can
-        be scalars or have the same shape as the template (or broadcastable
-        to it).
-        """
-        return self.gen_like(('uniform',{'low':low,'high':high}),template)

-    def binomial_like(self, template, n=1, p=0.5):
-        """
-        Return a multivariate binomial(n,p) random variable in a tensor of the same shape as template
-        (template can either be a tensor or a shape tuple). Each element of the
-        resulting tensor is sampled independently. low and high can
-        be scalars or have the same shape as the template (or broadcastable
-        to it).
-        """
-        return self.gen_like(('binomial',{'n':n,'p':p}),template)
+import sys
+from functools import partial
+from collections import deque

-    @staticmethod
-    def _fn_from_dist(dist, cache={}):
-        """Return a function from a distribution description
+class RandomKit(SymbolicInputKit):

-        @param dist: identifier of a sampling distribution.
-        @type dist: callable or str or tuple(str, dict)
+    def __init__(self, name, value = None):
+        super(RandomKit, self).__init__(name)
+        self.value = value

-        @param cache: The optional cache argument implements a closure, which ensures that
-        multiple requests for the same sampling function will get the same
-        sampling function. L{NumpyGenerator}.__hash__ depends on this.
+    def gen(self, op, *args, **kwargs):
+        r = gof.generic()
+        new_r, out = op(r, *args, **kwargs)
+        self.add_input(SymbolicInput(r, update = new_r))
+        out.rng = r
+        out.auto = self
+        return out

-        @type cache: dict
-        """
-        if callable(dist):
-            return dist
-        if isinstance(dist, str):
-            return getattr(numpy.random.RandomState, dist)
-
-        name, kwargs = dist
-        key = (name, tuple(kwargs.items()))
-        if key not in cache:
-            fn = getattr(numpy.random.RandomState, name)
-            fn = functools.partial(fn, **kwargs)
-            cache[key] = fn
-        return cache[key]
-
-
-class NumpyGenerator(gof.op.Op):
-    """Supply a sequence of random tensors of a given shape, from a given
-    distribution.
-
-    @param seed: initial state for instances of this L{Op}.
-    @type seed: anything that numpy.random.RandomState accepts.
-    @param ndim: the rank of random tensors produced by this op.
-    @type ndim: non-negative integer
-    @param fn: a sampling function
-    @type fn: a callable that can reply to fn(numpy.RandomState(), size=<tuple>)
-    """
-    destroy_map = {0: [0]}
+    def distribute(self, value, indices, containers):
+        rg = partial(numpy.random.RandomState(value).randint, sys.maxint)
+        elems = deque(zip(indices, containers))
+        i = 0
+        while elems:
+            index, container = elems.popleft()
+            while i <= index:
+                curr = rg()
+                i += 1
+            rs = numpy.random.RandomState(int(curr))
+            container.data = rs

-    def __init__(self, seed, ndim, fn, **kwargs):
-        gof.op.Op.__init__(self, **kwargs)
-        self.seed = seed
-        self.ndim = ndim
-        self.fn = fn
-        assert numpy.random.RandomState(seed) #test the seed
-        assert 'int' in str(type(ndim))
-        assert callable(self.fn)
+    def binomial(self, *args, **kwargs):
+        return self.gen(binomial, *args, **kwargs)
+
+    def uniform(self, *args, **kwargs):
+        return self.gen(uniform, *args, **kwargs)
+
+    def normal(self, *args, **kwargs):
+        return self.gen(normal, *args, **kwargs)
+
+    def random_integers(self, *args, **kwargs):
+        return self.gen(random_integers, *args, **kwargs)
+
+
+
+rk = RandomKit('rk', 0xBAD5EED)

-    def __eq__(self, other):
-        return (type(self) is type(other))\
-                and self.__class__ is NumpyGenerator \
-                and self.seed == other.seed \
-                and self.ndim == other.ndim \
-                and self.fn == other.fn
-    def __hash__(self):
-        return self.seed ^ self.ndim ^ hash(self.fn)
-
-    def make_node(self, _shape):
-        #TODO: check for constant shape, and guess the broadcastable bits
-        shape = tensor.convert_to_int64(_shape)
-        if shape.type.ndim != 1:
-            raise TypeError('shape argument was not converted to 1-d tensor', _shape)
-
-        # we generate one random number with the distribution to determine what dtype to expect
-        output_dtype = str(self.fn(numpy.random.RandomState(18), size=(1,)).dtype)
-
-        inputs = [gof.Value(gof.type.generic, numpy.random.RandomState(self.seed)), shape]
-        outputs = [tensor.Tensor(dtype=output_dtype, broadcastable = [False]*self.ndim).make_result()]
-        return gof.Apply(op = self, inputs = inputs, outputs = outputs)
-
-    def grad(self, inputs, grad_outputs):
-        return [None, None]
-
-    def perform(self, node, input_storage, output_storage):
-        rng = input_storage[0]
-        shape = input_storage[1]
-        if self.ndim != len(shape):
-            raise ValueError('shape argument %s had the wrong length (!=%i)' %
-                    (shape, self.ndim) )
-        output_storage[0][0] = self.fn(rng, size=shape)