A Theano ``Module`` is a structure that implements what could be called a
"Theano class". A ``Module`` can contain ``Members``, which act like
instance variables ("state"). It can also contain an arbitrary number
of ``Methods``, which are functions that share the same ``Members`` in
addition to their own inputs. Last but not least, ``Modules`` can be
nested (explanations and examples follow). ``Module`` is meant to:

#. ease the sharing of variables between several Theano functions,
#. streamline automatic naming, and
#. allow a hierarchy of "modules" whose states can interact.
Imports
=======

All examples assume that you have done these imports:
.. code-block:: python

    import theano
    import numpy as N
    from theano import tensor as T
    from theano.tensor import nnet as NN
    from theano.compile import module as M
Module
======
A ``Module`` can contain ``Members``, ``Methods`` and inner ``Modules``. Each kind of attribute has a special meaning.

.. code-block:: python

    module = M.Module()
``Member``
------------
Usage:

.. code-block:: python

    # module.state = variable
    module.state = T.scalar()
A ``Member`` represents a state variable, i.e. one whose value persists after a ``Method`` is called. It is named automatically after the field it is assigned to, and it is an implicit input of all ``Methods`` of the ``Module``. Its storage (i.e. where the value is stored) is shared by all ``Methods`` of the ``Module``.

A ``Variable`` which is the result of a previous computation (as opposed to being ``updated``) is not a ``Member``. Internally such a variable is called an External; you should not need to worry about this.
For sharing state between modules, see the ``Inner Module`` section.
``Method``
----------

A ``Method`` is declared with ``M.Method``: it takes its own inputs, produces outputs, and may specify update expressions for the ``Module``'s state. Each key in the updates dictionary must be the name of an existing ``Member`` of the ``Module``, and the value associated with that key is the update expression for that state. When called on a ``ModuleInstance`` produced by the ``Module``, the method computes the outputs from the inputs and updates all the states as specified by the update expressions. See the basic example below.
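The update semantics can be mimicked in plain Python. The sketch below is purely illustrative (``SimpleMethod`` and every name in it are hypothetical, not part of Theano): outputs are computed first, then every update expression is evaluated against the old state and applied.

```python
class SimpleMethod:
    """Illustrative stand-in for M.Method's calling semantics:
    compute outputs from shared state plus inputs, then apply
    every update expression to the shared state."""
    def __init__(self, state, output_fn, updates):
        self.state = state          # shared dict: member name -> value
        self.output_fn = output_fn  # computes the method's outputs
        self.updates = updates      # member name -> update expression

    def __call__(self, *inputs):
        result = self.output_fn(self.state, *inputs)
        # Every update is computed from the *old* state, then applied at once.
        new_values = {k: f(self.state, *inputs) for k, f in self.updates.items()}
        self.state.update(new_values)
        return result

state = {'c': 0}
inc = SimpleMethod(state, lambda s, n: None, {'c': lambda s, n: s['c'] + n})
inc(5)
assert state['c'] == 5
```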
Inner Module
------------
To share a ``Member`` between modules, the modules must be linked through the inner module mechanism.
Usage:

.. code-block:: python

    module2.submodule = module
``ModuleInstance``
====================
A ``Module`` can produce a ``ModuleInstance`` with its ``make`` method; think of their relation as that between a class and an object in C++/Java. If an attribute of the ``Module`` was a ``Member``, it becomes read/write access to the actual data for that state. If it was a ``M.Method``, a function is compiled with the proper signature and semantics.
``make`` compiles all ``Methods`` and allocates storage for all ``Members`` in a ``ModuleInstance`` object, which is returned. The ``init`` dictionary can be used to provide initial values for the members.
.. code-block:: python

    def resolve(self, symbol, filter = None)
Resolves a symbol in this module. The symbol can be a string or a ``Variable``. If the string contains dots (e.g. ``"x.y"``), the module resolves the symbol hierarchically in its inner modules. The ``filter`` argument is ``None`` or a class; it can be used to restrict the search to, for example, ``Member`` or ``Method`` instances.
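This kind of hierarchical resolution can be sketched in plain Python. The helper below is hypothetical (it is not Theano's implementation): it walks dotted attribute access and optionally checks the class of the result.

```python
def resolve(obj, symbol, filter=None):
    """Resolve a dotted name hierarchically: "x.y" -> obj.x.y.
    If filter is a class, require the result to be an instance of it."""
    for part in symbol.split('.'):
        obj = getattr(obj, part)
    if filter is not None and not isinstance(obj, filter):
        raise TypeError("resolved object is not a %s" % filter.__name__)
    return obj

class Inner: pass
class Outer: pass

outer = Outer()
outer.x = Inner()
outer.x.y = 42
assert resolve(outer, "x.y") == 42
assert isinstance(resolve(outer, "x", filter=Inner), Inner)
```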
.. code-block:: python

    def _instance_initialize(self, inst, **init)
The ``inst`` argument is a ``ModuleInstance``. For each key/value pair in ``init``, ``setattr(inst, key, value)`` is called. This can easily be overridden by ``Module`` subclasses to initialize an instance in different ways. If you do not need custom initialization, do not define this method and a default version will be used. To invoke the parent version, call ``M.default_initialize(inst, **init)``.
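The default behaviour amounts to a ``setattr`` loop over the ``init`` dictionary, sketched here in plain Python (illustrative only; ``Inst`` is a hypothetical stand-in for a ``ModuleInstance``):

```python
def default_initialize(inst, **init):
    # Set the corresponding attribute for each key/value pair.
    for key, value in init.items():
        setattr(inst, key, value)

class Inst: pass

inst = Inst()
default_initialize(inst, c=0, stepsize=0.1)
assert inst.c == 0 and inst.stepsize == 0.1
```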
Basic example
=============
The problem here is to create two functions, ``inc`` and ``dec``, and a shared state ``c`` such that ``inc(n)`` increases ``c`` by ``n`` and ``dec(n)`` decreases ``c`` by ``n``. We also want a third function, ``plus10``, which returns 10 plus the current state without changing it. Using the function interface, this can be implemented as follows:
.. code-block:: python

    n, c = T.scalars('nc')
    inc = theano.function([n, ((c, c + n), 0)], [])
    # we need to pass inc's container in order to share the state
    dec = theano.function([n, ((c, c - n), inc.container[c])], [])
    plus10 = theano.function([(c, inc.container[c])], c + 10)
Using ``Module``, the same feature can be implemented as follows:

.. code-block:: python

    m = M.Module()
    m.c = T.scalar()
    n = T.scalar('n')
    m.inc = M.Method(n, [], c = m.c + n)
    m.dec = M.Method(n, [], c = m.c - n)
    m.plus10 = M.Method([], m.c + 10)  # m.c is always accessible since it is a member of this class

    inst = m.make(c = 0)  # here, we make an "instance" of the module with c initialized to 0
    assert inst.c == 0
    inst.inc(2)
    assert inst.c == 2
    inst.dec(3)
    assert inst.c == -1
    assert inst.plus10() == 9
Benefits of ``Module`` over ``function`` in this example:

* There is no need to manipulate the containers directly.
* The fact that ``inc`` and ``dec`` share a state is syntactically obvious.
* ``Method`` does not require the states to appear anywhere in the input list.
* The interface of the instance produced by ``m.make()`` is simple and coherent, extremely similar to that of a normal Python object, and directly usable by any user.
Nesting example
===============
The problem now is to create two pairs of ``inc``/``dec`` functions and a function ``sum`` that adds the shared states of the first and second pairs.
Using ``function``:

.. code-block:: python

    def make_incdec_function():
        n, c = T.scalars('nc')
        inc = theano.function([n, ((c, c + n), 0)], [])
        # inc and dec share the same state
        dec = theano.function([n, ((c, c - n), inc.container[c])], [])
        return inc, dec

    inc1, dec1 = make_incdec_function()
    inc2, dec2 = make_incdec_function()
    a, b = T.scalars('ab')
    sum = theano.function([(a, inc1.container['c']), (b, inc2.container['c'])], a + b)
Using ``Module``:

.. code-block:: python

    def make_incdec_module():
        m = M.Module()
        m.c = T.scalar()
        n = T.scalar('n')
        m.inc = M.Method(n, [], c = m.c + n)
        m.dec = M.Method(n, [], c = m.c - n)
        return m

    m = M.Module()
    m.incdec1 = make_incdec_module()
    m.incdec2 = make_incdec_module()
    m.sum = M.Method([], m.incdec1.c + m.incdec2.c)

    inst = m.make()
    inst.incdec1.c = 0
    inst.incdec2.c = 0
    inst.incdec1.inc(2)
    inst.incdec1.dec(4)
    inst.incdec2.inc(6)
    assert inst.incdec1.c == -2 and inst.incdec2.c == 6
    assert inst.sum() == 4  # -2 + 6
Here, we make a new ``Module`` and give it two inner ``Modules`` like
the one defined in the basic example. Each inner module has methods
``inc`` and ``dec`` as well as a state ``c``, and that state is directly
accessible from the outer module, which means the outer module can
define methods using it. The instance (``inst``) we make from the
``Module`` (``m``) reflects the hierarchy that we created. Unlike the
version using ``function``, there is no need to manipulate any
containers directly.
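The shape of the resulting instance can be pictured with ordinary Python objects. The sketch below only mimics the behaviour of the compiled instance (``IncDec`` and ``Pair`` are hypothetical names; no Theano machinery is involved):

```python
class IncDec:
    """Plays the role of one inner module instance."""
    def __init__(self, c=0):
        self.c = c                 # the shared state
    def inc(self, n):
        self.c += n
    def dec(self, n):
        self.c -= n

class Pair:
    """Plays the role of the outer module instance."""
    def __init__(self):
        self.incdec1 = IncDec()
        self.incdec2 = IncDec()
    def sum(self):
        # the outer object defines a method over its inner states
        return self.incdec1.c + self.incdec2.c

inst = Pair()
inst.incdec1.inc(2)
inst.incdec1.dec(4)
inst.incdec2.inc(6)
assert inst.sum() == 4  # -2 + 6
```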
Advanced example
================
Complex models can be implemented by subclassing ``Module`` (though that is not mandatory). Here is how an extensible regression model implemented with this system is used (the definition of the ``SoftmaxXERegression`` subclass itself is not shown here):
.. code-block:: python

    data_x = N.random.randn(4, 10)  # 4 examples matching input_size = 10
    data_y = [[int(x)] for x in N.random.randn(4) > 0]

    model = SoftmaxXERegression(regularize = False).make(input_size = 10,
                                                         target_size = 1,
                                                         stepsize = 0.1)
    for i in xrange(1000):
        xe = model.update(data_x, data_y)
        if i % 100 == 0:
            print i, xe

    # for inputs, targets in my_training_set():
    #     print "cost:", model.update(inputs, targets)

    print "final weights:", model.w
    print "final biases:", model.b
Extending ``Methods``
=======================
``Methods`` can be extended to update more parameters. For example, if we wanted to add to ``SoftmaxXERegression`` a variable holding the sum of all costs encountered so far, we could add a new ``Member`` for the running total and give the ``update`` method an extra update expression for it. Note that member values can be resized as necessary (matrices are generally resizable at Theano runtime; the only exception is when a vector of integers is used as the new *shape* of some tensor: since the rank of a tensor is fixed, that vector of integers must have a fixed length).
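The idea can be sketched with plain Python. Everything below is illustrative (``Model``, ``cost_state`` and this ``update`` are hypothetical stand-ins, not the actual ``SoftmaxXERegression`` API): a new piece of state accumulates every cost computed by the update step.

```python
class Model:
    """Stand-in for a module instance whose update method also
    maintains a running total of all costs seen so far."""
    def __init__(self):
        self.cost_state = 0.0   # plays the role of the new Member

    def update(self, cost):
        # The real method would compute the cost from inputs and
        # targets; the extension adds one more update expression
        # that accumulates it into the shared state.
        self.cost_state += cost
        return cost

m = Model()
for c in [1.0, 2.0, 3.0]:
    m.update(c)
assert m.cost_state == 6.0
```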
Discussion: Ops versus Modules
==============================

Python expressions provide a nice syntax for sticking Ops together, much as PLearn Variables are combined.
But all the links that you have in PLearn modules are still present with Ops, and they can be manipulated explicitly. This is what Theano's optimizations do: they look at small sections of the graph (so you can handle differently the case where your inputs come from certain kinds of Ops) and re-wire the graph.
So that's the Op for you. I think that if you only ever wanted to compile one theano function per program execution, the Ops would be enough, and there would be no need for Modules. The Modules exist to make it syntactically easier for multiple theano compiled functions to collaborate by sharing physical variables.
When you compile a Method in a Module, it *automatically* gets access to all of the Module's Member [symbolic] variables, and when that Method is compiled to a function, then the function gets access to all of the ModuleInstance's [physical] variables.
So I don't think that there is an analog of Modules in PLearn, because PLearn doesn't make a distinction between the symbolic and instantiated functions.
Modules can also be arranged into graphs: Modules can contain Modules. The meaning of this is not computational, though; it is still about sharing variables. Variables are shared throughout the entire Module graph: when you add a DAA to a parent Module, for example, the parent Module's Methods gain access to the variables in the DAA. This makes it possible to identify each Module with one particular use of member variables. Complex behaviour can be built up by adding a few Modules that each do something with common variables.
For example, you could have one Module that has weight and bias members, and comes with some [symbolic] methods for using these members to compute the hidden-unit activation of a single layer, given some input.
Then, you could add a second module that handles pre-training by RBM. You would tell the second Module to use the weight matrix and bias of the first Module, and it would allocate its own member for the visible-unit bias. This module would come with [symbolic] methods for doing CD.
Then, you could add a third module that handles pre-training by DAA! You would tell this module about the original weights (maybe the bias too, maybe not) and it would allocate the downward weights, and yet another visible-bias, and would add [symbolic] methods for pre-training that way.
When you compile the parent module, all of the methods associated with each algorithm will be compiled so that they update the same weight matrix that the first module uses to compute its hidden-unit activation.
This ability for Modules to bring algorithms to work on existing member variables is what lets the minimization algorithm be separated from the layer and regression Modules. There is thus a stochastic gradient descent module, that has one little learning-rate Member, and basically just works with the parameter members of other modules.
The module system would make it possible to include several minimization algorithms at once too, so you could start off by using SGD, and then maybe later you could switch to conjugate gradient for really fine tuning.
So, Ops are for computations and Modules are for sharing variables between Theano functions.