提交 ec316ab7 authored 作者: Joseph Turian's avatar Joseph Turian

Added module documentation

上级 b644298b
......@@ -206,6 +206,9 @@ Glossary of terminology
WRITEME
Module
See :ref:`Module`.
Op
a type of operation. Instance is TOI
......
......@@ -16,6 +16,8 @@ developer documentation.
- `Extending Theano` introduces how Theano works and explains how to add new
data and expression types, as well as optimizations to accompany them.
- `Module`
- `Hacking Theano` introduces you to what's under the hood: the compilation
process, the Env, C code generation.
......
.. _module:
######
Module
######
What is a Theano Module
=======================
Theano 'Module' is a structure which implements what could be called a
"theano class". A ``Module`` can contain ``Members``, which act like
instance variables ("state"). It can also contain an arbitrary number
of ``Methods``, which are functions that share the same ``Members`` in
addition to their own inputs. Last but not least, ``Modules`` can be
nested (explanations and examples follow). ``Module`` is meant to:
#. ease the sharing of parameters between several functions,
#. streamline automatic naming, and
#. allow a hierarchy of "modules" whose states can interact.
import
======
all example suppose that you have done those import
.. code-block:: python
#!/usr/bin/env python
import theano
import numpy as N
from theano import tensor as T
from theano.tensor import nnet as NN
from theano.compile import module as M
Module
======
A ``Module`` can contain ``Members``, ``Methods`` and inner ``Modules``. Each type has a special meaning.
.. code-block:: python
module = M.Module()
``Member``
------------
Usage:
.. code-block:: python
#module.state = M.Member(result)
module.state = M.Member(T.scalar())
A ``Member`` wraps a ``Result`` and represents a state variable. If one field of a ``Module`` is set with a ``Member``, it will be named automatically after that field and it will be an implicit input of all ``Methods`` of the ``Module``. Its storage will be shared by all ``Methods`` of the ``Module``.
A ``Member`` cannot wrap a ``Result`` which is the result of a previous computation. [What does this mean?][Fred:Still true?]
**NOTE:** after the state is declared, ``module.state`` will yield the ``result``, '''not''' the ``Member``. This is so it can be used directly in theano expressions. [What does this mean? What confusion does this clear up?] Basically:
.. code-block:: python
member = M.Member(result)
module.state = member
assert module.state is result # NOT member
**NOTE2:** this can also lead to some subtle bug as to share a member between module, you should do as this:
.. code-block:: python
module2 = M.Module()
module2.m1_state = M.Member(module.state)
#wrong: module2.m1_state = module.state as module2.m1_state won't be a member of module2...
see later section for more information.
``Method``
------------
Usage:
.. code-block:: python
module.method = M.Method(inputs, outputs, **updates)
Each key in the updates dictionary must be the name of an existing ``Member`` of the ``Module`` (or a ``Result`` that was declared to be a member of the module) and the value associated to that key is the update to the state. When called on a ``ModuleInstance`` produced by the ``Module``, the method will calculate the outputs from the inputs and will update all the states as specified. See the basic example for an example.
Inner Module
------------
To share a member between modules, the modules must be linked by inner module.
Usage:
.. code-block:: python
module2.submodule = module
``ModuleInstance``
====================
A ``Module`` can produce a ``ModuleInstance`` with its ``make`` method. Think of this as a class and an object in C++/Java. If an attribute was a ``Member``, it will become a read/write access to actual data for the state. If it was a ``M.Method``, a function will be compiled with the proper signature and semantics.
Module Interface
================
.. code-block:: python
def make(self, mode = {'FAST_COMPILE', 'FAST_RUN', ... }, **init)
'''make''' compiles all ``Methods`` and allocates storage for all ``Members`` into a ``ModuleInstance`` object, which is returned. The ``init`` dictionary can be used to provide initial values for the members.
'''make''' calls ``initialize_storage``[Fred: still true???] to allocate storage and ``_instance_initialize`` to initialize the instance.
.. code-block:: python
def resolve(self, symbol, filter = None)
Resolves a symbol in this module. The symbol can be a string or a ``Result``. If the string contains dots (eg ``"x.y"``), the module will resolve the symbol hierarchically in its inner modules. The filter argument is None or a class and it can be used to restrict the search to ``Member`` or ``Method`` instances for example.
.. code-block:: python
def initialize_storage(self, stor)
This allocates a ``Container`` for each member (and hierarchically, for the members of each inner module). This can be easily overriden by ``Module`` subclasses to share storage between some states.[Fred: still usefull?]
.. code-block:: python
def _instance_initialize(self, inst, **init)
The inst argument is a ``ModuleInstance``. For each key, value pair in init: s``etattr(inst, key, value)``. This can be easily overriden by ``Module`` subclasses to initialize an instance in different ways. If you don't know what to put their, you probably want:
.. code-block:: python
def _instance_initialize(self, inst, **init):
M.default_initialize(inst,**init)
Basic example
=============
The problem here is to create two functions, ``inc`` and ``dec`` and a shared state ``c`` such that ``inc(n)`` increases ``c`` by ``n`` and ``dec(n)`` decreases ``c`` by ``n``. We also want a third function, ``plus10``, which adds 10 to the current state. Using the function interface, the feature can be implemented as follows:
.. code-block:: python
n, c = T.scalars('nc')
inc = theano.function([n, ((c, c + n), 0)], [])
dec = theano.function([n, ((c, c - n), inc.container[c])], []) # we need to pass inc's container in order to share
plus10 = theano.function([(c, inc.container[c])], c + 10)
assert inc[c] == 0
inc(2)
assert inc[c] == 2 and dec[c] == inc[c]
dec(3)
assert inc[c] == -1 and dec[c] == inc[c]
assert plus10() == 9
Now, using ``Module``:
.. code-block:: python
m = M.Module()
n = T.scalar('n')
m.c = M.Member(T.scalar()) # state variables must be wrapped with ModuleMember
m.inc = M.Method(n, [], c = m.c + n) # m.c <= m.c + n
m.dec = M.Method(n, [], c = m.c - n) # k.c <= k.c - n
m.dec = M.Method(n, [], updates = {m.c: m.c - n})
#m.dec = M.Method(n, [], updates = {c: m.c - n})#global c don't exist
#m.dec = M.Method(n, [], m.c = m.c - n) #python don't suppor this syntax
#m.plus10 don't update the state
m.plus10 = M.Method([], m.c + 10) # m.c is always accessible since it is a member of this mlass
inst = m.make(c = 0) # here, we make an "instance" of the module with c initialized to 0
assert inst.c == 0
inst.inc(2)
assert inst.c == 2
inst.dec(3)
assert inst.c == -1
assert inst.plus10() == 9
Benefits of ``Module`` over ``function`` in this example:
* There is no need to manipulate the containers directly
* The fact inc and dec share a state is more obvious syntactically.
* ``Method`` does not require the states to be anywhere in the input list.
* The interface of the instance produced by ``m.make()`` is simple and coherent, extremely similar to that of a normal python object. It is directly usable by any user.
Nesting example
===============
The problem now is to create two pairs of ``inc dec`` functions and a function s``um`` that adds the shared states of the first and second pair.
Using function:
.. code-block:: python
def make_incdec_function():
n, c = T.scalars('nc')
inc = theano.function([n, ((c, c + n), 0)], [])
dec = theano.function([n, ((c, c - n), inc.container[c])], [])
return inc,dec
inc1, dec1 = make_incdec_function()
inc2, dec2 = make_incdec_function()
a, b = T.scalars('ab')
sum = theano.function([(a, inc1.container['c']), (b, inc2.container['c'])], a + b)
inc1(2)
dec1(4)
inc2(6)
assert inc1['c'] == -2 and inc2['c'] == 6
assert sum() == 4 # -2 + 6
Using Module:
.. code-block:: python
def make_incdec_module():
m = M.Module()
n = T.scalar('n')
m.c = M.Member(T.scalar()) # state variables must be wrapped with ModuleMember
m.inc = M.Method(n, [], c = m.c + n) # m.c <= m.c + n
m.dec = M.Method(n, [], c = m.c - n) # k.c <= k.c - n
return m
m = M.Module()
m.incdec1 = make_incdec_module()
m.incdec2 = make_incdec_module()
m.sum = M.Method([], m.incdec1.c + m.incdec2.c)
inst = m.make(incdec1 = dict(c=0), incdec2 = dict(c=0))
inst.incdec1.inc(2)
inst.incdec1.dec(4)
inst.incdec2.inc(6)
assert inst.incdec1.c == -2 and inst.incdec2.c == 6
assert inst.sum() == 4 # -2 + 6
Here, we make a new ``Module`` and we give it two inner ``Modules`` like the one defined in the basic example. Each inner module has methods inc and dec as well as a state c and their state is directly accessible from the outer module, which means that it can define methods using them. The ``ModuleInstance`` we make from the ``Module`` reflects the hierarchy that we created. Unlike the method using function, there is no need to manipulate any containers directly.
Advanced example
================
Complex models can be implemented by subclassing ``Module`` (though that is not mandatory). Here is a complete, extensible (and working) regression model implemented using this system:
.. code-block:: python
class RegressionLayer(M.Module):
def __init__(self, input = None, target = None, regularize = True):
super(RegressionLayer, self).__init__() #boilerplate
# MODEL CONFIGURATION
self.regularize = regularize
# ACQUIRE/MAKE INPUT AND TARGET
if not input:
input = T.matrix('input')
if not target:
target = T.matrix('target')
# HYPER-PARAMETERS
self.stepsize = M.Member(T.scalar()) # a stepsize for gradient descent
# PARAMETERS
self.w = M.Member(T.matrix()) #the linear transform to apply to our input points
self.b = M.Member(T.vector()) #a vector of biases, which make our transform affine instead of linear
# REGRESSION MODEL
self.activation = T.dot(input, self.w) + self.b
self.prediction = self.build_prediction()
# CLASSIFICATION COST
self.classification_cost = self.build_classification_cost(target)
# REGULARIZATION COST
self.regularization = self.build_regularization()
# TOTAL COST
self.cost = self.classification_cost
if self.regularize:
self.cost = self.cost + self.regularization
# GET THE GRADIENTS NECESSARY TO FIT OUR PARAMETERS
self.grad_w, self.grad_b = T.grad(self.cost, [self.w, self.b])
# INTERFACE METHODS
self.update = M.Method([input, target],
self.cost,
w = self.w - self.stepsize * self.grad_w,
b = self.b - self.stepsize * self.grad_b)
self.apply = M.Method(input, self.prediction)
def params(self):
return self.w, self.b
def _instance_initialize(self, obj, input_size = None, target_size = None,
seed = 1827, **init):
# obj is an "instance" of this module holding values for each member and
# functions for each method
if input_size and target_size:
# initialize w and b in a special way using input_size and target_size
sz = (input_size, target_size)
rng = N.random.RandomState(seed)
obj.w = rng.uniform(size = sz, low = -0.5, high = 0.5)
obj.b = N.zeros(target_size)
obj.stepsize = 0.01
# here we call the default_initialize method, which takes all the name: value
# pairs in init and sets the property with that name to the provided value
# this covers setting stepsize, l2_coef; w and b can be set that way too
# we call it after as we want the parameter to superseed the default value.
M.default_initialize(obj,**init)
def build_regularization(self):
return T.zero() # no regularization!
class SoftmaxXERegression(RegressionLayer):
""" XE mean cross entropy"""
def build_prediction(self):
return NN.softmax(self.activation)
def build_classification_cost(self, target):
#self.classification_cost_matrix = target * T.log(self.prediction) + (1 - target) * T.log(1 - self.prediction)
self.classification_cost_matrix = (target - self.prediction)**2
self.classification_costs = -T.sum(self.classification_cost_matrix, axis=1)
return T.sum(self.classification_costs)
def build_regularization(self):
self.l2_coef = M.Member(T.scalar()) # we can add a hyper parameter if we need to
return self.l2_coef * T.sum(self.w * self.w)
Using the model is quite simple:
.. code-block:: python
data_x = N.random.randn(4, 10)
data_y = [ [int(x)] for x in N.random.randn(4) > 0]
model = SoftmaxXERegression(regularize = False).make(input_size = 10,
target_size = 1,
stepsize = 0.1)
for i in xrange(1000):
xe = model.update(data_x, data_y)
if i % 100 == 0:
print i, xe
pass
#for inputs, targets in my_training_set():
#print "cost:", model.update(inputs, targets)
print "final weights:", model.w
print "final biases:", model.b
Extending ``Methods``
=======================
[Fred:still valid? example don't work and I'm not able to repair it.]
``Methods`` can be extended to update more parameters. For example, if we wanted to add a variable holding the sum of all costs encountered so far to ``SoftmaxXERegression``, we could proceed like this:
.. code-block:: python
model_module = SoftmaxXERegression(regularize = False)
model_module.sum = M.Member(T.scalar()) # we add a module member to hold the sum
model_module.update.updates.update(sum = model_module.sum + model_module.cost) # now update will also update sum!
model = model_module.make(input_size = 4,
target_size = 2,
stepsize = 0.1,
sum = 0) # we mustn't forget to initialize the sum
test = model.update([[0,0,1,0]], [[0,1]]) + model.update([[0,1,0,0]], [[1,0]])
assert model.sum == test
The inputs and outputs list of a ``Method`` can be doctored as well, but it is trickier, arguably less useful and not fully supported at the moment.
Markdown 格式
0%
您添加了 0 到此讨论。请谨慎行事。
请先完成此评论的编辑!
注册 或者 后发表评论