Commit 6c4df656 authored by james@X40

merge

......@@ -114,7 +114,7 @@ Setup on OS-X
Note that compiling gcc42 takes a significant time (hours) so it's probably
not the best solution if you're in a rush! In my (Doomie) experience, scipy
failed to compile the first time I tried the command, but the second time
it compiled just fine. Same thing with py25-zlib.
it compiled fine. Same thing with py25-zlib.
- Install some kind of BLAS library (TODO: how?)
......
......@@ -305,9 +305,9 @@ This is done by setting the ``destroy_map`` field of the op. ``destroy_map`` mus
Viewers
-------
Similarly, an Op might not modify the inputs, but return an output which shares state with one or several of its inputs. For example, ``transpose`` can be done very efficiently by viewing the same data as the original with modified dimensions and strides. That is fine, but the compiler needs to be told.
Similarly, an Op might not modify the inputs, but return an output which shares state with one or several of its inputs. For example, ``transpose`` can be done efficiently by viewing the same data as the original with modified dimensions and strides. That is fine, but the compiler needs to be told.
This is done by setting the ``view_map`` field of the op. It works just like the ``destroy_map`` field: to an output index is associated the list of inputs that it shares state with. For example, ``transpose.view_map == {0: [0]}`` because its first output uses the same data as its first input. ``view_map`` is conservative: if there is any probability that an output will be the view of an input, that input must be in the view list of that output.
This is done by setting the ``view_map`` field of the op. It works like the ``destroy_map`` field: each output index is associated with the list of inputs that it shares state with. For example, ``transpose.view_map == {0: [0]}`` because its first output uses the same data as its first input. ``view_map`` is conservative: if there is any probability that an output will be the view of an input, that input must be in the view list of that output.
Important note: currently, an output can only be the view of one input. This is limiting, as an 'if' or 'switch' op would need to declare its output as a view of both its then and else branches, but for the time being the framework is not powerful enough to handle it. A future version should address this issue.
......@@ -316,7 +316,7 @@ Hidden outputs (as a form of op state)
For performance purposes, an ``op`` might want to have a hidden internal state.
Example: if we expect to call the op repeatedly on incrementally bigger inputs, we might want private output storage that's a lot bigger than needed and take incrementally bigger views on it, to save allocation overhead. In order to do this, we can simple have two outputs: one that we will return normally and will contain the answer and the other that will be the (larger) container. In this case, the advanced note in the 'reusing outputs' section applies. Furthermore, ``__call__`` should be overriden to only return the first output instead of both of them. Here is what the example's ``perform`` and ``__call__`` would look like:
Example: if we expect to call the op repeatedly on incrementally bigger inputs, we might want private output storage that's a lot bigger than needed and take incrementally bigger views on it, to save allocation overhead. In order to do this, we can have two outputs: one that we will return normally and will contain the answer and the other that will be the (larger) container. In this case, the advanced note in the 'reusing outputs' section applies. Furthermore, ``__call__`` should be overridden to only return the first output instead of both of them. Here is what the example's ``perform`` and ``__call__`` would look like:
.. code-block:: python
......
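The growing-buffer trick itself can be sketched without any Theano machinery (the class and names below are hypothetical, purely for illustration):

```python
import numpy

class GrowingBuffer:
    """Sketch of the 'hidden output' idea: keep an oversized private
    buffer and hand out incrementally bigger views of it, so repeated
    calls on growing inputs do not reallocate every time.
    (Illustrative only -- not the actual Theano Op API.)"""
    def __init__(self):
        self._store = numpy.empty(0)

    def view(self, n):
        if self._store.size < n:
            # overallocate: double the requested size
            self._store = numpy.empty(2 * n)
        return self._store[:n]

buf = GrowingBuffer()
v1 = buf.view(3)   # allocates a 6-element store
v2 = buf.view(5)   # still fits: no new allocation
assert v2.base is buf._store
```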
......@@ -27,6 +27,21 @@ However, if the link target is ambiguous, Sphinx will generate errors.
NB the ``:api:`` reference is special magic by Olivier, in
./scripts/docgen.py.
How to add TODO comments in Sphinx documentation
-------------------------------------------------
To include a TODO comment in Sphinx documentation, use an indented block as
follows::
   .. TODO: This is a comment.
   .. You have to put .. at the beginning of every line :(
   .. These lines should all be indented.

It will not appear in the generated output.
.. TODO: Check it out, this won't appear.
.. Nor will this.
How to write API documentation
---------------------------------------
......
......@@ -292,7 +292,7 @@ Complex models can be implemented by subclassing ``Module`` (though that is not
self.l2_coef = M.Member(T.scalar()) # we can add a hyper parameter if we need to
return self.l2_coef * T.sum(self.w * self.w)
Using the model is quite simple:
Here is how we use the model:
.. code-block:: python
......
......@@ -7,8 +7,13 @@ Sparse matrices
scipy.sparse
------------
Note that you want scipy >= 0.7.0. 0.6 has a very bug and inconsistent
implementation of sparse matrices.
Note that you want scipy >= 0.7.0.
.. warning::
In scipy 0.6, ``scipy.csc_matrix.dot`` has a bug with singleton
dimensions. There may be more bugs. It also has an inconsistent
implementation of sparse matrices.
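For reference, here is what the compressed sparse column layout looks like in practice (assuming scipy >= 0.7 is installed):

```python
import numpy
from scipy import sparse

m = sparse.csc_matrix(numpy.array([[1.0, 0.0],
                                   [0.0, 2.0],
                                   [3.0, 0.0]]))
# compressed sparse column: values and row indices, grouped per column
assert m.shape == (3, 2)
assert m.nnz == 3                       # three stored non-zeros
assert list(m.data) == [1.0, 3.0, 2.0]  # column-major order
assert list(m.indices) == [0, 2, 1]     # row index of each value
```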
We describe the details of the compressed sparse matrix types.
``scipy.sparse.csc_matrix``
......
......@@ -157,7 +157,7 @@ State example
=============
In this example, we'll look at a complete logistic regression model, with
training by simple gradient descent.
training by gradient descent.
.. code-block:: python
......
......@@ -31,7 +31,7 @@ not limited to:
* constant folding
* merging of similar subgraphs, to avoid calculating the same values more than once
* simple arithmetic simplification (``x*y/x -> y``)
* arithmetic simplification (``x*y/x -> y``)
* inserting efficient BLAS_ operations
* using inplace operations wherever it is safe to do so.
......@@ -47,7 +47,7 @@ Theano is released under a BSD license (:ref:`link <license>`)
Sneak peek
==========
Here is a simple example of how to use Theano. It doesn't show
Here is an example of how to use Theano. It doesn't show
off many of Theano's features, but it illustrates concretely what
Theano is.
......@@ -110,7 +110,7 @@ There exist another symbolic package in Python, namely sympy_. Theano
is different from sympy in the sense that while Theano allows symbolic
manipulation it puts more emphasis on the evaluation of these expressions
and being able to repeatedly evaluate them on many different inputs. Theano
is also better suited to handling very large tensors which have no
is also better suited to handling large tensors which have no
assumed structures.
If numpy_ is to be compared to MATLAB_ and sympy_ to Mathematica_,
......
......@@ -43,17 +43,20 @@ The following libraries and software are optional:
Easy install
------------
The following command will install the very latest revision of Theano
The following command will install the latest revision of Theano
on your system:
.. TODO: Does this install the latest package version, or the latest Mercurial
.. revision?
.. code-block:: bash
easy_install http://pylearn.org/hg/theano/archive/tip.tar.gz
TODO: make sure this works
.. TODO: make sure this works
TODO: change the command to install the latest *stable* version of
Theano, when we figure out where to put it.
.. TODO: change the command to install the latest *stable* version of
.. Theano, when we figure out where to put it.
--------------
......
......@@ -17,7 +17,7 @@ an input provided by the end user (using c_extract) or it might simply
have been calculated by another operation. For each of the outputs,
the variables associated to them will be declared and initialized.
The operation then simply has to compute what it needs to using the
The operation then has to compute what it needs to using the
input variables and place the results in the output variables.
......@@ -88,7 +88,7 @@ variables x_name, y_name and output_name are all of the primitive C
Implementing multiplication is as simple as multiplying the two input
doubles and setting the output double to what comes out of it. If you
had more than one output, you would simply set the variable(s) for
had more than one output, you would just set the variable(s) for
each output to what they should be.
.. warning::
......
......@@ -154,7 +154,7 @@ it, it's best to publish it somewhere.
""" % dict(name = name)
double.c_init = c_init
Still straightforward. This function simply has to initialize the
This function has to initialize the
double we declared previously to a suitable value. This is useful if
we want to avoid dealing with garbage values, especially if our data
type is a pointer. This is not going to be called for all Results with
......@@ -375,7 +375,7 @@ like this:
//c_cleanup for x
}
It's not very good looking, but it gives you an idea of how things
It's not pretty, but it gives you an idea of how things
work (note that the variable names won't be x, y, z, etc. - they will
get a unique mangled name). The ``fail`` code runs a goto to the
appropriate label in order to run all cleanup that needs to be
......
......@@ -138,11 +138,10 @@ type and it should make an Apply node with an output Result of type
mul.make_node = make_node
This is a pretty simple definition: the first two lines make sure that
both inputs are Results of the ``double`` type that we created in the
previous section. We would not want to multiply two arbitrary types,
it would not make much sense (and we'd be screwed when we implement
this in C!)
The first two lines make sure that both inputs are Results of the
``double`` type that we created in the previous section. We would not
want to multiply two arbitrary types; it would not make much sense
(and we'd be screwed when we implement this in C!)
The last line is the meat of the definition. There we create an Apply
node representing the application of ``mul`` to ``x`` and ``y``. Apply
......@@ -178,8 +177,8 @@ understand the role of all three arguments of ``perform``:
return, per our own definition.
- *output_storage*: This is a list of storage cells. There is one
storage cell for each output of the Op. A storage cell is quite
simply a one-element list (note: it is forbidden to change the
storage cell for each output of the Op. A storage cell is
a one-element list (note: it is forbidden to change the
length of the list(s) contained in output_storage). In this example,
output_storage will contain a single storage cell for the
multiplication's result.
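The storage-cell convention is easy to mimic in plain Python: a cell is a one-element list, and ``perform`` writes its result into index 0 (a schematic sketch, not the actual Op API):

```python
def perform_mul(inputs, output_storage):
    # inputs: list of input values
    # output_storage: list of one-element lists, one per output
    x, y = inputs
    z = output_storage[0]   # the storage cell for the single output
    z[0] = x * y            # write the result; never resize the cell

cell = [None]               # an empty storage cell, as Theano provides it
perform_mul([3.0, 4.0], [cell])
assert cell == [12.0]
```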
......@@ -204,18 +203,19 @@ Here, ``z`` is a list of one element. By default, ``z == [None]``.
:ref:`op` documentation.
.. warning::
The data you put in the output_storage must match the type of the
symbolic output (this is a situation where the ``node`` argument
can come in handy). In the previous example, if you put, say, an
``int`` in ``z[0]`` (even though we gave ``z`` the Theano type
The data you put in ``output_storage`` must match the type of the
symbolic output. This is a situation where the ``node`` argument
can come in handy. In this example, we gave ``z`` the Theano type
``double`` in ``make_node``, which means that a Python ``float``
must be put there) you might have nasty problems further down the
line since Theano often assumes Ops handle typing properly.
must be put there. You should not put, say, an ``int`` in ``z[0]``
because Theano assumes Ops handle typing properly.
Trying out our new Op
=====================
In the following code, we use our new Op:
>>> x, y = double('x'), double('y')
>>> z = mul(x, y)
>>> f = theano.function([x, y], z)
......@@ -224,7 +224,7 @@ Trying out our new Op
>>> f(5.6, 6.7)
37.519999999999996
Seems to work. Note that there is an implicit call to
Note that there is an implicit call to
``double.filter()`` on each argument, so if we give integers as inputs
they are magically cast to the right type. Now, what if we try this?
......@@ -237,7 +237,8 @@ Traceback (most recent call last):
AttributeError: 'int' object has no attribute 'type'
Well, ok. We'd like our Op to be a bit more flexible. This can be done
by fixing ``make_node`` a little bit:
by modifying ``make_node`` to accept Python ``int`` or ``float`` as
``x`` and/or ``y``:
.. code-block:: python
......@@ -252,8 +253,8 @@ by fixing ``make_node`` a little bit:
mul.make_node = make_node
Whenever we pass a Python int or float instead of a Result as ``x`` or
``y``, make_node will convert it to :ref:`constant` for us. Constant
is basically a :ref:`result` we statically know the value of.
``y``, make_node will convert it to :ref:`constant` for us. ``gof.Constant``
is a :ref:`result` whose value we know statically.
>>> x = double('x')
>>> z = mul(x, 2)
......@@ -263,18 +264,16 @@ is basically a :ref:`result` we statically know the value of.
>>> f(3.4)
6.7999999999999998
And now it works the way we want it to.
Now the code works the way we want it to.
Final version
=============
While I would call the above definitions appropriately pedagogical, it
is not necessarily the best way to do things, especially when you need
to define the other basic arithmetic operations ``add``, ``sub`` and
``div``. It appears that the code for ``make_node`` can be shared
between these Ops. Here is the final version of the four arithmetic
operators (well, pending revision of this tutorial, I guess):
The above example is pedagogical. When you define other basic arithmetic
operations ``add``, ``sub`` and ``div``, code for ``make_node`` can be
shared between these Ops. Here is a revised implementation of these four
arithmetic operators:
.. code-block:: python
......@@ -313,37 +312,27 @@ operators (well, pending revision of this tutorial, I guess):
div = BinaryDoubleOp(name = 'div',
fn = lambda x, y: x / y)
Can you see how the definition of ``mul`` here does exactly the same
thing as the definition we had earlier?
Instead of working directly on an instance of Op, we create a subclass
of Op that we can parametrize. First, all the operations we define are
binary, they all work on inputs with type ``double`` and they all
return a single Result of type ``double``. Therefore, ``make_node``
basically does the same thing for all these operations, except for the
fact that the Op reference passed as first argument to Apply must be
themselves. Therefore we can abstract out most of the logic and pass
self to Apply, which seems natural. We can also easily define
``perform`` as depending on a function or lambda expression passed in
the constructor.
This design therefore appears to be a flexible way to define our four
basic operations (and possibly many more!) without duplicating
code. The same way a Type subclass represents a set of structurally
similar types (see previous section), an Op subclass represents a set
of structurally similar operations: operations that have the same
input/output types, operations that only differ in one small detail,
etc. If you see common patterns in several Ops that you want to
define, it can be a good idea to abstract out what you can, as I did
here. Remember that an Op is just an object which satisfies the
contract described above on this page and that you should use all the
tools at your disposal to create these objects as efficiently as
possible.
While I could have made a generic DoubleOp where the number of
arguments can also be given as a parameter, I decided it was not
necessary here.
Instead of working directly on an instance of Op, we create a subclass of
Op that we can parametrize. All the operations we define are binary. They
all work on two inputs with type ``double``. They all return a single
Result of type ``double``. Therefore, ``make_node`` does the same thing
for all these operations, except for the Op reference ``self`` passed
as the first argument to Apply. We define ``perform`` using the function
``fn`` passed in the constructor.
This design is a flexible way to define basic operations without
duplicating code. The same way a Type subclass represents a set of
structurally similar types (see previous section), an Op subclass
represents a set of structurally similar operations: operations that
have the same input/output types, operations that only differ in one
small detail, etc. If you see common patterns in several Ops that you
want to define, it can be a good idea to abstract out what you can.
Remember that an Op is just an object which satisfies the contract
described above on this page and that you should use all the tools at
your disposal to create these objects as efficiently as possible.
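The design pattern itself is framework-independent. Here is a minimal sketch of the same parametrize-a-subclass idea without any Theano machinery (all names hypothetical):

```python
class BinaryOp:
    """Minimal stand-in for the BinaryDoubleOp pattern: one class,
    parametrized by a name and a function, instead of four nearly
    identical Op definitions. (Not the actual Theano Op API.)"""
    def __init__(self, name, fn):
        self.name = name
        self.fn = fn

    def __call__(self, x, y):
        # stands in for make_node + perform
        return self.fn(x, y)

add = BinaryOp('add', lambda x, y: x + y)
sub = BinaryOp('sub', lambda x, y: x - y)
mul = BinaryOp('mul', lambda x, y: x * y)
div = BinaryOp('div', lambda x, y: x / y)

assert mul(3.0, 4.0) == 12.0
assert div(1.0, 2.0) == 0.5
```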
**Exercise**: Make a generic DoubleOp, where the number of
arguments can also be given as a parameter.
**Next:** `Implementing double in C`_
......
......@@ -11,7 +11,7 @@ Before tackling this tutorial, it is highly recommended to read the
The advanced tutorial is meant to give the reader a greater
understanding of the building blocks of Theano. Through this tutorial
we are going to define one :ref:`type`, ``double`` and basic
we are going to define one :ref:`type`, ``double``, and basic
arithmetic :ref:`operations <op>` on that Type. We will first define
them using a Python implementation and then we will add a C
implementation.
......
......@@ -166,7 +166,7 @@ first input (rank 0).
Purely destructive operations
=============================
While some operations will operate inplace on their inputs, some will
While some operations will operate inplace on their inputs, some might
simply destroy or corrupt them. For example, an Op could do temporary
calculations right in its inputs. If that is the case, Theano also
needs to be notified. The way to notify Theano is to assume that some
......
......@@ -176,7 +176,7 @@ optimization you wrote. For example, consider the following:
>>> e
[div(mul(add(y, z), x), add(y, z))]
Nothing happened here. The reason is simple: ``add(y, z) != add(y,
Nothing happened here. The reason is that ``add(y, z) != add(y,
z)``; this is the case for efficiency reasons. To fix this problem we
first need to merge the parts of the graph that represent the same
computation, using the ``merge_optimizer`` defined in
......
......@@ -14,7 +14,7 @@ WRITEME
Don't define new Ops unless you have to
=======================================
It is usually not very useful to define Ops that can be easily
It is usually not useful to define Ops that can be easily
implemented using other already existing Ops. For example, instead of
writing a "sum_square_difference" Op, you should probably just write a
simple function:
......
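In plain numpy terms, such a function is one line (a sketch of the idea; in Theano you would compose the equivalent ``theano.tensor`` operations):

```python
import numpy

def sum_square_difference(a, b):
    # composed from existing elementwise ops: subtract, square, sum
    return ((a - b) ** 2).sum()

a = numpy.array([1.0, 2.0, 3.0])
b = numpy.array([1.0, 0.0, 1.0])
assert sum_square_difference(a, b) == 8.0
```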
......@@ -30,6 +30,12 @@ add. Note that from now on, we will use the term :term:`Result` to
mean "symbol" (in other words, ``x``, ``y``, ``z`` are all Result
objects).
If you are following along and typing into an interpreter, you may have
noticed that there was a slight delay in executing the ``function``
instruction. Behind the scenes, ``f`` was being compiled into C code.
.. TODO: help
-------------------------------------------
**Step 1**
......@@ -119,16 +125,15 @@ The result is a numpy array. We can also use numpy arrays directly as
inputs:
>>> import numpy
>>> f(numpy.ones((3, 5)), numpy.ones((3, 5)))
array([[ 2., 2., 2., 2., 2.],
[ 2., 2., 2., 2., 2.],
[ 2., 2., 2., 2., 2.]])
>>> f(numpy.array([[1, 2], [3, 4]]), numpy.array([[10, 20], [30, 40]]))
array([[ 11., 22.],
[ 33., 44.]])
It is possible to add scalars to matrices, vectors to matrices,
scalars to vectors, etc. The behavior of these operations is defined
by :term:`broadcasting`.
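Broadcasting can be seen in action with plain numpy, which follows the same rules:

```python
import numpy

m = numpy.ones((2, 3))               # matrix
v = numpy.array([10., 20., 30.])     # vector, broadcast across rows
s = 5.0                              # scalar, broadcast everywhere

r = m + v + s
assert r.shape == (2, 3)
assert (r[0] == numpy.array([16., 26., 36.])).all()
assert (r[0] == r[1]).all()          # every row got the same vector
```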
The following types are readily available:
The following types are available:
* **byte**: bscalar, bvector, bmatrix
* **32-bit integers**: iscalar, ivector, imatrix
......@@ -136,16 +141,15 @@ The following types are readily available:
* **float**: fscalar, fvector, fmatrix
* **double**: dscalar, dvector, dmatrix
The previous list is not exhaustive. A guide to all types compatible
with numpy arrays may be found :ref:`here <predefinedtypes>`.
.. note::
Watch out for the distinction between 32 and 64 bit integers (i
prefix vs the l prefix) and between 32 and 64 bit floats (f prefix
vs the d prefix).
Try to mix and match them and see what happens. The previous list is
not exhaustive. A guide to all types compatible with numpy arrays may
be found :ref:`here <predefinedtypes>`.
**Next:** `More examples`_
......
......@@ -17,39 +17,63 @@ the logistic curve, which is given by:
s(x) = \frac{1}{1 + e^{-x}}
.. figure:: logistic.png
A plot of the logistic function, with x on the x-axis and s(x) on the
y-axis.
You want to compute the function :term:`elementwise` on matrices of
doubles.
doubles, which means that you want to apply this function to each
individual element of the matrix.
Well, what you do is this:
>>> x = T.dmatrix('x')
>>> s = 1 / (1 + T.exp(-x))
>>> logistic = function([x], s)
>>> logistic([[0, 1], [-1, -2]])
array([[ 0.5 , 0.73105858],
[ 0.26894142, 0.11920292]])
Alternatively:
The reason logistic is performed elementwise is because all of its
operations---division, addition, exponentiation, and negation---are
themselves elementwise operations.
>>> s = (T.tanh(x) + 1) / 2
>>> logistic = function([x], s)
It is also the case that:
.. math::
s(x) = \frac{1}{1 + e^{-x}} = \frac{1 + \tanh(x/2)}{2}
We can verify that this alternate form produces the same values:
>>> s2 = (1 + T.tanh(x / 2)) / 2
>>> logistic2 = function([x], s2)
>>> logistic2([[0, 1], [-1, -2]])
array([[ 0.5 , 0.73105858],
[ 0.26894142, 0.11920292]])
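The algebraic identity itself can be checked numerically with plain numpy, independently of Theano:

```python
import numpy

x = numpy.linspace(-6, 6, 101)
s = 1 / (1 + numpy.exp(-x))
s2 = (1 + numpy.tanh(x / 2)) / 2

# the two formulations agree to floating-point precision
assert numpy.allclose(s, s2)
```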
Computing more than one thing at the same time
==============================================
Theano supports functions with multiple outputs. For example, we can
compute the :term:`elementwise` absolute difference between two
matrices ``x`` and ``y`` and the squared difference at the same time:
compute the :term:`elementwise` difference, absolute difference, and
squared difference between two matrices ``x`` and ``y`` at the same time:
>>> x, y = T.dmatrices('xy')
>>> diff = x - y
>>> abs_diff = abs(x - y)
>>> abs_diff = abs(diff)
>>> diff_squared = diff**2
>>> f = function([x, y], [abs_diff, diff_squared])
>>> f = function([x, y], [diff, abs_diff, diff_squared])
When we use the function, it will return the three results (the printing
was reformatted for readability):
>>> f([[1, 1], [1, 1]], [[0, 1], [2, 3]])
[array([[ 1., 0.],
[-1., -2.]]),
array([[ 1., 0.],
[ 1., 2.]]),
array([[ 1., 0.],
[ 1., 4.]])]
......@@ -62,9 +86,12 @@ Computing gradients
===================
Now let's use Theano for a slightly more sophisticated task: create a
function which computes the derivative of some expression ``e`` with
function which computes the derivative of some expression ``y`` with
respect to its parameter ``x``. For instance, we can compute the
gradient of :math:`x^2` with respect to :math:`x`.
gradient of :math:`x^2` with respect to :math:`x`. Note that:
:math:`d(x^2)/dx = 2 \cdot x`.
Here is code to compute this gradient:
>>> x = T.dscalar('x')
>>> y = x**2
......@@ -76,17 +103,26 @@ array(8.0)
array(188.40000000000001)
We can also compute the gradient of complex expressions such as the
logistic function defined above:
logistic function defined above. It turns out that the derivative of the
logistic is: :math:`ds(x)/dx = s(x) \cdot (1 - s(x))`.
.. figure:: dlogistic.png
A plot of the gradient of the logistic function, with x on the x-axis
and :math:`ds(x)/dx` on the y-axis.
>>> x = T.dmatrix('x')
>>> s = 1 / (1 + T.exp(-x))
>>> gs = T.grad(s, x)
>>> glogistic = function([x], gs)
>>> dlogistic = function([x], gs)
>>> dlogistic([[0, 1], [-1, -2]])
array([[ 0.25 , 0.19661193],
[ 0.19661193, 0.10499359]])
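The claimed derivative can be cross-checked against a finite difference, using plain numpy (a sanity check independent of Theano):

```python
import numpy

def s(x):
    return 1 / (1 + numpy.exp(-x))

x = numpy.array([0.0, 1.0, -1.0, -2.0])
analytic = s(x) * (1 - s(x))          # ds/dx = s(x) * (1 - s(x))

eps = 1e-6
numeric = (s(x + eps) - s(x - eps)) / (2 * eps)

assert numpy.allclose(analytic, numeric)
assert numpy.allclose(analytic[0], 0.25)   # matches the value at x = 0
```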
The resulting function computes the gradient of its first argument
with respect to the second. It is pretty much equivalent in semantics
and in computational complexity as what you would obtain through an
`automatic differentiation`_ tool.
with respect to the second. In this way, Theano can be used for
`automatic differentiation`_.
.. note::
......@@ -125,7 +161,7 @@ Making a function with state
It is also possible to make a function with an internal state. For
example, let's say we want to make an accumulator: at the beginning,
the state is initialized to zero, then on each function call the state
the state is initialized to zero. Then, on each function call, the state
is incremented by the function's argument. We'll also make it so that
the increment has a default value of 1.
......@@ -136,12 +172,12 @@ First let's define the accumulator function:
>>> new_state = state + inc
>>> accumulator = function([(inc, 1), ((state, new_state), 0)], new_state)
The first argument is a pair. As we saw in the previous section this
simply means that inc is an input with a default value of 1. The
second argument has a new syntax which creates an internal state or
The first argument is a pair. As we saw in the previous section, this
means that ``inc`` is an input with a default value of 1. The
second argument has syntax that creates an internal state or
closure. The syntax is ``((state_result, new_state_result),
initial_value)``. What this means is that every time ``accumulator``
will be called, the value of the internal ``state`` will be replaced
is called, the value of the internal ``state`` will be replaced
by the value computed as ``new_state``. In this case, the state will
be replaced by the result of incrementing it by ``inc``.
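The behavior being described can be mimicked with an ordinary Python closure over a mutable state (a sketch of the semantics, not of Theano's implementation):

```python
def make_accumulator(initial=0):
    state = [initial]            # internal state, replaced on each call
    def accumulator(inc=1):      # inc has a default value of 1
        state[0] = state[0] + inc
        return state[0]
    return accumulator

acc = make_accumulator()
assert acc(5) == 5
assert acc() == 6                # default increment of 1
assert acc(300) == 306
```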
......@@ -152,7 +188,7 @@ however you like as long as the name does not conflict with the names
of other inputs.
Anyway, let's try it out! The state can be accessed using the square
brackets notation ``[]``. You may access the state either by putting
brackets notation ``[]``. You may access the state either by using
the :ref:`result` representing it or the name of that
:ref:`result`. In our example we can access the state either with the
``state`` object or the string 'state'.
......@@ -174,8 +210,8 @@ array(301.0)
>>> accumulator['state']
array(301.0)
It is of course possible to reset the state. This is done very
naturally by assigning to the state using the square brackets
It is possible to reset the state. This is done
by assigning to the state using the square brackets
notation:
>>> accumulator['state'] = 5
......
set terminal svg font "Bitstream Vera Sans,10" size 300,200
set output "logistic.svg"
set xrange [-6:6]
set xzeroaxis linetype -1
set yzeroaxis linetype -1
set xtics axis nomirror
set ytics axis nomirror 0,0.5,1
set key off
set grid
set border 1
set samples 400
plot 1/(1 + exp(-x)) with line linetype rgbcolor "blue" linewidth 2
set ytics axis nomirror 0,0.25
set output "dlogistic.svg"
plot 1/(1 + exp(-x)) * (1 - 1/(1 + exp(-x))) with line linetype rgbcolor "blue" linewidth 2
......@@ -3,11 +3,11 @@
Using Module
============
Now that we're familiar with the basics, we can see Theano's more
Now that we're familiar with the basics, we introduce Theano's more
advanced interface, Module. This interface allows you to define Theano
"objects" which can have many state variables and many methods sharing
these states. This is what you should use if you aim to use Theano to
define complex systems such as a neural network.
these states. This is what you should use to define complex systems such
as a neural network.
Remake of the "state" example
......@@ -61,7 +61,7 @@ defined in our Module.
The inc variable doesn't need to be declared as a Member because it
will only serve as an input to the method we will define. This is why
it is defined as an :ref:`external` variable. Do note that it is
inconsequential if you do declare it as a Member - it is very unlikely
inconsequential if you do declare it as a Member - it is unlikely
to cause you any problems.
.. note::
......
......@@ -52,7 +52,7 @@ object for each of fn and gn).
>>> m.nearly_zeros = Method([], rv_u + rv_u - 2 * rv_u)
This function will always return a 2x2 matrix of very small numbers, or possibly
This function will always return a 2x2 matrix of small numbers, or possibly
zeros. It illustrates that random variables are not re-drawn every time they
are used; they are only drawn once (per call).
......@@ -84,7 +84,7 @@ seed method of a RandomStreamsInstance.
Of course, a RandomStreamsInstance can contain several RandomState instances and
these will _not_ all be seeded to the same seed_value. They will all be seeded
deterministically and very-probably uniquely as a function of the seed_value.
deterministically and probably uniquely as a function of the seed_value.
Seeding the generator in this way makes it possible to repeat random streams.
......
......@@ -22,7 +22,7 @@ much longer than intended - maybe we should just link to it! --OB
Predefined types
----------------
Theano gives you many premade types to work with. These types are
Predefined types are
located in the ``theano.tensor`` package. The names of the types follow
a recipe:
......@@ -53,9 +53,9 @@ col [m, 1] No Yes
matrix [m, n] No No
====== ====== ========================================== =============================================
So for example if you want a row of 32-bit floats, it is available
under ``theano.tensor.frow`` and if you want a matrix of unsigned
32-bit integers it is available under ``theano.tensor.imatrix``.
So, if you want a row of 32-bit floats, it is available
as ``theano.tensor.frow``. If you want a matrix of 32-bit
integers it is available as ``theano.tensor.imatrix``.
Each of the types described above can be constructed by two methods:
a singular version (e.g., ``dmatrix``) and a plural version
......@@ -108,16 +108,18 @@ complex128 complex 128 (two float64)
.. note::
Even though ``theano.tensor`` does not define any type using
``complex`` dtypes (``complex64`` or ``complex128``), you can define
them explicitly with ``Tensor`` (see example below). However, few
operations are fully supported for complex types: as of version 0.1,
only elementary operations (``+-*/``) have C implementations.
Even though ``theano.tensor`` does not define any type
using ``complex`` dtypes (``complex64`` or ``complex128``),
you can define them explicitly with ``Tensor`` (see example
below). However, few operations are fully supported for complex
types: as of version 0.1, only elementary operations (``+-*/``)
have C implementations. Additionally, complex types have received
little testing.
The broadcastable pattern, on the other hand, indicates both the
number of dimensions and whether a particular dimension has length
1. Here is a handy table mapping the :term:`broadcastable
The broadcastable pattern indicates both the number of dimensions and
whether a particular dimension must have length 1.
Here is a table mapping the :term:`broadcastable
<broadcasting>` pattern to what kind of tensor it encodes:
===================== =================================
......@@ -136,14 +138,18 @@ pattern interpretation
[False, False, False] A MxNxP tensor (pattern of a + b)
===================== =================================
For dimensions in which broadcasting is False, the length of this
dimension can be 1 or more. For dimensions in which broadcasting is True,
the length of this dimension must be 1.
When two tensors have a different number of dimensions, the broadcastable
pattern is *expanded to the left*, by padding with ``True``. So, for example,
pattern is *expanded to the left*, by padding with ``True``. For example,
a vector's pattern, ``[False]``, could be expanded to ``[True, False]``, and
would behave like a row (1xN matrix). In the same way, a matrix (``[False,
False]``) would behave like a 1xNxP tensor (``[True, False, False]``).
So if we wanted to create a type representing a 3D array of unsigned
bytes, we would simply do:
If we wanted to create a type representing a 3D array of unsigned
bytes, we would do:
.. code-block:: python
......@@ -158,10 +164,8 @@ bytes, we would simply do:
Ops
===
There's a lot of operations readily available in the ``theano.tensor``
package. They do not require much explanation according to this
tutorial's author, so he will simply direct you to the :ref:`oplist`
:)
There are a lot of operations available in the ``theano.tensor`` package.
See :ref:`oplist`.
......
......@@ -24,7 +24,7 @@ difficult, we will give our Op a solid C implementation.
Implementing a new Op in Python
===============================
You are required to define two
methods: one to create the :ref:`apply` node every time your Op is
applied to some inputs, declaring the outputs in the process, and
another to operate on the inputs. There is also one optional method
......
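The two-method contract can be illustrated with a toy sketch (plain Python, not the actual Theano base classes; only the method names `make_node` and `perform` follow the text above):

```python
# Toy sketch: a real Op would inherit from Theano's Op class and
# build a proper Apply node; here we only mimic the structure.
class DoubleOp(object):
    def make_node(self, x):
        # Would normally declare an output with the same type as the
        # input; here we just record the input and reserve one output slot.
        return {'op': self, 'inputs': [x], 'outputs': [None]}

    def perform(self, node, inputs, output_storage):
        # Compute the result from the input values and place it in the
        # storage cell provided for the first output.
        (x,) = inputs
        output_storage[0][0] = 2 * x

op = DoubleOp()
node = op.make_node(3)
storage = [[None]]
op.perform(node, [3], storage)
print(storage[0][0])  # 6
```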
......@@ -115,6 +115,19 @@ class Function(object):
"""
pickle_aliased_memory_strategy = 'warn'
"""How to deal with pickling finding aliased storage.
Meaningful settings are: 'ignore', 'warn', 'raise'.

If the value is 'ignore', aliased storage is pickled without complaint.

If the value is 'warn', then a message will be printed to stderr if aliased storage is
detected during pickle.dump.

If the value is 'raise', then an AliasedMemoryError will be raised if aliased storage is
detected during pickle.dump.
"""
def __init__(self, fn, input_storage, output_storage, indices, outputs, defaults, unpack_single, maker):
"""
fn -> a function returned by some linker's make_thunk method
......@@ -334,9 +347,29 @@ def _pickle_Function(f):
else:
defaults.append(ins[0])
del ins[0]
inputs_data = [x.data for x in f.input_storage]
# Detect aliased storage: aliased relationships will not be
# preserved across the pickle operation.
if f.pickle_aliased_memory_strategy != 'ignore':
    all_data = defaults + inputs_data
    for i, d_i in enumerate(all_data):
        for j, d_j in enumerate(all_data):
            if ((i < j) and isinstance(d_i, numpy.ndarray)
                    and isinstance(d_j, numpy.ndarray)
                    and numpy.may_share_memory(d_i, d_j)):
                if f.pickle_aliased_memory_strategy == 'warn':
                    print >> sys.stderr, ('WARNING: '
                            'aliased relationship between Function arguments '
                            'will not be preserved by un-pickling operation')
                else:
                    raise AliasedMemoryError(d_i, d_j)
rval = (_constructor_Function, (f.maker, defaults, inputs_data))
return rval
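The aliasing problem this guards against can be demonstrated with plain NumPy and pickle, independently of Theano: two arrays that share memory before pickling come back as independent copies.

```python
import pickle
import numpy as np

x = np.zeros(3)
y = x[:]                # y is a view: x and y are aliased
x[0] = 1.0
assert y[0] == 1.0      # writes to x are visible through y
assert np.may_share_memory(x, y)

# Pickling serializes each array's data separately, so the aliasing
# relationship is not preserved by the round trip.
x2, y2 = pickle.loads(pickle.dumps((x, y)))
x2[0] = 2.0
assert y2[0] == 1.0     # y2 no longer tracks x2
assert not np.may_share_memory(x2, y2)
```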
class AliasedMemoryError(Exception): pass
def _constructor_Function(maker, defaults, data):
f = maker.create(defaults, trustme = True)
assert len(f.input_storage) == len(data)
......
......@@ -1143,13 +1143,11 @@ class Module(ComponentDict):
value=unpack_member_and_external(value)
if not hasattr(self,"local_attr"):
self.__dict__["local_attr"]={}
self.__dict__["local_attr"][attr] = value
def build(self, mode, memo):
for k,v in self.local_attr.iteritems():
self.__setattr__(k,v)
inst = super(Module, self).build(mode, memo)
if not isinstance(inst, ModuleInstance):
......@@ -1181,41 +1179,44 @@ class Module(ComponentDict):
for name, value in chain(init.iteritems(), kwinit.iteritems()):
inst[name] = value
def make_module_instance(self, *args, **kwargs):
    """
    Module's __setattr__ method hides all members under local_attr. This
    method iterates over those elements and wraps them so they can be used
    in a computation graph. The "wrapped" members are then set as object
    attributes accessible through the dotted notation syntax (<module_name>
    <dot> <member_name>). Submodules are handled recursively.
    """
    # Function to go through member lists and dictionaries recursively,
    # to look for submodules on which make_module_instance needs to be called
    def recurse(v):
        iter = enumerate(v) if isinstance(v, list) else v.iteritems()
        for sk, sv in iter:
            if isinstance(sv, (list, dict)):
                sv = recurse(sv)
            elif isinstance(sv, Module):
                sv = sv.make_module_instance(args, kwargs)
            v[sk] = sv
        return v

    for k, v in self.local_attr.iteritems():
        if isinstance(v, Module):
            v = v.make_module_instance(args, kwargs)
            self[k] = self.__wrapper__(v)
        elif isinstance(v, Method):
            self.__setitem__(k, v)
        else:
            # iterate through lists and dictionaries to wrap submodules
            if isinstance(v, (list, dict)):
                self[k] = self.__wrapper__(recurse(v))
            try:
                self[k] = self.__wrapper__(v)
            except:
                if isinstance(v, Component):
                    raise
                else:
                    self.__dict__[k] = v
    return self
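The recursive descent into member lists and dictionaries can be sketched as a standalone toy in modern Python (same container convention as above; the wrapper is an arbitrary function here):

```python
def wrap_nested(v, wrap):
    # Walk lists and dicts recursively, applying wrap to the leaves,
    # mirroring how make_module_instance descends into submodule containers.
    items = enumerate(v) if isinstance(v, list) else v.items()
    for k, sv in list(items):
        if isinstance(sv, (list, dict)):
            wrap_nested(sv, wrap)
        else:
            v[k] = wrap(sv)
    return v

data = {'a': [1, 2], 'b': {'c': 3}}
print(wrap_nested(data, lambda x: x * 10))  # {'a': [10, 20], 'b': {'c': 30}}
```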
def make(self, *args, **kwargs):
......@@ -1226,7 +1227,7 @@ class Module(ComponentDict):
arguments and the keyword arguments. If 'mode' is in the
keyword arguments it will be passed to build().
"""
self.make_module_instance(args,kwargs)
mode = kwargs.pop('mode', default_mode)
rval = self.make_no_init(mode)
......
......@@ -4,7 +4,9 @@
__docformat__ = "restructuredtext en"
import cPickle, numpy, unittest
from theano.compile.mode import default_mode
from theano.compile.module import *
from theano.compile.function_module import AliasedMemoryError
import theano.tensor as T
import sys
import theano
......@@ -570,7 +572,8 @@ def test_pickle():
M.f = Method([a], a + M.x + M.y)
M.g = Method([a], a * M.x * M.y)
mode = default_mode if default_mode != 'DEBUG_MODE' else 'FAST_RUN'
m = M.make(x=numpy.zeros((4,5)), y=numpy.ones((2,3)), mode=mode)
m_dup = cPickle.loads(cPickle.dumps(m))
......@@ -587,38 +590,56 @@ def test_pickle():
assert m_dup.y is m_dup.g.input_storage[2].data
def test_pickle_aliased_memory():
    M = Module()
    M.x = (T.dmatrix())
    M.y = (T.dmatrix())
    a = T.dmatrix()
    M.f = Method([a], a + M.x + M.y)
    M.g = Method([a], a * M.x * M.y)
    mode = default_mode if default_mode != 'DEBUG_MODE' else 'FAST_RUN'
    m = M.make(x=numpy.zeros((4,5)), y=numpy.ones((2,3)), mode=mode)
    m.y = m.x[:]
    #m's x and y memory is aliased....
    m.x[0,0] = 3.14
    assert m.y[0,0] == 3.14

    # the 'warn' strategy should print a warning to stderr during pickling
    import StringIO
    sio = StringIO.StringIO()
    old_stderr = sys.stderr
    sys.stderr = sio
    m.f.pickle_aliased_memory_strategy = 'warn'
    m.g.pickle_aliased_memory_strategy = 'warn'
    m_dup = cPickle.loads(cPickle.dumps(m))
    sys.stderr = old_stderr
    assert sio.getvalue().startswith('WARNING: aliased relat')

    # the 'raise' strategy should raise AliasedMemoryError during pickling
    try:
        m.f.pickle_aliased_memory_strategy = 'raise'
        m.g.pickle_aliased_memory_strategy = 'raise'
        m_dup = cPickle.loads(cPickle.dumps(m))
    except AliasedMemoryError, e:
        return
    assert 0 #should have failed to pickle
if __name__ == '__main__':
......