Commit 07d66494 authored by Joseph Turian

Updates to documentation

Parent 669e606d
@@ -114,7 +114,7 @@ Setup on OS-X
Note that compiling gcc42 takes a significant time (hours) so it's probably
not the best solution if you're in a rush! In my (Doomie) experience, scipy
failed to compile the first time I tried the command, but the second time
it compiled fine. Same thing with py25-zlib.

- Install some kind of BLAS library (TODO: how?)
@@ -305,9 +305,9 @@ This is done by setting the ``destroy_map`` field of the op. ``destroy_map`` mus

Viewers
-------

Similarly, an Op might not modify the inputs, but return an output which shares state with one or several of its inputs. For example, ``transpose`` can be done efficiently by viewing the same data as the original with modified dimensions and strides. That is fine, but the compiler needs to be told.
This is done by setting the ``view_map`` field of the op. It works like the ``destroy_map`` field: to an output index is associated the list of inputs that it shares state with. For example, ``transpose.view_map == {0: [0]}`` because its first output uses the same data as its first input. ``view_map`` is conservative: if there is any probability that an output will be the view of an input, that input must be in the view list of that output.
Important note: currently, an output can only be the view of one input. This is limiting, as an 'if' or 'switch' op would need to declare its output as a view of both its then and else branches, but for the time being the framework is not powerful enough to handle it. A future version should address this issue.
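The view relationship that ``transpose`` exploits can be seen directly in numpy
(a standalone illustration; Theano's ``transpose`` Op is not involved here):

.. code-block:: python

    import numpy as np

    a = np.arange(6).reshape(2, 3)
    b = a.T                   # a view: same data, swapped dimensions and strides
    b[0, 1] = 99              # writing through the view...
    assert a[1, 0] == 99      # ...modifies the original array
    assert b.base is a        # numpy records that b borrows a's memory

This is exactly the relationship that ``view_map == {0: [0]}`` declares: output
0 shares storage with input 0.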
@@ -316,7 +316,7 @@ Hidden outputs (as a form of op state)

For performance purposes, an ``op`` might want to have a hidden internal state.

Example: if we expect to call the op repeatedly on incrementally bigger inputs, we might want private output storage that's a lot bigger than needed and take incrementally bigger views on it, to save allocation overhead. In order to do this, we can have two outputs: one that we will return normally and will contain the answer and the other that will be the (larger) container. In this case, the advanced note in the 'reusing outputs' section applies. Furthermore, ``__call__`` should be overridden to only return the first output instead of both of them. Here is what the example's ``perform`` and ``__call__`` would look like:

.. code-block:: python
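The Theano code for this example is collapsed in this diff. As a rough
standalone sketch of the buffering idea (plain Python and numpy, with
hypothetical names, not Theano's actual Op API):

.. code-block:: python

    import numpy as np

    class DoubleIt:
        """Toy stand-in for an op with a hidden, larger output buffer."""

        def __init__(self):
            self._buf = np.empty(8)               # the hidden second "output"

        def perform(self, x):
            x = np.asarray(x, dtype=float)
            if x.size > self._buf.size:           # grow geometrically so we
                self._buf = np.empty(2 * x.size)  # reallocate only rarely
            out = self._buf[:x.size]              # incrementally bigger view
            np.multiply(x, 2.0, out=out)          # write the answer in place
            return out

        def __call__(self, x):
            return self.perform(x)                # only the first output is returned

Each call returns a view into the private buffer, so repeated calls on inputs
of similar size perform no allocation at all.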
@@ -27,6 +27,21 @@ However, if the link target is ambiguous, Sphinx will generate errors.

NB the ``:api:`` reference is special magic by Olivier, in
./scripts/docgen.py.
How to add TODO comments in Sphinx documentation
-------------------------------------------------
To include a TODO comment in Sphinx documentation, use an indented block as
follows::

    .. TODO: This is a comment.
    .. You have to put .. at the beginning of every line :(
    .. These lines should all be indented.

It will not appear in the output generated.

.. TODO: Check it out, this won't appear.
.. Nor will this.
How to write API documentation
---------------------------------------
@@ -292,7 +292,7 @@ Complex models can be implemented by subclassing ``Module`` (though that is not

self.l2_coef = M.Member(T.scalar()) # we can add a hyper parameter if we need to
return self.l2_coef * T.sum(self.w * self.w)

Here is how we use the model:

.. code-block:: python
@@ -7,8 +7,13 @@ Sparse matrices

scipy.sparse
------------

Note that you want scipy >= 0.7.0.

.. warning::

    In scipy 0.6, ``scipy.csc_matrix.dot`` has a bug with singleton
    dimensions. There may be more bugs. It also has inconsistent
    implementation of sparse matrices.

We describe the details of the compressed sparse matrix types.

``scipy.sparse.csc_matrix``
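As a quick illustration of the compressed sparse column layout (this assumes
scipy is installed; it is not part of the original page):

.. code-block:: python

    import numpy as np
    from scipy.sparse import csc_matrix

    m = csc_matrix(np.array([[1., 0.],
                             [0., 2.]]))
    # CSC stores three arrays: the nonzero values, their row indices,
    # and the offsets at which each column's entries start.
    print(m.data)     # [ 1.  2.]
    print(m.indices)  # [0 1]
    print(m.indptr)   # [0 1 2]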
@@ -157,7 +157,7 @@ State example
=============

In this example, we'll look at a complete logistic regression model, with
training by gradient descent.

.. code-block:: python
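The Theano listing is collapsed in this diff. As a plain-numpy sketch of the
same idea, logistic regression trained by gradient descent on a toy dataset
(all names here are illustrative, not taken from the tutorial's code):

.. code-block:: python

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    # toy dataset: the AND function, which is linearly separable
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    t = np.array([0., 0., 0., 1.])

    w, b, lr = np.zeros(2), 0.0, 0.5
    for _ in range(2000):                  # simple full-batch gradient descent
        p = sigmoid(X.dot(w) + b)          # predicted probabilities
        err = p - t                        # gradient of cross-entropy w.r.t. logits
        w -= lr * X.T.dot(err) / len(t)
        b -= lr * err.mean()

    pred = sigmoid(X.dot(w) + b) > 0.5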
@@ -31,7 +31,7 @@ not limited to:

* constant folding
* merging of similar subgraphs, to avoid calculating the same values more than once
* arithmetic simplification (``x*y/x -> y``)
* inserting efficient BLAS_ operations
* using inplace operations wherever it is safe to do so.
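The arithmetic simplification above can be illustrated with sympy (used here
only to show the idea of expression rewriting; it is not what Theano's
optimizer does internally, and it assumes sympy is installed):

.. code-block:: python

    import sympy

    x, y = sympy.symbols('x y')
    expr = x * y / x      # the x's cancel during expression construction
    print(expr)           # y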
@@ -47,7 +47,7 @@ Theano is released under a BSD license (:ref:`link <license>`)

Sneak peek
==========

Here is an example of how to use Theano. It doesn't show
off many of Theano's features, but it illustrates concretely what
Theano is.
@@ -110,7 +110,7 @@ There exist another symbolic package in Python, namely sympy_. Theano

is different from sympy in the sense that while Theano allows symbolic
manipulation it puts more emphasis on the evaluation of these expressions
and being able to repeatedly evaluate them on many different inputs. Theano
is also better suited to handling large tensors which have no
assumed structures.

If numpy_ is to be compared to MATLAB_ and sympy_ to Mathematica_,
@@ -43,17 +43,20 @@ The following libraries and software are optional:

Easy install
------------

The following command will install the latest revision of Theano
on your system:

.. TODO: Does this install the latest package version, or the latest Mercurial
.. revision?

.. code-block:: bash

    easy_install http://pylearn.org/hg/theano/archive/tip.tar.gz

.. TODO: make sure this works
.. TODO: change the command to install the latest *stable* version of
.. Theano, when we figure out where to put it.

--------------
@@ -17,7 +17,7 @@ an input provided by the end user (using c_extract) or it might simply

have been calculated by another operation. For each of the outputs,
the variables associated to them will be declared and initialized.
The operation then has to compute what it needs to using the
input variables and place the results in the output variables.
@@ -88,7 +88,7 @@ variables x_name, y_name and output_name are all of the primitive C

Implementing multiplication is as simple as multiplying the two input
doubles and setting the output double to what comes out of it. If you
had more than one output, you would just set the variable(s) for
each output to what they should be.

.. warning::
@@ -154,7 +154,7 @@ it, it's best to publish it somewhere.

""" % dict(name = name)
double.c_init = c_init

This function has to initialize the
double we declared previously to a suitable value. This is useful if
we want to avoid dealing with garbage values, especially if our data
type is a pointer. This is not going to be called for all Results with
@@ -375,7 +375,7 @@ like this:

//c_cleanup for x
}

It's not pretty, but it gives you an idea of how things
work (note that the variable names won't be x, y, z, etc. - they will
get a unique mangled name). The ``fail`` code runs a goto to the
appropriate label in order to run all cleanup that needs to be
@@ -138,11 +138,10 @@ type and it should make an Apply node with an output Result of type

mul.make_node = make_node

The first two lines make sure that both inputs are Results of the
``double`` type that we created in the previous section. We would not
want to multiply two arbitrary types, it would not make much sense
(and we'd be screwed when we implement this in C!)
The last line is the meat of the definition. There we create an Apply
node representing the application of ``mul`` to ``x`` and ``y``. Apply

@@ -178,8 +177,8 @@ understand the role of all three arguments of ``perform``:

  return, per our own definition.

- *output_storage*: This is a list of storage cells. There is one
  storage cell for each output of the Op. A storage cell is
  a one-element list (note: it is forbidden to change the
  length of the list(s) contained in output_storage). In this example,
  output_storage will contain a single storage cell for the
  multiplication's result.
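To make the storage-cell mechanics concrete, here is a minimal stand-alone
imitation of how a ``perform`` method fills ``output_storage`` (plain Python,
not Theano's actual machinery):

.. code-block:: python

    def perform(node, inputs, output_storage):
        """Multiply two doubles; store the result in the first cell."""
        x, y = inputs
        z = output_storage[0]   # the one-element list (a storage cell)
        z[0] = x * y            # replace the cell's contents, never the list

    output_storage = [[None]]   # one cell, holding None by default
    perform(None, (3.0, 4.0), output_storage)
    print(output_storage[0][0])  # 12.0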
@@ -204,18 +203,19 @@ Here, ``z`` is a list of one element. By default, ``z == [None]``.

:ref:`op` documentation.

.. warning::

    The data you put in ``output_storage`` must match the type of the
    symbolic output. This is a situation where the ``node`` argument
    can come in handy. In this example, we gave ``z`` the Theano type
    ``double`` in ``make_node``, which means that a Python ``float``
    must be put there. You should not put, say, an ``int`` in ``z[0]``
    because Theano assumes Ops handle typing properly.
Trying out our new Op
=====================

In the following code, we use our new Op:

>>> x, y = double('x'), double('y')
>>> z = mul(x, y)
>>> f = theano.function([x, y], z)
@@ -224,7 +224,7 @@ Trying out our new Op

>>> f(5.6, 6.7)
37.519999999999996

Note that there is an implicit call to
``double.filter()`` on each argument, so if we give integers as inputs
they are magically casted to the right type. Now, what if we try this?
@@ -237,7 +237,8 @@ Traceback (most recent call last):

AttributeError: 'int' object has no attribute 'type'

Well, ok. We'd like our Op to be a bit more flexible. This can be done
by modifying ``make_node`` to accept Python ``int`` or ``float`` as
``x`` and/or ``y``:

.. code-block:: python
@@ -252,8 +253,8 @@ by fixing ``make_node`` a little bit:

mul.make_node = make_node

Whenever we pass a Python int or float instead of a Result as ``x`` or
``y``, make_node will convert it to :ref:`constant` for us. ``gof.Constant``
is a :ref:`result` we statically know the value of.

>>> x = double('x')
>>> z = mul(x, 2)

@@ -263,18 +264,16 @@ is basically a :ref:`result` we statically know the value of.

>>> f(3.4)
6.7999999999999998
Now the code works the way we want it to.

Final version
=============

The above example is pedagogical. When you define the other basic arithmetic
operations ``add``, ``sub`` and ``div``, the code for ``make_node`` can be
shared between these Ops. Here is a revised implementation of these four
arithmetic operators:
.. code-block:: python

@@ -313,37 +312,27 @@ operators (well, pending revision of this tutorial, I guess):

    div = BinaryDoubleOp(name = 'div',
                         fn = lambda x, y: x / y)
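The full listing is collapsed in this diff, but the pattern it relies on can
be sketched without Theano: one small class parametrized by a name and a
binary function (illustrative code, not the tutorial's actual
``BinaryDoubleOp``):

.. code-block:: python

    class BinaryOp:
        """Toy version of the parametrized-Op idea: one class, many ops."""

        def __init__(self, name, fn):
            self.name = name
            self.fn = fn            # the only thing that differs between ops

        def __call__(self, x, y):   # stands in for make_node + perform
            return self.fn(x, y)

    add = BinaryOp('add', lambda x, y: x + y)
    sub = BinaryOp('sub', lambda x, y: x - y)
    mul = BinaryOp('mul', lambda x, y: x * y)
    div = BinaryOp('div', lambda x, y: x / y)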
Instead of working directly on an instance of Op, we create a subclass of
Op that we can parametrize. All the operations we define are binary. They
all work on two inputs with type ``double``. They all return a single
Result of type ``double``. Therefore, ``make_node`` does the same thing
for all these operations, except for the Op reference ``self`` passed
as first argument to Apply. We define ``perform`` using the function
``fn`` passed in the constructor.

This design is a flexible way to define basic operations without
duplicating code. The same way a Type subclass represents a set of
structurally similar types (see previous section), an Op subclass
represents a set of structurally similar operations: operations that
have the same input/output types, operations that only differ in one
small detail, etc. If you see common patterns in several Ops that you
want to define, it can be a good idea to abstract out what you can.
Remember that an Op is just an object which satisfies the contract
described above on this page and that you should use all the tools at
your disposal to create these objects as efficiently as possible.

**Exercise**: Make a generic DoubleOp, where the number of
arguments can also be given as a parameter.
**Next:** `Implementing double in C`_
@@ -11,7 +11,7 @@ Before tackling this tutorial, it is highly recommended to read the

The advanced tutorial is meant to give the reader a greater
understanding of the building blocks of Theano. Through this tutorial
we are going to define one :ref:`type`, ``double``, and basic
arithmetic :ref:`operations <op>` on that Type. We will first define
them using a Python implementation and then we will add a C
implementation.
@@ -166,7 +166,7 @@ first input (rank 0).

Purely destructive operations
=============================

While some operations will operate inplace on their inputs, some might
simply destroy or corrupt them. For example, an Op could do temporary
calculations right in its inputs. If that is the case, Theano also
needs to be notified. The way to notify Theano is to assume that some
@@ -176,7 +176,7 @@ optimization you wrote. For example, consider the following:

>>> e
[div(mul(add(y, z), x), add(y, z))]

Nothing happened here. The reason is: ``add(y, z) != add(y,
z)``. That is the case for efficiency reasons. To fix this problem we
first need to merge the parts of the graph that represent the same
computation, using the ``merge_optimizer`` defined in
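The point about ``add(y, z) != add(y, z)`` is about object identity, not
values: building the same expression twice yields two distinct graph nodes. A
minimal imitation (not Theano's actual classes):

.. code-block:: python

    class Apply:
        """Graph node; equality is identity, as for Theano's Apply nodes."""
        def __init__(self, op, inputs):
            self.op = op
            self.inputs = inputs

    a = Apply('add', ['y', 'z'])
    b = Apply('add', ['y', 'z'])
    print(a == b)   # False: structurally identical, but distinct objects

Comparing by structure would cost a graph traversal on every comparison; a
merge pass does that work once instead.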
@@ -14,7 +14,7 @@ WRITEME

Don't define new Ops unless you have to
=======================================

It is usually not useful to define Ops that can be easily
implemented using other already existing Ops. For example, instead of
writing a "sum_square_difference" Op, you should probably just write a
simple function:
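The function body is collapsed in this diff. A numpy version of the idea (the
Theano version would build the same expression on symbolic matrices):

.. code-block:: python

    import numpy as np

    def sum_square_difference(a, b):
        # composed entirely from existing elementwise and reduction ops
        return ((a - b) ** 2).sum()

    print(sum_square_difference(np.array([1., 2.]), np.array([0., 0.])))  # 5.0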
@@ -30,6 +30,12 @@ add. Note that from now on, we will use the term :term:`Result` to

mean "symbol" (in other words, ``x``, ``y``, ``z`` are all Result
objects).
If you are following along and typing into an interpreter, you may have
noticed that there was a slight delay in executing the ``function``
instruction. Behind the scenes, ``f`` was being compiled into C code.
.. TODO: help
-------------------------------------------

**Step 1**
@@ -119,16 +125,15 @@ The result is a numpy array. We can also use numpy arrays directly as

inputs:

>>> import numpy
>>> f(numpy.array([[1, 2], [3, 4]]), numpy.array([[10, 20], [30, 40]]))
array([[ 11.,  22.],
       [ 33.,  44.]])
It is possible to add scalars to matrices, vectors to matrices,
scalars to vectors, etc. The behavior of these operations is defined
by :term:`broadcasting`.
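Broadcasting follows the numpy rules; for example, adding a scalar or a
vector to a matrix (shown here with numpy itself):

.. code-block:: python

    import numpy as np

    m = np.array([[1., 2.], [3., 4.]])
    print(m + 10.0)                  # the scalar is broadcast to every element
    print(m + np.array([10., 20.]))  # the vector is broadcast across the rows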
The following types are available:
* **byte**: bscalar, bvector, bmatrix
* **32-bit integers**: iscalar, ivector, imatrix

@@ -136,16 +141,15 @@ The following types are readily available:

* **float**: fscalar, fvector, fmatrix
* **double**: dscalar, dvector, dmatrix
The previous list is not exhaustive. A guide to all types compatible
with numpy arrays may be found :ref:`here <predefinedtypes>`.
.. note::

    Watch out for the distinction between 32 and 64 bit integers (i
    prefix vs the l prefix) and between 32 and 64 bit floats (f prefix
    vs the d prefix).
**Next:** `More examples`_
@@ -17,39 +17,63 @@ the logistic curve, which is given by:

s(x) = \frac{1}{1 + e^{-x}}

.. figure:: logistic.png

   A plot of the logistic function, with x on the x-axis and s(x) on the
   y-axis.
You want to compute the function :term:`elementwise` on matrices of
doubles, which means that you want to apply this function to each
individual element of the matrix.
Well, what you do is this:

>>> x = T.dmatrix('x')
>>> s = 1 / (1 + T.exp(-x))
>>> logistic = function([x], s)
>>> logistic([[0, 1], [-1, -2]])
array([[ 0.5       ,  0.73105858],
       [ 0.26894142,  0.11920292]])
The reason logistic is performed elementwise is because all of its
operations---division, addition, exponentiation, and negation---are
themselves elementwise operations.

It is also the case that:

.. math::

    s(x) = \frac{1}{1 + e^{-x}} = \frac{1 + \tanh(x/2)}{2}

We can verify that this alternate form produces the same values:

>>> s2 = (1 + T.tanh(x / 2)) / 2
>>> logistic2 = function([x], s2)
>>> logistic2([[0, 1], [-1, -2]])
array([[ 0.5       ,  0.73105858],
       [ 0.26894142,  0.11920292]])
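The identity can also be checked numerically with plain numpy, independently
of Theano:

.. code-block:: python

    import numpy as np

    x = np.linspace(-5, 5, 101)
    s1 = 1 / (1 + np.exp(-x))
    s2 = (1 + np.tanh(x / 2)) / 2
    print(np.allclose(s1, s2))   # True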
Computing more than one thing at the same time
==============================================

Theano supports functions with multiple outputs. For example, we can
compute the :term:`elementwise` difference, absolute difference, and
squared difference between two matrices ``x`` and ``y`` at the same time:

>>> x, y = T.dmatrices('xy')
>>> diff = x - y
>>> abs_diff = abs(diff)
>>> diff_squared = diff**2
>>> f = function([x, y], [diff, abs_diff, diff_squared])
When we use the function, it will return the three results (the printing
was reformatted for readability):

>>> f([[1, 1], [1, 1]], [[0, 1], [2, 3]])
[array([[ 1.,  0.],
       [-1., -2.]]),
 array([[ 1.,  0.],
       [ 1.,  2.]]),
 array([[ 1.,  0.],
       [ 1.,  4.]])]
@@ -62,9 +86,12 @@ Computing gradients
===================

Now let's use Theano for a slightly more sophisticated task: create a
function which computes the derivative of some expression ``y`` with
respect to its parameter ``x``. For instance, we can compute the
gradient of :math:`x^2` with respect to :math:`x`. Note that
:math:`d(x^2)/dx = 2 \cdot x`.

Here is code to compute this gradient:

>>> x = T.dscalar('x')
>>> y = x**2
@@ -76,17 +103,26 @@ array(8.0)

array(188.40000000000001)
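As a sanity check, :math:`d(x^2)/dx = 2x` can be verified with a plain-Python
finite difference, without ``T.grad``:

.. code-block:: python

    def f(x):
        return x ** 2

    def numeric_grad(f, x, h=1e-6):
        # central-difference approximation of df/dx
        return (f(x + h) - f(x - h)) / (2 * h)

    print(numeric_grad(f, 4.0))   # approximately 8.0, i.e. 2*x at x = 4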
We can also compute the gradient of complex expressions such as the
logistic function defined above. It turns out that the derivative of the
logistic is: :math:`ds(x)/dx = s(x) \cdot (1 - s(x))`.
.. figure:: dlogistic.png

   A plot of the gradient of the logistic function, with x on the x-axis
   and :math:`ds(x)/dx` on the y-axis.
>>> x = T.dmatrix('x')
>>> s = 1 / (1 + T.exp(-x))
>>> gs = T.grad(s, x)
>>> dlogistic = function([x], gs)
>>> dlogistic([[0, 1], [-1, -2]])
array([[ 0.25 , 0.19661193],
[ 0.19661193, 0.10499359]])
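The closed form above is easy to verify directly with NumPy. This standalone sketch is independent of Theano and reproduces the values in the doctest:

```python
import numpy as np

def logistic(x):
    """The logistic (sigmoid) function s(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([[0, 1], [-1, -2]], dtype=float)
s = logistic(x)
# derivative of the logistic: s(x) * (1 - s(x))
print(s * (1 - s))
```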
The resulting function computes the gradient of its first argument
with respect to the second. In this way, Theano can be used for
`automatic differentiation`_.
.. note::

...@@ -125,7 +161,7 @@ Making a function with state
It is also possible to make a function with an internal state. For
example, let's say we want to make an accumulator: at the beginning,
the state is initialized to zero. Then, on each function call, the state
is incremented by the function's argument. We'll also make it so that
the increment has a default value of 1.
...@@ -136,12 +172,12 @@ First let's define the accumulator function:

>>> new_state = state + inc
>>> accumulator = function([(inc, 1), ((state, new_state), 0)], new_state)
The first argument is a pair. As we saw in the previous section, this
means that ``inc`` is an input with a default value of 1. The
second argument has syntax that creates an internal state or
closure. The syntax is ``((state_result, new_state_result),
initial_value)``. What this means is that every time ``accumulator``
is called, the value of the internal ``state`` will be replaced
by the value computed as ``new_state``. In this case, the state will
be replaced by the result of incrementing it by ``inc``.
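These semantics can be mimicked in plain Python with a closure. This illustrates the behaviour only; in Theano the state lives inside the compiled function, and ``make_accumulator`` is our own illustrative name:

```python
def make_accumulator(initial_value=0):
    # the internal state, initialized to zero by default
    state = {'value': initial_value}

    def accumulator(inc=1):  # inc has a default value of 1
        # on every call, the state is replaced by state + inc
        state['value'] += inc
        return state['value']

    return accumulator

acc = make_accumulator()
print(acc())     # 1
print(acc(300))  # 301
```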
...@@ -152,7 +188,7 @@ however you like as long as the name does not conflict with the names
of other inputs.

Anyway, let's try it out! The state can be accessed using the square
brackets notation ``[]``. You may access the state either by using
the :ref:`result` representing it or the name of that
:ref:`result`. In our example we can access the state either with the
``state`` object or the string 'state'.
...@@ -174,8 +210,8 @@ array(301.0)

>>> accumulator['state']
array(301.0)
It is possible to reset the state. This is done
by assigning to the state using the square brackets
notation:

>>> accumulator['state'] = 5
set terminal svg font "Bitstream Vera Sans,10" size 300,200
set output "logistic.svg"
set xrange [-6:6]
set xzeroaxis linetype -1
set yzeroaxis linetype -1
set xtics axis nomirror
set ytics axis nomirror 0,0.5,1
set key off
set grid
set border 1
set samples 400
plot 1/(1 + exp(-x)) with line linetype rgbcolor "blue" linewidth 2
set ytics axis nomirror 0,0.25
set output "dlogistic.svg"
plot 1/(1 + exp(-x)) * (1 - 1/(1 + exp(-x))) with line linetype rgbcolor "blue" linewidth 2
...@@ -3,11 +3,11 @@

Using Module
============
Now that we're familiar with the basics, we introduce Theano's more
advanced interface, Module. This interface allows you to define Theano
"objects" which can have many state variables and many methods sharing
these states. This is what you should use to define complex systems such
as a neural network.
Remake of the "state" example

...@@ -61,7 +61,7 @@ defined in our Module.
The ``inc`` variable doesn't need to be declared as a Member because it
will only serve as an input to the method we will define. This is why
it is defined as an :ref:`external` variable. Do note that it is
inconsequential if you do declare it as a Member - it is unlikely
to cause you any problems.

.. note::
...@@ -52,7 +52,7 @@ object for each of fn and gn).

>>> m.nearly_zeros = Method([], rv_u + rv_u - 2 * rv_u)
This function will always return a 2x2 matrix of small numbers, or possibly
zeros. It illustrates that random variables are not re-drawn every time they
are used; they are only drawn once (per call).
...@@ -84,7 +84,7 @@ seed method of a RandomStreamsInstance.

Of course, a RandomStreamsInstance can contain several RandomState instances and
these will *not* all be seeded to the same seed_value. They will all be seeded
deterministically and probably uniquely as a function of the seed_value.
Seeding the generator in this way makes it possible to repeat random streams.
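The same reproducibility idea can be seen with NumPy's ``RandomState``, which Theano's random streams are built on. This is a NumPy illustration, not the Theano API:

```python
import numpy as np

# two generators seeded with the same value produce identical streams
a = np.random.RandomState(seed=42)
b = np.random.RandomState(seed=42)

draw_a = a.uniform(size=(2, 2))
draw_b = b.uniform(size=(2, 2))
print((draw_a == draw_b).all())  # True
```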
...@@ -22,7 +22,7 @@ much longer than intended - maybe we should just link to it! --OB

Predefined types
----------------
Predefined types are
located in the ``theano.tensor`` package. The names of the types follow
a recipe:
...@@ -53,9 +53,9 @@ col [m, 1] No Yes
matrix [m, n] No No
====== ====== ========================================== =============================================
So, if you want a row of 32-bit floats, it is available
as ``theano.tensor.frow``. If you want a matrix of 32-bit
integers it is available as ``theano.tensor.imatrix``.
Each of the types described above can be constructed by two methods:
a singular version (e.g., ``dmatrix``) and a plural version
...@@ -108,16 +108,18 @@ complex128 complex 128 (two float64)

.. note::
   Even though ``theano.tensor`` does not define any type
   using ``complex`` dtypes (``complex64`` or ``complex128``),
   you can define them explicitly with ``Tensor`` (see example
   below). However, few operations are fully supported for complex
   types: as of version 0.1, only elementary operations (``+-*/``)
   have C implementations. Additionally, complex types have received
   little testing.
The broadcastable pattern indicates both the number of dimensions and
whether a particular dimension must have length 1.
Here is a table mapping the :term:`broadcastable
<broadcasting>` pattern to what kind of tensor it encodes:
===================== =================================
...@@ -136,14 +138,18 @@ pattern interpretation
[False, False, False] A MxNxP tensor (pattern of a + b)
===================== =================================
For dimensions in which broadcasting is False, the length of this
dimension can be 1 or more. For dimensions in which broadcasting is True,
the length of this dimension must be 1.
When two tensors have a different number of dimensions, the broadcastable
pattern is *expanded to the left*, by padding with ``True``. For example,
a vector's pattern, ``[False]``, could be expanded to ``[True, False]``, and
would behave like a row (1xN matrix). In the same way, a matrix (``[False,
False]``) would behave like a 1xNxP tensor (``[True, False, False]``).
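NumPy follows the same left-padding rule, so the effect of these patterns can be seen directly in a plain NumPy sketch:

```python
import numpy as np

m = np.ones((3, 4))   # pattern [False, False]: a 3x4 matrix
v = np.arange(4.0)    # pattern [False]: a vector of length 4

# v's shape (4,) is padded on the left to (1, 4), so it behaves
# like a row and is added to every row of m
result = m + v
print(result.shape)  # (3, 4)
```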
So if we wanted to create a type representing a 3D array of unsigned If we wanted to create a type representing a 3D array of unsigned
bytes, we would simply do: bytes, we would do:
.. code-block:: python

...@@ -158,10 +164,8 @@ bytes, we would simply do:

Ops
===
There are many operations available in the ``theano.tensor`` package.
See :ref:`oplist`.
...@@ -24,7 +24,7 @@ difficult, we will give our Op a solid C implementation.

Implementing a new Op in Python
===============================
You are required to define two
methods - one to create the :ref:`apply` node every time your Op is
applied to some inputs, declaring the outputs in the process, and
another to operate on the inputs. There is also one optional method