Commit edfd9f24 authored by Eric Larsen, committed by Frederic

Correct Theano's tutorial: one more round of corrections

Parent dba02a39
......@@ -33,12 +33,12 @@ Let's break this down into several steps. The first step is to define
two symbols (*Variables*) representing the quantities that you want
to add. Note that from now on, we will use the term
*Variable* to mean "symbol" (in other words,
``x``, ``y``, ``z`` are all *Variable* objects). The output of the function
``f`` is a ``numpy.ndarray`` with zero dimensions.
*x*, *y*, *z* are all *Variable* objects). The output of the function
*f* is a ``numpy.ndarray`` with zero dimensions.
If you are following along and typing into an interpreter, you may have
noticed that there was a slight delay in executing the ``function``
instruction. Behind the scene, ``f`` was being compiled into C code.
instruction. Behind the scenes, *f* was being compiled into C code.
.. note:
......@@ -51,9 +51,9 @@ instruction. Behind the scene, ``f`` was being compiled into C code.
>>> x = theano.tensor.ivector()
>>> y = -x
``x`` and ``y`` are both Variables, i.e. instances of the
*x* and *y* are both Variables, i.e. instances of the
``theano.gof.graph.Variable`` class. The
type of both ``x`` and ``y`` is ``theano.tensor.ivector``.
type of both *x* and *y* is ``theano.tensor.ivector``.
**Step 1**
......@@ -65,9 +65,9 @@ In Theano, all symbols must be typed. In particular, ``T.dscalar``
is the type we assign to "0-dimensional arrays (`scalar`) of doubles
(`d`)". It is a Theano :ref:`type`.
``dscalar`` is not a class. Therefore, neither ``x`` nor ``y``
``dscalar`` is not a class. Therefore, neither *x* nor *y*
are actually instances of ``dscalar``. They are instances of
:class:`TensorVariable`. ``x`` and ``y``
:class:`TensorVariable`. *x* and *y*
are, however, assigned the theano Type ``dscalar`` in their ``type``
field, as you can see here:
......@@ -91,13 +91,13 @@ could also learn more by looking into :ref:`graphstructures`.
**Step 2**
The second step is to combine ``x`` and ``y`` into their sum ``z``:
The second step is to combine *x* and *y* into their sum *z*:
>>> z = x + y
``z`` is yet another *Variable* which represents the addition of
``x`` and ``y``. You can use the :ref:`pp <libdoc_printing>`
function to pretty-print out the computation associated to ``z``.
*z* is yet another *Variable* which represents the addition of
*x* and *y*. You can use the :ref:`pp <libdoc_printing>`
function to pretty-print out the computation associated to *z*.
>>> print pp(z)
(x + y)
......@@ -105,15 +105,15 @@ function to pretty-print out the computation associated to ``z``.
**Step 3**
The last step is to create a function taking ``x`` and ``y`` as inputs
and giving ``z`` as output:
The last step is to create a function taking *x* and *y* as inputs
and giving *z* as output:
>>> f = function([x, y], z)
The first argument to :func:`function <function.function>` is a list of Variables
that will be provided as inputs to the function. The second argument
is a single Variable *or* a list of Variables. For either case, the second
argument is what we want to see as output when we apply the function. ``f`` may
argument is what we want to see as output when we apply the function. *f* may
then be used like a normal Python function.
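Putting these steps together, a minimal end-to-end sketch of this first example looks like the following (the expected output is shown in a comment):

.. code-block:: python

    import theano
    import theano.tensor as T

    x = T.dscalar('x')
    y = T.dscalar('y')
    z = x + y
    f = theano.function([x, y], z)
    print f(2, 3)     # -> array(5.0)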
......@@ -121,8 +121,8 @@ Adding two Matrices
===================
You might already have guessed how to do this. Indeed, the only change
from the previous example is that you need to instantiate ``x`` and
``y`` using the matrix Types:
from the previous example is that you need to instantiate *x* and
*y* using the matrix Types:
.. If you modify this code, also change :
.. theano/tests/test_tutorial.py:T_adding.test_adding_2
......@@ -153,12 +153,12 @@ by :ref:`broadcasting <libdoc_tensor_broadcastable>`.
The following types are available:
* **byte**: bscalar, bvector, bmatrix, brow, bcol, btensor3, btensor4
* **32-bit integers**: iscalar, ivector, imatrix, irow, icol, itensor3, itensor4
* **64-bit integers**: lscalar, lvector, lmatrix, lrow, lcol, ltensor3, ltensor4
* **float**: fscalar, fvector, fmatrix, frow, fcol, ftensor3, ftensor4
* **double**: dscalar, dvector, dmatrix, drow, dcol, dtensor3, dtensor4
* **complex**: cscalar, cvector, cmatrix, crow, ccol, ctensor3, ctensor4
* **byte**: ``bscalar, bvector, bmatrix, brow, bcol, btensor3, btensor4``
* **32-bit integers**: ``iscalar, ivector, imatrix, irow, icol, itensor3, itensor4``
* **64-bit integers**: ``lscalar, lvector, lmatrix, lrow, lcol, ltensor3, ltensor4``
* **float**: ``fscalar, fvector, fmatrix, frow, fcol, ftensor3, ftensor4``
* **double**: ``dscalar, dvector, dmatrix, drow, dcol, dtensor3, dtensor4``
* **complex**: ``cscalar, cvector, cmatrix, crow, ccol, ctensor3, ctensor4``
The previous list is not exhaustive and a guide to all types compatible
with NumPy arrays may be found here: :ref:`tensor creation<libdoc_tensor_creation>`.
......
......@@ -5,11 +5,11 @@
Understanding Memory Aliasing for Speed and Correctness
=======================================================
The aggressive reuse of memory is one of the ways Theano makes code fast, and
it's important for the correctness and speed of your program that you understand
which buffers Theano might alias to which other.
The aggressive reuse of memory is one of the ways through which Theano makes code fast, and
it is important for the correctness and speed of your program that you understand
how Theano might alias buffers.
This section describes the principles based on which Theano treats memory, and explains
This section describes the principles based on which Theano handles memory, and explains
when you might want to alter the default behaviour of some functions and
methods for faster performance.
......@@ -17,7 +17,7 @@ methods for faster performance.
The Memory Model: Two Spaces
============================
There are some simple principles that guide Theano's treatment of memory. The
There are some simple principles that guide Theano's handling of memory. The
main idea is that there is a pool of memory managed by Theano, and Theano tracks
changes to values in that pool.
......@@ -26,14 +26,14 @@ changes to values in that pool.
* Theano functions only modify buffers that are in Theano's memory space.
* Theano's memory space includes the buffers allocated to store shared
* Theano's memory space includes the buffers allocated to store ``shared``
variables and the temporaries used to evaluate functions.
* Physically, Theano's memory space may be spread across the host and one or more GPU
devices, and in the future may even include objects on a remote machine.
* The memory allocated for a shared variable buffer is unique: it is never
aliased to another shared variable.
* The memory allocated for a ``shared`` variable buffer is unique: it is never
aliased to another ``shared`` variable.
* Theano's managed memory is constant while Theano functions are not running
and Theano's library code is not running.
......@@ -42,11 +42,10 @@ changes to values in that pool.
outputs, and to expect user-space values for inputs.
The distinction between Theano-managed memory and user-managed memory can be
broken down by some Theano functions (e.g. shared, get_value and the
constructors for In and Out) by using
a ``borrow=True`` flag. This can make those methods faster (by avoiding copy
operations) at the expense of risking subtle bugs in the overall program (by
aliasing memory).
broken down by some Theano functions (e.g. ``shared``, ``get_value`` and the
constructors for ``In`` and ``Out``) by using a ``borrow=True`` flag.
This can make those methods faster (by avoiding copy operations) at the expense
of risking subtle bugs in the overall program (by aliasing memory).
The rest of this section is aimed at helping you to understand when it is safe
to use the ``borrow=True`` argument and reap the benefits of faster code.
......@@ -69,9 +68,9 @@ A ``borrow`` argument can be provided to the shared-variable constructor.
s_false = theano.shared(np_array, borrow=False)
s_true = theano.shared(np_array, borrow=True)
By default (``s_default``) and when explicitly setting ``borrow=False``, the
shared variable we construct gets a [deep] copy of ``np_array``. So changes we
subsequently make to ``np_array`` have no effect on our shared variable.
By default (*s_default*) and when explicitly setting ``borrow=False``, the
shared variable we construct gets a [deep] copy of *np_array*. So changes we
subsequently make to *np_array* have no effect on our shared variable.
.. code-block:: python
......@@ -82,31 +81,30 @@ subsequently make to ``np_array`` have no effect on our shared variable.
s_true.get_value() # -> array([2.0, 2.0])
If we are running this with the CPU as the device,
then changes we make to np_array *right away* will show up in
then changes we make to *np_array* *right away* will show up in
``s_true.get_value()``
because NumPy arrays are mutable, and ``s_true`` is using the ``np_array``
because NumPy arrays are mutable, and *s_true* is using the *np_array*
object as its internal buffer.
However, this aliasing of ``np_array`` and ``s_true`` is not guaranteed to occur,
However, this aliasing of *np_array* and *s_true* is not guaranteed to occur,
and may occur only temporarily even if it occurs at all.
It is not guaranteed to occur because if Theano is using a GPU device, then the
borrow flag has no effect.
It may occur only temporarily because
if we call a Theano function that updates the value of ``s_true`` the aliasing
``borrow`` flag has no effect. It may occur only temporarily because
if we call a Theano function that updates the value of *s_true*, the aliasing
relationship *may* or *may not* be broken (the function is allowed to
update the shared variable by modifying its buffer, which will preserve
update the ``shared`` variable by modifying its buffer, which will preserve
the aliasing, or by changing which buffer the variable points to, which
will terminate the aliasing).
*Take home message:*
It is safe practice (and a good idea) to use ``borrow=True`` in a shared
variable constructor when the shared variable stands for a large object (in
It is a safe practice (and a good idea) to use ``borrow=True`` in a ``shared``
variable constructor when the ``shared`` variable stands for a large object (in
terms of memory footprint) and you do not want to create copies of it in
memory.
It is not a reliable technique to use ``borrow=True`` to modify shared variables
by side-effect, because with some devices (e.g. GPU devices) this technique will
It is not a reliable technique to use ``borrow=True`` to modify ``shared`` variables
through side-effect, because with some devices (e.g. GPU devices) this technique will
not work.
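As an illustrative sketch of this advice (the array size is arbitrary and only meant to suggest a large object):

.. code-block:: python

    import numpy
    import theano

    big_array = numpy.zeros((10000, 1000))
    # reuse big_array's buffer instead of deep-copying it (CPU only; see above)
    s = theano.shared(big_array, borrow=True)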
Borrowing when Accessing Value of Shared Variables
......@@ -115,7 +113,8 @@ Borrowing when Accessing Value of Shared Variables
Retrieving
----------
A ``borrow`` argument can also be used to control how a shared variable's value is retrieved.
A ``borrow`` argument can also be used to control how a ``shared`` variable's value is
retrieved.
.. If you modify this code, also change :
......@@ -136,11 +135,11 @@ When ``borrow=True`` is passed to ``get_value``, it means that the return value
But both of these calls might create copies of the internal memory.
The reason that ``borrow=True`` might still make a copy is that the internal
representation of a shared variable might not be what you expect. When you
create a shared variable by passing a NumPy array for example, then ``get_value()``
representation of a ``shared`` variable might not be what you expect. When you
create a ``shared`` variable by passing a NumPy array for example, then ``get_value()``
must return a NumPy array too. That's how Theano can make the GPU use
transparent. But when you are using a GPU (or in the future perhaps a remote machine), then the numpy.ndarray
is not the internal representation of your data.
transparent. But when you are using a GPU (or in the future perhaps a remote machine),
then the numpy.ndarray is not the internal representation of your data.
If you really want Theano to return its internal representation *and never copy it*
then you should use the ``return_internal_type=True`` argument to
``get_value``. It will never cast the internal object (always return in
......@@ -156,28 +155,28 @@ It is possible to use ``borrow=False`` in conjunction with
This is primarily for internal debugging, not for typical use.
For the transparent use of the different types of optimization Theano can make,
there is the policy that get_value() always return by default the same object type
it received when the shared variable was created. So if you created manually data on
the gpu and create a shared variable on the gpu with this data, get_value will always
return gpu data even when return_internal_type=False.
there is the policy that ``get_value()`` always returns by default the same object type
it received when the ``shared`` variable was created. So if you manually created data on
the GPU and created a ``shared`` variable on the GPU with this data, ``get_value`` will always
return gpu data even when ``return_internal_type=False``.
*Take home message:*
It is safe (and sometimes much faster) to use ``get_value(borrow=True)`` when
your code does not modify the return value. *Do not use this to modify a shared
your code does not modify the return value. *Do not use this to modify a ``shared``
variable by side-effect* because it will make your code device-dependent.
Modification of GPU variables by this sort of side-effect is impossible.
Modification of GPU variables through this sort of side-effect is impossible.
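For instance, a sketch of the safe, read-only use of ``get_value(borrow=True)`` described above:

.. code-block:: python

    import numpy
    import theano

    s = theano.shared(numpy.ones(3))
    v = s.get_value(borrow=True)   # may alias the internal buffer
    print v.sum()                  # fine, as long as v is never modified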
Assigning
---------
Shared variables also have a ``set_value`` method that can accept an optional ``borrow=True`` argument.
The semantics are similar to those of creating a new shared variable -
``borrow=False`` is the default and ``borrow=True`` means that Theano *may*
reuse the buffer you provide as the internal storage for the variable.
``Shared`` variables also have a ``set_value`` method that can accept an optional
``borrow=True`` argument. The semantics are similar to those of creating a new
``shared`` variable - ``borrow=False`` is the default and ``borrow=True`` means
that Theano *may* reuse the buffer you provide as the internal storage for the variable.
A standard pattern for manually updating the value of a shared variable is as
follows.
A standard pattern for manually updating the value of a ``shared`` variable is as
follows:
.. code-block:: python
......@@ -185,49 +184,54 @@ follows.
some_inplace_fn(s.get_value(borrow=True)),
borrow=True)
This pattern works regardless of the compute device, and when the compute device
This pattern works regardless of the computing device, and when the latter
makes it possible to expose Theano's internal variables without a copy, then it
goes as fast as an in-place update.
proceeds as fast as an in-place update.
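For concreteness, here is a self-contained sketch of that pattern; ``some_inplace_fn`` stands for any routine that modifies its argument in place and is purely illustrative:

.. code-block:: python

    import numpy
    import theano

    s = theano.shared(numpy.zeros(5))

    def some_inplace_fn(a):
        a += 1.0          # modifies the buffer in place
        return a

    s.set_value(some_inplace_fn(s.get_value(borrow=True)), borrow=True)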
When shared variables are allocated on the GPU, the transfers to and from GPU device memory can
When ``shared`` variables are allocated on the GPU, the transfers to and from the GPU device memory can
be costly. Here are a few tips to ensure fast and efficient use of GPU memory and bandwidth:
* Prior to Theano 0.3.1, set_value did not work in-place on the GPU. This meant that sometimes,
* Prior to Theano 0.3.1, ``set_value`` did not work in-place on the GPU. This meant that, sometimes,
GPU memory for the new value would be allocated before the old memory was released. If you're
running near the limits of GPU memory, this could cause you to run out of GPU memory
unnecessarily. *Solution*: update to a newer version of Theano.
unnecessarily.
* If you are going to swap several chunks of data in and out of a shared variable repeatedly,
*Solution*: update to a newer version of Theano.
* If you are going to swap several chunks of data in and out of a ``shared`` variable repeatedly,
you will want to reuse the memory that you allocated the first time if possible - it is both
faster and more memory efficient.
*Solution*: upgrade to a recent version of Theano (>0.3.0) and consider padding your source
data to make sure that every chunk is the same size.
* It is also worth mentioning that current GPU copying routines support only contiguous memory.
So Theano must make the ``value`` you provide ``c_contiguous`` prior to copying it.
This can require an extra copy of the data on the host. *Solution*: make sure that the value
you assign to a CudaNdarraySharedVariable is *already* ``c_contiguous``.
So Theano must make the value you provide *C-contiguous* prior to copying it.
This can require an extra copy of the data on the host.
*Solution*: make sure that the value
you assign to a CudaNdarraySharedVariable is *already* *C-contiguous*.
(Further remarks on the current implementation of the GPU version of set_value() can be found
(Further information on the current implementation of the GPU version of ``set_value()`` can be found
here: :ref:`libdoc_cuda_var`)
Retrieving and Assigning via the .value Property
Retrieving and Assigning via the ``.value`` Property
-----------------------------------------------------
Shared variables have a ``.value`` property that is connected to ``get_value``
``Shared`` variables have a ``.value`` property that is connected to ``get_value``
and ``set_value``. The borrowing behaviour of the property is controlled by a
boolean configuration variable ``config.shared.value_borrows``, which currently
defaults to ``True``. If that variable is ``True`` then an assignment like ``s.value=v``
defaults to *True*. If that variable is *True* then an assignment like ``s.value=v``
is equivalent to ``s.set_value(v, borrow=True)``, and a retrieval like ``print
s.value`` is equivalent to ``print s.get_value(borrow=True)``. Likewise,
if ``config.shared.value_borrows`` is ``False``, then the borrow parameter that the ``.value`` property
passes to ``set_value`` and ``get_value`` is ``False``.
if ``config.shared.value_borrows`` is *False*, then the borrow parameter that the ``.value`` property
passes to ``set_value`` and ``get_value`` is *False*.
The ``True`` default value of ``config.shared.value_borrows`` means that
The *True* default value of ``config.shared.value_borrows`` means that
aliasing can sometimes happen and sometimes not, which can be confusing.
Be aware that the default value may be changed to ``False`` sometime in the
Be aware that the default value may be changed to *False* sometime in the
not-too-distant future. This change will create more copies, and potentially slow
down code that accesses ``.value`` attributes inside tight loops. To avoid this
potential impact on your code, use the ``.get_value`` and ``.set_value`` methods
......@@ -238,7 +242,7 @@ Borrowing when Constructing Function Objects
============================================
A ``borrow`` argument can also be provided to the ``In`` and ``Out`` objects
that control how ``theano.function`` handles its arguments and return value[s].
that control how ``theano.function`` handles its argument[s] and return value[s].
.. If you modify this code, also change :
.. theano/tests/test_tutorial.py:T_aliasing.test_aliasing_3
......@@ -259,17 +263,17 @@ course of evaluating that function (e.g. ``f``).
Borrowing an output means that Theano will not insist on allocating a fresh
output buffer every time you call the function. It will possibly reuse the same one as
a previous call, and overwrite the old contents. Consequently, it may overwrite
old return values by side effect.
on a previous call, and overwrite the old content. Consequently, it may overwrite
old return values through side-effect.
Those return values may also be overwritten in
the course of evaluating *another compiled function* (for example, the output
may be aliased to a shared variable). So be careful to use a borrowed return
may be aliased to a ``shared`` variable). So be careful to use a borrowed return
value right away before calling any more Theano functions.
The default is of course to *not borrow* internal results.
It is also possible to pass an ``return_internal_type=True`` flag to the ``Out``
It is also possible to pass a ``return_internal_type=True`` flag to the ``Out``
variable which has the same interpretation as the ``return_internal_type`` flag
to the shared variable's ``get_value`` function. Unlike ``get_value()``, the
to the ``shared`` variable's ``get_value`` function. Unlike ``get_value()``, the
combination of ``return_internal_type=True`` and ``borrow=True`` arguments to
``Out()`` is not guaranteed to avoid copying an output value. They are just
hints that give more flexibility to the compilation and optimization of the
......@@ -277,11 +281,11 @@ graph.
*Take home message:*
When an input ``x`` to a function is not needed after the function returns and you
When an input *x* to a function is not needed after the function returns and you
would like to make it available to Theano as additional workspace, then consider
marking it with ``In(x, borrow=True)``. It may make the function faster and
reduce its memory requirement.
When a return value ``y`` is large (in terms of memory footprint), and you only need to read from it once, right
When a return value *y* is large (in terms of memory footprint), and you only need to read from it once, right
away when it's returned, then consider marking it with an ``Out(y,
borrow=True)``.
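A minimal sketch combining both hints (this assumes ``In`` and ``Out`` are reachable as ``theano.In`` and ``theano.Out``; whether a copy is actually avoided remains device-dependent, as explained above):

.. code-block:: python

    import theano
    import theano.tensor as T

    x = T.dmatrix('x')
    y = x ** 2
    # x's buffer may be reused as workspace; y may come back in a reused buffer
    f = theano.function([theano.In(x, borrow=True)], theano.Out(y, borrow=True))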
......@@ -8,11 +8,11 @@ IfElse vs Switch
================
- Both Ops build a condition over symbolic variables.
- ``IfElse`` takes a `boolean` condition and two variables as inputs.
- ``Switch`` takes a `tensor` as condition and two variables as inputs.
- Both ops build a condition over symbolic variables.
- ``IfElse`` takes a *boolean* condition and two variables as inputs.
- ``Switch`` takes a *tensor* as condition and two variables as inputs.
``switch`` is an elementwise operation and is thus more general than ``ifelse``.
- Whereas ``switch`` evaluates both 'output' variables, ``ifelse`` is lazy and only
- Whereas ``switch`` evaluates both *output* variables, ``ifelse`` is lazy and only
evaluates one variable with respect to the condition.
**Example**
......@@ -52,7 +52,7 @@ IfElse vs Switch
f_lazyifelse(val1, val2, big_mat1, big_mat2)
print 'time spent evaluating one value %f sec'%(time.clock()-tic)
In this example, the ``IfElse`` Op spends less time (about half as much) than ``Switch``
In this example, the ``IfElse`` op spends less time (about half as much) than ``Switch``
since it computes only one variable out of the two.
.. code-block:: python
......@@ -64,7 +64,7 @@ since it computes only one variable out of the two.
Unless ``linker='vm'`` or ``linker='cvm'`` are used, ``ifelse`` will compute both
variables and take the same computation time as ``switch``. Although the linker
is not currently set by default to 'cvm', it will be in the near future.
is not currently set by default to ``cvm``, it will be in the near future.
There is no automatic optimization replacing a ``switch`` with a
broadcasted scalar condition by an ``ifelse``, as this is not always faster. See
......
......@@ -6,15 +6,15 @@ Debugging Theano: FAQ and Troubleshooting
=========================================
There are many kinds of bugs that might come up in a computer program.
This page is structured as a FAQ. It should provide recipes to tackle common
problems, and introduce some of the tools that we use to find problems in our
Theano code, and even (it happens) in Theano's internals, such as
This page is structured as a FAQ. It provides recipes to tackle common
problems, and introduces some of the tools that we use to find problems in our
own Theano code, and even (it happens) in Theano's internals, in
:ref:`using_debugmode`.
Isolating the Problem/Testing Theano Compiler
---------------------------------------------
You can run your Theano function in a DebugMode(:ref:`using_debugmode`).
You can run your Theano function in a :ref:`DebugMode<using_debugmode>`.
This tests the Theano optimizations and helps to find where NaN, inf and other problems come from.
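For example, a minimal sketch (assuming the mode can be requested by its string name):

.. code-block:: python

    import theano
    import theano.tensor as T

    x = T.dvector('x')
    # the compiled function now runs DebugMode's extra self-checks on every call
    f = theano.function([x], T.log(x), mode='DebugMode')
    f([0.5, 2.0])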
......@@ -87,9 +87,9 @@ Running the above code generates the following error message:
_dot22(x, <TensorType(float64, matrix)>), [_dot22.0],
_dot22(x, InplaceDimShuffle{1,0}.0), 'Sequence id of Apply node=4')
Needless to say the above is not very informative and does not provide much in
Needless to say, the above is not very informative and does not provide much in
the way of guidance. However, by instrumenting the code ever so slightly, we
can get Theano to give us the exact source of the error.
can get Theano to reveal the exact source of the error.
.. code-block:: python
......@@ -103,12 +103,12 @@ can get Theano to give us the exact source of the error.
# provide Theano with a default test-value
x.tag.test_value = numpy.random.rand(5,10)
In the above, we are tagging the symbolic matrix ``x`` with a special test
In the above, we are tagging the symbolic matrix *x* with a special test
value. This allows Theano to evaluate symbolic expressions on-the-fly (by
calling the ``perform`` method of each Op), as they are being defined. Sources
calling the ``perform`` method of each op), as they are being defined. Sources
of error can thus be identified with much more precision and much earlier in
the compilation pipeline. For example, running the above code yields the
following error message, which properly identifies line 23 as the culprit.
following error message, which properly identifies *line 23* as the culprit.
.. code-block:: bash
......@@ -121,33 +121,33 @@ following error message, which properly identifies line 23 as the culprit.
z[0] = numpy.asarray(numpy.dot(x, y))
ValueError: ('matrices are not aligned', (5, 10), (20, 10))
The compute_test_value mechanism works as follows:
The ``compute_test_value`` mechanism works as follows:
* Theano ``constants`` and ``shared variables`` are used as is. No need to instrument them.
* A Theano ``variable`` (i.e. ``dmatrix``, ``vector``, etc.) should be
* Theano ``constants`` and ``shared`` variables are used as is. No need to instrument them.
* A Theano *variable* (i.e. ``dmatrix``, ``vector``, etc.) should be
given a special test value through the attribute ``tag.test_value``.
* Theano automatically instruments intermediate results. As such, any quantity
derived from ``x`` will be given a `tag.test_value` automatically.
derived from *x* will be given a ``tag.test_value`` automatically.
`compute_test_value` can take the following values:
``compute_test_value`` can take the following values:
* ``off``: Default behavior. This debugging mechanism is inactive.
* ``raise``: Compute test values on the fly. Any variable for which a test
value is required, but not provided by the user, is treated as an error. An
exception is raised accordingly.
* ``warn``: Idem, but a warning is issued instead of an Exception.
* ``warn``: Idem, but a warning is issued instead of an *Exception*.
* ``ignore``: Silently ignore the computation of intermediate test values, if a
variable is missing a test value.
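A small sketch of the mechanism, under the conventions above (shapes are arbitrary):

.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T

    theano.config.compute_test_value = 'raise'

    x = T.dmatrix('x')
    x.tag.test_value = numpy.random.rand(5, 10)
    y = T.dmatrix('y')
    y.tag.test_value = numpy.random.rand(10, 3)

    z = T.dot(x, y)   # evaluated on the test values as soon as it is defined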
.. note::
This feature is currently incompatible with ``Scan`` and also with Ops
This feature is currently incompatible with ``Scan`` and also with ops
which do not implement a ``perform`` method.
How do I Print an Intermediate Value in a Function/Method?
----------------------------------------------------------
Theano provides a 'Print' Op to do this.
Theano provides a 'Print' op to do this.
.. code-block:: python
......@@ -166,8 +166,8 @@ Theano provides a 'Print' Op to do this.
Since Theano runs your program in a topological order, you won't have precise
control over the order in which multiple Print() Ops are evaluted. For a more
precise inspection of what's being computed where, when, and how, see the
control over the order in which multiple ``Print()`` ops are evaluated. For a more
precise inspection of what's being computed where, when, and how, see the discussion
:ref:`faq_wraplinker`.
.. warning::
......@@ -178,8 +178,8 @@ precise inspection of what's being computed where, when, and how, see the
to remove them to know if this is the cause or not.
How do I Print a Graph (before or after compilation)?
----------------------------------------------------------
"How do I Print a Graph?" (before or after compilation)
-------------------------------------------------------
.. TODO: dead links in the next paragraph
......@@ -193,31 +193,33 @@ You can read about them in :ref:`libdoc_printing`.
The Function I Compiled is Too Slow, what's up?
-----------------------------------------------
First, make sure you're running in FAST_RUN mode.
FAST_RUN is the default mode, but make sure by passing ``mode='FAST_RUN'``
"The Function I Compiled is Too Slow, what's up?"
-------------------------------------------------
First, make sure you're running in ``FAST_RUN`` mode. Even though
``FAST_RUN`` is the default mode, insist by passing ``mode='FAST_RUN'``
to ``theano.function`` (or ``theano.make``) or by setting :attr:`config.mode`
to ``FAST_RUN``.
Second, try the theano :ref:`using_profilemode`. This will tell you which
Apply nodes, and which Ops are eating up your CPU cycles.
Second, try the Theano :ref:`using_profilemode`. This will tell you which
``Apply`` nodes, and which ops are eating up your CPU cycles.
Tips:
* use the flags ``floatX=float32`` to use *float32* instead of *float64*
for the theano type matrix(),vector(),...(if you used dmatrix, dvector()
they stay at *float64*).
* Check that in the profile mode that there is no Dot operation and you're
multiplying two matrices of the same type. Dot should be optimized to
dot22 when the inputs are matrices and of the same type. This can happen
when using floatX=float32 and something in the graph makes one of the
inputs *float64*.
* Use the flag ``floatX=float32`` to require type *float32* instead of *float64*;
use the Theano constructors ``matrix()``, ``vector()``, ... instead of ``dmatrix()``, ``dvector()``, ...
since the former use the configured ``floatX`` type while the latter always use *float64*.
* Check in the ``profile`` mode that there is no ``Dot`` op in the post-compilation
graph while you are multiplying two matrices of the same type. ``Dot`` should be
optimized to ``dot22`` when the inputs are matrices and of the same type. This can
still happen when using ``floatX=float32`` if one of the inputs of the graph is
of type *float64*.
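A hedged sketch of both suggestions (the exact profiling output and how it is reported are described in :ref:`using_profilemode`):

.. code-block:: python

    import theano
    import theano.tensor as T

    x = T.fmatrix('x')
    # make sure the fast mode is really used
    f = theano.function([x], T.exp(x).sum(), mode='FAST_RUN')
    # compile a second copy under the profiler to see which ops dominate
    g = theano.function([x], T.exp(x).sum(), mode='ProfileMode')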
.. _faq_wraplinker:
How do I Step through a Compiled Function with the WrapLinker?
--------------------------------------------------------------
"How do I Step through a Compiled Function with the WrapLinker?"
----------------------------------------------------------------
This is not exactly a FAQ, but the doc is here for now...
It's pretty easy to roll-your-own evaluation mode.
......@@ -234,9 +236,9 @@ Check out this one:
wrap_linker = theano.gof.WrapLinkerMany([theano.gof.OpWiseCLinker()], [print_eval])
super(PrintEverythingMode, self).__init__(wrap_linker, optimizer='fast_run')
When you use ``mode=PrintEverythingMode()`` as the mode for Function or Method,
then you should see [potentially a lot of] output. Every Apply node will be printed out,
along with its position in the graph, the arguments to the ``perform`` or
When you use ``mode=PrintEverythingMode()`` as the mode for ``Function`` or ``Method``,
then you should see [potentially a lot of] output. Every ``Apply`` node will be printed out,
along with its position in the graph, the arguments to the functions ``perform`` or
``c_code`` and the output it computed.
>>> x = T.dscalar('x')
......@@ -247,15 +249,15 @@ along with its position in the graph, the arguments to the ``perform`` or
Admittedly, this may be a huge amount of
output to read through if you are using big tensors... but you can choose to
put logic inside of the *print_eval* function that would, for example, only
print something out if a certain kind of Op was used, at a certain program
position, or if a particular value shows up in one of the inputs or outputs.
put logic inside of the *print_eval* function that would, for example, print
something out only if a certain kind of op were used, at a certain program
position, or only if a particular value showed up in one of the inputs or outputs.
Use your imagination :)
.. TODO: documentation for link.WrapLinkerMany
This can be a really powerful debugging tool.
Note the call to *fn* inside the call to *print_eval*; without it, the graph wouldn't get computed at all!
This can be a really powerful debugging tool. Note the call to *fn* inside the call to
*print_eval*; without it, the graph wouldn't get computed at all!
How to Use pdb?
----------------
......@@ -264,7 +266,7 @@ In the majority of cases, you won't be executing from the interactive shell
but from a set of Python scripts. In such cases, the use of the Python
debugger can come in handy, especially as your models become more complex.
Intermediate results don't necessarily have a clear name and you can get
exceptions which are hard to decipher, due to the "compiled" nature of
exceptions which are hard to decipher, due to the "compiled" nature of the
functions.
Consider this example script ("ex.py"):
......@@ -287,7 +289,7 @@ Consider this example script ("ex.py"):
f(mat1, mat2)
This is actually so simple the debugging could be done easily, but it's for
illustrative purposes. As the matrices can't be element-wise multiplied
illustrative purposes. As the matrices can't be multiplied element-wise
(unsuitable shapes), we get the following exception:
.. code-block:: text
......@@ -299,11 +301,11 @@ illustrative purposes. As the matrices can't be element-wise multiplied
File "/u/username/Theano/theano/gof/link.py", line 267, in streamline_default_f
File "/u/username/Theano/theano/gof/cc.py", line 1049, in execute ValueError: ('Input dimension mis-match. (input[0].shape[0] = 3, input[1].shape[0] = 5)', Elemwise{mul,no_inplace}(a, b), Elemwise{mul,no_inplace}(a, b))
The call stack contains a few useful informations to trace back the source
The call stack contains some useful information to trace back the source
of the error. There's the script where the compiled function was called --
but if you're using (improperly parameterized) prebuilt modules, the error
might originate from ops in these modules, not this script. The last line
tells us about the Op that caused the exception. In this case it's a "mul"
tells us about the op that caused the exception. In this case it's a "mul"
involving variables with names "a" and "b". But suppose we instead had an
intermediate result to which we hadn't given a name.
......
......@@ -74,7 +74,7 @@ Computing More than one Thing at the Same Time
Theano supports functions with multiple outputs. For example, we can
compute the :ref:`elementwise <libdoc_tensor_elementwise>` difference, absolute difference, and
squared difference between two matrices ``a`` and ``b`` at the same time:
squared difference between two matrices *a* and *b* at the same time:
.. If you modify this code, also change :
.. theano/tests/test_tutorial.py:T_examples.test_examples_3
......@@ -123,7 +123,7 @@ array(35.0)
This makes use of the :ref:`Param <function_inputs>` class which allows
you to specify properties of your function's parameters with greater detail. Here we
give a default value of 1 for ``y`` by creating a ``Param`` instance with
give a default value of 1 for *y* by creating a ``Param`` instance with
its ``default`` field set to 1.
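A compact sketch of this mechanism (the numeric values simply echo the ones used above):

.. code-block:: python

    import theano
    import theano.tensor as T

    x, y = T.dscalars('x', 'y')
    z = x + y
    f = theano.function([x, theano.Param(y, default=1)], z)
    f(33)      # -> array(34.0)
    f(33, 2)   # -> array(35.0)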
Inputs with default values must follow inputs without default
......@@ -149,7 +149,7 @@ array(34.0)
array(33.0)
.. note::
``Param`` does not know the name of the local variables ``y`` and ``w``
``Param`` does not know the name of the local variables *y* and *w*
that are passed as arguments. The symbolic variable objects have name
attributes (set by ``dscalars`` in the example above) and *these* are the
names of the keyword parameters in the functions that we build. This is
......@@ -171,7 +171,7 @@ example, let's say we want to make an accumulator: at the beginning,
the state is initialized to zero. Then, on each function call, the state
is incremented by the function's argument.
First let's define the ``accumulator`` function. It adds its argument to the
First let's define the *accumulator* function. It adds its argument to the
internal state, and returns the old state value.
.. If you modify this code, also change :
......@@ -187,13 +187,13 @@ so-called :ref:`shared variables<libdoc_compile_shared>`.
These are hybrid symbolic and non-symbolic variables whose value may be shared
between multiple functions. Shared variables can be used in symbolic expressions just like
the objects returned by ``dmatrices(...)`` but they also have an internal
value, that defines the value taken by this symbolic variable in *all* the
value that defines the value taken by this symbolic variable in *all* the
functions that use it. It is called a *shared* variable because its value is
shared between many functions. The value can be accessed and modified by the
``.get_value()`` and ``.set_value()`` methods. We will come back to this soon.
The other new thing in this code is the ``updates`` parameter of function.
The updates is a list of pairs of the form (shared-variable, new expression).
The other new thing in this code is the ``updates`` parameter of ``function``.
``updates`` must be supplied with a list of pairs of the form (shared-variable, new expression).
It can also be a dictionary whose keys are shared-variables and values are
the new expressions. Either way, it means "whenever this function runs, it
will replace the ``.value`` of each shared variable with the result of the
......@@ -241,9 +241,9 @@ achieve a similar result by returning the new expressions, and working with
them in NumPy as usual. The updates mechanism can be a syntactic convenience,
but it is mainly there for efficiency. Updates to shared variables can
sometimes be done more quickly using in-place algorithms (e.g. low-rank matrix
updates). Also, theano has more control over where and how shared variables are
updates). Also, Theano has more control over where and how shared variables are
allocated, which is one of the important elements of getting good performance
on the GPU.
on the :ref:`GPU<using_gpu>`.
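For reference, a minimal accumulator written with the ``updates`` mechanism just described (a sketch consistent with the description above):

.. code-block:: python

    import theano
    import theano.tensor as T

    state = theano.shared(0)
    inc = T.iscalar('inc')
    accumulator = theano.function([inc], state, updates=[(state, state + inc)])

    accumulator(1)       # returns the old state, 0
    state.get_value()    # -> 1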
It may happen that you expressed some formula using a shared variable, but
you do *not* want to use its value. In this case, you can use the
......@@ -326,16 +326,16 @@ so we get different random numbers every time.
>>> f_val1 = f() #different numbers from f_val0
When we add the extra argument ``no_default_updates=True`` to
``function`` (as in ``g``), then the random number generator state is
``function`` (as in *g*), then the random number generator state is
not affected by calling the returned function. So, for example, calling
``g`` multiple times will return the same numbers.
*g* multiple times will return the same numbers.
>>> g_val0 = g() # different numbers from f_val0 and f_val1
>>> g_val1 = g() # same numbers as g_val0!
An important remark is that a random variable is drawn at most once during any
single function execution. So the ``nearly_zeros`` function is guaranteed to
return approximately 0 (except for rounding error) even though the ``rv_u``
single function execution. So the *nearly_zeros* function is guaranteed to
return approximately 0 (except for rounding error) even though the *rv_u*
random variable appears three times in the output expression.
>>> nearly_zeros = function([], rv_u + rv_u - 2 * rv_u)
......@@ -363,8 +363,8 @@ Sharing Streams Between Functions
---------------------------------
As usual for shared variables, the random number generators used for random
variables are common between functions. So our ``nearly_zeros`` function will
update the state of the generators used in function ``f`` above.
variables are common between functions. So our *nearly_zeros* function will
update the state of the generators used in function *f* above.
For example:
......@@ -416,8 +416,9 @@ The preceding elements are featured in this more realistic example. It will be
prediction = p_1 > 0.5 # The prediction thresholded
xent = -y*T.log(p_1) - (1-y)*T.log(1-p_1) # Cross-entropy loss function
cost = xent.mean() + 0.01*(w**2).sum() # The cost to minimize
gw,gb = T.grad(cost, [w,b]) # Compute the gradient of the cost:
# we shall return to this
gw,gb = T.grad(cost, [w,b]) # Compute the gradient of the cost
# (we shall return to this in a
# following section of this tutorial)
# Compile
train = theano.function(
......
......@@ -8,12 +8,12 @@ Extending Theano
Theano Graphs
-------------
- Theano works with symbolic graphs
- Those graphs are bi-partite graphs (graph with 2 types of nodes)
- The 2 types of nodes are Apply and Variable nodes
- Each Apply node has a link to the Op that it executes
- Theano works with symbolic graphs.
- Those graphs are bi-partite graphs (graph with 2 types of nodes).
- The 2 types of nodes are Apply and Variable nodes.
- Each Apply node has a link to the op that it executes.
Inputs and Outputs are lists of Theano variables
Inputs and Outputs are lists of Theano variables.
.. image:: ../hpcs2011_tutorial/pics/apply_node.png
:width: 500 px
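A small sketch of how these pieces can be inspected from Python (the names are illustrative):

.. code-block:: python

    import theano.tensor as T

    x = T.dmatrix('x')
    y = x * 2

    apply_node = y.owner      # the Apply node that produced y
    apply_node.op             # the op it executes (an elementwise multiplication)
    apply_node.inputs         # the list of input Variables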
......@@ -93,12 +93,12 @@ The first one is :func:`make_node`. The second one
would describe the computations that are required to be done
at run time. Currently there are 2 different possibilities:
implement the :func:`perform`
and/or :func:`c_code <Op.c_code>` (and other related :ref:`c methods
<cop>`), or the :func:`make_thunk` method. The ``perform`` allows
to easily wrap an existing Python function into Theano. The ``c_code``
and/or :func:`c_code <Op.c_code>` methods (and other related :ref:`c methods
<cop>`), or the :func:`make_thunk` method. ``perform`` allows you
to easily wrap an existing Python function into Theano. ``c_code``
and related methods allow the op to generate C code that will be
compiled and linked by Theano. On the other hand, the ``make_thunk``
method will be called only once during compilation and should generate
compiled and linked by Theano. On the other hand, ``make_thunk``
will be called only once during compilation and should generate
a ``thunk``: a standalone function that when called will do the wanted computations.
This is useful if you want to generate code and compile it yourself. For
example, this allows you to use PyCUDA to compile GPU code.
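As a stripped-down sketch of the ``perform`` route (omitting ``__eq__``, ``__hash__`` and other niceties a real op should define), an op that doubles its input might look like this:

.. code-block:: python

    import theano
    import theano.tensor as T

    class DoubleOp(theano.Op):
        """Toy op returning twice its input, implemented with perform only."""

        def make_node(self, x):
            x = T.as_tensor_variable(x)
            return theano.Apply(self, [x], [x.type()])

        def perform(self, node, inputs, output_storage):
            x, = inputs
            output_storage[0][0] = x * 2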
......@@ -117,7 +117,7 @@ The :func:`grad` method is required if you want to differentiate some cost whose
includes your op.
The :func:`__str__` method is useful in order to provide a more meaningful
string representation of your Op.
string representation of your op.
The :func:`R_op` method is needed if you want ``theano.tensor.Rop`` to
work with your op.
......@@ -185,9 +185,9 @@ in a file and execute it with the ``nosetests`` program.
**Basic Tests**
Basic tests are done by you just by using the Op and checking that it
Basic tests are done by you just by using the op and checking that it
returns the right answer. If you detect an error, you must raise an
exception. You can use the `assert` keyword to automatically raise an
*exception*. You can use the ``assert`` keyword to automatically raise an
``AssertionError``.
.. code-block:: python
......@@ -211,10 +211,10 @@ exception. You can use the `assert` keyword to automatically raise an
**Testing the infer_shape**
When a class inherits from the ``InferShapeTester`` class, it gets the
``self._compile_and_check`` method that tests the Op ``infer_shape``
method. It tests that the Op gets optimized out of the graph if only
``self._compile_and_check`` method that tests the op's ``infer_shape``
method. It tests that the op gets optimized out of the graph if only
the shape of the output is needed and not the output
itself. Additionally, it checks that such an optimized graph computes
itself. Additionally, it checks that the optimized graph computes
the correct shape, by comparing it to the actual shape of the computed
output.
......@@ -222,8 +222,8 @@ output.
parameters the lists of input and output Theano variables, as would be
provided to ``theano.function``, and a list of real values to pass to the
compiled function (don't use shapes that are symmetric, e.g. (3, 3),
as they can easily to hide errors). It also takes the Op class to
verify that no Ops of that type appear in the shape-optimized graph.
as they can easily hide errors). It also takes the op class as a parameter to
verify that no instance of it appears in the shape-optimized graph.
If there is an error, the function raises an exception. If you want to
see it fail, you can implement an incorrect ``infer_shape``.
......@@ -248,7 +248,7 @@ see it fail, you can implement an incorrect ``infer_shape``.
**Testing the gradient**
The function :ref:`verify_grad <validating_grad>`
verifies the gradient of an Op or Theano graph. It compares the
verifies the gradient of an op or Theano graph. It compares the
analytic (symbolically computed) gradient and the numeric
gradient (computed through the Finite Difference Method).
......@@ -266,13 +266,12 @@ the multiplication by 2).
.. TODO: repair defective links in the following paragraph
The class :class:`RopLop_checker`, give the functions
:func:`RopLop_checker.check_mat_rop_lop`,
:func:`RopLop_checker.check_rop_lop` and
:func:`RopLop_checker.check_nondiff_rop` that allow to test the
implementation of the Rop method of one Op.
The class :class:`RopLop_checker` defines the functions
:func:`RopLop_checker.check_mat_rop_lop`, :func:`RopLop_checker.check_rop_lop` and
:func:`RopLop_checker.check_nondiff_rop`. These allow one to test the
implementation of the Rop method of a particular op.
To verify the Rop method of the DoubleOp, you can use this:
For instance, to verify the Rop method of the DoubleOp, you can use this:
.. code-block:: python
......@@ -290,7 +289,7 @@ Running your tests
You can run ``nosetests`` in the Theano folder to run all of Theano's
tests, including yours if they are somewhere in the directory
structure. You can run ``nosetests test_file.py`` to run only the
structure. For instance, you can run ``nosetests test_file.py`` to run only the
tests in that file. You can run ``nosetests
test_file.py:test_DoubleRop`` to run only the tests inside that test
class. You can run ``nosetests
......@@ -298,7 +297,7 @@ test_file.py:test_DoubleRop.test_double_op`` to run only one
particular test. More `nosetests
<http://readthedocs.org/docs/nose/en/latest/>`_ documentation.
You can also add this at the end of the test file:
You can also add this block at the end of the test file and run the file:
.. code-block:: python
......@@ -311,14 +310,13 @@ You can also add this at the end of the test file:
**Testing GPU Ops**
Ops that execute on the GPU should inherit from the
``theano.sandbox.cuda.GpuOp`` and not ``theano.Op``. This allows Theano
to make the distinction between both. Currently, we use this to test
if the NVIDIA driver works correctly with our sum reduction code on the
Ops to be executed on the GPU should inherit from the ``theano.sandbox.cuda.GpuOp``
and not ``theano.Op``. This allows Theano to distinguish them. Currently, we
use this to test if the NVIDIA driver works correctly with our sum reduction code on the
GPU.
A more extensive discussion than this section's may be found in the advanced
A more extensive discussion of this section's topic may be found in the advanced
tutorial :ref:`Extending Theano<extending>`
-------------------------------------------
......
......@@ -8,19 +8,17 @@ Frequently Asked Questions
TypeError: object of type 'TensorVariable' has no len()
-------------------------------------------------------
If you receive this error:
If you receive the following error, it is because the Python function *__len__* cannot
be implemented on Theano variables:
.. code-block:: python
TypeError: object of type 'TensorVariable' has no len()
We can't implement the __len__ function on Theano Variables. This is
because Python requires that this function returns an integer, but we
can't do this as we are working with symbolic variables. You can use
`var.shape[0]` as a workaround.
Python requires that *__len__* return an integer, yet this cannot be done for Theano's symbolic variables. However, `var.shape[0]` can be used as a workaround.
Also we can't change the above error message into a more explicit one
because of some other Python internal behavior that can't be modified.
This error message cannot be made more explicit because the relevant aspects of Python's
internals cannot be modified.
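A minimal sketch of the workaround:

.. code-block:: python

    import theano.tensor as T

    v = T.dvector('v')
    # len(v) would raise the TypeError above; use the symbolic shape instead
    n = v.shape[0]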
Faster gcc optimization
......
......@@ -9,13 +9,13 @@ PyCUDA
Currently, PyCUDA and Theano have different objects to store GPU
data. The two implementations do not support the same set of features.
Theano's implementation is called CudaNdarray and supports
Theano's implementation is called *CudaNdarray* and supports
*strides*. It also only supports the *float32* dtype. PyCUDA's implementation
is called GPUArray and doesn't support *strides*. However, it can deal with
is called *GPUArray* and doesn't support *strides*. However, it can deal with
all NumPy and CUDA dtypes.
We are currently working on having the same base object that will
mimic Numpy. Until this is ready, here is some information on how to
We are currently working on a common base object for both implementations that will
also mimic NumPy. Until this is ready, here is some information on how to
use both objects in the same script.
Transfer
......@@ -24,8 +24,8 @@ Transfer
You can use the ``theano.misc.pycuda_utils`` module to convert GPUArray to and
from CudaNdarray. The functions ``to_cudandarray(x, copyif=False)`` and
``to_gpuarray(x)`` return a new object that occupies the same memory space
as the original. Otherwise it raises a ValueError. Because GPUArrays don't
support *strides*, if the CudaNdarray is strided, we could copy it to
as the original. Otherwise it raises a *ValueError*. Because GPUArrays don't
support strides, if the CudaNdarray is strided, we could copy it to
have a non-strided copy. The resulting GPUArray won't share the same
memory region. If you want this behavior, set ``copyif=True`` in
``to_gpuarray``.
......@@ -122,13 +122,15 @@ CUDAMat
There are functions for conversion between CUDAMat objects and Theano's CudaNdArray objects.
They obey the same principles as Theano's PyCUDA functions and can be found in
theano.misc.cudamat_utils.py
``theano.misc.cudamat_utils.py``.
WARNING: There is a strange problem associated with stride/shape with those converters.
In order to work, the test needs a transpose and reshape...
.. TODO: this statement is unclear:
WARNING: There is a peculiar problem associated with stride/shape with those converters.
In order to work, the test needs a *transpose* and *reshape*...
Gnumpy
======
There are conversion functions between Gnumpy ``garray`` objects and Theano CudaNdArray objects.
They are also similar to Theano's PyCUDA functions and can be found in theano.misc.gnumpy_utils.py.
There are conversion functions between Gnumpy *garray* objects and Theano CudaNdArray objects.
They are also similar to Theano's PyCUDA functions and can be found in ``theano.misc.gnumpy_utils.py``.
......@@ -10,12 +10,14 @@ Computing Gradients
===================
Now let's use Theano for a slightly more sophisticated task: create a
function which computes the derivative of some expression ``y`` with
respect to its parameter ``x``. To do this we will use the macro ``T.grad``.
function which computes the derivative of some expression *y* with
respect to its parameter *x*. To do this we will use the macro ``T.grad``.
For instance, we can compute the
gradient of :math:`x^2` with respect to :math:`x`. Note that:
:math:`d(x^2)/dx = 2 \cdot x`.
.. TODO: fix the vertical positioning of the expressions in the preceding paragraph
Here is the code to compute this gradient:
.. If you modify this code, also change :
......@@ -36,7 +38,7 @@ array(188.40000000000001)
In this example, we can see from ``pp(gy)`` that we are computing
the correct symbolic gradient.
``fill((x ** 2), 1.0)`` means to make a matrix of the same shape as
``x ** 2`` and fill it with 1.0.
``x ** 2`` and fill it with 1.0.
.. note::
The optimizer simplifies the symbolic gradient expression. You can see
......@@ -56,7 +58,7 @@ logistic is: :math:`ds(x)/dx = s(x) \cdot (1 - s(x))`.
.. figure:: dlogistic.png
A plot of the gradient of the logistic function, with x on the x-axis
A plot of the gradient of the logistic function, with *x* on the x-axis
and :math:`ds(x)/dx` on the y-axis.
......@@ -71,17 +73,17 @@ logistic is: :math:`ds(x)/dx = s(x) \cdot (1 - s(x))`.
array([[ 0.25 , 0.19661193],
[ 0.19661193, 0.10499359]])
In general, for any **scalar** expression ``s``, ``T.grad(s, w)`` provides
In general, for any **scalar** expression *s*, ``T.grad(s, w)`` provides
the Theano expression for computing :math:`\frac{\partial s}{\partial w}`. In
this way Theano can be used for doing **efficient** symbolic differentiation
(as the expression return by ``T.grad`` will be optimized during compilation), even for
(as the expression returned by ``T.grad`` will be optimized during compilation), even for
functions with many inputs (see `automatic differentiation <http://en.wikipedia.org/wiki/Automatic_differentiation>`_ for a description
of symbolic differentiation).
.. note::
The second argument of ``T.grad`` can be a list, in which case the
output is also a list. The order in both lists is important, element
output is also a list. The order in both lists is important: element
*i* of the output list is the gradient of the first argument of
``T.grad`` with respect to the *i*-th element of the list given as second argument.
The first argument of ``T.grad`` has to be a scalar (a tensor
......@@ -95,14 +97,17 @@ of symbolic differentiation).
Computing the Jacobian
======================
Theano implements :func:`theano.gradient.jacobian` macro that does all
what is needed to compute the Jacobian. The following text explains how
In Theano's parlance, the term *Jacobian* designates the tensor comprising the
first partial derivatives of the output of a function with respect to its inputs.
(This is a generalization of the so-called Jacobian matrix in mathematics.)
Theano implements the :func:`theano.gradient.jacobian` macro that does all
that is needed to compute the Jacobian. The following text explains how
to do it manually.
In order to manually compute the Jacobian of some function ``y`` with
respect to some parameter ``x`` we need to use ``scan``. What we
do is to loop over the entries in ``y`` and compute the gradient of
``y[i]`` with respect to ``x``.
In order to manually compute the Jacobian of some function *y* with
respect to some parameter *x* we need to use ``scan``. What we
do is to loop over the entries in *y* and compute the gradient of
*y[i]* with respect to *x*.
.. note::
......@@ -110,7 +115,7 @@ do is to loop over the entries in ``y`` and compute the gradient of
manner all kinds of recurrent equations. While creating
symbolic loops (and optimizing them for performance) is a hard task,
effort is being made to improve the performance of ``scan``. We
shall return to ``scan`` in a moment.
shall return to ``scan`` later in this tutorial.
>>> x = T.dvector('x')
>>> y = x**2
......@@ -120,31 +125,33 @@ do is to loop over the entries in ``y`` and compute the gradient of
array([[ 8., 0.],
[ 0., 8.]])
What we do in this code is to generate a sequence of ints from ``0`` to
What we do in this code is to generate a sequence of *ints* from *0* to
``y.shape[0]`` using ``T.arange``. Then we loop through this sequence, and
at each step, we compute the gradient of element ``y[[i]`` with respect to
``x``. ``scan`` automatically concatenates all these rows, generating a
at each step, we compute the gradient of element *y[i]* with respect to
*x*. ``scan`` automatically concatenates all these rows, generating a
matrix which corresponds to the Jacobian.
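For completeness, a sketch of the ``scan``-based Jacobian just described (it reproduces the result shown above):

.. code-block:: python

    import theano
    import theano.tensor as T

    x = T.dvector('x')
    y = x ** 2
    J, updates = theano.scan(lambda i, y, x: T.grad(y[i], x),
                             sequences=T.arange(y.shape[0]),
                             non_sequences=[y, x])
    f = theano.function([x], J, updates=updates)
    f([4, 4])   # -> [[ 8.  0.]
                #     [ 0.  8.]]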
.. note::
There are some pitfalls to be aware of regarding ``T.grad``. One of them is that you
cannot re-write the above expression of the jacobian as
cannot re-write the above expression of the Jacobian as
``theano.scan(lambda y_i,x: T.grad(y_i,x), sequences=y,
non_sequences=x)``, even though from the documentation of scan this
seems possible. The reason is that ``y_i`` will not be a function of
``x`` anymore, while ``y[i]`` still is.
seems possible. The reason is that *y_i* will not be a function of
*x* anymore, while *y[i]* still is.
Computing the Hessian
=====================
Theano implements :func:`theano.gradient.hessian` macro that does all
In Theano, the term *Hessian* has the usual mathematical meaning: it is the
matrix comprising the second-order partial derivatives of a function with scalar
output and vector input. Theano implements the :func:`theano.gradient.hessian` macro that does all
that is needed to compute the Hessian. The following text explains how
to do it manually.
You can compute the Hessian manually similarly to the Jacobian. The only
difference is that now, instead of computing the Jacobian of some expression
*y*, we compute the Jacobian of ``T.grad(cost,x)``, where *cost* is some
scalar.
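A minimal sketch of this recipe (again assuming the standard ``theano.scan`` interface; the names are illustrative):

.. code-block:: python

    import theano
    import theano.tensor as T

    x = T.dvector('x')
    cost = T.sum(x ** 2)
    gy = T.grad(cost, x)
    # The Hessian is the Jacobian of the gradient: loop over the entries of gy.
    H, updates = theano.scan(lambda i, gy, x: T.grad(gy[i], x),
                             sequences=T.arange(gy.shape[0]),
                             non_sequences=[gy, x])
    f = theano.function([x], H, updates=updates)
    print f([4, 4])
    # expected:
    # [[ 2.  0.]
    #  [ 0.  2.]]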
......@@ -181,12 +188,12 @@ R-operator
The *R operator* is built to evaluate the product between a Jacobian and a
vector, namely :math:`\frac{\partial f(x)}{\partial x} v`. The formulation
can be extended even to the case where *x* is a matrix, or a tensor in general, in
which case the Jacobian also becomes a tensor and the product becomes some kind
of tensor product. Because in practice we end up needing to compute such
expressions in terms of weight matrices, Theano supports this more generic
form of the operation. In order to evaluate the *R-operation* of
expression *y*, with respect to *x*, multiplying the Jacobian with *v*
you need to do something similar to this:
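The actual snippet is elided in this diff; a minimal sketch using ``T.Rop``, with illustrative variable names:

.. code-block:: python

    import theano
    import theano.tensor as T

    W = T.dmatrix('W')
    V = T.dmatrix('V')
    x = T.dvector('x')
    y = T.dot(x, W)
    # R-operator: the Jacobian of y with respect to W, multiplied by V.
    JV = T.Rop(y, W, V)
    f = theano.function([W, V, x], JV)
    print f([[1, 1], [1, 1]], [[2, 2], [2, 2]], [0, 1])
    # expected: [ 2.  2.]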
......@@ -221,19 +228,19 @@ array([[ 0., 0.],
.. note::
`v`, the *point of evaluation*, differs between the *L-operator* and the *R-operator*.
For the *L-operator*, the point of evaluation needs to have the same shape
as the output, whereas for the *R-operator* this point should
have the same shape as the input parameter. Furthermore, the results of these two
operations differ. The result of the *L-operator* is of the same shape
as the input parameter, while the result of the *R-operator* has a shape similar
to that of the output.
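To make the shape difference concrete, a minimal sketch of the *L-operator* via ``T.Lop`` (illustrative names; the printed values are what such a run is expected to produce):

.. code-block:: python

    import theano
    import theano.tensor as T

    W = T.dmatrix('W')
    v = T.dvector('v')
    x = T.dvector('x')
    y = T.dot(x, W)
    # L-operator: v has the shape of the output y; the result has the shape of W.
    VJ = T.Lop(y, W, v)
    f = theano.function([v, x], VJ)
    print f([2, 2], [0, 1])
    # expected:
    # [[ 0.  0.]
    #  [ 2.  2.]]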
Hessian times a Vector
======================
If you need to compute the *Hessian times a vector*, you can make use of the
above-defined operators to do it more efficiently than actually computing
the exact Hessian and then performing the product. Due to the symmetry of the
Hessian matrix, you have two options that will
......@@ -267,7 +274,7 @@ Final Pointers
==============
* The ``grad`` function works symbolically: it receives and returns Theano variables.
* ``grad`` can be compared to a macro since it can be applied repeatedly.
......@@ -276,5 +283,5 @@ Final Pointers
* Built-in functions allow efficient computation of *vector times Jacobian* and *vector times Hessian*.
* Work is in progress on the optimizations required to efficiently compute the full
Jacobian and the Hessian matrix as well as the *Jacobian times vector*.
......@@ -6,8 +6,8 @@ Loading and Saving
==================
Python's standard way of saving class instances and reloading them
is the pickle_ mechanism. Many Theano objects can be *serialized* (and
*deserialized*) by ``pickle``; however, a limitation of ``pickle`` is that
it does not save the code or data of a class along with the instance of
the class being serialized. As a result, reloading objects created by a
previous version of a class can be really problematic.
......@@ -126,7 +126,7 @@ maybe defining the attributes you want to save, rather than the ones you
don't.
For instance, if the only parameters you want to save are a weight
matrix *W* and a bias *b*, you can define:
.. code-block:: python
......@@ -138,8 +138,8 @@ matrix ``W`` and a bias ``b``, you can define:
self.W = W
self.b = b
If at some point in time *W* is renamed to *weights* and *b* to
*bias*, the older pickled files will still be usable, if you update these
functions to reflect the change in name:
.. code-block:: python
......@@ -152,6 +152,6 @@ functions to reflect the change in name:
self.weights = W
self.bias = b
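The bodies of these methods are elided in this diff; a minimal sketch of the pattern being described, with a hypothetical class name, could look as follows:

.. code-block:: python

    class Model(object):
        # Save only the parameters we care about (hypothetical attribute names).
        def __getstate__(self):
            return (self.weights, self.bias)

        # Map the saved values onto the (renamed) attributes when unpickling.
        def __setstate__(self, state):
            W, b = state
            self.weights = W
            self.bias = b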
For more information on advanced use of ``pickle`` and its internals, see Python's
pickle_ documentation.
......@@ -9,10 +9,10 @@ Scan
====
- A general form of *recurrence*, which can be used for looping.
- *Reduction* and *map* (loop over the leading dimensions) are special cases of ``scan``.
- You ``scan`` a function along some input sequence, producing an output at each time-step.
- The function can see the *previous K time-steps* of your function.
- ``sum()`` could be computed by scanning the *z + x(i)* function over a list, given an initial state of *z=0*.
- Often a *for* loop can be expressed as a ``scan()`` operation, and ``scan`` is the closest that Theano comes to looping.
- Advantages of using ``scan`` over *for* loops:
......@@ -30,6 +30,7 @@ The full documentation can be found in the library: :ref:`Scan <lib_scan>`.
import theano
import theano.tensor as T
theano.config.warn.subtensor_merge_bug = False
k = T.iscalar("k"); A = T.vector("A")
......@@ -54,8 +55,10 @@ The full documentation can be found in the library: :ref:`Scan <lib_scan>`.
.. code-block:: python
import numpy
import theano
import theano.tensor as T
theano.config.warn.subtensor_merge_bug = False
coefficients = theano.tensor.vector("coefficients")
x = T.scalar("x"); max_coefficients_supported = 10000
......
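The ``scan`` examples above are truncated by the diff. As a self-contained illustration of the ``sum()`` bullet, a minimal sketch (assuming the standard ``theano.scan`` interface) of scanning the *z + x(i)* function over a vector:

.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T

    x = T.dvector('x')
    # Running sum: at each step the new state is z + x_i, starting from z = 0.
    results, updates = theano.scan(fn=lambda x_i, z: z + x_i,
                                   sequences=x,
                                   outputs_info=T.as_tensor_variable(
                                       numpy.asarray(0., dtype=x.dtype)))
    f = theano.function([x], results[-1], updates=updates)
    print f([1, 2, 3, 4])   # expected: 10.0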
......@@ -9,14 +9,14 @@ Configuration Settings and Compiling Modes
Configuration
=============
The ``config`` module contains several *attributes* that modify Theano's behavior. Many of these
attributes are examined during the import of the ``theano`` module and several are assumed to be
read-only.
*As a rule, the attributes in the* ``config`` *module should not be modified inside the user code.*
Theano's code comes with default values for these attributes, but you can
override them from your ``.theanorc`` file, and override those values in turn by
the :envvar:`THEANO_FLAGS` environment variable.
The order of precedence is:
......@@ -110,6 +110,8 @@ time the execution using the command line ``time python file.py``.
.. TODO: To be resolved:
.. Solution said:
.. You will need to use: ``theano.config.floatX`` and ``ndarray.astype("str")``
.. Why the latter portion?
......@@ -119,10 +121,10 @@ time the execution using the command line ``time python file.py``.
* Apply the Theano flag ``floatX=float32`` through (``theano.config.floatX``) in your code.
* Cast inputs before storing them into a shared variable.
* Circumvent the automatic cast of *int32* with *float32* to *float64*:
* Insert manual cast in your code or use *[u]int{8,16}*.
* Insert manual cast around the mean operator (this involves division by length, which is an *int64*).
* Notice that a new casting mechanism is being developed.
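To illustrate the manual cast around the mean operator, a minimal sketch (assuming ``floatX=float32``):

.. code-block:: python

    import theano.tensor as T

    x = T.matrix('x', dtype='float32')
    # mean() divides by the length (an int64), which would upcast the result
    # to float64; casting it back keeps the graph in float32.
    m = T.cast(x.mean(), 'float32')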
-------------------------------------------
......@@ -130,7 +132,7 @@ time the execution using the command line ``time python file.py``.
Mode
====
Every time :func:`theano.function <function.function>` is called,
the symbolic relationships between the input and output Theano *variables*
are optimized and compiled. The way this compilation occurs
is controlled by the value of the ``mode`` parameter.
......@@ -139,9 +141,9 @@ Theano defines the following modes by name:
- ``'FAST_COMPILE'``: Apply just a few graph optimizations and only use Python implementations.
- ``'FAST_RUN'``: Apply all optimizations, and use C implementations where possible.
- ``'DEBUG_MODE'``: Verify the correctness of all optimizations, and compare C and Python
implementations. This mode can take much longer than the other modes, but can identify
several kinds of problems.
- ``'PROFILE_MODE'``: Same optimizations as ``FAST_RUN``, but print some profiling information.
The default mode is typically ``FAST_RUN``, but it can be controlled via
......@@ -152,18 +154,18 @@ which can be overridden by passing the keyword argument to
================= =============================================================== ===============================================================================
short name Full constructor What does it do?
================= =============================================================== ===============================================================================
``FAST_COMPILE`` ``compile.mode.Mode(linker='py', optimizer='fast_compile')`` Python implementations only, quick and cheap graph transformations
``FAST_RUN`` ``compile.mode.Mode(linker='c|py', optimizer='fast_run')`` C implementations where available, all available graph transformations.
``DEBUG_MODE`` ``compile.debugmode.DebugMode()`` Both implementations where available, all available graph transformations.
``PROFILE_MODE`` ``compile.profilemode.ProfileMode()`` C implementations where available, all available graph transformations, print profile information.
================= =============================================================== ===============================================================================
Linkers
=======
A mode is composed of 2 things: an optimizer and a linker. Some modes,
like ``PROFILE_MODE`` and ``DEBUG_MODE``, add logic around the optimizer and
linker. ``PROFILE_MODE`` and ``DEBUG_MODE`` use their own linker.
You can select which linker to use with the Theano flag :attr:`config.linker`.
Here is a table to compare the different linkers.
......@@ -184,8 +186,8 @@ DebugMode no yes VERY HIGH Make many checks on what
.. [#gc] Garbage collection of intermediate results during computation.
Otherwise, the memory space used by the ops is kept between
Theano function calls, in order not to
reallocate memory, and lower the overhead (make it faster...).
.. [#cpy1] Default
.. [#cpy2] Deprecated
......@@ -201,10 +203,10 @@ While normally you should use the ``FAST_RUN`` or ``FAST_COMPILE`` mode,
it is useful at first (especially when you are defining new kinds of
expressions or new optimizations) to run your code using the DebugMode
(available via ``mode='DEBUG_MODE'``). The DebugMode is designed to
run several self-checks and assertions that can help diagnose
possible programming errors leading to incorrect output. Note that
``DEBUG_MODE`` is much slower than ``FAST_RUN`` or ``FAST_COMPILE`` so
use it only during development (not when you launch 1000 processes on a
cluster!).
......@@ -225,14 +227,16 @@ DebugMode is used as follows:
If any problem is detected, DebugMode will raise an exception according to
what went wrong, either at call time (*f(5)*) or compile time (
``f = theano.function([x], 10*x, mode='DEBUG_MODE')``). These exceptions
should *not* be ignored; talk to your local Theano guru or email the
users list if you cannot make the exception go away.
Some kinds of errors can only be detected for certain input value combinations.
In the example above, there is no way to guarantee that a future call to, say,
*f(-1)*, won't cause a problem. DebugMode is not a silver bullet.
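The compile-and-call snippet referred to in the surrounding text is elided in this diff; a minimal sketch:

.. code-block:: python

    import theano
    import theano.tensor as T

    x = T.dvector('x')
    f = theano.function([x], 10 * x, mode='DEBUG_MODE')

    # DebugMode re-runs its checks on every call, for these particular inputs.
    f([5])
    f([0])
    f([7])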
.. TODO: repair the following link
If you instantiate DebugMode using the constructor (see :class:`DebugMode`)
rather than the keyword ``DEBUG_MODE`` you can configure its behaviour via
......@@ -277,7 +281,7 @@ implementation only, should use the gof.PerformLinker (or "py" for
short). On the other hand, a user wanting to profile his graph using C
implementations wherever possible should use the ``gof.OpWiseCLinker``
(or "c|py"). For testing the speed of your code we would recommend
using the ``fast_run`` optimizer and the ``gof.OpWiseCLinker`` linker.
Compiling your Graph with ProfileMode
-------------------------------------
......@@ -300,7 +304,7 @@ the desired timing information, indicating where your graph is spending most
of its time. This is best shown through an example. Let's use our logistic
regression example.
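The setup code is elided here; a minimal sketch of compiling with ``ProfileMode`` (the constructor arguments are those accepted by older Theano releases, and the cost expression is only illustrative):

.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T

    x = T.matrix('x')
    w = theano.shared(numpy.random.randn(784), name='w')
    cost = T.nnet.sigmoid(T.dot(x, w)).sum()

    # Compile with a ProfileMode instance, call the function, then print timings.
    profmode = theano.ProfileMode(optimizer='fast_run',
                                  linker=theano.gof.OpWiseCLinker())
    f = theano.function([x], cost, mode=profmode)
    f(numpy.random.randn(400, 784))
    profmode.print_summary()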
Compiling the module with ``ProfileMode`` and calling ``profmode.print_summary()``
generates the following output:
.. code-block:: python
......@@ -352,14 +356,14 @@ generates the following output:
This output has two components. In the first section called
*Apply-wise summary*, timing information is provided for the worst
offending ``Apply`` nodes. This corresponds to individual op applications
within your graph which took longest to execute (so if you use
``dot`` twice, you will see two entries there). In the second portion,
the *Op-wise summary*, the execution times of all ``Apply`` nodes executing
the same op are grouped together and the total execution time per op
is shown (so if you use ``dot`` twice, you will see only one entry
there corresponding to the sum of the time spent in each of them).
Finally, notice that the ``ProfileMode`` also shows which ops were running a C
implementation.
......
.. _tutorial_printing_drawing:
==============================
Printing/Drawing Theano graphs
==============================
.. TODO: repair the defective links in the next paragraph
Theano provides two functions (:func:`theano.pp` and
:func:`theano.printing.debugprint`) to print a graph to the terminal before or after
compilation. These two functions print expression graphs in different ways:
:func:`pp` is more compact and math-like, :func:`debugprint` is more verbose.
Theano also provides :func:`pydotprint` that creates a *png* image of the function.
You can read about them in :ref:`libdoc_printing`.
Consider again the logistic regression, but notice the additional printing instructions.
The following output depicts the pre- and post-compilation graphs.
.. code-block:: python
    import numpy
    import theano
    import theano.tensor as T

    rng = numpy.random

    N = 400
    feats = 784
    D = (rng.randn(N, feats).astype(theano.config.floatX),
         rng.randint(size=N, low=0, high=2).astype(theano.config.floatX))
    training_steps = 10000

    # Declare Theano symbolic variables
    x = T.matrix("x")
    y = T.vector("y")
    w = theano.shared(rng.randn(feats).astype(theano.config.floatX), name="w")
    b = theano.shared(numpy.asarray(0., dtype=theano.config.floatX), name="b")
    x.tag.test_value = D[0]
    y.tag.test_value = D[1]
    #print "Initial model:"
    #print w.get_value(), b.get_value()

    # Construct Theano expression graph
    p_1 = 1 / (1 + T.exp(-T.dot(x, w) - b))       # Probability of having a one
    prediction = p_1 > 0.5                        # The prediction that is done: 0 or 1
    xent = -y*T.log(p_1) - (1-y)*T.log(1-p_1)     # Cross-entropy
    cost = xent.mean() + 0.01*(w**2).sum()        # The cost to optimize
    gw, gb = T.grad(cost, [w, b])

    # Compile expressions to functions
    train = theano.function(
                inputs=[x, y],
                outputs=[prediction, xent],
                updates={w: w-0.01*gw, b: b-0.01*gb},
                name="train")
    predict = theano.function(inputs=[x], outputs=prediction,
                              name="predict")

    if any([x.op.__class__.__name__ == 'Gemv' for x in
            train.maker.fgraph.toposort()]):
        print 'Used the cpu'
    elif any([x.op.__class__.__name__ == 'GpuGemm' for x in
              train.maker.fgraph.toposort()]):
        print 'Used the gpu'
    else:
        print 'ERROR, not able to tell if theano used the cpu or the gpu'
        print train.maker.fgraph.toposort()

    for i in range(training_steps):
        pred, err = train(D[0], D[1])
    #print "Final model:"
    #print w.get_value(), b.get_value()

    print "target values for D"
    print D[1]
    print "prediction on D"
    print predict(D[0])

    # Print the picture graphs
    # after compilation
    theano.printing.pydotprint(predict,
                               outfile="pics/logreg_pydotprint_predic.png",
                               var_with_name_simple=True)
    # before compilation
    theano.printing.pydotprint_variables(prediction,
                                         outfile="pics/logreg_pydotprint_prediction.png",
                                         var_with_name_simple=True)
    theano.printing.pydotprint(train,
                               outfile="pics/logreg_pydotprint_train.png",
                               var_with_name_simple=True)
Pretty Printing
===============
``theano.printing.pprint(variable)``
>>> theano.printing.pprint(prediction) # (pre-compilation)
gt((TensorConstant{1} / (TensorConstant{1} + exp(((-(x \\dot w)) - b)))),TensorConstant{0.5})
Debug Printing
==============
``theano.printing.debugprint({fct, variable, list of variables})``
>>> theano.printing.debugprint(prediction) # (pre-compilation)
Elemwise{gt,no_inplace} [@181772236] ''
|Elemwise{true_div,no_inplace} [@181746668] ''
| |InplaceDimShuffle{x} [@181746412] ''
| | |TensorConstant{1} [@181745836]
| |Elemwise{add,no_inplace} [@181745644] ''
| | |InplaceDimShuffle{x} [@181745420] ''
| | | |TensorConstant{1} [@181744844]
| | |Elemwise{exp,no_inplace} [@181744652] ''
| | | |Elemwise{sub,no_inplace} [@181744012] ''
| | | | |Elemwise{neg,no_inplace} [@181730764] ''
| | | | | |dot [@181729676] ''
| | | | | | |x [@181563948]
| | | | | | |w [@181729964]
| | | | |InplaceDimShuffle{x} [@181743788] ''
| | | | | |b [@181730156]
|InplaceDimShuffle{x} [@181771788] ''
| |TensorConstant{0.5} [@181771148]
>>> theano.printing.debugprint(predict) # (post-compilation)
Elemwise{Composite{neg,{sub,{{scalar_sigmoid,GT},neg}}}} [@183160204] '' 2
|dot [@183018796] '' 1
| |x [@183000780]
| |w [@183000812]
|InplaceDimShuffle{x} [@183133580] '' 0
| |b [@183000876]
|TensorConstant{[ 0.5]} [@183084108]
Picture Printing
================
>>> theano.printing.pydotprint_variables(prediction) # (pre-compilation)
.. image:: ../hpcs2011_tutorial/pics/logreg_pydotprint_prediction.png
:width: 800 px
Notice that ``pydotprint()`` requires *Graphviz* and Python's ``pydot``.
>>> theano.printing.pydotprint(predict) # (post-compilation)
.. image:: ../hpcs2011_tutorial/pics/logreg_pydotprint_predic.png
:width: 800 px
>>> theano.printing.pydotprint(train) # This is a small train example!
.. image:: ../hpcs2011_tutorial/pics/logreg_pydotprint_train.png
:width: 1500 px
......@@ -5,15 +5,19 @@
Some general Remarks
=====================
.. TODO: This discussion is awkward. Even with this beneficial reordering (28 July 2012) its purpose and message are unclear.
Limitations
-----------
Theano offers a good amount of flexibility, but has some limitations too.
How then can you write your algorithm to make the most of what Theano can do?
- *While*- or *for*-Loops within an expression graph are supported, but only via
the :func:`theano.scan` op (which puts restrictions on how the loop body can
interact with the rest of the graph).
- Neither *goto* nor *recursion* is supported or planned within expression graphs.
......@@ -18,7 +18,7 @@ Currently, information regarding shape is used in two ways in Theano:
`Op.infer_shape <http://deeplearning.net/software/theano/extending/cop.html#Op.infer_shape>`_
method.
Example:
.. code-block:: python
......@@ -40,7 +40,7 @@ Shape Inference Problem
=======================
Theano propagates information about shape in the graph. Sometimes this
can lead to errors. Consider this example:
.. code-block:: python
......@@ -90,19 +90,19 @@ example), an inferred shape is computed directly, without executing
the computation itself (there is no ``join`` in the first output or debugprint).
This makes the computation of the shape faster, but it can also hide errors. In
this example, the computation of the shape of the output of ``join`` is done only
based on the first input Theano variable, which leads to an error.
This might happen with other ops such as ``elemwise`` and ``dot``, for example.
Indeed, to perform some optimizations (for speed or stability, for instance),
Theano assumes that the computation is correct and consistent
in the first place, as it does here.
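The example code is elided in this diff; a minimal sketch of the kind of graph being discussed, where the shape of a ``join`` with incompatible inputs is inferred from the first input only:

.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T

    x = T.matrix('x')
    y = T.matrix('y')
    z = T.join(0, x, y)

    xv = numpy.random.rand(5, 4)
    yv = numpy.random.rand(3, 3)    # incompatible with x along axis 1

    f = theano.function([x, y], z.shape)
    # The shape is inferred from x alone, so no error is raised here,
    # even though computing z itself would fail.
    print f(xv, yv)                 # expected: [8 4]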
You can detect those problems by running the code without this
optimization, using the Theano flag
``optimizer_excluding=local_shape_to_shape_i``. You can also obtain the
same effect by running in the modes ``FAST_COMPILE`` (it will not apply this
optimization, nor most other optimizations) or ``DEBUG_MODE`` (it will test
before and after all optimizations (much slower)).
......@@ -113,15 +113,15 @@ Currently, specifying a shape is not as easy and flexible as we wish and we plan
upgrade. Here is the current state of what can be done:
- You can pass the shape info directly to the ``ConvOp`` created
when calling ``conv2d``. You simply set the parameters ``image_shape``
and ``filter_shape`` inside the call. They must be tuples of 4
elements. For example:
.. code-block:: python
theano.tensor.nnet.conv2d(..., image_shape=(7,3,5,5), filter_shape=(2,3,4,4))
- You can use the SpecifyShape op to add shape information anywhere in the
- You can use the ``SpecifyShape`` op to add shape information anywhere in the
graph. This allows Theano to perform some optimizations. In the following example,
this makes it possible to precompute the Theano function to a constant.
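The example itself is elided in this diff; a minimal sketch using ``theano.tensor.specify_shape``, the user-level interface to the ``SpecifyShape`` op:

.. code-block:: python

    import theano
    import theano.tensor as T

    x = T.matrix('x')
    x_specified = T.specify_shape(x, (2, 2))
    # With the shape known, the shape of the result can be precomputed
    # to the constant [2 2] at compile time.
    f = theano.function([x], (x_specified ** 2).shape)
    theano.printing.debugprint(f)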
......@@ -138,6 +138,6 @@ Future Plans
============
The parameter "constant shape" will be added to ``theano.shared()``. This is probably
the most frequent occurrence with ``shared`` variables. It will make the code
simpler and will make it possible to check that the shape does not change when
updating the ``shared`` variable.
......@@ -19,7 +19,7 @@ relations using symbolic placeholders (**variables**). When writing down
these expressions you use operations like ``+``, ``-``, ``**``,
``sum()``, ``tanh()``. All these are represented internally as **ops**.
An **op** represents a certain computation on some type of inputs
producing some type of output. You can see it as a *function definition*
in most programming languages.
Theano builds internally a graph structure composed of interconnected
......@@ -69,15 +69,15 @@ Take for example the following code:
x = T.dmatrix('x')
y = x*2.
If you enter ``type(y.owner)`` you get ``<class 'theano.gof.graph.Apply'>``,
which is the apply node that connects the op and the inputs to get this
output. You can now print the name of the op that is applied to get
*y*:
>>> y.owner.op.name
'Elemwise{mul,no_inplace}'
Hence, an elementwise multiplication is used to compute *y*. This
multiplication is done between the inputs:
>>> len(y.owner.inputs)
......@@ -89,7 +89,7 @@ InplaceDimShuffle{x,x}.0
Note that the second input is not 2 as we would have expected. This is
because 2 was first :term:`broadcasted <broadcasting>` to a matrix of
the same shape as *x*. This is done using the ``DimShuffle`` op:
>>> type(y.owner.inputs[1])
<class 'theano.tensor.basic.TensorVariable'>
......@@ -122,7 +122,7 @@ Using the
these gradients can be composed in order to obtain the expression of the
gradient of the graph's output with respect to the graph's inputs.
A following section of this tutorial will examine the topic of differentiation
in greater detail.
......@@ -131,7 +131,7 @@ Optimizations
When compiling a Theano function, what you give to the
:func:`theano.function <function.function>` is actually a graph
(starting from the output variables you can traverse the graph up to
the input variables). While this graph structure shows how to compute
the output from the input, it also offers the possibility to improve the
way this computation is carried out. The way optimizations work in
......
......@@ -5,11 +5,14 @@
Using the GPU
=============
For an introductory discussion of *Graphics Processing Units* (GPUs) and their use for
intensive parallel computation purposes, see `GPGPU <http://en.wikipedia.org/wiki/GPGPU>`_.
One of Theano's design goals is to specify computations at an
abstract level, so that the internal function compiler has a lot of flexibility
about how to carry out those computations. One of the ways we take advantage of
this flexibility is in carrying out calculations on an Nvidia graphics card when
the device present in the computer is CUDA-enabled.
Setting Up CUDA
----------------
......@@ -52,11 +55,11 @@ file and run it.
else:
print 'Used the gpu'
The program just computes the ``exp()`` of a bunch of random numbers.
Note that we use the ``shared`` function to
make sure that the input *x* is stored on the graphics device.
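The test program is elided in this diff; a minimal sketch along the lines described (timing ``exp()`` over a large ``shared`` *float32* vector; the GPU check at the end is only a heuristic):

.. code-block:: python

    import time
    import numpy
    import theano
    import theano.tensor as T

    vlen = 10 * 30 * 768
    iters = 1000

    rng = numpy.random.RandomState(22)
    # The shared variable keeps the data on the device when device=gpu is used.
    x = theano.shared(numpy.asarray(rng.rand(vlen), theano.config.floatX))
    f = theano.function([], T.exp(x))

    t0 = time.time()
    for i in xrange(iters):
        r = f()
    print 'Looping %d times took %f seconds' % (iters, time.time() - t0)

    if any(n.op.__class__.__name__.startswith('Gpu')
           for n in f.maker.fgraph.toposort()):
        print 'Used the gpu'
    else:
        print 'Used the cpu'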
If I run this program (in thing.py) with ``device=cpu``, my computer takes a little over 7 seconds,
whereas on the GPU it takes just over 0.4 seconds. The GPU will not always produce the exact
same floating-point numbers as the CPU. As a benchmark, a loop that calls ``numpy.exp(x.value)`` also takes about 7 seconds.
......@@ -71,18 +74,18 @@ same floating-point numbers as the CPU. As a benchmark, a loop that calls ``nump
Looping 1000 times took 0.418929815292 seconds
Result is [ 1.23178029 1.61879349 1.52278066 ..., 2.20771813 2.29967761 1.62323296]
Note that, for now, GPU operations in Theano require ``floatX`` to be *float32* (see also below).
Returning a Handle to Device-Allocated Data
-------------------------------------------
The speedup is not greater in the preceding example because the function is
returning its result as a NumPy ndarray which has already been copied from the
device to the host for your convenience. This is what makes it so easy to swap in ``device=gpu``, but
if you don't mind less portability, you might gain a bigger speedup by changing
the graph to express a computation with a GPU-stored result. The ``gpu_from_host``
op means "copy the input from the host to the GPU" and it is optimized away
after the ``T.exp(x)`` is replaced by a GPU version of ``exp()``.
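A minimal sketch of this idea (it assumes a CUDA-enabled Theano installation; otherwise the import below fails):

.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T
    from theano.sandbox.cuda.basic_ops import gpu_from_host

    x = theano.shared(numpy.random.rand(1000).astype('float32'))
    # gpu_from_host marks the value as living on the GPU; after optimization
    # the copy is removed and exp() runs (and stays) on the device.
    f = theano.function([], gpu_from_host(T.exp(x)))
    r = f()    # r is a CudaNdarray handle, not a NumPy ndarray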
.. If you modify this code, also change :
.. theano/tests/test_tutorial.py:T_using_gpu.test_using_gpu_2
......@@ -131,12 +134,16 @@ NumPy casting mechanism.
Running the GPU at Full Speed
------------------------------
.. TODO: the discussion of this section is unintelligible to a beginner
To really get maximum performance in this simple example, we need to use an :class:`Out`
instance to tell Theano not to copy the output it returns to us. Theano allocates memory for
internal use like a working buffer, but by default it will never return a result that is
allocated in the working buffer. This is normally what you want, but our example is so simple
that it has the unwanted side-effect of really slowing things down.
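The corresponding snippet is elided in this diff; a minimal sketch combining ``theano.Out(..., borrow=True)`` with the ``gpu_from_host`` op from above (again assuming a CUDA-enabled installation):

.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T
    from theano.sandbox.cuda.basic_ops import gpu_from_host

    x = theano.shared(numpy.random.rand(1000).astype('float32'))
    # borrow=True lets Theano hand back its internal buffer instead of a copy.
    f = theano.function([],
                        theano.Out(gpu_from_host(T.exp(x)), borrow=True))
    r = f()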
..
TODO:
The story here about copying and working buffers is misleading and potentially not correct
......@@ -181,12 +188,11 @@ the CPU implementation!
Result is <CudaNdarray object at 0x31eeaf0>
Numpy result is [ 1.23178029 1.61879349 1.52278066 ..., 2.20771813 2.29967761 1.62323296]
This version of the code using ``borrow=True`` is slightly less safe because if we had saved
the *r* returned from one function call, we would have to take care and remember that its value might
be over-written by a subsequent function call. Although ``borrow=True`` makes a dramatic difference
in this example, be careful! The advantage of ``borrow=True`` is much weaker in larger graphs, and
there is a lot of potential for making a mistake by failing to account for the resulting memory aliasing.
What Can Be Accelerated on the GPU?
......@@ -197,8 +203,8 @@ implementations, and vary from device to device, but to give a rough idea of
what to expect right now:
* Only computations
with *float32* data-type can be accelerated. Better support for *float64* is expected in upcoming hardware but
*float64* computations are still relatively slow (Jan 2010).
* Matrix
multiplication, convolution, and large element-wise operations can be
accelerated a lot (5-50x) when arguments are large enough to keep 30
......@@ -219,35 +225,35 @@ Tips for Improving Performance on GPU
-------------------------------------
* Consider
adding ``floatX=float32`` to your ``.theanorc`` file if you plan to do a lot of
GPU work.
* Prefer
constructors like ``matrix``, ``vector`` and ``scalar`` to ``dmatrix``, ``dvector`` and
``dscalar`` because the former will give you *float32* variables when
``floatX=float32``.
* Ensure
that your output variables have a *float32* dtype and not *float64*. The
more *float32* variables are in your graph, the more work the GPU can do for
you.
* Minimize
transfers to the GPU device by using ``shared`` *float32* variables to store
frequently-accessed data (see :func:`shared()<shared.shared>`). When using
the GPU, *float32* tensor ``shared`` variables are stored on the GPU by default to
eliminate transfer time for GPU ops using those variables.
* If you aren't happy with the performance you see, try building your functions with
``mode='PROFILE_MODE'``. This should print some timing information at program
termination (at exit). Is time being used sensibly? If an op or Apply is
taking more time than its share, then if you know something about GPU
programming, have a look at how it's implemented in theano.sandbox.cuda.
Check the line similar to *Spent Xs(X%) in cpu op, Xs(X%) in gpu op and Xs(X%) in transfer op*.
This can tell you if not enough of your graph is on the GPU or if there
is too much memory transfer.
Changing the Value of Shared Variables
--------------------------------------
To change the value of a shared variable, e.g. to provide new data to process,
use ``shared_variable.set_value(new_value)``. For a lot more detail about this,
see :ref:`aliasing`.
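For instance, a minimal sketch of swapping in new data:

.. code-block:: python

    import numpy
    import theano

    state = theano.shared(numpy.zeros(3, dtype='float32'), name='state')
    # Replace the contents in place, e.g. when a new batch of data arrives.
    state.set_value(numpy.ones(3, dtype='float32'))
    print state.get_value()   # expected: [ 1.  1.  1.]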
......@@ -321,31 +327,31 @@ Consider the logistic regression:
Modify and execute this example to run on GPU with ``floatX=float32`` and
time it using the command line ``time python file.py``.
Is there an increase in speed from CPU to GPU?
Where does it come from? (Use ``ProfileMode``)
What can be done to further increase the speed of the GPU version?
.. Note::
* Only 32-bit floats are currently supported (development is in progress).
* ``Shared`` variables with *float32* dtype are by default moved to the GPU memory space.
* There is a limit of one GPU per process.
* Use the Theano flag ``device=gpu`` to require use of the GPU device.
* Use ``device=gpu{0, 1, ...}`` to specify which GPU if you have more than one.
* Apply the Theano flag ``floatX=float32`` through (``theano.config.floatX``) in your code.
* Cast inputs before storing them into a ``shared`` variable.
* Circumvent the automatic cast of *int32* with *float32* to *float64*:
* Insert manual cast in your code or use *[u]int{8,16}*.
* Insert manual cast around the mean operator (this involves division by length, which is an *int64*).
* Notice that a new casting mechanism is being developed.
......@@ -354,21 +360,21 @@ What can be done to further increase the speed of the GPU version?
Software for Directly Programming a GPU
---------------------------------------
Leaving aside Theano, which is a meta-programmer, there are:
* **CUDA**: C extension by NVIDIA
* Vendor-specific
* Numeric libraries (BLAS, RNG, FFT) are maturing.
* **OpenCL**: multi-vendor version of CUDA
* More general, standardized.
* Fewer libraries, less widespread.
* **PyCUDA**: Python bindings to the CUDA driver interface, which allow access to Nvidia's CUDA parallel
computation API from Python
* Convenience: Makes it easy to do GPU meta-programming from within Python. Helpful documentation.
......@@ -389,9 +395,9 @@ Leaving aside Theano which is a meta-programmer, there is:
PyCUDA knows about dependencies (e.g. it won't detach from a context before all memory allocated in it is also freed).
(GPU memory buffer: ``pycuda.gpuarray.GPUArray``)
* **PyOpenCL**: PyCUDA for OpenCL
**Example: PyCUDA**
......@@ -496,12 +502,12 @@ To test it:
Run the preceding example.
Modify and execute to multiply two matrices: *x* * *y*.
Modify and execute to return two outputs: *x + y* and *x - y*.
(Currently, *elemwise fusion* generates computation with only 1 output.)
Modify and execute to support *stride* (i.e. so as not to constrain the input to be C-contiguous).
-------------------------------------------
......