Commit c86c72f4, authored by Eric Larsen, committed by Frederic

Correct Theano's tutorial: typos and layout

Parent 3bffa49b
 .. _glossary:
-Glossary of terminology
-=======================
+Glossary
+========
 .. glossary::
......
 .. _adding:
 ====================
-Baby steps - Algebra
+Baby Steps - Algebra
 ====================
-Adding two scalars
+Adding two Scalars
 ==================
 So, to get us started with Theano and get a feel of what we're working with,
@@ -117,7 +117,7 @@ argument is what we want to see as output when we apply the function.
 ``f`` may then be used like a normal Python function.
-Adding two matrices
+Adding two Matrices
 ===================
 You might already have guessed how to do this. Indeed, the only change
......
@@ -7,39 +7,39 @@ Understanding Memory Aliasing for Speed and Correctness
 The aggressive reuse of memory is one of the ways Theano makes code fast, and
 it's important for the correctness and speed of your program that you understand
-which buffers Theano might alias to which others.
+which buffers Theano might alias to which other.
-This file describes the principles for how Theano treats memory, and explains
-when you might want to change the default behaviour of some functions and
+This section describes the principles based on which Theano treats memory, and explains
+when you might want to alter the default behaviour of some functions and
 methods for faster performance.
-The memory model: 2 spaces
-==========================
+The Memory Model: Two Spaces
+============================
 There are some simple principles that guide Theano's treatment of memory. The
 main idea is that there is a pool of memory managed by Theano, and Theano tracks
 changes to values in that pool.
-1. Theano manages its own memory space, which typically does not overlap with
-the memory of normal python variables that non-Theano code creates.
+* Theano manages its own memory space, which typically does not overlap with
+the memory of normal Python variables that non-Theano code creates.
-1. Theano Functions only modify buffers that are in Theano's memory space.
+* Theano functions only modify buffers that are in Theano's memory space.
-1. Theano's memory space includes the buffers allocated to store shared
-variables and the temporaries used to evaluate Functions.
+* Theano's memory space includes the buffers allocated to store shared
+variables and the temporaries used to evaluate functions.
-1. Physically, Theano's memory space may be spread across the host, a GPU
+* Physically, Theano's memory space may be spread across the host, a GPU
 device(s), and in the future may even include objects on a remote machine.
-1. The memory allocated for a shared variable buffer is unique: it is never
+* The memory allocated for a shared variable buffer is unique: it is never
 aliased to another shared variable.
-1. Theano's managed memory is constant while Theano Functions are not running
-and Theano library code is not running.
+* Theano's managed memory is constant while Theano functions are not running
+and Theano's library code is not running.
-1. The default behaviour of Function is to return user-space values for
+* The default behaviour of a function is to return user-space values for
 outputs, and to expect user-space values for inputs.
 The distinction between Theano-managed memory and user-managed memory can be
 broken down by some Theano functions (e.g. shared, get_value and the
@@ -49,9 +49,9 @@ operations) at the expense of risking subtle bugs in the overall program (by
 aliasing memory).
 The rest of this section is aimed at helping you to understand when it is safe
-to use the ``borrow=True`` argument and reap the benefit of faster code.
+to use the ``borrow=True`` argument and reap the benefits of faster code.
-Borrowing when creating shared variables
+Borrowing when Creating Shared Variables
 ========================================
 A ``borrow`` argument can be provided to the shared-variable constructor.
@@ -109,7 +109,7 @@ It is not a reliable technique to use ``borrow=True`` to modify shared variables
 by side-effect, because with some devices (e.g. GPU devices) this technique will
 not work.
-Borrowing when accessing value of shared variables
+Borrowing when Accessing Value of Shared Variables
 ==================================================
 Retrieving
@@ -139,7 +139,7 @@ The reason that ``borrow=True`` might still make a copy is that the internal
 representation of a shared variable might not be what you expect. When you
 create a shared variable by passing a numpy array for example, then ``get_value()``
 must return a numpy array too. That's how Theano can make the GPU use
-transparent. But when you are using a GPU (or in future perhaps a remote machine), then the numpy.ndarray
+transparent. But when you are using a GPU (or in the future perhaps a remote machine), then the numpy.ndarray
 is not the internal representation of your data.
 If you really want Theano to return its internal representation *and never copy it*
 then you should use the ``return_internal_type=True`` argument to
@@ -213,7 +213,7 @@ be costly. Here are a few tips to ensure fast and efficient use of GPU memory a
 here: :ref:`libdoc_cuda_var`)
-Retrieving and assigning via the .value property
+Retrieving and Assigning via the .value Property
 ------------------------------------------------
 Shared variables have a ``.value`` property that is connected to ``get_value``
@@ -234,7 +234,7 @@ potential impact on your code, use the ``.get_value`` and ``.set_value`` methods
 directly with appropriate flags.
-Borrowing when constructing Function objects
+Borrowing when Constructing Function Objects
 ============================================
 A ``borrow`` argument can also be provided to the ``In`` and ``Out`` objects
@@ -276,6 +276,7 @@ hints that give more flexibility to the compilation and optimization of the
 graph.
 *Take home message:*
 When an input ``x`` to a function is not needed after the function returns and you
 would like to make it available to Theano as additional workspace, then consider
 marking it with ``In(x, borrow=True)``. It may make the function faster and
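The copy-versus-alias distinction that ``borrow`` controls can be illustrated with a plain NumPy sketch (an analogy only, not the Theano API):

```python
import numpy as np

# A "user-space" buffer.
user_array = np.zeros(4)

# borrow=False behaviour: the managed value is an independent copy,
# so later changes to user_array do not leak into it.
managed_copy = user_array.copy()

# borrow=True behaviour: the managed value aliases the caller's buffer,
# so mutating one is visible through the other.
managed_alias = user_array  # no copy; both names share one buffer

user_array[0] = 1.0
print(managed_copy[0])   # 0.0 -- the copy is isolated
print(managed_alias[0])  # 1.0 -- the alias sees the change
```

The copy is what makes the default behaviour safe; the alias is what makes borrowing fast but risky when the caller keeps mutating its buffer.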
......
@@ -4,15 +4,15 @@
 Conditions
 ==========
-IfElse vs switch
+IfElse vs Switch
 ================
-- Build condition over symbolic variables.
+- Both Ops build a condition over symbolic variables.
-- IfElse Op takes a `boolean` condition and two variables to compute as input.
+- ``IfElse`` takes a `boolean` condition and two variables as inputs.
-- Switch take a `tensor` as condition and two variables to compute as input.
+- ``Switch`` takes a `tensor` as condition and two variables as inputs.
-- Switch is an elementwise operation. It is more general than IfElse.
+  ``switch`` is an elementwise operation and it is more general than ``ifelse``.
-- While Switch Op evaluates both 'output' variables, IfElse Op is lazy and only
+- Whereas ``switch`` evaluates both 'output' variables, ``ifelse`` is lazy and only
 evaluates one variable respect to the condition.
 **Example**
@@ -62,11 +62,10 @@ since it computes only one variable instead of both.
 time spent evaluating one value 0.3500 sec
-It is actually important to use ``linker='vm'`` or ``linker='cvm'``,
-otherwise IfElse will compute both variables and take the same computation
-time as the Switch Op. The linker is not currently set by default to 'cvm' but
+Unless ``linker='vm'`` or ``linker='cvm'`` are used, ``ifelse`` will compute both variables and take the same computation
+time as ``switch``. The linker is not currently set by default to 'cvm' but
 it will be in a near future.
-There is not an optimization to automatically change a switch with a
-broadcasted scalar to an ifelse, as this is not always the faster. See
+There is not an optimization automatically replacing a ``switch`` with a
+broadcasted scalar to an ``ifelse``, as this is not always faster. See
 this `ticket <http://www.assembla.com/spaces/theano/tickets/764>`_.
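The eager-elementwise versus lazy-branch distinction in this hunk has a rough NumPy analogy (not the Theano Ops themselves, just the evaluation pattern): ``numpy.where`` selects elementwise after both candidate arrays are computed, while a Python ``if`` evaluates only one branch.

```python
import numpy as np

a = np.array([1.0, -2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

# switch-like: an elementwise choice; both candidate arrays are
# fully computed before the selection happens.
elementwise = np.where(a > 0, a * 2, b * 2)

# ifelse-like: one scalar condition, and only the chosen branch
# is ever evaluated.
if a.sum() > 0:
    lazy = a * 2
else:
    lazy = b * 2

print(elementwise)  # [ 2. 40.  6.]
print(lazy)         # [ 2. -4.  6.]
```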
@@ -6,15 +6,16 @@ Debugging Theano: FAQ and Troubleshooting
 =========================================
 There are many kinds of bugs that might come up in a computer program.
-This page is structured as an FAQ. It should provide recipes to tackle common
+This page is structured as a FAQ. It should provide recipes to tackle common
 problems, and introduce some of the tools that we use to find problems in our
 Theano code, and even (it happens) in Theano's internals, such as
 :ref:`using_debugmode`.
-Isolating the problem/Testing Theano compiler
+Isolating the Problem/Testing Theano Compiler
 ---------------------------------------------
-You can run your Theano function in a DebugMode(:ref:`using_debugmode`). This test the Theano optimizations and help to find where NaN, inf and other problem come from.
+You can run your Theano function in a DebugMode(:ref:`using_debugmode`).
+This tests the Theano optimizations and helps to find where NaN, inf and other problems come from.
 Using Test Values
@@ -102,7 +103,7 @@ can get Theano to give us the exact source of the error.
 # provide Theano with a default test-value
 x.tag.test_value = numpy.random.rand(5,10)
-In the above, we're tagging the symbolic matrix ``x`` with a special test
+In the above, we are tagging the symbolic matrix ``x`` with a special test
 value. This allows Theano to evaluate symbolic expressions on-the-fly (by
 calling the ``perform`` method of each Op), as they are being defined. Sources
 of error can thus be identified with much more precision and much earlier in
@@ -122,8 +123,8 @@ following error message, which properly identifies line 23 as the culprit.
 The compute_test_value mechanism works as follows:
-* Theano Constants and SharedVariable are used as is. No need to instrument them.
+* Theano ``constants`` and ``shared variables`` are used as is. No need to instrument them.
-* A Theano ``Variable`` (i.e. ``dmatrix``, ``vector``, etc.) should be
+* A Theano ``variable`` (i.e. ``dmatrix``, ``vector``, etc.) should be
 given a special test value through the attribute ``tag.test_value``.
 * Theano automatically instruments intermediate results. As such, any quantity
 derived from ``x`` will be given a `tag.test_value` automatically.
@@ -139,11 +140,11 @@ The compute_test_value mechanism works as follows:
 variable is missing a test value.
 .. note::
-This feature is currently not compatible with ``Scan`` and also with Ops
+This feature is currently incompatible with ``Scan`` and also with Ops
 which do not implement a ``perform`` method.
-How do I print an intermediate value in a Function/Method?
+How do I Print an Intermediate Value in a Function/Method?
 ----------------------------------------------------------
 Theano provides a 'Print' Op to do this.
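The idea behind the Print Op — an identity operation that reports its input as a side effect when evaluated — can be sketched in plain Python (an analogy, not the Theano ``Print`` class itself):

```python
def print_op(label, value):
    """Identity function that reports the value flowing through it."""
    print(label, value)
    return value

# The wrapped value participates in the computation unchanged.
result = print_op("x squared:", 3 ** 2) + 1
print(result)  # prints "x squared: 9" first, then 10
```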
@@ -177,7 +178,7 @@ precise inspection of what's being computed where, when, and how, see the
 to remove them to know if this is the cause or not.
-How do I print a graph (before or after compilation)?
+How do I Print a Graph (before or after compilation)?
 ----------------------------------------------------------
 Theano provides two functions (:func:`theano.pp` and
@@ -190,7 +191,7 @@ You can read about them in :ref:`libdoc_printing`.
-The function I compiled is too slow, what's up?
+The Function I Compiled is Too Slow, what's up?
 -----------------------------------------------
 First, make sure you're running in FAST_RUN mode.
 FAST_RUN is the default mode, but make sure by passing ``mode='FAST_RUN'``
@@ -207,10 +208,10 @@ Tips:
 .. _faq_wraplinker:
-How do I step through a compiled function with the WrapLinker?
+How do I Step through a Compiled Function with the WrapLinker?
 --------------------------------------------------------------
-This is not exactly an FAQ, but the doc is here for now...
+This is not exactly a FAQ, but the doc is here for now...
 It's pretty easy to roll-your-own evaluation mode.
 Check out this one:
@@ -248,7 +249,7 @@ Use your imagination :)
 This can be a really powerful debugging tool.
 Note the call to ``fn`` inside the call to ``print_eval``; without it, the graph wouldn't get computed at all!
-How to use pdb ?
+How to Use pdb ?
 ----------------
 In the majority of cases, you won't be executing from the interactive shell
@@ -294,7 +295,7 @@ The call stack contains a few useful informations to trace back the source
 of the error. There's the script where the compiled function was called --
 but if you're using (improperly parameterized) prebuilt modules, the error
 might originate from ops in these modules, not this script. The last line
-tells us about the Op that caused the exception. In thise case it's a "mul"
+tells us about the Op that caused the exception. In this case it's a "mul"
 involving Variables name "a" and "b". But suppose we instead had an
 intermediate result to which we hadn't given a name.
......
@@ -2,11 +2,11 @@
 .. _basictutexamples:
 =============
-More examples
+More Examples
 =============
-Logistic function
+Logistic Function
 =================
 Here's another straightforward example, though a bit more elaborate
@@ -61,7 +61,7 @@ array([[ 0.5 , 0.73105858],
 [ 0.26894142, 0.11920292]])
-Computing more than one thing at the same time
+Computing More than one Thing at the Same Time
 ==============================================
 Theano supports functions with multiple outputs. For example, we can
@@ -94,7 +94,7 @@ was reformatted for readability):
 [ 1., 4.]])]
-Setting a default value for an argument
+Setting a Default Value for an Argument
 =======================================
 Let's say you want to define a function that adds two numbers, except
@@ -152,7 +152,7 @@ array(33.0)
 .. _functionstateexample:
-Using shared variables
+Using Shared Variables
 ======================
 It is also possible to make a function with an internal state. For
@@ -227,7 +227,7 @@ array(0)
 You might be wondering why the updates mechanism exists. You can always
 achieve a similar thing by returning the new expressions, and working with
-them in numpy as usual. The updates mechanism can be a syntactic convenience,
+them in NumPy as usual. The updates mechanism can be a syntactic convenience,
 but it is mainly there for efficiency. Updates to shared variables can
 sometimes be done more quickly using in-place algorithms (e.g. low-rank matrix
 updates). Also, theano has more control over where and how shared variables are
@@ -252,15 +252,15 @@ array(7)
 >>> state.get_value() # old state still there, but we didn't use it
 array(0)
-The givens parameter can be used to replace any symbolic variable, not just a
+The ``givens`` parameter can be used to replace any symbolic variable, not just a
 shared variable. You can replace constants, and expressions, in general. Be
-careful though, not to allow the expressions introduced by a givens
+careful though, not to allow the expressions introduced by a ``givens``
 substitution to be co-dependent, the order of substitution is not defined, so
 the substitutions have to work in any order.
 In practice, a good way of thinking about the ``givens`` is as a mechanism
 that allows you to replace any part of your formula with a different
-expression that evaluates to a tensor of same shape and dtype. ``givens``
+expression that evaluates to a tensor of same shape and dtype.
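The substitution idea behind ``givens`` can be mimicked in plain Python (a loose analogy under the assumption that a formula is a function of its named parts, not Theano's actual graph rewriting):

```python
def formula(state, inc):
    # Stand-in for the compiled expression state + inc.
    return state + inc

default_state = 0

# Normal call: uses the "shared" default value.
print(formula(default_state, 1))  # 1

# givens-like call: the state sub-expression is replaced by another
# value of the same shape/dtype, without touching the default.
print(formula(100, 1))            # 101
print(default_state)              # 0 -- the default is unchanged
```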
 .. _using_random_numbers:
@@ -270,17 +270,17 @@ Using Random Numbers
 Because in Theano you first express everything symbolically and
 afterwards compile this expression to get functions,
 using pseudo-random numbers is not as straightforward as it is in
-numpy, though also not too complicated.
+NumPy, though also not too complicated.
 The way to think about putting randomness into Theano's computations is
-to put random variables in your graph. Theano will allocate a numpy
+to put random variables in your graph. Theano will allocate a NumPy
 RandomStream object (a random number generator) for each such
 variable, and draw from it as necessary. We will call this sort of
 sequence of random numbers a *random stream*. *Random streams* are at
 their core shared variables, so the observations on shared variables
 hold here as well.
-Brief example
+Brief Example
 -------------
 Here's a brief example. The setup code is:
@@ -325,8 +325,8 @@ random variable appears three times in the output expression.
 >>> nearly_zeros = function([], rv_u + rv_u - 2 * rv_u)
-Seedings Streams
-----------------
+Seeding Streams
+---------------
 Random variables can be seeded individually or collectively.
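Seeding behaviour is easy to check with NumPy's own generators, which back Theano's random streams (a NumPy-level sketch, not the ``RandomStreams`` API):

```python
import numpy as np

# Two generators seeded identically produce the same stream...
rng_a = np.random.RandomState(902340)
rng_b = np.random.RandomState(902340)
draws_a = rng_a.uniform(size=3)
draws_b = rng_b.uniform(size=3)
print(np.allclose(draws_a, draws_b))  # True

# ...while reseeding restarts the stream from the beginning.
rng_a.seed(902340)
print(np.allclose(rng_a.uniform(size=3), draws_a))  # True
```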
@@ -344,7 +344,7 @@ of the random variables.
 >>> srng.seed(902340) # seeds rv_u and rv_n with different seeds each
-Sharing Streams between Functions
+Sharing Streams Between Functions
 ---------------------------------
 As usual for shared variables, the random number generators used for random
@@ -362,7 +362,7 @@ For example:
 >>> v2 = f() # v2 != v1
-Others Random Distributions
+Other Random Distributions
 ---------------------------
 There are :ref:`other distributions implemented <libdoc_tensor_raw_random>`.
@@ -371,7 +371,7 @@ There are :ref:`other distributions implemented <libdoc_tensor_raw_random>`.
 .. _logistic_regression:
-A Real example: Logistic Regression
+A Real Example: Logistic Regression
 ===================================
 The preceding elements are put to work in this more realistic example. It will be used repeatedly.
......
...@@ -5,7 +5,7 @@ ...@@ -5,7 +5,7 @@
Extending Theano Extending Theano
**************** ****************
Theano graphs Theano Graphs
------------- -------------
- Theano works with symbolic graphs - Theano works with symbolic graphs
...@@ -40,7 +40,6 @@ Inputs and Outputs are lists of Theano variables ...@@ -40,7 +40,6 @@ Inputs and Outputs are lists of Theano variables
See :ref:`dev_start_guide` for information about git, github, the See :ref:`dev_start_guide` for information about git, github, the
development workflow and how to make a quality contribution. development workflow and how to make a quality contribution.
Op contract
----------- -----------
...@@ -96,13 +95,13 @@ at run time. Currently there are 2 different possibilites: ...@@ -96,13 +95,13 @@ at run time. Currently there are 2 different possibilites:
implement the :func:`perform` implement the :func:`perform`
and/or :func:`c_code <Op.c_code>` (and other related :ref:`c methods and/or :func:`c_code <Op.c_code>` (and other related :ref:`c methods
<cop>`), or the :func:`make_thunk` method. The ``perform`` allows <cop>`), or the :func:`make_thunk` method. The ``perform`` allows
to easily wrap an existing python function into Theano. The ``c_code`` to easily wrap an existing Python function into Theano. The ``c_code``
and related methods allow the op to generate c code that will be and related methods allow the op to generate C code that will be
compiled and linked by Theano. On the other hand, the ``make_thunk`` compiled and linked by Theano. On the other hand, the ``make_thunk``
method will be called only once during compilation and should generate method will be called only once during compilation and should generate
a ``thunk``: a standalone function that when called will do the wanted computations. a ``thunk``: a standalone function that when called will do the wanted computations.
This is useful if you want to generate code and compile it yourself. For This is useful if you want to generate code and compile it yourself. For
example, this allows you to use PyCUDA to compile gpu code. example, this allows you to use PyCUDA to compile GPU code.
Also there are 2 methods that are highly recommended to be implemented. They are Also there are 2 methods that are highly recommended to be implemented. They are
needed in order to merge duplicate computations involving your op. So if you needed in order to merge duplicate computations involving your op. So if you
...@@ -110,7 +109,7 @@ do not want Theano to execute your op multiple times with the same inputs, ...@@ -110,7 +109,7 @@ do not want Theano to execute your op multiple times with the same inputs,
do implement them. Those methods are :func:`__eq__` and do implement them. Those methods are :func:`__eq__` and
:func:`__hash__`. :func:`__hash__`.
The :func:`infer_shape` method allows to infer shape of some variable, somewhere in the The :func:`infer_shape` method allows to infer the shape of some variable, somewhere in the
middle of the computational graph without actually computing the outputs (when possible). middle of the computational graph without actually computing the outputs (when possible).
This could be helpful if one only needs the shape of the output instead of the actual outputs. This could be helpful if one only needs the shape of the output instead of the actual outputs.
...@@ -123,7 +122,7 @@ string representation of your Op. ...@@ -123,7 +122,7 @@ string representation of your Op.
The :func:`R_op` method is needed if you want `theano.tensor.Rop` to The :func:`R_op` method is needed if you want `theano.tensor.Rop` to
work with your op. work with your op.
Op example Op Example
---------- ----------
.. code-block:: python .. code-block:: python
...@@ -164,7 +163,7 @@ Op example ...@@ -164,7 +163,7 @@ Op example
return eval_points return eval_points
return self.grad(inputs, eval_points) return self.grad(inputs, eval_points)
Try it! Try it!:
.. code-block:: python .. code-block:: python
...@@ -177,15 +176,14 @@ Try it! ...@@ -177,15 +176,14 @@ Try it!
print inp print inp
print out print out
How to test it How To Test it
-------------- --------------
Theano has some functions to simplify testing. These help test the Theano has some functions to simplify testing. These help test the
``infer_shape``, ``grad`` and ``R_op`` methods. Put the following code ``infer_shape``, ``grad`` and ``R_op`` methods. Put the following code
in a file and execute it with the ``nosetests`` program. in a file and execute it with the ``nosetests`` program.
**Basic Tests**

Basic tests are done by you just by using the Op and checking that it
returns the right answer. If you detect an error, you must raise an
@@ -210,8 +208,7 @@ exception. You can use the `assert` keyword to automatically raise an
        # Compare the result computed to the expected value.
        assert numpy.allclose(inp * 2, out)

**Testing the infer_shape**

When a class inherits from the ``InferShapeTester`` class, it gets the
`self._compile_and_check` method that tests the Op ``infer_shape``
@@ -248,8 +245,7 @@ see it fail, you can implement an incorrect ``infer_shape``.
            # Op that should be removed from the graph.
            self.op_class)

**Testing the gradient**

The function :ref:`verify_grad <validating_grad>`
verifies the gradient of an Op or Theano graph. It compares the
@@ -266,8 +262,7 @@ the multiplication by 2).
        theano.tests.unittest_tools.verify_grad(self.op,
                                                [numpy.random.rand(5, 7, 2)])
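The idea behind such a gradient check can be sketched in plain Python (this is an illustrative stand-in, not Theano's ``verify_grad`` implementation): compare the analytic gradient against a central finite-difference estimate. The ``double`` op and helper names below are hypothetical.

```python
def double(xs):
    """The 'op' under test: elementwise doubling."""
    return [2.0 * x for x in xs]

def analytic_grad(xs, out_grads):
    """Analytic gradient of `double`: d(2x)/dx = 2, chained with out_grads."""
    return [2.0 * g for g in out_grads]

def numeric_grad(f, xs, out_grads, eps=1e-6):
    """Central-difference estimate of the same vector-Jacobian product."""
    grads = []
    for i in range(len(xs)):
        plus = list(xs); plus[i] += eps
        minus = list(xs); minus[i] -= eps
        fp, fm = f(plus), f(minus)
        # dot the perturbed outputs with out_grads
        grads.append(sum(g * (a - b) / (2 * eps)
                         for g, a, b in zip(out_grads, fp, fm)))
    return grads

xs = [0.3, -1.2, 2.5]
gs = [1.0, 1.0, 1.0]
exact = analytic_grad(xs, gs)
approx = numeric_grad(double, xs, gs)
assert all(abs(a - b) < 1e-4 for a, b in zip(exact, approx))
```

If the analytic gradient were wrong, the final assertion would fail; this mirrors what a gradient-verification utility reports.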
**Testing the Rop**

The class :class:`RopLop_checker` provides the functions
:func:`RopLop_checker.check_mat_rop_lop`,
@@ -310,16 +305,28 @@ You can also add this at the end of the test file:
    t.setUp()
    t.test_double_rop()
**Testing GPU Ops**

Ops that execute on the GPU should inherit from
``theano.sandbox.cuda.GpuOp`` and not ``theano.Op``. This allows Theano
to distinguish between them. Currently, we use this to test whether
the NVIDIA driver works correctly with our sum reduction code on the
GPU.
-------------------------------------------

**Exercise**

Run the code in the file double_op.py.

Modify and execute to compute: x * y.

Modify and execute the example to return 2 outputs: x + y and x - y
(our current element-wise fusion generates computation with only 1 output).
SciPy
-----

@@ -363,14 +370,8 @@ don't forget to call the parent ``setUp`` function.
For more details see :ref:`random_value_in_tests`.
Documentation
......
.. _gpu_data_convert:

===================================
PyCUDA/CUDAMat/gnumpy compatibility
===================================

PyCUDA
======

Currently, PyCUDA and Theano have different objects to store GPU
data. The two implementations do not support the same set of features.
Theano's implementation is called CudaNdarray and supports
*strides*. It supports only the float32 dtype. PyCUDA's implementation
is called GPUArray and doesn't support *strides*. However, it can deal with
all NumPy and CUDA dtypes.

We are currently working on having the same base object that will
mimic NumPy. Until this is ready, here is some information on how to
@@ -23,8 +24,8 @@ Transfer
You can use the `theano.misc.pycuda_utils` module to convert GPUArray to and
from CudaNdarray. The functions `to_cudandarray(x, copyif=False)` and
`to_gpuarray(x)` return a new object that occupies the same memory space
as the original. Because GPUArrays don't support *strides*, if the
CudaNdarray is strided, we can copy it to obtain a non-strided copy;
the resulting GPUArray then won't share the same memory region. If you
want this behavior, set `copyif=True` in `to_gpuarray`; otherwise a
ValueError is raised.
@@ -33,7 +34,7 @@ Compiling with PyCUDA
---------------------

You can use PyCUDA to compile CUDA functions that work directly on
CudaNdarrays. Here is an example from the file `theano/misc/tests/test_pycuda_theano_simple.py`:

.. code-block:: python

@@ -75,7 +76,7 @@ CudaNdarray. Here is an example from the file `theano/misc/tests/test_pycuda_the
Theano op using PyCUDA function
-------------------------------

You can use a GPU function compiled with PyCUDA in a Theano op. Here is an example:

.. code-block:: python

@@ -119,15 +120,15 @@ You can use gpu function compiled with PyCUDA in a Theano op. Here is an example
CUDAMat
=======

There are functions for conversion between CUDAMat and Theano CudaNdArray objects.
They obey the same principles as PyCUDA's functions and can be found in
theano.misc.cudamat_utils.py.

WARNING: There is a strange problem associated with stride/shape with those converters.
To work, the test needs a transpose and reshape...

gnumpy
======

There are conversion functions between gnumpy garray objects and Theano CudaNdArrays.
They are also similar to PyCUDA's and can be found in theano.misc.gnumpy_utils.py.
@@ -6,7 +6,7 @@
Derivatives in Theano
=====================

Computing Gradients
===================

Now let's use Theano for a slightly more sophisticated task: create a
@@ -16,7 +16,7 @@ For instance, we can compute the
gradient of :math:`x^2` with respect to :math:`x`. Note that:
:math:`d(x^2)/dx = 2 \cdot x`.

Here is the code to compute this gradient:

.. If you modify this code, also change :
.. theano/tests/test_tutorial.py:T_examples.test_examples_4
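As a quick plain-Python sanity check (not Theano code), a central finite difference recovers the same derivative :math:`2x`:

```python
def f(x):
    return x ** 2

def approx_grad(f, x, eps=1e-6):
    """Central-difference approximation of df/dx at x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# The numerical estimate matches 2*x at several points.
for x in (0.0, 1.0, -3.0, 94.2):
    assert abs(approx_grad(f, x) - 2 * x) < 1e-3
```

Theano's ``T.grad`` produces this derivative symbolically instead of numerically, so no step size or approximation error is involved.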
@@ -74,15 +74,14 @@ array([[ 0.25 , 0.19661193],
In general, for any **scalar** expression ``s``, ``T.grad(s, w)`` provides
the Theano expression for computing :math:`\frac{\partial s}{\partial w}`. In
this way Theano can be used for doing **efficient** symbolic differentiation
(as the expression returned by ``T.grad`` will be optimized during compilation), even for
functions with many inputs (see `automatic differentiation <http://en.wikipedia.org/wiki/Automatic_differentiation>`_ for a description
of symbolic differentiation).

.. note::

    The second argument of ``T.grad`` can be a list, in which case the
    output is also a list. The order in both lists is important: element
    *i* of the output list is the gradient of the first argument of
    ``T.grad`` with respect to the *i*-th element of the list given as second argument.
    The first argument of ``T.grad`` has to be a scalar (a tensor
@@ -90,7 +89,6 @@ of symbolic differentiation).
``T.grad`` and details about the implementation, see :ref:`this <libdoc_gradient>`.
Computing the Jacobian
======================

@@ -105,10 +103,10 @@ do is to loop over the entries in ``y`` and compute the gradient of
.. note::

    ``scan`` is a generic op in Theano that allows writing in a symbolic
    manner all kinds of recurrent equations. While creating
    symbolic loops (and optimizing them for performance) is a hard task,
    effort is ongoing to improve the performance of ``scan``. For more
    information about how to use this op, see :ref:`this <lib_scan>`.

@@ -120,15 +118,15 @@ do is to loop over the entries in ``y`` and compute the gradient of
array([[ 8.,  0.],
       [ 0.,  8.]])

What we do in this code is to generate a sequence of ints from ``0`` to
``y.shape[0]`` using ``T.arange``. Then we loop through this sequence, and
at each step, we compute the gradient of element ``y[i]`` with respect to
``x``. ``scan`` automatically concatenates all these rows, generating a
matrix which corresponds to the Jacobian.
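The same row-by-row construction can be mimicked numerically in plain Python (an illustrative sketch, not Theano code), here for the elementwise map ``y = x**2`` whose Jacobian is diagonal with entries ``2*x``:

```python
def jacobian(f, x, eps=1e-6):
    """Finite-difference Jacobian J[j][i] = d f(x)[j] / d x[i]."""
    y0 = f(x)
    cols = []
    for i in range(len(x)):
        xp = list(x)
        xp[i] += eps
        cols.append([(yj - y0j) / eps for yj, y0j in zip(f(xp), y0)])
    # transpose the columns into rows, like scan stacking one row per output
    return [[cols[i][j] for i in range(len(x))] for j in range(len(y0))]

square = lambda xs: [v * v for v in xs]
J = jacobian(square, [4.0, 4.0])
# diagonal entries approximate 2*x = 8, off-diagonal entries are 0
assert all(abs(J[j][i] - (8.0 if i == j else 0.0)) < 1e-3
           for j in range(2) for i in range(2))
```

The symbolic ``scan`` version computes exactly the same matrix, but without the finite-difference approximation.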
.. note::

    There are a few pitfalls to be aware of regarding ``T.grad``. One of them is that you
    cannot re-write the above expression of the Jacobian as
    ``theano.scan(lambda y_i,x: T.grad(y_i,x), sequences=y,
    non_sequences=x)``, even though from the documentation of scan this
    seems possible. The reason is that ``y_i`` will not be a function of
@@ -142,7 +140,7 @@ Theano implements :func:`theano.gradient.hessian` macro that does all
that is needed to compute the Hessian. The following text explains how
to do it manually.

You can compute the Hessian manually similarly to the Jacobian. The only
difference is that now, instead of computing the Jacobian of some expression
``y``, we compute the Jacobian of ``T.grad(cost,x)``, where ``cost`` is some
scalar.
@@ -159,34 +157,33 @@ array([[ 2., 0.],
       [ 0.,  2.]])

Jacobian times a Vector
=======================

Sometimes we can express the algorithm in terms of Jacobians times vectors,
or vectors times Jacobians. Compared to evaluating the Jacobian and then
doing the product, there are methods that compute the desired results while
avoiding actual evaluation of the Jacobian. This can bring about significant
performance gains. A description of one such algorithm can be found here:

* Barak A. Pearlmutter, "Fast Exact Multiplication by the Hessian", *Neural
  Computation, 1994*

While in principle we would want Theano to identify these patterns automatically for us,
in practice, implementing such optimizations in a generic manner is extremely
difficult. Therefore, we offer special functions dedicated to these tasks.
R-operator
----------

The *R operator* is built to evaluate the product between a Jacobian and a
vector, namely :math:`\frac{\partial f(x)}{\partial x} v`. The formulation
can be extended even to the case where `x` is a matrix, or a tensor in general,
in which case the Jacobian also becomes a tensor and the product becomes some
kind of tensor product. Because in practice we end up needing to compute such
expressions in terms of weight matrices, Theano supports this more generic
form of the operation. In order to evaluate the *R-operation* of
expression ``y`` with respect to ``x``, multiplying the Jacobian by ``v``,
you need to do something similar to this:

@@ -205,10 +202,10 @@ array([ 2., 2.])
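Numerically, the Jacobian-times-vector product amounts to a single directional derivative, which can be sketched in plain Python (not Theano code; the linear map ``W`` here is just an illustration):

```python
def jvp(f, x, v, eps=1e-6):
    """Approximate (df/dx) @ v at x with one directional central difference."""
    xp = [xi + eps * vi for xi, vi in zip(x, v)]
    xm = [xi - eps * vi for xi, vi in zip(x, v)]
    return [(a - b) / (2 * eps) for a, b in zip(f(xp), f(xm))]

# f(x) = W x with W = [[1, 2], [3, 4]]; for a linear map the Jacobian is W.
W = [[1.0, 2.0], [3.0, 4.0]]
f = lambda x: [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

v = [1.0, 1.0]
result = jvp(f, [0.5, -0.5], v)
# W @ v = [3, 7], obtained without ever forming the Jacobian explicitly
assert all(abs(r - e) < 1e-4 for r, e in zip(result, [3.0, 7.0]))
```

This is the key point of the R-operator: one extra function evaluation per product, instead of one per input dimension.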
L-operator
----------

Similar to the *R-operator*, the *L-operator* computes a *row* vector times
the Jacobian. The mathematical formula would be :math:`v \frac{\partial
f(x)}{\partial x}`. As for the *R-operator*, the *L-operator* is supported
for generic tensors (not only for vectors). Similarly, it can be implemented as
follows:

>>> W = T.dmatrix('W')
@@ -226,21 +223,21 @@ array([[ 0., 0.],
`v`, the evaluation point, differs between the *L-operator* and the *R-operator*.
For the *L-operator*, the evaluation point needs to have the same shape
as the output, while for the *R-operator* the evaluation point should
have the same shape as the input parameter. Also, the results of these two
operations differ. The result of the *L-operator* is of the same shape
as the input parameter, while the result of the *R-operator* is the same
as the output.
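The shape rules above can be checked with a plain-Python numerical stand-in (not Theano code): ``v`` times the Jacobian equals the gradient of the scalar ``sum(v * f(x))`` with respect to ``x``, so the result has the input's shape.

```python
def vjp(f, x, v, eps=1e-6):
    """Approximate v @ (df/dx) at x, entry by entry."""
    def scalar(z):
        # v has the same shape as the output of f
        return sum(vi * yi for vi, yi in zip(v, f(z)))
    out = []
    for i in range(len(x)):
        xp = list(x); xp[i] += eps
        xm = list(x); xm[i] -= eps
        out.append((scalar(xp) - scalar(xm)) / (2 * eps))
    return out

# f(x) = W x; v @ W picks out a combination of W's rows.
W = [[1.0, 2.0], [3.0, 4.0]]
f = lambda x: [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

v = [1.0, 0.0]          # same shape as the output of f
result = vjp(f, [0.2, 0.7], v)
# v @ W is the first row of W: [1, 2]; the result has the input's shape
assert all(abs(r - e) < 1e-4 for r, e in zip(result, [1.0, 2.0]))
```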
Hessian times a Vector
======================

If you need to compute the Hessian times a vector, you can make use of the
above-defined operators to do it more efficiently than actually computing
the exact Hessian and then performing the product. Due to the symmetry of the
Hessian matrix, you have two options that will
give you the same result, though they might exhibit different performance.
Hence, we suggest profiling the methods before using either of the two:

>>> x = T.dvector('x')
@@ -265,18 +262,18 @@ or, making use of the *R-operator*:
array([ 4., 4.])
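A plain-Python numerical check of the idea (not Theano code): for the cost ``sum(x ** 2)`` the Hessian is :math:`2I`, so the Hessian-times-vector product is simply ``2 * v``. Approximating it as a directional derivative of the gradient never forms the Hessian.

```python
def grad(x):
    """Analytic gradient of the cost sum(x ** 2)."""
    return [2.0 * xi for xi in x]

def hessian_times_vector(grad_fn, x, v, eps=1e-6):
    """Directional central difference of the gradient along v."""
    xp = [xi + eps * vi for xi, vi in zip(x, v)]
    xm = [xi - eps * vi for xi, vi in zip(x, v)]
    return [(a - b) / (2 * eps) for a, b in zip(grad_fn(xp), grad_fn(xm))]

hv = hessian_times_vector(grad, [1.0, 1.0], [2.0, 2.0])
# H @ v = 2 * v = [4, 4] for this cost
assert all(abs(a - b) < 1e-4 for a, b in zip(hv, [4.0, 4.0]))
```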
Final Pointers
==============

* The ``grad`` function works symbolically: it takes and returns a Theano variable.
* It can be compared to a macro since it can be applied repeatedly.
* It directly handles scalar costs only.
* Built-in functions allow efficient computation of vector times Jacobian and vector times Hessian.
* Work is in progress on the optimizations required to compute efficiently the full
  Jacobian and Hessian matrices and the Jacobian times vector.
@@ -24,7 +24,7 @@ as you would in the course of any other Python program.
.. _pickle: http://docs.python.org/library/pickle.html

The Basics of Pickling
======================

The two modules ``pickle`` and ``cPickle`` have the same functionalities, but
@@ -45,7 +45,7 @@ You can serialize (or *save*, or *pickle*) objects to a file with
.. note::

    If you want your saved object to be stored efficiently, don't forget
    to use ``cPickle.HIGHEST_PROTOCOL``. The resulting file can be
    dozens of times smaller than with the default protocol.

.. note::

@@ -81,7 +81,7 @@ For more details about pickle's usage, see
`Python documentation <http://docs.python.org/library/pickle.html#usage>`_.
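A minimal runnable illustration of the points above, using the standard ``pickle`` module (``cPickle`` is its C implementation under Python 2; the ``data`` dictionary is just a placeholder object):

```python
import pickle

data = {"weights": [0.1, 0.2, 0.3], "bias": 0.0}

# Serialize with the highest protocol for a more compact byte stream.
blob = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

# Deserialize and check that an equal (but distinct) object comes back.
restored = pickle.loads(blob)
assert restored == data
assert restored is not data
```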
Short-Term Serialization
========================

If you are confident that the class instance you are serializing will be
@@ -114,7 +114,7 @@ For instance, you can define functions along the lines of:
        self.training_set = cPickle.load(file(self.training_set_file, 'rb'))

Long-Term Serialization
=======================

If the implementation of the class you want to save is quite unstable, for
@@ -138,7 +138,7 @@ matrix ``W`` and a bias ``b``, you can define:
        self.W = W
        self.b = b

If at some point in time ``W`` is renamed to ``weights`` and ``b`` to
``bias``, the older pickled files will still be usable if you update these
functions to reflect the change in name:
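The pattern can be sketched with ``__getstate__``/``__setstate__`` in a runnable form (the class name here is hypothetical, and plain ``pickle`` stands in for ``cPickle``): the state dictionary keeps the historical names ``W`` and ``b``, so old pickles remain loadable after the attributes are renamed.

```python
import pickle

class Layer(object):
    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def __getstate__(self):
        # Keep writing the historical attribute names.
        return {"W": self.weights, "b": self.bias}

    def __setstate__(self, state):
        # Accept the historical names when reading old pickles.
        self.weights = state["W"]
        self.bias = state["b"]

layer = Layer([1.0, 2.0], 0.5)
copy = pickle.loads(pickle.dumps(layer))
assert copy.weights == [1.0, 2.0] and copy.bias == 0.5
```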
......
@@ -8,17 +8,17 @@ Loop
Scan
====

- A general form of *recurrence*, which can be used for looping.
- *Reduction* and *map* (loop over the leading dimensions) are special cases of scan.
- You 'scan' a function along some input sequence, producing an output at each time-step.
- The function can see the *previous K time-steps* of your function.
- ``sum()`` could be computed by scanning the ``z + x(i)`` function over a list, given an initial state of ``z=0``.
- Often a for-loop can be expressed as a ``scan()`` operation, and ``scan`` is the closest that Theano comes to looping.
- Advantages of using ``scan`` over for loops:

  - The number of iterations can be part of the symbolic graph.
  - Minimizes GPU transfers (if a GPU is involved).
  - Computes gradients through sequential steps.
  - Slightly faster than using a for loop in Python with a compiled Theano function.
  - Can lower the overall memory usage by detecting the actual amount of memory needed.
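The behaviour described above can be modelled in plain Python (this is not the Theano ``scan`` API, just its conceptual shape): step a function along a sequence, carrying a state, and collect one output per time-step. With the step ``z + x(i)`` and initial state ``z=0`` it reduces to ``sum()``.

```python
def python_scan(step, sequence, initial_state):
    """Plain-Python model of scan: carry a state, emit one output per step."""
    state, outputs = initial_state, []
    for x in sequence:
        state = step(state, x)
        outputs.append(state)
    return outputs

outs = python_scan(lambda z, x: z + x, [1, 2, 3, 4], 0)
assert outs == [1, 3, 6, 10]          # running sums at each time-step
assert outs[-1] == sum([1, 2, 3, 4])  # the last output is the reduction
```

Theano's ``scan`` additionally builds this loop into the symbolic graph, so gradients can flow through the sequential steps.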
@@ -81,8 +81,9 @@ The full documentation can be found in the library: :ref:`Scan <lib_scan>`.
**Exercise**

Run both examples.

Modify and execute the polynomial example to have the reduction done by ``scan``.

-------------------------------------------
@@ -2,15 +2,15 @@
.. _using_modes:

==========================================
Configuration Settings and Compiling Modes
==========================================

Configuration
=============

The ``config`` module contains several attributes that modify Theano's behavior. Many of these
attributes are examined during the import of the ``theano`` module and several are assumed to be
read-only.

*As a rule, the attributes in this module should not be modified by user code.*
@@ -38,7 +38,7 @@ variables, type this from the command-line:
**Exercise**

Consider the logistic regression:

.. code-block:: python

@@ -63,7 +63,6 @@ Consider once again the logistic regression:
    # Construct Theano expression graph
    p_1 = 1 / (1 + T.exp(-T.dot(x, w)-b))   # Probability of having a one
    prediction = p_1 > 0.5                  # The prediction that is done: 0 or 1
@@ -91,7 +90,6 @@ Consider once again the logistic regression:
    print train.maker.fgraph.toposort()
    for i in range(training_steps):
        pred, err = train(D[0], D[1])
@@ -105,19 +103,24 @@ Consider once again the logistic regression:
Modify and execute this example to run on CPU (the default) with floatX=float32 and
time the execution using the command line ``time python file.py``.

.. TODO: To be resolved:
.. You will need to use: ``theano.config.floatX`` and ``ndarray.astype("str")``
.. Why the latter portion?
.. Note::

    * Apply the Theano flag ``floatX=float32`` through ``theano.config.floatX`` in your code.
    * Cast inputs before storing them into a shared variable.
    * Circumvent the automatic cast of int32 with float32 to float64:

      * Insert manual cast in your code or use [u]int{8,16}.
      * Insert manual cast around the mean operator (this involves division by length, which is an int64).

    * Notice that a new casting mechanism is being developed.

-------------------------------------------
@@ -237,25 +240,25 @@ is quite strict.
ProfileMode
===========

Besides checking for errors, another important task is to profile your
code. For this Theano uses a special mode called ProfileMode which has
to be passed as an argument to :func:`theano.function <function.function>`.
Using the ProfileMode is a three-step process.

.. note::

    To switch the default accordingly, set the Theano flag
    :attr:`config.mode` to ProfileMode. In that case, when the Python
    process exits, it will automatically print the profiling
    information on the standard output.

The memory profile of the output of each ``apply`` node can be enabled with the
Theano flag :attr:`config.ProfileMode.profile_memory`.
Creating a ProfileMode Instance
-------------------------------

First create a ProfileMode instance:

>>> from theano import ProfileMode
>>> profmode = theano.ProfileMode(optimizer='fast_run', linker=theano.gof.OpWiseCLinker())
@@ -270,7 +273,7 @@ implementations wherever possible should use the ``gof.OpWiseCLinker``
using the 'fast_run' optimizer and ``gof.OpWiseCLinker`` linker.
Compiling your Graph with ProfileMode
-------------------------------------

Once the ProfileMode instance is created, simply compile your graph as you
would normally, by specifying the mode parameter.

@@ -282,17 +285,13 @@ would normally, by specifying the mode parameter.
>>> minst = m.make(mode=profmode)
Retrieving Timing Information
-----------------------------
Once your graph is compiled, simply run the program or operation you wish to
profile, then call ``profmode.print_summary()``. This will provide you with
the desired timing information, indicating where your graph is spending most
of its time. This is best shown through an example. Let's use our logistic
regression example.
Compiling the module with ProfileMode and calling ``profmode.print_summary()``
generates the following output:

@@ -344,16 +343,16 @@ generates the following output:

"""
This output has two components. In the first section, called the
*Apply-wise summary*, timing information is provided for the worst
offending Apply nodes. This corresponds to the individual Op applications
within your graph which took the longest to execute (so if you use
``dot`` twice, you will see two entries there). In the second portion,
the *Op-wise summary*, the execution times of all Apply nodes executing
the same Op are grouped together and the total execution time per Op
is shown (so if you use ``dot`` twice, you will see only one entry
there, corresponding to the sum of the time spent in each of them).

Finally, notice that the ProfileMode also shows which Ops were running a C
implementation.
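The relationship between the two summaries can be illustrated with a small, hypothetical sketch in plain Python (not part of Theano): grouping per-Apply timings by Op reproduces the Op-wise totals.

```python
from collections import defaultdict

# Hypothetical per-Apply timings: two separate applications of "dot"
# plus one elementwise exp.
apply_times = [
    ("dot", 0.031),
    ("elemwise{exp}", 0.012),
    ("dot", 0.025),
]

# Apply-wise summary: one entry per application, worst offenders first.
apply_wise = sorted(apply_times, key=lambda t: t[1], reverse=True)

# Op-wise summary: total time per Op, so both "dot" applications
# collapse into a single entry.
op_wise = defaultdict(float)
for op, t in apply_times:
    op_wise[op] += t
```

Here ``apply_wise`` keeps two separate ``dot`` entries while ``op_wise`` has a single ``dot`` entry holding their summed time, mirroring the two sections of the printed summary.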
@@ -24,7 +24,7 @@ where each example has dimension 5. If this would be the input of a

neural network then the weights from the input to the first hidden
layer would represent a matrix of size (5, #hid).
Consider this array:
>>> numpy.asarray([[1., 2], [3, 4], [5, 6]])
array([[ 1., 2.],

@@ -61,5 +61,5 @@ array([2., 4., 6.])

The smaller array ``b`` (actually a scalar here, which works like a 0-d array) in this case is *broadcasted* to the same size
as ``a`` during the multiplication. This trick is often useful in
simplifying how expressions are written. More details about *broadcasting*
can be found in the `numpy user guide <http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html>`__.
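Broadcasting is not limited to scalars: arrays of different but compatible shapes are stretched in the same way. The following sketch is plain numpy, independent of Theano, and reuses the (3, 2) array from above:

```python
import numpy

a = numpy.asarray([[1., 2], [3, 4], [5, 6]])  # shape (3, 2)
b = numpy.asarray([10., 100])                 # shape (2,)

# b is broadcast along the first axis, as if it were tiled to (3, 2):
# each row of a is multiplied elementwise by [10., 100.].
c = a * b
```

The result ``c`` has shape (3, 2), with first row ``[10., 200.]``.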
@@ -5,7 +5,8 @@

Python tutorial
***************

In this documentation, we suppose that the reader knows Python. Here is a small list of Python
tutorials/exercises if you need to learn it or only need a refresher:

* `Python Challenge <http://www.pythonchallenge.com/>`__
* `Dive into Python <http://diveintopython.net/>`__
.. _shape_info:

==========================================
How Shape Information is Handled by Theano
==========================================

It is not possible to strictly enforce the shape of a Theano variable when
building a graph, since the particular value provided for a parameter of
theano.function can change the shape of any Theano variable in its graph.

Currently, information regarding shape is used in two ways in Theano:

- When the exact output shape is known, to generate faster C code for
  the 2d convolution on the CPU and GPU.

- To remove computations in the graph when we only want to know the
  shape, but not the actual value of a variable. This is done with the
@@ -32,11 +32,11 @@ Currently shape informations are used for 2 things in Theano:

# |Shape_i{1} [@43797968] '' 0
# | |x [@43423568]

The output of this compiled function does not contain any multiplication
or power: Theano has removed them to compute the shape of the
output directly.
Shape Inference Problem
=======================

Theano propagates shape information in the graph. Sometimes this
@@ -83,20 +83,20 @@ can lead to errors. For example:

# |y [@44540304]

f(xv,yv)
# Raises a dimensions mismatch error.
As you can see, when asking only for the shape of some computation (``join`` in the
example), an inferred shape is computed directly, without executing
the computation itself (there is no ``join`` in the output of debugprint).
This makes the computation of the shape faster, but it can also hide errors. In
the example, the computation of the shape of ``join`` is done on the first
Theano variable in the ``join`` computation and not on the others.
This might happen with other ops such as ``elemwise`` and ``dot``.
Indeed, to make some optimizations (for speed or stability, for instance),
Theano assumes that the computation is correct and consistent
in the first place, as it does here.
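To make the pitfall concrete, here is a hypothetical pure-Python sketch (not Theano code) of a shape-inference rule for ``join`` along axis 0 that, like the optimization described above, trusts the first input and never compares the remaining dimensions:

```python
def inferred_join_shape(shapes, axis=0):
    """Hypothetical inference rule: sum the joined axis, and take every
    other dimension from the FIRST input without checking the rest."""
    first = list(shapes[0])
    first[axis] = sum(s[axis] for s in shapes)
    return tuple(first)

# Consistent inputs: the inferred shape matches what join would produce.
ok = inferred_join_shape([(2, 5), (3, 5)])

# Inconsistent inputs: actually executing the join would raise a
# dimension mismatch error, but the inference silently reports a shape
# based on the first input alone, hiding the error.
hidden_error = inferred_join_shape([(2, 5), (3, 7)])
```

Both calls return ``(5, 5)``, which is exactly why asking only for the shape can mask an input mismatch that the real computation would reject.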
You can detect those problems by running the code without this
optimization, with the Theano flag

@@ -106,23 +106,23 @@ optimization, nor most other optimizations) or DEBUG_MODE (it will test

before and after all optimizations (much slower)).
Specifying Exact Shape
======================

Currently, specifying a shape is not as easy and flexible as we want, and we
plan some upgrades. Here is the current state of what can be done:
- You can pass the shape info directly to the `ConvOp` created
  when calling ``conv2d``. You simply add the parameters image_shape
  and filter_shape to the call. They must be tuples of 4
  elements. For example:

  .. code-block:: python

      theano.tensor.nnet.conv2d(..., image_shape=(7,3,5,5), filter_shape=(2,3,4,4))
- You can use the SpecifyShape op to add shape info anywhere in the
  graph. This enables some optimizations. In the following example,
  it allows Theano to precompute the function's output as a constant.

  .. code-block:: python

      theano.printing.debugprint(f)
      # [2 2] [@72791376]
Future Plans
============

- Add the parameter "constant shape" to theano.shared(). This is probably
  the most frequent case with shared variables. This will make the code
  simpler and will make it possible to check that the shape does not change when
  updating the shared variable.
@@ -12,7 +12,7 @@ Theano Graphs

Debugging or profiling code written in Theano is not that simple if you
do not know what goes on under the hood. This chapter is meant to
introduce you to a required minimum of the inner workings of Theano;
for more detail, see :ref:`extending`.
The first step in writing Theano code is to write down all mathematical
relations using symbolic placeholders (**variables**). When writing down

@@ -28,8 +28,8 @@ Theano builds internally a graph structure composed of interconnected

**variables**. It is important to distinguish between the
definition of a computation, represented by an **op**, and its application
to some actual data, represented by the **apply** node. For more
detail about these building blocks see :ref:`variable`, :ref:`op`,
:ref:`apply`. Here is an example of a graph:
**Code**

@@ -77,7 +77,7 @@ output. You can now print the name of the op that is applied to get

>>> y.owner.op.name
'Elemwise{mul,no_inplace}'
Hence, an elementwise multiplication is used to compute ``y``. This
multiplication is done between the inputs:

>>> len(y.owner.inputs)

@@ -101,9 +101,9 @@ same shape as x. This is done by using the op ``DimShuffle`` :

[2.0]
Starting from this graph structure it is easier to understand how
*automatic differentiation* proceeds and how the symbolic relations
can be *optimized* for performance or stability.
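The structure just described (a variable pointing to the apply node that produced it, which in turn points to an op and to its input variables) can be mimicked with a minimal, hypothetical sketch in plain Python; none of these classes are Theano's actual implementation:

```python
class Op:
    """The definition of a computation (e.g. elementwise multiplication)."""
    def __init__(self, name):
        self.name = name

class Apply:
    """One application of an Op to concrete input variables."""
    def __init__(self, op, inputs, output):
        self.op, self.inputs = op, inputs
        output.owner = self  # link the result variable back to this node

class Variable:
    """A symbolic placeholder; owner is None for graph inputs."""
    def __init__(self, name=None):
        self.name, self.owner = name, None

def mul(a, b):
    out = Variable()
    Apply(Op("Elemwise{mul,no_inplace}"), [a, b], out)
    return out

x = Variable("x")
y = mul(x, Variable("two"))
```

With this sketch, ``y.owner.op.name`` and ``len(y.owner.inputs)`` behave like the doctest shown earlier: the output variable knows which apply node produced it, and the apply node knows its op and its two inputs.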
Automatic Differentiation
=========================

@@ -159,4 +159,4 @@ Consider the following example of optimization:
.. image:: ../hpcs2011_tutorial/pics/f_unoptimized.png
.. image:: ../hpcs2011_tutorial/pics/f_optimized.png

Symbolic programming involves a paradigm shift: it is best to use it in order to understand it.