Commit c86c72f4 authored by Eric Larsen, committed by Frederic

Correct Theano's tutorial: typos and layout

Parent 3bffa49b
.. _glossary:

Glossary
========

.. glossary::
.. _adding:

====================
Baby Steps - Algebra
====================

Adding two Scalars
==================
So, to get us started with Theano and get a feel of what we're working with,
argument is what we want to see as output when we apply the function.
``f`` may then be used like a normal Python function.
Adding two Matrices
===================
You might already have guessed how to do this. Indeed, the only change
Understanding Memory Aliasing for Speed and Correctness
=======================================================
The aggressive reuse of memory is one of the ways Theano makes code fast, and
it's important for the correctness and speed of your program that you understand
which buffers Theano might alias to which others.
This section describes the principles by which Theano treats memory, and explains
when you might want to alter the default behaviour of some functions and
methods for faster performance.
The Memory Model: Two Spaces
============================
There are some simple principles that guide Theano's treatment of memory. The
main idea is that there is a pool of memory managed by Theano, and Theano tracks
changes to values in that pool.
* Theano manages its own memory space, which typically does not overlap with
  the memory of normal Python variables that non-Theano code creates.

* Theano functions only modify buffers that are in Theano's memory space.

* Theano's memory space includes the buffers allocated to store shared
  variables and the temporaries used to evaluate functions.

* Physically, Theano's memory space may be spread across the host, one or
  more GPU devices, and in the future may even include objects on a remote
  machine.

* The memory allocated for a shared variable buffer is unique: it is never
  aliased to another shared variable.

* Theano's managed memory is constant while neither Theano functions nor
  Theano's library code are running.

* The default behaviour of a function is to return user-space values for
  outputs, and to expect user-space values for inputs.
The distinction between Theano-managed memory and user-managed memory can be
broken down by some Theano functions (e.g. shared, get_value and the
operations) at the expense of risking subtle bugs in the overall program (by
aliasing memory).
The rest of this section is aimed at helping you to understand when it is safe
to use the ``borrow=True`` argument and reap the benefits of faster code.
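The hazard that ``borrow=True`` trades against can be pictured in plain NumPy (only an analogy for the aliasing idea, not Theano's actual internals): a borrowed buffer behaves like a view that shares memory with the original, so writes through one name are visible through the other, while a copy stays independent.

```python
import numpy

a = numpy.zeros(4)

alias = a        # "borrowed": both names refer to the same buffer
safe = a.copy()  # independent buffer: a private copy of the data

a[0] = 42.0

# the alias observes the in-place modification; the copy does not
print(alias[0], safe[0])
```

This is exactly the trade-off described above: sharing a buffer avoids a copy, at the risk of one piece of code clobbering data another piece still relies on.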
Borrowing when Creating Shared Variables
========================================
A ``borrow`` argument can be provided to the shared-variable constructor.
It is not a reliable technique to use ``borrow=True`` to modify shared variables
by side-effect, because with some devices (e.g. GPU devices) this technique will
not work.
Borrowing when Accessing the Value of Shared Variables
======================================================
Retrieving
The reason that ``borrow=True`` might still make a copy is that the internal
representation of a shared variable might not be what you expect. When you
create a shared variable by passing a numpy array for example, then ``get_value()``
must return a numpy array too. That's how Theano can make the GPU use
transparent. But when you are using a GPU (or in the future perhaps a remote machine), then the numpy.ndarray
is not the internal representation of your data.
If you really want Theano to return its internal representation *and never copy it*
then you should use the ``return_internal_type=True`` argument to
be costly. Here are a few tips to ensure fast and efficient use of GPU memory
here: :ref:`libdoc_cuda_var`)
Retrieving and Assigning via the .value Property
------------------------------------------------
Shared variables have a ``.value`` property that is connected to ``get_value``
potential impact on your code, use the ``.get_value`` and ``.set_value`` methods
directly with appropriate flags.
Borrowing when Constructing Function Objects
============================================
A ``borrow`` argument can also be provided to the ``In`` and ``Out`` objects
hints that give more flexibility to the compilation and optimization of the
graph.
*Take home message:*
When an input ``x`` to a function is not needed after the function returns and you
would like to make it available to Theano as additional workspace, then consider
marking it with ``In(x, borrow=True)``. It may make the function faster and
Conditions
==========
IfElse vs Switch
================
- Both Ops build a condition over symbolic variables.
- ``IfElse`` takes a `boolean` condition and two variables as inputs.
- ``Switch`` takes a `tensor` as condition and two variables as inputs.
  ``switch`` is an elementwise operation and is more general than ``ifelse``.
- Whereas ``switch`` evaluates both 'output' variables, ``ifelse`` is lazy and
  only evaluates one variable, depending on the condition.
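The semantic difference can be mimicked in plain NumPy (an illustration only; Theano's Ops operate on symbolic graphs): ``numpy.where`` makes an elementwise choice like ``switch``, while an ordinary Python ``if`` picks exactly one whole branch, like the lazy ``ifelse``.

```python
import numpy

cond = numpy.array([True, False, True])
a = numpy.array([1.0, 2.0, 3.0])
b = numpy.array([10.0, 20.0, 30.0])

# switch-like: an elementwise choice between the two branches
switched = numpy.where(cond, a, b)   # -> [ 1., 20.,  3.]

# ifelse-like: a single boolean selects one branch; only that branch
# would need to be evaluated (laziness)
scalar_cond = True
chosen = a if scalar_cond else b
```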
**Example**
since it computes only one variable instead of both.
time spent evaluating one value 0.3500 sec
Unless ``linker='vm'`` or ``linker='cvm'`` is used, ``ifelse`` will compute
both variables and take the same computation time as ``switch``. The linker is
not currently set to 'cvm' by default, but it will be in the near future.
There is no optimization that automatically replaces a ``switch`` with a
broadcasted scalar by an ``ifelse``, as this is not always faster. See
this `ticket <http://www.assembla.com/spaces/theano/tickets/764>`_.
Debugging Theano: FAQ and Troubleshooting
=========================================
There are many kinds of bugs that might come up in a computer program.
This page is structured as an FAQ. It should provide recipes to tackle common
problems, and introduce some of the tools that we use to find problems in our
Theano code, and even (it happens) in Theano's internals, such as
:ref:`using_debugmode`.
Isolating the Problem/Testing the Theano Compiler
-------------------------------------------------
You can run your Theano function in DebugMode (:ref:`using_debugmode`).
This tests the Theano optimizations and helps to find where NaN, inf and
other problems come from.
Using Test Values
-----------------
can get Theano to give us the exact source of the error.
.. code-block:: python

    # provide Theano with a default test-value
    x.tag.test_value = numpy.random.rand(5, 10)
In the above, we are tagging the symbolic matrix ``x`` with a special test
value. This allows Theano to evaluate symbolic expressions on-the-fly (by
calling the ``perform`` method of each Op), as they are being defined. Sources
of error can thus be identified with much more precision and much earlier in
following error message, which properly identifies line 23 as the culprit.
The compute_test_value mechanism works as follows:
* Theano constants and shared variables are used as is. No need to instrument them.
* A Theano ``Variable`` (i.e. ``dmatrix``, ``vector``, etc.) should be
  given a special test value through the attribute ``tag.test_value``.
* Theano automatically instruments intermediate results. As such, any quantity
  derived from ``x`` will be given a ``tag.test_value`` automatically.
variable is missing a test value.
.. note::
    This feature is currently incompatible with ``Scan`` and with Ops
    which do not implement a ``perform`` method.
How do I Print an Intermediate Value in a Function/Method?
----------------------------------------------------------
Theano provides a 'Print' Op to do this.
precise inspection of what's being computed where, when, and how, see the
to remove them to know if this is the cause or not.
How do I Print a Graph (before or after compilation)?
-----------------------------------------------------
Theano provides two functions (:func:`theano.pp` and
You can read about them in :ref:`libdoc_printing`.
The Function I Compiled is Too Slow, What's Up?
-----------------------------------------------
First, make sure you're running in FAST_RUN mode.
FAST_RUN is the default mode, but make sure by passing ``mode='FAST_RUN'``
.. _faq_wraplinker:
How do I Step through a Compiled Function with the WrapLinker?
--------------------------------------------------------------
This is not exactly an FAQ, but the doc is here for now...
It's pretty easy to roll your own evaluation mode.
Check out this one:
Use your imagination :)
This can be a really powerful debugging tool.
Note the call to ``fn`` inside the call to ``print_eval``; without it, the graph wouldn't get computed at all!
How to Use pdb?
---------------
In the majority of cases, you won't be executing from the interactive shell
The call stack contains a few useful pieces of information to trace back the source
of the error. There's the script where the compiled function was called --
but if you're using (improperly parameterized) prebuilt modules, the error
might originate from ops in these modules, not this script. The last line
tells us about the Op that caused the exception. In this case it's a "mul"
involving Variables named "a" and "b". But suppose we instead had an
intermediate result to which we hadn't given a name.
.. _basictutexamples:
=============
More Examples
=============
Logistic Function
=================
Here's another straightforward example, though a bit more elaborate
array([[ 0.5       ,  0.73105858],
[ 0.26894142, 0.11920292]])
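The values above can be cross-checked in plain NumPy (a numerical sanity check of the output, not the Theano code itself):

```python
import numpy

def logistic(x):
    # s(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + numpy.exp(-x))

x = numpy.array([[0.0, 1.0], [-1.0, -2.0]])
s = logistic(x)
# matches the tutorial's output:
# [[ 0.5         0.73105858]
#  [ 0.26894142  0.11920292]]
```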
Computing More than One Thing at the Same Time
==============================================
Theano supports functions with multiple outputs. For example, we can
was reformatted for readability):
[ 1., 4.]])]
Setting a Default Value for an Argument
=======================================
Let's say you want to define a function that adds two numbers, except
array(33.0)
.. _functionstateexample:
Using Shared Variables
======================
It is also possible to make a function with an internal state. For
array(0)
You might be wondering why the updates mechanism exists. You can always
achieve a similar thing by returning the new expressions, and working with
them in NumPy as usual. The updates mechanism can be a syntactic convenience,
but it is mainly there for efficiency. Updates to shared variables can
sometimes be done more quickly using in-place algorithms (e.g. low-rank matrix
updates). Also, Theano has more control over where and how shared variables are
array(7)
>>> state.get_value() # old state still there, but we didn't use it
array(0)
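The stateful behaviour demonstrated above can be sketched in plain Python (an analogy for shared variables and the ``updates`` mechanism, not Theano's API): the state lives outside the function, the function returns the old value, and the "update" is applied afterwards.

```python
def make_accumulator():
    state = {"value": 0}   # plays the role of a shared variable

    def accumulator(inc):
        current = state["value"]
        state["value"] = current + inc   # the "update" applied after the call
        return current

    return accumulator

acc = make_accumulator()
print(acc(1))    # returns 0: the old state, like the tutorial's example
print(acc(10))   # returns 1: the update from the previous call is now visible
```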
The ``givens`` parameter can be used to replace any symbolic variable, not just a
shared variable. You can replace constants and expressions in general. Be
careful, though, not to allow the expressions introduced by a ``givens``
substitution to be co-dependent; the order of substitution is not defined, so
the substitutions have to work in any order.
In practice, a good way of thinking about ``givens`` is as a mechanism
that allows you to replace any part of your formula with a different
expression that evaluates to a tensor of the same shape and dtype.
.. _using_random_numbers:
Using Random Numbers
====================
Because in Theano you first express everything symbolically and
afterwards compile this expression to get functions,
using pseudo-random numbers is not as straightforward as it is in
NumPy, though it is also not too complicated.
The way to think about putting randomness into Theano's computations is
to put random variables in your graph. Theano will allocate a NumPy
RandomStream object (a random number generator) for each such
variable, and draw from it as necessary. We will call this sort of
sequence of random numbers a *random stream*. *Random streams* are at
their core shared variables, so the observations on shared variables
hold here as well.
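The stream idea can be pictured with NumPy's own generators (only an analogy; Theano's random streams wrap this kind of machinery in shared variables): each stream is a stateful generator, drawing from it advances its state, and seeding it fixes the whole sequence it will produce.

```python
import numpy

stream = numpy.random.RandomState(902340)  # one seeded stream
first = stream.uniform(size=2)             # drawing advances the state,
second = stream.uniform(size=2)            # so consecutive draws differ

replay = numpy.random.RandomState(902340)  # same seed => same sequence
same_first = replay.uniform(size=2)        # reproduces the first draw exactly
```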
Brief Example
-------------
Here's a brief example. The setup code is:
random variable appears three times in the output expression.
>>> nearly_zeros = function([], rv_u + rv_u - 2 * rv_u)
Seeding Streams
---------------
Random variables can be seeded individually or collectively.
of the random variables.
>>> srng.seed(902340) # seeds rv_u and rv_n with different seeds each
Sharing Streams Between Functions
---------------------------------
As usual for shared variables, the random number generators used for random
For example:
>>> v2 = f() # v2 != v1
Other Random Distributions
--------------------------
There are :ref:`other distributions implemented <libdoc_tensor_raw_random>`.
.. _logistic_regression:
A Real Example: Logistic Regression
===================================
The preceding elements are put to work in this more realistic example. It will be used repeatedly.
Extending Theano
****************
Theano Graphs
-------------
- Theano works with symbolic graphs
Inputs and Outputs are lists of Theano variables
See :ref:`dev_start_guide` for information about git, GitHub, the
development workflow and how to make a quality contribution.
Op contract
-----------
at run time. Currently there are 2 different possibilities:
implement the :func:`perform`
and/or :func:`c_code <Op.c_code>` (and other related :ref:`c methods
<cop>`), or the :func:`make_thunk` method. ``perform`` allows you
to easily wrap an existing Python function into Theano. The ``c_code``
and related methods allow the op to generate C code that will be
compiled and linked by Theano. On the other hand, the ``make_thunk``
method will be called only once during compilation and should generate
a ``thunk``: a standalone function that, when called, performs the wanted
computations. This is useful if you want to generate code and compile it
yourself. For example, this allows you to use PyCUDA to compile GPU code.
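The contract of ``perform`` can be sketched with a tiny stand-in class (pure Python with a hypothetical class name; a real Op inherits from ``theano.Op`` and receives symbolic inputs at graph-construction time): at run time it reads concrete numeric inputs and writes its results into the ``output_storage`` cells it is handed.

```python
import numpy

class DoubleLike(object):
    """Stand-in illustrating the perform() contract: read numeric
    inputs, write numeric outputs into output_storage (hypothetical,
    not a real Theano Op)."""

    def perform(self, node, inputs, output_storage):
        (x,) = inputs                          # concrete numpy values
        output_storage[0][0] = numpy.asarray(x * 2)

op = DoubleLike()
out = [[None]]                                 # one output storage cell
op.perform(None, [numpy.array([1.0, 2.0])], out)
```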
There are also two methods whose implementation is highly recommended. They
are needed in order to merge duplicate computations involving your op, so if you
do not want Theano to execute your op multiple times with the same inputs,
do implement them. Those methods are :func:`__eq__` and
:func:`__hash__`.
The :func:`infer_shape` method allows Theano to infer the shape of some
variable somewhere in the middle of the computational graph without actually
computing the outputs (when possible). This can be helpful if one only needs
the shape of the output instead of the actual outputs.
string representation of your Op.
The :func:`R_op` method is needed if you want `theano.tensor.Rop` to
work with your op.
Op Example
----------
.. code-block:: python
return eval_points
return self.grad(inputs, eval_points)
Try it!
.. code-block:: python
print inp
print out
How to Test it
--------------
Theano has some functions to simplify testing. These help test the
``infer_shape``, ``grad`` and ``R_op`` methods. Put the following code
in a file and execute it with the ``nosetests`` program.
**Basic Tests**
You write basic tests simply by using the Op and checking that it
returns the right answer. If you detect an error, you must raise an
exception. You can use the `assert` keyword to automatically raise an
# Compare the result computed to the expected value.
assert numpy.allclose(inp * 2, out)
**Testing the infer_shape**
When a class inherits from the ``InferShapeTester`` class, it gets the
`self._compile_and_check` method that tests the Op ``infer_shape``
see it fail, you can implement an incorrect ``infer_shape``.
# Op that should be removed from the graph.
self.op_class)
**Testing the gradient**
The function :ref:`verify_grad <validating_grad>`
verifies the gradient of an Op or Theano graph. It compares the
the multiplication by 2).
theano.tests.unittest_tools.verify_grad(self.op,
[numpy.random.rand(5, 7, 2)])
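Conceptually, ``verify_grad`` compares an analytic gradient against a finite-difference estimate. A simplified NumPy sketch of that check (hypothetical helper names, not Theano's implementation):

```python
import numpy

def f(x):
    # a toy cost: sum of squares
    return (x ** 2).sum()

def analytic_grad(x):
    # the hand-derived gradient of f
    return 2.0 * x

def numeric_grad(f, x, eps=1e-6):
    # central finite differences, one coordinate at a time
    g = numpy.zeros_like(x)
    for i in range(x.size):
        step = numpy.zeros_like(x)
        step.flat[i] = eps
        g.flat[i] = (f(x + step) - f(x - step)) / (2.0 * eps)
    return g

x = numpy.array([0.5, -1.5, 2.0])
assert numpy.allclose(analytic_grad(x), numeric_grad(f, x), atol=1e-4)
```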
**Testing the Rop**
The class :class:`RopLop_checker` provides the functions
:func:`RopLop_checker.check_mat_rop_lop`,
You can also add this at the end of the test file:
t.setUp()
t.test_double_rop()
**Testing GPU Ops**
Ops that execute on the GPU should inherit from the
``theano.sandbox.cuda.GpuOp`` and not ``theano.Op``. This allows Theano
to make the distinction between both. Currently, we use this to test
if the NVIDIA driver works correctly with our sum reduction code on the
GPU.
-------------------------------------------
**Exercise**

- Run the code in the file double_op.py.
- Modify and execute to compute: x * y.
- Modify and execute the example to return 2 outputs: x + y and x - y
  (our current element-wise fusion generates computation with only 1 output).
SciPy
-----
don't forget to call the parent ``setUp`` function.
For more details see :ref:`random_value_in_tests`.
Documentation
-------------
.. _gpu_data_convert:

===================================
PyCUDA/CUDAMat/gnumpy compatibility
===================================
PyCUDA
======
Currently, PyCUDA and Theano have different objects to store GPU
data. The two implementations do not support the same set of features.
Theano's implementation is called CudaNdarray and supports
*strides*. It supports only the float32 dtype. PyCUDA's implementation
is called GPUArray and doesn't support *strides*. However, it can deal with
all NumPy and CUDA dtypes.
We are currently working on a common base object that will
mimic NumPy. Until this is ready, here is some information on how to
Transfer
--------
You can use the `theano.misc.pycuda_utils` module to convert GPUArray to and
from CudaNdarray. The functions `to_cudandarray(x, copyif=False)` and
`to_gpuarray(x)` return a new object that occupies the same memory space
as the original. Otherwise it raises a ValueError. Because GPUArrays don't
support *strides*, if the CudaNdarray is strided, we could copy it to
have a non-strided copy. The resulting GPUArray won't share the same
memory region. If you want this behavior, set `copyif=True` in
`to_gpuarray`.
Compiling with PyCUDA
---------------------
You can use PyCUDA to compile CUDA functions that work directly on
CudaNdarrays. Here is an example from the file `theano/misc/tests/test_pycuda_theano_simple.py`:
.. code-block:: python
Theano op using PyCUDA function
-------------------------------
You can use a GPU function compiled with PyCUDA in a Theano op. Here is an example:
.. code-block:: python
CUDAMat
=======
There are functions for conversion between CUDAMat and Theano CudaNdarray objects.
They obey the same principles as PyCUDA's functions and can be found in
theano.misc.cudamat_utils.py
WARNING: There is a strange stride/shape problem associated with these
converters. To work, the test needs a transpose and reshape...
gnumpy
======
There are conversion functions between gnumpy garray objects and Theano CudaNdarrays.
They are also similar to PyCUDA's and can be found in theano.misc.gnumpy_utils.py.
Derivatives in Theano
=====================
Computing Gradients
===================
Now let's use Theano for a slightly more sophisticated task: create a
For instance, we can compute the
gradient of :math:`x^2` with respect to :math:`x`. Note that:
:math:`d(x^2)/dx = 2 \cdot x`.
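As a quick numerical sanity check of this identity (a plain NumPy side sketch, independent of the Theano code that follows):

```python
import numpy

def numeric_derivative(f, x, eps=1e-6):
    # central finite difference
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

x = numpy.linspace(-2.0, 2.0, 9)
approx = numeric_derivative(lambda t: t ** 2, x)
exact = 2.0 * x
# approx agrees with 2*x up to the finite-difference error
```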
Here is the code to compute this gradient:
.. If you modify this code, also change :
.. theano/tests/test_tutorial.py:T_examples.test_examples_4
array([[ 0.25      ,  0.19661193],
In general, for any **scalar** expression ``s``, ``T.grad(s, w)`` provides
the Theano expression for computing :math:`\frac{\partial s}{\partial w}`. In
this way Theano can be used for doing **efficient** symbolic differentiation
(as the expression returned by ``T.grad`` will be optimized during
compilation), even for functions with many inputs. (See `automatic
differentiation <http://en.wikipedia.org/wiki/Automatic_differentiation>`_
for a description of symbolic differentiation.)
.. note::
    The second argument of ``T.grad`` can be a list, in which case the
    output is also a list. The order in both lists is important: element
    *i* of the output list is the gradient of the first argument of
    ``T.grad`` with respect to the *i*-th element of the list given as
    second argument. The first argument of ``T.grad`` has to be a scalar (a tensor
``T.grad`` and details about the implementation, see :ref:`this <libdoc_gradient>`.
Computing the Jacobian
======================
do is to loop over the entries in ``y`` and compute the gradient of
.. note::
    ``scan`` is a generic op in Theano that allows writing all kinds of
    recurrent equations in a symbolic manner. While creating
    symbolic loops (and optimizing them for performance) is a hard task,
    efforts are being made to improve the performance of ``scan``. For more
    information about how to use this op, see :ref:`this <lib_scan>`.
array([[ 8., 0.],
[ 0., 8.]])
What we do in this code is to generate a sequence of ints from ``0`` to
``y.shape[0]`` using ``T.arange``. Then we loop through this sequence, and
at each step, we compute the gradient of element ``y[i]`` with respect to
``x``. ``scan`` automatically concatenates all these rows, generating a
matrix which corresponds to the Jacobian.
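The same loop-and-stack construction can be written in plain NumPy for the elementwise square (an illustration of the matrix ``scan`` assembles, not Theano code):

```python
import numpy

x = numpy.array([4.0, 4.0])
y = x ** 2

# Row i is the gradient of y[i] with respect to x: d(x_i^2)/d(x_j)
# is 2*x_i when i == j and 0 otherwise.
rows = [numpy.where(numpy.arange(x.size) == i, 2.0 * x[i], 0.0)
        for i in range(y.size)]
jacobian = numpy.stack(rows)
# [[ 8.  0.]
#  [ 0.  8.]]   -- matching the tutorial's output
```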
.. note::
    There are a few pitfalls to be aware of regarding ``T.grad``. One of them
    is that you cannot rewrite the above expression of the Jacobian as
    ``theano.scan(lambda y_i,x: T.grad(y_i,x), sequences=y,
    non_sequences=x)``, even though from the documentation of scan this
    seems possible. The reason is that ``y_i`` will not be a function of
Theano implements the :func:`theano.gradient.hessian` macro, which does all
that is needed to compute the Hessian. The following text explains how
to do it manually.
You can compute the Hessian manually, similarly to the Jacobian. The only
difference is that now, instead of computing the Jacobian of some expression
``y``, we compute the Jacobian of ``T.grad(cost, x)``, where ``cost`` is some
scalar.
array([[ 2.,  0.],
[ 0., 2.]])
Jacobian times a Vector
=======================
Sometimes we can express the algorithm in terms of Jacobians times vectors,
or vectors times Jacobians. Compared to evaluating the Jacobian and then
doing the product, there are methods that compute the desired results while
avoiding actual evaluation of the Jacobian. This can bring about significant
performance gains. A description of one such algorithm can be found here:
* Barak A. Pearlmutter, "Fast Exact Multiplication by the Hessian", *Neural
Computation, 1994*
While in principle we would want Theano to identify these patterns
automatically for us, in practice implementing such optimizations in a
generic manner is extremely difficult. Therefore, we offer special functions
dedicated to these tasks.
R-operator
----------
The *R operator* is built to evaluate the product between a Jacobian and a
vector, namely :math:`\frac{\partial f(x)}{\partial x} v`. The formulation
can be extended even to `x` being a matrix, or a tensor in general, in which
case the Jacobian becomes a tensor and the product becomes some kind of
tensor product. Because in practice we end up needing to compute such
expressions in terms of weight matrices, Theano supports this more generic
form of the operation. In order to evaluate the *R-operation* of expression
``y``, with respect to ``x``, multiplying the Jacobian with ``v``, you need
to do something similar to this:
array([ 2.,  2.])
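A Jacobian-times-vector product can also be approximated without ever forming the Jacobian (a NumPy sketch of the idea behind the R-operator, using a directional finite difference; this is a numerical illustration, not Theano's symbolic implementation):

```python
import numpy

def jvp(f, x, v, eps=1e-6):
    # directional derivative: J(x) v ~= (f(x + eps*v) - f(x - eps*v)) / (2*eps)
    return (f(x + eps * v) - f(x - eps * v)) / (2.0 * eps)

f = lambda x: x ** 2          # elementwise square: J = diag(2*x)
x = numpy.array([1.0, 1.0])
v = numpy.array([1.0, 1.0])
result = jvp(f, x, v)         # close to 2*x*v = [2., 2.]
```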
L-operator
----------
Similar to *R-operator* the *L-operator* would compute a *row* vector times
Similar to *R-operator*, the *L-operator* would compute a *row* vector times
the Jacobian. The mathematical formula would be :math:`v \frac{\partial
f(x)}{\partial x}`. As for the *R-operator*, the *L-operator* is supported
for generic tensors (not only for vectors). Similarly, it can be implemented as
follows:
>>> W = T.dmatrix('W')
array([[ 0., 0.],
`v`, the evaluation point, differs between the *L-operator* and the *R-operator*.
For the *L-operator*, the evaluation point needs to have the same shape
as the output, while for the *R-operator* the evaluation point should
have the same shape as the input parameter. Also, the results of these two
operations differ. The result of the *L-operator* is of the same shape
as the input parameter, while the result of the *R-operator* is the same
as the output.
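A small numerical sketch can make these shape conventions concrete. The helper below (``vjp_fd``, purely illustrative and unrelated to Theano's implementation) approximates the vector-Jacobian product: the evaluation point ``v`` has the length of the output, while the result has the length of the input:

```python
def vjp_fd(f, x, v, eps=1e-6):
    """Approximate the vector-Jacobian product v @ J(x) by perturbing
    each input coordinate in turn."""
    def dot(a, b):
        return sum(ai * bi for ai, bi in zip(a, b))
    base = dot(v, f(x))
    out = []
    for j in range(len(x)):
        xp = list(x)
        xp[j] += eps
        out.append((dot(v, f(xp)) - base) / eps)
    return out

# f maps 3 inputs to 2 outputs, so the Jacobian is 2x3
f = lambda x: [x[0] + x[1], x[1] * x[2]]
v = [1.0, 1.0]                              # same length as the output (2)
result = vjp_fd(f, [1.0, 2.0, 3.0], v)
print(len(result))                          # 3, the input length
```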
Hessian times a Vector
======================
If you need to compute the Hessian times a vector, you can make use of the
above defined operators to do it more efficiently than actually computing
the exact Hessian and then performing the product. Due to the symmetry of the
Hessian matrix, you have two options that will
give you the same result, though these options might exhibit differing performances.
Hence, we suggest profiling the methods before using either of the two:
>>> x = T.dvector('x')
or, making use of the *R-operator*:
array([ 4., 4.])
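Independently of Theano, the identity that both formulations rely on — the Hessian-vector product is the directional derivative of the gradient along ``v`` — can be checked with a small hand-written sketch (the helper names and sample function are illustrative only):

```python
def grad_f(x):
    # gradient of f(x) = x0**2 * x1, written out by hand
    return [2 * x[0] * x[1], x[0] ** 2]

def hvp_fd(grad, x, v, eps=1e-6):
    """Hessian-vector product as a finite-difference directional
    derivative of the gradient: (grad(x + eps*v) - grad(x)) / eps."""
    g0 = grad(x)
    g1 = grad([xi + eps * vi for xi, vi in zip(x, v)])
    return [(a - b) / eps for a, b in zip(g1, g0)]

# At x = (1, 2) the Hessian is [[4, 2], [2, 0]], so H @ (1, 1) = (6, 2)
print(hvp_fd(grad_f, [1.0, 2.0], [1.0, 1.0]))  # close to [6.0, 2.0]
```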
Final Pointers
==============
* The ``grad`` function works symbolically: it takes and returns a Theano variable.
* It can be compared to a macro since it can be applied repeatedly.
* It directly handles scalar costs only.
* Built-in functions allow efficient computation of vector times Jacobian and vector times Hessian.
* Work is in progress on the optimizations required to efficiently compute the full
  Jacobian and Hessian matrices and the Jacobian times a vector.
as you would in the course of any other Python program.
.. _pickle: http://docs.python.org/library/pickle.html
The Basics of Pickling
======================
The two modules ``pickle`` and ``cPickle`` have the same functionalities, but
You can serialize (or *save*, or *pickle*) objects to a file with
.. note::
If you want your saved object to be stored efficiently, don't forget
to use ``cPickle.HIGHEST_PROTOCOL``. The resulting file can be
dozens of times smaller than with the default protocol.
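The size difference is easy to verify with the standard ``pickle`` module (Python 3's replacement for ``cPickle``); the exact ratio depends on the data, so treat the figures as a sketch:

```python
import pickle

data = list(range(1000))
text_proto = pickle.dumps(data, protocol=0)                    # old ASCII protocol
binary = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)  # compact binary
# The binary protocol stores small ints in a few bytes instead of a text line each.
print(len(binary) < len(text_proto))  # True
```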
.. note::
For more details about pickle's usage, see
`Python documentation <http://docs.python.org/library/pickle.html#usage>`_.
Short-Term Serialization
========================
If you are confident that the class instance you are serializing will be
For instance, you can define functions along the lines of:
self.training_set = cPickle.load(file(self.training_set_file, 'rb'))
Long-Term Serialization
=======================
If the implementation of the class you want to save is quite unstable, for
matrix ``W`` and a bias ``b``, you can define:
self.W = W
self.b = b
If at some point in time ``W`` is renamed to ``weights`` and ``b`` to
``bias``, the older pickled files will still be usable if you update these
functions to reflect the change in name:
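For instance, a sketch of such update functions (the ``Layer`` class and attribute names here are hypothetical stand-ins) could look like:

```python
class Layer(object):
    """Hypothetical class whose attributes were renamed from W/b to
    weights/bias between versions of the code."""

    def __getstate__(self):
        # always save under the new names
        return {'weights': self.weights, 'bias': self.bias}

    def __setstate__(self, state):
        # accept both the old ('W', 'b') and the new names
        self.weights = state.get('weights', state.get('W'))
        self.bias = state.get('bias', state.get('b'))

# Simulate restoring from an old pickle whose state used the old names:
layer = Layer.__new__(Layer)
layer.__setstate__({'W': 2.0, 'b': 3.0})
print(layer.weights, layer.bias)  # 2.0 3.0
```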
Loop
Scan
====
- A general form of *recurrence*, which can be used for looping.
- *Reduction* and *map* (loop over the leading dimensions) are special cases of scan.
- You 'scan' a function along some input sequence, producing an output at each time-step.
- The function can see the *previous K time-steps* of your function.
- ``sum()`` could be computed by scanning the ``z + x(i)`` function over a list, given an initial state of ``z=0``.
- Often a for-loop can be expressed as a ``scan()`` operation, and ``scan`` is the closest that Theano comes to looping.
- Advantages of using ``scan`` over for loops:
- Allows the number of iterations to be part of the symbolic graph.
- Minimizes GPU transfers (if GPU is involved).
- Computes gradients through sequential steps.
- Slightly faster than using a for loop in Python with a compiled Theano function.
- Can lower the overall memory usage by detecting the actual amount of memory needed.
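The scanning idea itself is easy to sketch in plain Python. The toy ``scan`` below (purely illustrative and far simpler than Theano's) carries the previous output forward at each step, so that ``sum()`` becomes a scan of ``z + x(i)`` with initial state ``z = 0``:

```python
def scan(fn, sequence, outputs_info):
    """Toy scan: call fn(x, state) on each element, feeding the previous
    output back in as the state, and collect every intermediate output."""
    state = outputs_info
    outputs = []
    for x in sequence:
        state = fn(x, state)
        outputs.append(state)
    return outputs

# sum() as a scan of z + x(i), starting from z = 0
print(scan(lambda x, z: z + x, [1, 2, 3, 4], 0))  # [1, 3, 6, 10]
```

The last element of the returned list is the reduction; the whole list is the running "map" of partial results.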
The full documentation can be found in the library: :ref:`Scan <lib_scan>`.
**Exercise**
Run both examples.
Modify and execute the polynomial example to have the reduction done by ``scan``.
-------------------------------------------
.. _using_modes:
==========================================
Configuration Settings and Compiling Modes
==========================================
Configuration
=============
The ``config`` module contains several ``attributes`` that modify Theano's behavior. Many of these
attributes are examined during the import of the ``theano`` module and several are assumed to be
read-only.
*As a rule, the attributes in this module should not be modified by user code.*
variables, type this from the command-line:
**Exercise**
Consider the logistic regression:
.. code-block:: python
#print "Initial model:"
#print w.get_value(), b.get_value()
# Construct Theano expression graph
p_1 = 1 / (1 + T.exp(-T.dot(x, w)-b)) # Probability of having a one
prediction = p_1 > 0.5 # The prediction that is done: 0 or 1
print train.maker.fgraph.toposort()
for i in range(training_steps):
pred, err = train(D[0], D[1])
#print "Final model:"
Modify and execute this example to run on CPU (the default) with floatX=float32 and
time the execution using the command line ``time python file.py``.
.. TODO: To be resolved:
.. You will need to use: ``theano.config.floatX`` and ``ndarray.astype("str")``
.. Why the latter portion?
.. Note::
* Apply the Theano flag ``floatX=float32`` (through ``theano.config.floatX``) in your code.
* Cast inputs before storing them into a shared variable.
* Circumvent the automatic cast of int32 with float32 to float64:
* Insert manual cast in your code or use [u]int{8,16}.
* Insert manual cast around the mean operator (this involves division by length, which is an int64).
* Notice that a new casting mechanism is being developed.
-------------------------------------------
is quite strict.
ProfileMode
===========
Besides checking for errors, another important task is to profile your
code. For this Theano uses a special mode called ProfileMode which has
to be passed as an argument to :func:`theano.function <function.function>`.
Using the ProfileMode is a three-step process.
.. note::
To switch the default accordingly, set the Theano flag
:attr:`config.mode` to ProfileMode. In that case, when the Python
process exits, it will automatically print the profiling
information on the standard output.
The memory profile of the output of each ``apply`` node can be enabled with the
Theano flag :attr:`config.ProfileMode.profile_memory`.
Creating a ProfileMode Instance
-------------------------------
First create a ProfileMode instance:
>>> import theano
>>> from theano import ProfileMode
>>> profmode = theano.ProfileMode(optimizer='fast_run', linker=theano.gof.OpWiseCLinker())
implementations wherever possible should use the ``gof.OpWiseCLinker``
using the 'fast_run' optimizer and ``gof.OpWiseCLinker`` linker.
Compiling your Graph with ProfileMode
-------------------------------------
Once the ProfileMode instance is created, simply compile your graph as you
would normally, by specifying the mode parameter.
>>> minst = m.make(mode=profmode)
Retrieving Timing Information
-----------------------------
Once your graph is compiled, simply run the program or operation you wish to
profile, then call ``profmode.print_summary()``. This will provide you with
the desired timing information, indicating where your graph is spending most
of its time.
of its time. This is best shown through an example. Let's use our logistic
regression example.
Compiling the module with ProfileMode and calling ``profmode.print_summary()``
generates the following output:
"""
This output has two components. In the first section called
*Apply-wise summary*, timing information is provided for the worst
offending Apply nodes. This corresponds to individual Op applications
within your graph which took longest to execute (so if you use
``dot`` twice, you will see two entries there). In the second portion,
the *Op-wise summary*, the execution time of all Apply nodes executing
the same Op are grouped together and the total execution time per Op
is shown (so if you use ``dot`` twice, you will see only one entry
there corresponding to the sum of the time spent in each of them).
Finally, notice that the ProfileMode also shows which Ops were running a C
implementation.
where each example has dimension 5. If this were the input of a
neural network then the weights from the input to the first hidden
layer would represent a matrix of size (5, #hid).
Consider this array:
>>> numpy.asarray([[1., 2], [3, 4], [5, 6]])
array([[ 1., 2.],
array([2., 4., 6.])
The smaller array ``b`` (actually a scalar here, which works like a 0-d array) in this case is *broadcasted* to the same size
as ``a`` during the multiplication. This trick is often useful in
simplifying how expressions are written. More detail about *broadcasting*
can be found in the `numpy user guide <http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html>`__.
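To see what broadcasting does in this 0-d case, here is a toy pure-Python version of the rule (``broadcast_mul`` is illustrative and far less general than NumPy's actual broadcasting):

```python
def broadcast_mul(a, b):
    """Multiply a vector by a scalar or an equal-length vector, mimicking
    how a 0-d array is stretched ('broadcasted') to the larger shape."""
    if not isinstance(b, list):        # 0-d case: virtually repeat b
        b = [b] * len(a)
    if len(a) != len(b):
        raise ValueError("shapes do not broadcast")
    return [x * y for x, y in zip(a, b)]

print(broadcast_mul([1.0, 2.0, 3.0], 2.0))  # [2.0, 4.0, 6.0]
```

No copy of the stretched operand is made by NumPy in practice; the repetition here is only conceptual.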
Python tutorial
***************
In this documentation, we suppose that the reader knows Python. Here is a small list of Python
tutorials/exercises if you need to learn it or only need a refresher:
* `Python Challenge <http://www.pythonchallenge.com/>`__
* `Dive into Python <http://diveintopython.net/>`__
.. _shape_info:
==========================================
How Shape Information is Handled by Theano
==========================================
It is not possible to strictly enforce the shape of a Theano variable when
building a graph, since the particular value provided for a parameter of
theano.function can change the shape of any Theano variable in its graph.
Currently, information regarding shape is used in two ways in Theano:
- When the exact output shape is known, to generate faster C code for
the 2d convolution on the CPU and GPU.
- To remove computations in the graph when we only want to know the
shape, but not the actual value of a variable. This is done with the
# |Shape_i{1} [@43797968] '' 0
# | |x [@43423568]
The output of this compiled function does not contain any multiplication
or power. Theano has removed them to compute directly the shape of the
output.
Shape Inference Problem
=======================
Theano propagates shape information in the graph. Sometimes this
can lead to errors. For example:
# |y [@44540304]
f(xv,yv)
# Raises a dimensions mismatch error.
As you can see, when asking only for the shape of some computation (``join`` in the
example), an inferred shape is computed directly, without executing
the computation itself (there is no ``join`` in the first output or debugprint).
This makes the computation of the shape faster, but it can also hide errors. In
the example, the computation of the shape of ``join`` is done on the first
Theano variable in the ``join`` computation and not on the other.
This might happen with other ops such as elemwise, dot, ...
Indeed, to make some optimizations (for speed or stability, for instance),
Theano assumes that the computation is correct and consistent
in the first place, as it does here.
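The optimistic flavour of this shape inference can be sketched in a few lines of plain Python (``infer_join_shape`` is an illustration, not Theano's actual code): the inferred shape trusts the first input along the non-joined dimensions, so an inconsistency in the other inputs goes unnoticed.

```python
def infer_join_shape(shapes, axis=0):
    """Sketch of optimistic shape inference for a join: take the first
    input's shape and sum the lengths along the joined axis, assuming
    (without checking) that the other dimensions all match."""
    out = list(shapes[0])
    out[axis] = sum(s[axis] for s in shapes)
    return tuple(out)

# consistent inputs: the inferred shape is correct
print(infer_join_shape([(2, 3), (4, 3)]))   # (6, 3)
# inconsistent inputs: the mismatch in dimension 1 is silently missed
print(infer_join_shape([(2, 3), (4, 7)]))   # (6, 3) -- hides the error
```

Only actually executing the join (as DebugMode does) would expose the mismatch.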
You can detect those problems by running the code without this
optimization, with the Theano flag
optimization, nor most other optimizations) or DEBUG_MODE (it will test
before and after all optimizations (much slower)).
Specifying Exact Shape
======================
Currently, specifying a shape is not as easy and flexible as we would like; we plan
to improve it. Here is the current state of what can be done:
- You can pass the shape info directly to the `ConvOp` created
when calling conv2d. You simply add the parameters image_shape
and filter_shape to the call. They must be tuples of 4
elements. Ex:
.. code-block:: python
theano.tensor.nnet.conv2d(..., image_shape=(7,3,5,5), filter_shape=(2,3,4,4))
- You can use the SpecifyShape op to add shape info anywhere in the
graph. This allows Theano to perform some optimizations. In the following example,
it allows the whole Theano function to be precomputed to a constant.
.. code-block:: python
......@@ -134,10 +134,10 @@ upgrade, but this is the current state of what can be done.
theano.printing.debugprint(f)
# [2 2] [@72791376]
Future Plans
============
- Add the parameter "constant shape" to theano.shared(). This is probably
the most frequent case with ``shared variables``. This will make the code
simpler and will make it possible to check that the shape does not change when
updating the shared variable.
Theano Graphs
Debugging or profiling code written in Theano is not that simple if you
do not know what goes on under the hood. This chapter is meant to
introduce you to a required minimum of the inner workings of Theano,
for more detail see :ref:`extending`.
The first step in writing Theano code is to write down all mathematical
relations using symbolic placeholders (**variables**). When writing down
Theano builds internally a graph structure composed of interconnected
**variables**. It is important to make the difference between the
definition of a computation represented by an **op** and its application
to some actual data which is represented by the **apply** node. For more
detail about these building blocks see :ref:`variable`, :ref:`op`,
:ref:`apply`. Here is an example of a graph:
**Code**
output. You can now print the name of the op that is applied to get
>>> y.owner.op.name
'Elemwise{mul,no_inplace}'
Hence, an elementwise multiplication is used to compute ``y``. This
multiplication is done between the inputs:
>>> len(y.owner.inputs)
same shape as x. This is done by using the op ``DimShuffle``:
[2.0]
Starting from this graph structure it is easier to understand how
*automatic differentiation* proceeds and how the symbolic relations
can be *optimized* for performance or stability.
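The ownership structure explored above can be mimicked with a toy graph (these classes are a deliberately simplified sketch, not Theano's real ``Variable``, ``Apply``, and ``Op``):

```python
class Op(object):
    """Definition of a computation (e.g. elementwise multiplication)."""
    def __init__(self, name):
        self.name = name

class Apply(object):
    """Application of an op to some concrete inputs."""
    def __init__(self, op, inputs):
        self.op = op
        self.inputs = inputs

class Variable(object):
    """A graph node; `owner` is the apply node that produced it (or None)."""
    def __init__(self, name=None, owner=None):
        self.name = name
        self.owner = owner

mul = Op('Elemwise{mul,no_inplace}')
x = Variable('x')                          # free input: no owner
two = Variable('2.0')
y = Variable('y', owner=Apply(mul, [x, two]))

print(y.owner.op.name)      # 'Elemwise{mul,no_inplace}'
print(len(y.owner.inputs))  # 2
```

Walking ``owner`` and ``inputs`` pointers like this is exactly how tools such as ``debugprint`` traverse a Theano graph.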
Automatic Differentiation
Consider the following example of optimization:
.. image:: ../hpcs2011_tutorial/pics/f_unoptimized.png .. image:: ../hpcs2011_tutorial/pics/f_optimized.png
====================================================== =====================================================
Symbolic programming involves a paradigm shift: it is best to use it in order to understand it.