Commit dba02a39 authored by Eric Larsen, committed by Frederic


Correct Theano's tutorial: add cross-references, correct some more typos, improve style a little, correct the logical structure some more
Parent c86c72f4
......@@ -6,15 +6,15 @@ Extending Theano
================
This advanced tutorial is for users who want to extend Theano with new Types, new
Operations (Ops), and new graph optimizations.
Along the way, it also introduces many aspects of how Theano works, so it is
also good for you if you are interested in getting more under the hood with
Theano itself.
Before tackling this more advanced presentation, it is highly recommended to read the
introductory :ref:`Tutorial<tutorial>`.
The first few pages will walk you through the definition of a new :ref:`type`,
``double``, and basic arithmetic :ref:`operations <op>` on that Type. We
......
.. _libdoc_compile_mode:
======================================
:mod:`mode` -- controlling compilation
======================================
......
.. currentmodule:: tensor
.. _libdoc_basic_tensor:
===========================
Basic Tensor Functionality
===========================
......
......@@ -7,7 +7,7 @@ Baby Steps - Algebra
Adding two Scalars
==================
To get us started with Theano and get a feel of what we're working with,
let's make a simple function: add two numbers together. Here is how you do
it:
......@@ -38,7 +38,7 @@ to add. Note that from now on, we will use the term
If you are following along and typing into an interpreter, you may have
noticed that there was a slight delay in executing the ``function``
instruction. Behind the scenes, ``f`` was being compiled into C code.
.. note::
......@@ -80,13 +80,14 @@ TensorType(float64, scalar)
>>> x.type is T.dscalar
True
By calling ``T.dscalar`` with a string argument, you create a
*Variable* representing a floating-point scalar quantity with the
given name. If you provide no argument, the symbol will be unnamed. Names
are not required, but they can help debugging.
More will be said in a moment regarding Theano's inner structure. You
could also learn more by looking into :ref:`graphstructures`.
**Step 2**
......@@ -112,9 +113,8 @@ and giving ``z`` as output:
The first argument to :func:`function <function.function>` is a list of Variables
that will be provided as inputs to the function. The second argument
is a single Variable *or* a list of Variables. For either case, the second
argument is what we want to see as output when we apply the function. ``f`` may
then be used like a normal Python function.
Adding two Matrices
......@@ -132,14 +132,14 @@ from the previous example is that you need to instantiate ``x`` and
>>> z = x + y
>>> f = function([x, y], z)
``dmatrix`` is the Type for matrices of doubles. Then we can use
our new function on 2D arrays:
>>> f([[1, 2], [3, 4]], [[10, 20], [30, 40]])
array([[ 11., 22.],
[ 33., 44.]])
The variable is a NumPy array. We can also use NumPy arrays directly as
inputs:
>>> import numpy
......@@ -160,8 +160,8 @@ The following types are available:
* **double**: dscalar, dvector, dmatrix, drow, dcol, dtensor3, dtensor4
* **complex**: cscalar, cvector, cmatrix, crow, ccol, ctensor3, ctensor4
The previous list is not exhaustive and a guide to all types compatible
with NumPy arrays may be found here: :ref:`tensor creation<libdoc_tensor_creation>`.
.. note::
......
......@@ -84,7 +84,7 @@ subsequently make to ``np_array`` have no effect on our shared variable.
If we are running this with the CPU as the device,
then changes we make to ``np_array`` will show up *right away* in
``s_true.get_value()``
because NumPy arrays are mutable, and ``s_true`` is using the ``np_array``
object as its internal buffer.
However, this aliasing of ``np_array`` and ``s_true`` is not guaranteed to occur,
......@@ -137,15 +137,15 @@ But both of these calls might create copies of the internal memory.
The reason that ``borrow=True`` might still make a copy is that the internal
representation of a shared variable might not be what you expect. When you
create a shared variable by passing a NumPy array for example, then ``get_value()``
must return a NumPy array too. That's how Theano can make the GPU use
transparent. But when you are using a GPU (or in the future perhaps a remote machine), then the numpy.ndarray
is not the internal representation of your data.
If you really want Theano to return its internal representation *and never copy it*
then you should use the ``return_internal_type=True`` argument to
``get_value``. It will never cast the internal object (it always returns in
constant time), but might return various datatypes depending on contextual
factors (e.g. the compute device, the dtype of the NumPy array).
.. code-block:: python
......
......@@ -11,9 +11,9 @@ IfElse vs Switch
- Both Ops build a condition over symbolic variables.
- ``IfElse`` takes a `boolean` condition and two variables as inputs.
- ``Switch`` takes a `tensor` as condition and two variables as inputs.
``switch`` is an elementwise operation and is thus more general than ``ifelse``.
- Whereas ``switch`` evaluates both 'output' variables, ``ifelse`` is lazy and only
evaluates one variable with respect to the condition.
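The difference between the two evaluation strategies can be sketched in plain Python (an illustration only; ``switch_like`` and ``ifelse_like`` are hypothetical stand-ins, not Theano's API):

```python
def switch_like(cond, a, b):
    # Elementwise and eager: both input sequences are fully materialized.
    return [x if c else y for c, x, y in zip(cond, a, b)]

def ifelse_like(cond, then_thunk, else_thunk):
    # Lazy: only the branch selected by the scalar condition is evaluated.
    return then_thunk() if cond else else_thunk()

print(switch_like([True, False, True], [1, 2, 3], [10, 20, 30]))  # [1, 20, 3]
print(ifelse_like(False, lambda: [1, 2, 3], lambda: [10, 20, 30]))  # [10, 20, 30]
```

The laziness of ``ifelse_like`` is what saves the cost of the unused branch in the timing example below.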
**Example**
......@@ -52,8 +52,8 @@ IfElse vs Switch
f_lazyifelse(val1, val2, big_mat1, big_mat2)
print 'time spent evaluating one value %f sec'%(time.clock()-tic)
In this example, the ``IfElse`` Op spends about half as much time as ``Switch``,
since it computes only one variable out of the two.
.. code-block:: python
......@@ -62,10 +62,10 @@ since it computes only one variable instead of both.
time spent evaluating one value 0.3500 sec
Unless ``linker='vm'`` or ``linker='cvm'`` are used, ``ifelse`` will compute both
variables and take the same computation time as ``switch``. Although the linker
is not currently set by default to 'cvm', it will be in the near future.
There is no automatic optimization replacing a ``switch`` with a
broadcasted scalar to an ``ifelse``, as this is not always faster. See
this `ticket <http://www.assembla.com/spaces/theano/tickets/764>`_.
......@@ -22,7 +22,7 @@ Using Test Values
-----------------
As of v.0.4.0, Theano has a new mechanism by which graphs are executed
on-the-fly, before a ``theano.function`` is ever compiled. Since optimizations
haven't been applied at this stage, it is easier for the user to locate the
source of some bug. This functionality is enabled through the config flag
``theano.config.compute_test_value``. Its use is best shown through the
......@@ -131,12 +131,12 @@ The compute_test_value mechanism works as follows:
`compute_test_value` can take the following values:
* ``off``: Default behavior. This debugging mechanism is inactive.
* ``raise``: Compute test values on the fly. Any variable for which a test
value is required, but not provided by the user, is treated as an error. An
exception is raised accordingly.
* ``warn``: Idem, but a warning is issued instead of an Exception.
* ``ignore``: Silently ignore the computation of intermediate test values, if a
variable is missing a test value.
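For instance, assuming the standard ``.theanorc`` configuration file format, the mechanism could be switched on like this (it may equally be set from Python with ``theano.config.compute_test_value = 'raise'``):

```ini
[global]
compute_test_value = raise
```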
.. note::
......@@ -181,6 +181,8 @@ precise inspection of what's being computed where, when, and how, see the
How do I Print a Graph (before or after compilation)?
----------------------------------------------------------
.. TODO: dead links in the next paragraph
Theano provides two functions (:func:`theano.pp` and
:func:`theano.printing.debugprint`) to print a graph to the terminal before or after
compilation. These two functions print expression graphs in different ways:
......@@ -203,8 +205,14 @@ Apply nodes, and which Ops are eating up your CPU cycles.
Tips:
* Use the flag ``floatX=float32`` to use *float32* instead of *float64*
  for the Theano types matrix(), vector(), ... (if you used dmatrix() or
  dvector(), they stay at *float64*).
* Check in the profile that there is no Dot operation when you are
  multiplying two matrices of the same type: Dot should be optimized to
  dot22 when the inputs are matrices of the same type. The optimization
  can fail when using floatX=float32 if something in the graph makes one
  of the inputs *float64*.
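For example, the flag can be passed through the ``THEANO_FLAGS`` environment variable when launching the profiling run (``my_profiled_script.py`` is a hypothetical script name):

```shell
THEANO_FLAGS=floatX=float32 python my_profiled_script.py
```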
.. _faq_wraplinker:
......@@ -239,7 +247,7 @@ along with its position in the graph, the arguments to the ``perform`` or
Admittedly, this may be a huge amount of
output to read through if you are using big tensors... but you can choose to
put logic inside of the *print_eval* function that would, for example, only
print something out if a certain kind of Op was used, at a certain program
position, or if a particular value shows up in one of the inputs or outputs.
Use your imagination :)
......@@ -247,7 +255,7 @@ Use your imagination :)
.. TODO: documentation for link.WrapLinkerMany
This can be a really powerful debugging tool.
Note the call to *fn* inside the call to *print_eval*; without it, the graph wouldn't get computed at all!
How to Use pdb?
----------------
......@@ -296,7 +304,7 @@ of the error. There's the script where the compiled function was called --
but if you're using (improperly parameterized) prebuilt modules, the error
might originate from ops in these modules, not this script. The last line
tells us about the Op that caused the exception. In this case it's a "mul"
involving variables with names "a" and "b". But suppose we instead had an
intermediate result to which we hadn't given a name.
After learning a few things about the graph structure in Theano, we can use
......@@ -329,7 +337,7 @@ explore around the graph.
That graph is purely symbolic (no data, just symbols to manipulate it
abstractly). To get information about the actual parameters, you explore the
"thunks" objects, which bind the storage for the inputs (and outputs) with
"thunk" objects, which bind the storage for the inputs (and outputs) with
the function itself (a "thunk" is a concept related to closures). Here, to
get the current node's first input's shape, you'd therefore do "p
thunk.inputs[0][0].shape", which prints out "(3, 4)".
......
......@@ -5,6 +5,14 @@
More Examples
=============
At this point it would be wise to begin familiarizing yourself
more systematically with Theano's fundamental objects and operations by browsing
this section of the library: :ref:`libdoc_basic_tensor`.
As the tutorial unfolds, you should also gradually acquaint yourself with the other
relevant areas of the library and with the relevant subjects of the documentation
entry page.
Logistic Function
=================
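Before building the symbolic version, the logistic function :math:`s(x) = \frac{1}{1 + e^{-x}}` discussed in this section can be checked numerically in plain Python (no Theano required):

```python
import math

def logistic(x):
    # s(x) = 1 / (1 + exp(-x)): maps any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

print(logistic(0.0))  # 0.5
# symmetry check: s(-x) = 1 - s(x)
print(abs(logistic(-2.0) - (1.0 - logistic(2.0))) < 1e-9)  # True
```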
......@@ -82,7 +90,7 @@ squared difference between two matrices ``a`` and ``b`` at the same time:
shortcut for allocating symbolic variables that we will often use in the
tutorials.
When we use the function ``f``, it returns the three variables (the printing
was reformatted for readability):
>>> f([[1, 1], [1, 1]], [[0, 1], [2, 3]])
......@@ -119,7 +127,7 @@ give a default value of 1 for ``y`` by creating a ``Param`` instance with
its ``default`` field set to 1.
Inputs with default values must follow inputs without default
values (like Python's functions). There can be multiple inputs with default values. These parameters can
be set positionally or by name, as in standard Python:
......@@ -146,10 +154,13 @@ array(33.0)
attributes (set by ``dscalars`` in the example above) and *these* are the
names of the keyword parameters in the functions that we build. This is
the mechanism at work in ``Param(y, default=1)``. In the case of ``Param(w,
default=2, name='w_by_name')``, we override the symbolic variable's name
attribute with a name to be used for this function.
You may like to see :ref:`Function<usingfunction>` in the library for more detail.
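The behavior is analogous to Python's own default parameter values; a plain-Python analogue of ``Param(y, default=1)`` and ``Param(w, default=2, name='w_by_name')`` (illustrative only, not Theano code) would be:

```python
def f(x, y=1, w_by_name=2):
    # y and w_by_name play the roles of Param(y, default=1) and
    # Param(w, default=2, name='w_by_name') in the Theano example
    return (x + y) * w_by_name

print(f(33))               # 68 -- both defaults applied: (33 + 1) * 2
print(f(33, 2))            # 70 -- positional override of y
print(f(33, w_by_name=6))  # 204 -- keyword override through the given name
```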
.. _functionstateexample:
Using Shared Variables
......@@ -172,9 +183,9 @@ internal state, and returns the old state value.
>>> accumulator = function([inc], state, updates=[(state, state+inc)])
This code introduces a few new concepts. The ``shared`` function constructs
so-called :ref:`shared variables<libdoc_compile_shared>`.
These are hybrid symbolic and non-symbolic variables whose value may be shared
between multiple functions. Shared variables can be used in symbolic expressions just like
the objects returned by ``dmatrices(...)`` but they also have an internal
value that defines the value taken by this symbolic variable in *all* the
functions that use it. It is called a *shared* variable because its value is
......@@ -189,7 +200,7 @@ will replace the ``.value`` of each shared variable with the result of the
corresponding expression". Above, our accumulator replaces the ``state``'s value with the sum
of the state and the increment amount.
Let's try it out!
.. If you modify this code, also change :
.. theano/tests/test_tutorial.py:T_examples.test_examples_8
......@@ -214,7 +225,7 @@ array(-1)
array(2)
As we mentioned above, you can define more than one function to use the same
shared variable. These functions can all update the value.
.. If you modify this code, also change :
.. theano/tests/test_tutorial.py:T_examples.test_examples_8
......@@ -226,7 +237,7 @@ array(2)
array(0)
You might be wondering why the updates mechanism exists. You can always
achieve a similar result by returning the new expressions, and working with
them in NumPy as usual. The updates mechanism can be a syntactic convenience,
but it is mainly there for efficiency. Updates to shared variables can
sometimes be done more quickly using in-place algorithms (e.g. low-rank matrix
......@@ -278,7 +289,9 @@ RandomStream object (a random number generator) for each such
variable, and draw from it as necessary. We will call this sort of
sequence of random numbers a *random stream*. *Random streams* are at
their core shared variables, so the observations on shared variables
hold here as well. Theano's random objects are defined and implemented in
:ref:`RandomStreams<libdoc_tensor_shared_randomstreams>` and, at a lower level,
in :ref:`RandomStreamsBase<libdoc_tensor_raw_random>`.
Brief Example
-------------
......@@ -301,7 +314,9 @@ Here's a brief example. The setup code is:
Here, 'rv_u' represents a random stream of 2x2 matrices of draws from a uniform
distribution. Likewise, 'rv_n' represents a random stream of 2x2 matrices of
draws from a normal distribution. The distributions that are implemented are
defined in :class:`RandomStreams` and, at a lower level, in :ref:`raw_random<libdoc_tensor_raw_random>`.
.. TODO: repair the latter reference on RandomStreams
Now let's use these objects. If we call f(), we get random uniform numbers.
The internal state of the random number generator is automatically updated,
......@@ -312,7 +327,7 @@ so we get different random numbers every time.
When we add the extra argument ``no_default_updates=True`` to
``function`` (as in ``g``), then the random number generator state is
not affected by calling the returned function. So, for example, calling
``g`` multiple times will return the same numbers.
>>> g_val0 = g() # different numbers from f_val0 and f_val1
......@@ -374,7 +389,7 @@ There are :ref:`other distributions implemented <libdoc_tensor_raw_random>`.
A Real Example: Logistic Regression
===================================
The preceding elements are featured in this more realistic example. It will be used repeatedly.
.. code-block:: python
......@@ -401,7 +416,8 @@ The preceding elements are put to work in this more realistic example. It will b
prediction = p_1 > 0.5 # The prediction thresholded
xent = -y*T.log(p_1) - (1-y)*T.log(1-p_1) # Cross-entropy loss function
cost = xent.mean() + 0.01*(w**2).sum() # The cost to minimize
gw,gb = T.grad(cost, [w,b]) # Compute the gradient of the cost:
# we shall return to this
# Compile
train = theano.function(
......
......@@ -119,7 +119,7 @@ includes your op.
The :func:`__str__` method is useful in order to provide a more meaningful
string representation of your Op.
The :func:`R_op` method is needed if you want ``theano.tensor.Rop`` to
work with your op.
Op Example
......@@ -211,16 +211,16 @@ exception. You can use the `assert` keyword to automatically raise an
**Testing the infer_shape**
When a class inherits from the ``InferShapeTester`` class, it gets the
``self._compile_and_check`` method that tests the Op ``infer_shape``
method. It tests that the Op gets optimized out of the graph if only
the shape of the output is needed and not the output
itself. Additionally, it checks that such an optimized graph computes
the correct shape, by comparing it to the actual shape of the computed
output.
``self._compile_and_check`` compiles a Theano function. It takes as
parameters the lists of input and output Theano variables, as would be
provided to ``theano.function``, and a list of real values to pass to the
compiled function (don't use shapes that are symmetric, e.g. (3, 3),
as they can easily hide errors). It also takes the Op class to
verify that no Ops of that type appear in the shape-optimized graph.
......@@ -264,6 +264,8 @@ the multiplication by 2).
**Testing the Rop**
.. TODO: repair defective links in the following paragraph
The class :class:`RopLop_checker` provides the functions
:func:`RopLop_checker.check_mat_rop_lop`,
:func:`RopLop_checker.check_rop_lop` and
......@@ -316,6 +318,9 @@ if the NVIDIA driver works correctly with our sum reduction code on the
GPU.
A more extensive discussion than this section's may be found in the advanced
tutorial :ref:`Extending Theano<extending>`.
**Exercise**
......
.. _gpu_data_convert:
===================================
PyCUDA/CUDAMat/Gnumpy compatibility
===================================
PyCUDA
......@@ -10,7 +10,7 @@ PyCUDA
Currently, PyCUDA and Theano have different objects to store GPU
data. The two implementations do not support the same set of features.
Theano's implementation is called CudaNdarray and supports
*strides*. It also only supports the *float32* dtype. PyCUDA's implementation
is called GPUArray and doesn't support *strides*. However, it can deal with
all NumPy and CUDA dtypes.
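The notion of *strides* can be illustrated with NumPy on the CPU (assuming only NumPy is available); this mirrors why a strided CudaNdarray may need a copy before conversion to a GPUArray:

```python
import numpy

a = numpy.arange(6, dtype='float32').reshape(2, 3)
t = a.T  # a transposed view: same memory, but non-contiguous strides

print(a.flags['C_CONTIGUOUS'])  # True
print(t.flags['C_CONTIGUOUS'])  # False: t is a strided view

# Making a contiguous copy is the analogue of passing copyif=True
# to to_gpuarray: the copy no longer shares t's memory region.
c = numpy.ascontiguousarray(t)
print(c.flags['C_CONTIGUOUS'])  # True
```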
......@@ -21,20 +21,20 @@ use both objects in the same script.
Transfer
--------
You can use the ``theano.misc.pycuda_utils`` module to convert GPUArray to and
from CudaNdarray. The functions ``to_cudandarray(x, copyif=False)`` and
``to_gpuarray(x)`` return a new object that occupies the same memory space
as the original. Otherwise it raises a ValueError. Because GPUArrays don't
support *strides*, a strided CudaNdarray may be copied to obtain a
non-strided version. The resulting GPUArray won't share the same
memory region. If you want this behavior, set ``copyif=True`` in
``to_gpuarray``.
Compiling with PyCUDA
---------------------
You can use PyCUDA to compile CUDA functions that work directly on
CudaNdarrays. Here is an example from the file ``theano/misc/tests/test_pycuda_theano_simple.py``:
.. code-block:: python
......@@ -73,10 +73,10 @@ CudaNdarrays. Here is an example from the file `theano/misc/tests/test_pycuda_th
assert (numpy.asarray(dest) == a * b).all()
Theano Op using a PyCUDA function
---------------------------------
You can use a GPU function compiled with PyCUDA in a Theano op:
.. code-block:: python
......@@ -120,15 +120,15 @@ You can use a GPU function compiled with PyCUDA in a Theano op. Here is an examp
CUDAMat
=======
There are functions for conversion between CUDAMat objects and Theano's CudaNdarray objects.
They obey the same principles as Theano's PyCUDA functions and can be found in
``theano.misc.cudamat_utils.py``.
WARNING: There is a strange problem associated with stride/shape with those converters.
In order to work, the test needs a transpose and reshape...
Gnumpy
======
There are conversion functions between Gnumpy ``garray`` objects and Theano CudaNdarray objects.
They are also similar to Theano's PyCUDA functions and can be found in ``theano.misc.gnumpy_utils.py``.
......@@ -33,7 +33,7 @@ array(8.0)
>>> f(94.2)
array(188.40000000000001)
In this example, we can see from ``pp(gy)`` that we are computing
the correct symbolic gradient.
``fill((x ** 2), 1.0)`` means to make a matrix of the same shape as
``x ** 2`` and fill it with 1.0.
......@@ -72,10 +72,10 @@ array([[ 0.25 , 0.19661193],
[ 0.19661193, 0.10499359]])
In general, for any **scalar** expression ``s``, ``T.grad(s, w)`` provides
the Theano expression for computing :math:`\frac{\partial s}{\partial w}`. In
this way Theano can be used for doing **efficient** symbolic differentiation
(as the expression returned by ``T.grad`` will be optimized during compilation), even for
functions with many inputs. (see `automatic differentiation <http://en.wikipedia.org/wiki/Automatic_differentiation>`_ for a description
of symbolic differentiation).
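As a sanity check of what the symbolic gradient computes, one can compare against a numerical finite-difference approximation of :math:`\frac{d(x^2)}{dx} = 2x` in plain Python (no Theano required):

```python
def f(x):
    return x ** 2

def numeric_grad(f, x, eps=1e-6):
    # central finite difference: (f(x + eps) - f(x - eps)) / (2 * eps)
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# The symbolic gradient of x**2 is 2*x, so at x = 94.2 we expect 188.4,
# matching the value returned by the compiled Theano function above:
print(round(numeric_grad(f, 94.2), 3))  # 188.4
```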
.. note::
......@@ -86,8 +86,11 @@ of symbolic differentiation).
``T.grad`` with respect to the *i*-th element of the list given as second argument.
The first argument of ``T.grad`` has to be a scalar (a tensor
of size 1). For more information on the semantics of the arguments of
``T.grad`` and details about the implementation, see
:ref:`this<libdoc_gradient>` section of the library.
Additional information on the inner workings of differentiation may also be
found in the more advanced tutorial :ref:`Extending Theano<extending>`.
Computing the Jacobian
======================
......@@ -106,9 +109,8 @@ do is to loop over the entries in ``y`` and compute the gradient of
``scan`` is a generic op in Theano that allows writing in a symbolic
manner all kinds of recurrent equations. While creating
symbolic loops (and optimizing them for performance) is a hard task,
effort is being made to improve the performance of ``scan``. We
shall return to ``scan`` in a moment.
>>> x = T.dvector('x')
>>> y = x**2
at each step, we compute the gradient of element ``y[i]`` with respect to
matrix which corresponds to the Jacobian.
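For ``y = x**2`` elementwise, the Jacobian is a diagonal matrix with ``2*x[i]`` on the diagonal; a plain-Python version of the row-by-row loop that ``scan`` performs symbolically (a sketch, not the Theano code):

```python
def jacobian_of_square(x):
    # dy[i]/dx[j] = 2*x[i] if i == j else 0, since y = x**2 elementwise
    n = len(x)
    return [[2.0 * x[i] if i == j else 0.0 for j in range(n)]
            for i in range(n)]

print(jacobian_of_square([4.0, 4.0]))  # [[8.0, 0.0], [0.0, 8.0]]
```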
.. note::
There are some pitfalls to be aware of regarding ``T.grad``. One of them is that you
cannot re-write the above expression of the Jacobian as
``theano.scan(lambda y_i,x: T.grad(y_i,x), sequences=y,
non_sequences=x)``, even though from the documentation of scan this
......@@ -170,8 +172,8 @@ performance gains. A description of one such algorithm can be found here:
Computation, 1994*
While in principle we would want Theano to identify these patterns automatically for us,
in practice, implementing such optimizations in a generic manner is extremely
difficult. Therefore, we provide special functions dedicated to these tasks.
R-operator
......@@ -182,7 +184,7 @@ vector, namely :math:`\frac{\partial f(x)}{\partial x} v`. The formulation
can be extended even to the case where `x` is a matrix, or a tensor in general,
in which case the Jacobian also becomes a tensor and the product becomes some kind
of tensor product. Because in practice we end up needing to compute such
expressions in terms of weight matrices, Theano supports this more generic
form of the operation. In order to evaluate the *R-operation* of
expression ``y``, with respect to ``x``, multiplying the Jacobian with ``v``
you need to do something similar to this:
......@@ -202,11 +204,10 @@ array([ 2., 2.])
L-operator
----------
Similarly to the *R-operator*, the *L-operator* computes a *row* vector times
the Jacobian. The mathematical formula would be :math:`v \frac{\partial
f(x)}{\partial x}`. The *L-operator* is also supported for generic tensors
(not only for vectors). Similarly, it can be implemented as follows:
>>> W = T.dmatrix('W')
>>> v = T.dvector('v')
......@@ -220,24 +221,24 @@ array([[ 0., 0.],
.. note::
`v`, the point of evaluation, differs between the *L-operator* and the *R-operator*.
For the *L-operator*, the point of evaluation needs to have the same shape
as the output, whereas for the *R-operator* this point should
have the same shape as the input parameter. Furthermore, the results of these two
operations differ. The result of the *L-operator* is of the same shape
as the input parameter, while the result of the *R-operator* has the same shape
as the output.
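The shape bookkeeping in this note can be made concrete with small plain-Python matrix products, where ``J`` stands for a Jacobian of shape (outputs, inputs):

```python
def matvec(J, v):
    # R-operator pattern: J (m x n) times v (length n) -> length m (output shape)
    return [sum(J[i][j] * v[j] for j in range(len(v))) for i in range(len(J))]

def vecmat(v, J):
    # L-operator pattern: v (length m) times J (m x n) -> length n (input shape)
    return [sum(v[i] * J[i][j] for i in range(len(v))) for j in range(len(J[0]))]

J = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]              # 2 outputs, 3 inputs
print(matvec(J, [1.0, 1.0, 1.0]))  # [6.0, 15.0] -- same length as the output
print(vecmat([1.0, 1.0], J))       # [5.0, 7.0, 9.0] -- same length as the input
```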
Hessian times a Vector
======================
If you need to compute the Hessian times a vector, you can make use of the
above-defined operators to do it more efficiently than actually computing
the exact Hessian and then performing the product. Due to the symmetry of the
Hessian matrix, you have two options that will
give you the same result, though they might exhibit different performance.
Hence, we suggest profiling the methods before using either one of the two:
>>> x = T.dvector('x')
......@@ -266,14 +267,14 @@ Final Pointers
==============
* The ``grad`` function works symbolically: it receives and returns Theano variables.
* ``grad`` can be compared to a macro since it can be applied repeatedly.
* Scalar costs only can be directly handled by ``grad``. Arrays are handled through repeated applications.
* Built-in functions allow efficient computation of *vector times Jacobian* and *vector times Hessian* products.
* Work is in progress on the optimizations required to compute efficiently the full
Jacobian and Hessian matrices and the *Jacobian times vector* expression.
Tutorial
========
Let us start an interactive session (e.g. with ``python`` or ``ipython``) and import Theano.
>>> from theano import *
Several of the symbols you will need to use are in the ``tensor`` subpackage
of Theano. Let us import that subpackage under a handy name like
``T`` (the tutorials will frequently use this convention).
>>> import theano.tensor as T
If that succeeded, you are ready for the tutorial, otherwise check your
installation (see :ref:`install`).
Throughout the tutorial, bear in mind that there is a :ref:`glossary` to help
you out.
gradients
modes
loading_and_saving
conditions
loop
sparse
using_gpu
gpu_data_convert
aliasing
shape_info
remarks
debug_faq
extending_theano
faq
Scan
====
- You 'scan' a function along some input sequence, producing an output at each time-step.
- The function can see the *previous K time-steps* of your function.
- ``sum()`` could be computed by scanning the ``z + x(i)`` function over a list, given an initial state of ``z=0``.
- Often a *for* loop can be expressed as a ``scan()`` operation, and ``scan`` is the closest that Theano comes to looping.
- Advantages of using ``scan`` over *for* loops:
- The number of iterations can be part of the symbolic graph.
- Minimizes GPU transfers (if GPU is involved).
- Computes gradients through sequential steps.
- Slightly faster than using a *for* loop in Python with a compiled Theano function.
- Can lower the overall memory usage by detecting the actual amount of memory needed.
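The accumulator pattern behind ``scan`` can be sketched in plain Python (this mimics the semantics only; it is not the Theano API):

```python
# A pure-Python sketch of scan's accumulator semantics: apply fn along a
# sequence while threading a state, collecting each intermediate state.
def scan_like(fn, sequence, initial_state):
    state = initial_state
    outputs = []
    for x_i in sequence:
        state = fn(x_i, state)
        outputs.append(state)
    return outputs

# sum() as a "scan": the z + x(i) function with initial state z = 0
partial_sums = scan_like(lambda x_i, z: z + x_i, [1, 2, 3, 4], 0)
```

Here ``partial_sums`` is ``[1, 3, 6, 10]``; ``theano.scan`` builds the equivalent symbolic loop and can also compute gradients through it.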
The full documentation can be found in the library: :ref:`Scan <lib_scan>`.
The ``config`` module contains several ``attributes`` that modify Theano's behavior. Many of these
attributes are examined during the import of the ``theano`` module and several are assumed to be
read-only.
*As a rule, the attributes in the* ``config`` *module should not be modified by user code.*
Theano's code comes with default values for these attributes, but you can
override them from your .theanorc file, and override those values in turn
through the :envvar:`THEANO_FLAGS` environment variable.
The order of precedence is:

1. an assignment to theano.config.<property>
2. an assignment in :envvar:`THEANO_FLAGS`
3. an assignment in the .theanorc file (or the file indicated in :envvar:`THEANORC`)
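For instance, precedence rule 2 can be exercised from the shell; the script name below is hypothetical, while ``floatX`` and ``device`` are standard Theano flags:

```shell
# Override the .theanorc values for a single run via THEANO_FLAGS
THEANO_FLAGS='floatX=float32,device=cpu' python my_script.py
```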
You can display the current/effective configuration at any time by printing
``theano.config``. For example, to see a list of all active configuration
variables, type this from the command-line:
python -c 'import theano; print theano.config' | less
For more detail, see :ref:`Configuration <libdoc_config>` in the library.
-------------------------------------------
**Exercise**
Theano defines the following modes by name:
- ``'FAST_COMPILE'``: Apply just a few graph optimizations and only use Python implementations.
- ``'FAST_RUN'``: Apply all optimizations, and use C implementations where possible.
- ``'DEBUG_MODE'``: Verify the correctness of all optimizations, and compare C and Python
implementations. This mode can take much longer than the other modes,
but can identify many kinds of problems.
- ``'PROFILE_MODE'``: Same optimizations as FAST_RUN, but print some profiling information.
Here is a table to compare the different linkers.
============= ========= ================= ========= ===
linker        gc [#gc]_ Raise error by op Overhead  Definition
============= ========= ================= ========= ===
c|py [#cpy1]_ yes       yes               "+++"     Try C code. If none exist for an op, use Python
c|py_nogc     no        yes               "++"      As c|py, but without gc
c             no        yes               "+"       Use only C code (if none available for an op, raise an error)
py            yes       yes               "+++"     Use only Python code
c&py [#cpy2]_ no        yes               "+++++"   Use C and Python code
ProfileMode   no        no                "++++"    Compute some extra profiling info
DebugMode     no        yes               VERY HIGH Make many checks on what Theano computes
============= ========= ================= ========= ===
.. [#cpy2] Deprecated
For more detail, see :ref:`Mode<libdoc_compile_mode>` in the library.
.. _using_debugmode:
Using DebugMode
===============
If you instantiate DebugMode using the constructor (see :class:`DebugMode`)
rather than the keyword ``DEBUG_MODE`` you can configure its behaviour via
constructor arguments. The keyword version of DebugMode (which you get by using ``mode='DEBUG_MODE'``)
is quite strict.
For more detail, see :ref:`DebugMode<libdoc_compile_mode>` in the library.
.. _using_profilemode:
ProfileMode
===========
In the *Op-wise summary*, the execution times of all Apply nodes executing
the same Op are grouped together and the total execution time per Op
there corresponding to the sum of the time spent in each of them).
Finally, notice that the ProfileMode also shows which Ops were running a C
implementation.
For more detail, see :ref:`ProfileMode<libdoc_compile_mode>` in the library.
How Shape Information is Handled by Theano
==========================================
It is not possible to strictly enforce the shape of a Theano variable when
building a graph since the particular value provided at run-time for a parameter of a
Theano function may determine the shape of the Theano variables in its graph.
Currently, information regarding shape is used in two ways in Theano:
- When the exact output shape is known, to generate faster C code for
the 2d convolution on the CPU and GPU.
- To generate faster C code for the 2d convolution on the CPU and the GPU,
when the exact output shape is known in advance.
- To remove computations in the graph when we only want to know the
  shape, but not the actual value of a variable. This is done with the
  ``Op.infer_shape`` method.
Shape Inference Problem
=======================
Theano propagates information about shapes in the graph. Sometimes this
can lead to errors. For example:
.. code-block:: python
An inferred shape is computed directly, without executing
the computation itself (there is no ``join`` in the first output or debugprint).
This makes the computation of the shape faster, but it can also hide errors. In
the example, the computation of the shape of the output of ``join`` is done only
based on the first input Theano variable, which leads to an error.
This might happen with other Ops, such as ``elemwise`` and ``dot``.
Indeed, to perform some optimizations (for speed or stability, for instance),
Theano assumes that the computation is correct and consistent
in the first place, as it does here.
You can detect those problems by running the code without this
optimization, with the Theano flag
``optimizer_excluding=local_shape_to_shape_i``. You can also obtain the
same effect by running in mode FAST_COMPILE (it will not apply this
optimization, nor most other optimizations) or DEBUG_MODE (it will test
before and after all optimizations (much slower)).
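Concretely, disabling that optimization from the command line looks like this (the script name is hypothetical):

```shell
# Recompute shapes from the actual computation to surface shape errors
THEANO_FLAGS='optimizer_excluding=local_shape_to_shape_i' python my_script.py
```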
Specifying Exact Shape
======================
Currently, specifying a shape is not as easy and flexible as we wish and we plan some
upgrades. Here is the current state of what can be done:
- You can pass the shape info directly to the ``ConvOp`` created
  when calling ``conv2d``. You simply add the parameters ``image_shape``
  and ``filter_shape`` to the call. They must be tuples of 4
  elements. For example:
.. code-block:: python
theano.tensor.nnet.conv2d(..., image_shape=(7,3,5,5), filter_shape=(2,3,4,4))
- You can use the SpecifyShape op to add shape information anywhere in the
  graph. This enables some additional optimizations. In the following example,
  this makes it possible to precompute the Theano function to a constant.
.. code-block:: python
Future Plans
============
The parameter "constant shape" will be added to ``theano.shared()``. This is probably
the most frequent case with ``shared variables``. This will make the code
simpler and will make it possible to check that the shape does not change when
updating the shared variable.
Theano Graphs
=============
Debugging or profiling code written in Theano is not that simple if you
do not know what goes on under the hood. This chapter is meant to
introduce you to a required minimum of the inner workings of Theano.
For more detail, see :ref:`extending`.
The first step in writing Theano code is to write down all mathematical
relations using symbolic placeholders (**variables**).
Using the chain rule,
these gradients can be composed in order to obtain the expression of the
gradient of the graph's output with respect to the graph's inputs.
A coming section of this tutorial will address the topic of differentiation
in greater detail.
Optimizations
=============
Optimizations can, for instance, avoid computing the same subexpression
twice or reformulate parts of the graph to a GPU-specific version.
For example, one (simple) optimization that Theano uses is to replace
the pattern :math:`\frac{xy}{y}` by :math:`x`.
Further information regarding the optimization :ref:`process<optimization>` and the
specific :ref:`optimizations<optimizations>` that are applicable is available in the
library and on the entrance page of the documentation, respectively.
**Example**
Symbolic programming involves a change of paradigm: it will become clearer
as we apply it. Consider the following example of optimization:
>>> import theano
>>> a = theano.tensor.vector("a") # declare symbolic variable
====================================================== =====================================================
.. image:: ../hpcs2011_tutorial/pics/f_unoptimized.png .. image:: ../hpcs2011_tutorial/pics/f_optimized.png
====================================================== =====================================================
One of Theano's design goals is to specify computations at an
abstract level, so that the internal function compiler has a lot of flexibility
about how to carry out those computations. One of the ways we take advantage of
this flexibility is in carrying out calculations on an Nvidia graphics card when
there is a CUDA-enabled device present in the computer.
Setting Up CUDA
----------------
The program just computes the exp() of a bunch of random numbers.
Note that we use the `shared` function to
make sure that the input `x` is stored on the graphics device.
If I run this program (in thing.py) with device=cpu, my computer takes a little over 7 seconds,
whereas on the GPU it takes just over 0.4 seconds. The GPU will not always produce the exact
Note that for now GPU operations in Theano require floatX to be float32 (see also below).
Returning a Handle to Device-Allocated Data
-------------------------------------------
The speedup is not greater in the preceding example because the function is
returning its result as a NumPy ndarray which has already been copied from the
device to the host for your convenience. This is what makes it so easy to swap in device=gpu, but
if you don't mind being less portable, you might prefer to see a bigger speedup by changing
the graph to express a computation with a GPU-stored result. The gpu_from_host
Op means "copy the input from the host to the GPU" and it is optimized away
after the T.exp(x) is replaced by a GPU version of exp().
.. If you modify this code, also change :
To really get maximum performance in this simple example, we need to use an :class:`Out`
instance to tell Theano not to copy the output it returns to us. Theano allocates memory for
internal use like a working buffer, but by default it will never return a result that is
allocated in the working buffer. This is normally what you want, but our example is so simple
that it has the un-wanted side-effect of really slowing things down.
that it has the unwanted side-effect of really slowing things down.
..
TODO:
* Indexing,
  dimension-shuffling and constant-time reshaping will be equally fast on GPU
  as on CPU.
* Summation
  over rows/columns of tensors can be a little slower on the GPU than on the CPU.
* Copying
of large quantities of data to and from a device is relatively slow, and
  often cancels most of the advantage of one or two accelerated functions on
  that data.
Tips for Improving Performance on GPU
-------------------------------------
* Consider
adding ``floatX=float32`` to your .theanorc file if you plan to do a lot of
GPU work.
* Prefer
constructors like 'matrix', 'vector' and 'scalar' to 'dmatrix', 'dvector' and
'dscalar' because the former will give you float32 variables when
floatX=float32.
* Ensure
mode='PROFILE_MODE'. This should print some timing information at program
termination (atexit). Is time being used sensibly? If an Op or Apply is
taking more time than its share, then if you know something about GPU
programming, have a look at how it's implemented in theano.sandbox.cuda.
Check the line like 'Spent Xs(X%) in cpu Op, Xs(X%) in gpu Op and Xs(X%) transfert Op'
that can tell you if not enough of your graph is on the GPU or if there
is too much memory transfer.
Consider the logistic regression:
    print 'Used the cpu'
elif any([x.op.__class__.__name__ == 'GpuGemm' for x in
          train.maker.fgraph.toposort()]):
    print 'Used the GPU'
else:
    print 'ERROR, not able to tell if theano used the cpu or the GPU'
    print train.maker.fgraph.toposort()
What can be done to further increase the speed of the GPU version?
Software for Directly Programming a GPU
---------------------------------------
Leaving aside Theano, which is a meta-programmer, there are:
* CUDA: C extension by NVIDIA
* Convenience: Makes it easy to do GPU meta-programming from within Python. Helpful documentation.
  (abstractions to compile low-level CUDA code from Python: ``pycuda.driver.SourceModule``)
* Completeness: Binding to all of CUDA's driver API.
* Speed: PyCUDA's base layer is written in C++.
* Good memory management of GPU objects:
  Object cleanup tied to lifetime of objects (RAII, 'Resource Acquisition Is Initialization').
  Makes it much easier to write correct, leak- and crash-free code.
  PyCUDA knows about dependencies (e.g. it won't detach from a context before all memory allocated in it is also freed).
  (GPU memory buffer: ``pycuda.gpuarray.GPUArray``)
* PyOpenCL: PyCUDA for OpenCL
Run the preceding example.
Modify and execute it to work for a matrix of shape (20, 10).
-------------------------------------------
To test it:
Run the preceding example.
Modify and execute the example to multiply two matrices: x * y.
Modify and execute the example to return two outputs: x + y and x - y.
(Currently, elemwise fusion generates computation with only 1 output.)
Modify and execute the example to support *strides* (i.e. so as not to constrain the input to be C contiguous).
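As a reminder of what the last exercise is about, here is how striding breaks C-contiguity in NumPy (plain NumPy, independent of the GPU code):

```python
import numpy as np

# Slicing with a step yields a strided, non-contiguous view of the data
a = np.arange(20.0).reshape(4, 5)
b = a[:, ::2]                       # every other column

assert a.flags['C_CONTIGUOUS']
assert not b.flags['C_CONTIGUOUS']
```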
-------------------------------------------