Commit dba02a39, authored by Eric Larsen, committed by Frederic

Correct Theano's tutorial: add cross-references, correct some more typos,
improve style a little, correct the logical structure some more
Parent c86c72f4
@@ -6,15 +6,15 @@ Extending Theano
================
This advanced tutorial is for users who want to extend Theano with new Types, new
Operations (Ops), and new graph optimizations.
Along the way, it also introduces many aspects of how Theano works, so it is
also good for you if you are interested in getting more under the hood with
Theano itself.
Before tackling this more advanced presentation, it is highly recommended to read the
introductory :ref:`Tutorial<tutorial>`.
The first few pages will walk you through the definition of a new :ref:`type`,
``double``, and a basic arithmetic :ref:`operation <op>` on that Type. We
......
.. _libdoc_compile_mode:
======================================
:mod:`mode` -- controlling compilation
======================================
......
.. currentmodule:: tensor
.. _libdoc_basic_tensor:
===========================
Basic Tensor Functionality
===========================
......
@@ -7,7 +7,7 @@ Baby Steps - Algebra
Adding two Scalars
==================
To get us started with Theano and get a feel of what we're working with,
let's make a simple function: add two numbers together. Here is how you do
it:
@@ -38,7 +38,7 @@ to add. Note that from now on, we will use the term
If you are following along and typing into an interpreter, you may have
noticed that there was a slight delay in executing the ``function``
instruction. Behind the scenes, ``f`` was being compiled into C code.
.. note::
@@ -80,13 +80,14 @@ TensorType(float64, scalar)
>>> x.type is T.dscalar
True
You can learn more about the structures in Theano in :ref:`graphstructures`.
By calling ``T.dscalar`` with a string argument, you create a
*Variable* representing a floating-point scalar quantity with the
given name. If you provide no argument, the symbol will be unnamed. Names
are not required, but they can help debugging.
**Step 2**
@@ -112,9 +113,8 @@ and giving ``z`` as output:
The first argument to :func:`function <function.function>` is a list of Variables
that will be provided as inputs to the function. The second argument
is a single Variable *or* a list of Variables. For either case, the second
argument is what we want to see as output when we apply the function. ``f`` may
then be used like a normal Python function.
Adding two Matrices
@@ -132,14 +132,14 @@ from the previous example is that you need to instantiate ``x`` and
>>> z = x + y
>>> f = function([x, y], z)
``dmatrix`` is the Type for matrices of doubles. Then we can use
our new function on 2D arrays:
>>> f([[1, 2], [3, 4]], [[10, 20], [30, 40]])
array([[ 11.,  22.],
       [ 33.,  44.]])
The variable is a NumPy array. We can also use NumPy arrays directly as
inputs:
>>> import numpy
@@ -160,8 +160,8 @@ The following types are available:
* **double**: dscalar, dvector, dmatrix, drow, dcol, dtensor3, dtensor4
* **complex**: cscalar, cvector, cmatrix, crow, ccol, ctensor3, ctensor4
The previous list is not exhaustive and a guide to all types compatible
with NumPy arrays may be found here: :ref:`tensor creation<libdoc_tensor_creation>`.
.. note::
......
@@ -84,7 +84,7 @@ subsequently make to ``np_array`` have no effect on our shared variable.
If we are running this with the CPU as the device,
then changes we make to ``np_array`` *right away* will show up in
``s_true.get_value()``
because NumPy arrays are mutable, and ``s_true`` is using the ``np_array``
object as its internal buffer.
However, this aliasing of ``np_array`` and ``s_true`` is not guaranteed to occur,
@@ -137,15 +137,15 @@ But both of these calls might create copies of the internal memory.
The reason that ``borrow=True`` might still make a copy is that the internal
representation of a shared variable might not be what you expect. When you
create a shared variable by passing a NumPy array, for example, then ``get_value()``
must return a NumPy array too. That's how Theano can make the GPU use
transparent. But when you are using a GPU (or, in the future, perhaps a remote machine),
the ``numpy.ndarray`` is not the internal representation of your data.
If you really want Theano to return its internal representation *and never copy it*,
then you should use the ``return_internal_type=True`` argument to
``get_value``. It will never cast the internal object (it always returns in
constant time), but it might return various datatypes depending on contextual
factors (e.g. the compute device, the dtype of the NumPy array).
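The difference between a borrowed buffer and a copy is, at bottom, ordinary NumPy
aliasing. The following sketch uses plain NumPy only (it is not Theano's actual
internals; ``internal`` is a made-up stand-in for a shared variable's buffer) to
illustrate why ``borrow=True`` can expose the internal memory:

```python
import numpy as np

# Stand-in for a shared variable's internal buffer (hypothetical name).
internal = np.zeros((2, 2))

# borrow=False-style access: the caller gets an independent copy,
# so mutating it leaves the internal buffer untouched.
copied = internal.copy()
copied[0, 0] = 99.0
assert internal[0, 0] == 0.0

# borrow=True-style access (when no conversion or copy is needed): the
# caller may receive an alias of the very same memory.
aliased = internal
aliased[0, 0] = 99.0
assert internal[0, 0] == 99.0
```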
.. code-block:: python
......
@@ -11,9 +11,9 @@ IfElse vs Switch
- Both Ops build a condition over symbolic variables.
- ``IfElse`` takes a `boolean` condition and two variables as inputs.
- ``Switch`` takes a `tensor` as condition and two variables as inputs.
  ``switch`` is an elementwise operation and is thus more general than ``ifelse``.
- Whereas ``switch`` evaluates both 'output' variables, ``ifelse`` is lazy and only
  evaluates one variable with respect to the condition.
**Example**
@@ -52,8 +52,8 @@ IfElse vs Switch
f_lazyifelse(val1, val2, big_mat1, big_mat2)
print 'time spent evaluating one value %f sec'%(time.clock()-tic)
In this example, the ``IfElse`` Op spends less time (about half as much) than ``Switch``
since it computes only one variable out of the two.
.. code-block:: python
@@ -62,10 +62,10 @@ since it computes only one variable instead of both.
time spent evaluating one value 0.3500 sec
Unless ``linker='vm'`` or ``linker='cvm'`` are used, ``ifelse`` will compute both
variables and take the same computation time as ``switch``. Although the linker
is not currently set by default to 'cvm', it will be in the near future.
There is no automatic optimization replacing a ``switch`` with a
broadcasted scalar condition by an ``ifelse``, as this is not always faster. See
this `ticket <http://www.assembla.com/spaces/theano/tickets/764>`_.
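The eager-versus-lazy distinction can be seen in plain Python/NumPy. This is a
sketch by analogy only (not Theano code; ``expensive`` is a made-up tracing
helper): ``numpy.where``, like ``switch``, receives both branch values already
evaluated, whereas a Python conditional expression, like a lazy ``ifelse``,
evaluates only the taken branch.

```python
import numpy as np

calls = []

def expensive(name, value):
    calls.append(name)   # record which branches actually get evaluated
    return value

cond = True

# switch-like: np.where's arguments are both evaluated before selection.
result = np.where(cond, expensive('then', 1), expensive('else', 2))
assert int(result) == 1 and calls == ['then', 'else']

calls.clear()

# ifelse-like: the Python conditional evaluates only the taken branch.
result = expensive('then', 1) if cond else expensive('else', 2)
assert result == 1 and calls == ['then']
```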
@@ -22,7 +22,7 @@ Using Test Values
-----------------
As of v.0.4.0, Theano has a new mechanism by which graphs are executed
on-the-fly, before a ``theano.function`` is ever compiled. Since optimizations
haven't been applied at this stage, it is easier for the user to locate the
source of some bug. This functionality is enabled through the config flag
``theano.config.compute_test_value``. Its use is best shown through the
@@ -131,12 +131,12 @@ The compute_test_value mechanism works as follows:
`compute_test_value` can take the following values:
* ``off``: Default behavior. This debugging mechanism is inactive.
* ``raise``: Compute test values on the fly. Any variable for which a test
  value is required, but not provided by the user, is treated as an error. An
  exception is raised accordingly.
* ``warn``: Idem, but a warning is issued instead of an Exception.
* ``ignore``: Silently ignore the computation of intermediate test values, if a
  variable is missing a test value.
.. note::
@@ -181,6 +181,8 @@ precise inspection of what's being computed where, when, and how, see the
How do I Print a Graph (before or after compilation)?
-----------------------------------------------------
.. TODO: dead links in the next paragraph
Theano provides two functions (:func:`theano.pp` and
:func:`theano.printing.debugprint`) to print a graph to the terminal before or after
compilation. These two functions print expression graphs in different ways:
@@ -203,8 +205,14 @@ Apply nodes, and which Ops are eating up your CPU cycles.
Tips:
* Use the flag ``floatX=float32`` to use *float32* instead of *float64*
  for the Theano types ``matrix()``, ``vector()``, ... (if you use ``dmatrix()``
  or ``dvector()``, they stay at *float64*).
* Check in the profile that there is no ``Dot`` operation when you're
  multiplying two matrices of the same type. ``Dot`` should be optimized to
  ``dot22`` when the inputs are matrices of the same type. A plain ``Dot`` can
  remain when using ``floatX=float32`` and something in the graph makes one of
  the inputs *float64*.
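The *float64* leak described in the second tip is ordinary dtype promotion; a
quick plain-NumPy sketch (used here only to illustrate the promotion rule, not
Theano's profiler):

```python
import numpy as np

a = np.ones((2, 2), dtype=np.float32)
b = np.ones((2, 2), dtype=np.float32)
# Two float32 matrices keep the product in float32 -- the fast path.
assert a.dot(b).dtype == np.float32

# A single float64 operand promotes the whole result to float64,
# the kind of silent upcast that defeats the dot22 optimization.
c = np.ones((2, 2), dtype=np.float64)
assert a.dot(c).dtype == np.float64
```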
.. _faq_wraplinker:
@@ -239,7 +247,7 @@ along with its position in the graph, the arguments to the ``perform`` or
Admittedly, this may be a huge amount of
output to read through if you are using big tensors... but you can choose to
put logic inside of the *print_eval* function that would, for example, only
print something out if a certain kind of Op was used, at a certain program
position, or if a particular value shows up in one of the inputs or outputs.
Use your imagination :)
@@ -247,7 +255,7 @@ Use your imagination :)
.. TODO: documentation for link.WrapLinkerMany
This can be a really powerful debugging tool.
Note the call to *fn* inside the call to *print_eval*; without it, the graph wouldn't get computed at all!
How to Use pdb?
---------------
@@ -296,7 +304,7 @@ of the error. There's the script where the compiled function was called --
but if you're using (improperly parameterized) prebuilt modules, the error
might originate from ops in these modules, not this script. The last line
tells us about the Op that caused the exception. In this case it's a "mul"
involving variables with names "a" and "b". But suppose we instead had an
intermediate result to which we hadn't given a name.
After learning a few things about the graph structure in Theano, we can use
@@ -329,7 +337,7 @@ explore around the graph.
That graph is purely symbolic (no data, just symbols to manipulate it
abstractly). To get information about the actual parameters, you explore the
"thunk" objects, which bind the storage for the inputs (and outputs) with
the function itself (a "thunk" is a concept related to closures). Here, to
get the current node's first input's shape, you'd therefore do "p
thunk.inputs[0][0].shape", which prints out "(3, 4)".
......
@@ -5,6 +5,14 @@
More Examples
=============
At this point it would be wise to begin familiarizing yourself
more systematically with Theano's fundamental objects and operations by browsing
this section of the library: :ref:`libdoc_basic_tensor`.
As the tutorial unfolds, you should also gradually acquaint yourself with the other
relevant areas of the library and with the relevant subjects of the documentation
entry page.
Logistic Function
=================
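Before building it symbolically, it may help to recall what the logistic
function computes numerically. A plain-NumPy sketch of s(x) = 1/(1 + e^(-x))
applied elementwise (the tutorial's own version builds this with symbolic
Theano operations instead):

```python
import numpy as np

def logistic(x):
    # s(x) = 1 / (1 + e^{-x}), applied elementwise
    return 1 / (1 + np.exp(-x))

out = logistic(np.array([[0.0, 1.0], [-1.0, -2.0]]))
assert out[0, 0] == 0.5                  # s(0) = 1/2
assert np.all((out > 0) & (out < 1))     # outputs squash into (0, 1)
```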
@@ -82,7 +90,7 @@ squared difference between two matrices ``a`` and ``b`` at the same time:
shortcut for allocating symbolic variables that we will often use in the
tutorials.
When we use the function ``f``, it returns the three variables (the printing
was reformatted for readability):
>>> f([[1, 1], [1, 1]], [[0, 1], [2, 3]])
@@ -119,7 +127,7 @@ give a default value of 1 for ``y`` by creating a ``Param`` instance with
its ``default`` field set to 1.
Inputs with default values must follow inputs without default
values (like Python's functions). There can be multiple inputs with default values. These parameters can
be set positionally or by name, as in standard Python:
@@ -146,10 +154,13 @@ array(33.0)
attributes (set by ``dscalars`` in the example above) and *these* are the
names of the keyword parameters in the functions that we build. This is
the mechanism at work in ``Param(y, default=1)``. In the case of ``Param(w,
default=2, name='w_by_name')``, we override the symbolic variable's name
attribute with a name to be used for this function.
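The ``Param`` defaults above behave much like ordinary Python keyword
arguments. A plain-Python sketch of the calling conventions (a hypothetical
stand-in mirroring ``Param(y, default=1)`` and ``Param(w, default=2,
name='w_by_name')``, not Theano's compiled function):

```python
# Defaults must follow required inputs, and each default can be
# overridden positionally or by keyword -- just as with Param.
def f(x, y=1, w_by_name=2):
    return (x + y) * w_by_name

assert f(33) == 68               # both defaults used: (33 + 1) * 2
assert f(33, 2) == 70            # y overridden positionally
assert f(33, 0, 1) == 33         # everything overridden
assert f(33, w_by_name=1) == 34  # override by name
```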
You may like to see :ref:`Function<usingfunction>` in the library for more detail.
.. _functionstateexample:
Using Shared Variables
@@ -172,9 +183,9 @@ internal state, and returns the old state value.
>>> accumulator = function([inc], state, updates=[(state, state+inc)])
This code introduces a few new concepts. The ``shared`` function constructs
so-called :ref:`shared variables<libdoc_compile_shared>`.
These are hybrid symbolic and non-symbolic variables whose value may be shared
between multiple functions. Shared variables can be used in symbolic expressions just like
the objects returned by ``dmatrices(...)``, but they also have an internal
value that defines the value taken by this symbolic variable in *all* the
functions that use it. It is called a *shared* variable because its value is
@@ -189,7 +200,7 @@ will replace the ``.value`` of each shared variable with the result of the
corresponding expression". Above, our accumulator replaces the ``state``'s value with the sum
of the state and the increment amount.
Let's try it out!
.. If you modify this code, also change :
.. theano/tests/test_tutorial.py:T_examples.test_examples_8
@@ -214,7 +225,7 @@ array(-1)
array(2)
As we mentioned above, you can define more than one function to use the same
shared variable. These functions can all update the value.
.. If you modify this code, also change :
.. theano/tests/test_tutorial.py:T_examples.test_examples_8
@@ -226,7 +237,7 @@ array(2)
array(0)
You might be wondering why the updates mechanism exists. You can always
achieve a similar result by returning the new expressions, and working with
them in NumPy as usual. The updates mechanism can be a syntactic convenience,
but it is mainly there for efficiency. Updates to shared variables can
sometimes be done more quickly using in-place algorithms (e.g. low-rank matrix
@@ -278,7 +289,9 @@ RandomStream object (a random number generator) for each such
variable, and draw from it as necessary. We will call this sort of
sequence of random numbers a *random stream*. *Random streams* are at
their core shared variables, so the observations on shared variables
hold here as well. Theano's random objects are defined and implemented in
:ref:`RandomStreams<libdoc_tensor_shared_randomstreams>` and, at a lower level,
in :ref:`RandomStreamsBase<libdoc_tensor_raw_random>`.
Brief Example
-------------
@@ -301,7 +314,9 @@ Here's a brief example. The setup code is:
Here, 'rv_u' represents a random stream of 2x2 matrices of draws from a uniform
distribution. Likewise, 'rv_n' represents a random stream of 2x2 matrices of
draws from a normal distribution. The distributions that are implemented are
defined in :class:`RandomStreams` and, at a lower level, in :ref:`raw_random<libdoc_tensor_raw_random>`.
.. TODO: repair the latter reference on RandomStreams
Now let's use these objects. If we call ``f()``, we get random uniform numbers.
The internal state of the random number generator is automatically updated,
@@ -312,7 +327,7 @@ so we get different random numbers every time.
When we add the extra argument ``no_default_updates=True`` to
``function`` (as in ``g``), then the random number generator state is
not affected by calling the returned function. So, for example, calling
``g`` multiple times will return the same numbers.
>>> g_val0 = g()  # different numbers from f_val0 and f_val1
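The claim that ``g`` keeps returning the same numbers has a direct NumPy
analogue: if the generator's state is never advanced between calls (here it is
saved and restored by hand with ``RandomState.get_state``/``set_state``),
identical draws come out each time. A sketch by analogy only, not Theano's
implementation:

```python
import numpy as np

rng = np.random.RandomState(42)

state = rng.get_state()        # snapshot the generator's state
a = rng.uniform(size=(2, 2))

rng.set_state(state)           # restore it, as if no update had happened
b = rng.uniform(size=(2, 2))

# With the state unchanged between calls, the draws are identical --
# the behavior no_default_updates=True produces for ``g``.
assert np.allclose(a, b)
```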
@@ -374,7 +389,7 @@ There are :ref:`other distributions implemented <libdoc_tensor_raw_random>`.
A Real Example: Logistic Regression
===================================
The preceding elements are featured in this more realistic example. It will be used repeatedly.
.. code-block:: python
@@ -401,7 +416,8 @@ The preceding elements are put to work in this more realistic example. It will b
    prediction = p_1 > 0.5                    # The prediction thresholded
    xent = -y*T.log(p_1) - (1-y)*T.log(1-p_1) # Cross-entropy loss function
    cost = xent.mean() + 0.01*(w**2).sum()    # The cost to minimize
    gw, gb = T.grad(cost, [w, b])             # Compute the gradient of the cost:
                                              # we shall return to this

    # Compile
    train = theano.function(
@@ -119,7 +119,7 @@ includes your op.
The :func:`__str__` method is useful in order to provide a more meaningful
string representation of your Op.
The :func:`R_op` method is needed if you want ``theano.tensor.Rop`` to
work with your op.
Op Example
@@ -211,16 +211,16 @@ exception. You can use the `assert` keyword to automatically raise an
**Testing the infer_shape**
When a class inherits from the ``InferShapeTester`` class, it gets the
``self._compile_and_check`` method that tests the Op's ``infer_shape``
method. It tests that the Op gets optimized out of the graph if only
the shape of the output is needed and not the output
itself. Additionally, it checks that such an optimized graph computes
the correct shape, by comparing it to the actual shape of the computed
output.
``self._compile_and_check`` compiles a Theano function. It takes as
parameters the lists of input and output Theano variables, as would be
provided to ``theano.function``, and a list of real values to pass to the
compiled function (don't use shapes that are symmetric, e.g. (3, 3),
as they can easily hide errors). It also takes the Op class to
verify that no Ops of that type appear in the shape-optimized graph.
...@@ -264,6 +264,8 @@ the multiplication by 2).
**Testing the Rop**
.. TODO: repair defective links in the following paragraph
The class :class:`RopLop_checker` defines the functions
:func:`RopLop_checker.check_mat_rop_lop`,
:func:`RopLop_checker.check_rop_lop` and
...@@ -316,6 +318,9 @@ if the NVIDIA driver works correctly with our sum reduction code on the
GPU.
A more extensive discussion than this section's may be found in the advanced
tutorial :ref:`Extending Theano<extending>`.
-------------------------------------------
**Exercise**
...
.. _gpu_data_convert:
===================================
PyCUDA/CUDAMat/Gnumpy compatibility
===================================
PyCUDA
...@@ -10,7 +10,7 @@ PyCUDA
Currently, PyCUDA and Theano have different objects to store GPU
data. The two implementations do not support the same set of features.
Theano's implementation is called CudaNdarray and supports
*strides*. It also only supports the *float32* dtype. PyCUDA's implementation
is called GPUArray and doesn't support *strides*. However, it can deal with
all NumPy and CUDA dtypes.
...@@ -21,20 +21,20 @@ use both objects in the same script.
Transfer
--------
You can use the ``theano.misc.pycuda_utils`` module to convert GPUArray to and
from CudaNdarray. The functions ``to_cudandarray(x, copyif=False)`` and
``to_gpuarray(x)`` return a new object that occupies the same memory space
as the original. Otherwise it raises a ValueError. Because GPUArrays don't
support *strides*, if the CudaNdarray is strided, we could copy it to
have a non-strided copy. The resulting GPUArray won't share the same
memory region. If you want this behavior, set ``copyif=True`` in
``to_gpuarray``.
Compiling with PyCUDA
---------------------
You can use PyCUDA to compile CUDA functions that work directly on
CudaNdarrays. Here is an example from the file ``theano/misc/tests/test_pycuda_theano_simple.py``:
.. code-block:: python
...@@ -73,10 +73,10 @@ CudaNdarrays. Here is an example from the file `theano/misc/tests/test_pycuda_th
assert (numpy.asarray(dest) == a * b).all()
Theano Op using a PyCUDA function
---------------------------------
You can use a GPU function compiled with PyCUDA in a Theano op:
.. code-block:: python
...@@ -120,15 +120,15 @@ You can use a GPU function compiled with PyCUDA in a Theano op. Here is an examp
CUDAMat
=======
There are functions for conversion between CUDAMat objects and Theano's CudaNdArray objects.
They obey the same principles as Theano's PyCUDA functions and can be found in
``theano.misc.cudamat_utils.py``.
WARNING: There is a strange problem associated with stride/shape with those converters.
In order to work, the test needs a transpose and reshape...
Gnumpy
======
There are conversion functions between Gnumpy ``garray`` objects and Theano CudaNdArray objects.
They are also similar to Theano's PyCUDA functions and can be found in ``theano.misc.gnumpy_utils.py``.
...@@ -33,7 +33,7 @@ array(8.0)
>>> f(94.2)
array(188.40000000000001)
In this example, we can see from ``pp(gy)`` that we are computing
the correct symbolic gradient.
``fill((x ** 2), 1.0)`` means to make a matrix of the same shape as
``x ** 2`` and fill it with 1.0.
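The value above can also be checked numerically without Theano; a small NumPy sketch using central finite differences (not part of the original text):

```python
import numpy as np

def numeric_grad(f, x, eps=1e-6):
    # Central finite difference: (f(x+eps) - f(x-eps)) / (2*eps)
    return (f(x + eps) - f(x - eps)) / (2 * eps)

f = lambda x: x ** 2
g = numeric_grad(f, 94.2)   # analytic gradient is 2*x = 188.4
```

The numerical estimate agrees with the symbolic result ``array(188.4...)`` up to floating-point error.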
...@@ -72,10 +72,10 @@ array([[ 0.25 , 0.19661193],
[ 0.19661193, 0.10499359]])
In general, for any **scalar** expression ``s``, ``T.grad(s, w)`` provides
the Theano expression for computing :math:`\frac{\partial s}{\partial w}`. In
this way Theano can be used for doing **efficient** symbolic differentiation
(as the expression returned by ``T.grad`` will be optimized during compilation), even for
functions with many inputs (see `automatic differentiation <http://en.wikipedia.org/wiki/Automatic_differentiation>`_ for a description
of symbolic differentiation).
.. note::
...@@ -86,8 +86,11 @@ of symbolic differentiation).
``T.grad`` with respect to the *i*-th element of the list given as second argument.
The first argument of ``T.grad`` has to be a scalar (a tensor
of size 1). For more information on the semantics of the arguments of
``T.grad`` and details about the implementation, see
:ref:`this<libdoc_gradient>` section of the library.
Additional information on the inner workings of differentiation may also be
found in the more advanced tutorial :ref:`Extending Theano<extending>`.
Computing the Jacobian
======================
...@@ -106,9 +109,8 @@ do is to loop over the entries in ``y`` and compute the gradient of
``scan`` is a generic op in Theano that allows writing in a symbolic
manner all kinds of recurrent equations. While creating
symbolic loops (and optimizing them for performance) is a hard task,
effort is being done for improving the performance of ``scan``. We
shall return to ``scan`` in a moment.
>>> x = T.dvector('x')
>>> y = x**2
...@@ -125,7 +127,7 @@ at each step, we compute the gradient of element ``y[i]`` with respect to
matrix which corresponds to the Jacobian.
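The scan-based loop builds this Jacobian row by row; the same result can be written down directly with plain NumPy (a sketch, independent of Theano):

```python
import numpy as np

def jacobian_of_square(x):
    # For y = x**2 applied elementwise, dy[i]/dx[j] = 2*x[i] when i == j
    # and 0 otherwise, so the Jacobian is diagonal with 2*x on the diagonal.
    return np.diag(2 * x)

J = jacobian_of_square(np.array([4.0, 4.0]))
```

For the input ``[4, 4]`` this yields the diagonal matrix ``[[8, 0], [0, 8]]``, matching what the symbolic loop computes.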
.. note::
There are some pitfalls to be aware of regarding ``T.grad``. One of them is that you
cannot re-write the above expression of the Jacobian as
``theano.scan(lambda y_i,x: T.grad(y_i,x), sequences=y,
non_sequences=x)``, even though from the documentation of scan this
...@@ -170,8 +172,8 @@ performance gains. A description of one such algorithm can be found here:
Computation, 1994*
While in principle we would want Theano to identify these patterns automatically for us,
in practice, implementing such optimizations in a generic manner is extremely
difficult. Therefore, we provide special functions dedicated to these tasks.
R-operator
...@@ -182,7 +184,7 @@ vector, namely :math:`\frac{\partial f(x)}{\partial x} v`. The formulation
can be extended even for `x` being a matrix, or a tensor in general, in which
case the Jacobian also becomes a tensor and the product becomes some kind
of tensor product. Because in practice we end up needing to compute such
expressions in terms of weight matrices, Theano supports this more generic
form of the operation. In order to evaluate the *R-operation* of
expression ``y``, with respect to ``x``, multiplying the Jacobian with ``v``
you need to do something similar to this:
...@@ -202,11 +204,10 @@ array([ 2., 2.])
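In NumPy terms (a sketch of the concept, not the Theano call): for ``y = W.dot(x)`` the Jacobian of ``y`` with respect to ``x`` is ``W`` itself, so the *R-operator* reduces to a plain matrix-vector product:

```python
import numpy as np

def rop_dot(W, v):
    # Jacobian of y = W.dot(x) w.r.t. x is W, so J.dot(v) == W.dot(v);
    # the full Jacobian never needs to be materialized separately.
    return W.dot(v)

W = np.ones((2, 2))
v = np.array([1.0, 1.0])
Jv = rop_dot(W, v)
```

With a 2x2 matrix of ones and ``v = [1, 1]`` this gives ``[2, 2]``, the same value shown in the Theano session above.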
L-operator
----------
Similarly to the *R-operator*, the *L-operator* would compute a *row* vector times
the Jacobian. The mathematical formula would be :math:`v \frac{\partial
f(x)}{\partial x}`. The *L-operator* is also supported for generic tensors
(not only for vectors). Similarly, it can be implemented as follows:
>>> W = T.dmatrix('W')
>>> v = T.dvector('v')
...@@ -220,24 +221,24 @@ array([[ 0., 0.],
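In NumPy terms (a sketch with illustrative values, not the Theano call): for ``y = W.dot(x)``, the *L-operator* with respect to ``W`` is the outer product :math:`v x^T`:

```python
import numpy as np

def lop_dot(v, x):
    # For y = W.dot(x), the gradient of v.dot(y) w.r.t. W is the
    # outer product v * x^T -- the row vector v times the Jacobian.
    return np.outer(v, x)

v = np.array([0.0, 1.0])   # same shape as the output y
x = np.array([2.0, 2.0])   # same shape as the input
vJ = lop_dot(v, x)
```

Note that the result has the shape of ``W`` (the input parameter being differentiated), as described in the note below.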
.. note::
`v`, the point of evaluation, differs between the *L-operator* and the *R-operator*.
For the *L-operator*, the point of evaluation needs to have the same shape
as the output, whereas for the *R-operator* this point should
have the same shape as the input parameter. Furthermore, the results of these two
operations differ. The result of the *L-operator* is of the same shape
as the input parameter, while the result of the *R-operator* has a shape similar
to the output.
Hessian times a Vector
======================
If you need to compute the Hessian times a vector, you can make use of the
above-defined operators to do it more efficiently than actually computing
the exact Hessian and then performing the product. Due to the symmetry of the
Hessian matrix, you have two options that will
give you the same result, though these options might exhibit differing performances.
Hence, we suggest profiling the methods before using either one of the two:
>>> x = T.dvector('x')
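As a cross-check of the idea in plain NumPy (a sketch for the hypothetical cost ``f(x) = (x ** 2).sum()``, whose Hessian is ``2 * I``), the Hessian-vector product can be formed without ever materializing the full Hessian:

```python
import numpy as np

def grad_f(x):
    # Gradient of f(x) = (x**2).sum() is 2*x.
    return 2 * x

def hessian_times_vector(grad, x, v, eps=1e-6):
    # Directional finite difference of the gradient approximates H.dot(v)
    # without forming the full Hessian matrix.
    return (grad(x + eps * v) - grad(x - eps * v)) / (2 * eps)

x = np.array([1.0, 2.0, 3.0])
v = np.array([1.0, 0.0, -1.0])
Hv = hessian_times_vector(grad_f, x, v)   # exact answer here is 2*v
```

Both symbolic strategies below compute this same quantity; only their cost on a given graph differs, which is why profiling is suggested.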
...@@ -266,14 +267,14 @@ Final Pointers
==============
* The ``grad`` function works symbolically: it receives and returns Theano variables.
* ``grad`` can be compared to a macro since it can be applied repeatedly.
* Only scalar costs can be directly handled by ``grad``. Arrays are handled through repeated applications.
* Built-in functions make it possible to compute efficiently *vector times Jacobian* and *vector times Hessian* products.
* Work is in progress on the optimizations required to compute efficiently the full
Jacobian and Hessian matrices and the *Jacobian times vector* expression.
...@@ -5,17 +5,17 @@
Tutorial
========
Let us start an interactive session (e.g. with ``python`` or ``ipython``) and import Theano.
>>> from theano import *
Several of the symbols you will need to use are in the ``tensor`` subpackage
of Theano. Let us import that subpackage under a handy name like
``T`` (the tutorials will frequently use this convention).
>>> import theano.tensor as T
If that succeeded you are ready for the tutorial, otherwise check your
installation (see :ref:`install`).
Throughout the tutorial, bear in mind that there is a :ref:`glossary` to help
...@@ -32,14 +32,14 @@ you out.
gradients
modes
loading_and_saving
conditions
loop
sparse
using_gpu
gpu_data_convert
aliasing
shape_info
remarks
debug_faq
extending_theano
faq
...@@ -13,13 +13,13 @@ Scan
- You 'scan' a function along some input sequence, producing an output at each time-step.
- The function can see the *previous K time-steps* of your function.
- ``sum()`` could be computed by scanning the ``z + x(i)`` function over a list, given an initial state of ``z=0``.
- Often a *for* loop can be expressed as a ``scan()`` operation, and ``scan`` is the closest that Theano comes to looping.
- Advantages of using ``scan`` over *for* loops:

  - The number of iterations becomes part of the symbolic graph.
  - Minimizes GPU transfers (if GPU is involved).
  - Computes gradients through sequential steps.
  - Slightly faster than using a *for* loop in Python with a compiled Theano function.
  - Can lower the overall memory usage by detecting the actual amount of memory needed.

The full documentation can be found in the library: :ref:`Scan <lib_scan>`.
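The ``sum()`` bullet above can be sketched in plain Python (a hypothetical ``scan``-like fold, not the Theano API):

```python
def scan_like(fn, sequence, initial):
    # Applies fn(state, element) along the sequence, collecting every
    # intermediate state -- the looping pattern theano.scan expresses
    # symbolically inside the computation graph.
    outputs = []
    state = initial
    for element in sequence:
        state = fn(state, element)
        outputs.append(state)
    return outputs

# sum() as a scan of z + x(i) over the list, with initial state z = 0
partial_sums = scan_like(lambda z, x_i: z + x_i, [1, 2, 3, 4], 0)
```

The last collected state is the sum; the full list of intermediate states corresponds to ``scan``'s per-step outputs.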
...
...@@ -13,7 +13,7 @@ The ``config`` module contains several ``attributes`` that modify Theano's behav
attributes are examined during the import of the ``theano`` module and several are assumed to be
read-only.
*As a rule, the attributes in the* ``config`` *module should not be modified by user code.*
Theano's code comes with default values for these attributes, but you can
override them from your ``.theanorc`` file, and override those values in turn by
...@@ -25,7 +25,7 @@ The order of precedence is:
2. an assignment in :envvar:`THEANO_FLAGS`
3. an assignment in the ``.theanorc`` file (or the file indicated in :envvar:`THEANORC`)
You can display the current/effective configuration at any time by printing
``theano.config``. For example, to see a list of all active configuration
variables, type this from the command-line:
...@@ -33,6 +33,9 @@ variables, type this from the command-line:
python -c 'import theano; print theano.config' | less
For more detail, see :ref:`Configuration <libdoc_config>` in the library.
-------------------------------------------
**Exercise**
...@@ -136,7 +139,7 @@ Theano defines the following modes by name:
- ``'FAST_COMPILE'``: Apply just a few graph optimizations and only use Python implementations.
- ``'FAST_RUN'``: Apply all optimizations, and use C implementations where possible.
- ``'DEBUG_MODE'``: Verify the correctness of all optimizations, and compare C and Python
implementations. This mode can take much longer than the other modes,
but can identify many kinds of problems.
- ``'PROFILE_MODE'``: Same optimizations as FAST_RUN, but print some profiling information.
...@@ -168,11 +171,11 @@ Here is a table to compare the different linkers.
============= ========= ================= ========= ===
linker        gc [#gc]_ Raise error by op Overhead  Definition
============= ========= ================= ========= ===
c|py [#cpy1]_ yes       yes               "+++"     Try C code. If none exist for an op, use Python
c|py_nogc     no        yes               "++"      As c|py, but without gc
c             no        yes               "+"       Use only C code (if none available for an op, raise an error)
py            yes       yes               "+++"     Use only Python code
c&py [#cpy2]_ no        yes               "+++++"   Use C and Python code
ProfileMode   no        no                "++++"    Compute some extra profiling info
DebugMode     no        yes               VERY HIGH Make many checks on what Theano computes
============= ========= ================= ========= ===
...@@ -186,6 +189,9 @@ DebugMode no yes VERY HIGH Make many checks on what
.. [#cpy2] Deprecated
For more detail, see :ref:`Mode<libdoc_compile_mode>` in the library.
.. _using_debugmode:
Using DebugMode
...@@ -230,13 +236,14 @@ In the example above, there is no way to guarantee that a future call to say,
If you instantiate DebugMode using the constructor (see :class:`DebugMode`)
rather than the keyword ``DEBUG_MODE`` you can configure its behaviour via
constructor arguments. The keyword version of DebugMode (which you get by using ``mode='DEBUG_MODE'``)
is quite strict.
For more detail, see :ref:`DebugMode<libdoc_compile_mode>` in the library.
.. _using_profilemode:
ProfileMode
===========
...@@ -352,7 +359,9 @@ the *Op-wise summary*, the execution time of all Apply nodes executing
the same Op are grouped together and the total execution time per Op
is shown (so if you use ``dot`` twice, you will see only one entry
there corresponding to the sum of the time spent in each of them).
Finally, notice that the ProfileMode also shows which Ops were running a C
implementation.
For more detail, see :ref:`ProfileMode<libdoc_compile_mode>` in the library.
...@@ -5,13 +5,13 @@ How Shape Information is Handled by Theano
==========================================
It is not possible to strictly enforce the shape of a Theano variable when
building a graph since the particular value provided at run-time for a parameter of a
Theano function may condition the shape of the Theano variables in its graph.
Currently, information regarding shape is used in two ways in Theano:
- To generate faster C code for the 2d convolution on the CPU and the GPU,
  when the exact output shape is known in advance.
- To remove computations in the graph when we only want to know the
  shape, but not the actual value of a variable. This is done with the
...@@ -39,7 +39,7 @@ output.
Shape Inference Problem
=======================
Theano propagates information about shape in the graph. Sometimes this
can lead to errors. For example:
.. code-block:: python
...@@ -90,18 +90,18 @@ example), an inferred shape is computed directly, without executing
the computation itself (there is no ``join`` in the first output or debugprint).
This makes the computation of the shape faster, but it can also hide errors. In
the example, the computation of the shape of the output of ``join`` is done only
based on the first input Theano variable, which leads to an error.
This might happen with other ops such as elemwise, dot, ...
Indeed, to perform some optimizations (for speed or stability, for instance),
Theano assumes that the computation is correct and consistent
in the first place, as it does here.
You can detect those problems by running the code without this
optimization, with the Theano flag
``optimizer_excluding=local_shape_to_shape_i``. You can also obtain the
same effect by running in the modes FAST_COMPILE (it will not apply this
optimization, nor most other optimizations) or DEBUG_MODE (it will test
before and after all optimizations (much slower)).
...@@ -109,21 +109,21 @@ before and after all optimizations (much slower)).
Specifying Exact Shape
======================
Currently, specifying a shape is not as easy and flexible as we wish and we plan some
upgrades. Here is the current state of what can be done:
- You can pass the shape info directly to the ``ConvOp`` created
  when calling ``conv2d``. You simply add the parameters ``image_shape``
  and ``filter_shape`` to the call. They must be tuples of 4
  elements. For example:

.. code-block:: python

    theano.tensor.nnet.conv2d(..., image_shape=(7,3,5,5), filter_shape=(2,3,4,4))

- You can use the SpecifyShape op to add shape information anywhere in the
  graph. This makes some optimizations possible. In the following example,
  this makes it possible to precompute the Theano function to a constant.

.. code-block:: python
...@@ -137,7 +137,7 @@ upgrade. Here is the current state of what can be done:
Future Plans
============
The parameter "constant shape" will be added to ``theano.shared()``. This is probably
the most frequent case with ``shared variables``. This will make the code
simpler and will make it possible to check that the shape does not change when
updating the shared variable.
...@@ -11,8 +11,8 @@ Theano Graphs ...@@ -11,8 +11,8 @@ Theano Graphs
Debugging or profiling code written in Theano is not that simple if you
do not know what goes on under the hood. This chapter is meant to
introduce you to a required minimum of the inner workings of Theano.
For more detail see :ref:`extending`.
The first step in writing Theano code is to write down all mathematical
relations using symbolic placeholders (**variables**). When writing down
Starting from this graph structure it is easier to understand how
*automatic differentiation* proceeds and how the symbolic relations
can be *optimized* for performance or stability.

Automatic Differentiation
=========================
these gradients can be composed in order to obtain the expression of the
gradient of the graph's output with respect to the graph's inputs.
A later section of this tutorial will address the topic of differentiation
in greater detail.
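The composition the chain rule performs can be illustrated with ordinary Python (a numeric sketch, not Theano code): for :math:`h(x) = f(g(x))` the gradient is :math:`h'(x) = f'(g(x)) \cdot g'(x)`.

```python
import math

# h(x) = sin(x**2); by the chain rule h'(x) = cos(x**2) * 2x,
# composing the gradient of sin with the gradient of the square.
def h(x):
    return math.sin(x * x)

def h_grad(x):
    return math.cos(x * x) * 2.0 * x

# Finite-difference check of the composed gradient at x = 0.3.
x, eps = 0.3, 1e-6
numeric = (h(x + eps) - h(x - eps)) / (2 * eps)
assert abs(h_grad(x) - numeric) < 1e-6
```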
Optimizations
=============
For example, one (simple) optimization that Theano uses is to replace
the pattern :math:`\frac{xy}{y}` by :math:`x`.
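A toy version of such a rewrite can be sketched in plain Python over nested tuples standing in for graph nodes (this is only an illustration of the idea, not how Theano's optimizer is implemented):

```python
# Expressions are nested tuples, e.g. ('div', ('mul', 'x', 'y'), 'y').
def simplify(expr):
    """Recursively apply the rewrite (x * y) / y -> x."""
    if not isinstance(expr, tuple):
        return expr
    op = expr[0]
    args = tuple(simplify(a) for a in expr[1:])
    if op == 'div' and isinstance(args[0], tuple) and args[0][0] == 'mul':
        _, x, y = args[0]
        if y == args[1]:
            return x          # the pattern matched: drop the mul and div
    return (op,) + args       # no match: rebuild the node unchanged

print(simplify(('div', ('mul', 'x', 'y'), 'y')))  # x
```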
Further information regarding the optimization :ref:`process<optimization>`
and the specific :ref:`optimizations<optimizations>` that are applicable is
available in the library documentation and on the entrance page of the
documentation, respectively.
**Example**

Symbolic programming involves a change of paradigm: it will become clearer
as we apply it. Consider the following example of optimization:
>>> import theano
>>> a = theano.tensor.vector("a")  # declare symbolic variable
====================================================== =====================================================
.. image:: ../hpcs2011_tutorial/pics/f_unoptimized.png .. image:: ../hpcs2011_tutorial/pics/f_optimized.png
====================================================== =====================================================
One of Theano's design goals is to specify computations at an
abstract level, so that the internal function compiler has a lot of flexibility
about how to carry out those computations. One of the ways we take advantage of
this flexibility is in carrying out calculations on an Nvidia graphics card when
there is a CUDA-enabled device present in the computer.
Setting Up CUDA
----------------
The program just computes the exp() of a bunch of random numbers.
Note that we use the `shared` function to
make sure that the input `x` is stored on the graphics device.

If I run this program (in thing.py) with device=cpu, my computer takes a little over 7 seconds,
whereas on the GPU it takes just over 0.4 seconds. The GPU will not always produce the exact
Returning a Handle to Device-Allocated Data
-------------------------------------------
The speedup is not greater in the preceding example because the function is
returning its result as a NumPy ndarray which has already been copied from the
device to the host for your convenience. This is what makes it so easy to swap in device=gpu, but
if you don't mind being less portable, you might prefer to see a bigger speedup by changing
the graph to express a computation with a GPU-stored result. The gpu_from_host
Op means "copy the input from the host to the GPU" and it is optimized away
after the T.exp(x) is replaced by a GPU version of exp().

.. If you modify this code, also change :
To really get maximum performance in this simple example, we need to use an :class:`Out`
instance to tell Theano not to copy the output it returns to us. Theano allocates memory for
internal use like a working buffer, but by default it will never return a result that is
allocated in the working buffer. This is normally what you want, but our example is so simple
that it has the unwanted side-effect of really slowing things down.
..
    TODO:
  dimension-shuffling and constant-time reshaping will be equally fast on GPU
  as on CPU.
* Summation
  over rows/columns of tensors can be a little slower on the GPU than on the CPU.
* Copying
  of large quantities of data to and from a device is relatively slow, and
  often cancels most of the advantage of one or two accelerated functions on
Tips for Improving Performance on GPU
-------------------------------------
* Consider
  adding ``floatX=float32`` to your .theanorc file if you plan to do a lot of
  GPU work.
* Prefer
  constructors like 'matrix', 'vector' and 'scalar' to 'dmatrix', 'dvector' and
  'dscalar' because the former will give you float32 variables when
  floatX=float32.
* Ensure
  mode='PROFILE_MODE'. This should print some timing information at program
  termination (atexit). Is time being used sensibly? If an Op or Apply is
  taking more time than its share, then if you know something about GPU
  programming, have a look at how it's implemented in theano.sandbox.cuda.
  Check the line like 'Spent Xs(X%) in cpu Op, Xs(X%) in gpu Op and Xs(X%) transfert Op'
  that can tell you if not enough of your graph is on the GPU or if there
  is too much memory transfer.
    print 'Used the cpu'
elif any([x.op.__class__.__name__ == 'GpuGemm' for x in
          train.maker.fgraph.toposort()]):
    print 'Used the GPU'
else:
    print 'ERROR, not able to tell if theano used the cpu or the GPU'
    print train.maker.fgraph.toposort()
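The logic of that check, scanning the class names of the Ops in the topologically sorted graph, can be isolated as a small standalone sketch (the helper and the return strings below are illustrative, not part of Theano):

```python
def which_device(op_class_names):
    # Mirror of the check above: an Elemwise Op in the compiled graph
    # indicates the CPU version ran; GpuGemm indicates the GPU version.
    if any(name == 'Elemwise' for name in op_class_names):
        return 'cpu'
    if any(name == 'GpuGemm' for name in op_class_names):
        return 'gpu'
    return 'unknown'

print(which_device(['Gemv', 'Elemwise']))        # cpu
print(which_device(['GpuFromHost', 'GpuGemm']))  # gpu
```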
Software for Directly Programming a GPU
---------------------------------------

Leaving aside Theano, which is a meta-programmer, there are:

* CUDA: C extension by NVIDIA
* PyCUDA: Python bindings for CUDA
* Convenience: Makes it easy to do GPU meta-programming from within Python. Helpful documentation.
  (abstractions to compile low-level CUDA code from Python: ``pycuda.driver.SourceModule``)

* Completeness: Binding to all of CUDA's driver API.
* Speed: PyCUDA's base layer is written in C++.

* Good memory management of GPU objects:
  GPU memory buffer: ``pycuda.gpuarray.GPUArray``.
  Object cleanup tied to lifetime of objects (RAII, 'Resource Acquisition Is Initialization').
  Makes it much easier to write correct, leak- and crash-free code.
  PyCUDA knows about dependencies (e.g. it won't detach from a context before all memory allocated in it is also freed).
* PyOpenCL: PyCUDA for OpenCL
Run the preceding example.

Modify and execute it to work for a matrix of shape (20, 10).

-------------------------------------------
Run the preceding example.

Modify and execute the example to multiply two matrices: x * y.

Modify and execute the example to return two outputs: x + y and x - y.
(Currently, elemwise fusion generates computation with only one output.)

Modify and execute the example to support *strides* (i.e. to avoid constraining the input to be C contiguous).
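For the stride exercise it helps to recall what C contiguity means: the last axis is densest in memory, and each earlier byte stride is the product of all later dimensions times the item size. A sketch of that computation, matching NumPy's convention (the helper name is ours, not an API):

```python
def c_contiguous_strides(shape, itemsize):
    """Byte strides of a C-contiguous array of the given shape:
    walk the axes from last to first, accumulating the element count."""
    strides = []
    acc = itemsize
    for dim in reversed(shape):
        strides.append(acc)
        acc *= dim
    return tuple(reversed(strides))

# A float32 array of shape (20, 10): rows are 10 * 4 = 40 bytes apart.
print(c_contiguous_strides((20, 10), 4))  # (40, 4)
```

A kernel that honours arbitrary strides indexes elements through these byte offsets instead of assuming this dense layout.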
-------------------------------------------