Commit edfd9f24 authored by Eric Larsen, committed by Frederic

Correct Theano's tutorial: one more round of corrections

Parent dba02a39
@@ -33,12 +33,12 @@ Let's break this down into several steps. The first step is to define
two symbols (*Variables*) representing the quantities that you want
to add. Note that from now on, we will use the term
*Variable* to mean "symbol" (in other words,
*x*, *y*, *z* are all *Variable* objects). The output of the function
*f* is a ``numpy.ndarray`` with zero dimensions.

If you are following along and typing into an interpreter, you may have
noticed that there was a slight delay in executing the ``function``
instruction. Behind the scenes, *f* was being compiled into C code.

.. note::
@@ -51,9 +51,9 @@ instruction. Behind the scenes, *f* was being compiled into C code.

>>> x = theano.tensor.ivector()
>>> y = -x

*x* and *y* are both Variables, i.e. instances of the
``theano.gof.graph.Variable`` class. The
type of both *x* and *y* is ``theano.tensor.ivector``.
**Step 1**

@@ -65,9 +65,9 @@ In Theano, all symbols must be typed. In particular, ``T.dscalar``
is the type we assign to "0-dimensional arrays (`scalar`) of doubles
(`d`)". It is a Theano :ref:`type`.

``dscalar`` is not a class. Therefore, neither *x* nor *y*
are actually instances of ``dscalar``. They are instances of
:class:`TensorVariable`. *x* and *y*
are, however, assigned the Theano Type ``dscalar`` in their ``type``
field, as you can see here:
@@ -91,13 +91,13 @@ could also learn more by looking into :ref:`graphstructures`.

**Step 2**

The second step is to combine *x* and *y* into their sum *z*:

>>> z = x + y

*z* is yet another *Variable* which represents the addition of
*x* and *y*. You can use the :ref:`pp <libdoc_printing>`
function to pretty-print out the computation associated to *z*.

>>> print pp(z)
(x + y)
@@ -105,15 +105,15 @@ function to pretty-print out the computation associated to *z*.

**Step 3**

The last step is to create a function taking *x* and *y* as inputs
and giving *z* as output:

>>> f = function([x, y], z)

The first argument to :func:`function <function.function>` is a list of Variables
that will be provided as inputs to the function. The second argument
is a single Variable *or* a list of Variables. In either case, the second
argument is what we want to see as output when we apply the function. *f* may
then be used like a normal Python function.
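As a plain-NumPy aside (not Theano code) on what "a ``numpy.ndarray`` with zero dimensions" means: it is an array with an empty shape that still wraps a single scalar value.

```python
import numpy as np

# A 0-d array: empty shape, but it holds one scalar value,
# just like the result returned by the compiled function.
z = np.asarray(16.5 + 12.25)

print(z.ndim)    # 0
print(z.shape)   # ()
print(float(z))  # 28.75
```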
@@ -121,8 +121,8 @@ Adding two Matrices
===================

You might already have guessed how to do this. Indeed, the only change
from the previous example is that you need to instantiate *x* and
*y* using the matrix Types:

.. If you modify this code, also change :
.. theano/tests/test_tutorial.py:T_adding.test_adding_2
@@ -153,12 +153,12 @@ by :ref:`broadcasting <libdoc_tensor_broadcastable>`.

The following types are available:

* **byte**: ``bscalar, bvector, bmatrix, brow, bcol, btensor3, btensor4``
* **32-bit integers**: ``iscalar, ivector, imatrix, irow, icol, itensor3, itensor4``
* **64-bit integers**: ``lscalar, lvector, lmatrix, lrow, lcol, ltensor3, ltensor4``
* **float**: ``fscalar, fvector, fmatrix, frow, fcol, ftensor3, ftensor4``
* **double**: ``dscalar, dvector, dmatrix, drow, dcol, dtensor3, dtensor4``
* **complex**: ``cscalar, cvector, cmatrix, crow, ccol, ctensor3, ctensor4``

The previous list is not exhaustive and a guide to all types compatible
with NumPy arrays may be found here: :ref:`tensor creation<libdoc_tensor_creation>`.
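For readers mapping these one-letter prefixes to NumPy dtypes, a small sketch of the standard Theano naming correspondence (b = int8, i = int32, l = int64, f = float32, d = float64, c = complex64):

```python
import numpy as np

# Theano's one-letter type prefixes correspond to these NumPy dtypes
prefix_dtype = {
    "b": np.int8,       # byte
    "i": np.int32,      # 32-bit integer
    "l": np.int64,      # 64-bit integer
    "f": np.float32,    # float
    "d": np.float64,    # double
    "c": np.complex64,  # complex
}

# e.g. a "dmatrix" holds float64 values
m = np.zeros((2, 3), dtype=prefix_dtype["d"])
print(m.dtype)  # float64
```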
......
(diff collapsed)
@@ -8,11 +8,11 @@ IfElse vs Switch
================

- Both ops build a condition over symbolic variables.
- ``IfElse`` takes a *boolean* condition and two variables as inputs.
- ``Switch`` takes a *tensor* as condition and two variables as inputs.
  ``switch`` is an elementwise operation and is thus more general than ``ifelse``.
- Whereas ``switch`` evaluates both *output* variables, ``ifelse`` is lazy and only
  evaluates one variable with respect to the condition.

**Example**
@@ -52,7 +52,7 @@ IfElse vs Switch

    f_lazyifelse(val1, val2, big_mat1, big_mat2)
    print 'time spent evaluating one value %f sec' % (time.clock() - tic)

In this example, the ``IfElse`` op spends less time (about half as much) than ``Switch``
since it computes only one variable out of the two.
.. code-block:: python

@@ -64,7 +64,7 @@ since it computes only one variable out of the two.

Unless ``linker='vm'`` or ``linker='cvm'`` are used, ``ifelse`` will compute both
variables and take the same computation time as ``switch``. Although the linker
is not currently set by default to ``cvm``, it will be in the near future.

There is no automatic optimization replacing a ``switch`` with a
broadcasted scalar by an ``ifelse``, as this is not always faster. See
......
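As a NumPy/Python analogy (not the Theano API): an elementwise ``switch`` behaves like ``numpy.where``, which evaluates both branch arrays, while a lazy ``ifelse`` behaves like a Python conditional that touches only the chosen branch.

```python
import numpy as np

a = np.array([-1.0, 2.0, -3.0])

# switch-like: elementwise selection; both branch arrays are fully evaluated
elementwise = np.where(a > 0, a, -a)  # absolute value, element by element

# ifelse-like: one boolean condition; only the chosen branch is used
condition = a.sum() > 0
lazy = a if condition else -a

print(elementwise)  # [1. 2. 3.]
```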
(diff collapsed)
@@ -74,7 +74,7 @@ Computing More than one Thing at the Same Time

Theano supports functions with multiple outputs. For example, we can
compute the :ref:`elementwise <libdoc_tensor_elementwise>` difference, absolute difference, and
squared difference between two matrices *a* and *b* at the same time:

.. If you modify this code, also change :
.. theano/tests/test_tutorial.py:T_examples.test_examples_3
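The elided snippet builds one Theano function with three outputs; the three quantities themselves are easy to mirror in plain NumPy (an illustration of the computation, not Theano code):

```python
import numpy as np

def differences(a, b):
    """Return the elementwise difference, absolute difference,
    and squared difference of two arrays at once."""
    diff = a - b
    return diff, np.abs(diff), diff ** 2

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[4.0, 2.0], [1.0, 0.0]])
d, ad, sd = differences(a, b)
print(d)  # elementwise difference of a and b
```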
@@ -123,7 +123,7 @@ array(35.0)

This makes use of the :ref:`Param <function_inputs>` class which allows
you to specify properties of your function's parameters with greater detail. Here we
give a default value of 1 for *y* by creating a ``Param`` instance with
its ``default`` field set to 1.
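``Param``'s ``default`` field plays roughly the role that default arguments play in plain Python; a rough analogy only (the hypothetical *f* below is ordinary Python, not a compiled Theano function):

```python
# Rough Python analogy of Param's `default` field:
# y defaults to 1 when the caller omits it.
def f(x, y=1):
    return x + y

print(f(33))     # 34
print(f(33, 2))  # 35
```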
Inputs with default values must follow inputs without default
@@ -149,7 +149,7 @@ array(34.0)
array(33.0)

.. note::
   ``Param`` does not know the name of the local variables *y* and *w*
   that are passed as arguments. The symbolic variable objects have name
   attributes (set by ``dscalars`` in the example above) and *these* are the
   names of the keyword parameters in the functions that we build. This is
@@ -171,7 +171,7 @@ example, let's say we want to make an accumulator: at the beginning,
the state is initialized to zero. Then, on each function call, the state
is incremented by the function's argument.

First let's define the *accumulator* function. It adds its argument to the
internal state, and returns the old state value.

.. If you modify this code, also change :
@@ -187,13 +187,13 @@ so-called :ref:`shared variables<libdoc_compile_shared>`.

These are hybrid symbolic and non-symbolic variables whose value may be shared
between multiple functions. Shared variables can be used in symbolic expressions just like
the objects returned by ``dmatrices(...)`` but they also have an internal
value that defines the value taken by this symbolic variable in *all* the
functions that use it. It is called a *shared* variable because its value is
shared between many functions. The value can be accessed and modified by the
``.get_value()`` and ``.set_value()`` methods. We will come back to this soon.

The other new thing in this code is the ``updates`` parameter of ``function``.
``updates`` must be supplied with a list of pairs of the form (shared-variable, new expression).
It can also be a dictionary whose keys are shared-variables and values are
the new expressions. Either way, it means "whenever this function runs, it
will replace the ``.value`` of each shared variable with the result of the
@@ -241,9 +241,9 @@ achieve a similar result by returning the new expressions, and working with
them in NumPy as usual. The updates mechanism can be a syntactic convenience,
but it is mainly there for efficiency. Updates to shared variables can
sometimes be done more quickly using in-place algorithms (e.g. low-rank matrix
updates). Also, Theano has more control over where and how shared variables are
allocated, which is one of the important elements of getting good performance
on the :ref:`GPU<using_gpu>`.
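The accumulator's behavior (add the argument to hidden state, return the old state value) can be mimicked with a plain Python closure; this sketches the semantics only, not Theano's shared-variable and ``updates`` machinery:

```python
def make_accumulator(initial=0):
    """Return a function that adds its argument to an internal
    state and returns the state's previous value."""
    state = [initial]  # mutable cell standing in for the shared variable

    def accumulator(inc):
        old = state[0]
        state[0] = old + inc  # the "update": state <- state + inc
        return old

    return accumulator

acc = make_accumulator()
print(acc(1))    # 0   (the old state)
print(acc(300))  # 1
print(acc(0))    # 301
```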
It may happen that you expressed some formula using a shared variable, but
you do *not* want to use its value. In this case, you can use the
@@ -326,16 +326,16 @@ so we get different random numbers every time.

>>> f_val1 = f()  # different numbers from f_val0

When we add the extra argument ``no_default_updates=True`` to
``function`` (as in *g*), then the random number generator state is
not affected by calling the returned function. So, for example, calling
*g* multiple times will return the same numbers.

>>> g_val0 = g()  # different numbers from f_val0 and f_val1
>>> g_val1 = g()  # same numbers as g_val0!

An important remark is that a random variable is drawn at most once during any
single function execution. So the *nearly_zeros* function is guaranteed to
return approximately 0 (except for rounding error) even though the *rv_u*
random variable appears three times in the output expression.

>>> nearly_zeros = function([], rv_u + rv_u - 2 * rv_u)
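The draw-at-most-once rule means all three occurrences of *rv_u* share a single value per call; in plain NumPy terms (an analogy, not how Theano implements it):

```python
import numpy as np

rng = np.random.default_rng(0)

def nearly_zeros():
    # The random variable is drawn once per call...
    rv_u = rng.uniform(size=(2, 2))
    # ...and that single draw is reused at every occurrence,
    # so the expression cancels.
    return rv_u + rv_u - 2 * rv_u

print(nearly_zeros())  # a 2x2 array of zeros
```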
@@ -363,8 +363,8 @@ Sharing Streams Between Functions
---------------------------------

As usual for shared variables, the random number generators used for random
variables are common between functions. So our *nearly_zeros* function will
update the state of the generators used in function *f* above.

For example:
@@ -416,8 +416,9 @@ The preceding elements are featured in this more realistic example. It will be

    prediction = p_1 > 0.5                    # The prediction thresholded
    xent = -y*T.log(p_1) - (1-y)*T.log(1-p_1) # Cross-entropy loss function
    cost = xent.mean() + 0.01*(w**2).sum()    # The cost to minimize
    gw, gb = T.grad(cost, [w, b])             # Compute the gradient of the cost
                                              # (we shall return to this in a
                                              # following section of this tutorial)

    # Compile
    train = theano.function(
......
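The (partially elided) example trains a logistic regression. The forward pass and cost it builds symbolically can be sketched in plain NumPy, with the gradient left to ``T.grad`` in the real Theano code; the names ``w``, ``b``, ``x``, ``y`` mirror the snippet above, and the sizes are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
N, feats = 8, 5
x = rng.standard_normal((N, feats))
y = rng.integers(0, 2, size=N).astype(float)
w = rng.standard_normal(feats)
b = 0.0

# Forward pass mirroring the symbolic expressions in the snippet
p_1 = 1.0 / (1.0 + np.exp(-(x @ w + b)))             # probability of class 1
prediction = p_1 > 0.5                               # thresholded prediction
xent = -y * np.log(p_1) - (1 - y) * np.log(1 - p_1)  # cross-entropy loss
cost = xent.mean() + 0.01 * (w ** 2).sum()           # cost with L2 penalty

print(cost > 0)  # True: cross-entropy and the penalty are both non-negative
```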
@@ -8,12 +8,12 @@ Extending Theano

Theano Graphs
-------------

- Theano works with symbolic graphs.
- Those graphs are bi-partite graphs (graphs with 2 types of nodes).
- The 2 types of nodes are Apply and Variable nodes.
- Each Apply node has a link to the op that it executes.
  Inputs and Outputs are lists of Theano variables.

.. image:: ../hpcs2011_tutorial/pics/apply_node.png
   :width: 500 px
@@ -93,12 +93,12 @@ The first one is :func:`make_node`. The second one
would describe the computations that are required to be done
at run time. Currently there are 2 different possibilities:
implement the :func:`perform`
and/or :func:`c_code <Op.c_code>` methods (and other related :ref:`c methods
<cop>`), or the :func:`make_thunk` method. ``perform`` allows
you to easily wrap an existing Python function into Theano. ``c_code``
and related methods allow the op to generate C code that will be
compiled and linked by Theano. On the other hand, ``make_thunk``
will be called only once during compilation and should generate
a ``thunk``: a standalone function that when called will do the desired computations.
This is useful if you want to generate code and compile it yourself. For
example, this allows you to use PyCUDA to compile GPU code.
@@ -117,7 +117,7 @@ The :func:`grad` method is required if you want to differentiate some cost whose
includes your op.

The :func:`__str__` method is useful in order to provide a more meaningful
string representation of your op.

The :func:`R_op` method is needed if you want ``theano.tensor.Rop`` to
work with your op.
@@ -185,9 +185,9 @@ in a file and execute it with the ``nosetests`` program.

**Basic Tests**

Basic tests are done by you just by using the op and checking that it
returns the right answer. If you detect an error, you must raise an
exception. You can use the ``assert`` keyword to automatically raise an
``AssertionError``.
.. code-block:: python
@@ -211,10 +211,10 @@ exception. You can use the ``assert`` keyword to automatically raise an

**Testing the infer_shape**

When a class inherits from the ``InferShapeTester`` class, it gets the
``self._compile_and_check`` method that tests the op's ``infer_shape``
method. It tests that the op gets optimized out of the graph if only
the shape of the output is needed and not the output
itself. Additionally, it checks that the optimized graph computes
the correct shape, by comparing it to the actual shape of the computed
output.
@@ -222,8 +222,8 @@ output.
parameters the lists of input and output Theano variables, as would be
provided to ``theano.function``, and a list of real values to pass to the
compiled function (don't use shapes that are symmetric, e.g. (3, 3),
as they can easily hide errors). It also takes the op class as a parameter to
verify that no instance of it appears in the shape-optimized graph.

If there is an error, the function raises an exception. If you want to
see it fail, you can implement an incorrect ``infer_shape``.
@@ -248,7 +248,7 @@ see it fail, you can implement an incorrect ``infer_shape``.

**Testing the gradient**

The function :ref:`verify_grad <validating_grad>`
verifies the gradient of an op or Theano graph. It compares the
analytic (symbolically computed) gradient and the numeric
gradient (computed through the Finite Difference Method).
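The comparison ``verify_grad`` performs can be sketched in plain Python for a scalar input (a simplified illustration; the real utility handles arbitrary tensor inputs and random projections):

```python
def check_grad(f, analytic_grad, x, eps=1e-6, tol=1e-4):
    """Compare an analytic gradient against a central finite difference."""
    numeric = (f(x + eps) - f(x - eps)) / (2 * eps)
    assert abs(numeric - analytic_grad(x)) < tol, (numeric, analytic_grad(x))

# Example: f(x) = x**2 has gradient 2*x
check_grad(lambda x: x ** 2, lambda x: 2 * x, x=3.0)
print("gradient check passed")
```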
@@ -266,13 +266,12 @@ the multiplication by 2).

.. TODO: repair defective links in the following paragraph

The class :class:`RopLop_checker` defines the functions
:func:`RopLop_checker.check_mat_rop_lop`, :func:`RopLop_checker.check_rop_lop` and
:func:`RopLop_checker.check_nondiff_rop`. These allow you to test the
implementation of the Rop method of a particular op.

For instance, to verify the Rop method of the DoubleOp, you can use this:
.. code-block:: python
@@ -290,7 +289,7 @@ Running your tests

You can run ``nosetests`` in the Theano folder to run all of Theano's
tests, including yours if they are somewhere in the directory
structure. You can run ``nosetests test_file.py`` to run only the
tests in that file. You can run ``nosetests
test_file.py:test_DoubleRop`` to run only the tests inside that test
class. You can run ``nosetests
test_file.py:test_DoubleRop.test_double_op`` to run only one
particular test. More documentation is available on the `nosetests website
<http://readthedocs.org/docs/nose/en/latest/>`_.
You can also add this block at the end of the test file and run the file:
.. code-block:: python
@@ -311,14 +310,13 @@ You can also add this at the end of the test file:

**Testing GPU Ops**

Ops to be executed on the GPU should inherit from ``theano.sandbox.cuda.GpuOp``
and not ``theano.Op``. This allows Theano to distinguish them. Currently, we
use this to test if the NVIDIA driver works correctly with our sum reduction
code on the GPU.

A more extensive discussion of this section's topic may be found in the advanced
tutorial :ref:`Extending Theano<extending>`
-------------------------------------------
......
@@ -8,19 +8,17 @@ Frequently Asked Questions

TypeError: object of type 'TensorVariable' has no len()
-------------------------------------------------------

If you receive the following error, it is because the Python function ``__len__``
cannot be implemented on Theano variables:

.. code-block:: python

    TypeError: object of type 'TensorVariable' has no len()

Python requires that ``__len__`` returns an integer, yet this cannot be done for
Theano's symbolic variables. However, ``var.shape[0]`` can be used as a workaround.

This error message cannot be made more explicit because the relevant aspects of
Python's internals cannot be modified.
Faster gcc optimization
......
@@ -9,13 +9,13 @@ PyCUDA

Currently, PyCUDA and Theano have different objects to store GPU
data. The two implementations do not support the same set of features.
Theano's implementation is called *CudaNdarray* and supports
*strides*. It also only supports the *float32* dtype. PyCUDA's implementation
is called *GPUArray* and doesn't support *strides*. However, it can deal with
all NumPy and CUDA dtypes.

We are currently working on having the same base object for both that will
also mimic NumPy. Until this is ready, here is some information on how to
use both objects in the same script.

Transfer
@@ -24,8 +24,8 @@ Transfer

You can use the ``theano.misc.pycuda_utils`` module to convert GPUArray to and
from CudaNdarray. The functions ``to_cudandarray(x, copyif=False)`` and
``to_gpuarray(x)`` return a new object that occupies the same memory space
as the original. Otherwise it raises a ``ValueError``. Because GPUArrays don't
support strides, if the CudaNdarray is strided, we could copy it to
have a non-strided copy. The resulting GPUArray won't share the same
memory region. If you want this behavior, set ``copyif=True`` in
``to_gpuarray``.
...@@ -122,13 +122,15 @@ CUDAMat

There are functions for conversion between CUDAMat objects and Theano's CudaNdarray objects.
They obey the same principles as Theano's PyCUDA functions and can be found in
``theano.misc.cudamat_utils.py``.

.. TODO: this statement is unclear:

WARNING: There is a peculiar problem associated with stride/shape with those converters.
In order to work, the test needs a *transpose* and *reshape*...

Gnumpy
======

There are conversion functions between Gnumpy *garray* objects and Theano CudaNdarray objects.
They are also similar to Theano's PyCUDA functions and can be found in ``theano.misc.gnumpy_utils.py``.
...@@ -10,12 +10,14 @@ Computing Gradients
===================

Now let's use Theano for a slightly more sophisticated task: create a
function which computes the derivative of some expression *y* with
respect to its parameter *x*. To do this we will use the macro ``T.grad``.
For instance, we can compute the
gradient of :math:`x^2` with respect to :math:`x`. Note that
:math:`d(x^2)/dx = 2 \cdot x`.

.. TODO: fix the vertical positioning of the expressions in the preceding paragraph
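Before turning to the Theano code, the identity above can be spot-checked numerically. This is a plain NumPy sketch (not part of the original tutorial): a central finite-difference quotient is compared against :math:`2x`.

```python
import numpy as np

def numerical_derivative(f, x, eps=1e-6):
    """Central finite-difference approximation of df/dx at x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

square = lambda t: t ** 2
for x in [-3.0, 0.5, 2.0]:
    # The approximation should agree with the analytic derivative 2*x.
    assert abs(numerical_derivative(square, x) - 2 * x) < 1e-4
```

Of course, ``T.grad`` works symbolically rather than by finite differences; the check is only a sanity test of the math.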
Here is the code to compute this gradient:

.. If you modify this code, also change :
...@@ -36,7 +38,7 @@ array(188.40000000000001)

In this example, we can see from ``pp(gy)`` that we are computing
the correct symbolic gradient.
``fill((x ** 2), 1.0)`` means to make a matrix of the same shape as
``x ** 2`` and fill it with 1.0.

.. note::
    The optimizer simplifies the symbolic gradient expression. You can see
...@@ -56,7 +58,7 @@ logistic is: :math:`ds(x)/dx = s(x) \cdot (1 - s(x))`.

.. figure:: dlogistic.png

    A plot of the gradient of the logistic function, with *x* on the x-axis
    and :math:`ds(x)/dx` on the y-axis.
...@@ -71,17 +73,17 @@ logistic is: :math:`ds(x)/dx = s(x) \cdot (1 - s(x))`.

array([[ 0.25      ,  0.19661193],
       [ 0.19661193,  0.10499359]])

In general, for any **scalar** expression *s*, ``T.grad(s, w)`` provides
the Theano expression for computing :math:`\frac{\partial s}{\partial w}`. In
this way Theano can be used for doing **efficient** symbolic differentiation
(as the expression returned by ``T.grad`` will be optimized during compilation), even for
functions with many inputs (see `automatic differentiation <http://en.wikipedia.org/wiki/Automatic_differentiation>`_ for a description
of symbolic differentiation).

.. note::
    The second argument of ``T.grad`` can be a list, in which case the
    output is also a list. The order in both lists is important: element
    *i* of the output list is the gradient of the first argument of
    ``T.grad`` with respect to the *i*-th element of the list given as second argument.

    The first argument of ``T.grad`` has to be a scalar (a tensor
...@@ -95,14 +97,17 @@ of symbolic differentiation).
Computing the Jacobian
======================

In Theano's parlance, the term *Jacobian* designates the tensor comprising the
first partial derivatives of the output of a function with respect to its inputs.
(This is a generalization of the so-called Jacobian matrix in Mathematics.)
Theano implements the :func:`theano.gradient.jacobian` macro that does all
that is needed to compute the Jacobian. The following text explains how
to do it manually.

In order to manually compute the Jacobian of some function *y* with
respect to some parameter *x* we need to use ``scan``. What we
do is to loop over the entries in *y* and compute the gradient of
*y[i]* with respect to *x*.
.. note::
...@@ -110,7 +115,7 @@ do is to loop over the entries in ``y`` and compute the gradient of

    manner all kinds of recurrent equations. While creating
    symbolic loops (and optimizing them for performance) is a hard task,
    efforts are being made to improve the performance of ``scan``. We
    shall return to ``scan`` later in this tutorial.

>>> x = T.dvector('x')
>>> y = x**2
...@@ -120,31 +125,33 @@ do is to loop over the entries in ``y`` and compute the gradient of

array([[ 8.,  0.],
       [ 0.,  8.]])

What we do in this code is to generate a sequence of *ints* from *0* to
``y.shape[0]`` using ``T.arange``. Then we loop through this sequence, and
at each step, we compute the gradient of element *y[i]* with respect to
*x*. ``scan`` automatically concatenates all these rows, generating a
matrix which corresponds to the Jacobian.
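As a numerical sanity check (a NumPy sketch, independent of Theano), the same Jacobian can be approximated by finite differences, one column per input entry, reproducing the ``array([[ 8., 0.], [ 0., 8.]])`` result shown above for the elementwise square at ``[4, 4]``:

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Finite-difference Jacobian: column j holds the derivatives of
    every output entry with respect to input entry j."""
    y = f(x)
    J = np.empty((y.size, x.size))
    for j in range(x.size):
        dx = np.zeros_like(x)
        dx[j] = eps
        J[:, j] = (f(x + dx) - f(x - dx)) / (2 * eps)
    return J

x = np.array([4.0, 4.0])
J = numerical_jacobian(lambda v: v ** 2, x)
# For y = x**2 the Jacobian is diag(2*x), i.e. [[8, 0], [0, 8]].
assert np.allclose(J, np.diag(2 * x))
```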
.. note::
    There are some pitfalls to be aware of regarding ``T.grad``. One of them is that you
    cannot re-write the above expression of the Jacobian as
    ``theano.scan(lambda y_i,x: T.grad(y_i,x), sequences=y,
    non_sequences=x)``, even though from the documentation of ``scan`` this
    seems possible. The reason is that *y_i* will not be a function of
    *x* anymore, while *y[i]* still is.
Computing the Hessian
=====================

In Theano, the term *Hessian* has the usual mathematical meaning: it is the
matrix comprising the second-order partial derivatives of a function with scalar
output and vector input. Theano implements the :func:`theano.gradient.hessian`
macro that does all that is needed to compute the Hessian. The following text
explains how to do it manually.

You can compute the Hessian manually similarly to the Jacobian. The only
difference is that now, instead of computing the Jacobian of some expression
*y*, we compute the Jacobian of ``T.grad(cost, x)``, where *cost* is some
scalar.
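The "Jacobian of the gradient" idea can be illustrated numerically (a NumPy sketch, not Theano's API): for the cost :math:`\sum_i x_i^2` the Hessian is :math:`2I`.

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-5):
    """Central finite-difference gradient of a scalar function f at x."""
    g = np.empty_like(x)
    for i in range(x.size):
        d = np.zeros_like(x)
        d[i] = eps
        g[i] = (f(x + d) - f(x - d)) / (2 * eps)
    return g

def numerical_hessian(f, x, eps=1e-5):
    """Hessian built as the Jacobian of the gradient, column by column."""
    n = x.size
    H = np.empty((n, n))
    for j in range(n):
        d = np.zeros_like(x)
        d[j] = eps
        H[:, j] = (numerical_gradient(f, x + d) - numerical_gradient(f, x - d)) / (2 * eps)
    return H

cost = lambda v: np.sum(v ** 2)   # Hessian of sum(x**2) is 2*I
x = np.array([1.0, -2.0, 3.0])
assert np.allclose(numerical_hessian(cost, x), 2 * np.eye(3), atol=1e-3)
```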
...@@ -181,12 +188,12 @@ R-operator

The *R operator* is built to evaluate the product between a Jacobian and a
vector, namely :math:`\frac{\partial f(x)}{\partial x} v`. The formulation
can be extended even for *x* being a matrix, or a tensor in general, in which
case the Jacobian also becomes a tensor and the product becomes some kind
of tensor product. Because in practice we end up needing to compute such
expressions in terms of weight matrices, Theano supports this more generic
form of the operation. In order to evaluate the *R-operation* of
expression *y*, with respect to *x*, multiplying the Jacobian with *v*,
you need to do something similar to this:
...@@ -221,19 +228,19 @@ array([[ 0.,  0.],

.. note::
    `v`, the *point of evaluation*, differs between the *L-operator* and the *R-operator*.
    For the *L-operator*, the point of evaluation needs to have the same shape
    as the output, whereas for the *R-operator* this point should
    have the same shape as the input parameter. Furthermore, the results of these two
    operations differ. The result of the *L-operator* is of the same shape
    as the input parameter, while the result of the *R-operator* has a shape similar
    to that of the output.
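The key property of the R-operation can be demonstrated numerically (a NumPy sketch with a hypothetical ``jvp`` helper, not Theano's ``Rop``): a Jacobian-vector product is just a directional derivative, so it never needs the full Jacobian.

```python
import numpy as np

def jvp(f, x, v, eps=1e-6):
    """Jacobian-vector product J(x) @ v approximated as a directional
    derivative; v has the same shape as the input, as the note says."""
    return (f(x + eps * v) - f(x - eps * v)) / (2 * eps)

W = np.ones((2, 2))          # a weight matrix, as in the tutorial's setting
v = np.array([2.0, 2.0])     # point of evaluation, shaped like the input
f = lambda x: W @ x          # y = W x, so the Jacobian of f is W itself
assert np.allclose(jvp(f, np.zeros(2), v), W @ v)
```

The result has the shape of the output, exactly as described for the *R-operator* above.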
Hessian times a Vector
======================

If you need to compute the *Hessian times a vector*, you can make use of the
above-defined operators to do it more efficiently than actually computing
the exact Hessian and then performing the product. Due to the symmetry of the
Hessian matrix, you have two options that will
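The efficiency claim can be illustrated numerically (a NumPy sketch, not Theano's API): a Hessian-vector product needs only two gradient evaluations, never the Hessian itself.

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-5):
    """Central finite-difference gradient of a scalar function f at x."""
    g = np.empty_like(x)
    for i in range(x.size):
        d = np.zeros_like(x)
        d[i] = eps
        g[i] = (f(x + d) - f(x - d)) / (2 * eps)
    return g

def hessian_vector_product(f, x, v, eps=1e-5):
    """H(x) @ v via a directional difference of the gradient: the
    Hessian is never materialized, only two gradients are computed."""
    return (numerical_gradient(f, x + eps * v)
            - numerical_gradient(f, x - eps * v)) / (2 * eps)

cost = lambda w: np.sum(w ** 2)          # Hessian is 2*I
x = np.array([1.0, 2.0])
v = np.array([4.0, 4.0])
assert np.allclose(hessian_vector_product(cost, x, v), 2 * v, atol=1e-3)
```

For an input of size *n*, this costs two gradients instead of the :math:`n \times n` Hessian, which is the point of using the operators above.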
...@@ -267,7 +274,7 @@ Final Pointers
==============

* The ``grad`` function works symbolically: it receives and returns Theano variables.

* ``grad`` can be compared to a macro since it can be applied repeatedly.
...@@ -276,5 +283,5 @@ Final Pointers

* Built-in functions make it possible to compute *vector times Jacobian* and *vector times Hessian* efficiently.

* Work is in progress on the optimizations required to compute efficiently the full
  Jacobian and the Hessian matrix, as well as the *Jacobian times vector* expression.
...@@ -6,8 +6,8 @@ Loading and Saving
==================

Python's standard way of saving class instances and reloading them
is the pickle_ mechanism. Many Theano objects can be *serialized* (and
*deserialized*) by ``pickle``; however, a limitation of ``pickle`` is that
it does not save the code or data of a class along with the instance of
the class being serialized. As a result, reloading objects created by a
previous version of a class can be really problematic.
...@@ -126,7 +126,7 @@ maybe defining the attributes you want to save, rather than the ones you

don't.

For instance, if the only parameters you want to save are a weight
matrix *W* and a bias *b*, you can define:

.. code-block:: python
...@@ -138,8 +138,8 @@ matrix ``W`` and a bias ``b``, you can define:

        self.W = W
        self.b = b

If at some point in time *W* is renamed to *weights* and *b* to
*bias*, the older pickled files will still be usable, if you update these
functions to reflect the change in name:

.. code-block:: python
...@@ -152,6 +152,6 @@ functions to reflect the change in name:

        self.weights = W
        self.bias = b
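The elided snippets above can be sketched end to end (a minimal self-contained example; the ``Layer`` class and attribute names are hypothetical stand-ins for the tutorial's model): ``__getstate__`` selects what gets pickled, and ``__setstate__`` accepts both the old and the new attribute names.

```python
import pickle

class Layer(object):
    """Hypothetical model: only the parameters are worth saving."""
    def __init__(self, weights=None, bias=None):
        self.weights = weights
        self.bias = bias
        self.scratch = "recomputed, not pickled"

    def __getstate__(self):
        # Persist only the parameters, under explicit key names.
        return {"W": self.weights, "b": self.bias}

    def __setstate__(self, state):
        # Accept pickles written before the W/b -> weights/bias rename.
        self.weights = state.get("weights", state.get("W"))
        self.bias = state.get("bias", state.get("b"))
        self.scratch = "recomputed, not pickled"

layer = Layer(weights=[[1.0, 2.0]], bias=[0.5])
restored = pickle.loads(pickle.dumps(layer))
assert restored.weights == [[1.0, 2.0]] and restored.bias == [0.5]
```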
For more information on advanced use of ``pickle`` and its internals, see Python's
pickle_ documentation.
...@@ -9,10 +9,10 @@ Scan
====

- A general form of *recurrence*, which can be used for looping.
- *Reduction* and *map* (loop over the leading dimensions) are special cases of ``scan``.
- You ``scan`` a function along some input sequence, producing an output at each time-step.
- The function can see the *previous K time-steps* of your function.
- ``sum()`` could be computed by scanning the *z + x(i)* function over a list, given an initial state of *z=0*.
- Often a *for* loop can be expressed as a ``scan()`` operation, and ``scan`` is the closest that Theano comes to looping.
- Advantages of using ``scan`` over *for* loops:
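The ``sum()`` bullet above can be sketched in plain Python (an illustrative fold, not Theano's actual ``scan`` implementation): the state *z* starts at 0, the step function is applied once per element, and each intermediate state is recorded like ``scan``'s per-time-step outputs.

```python
def scan_like(step, sequence, initial_state):
    """Minimal fold mimicking scan: apply `step` to the running state and
    each element, collecting every intermediate output."""
    outputs = []
    z = initial_state
    for x_i in sequence:
        z = step(z, x_i)
        outputs.append(z)
    return outputs

partial_sums = scan_like(lambda z, x_i: z + x_i, [1, 2, 3, 4], 0)
assert partial_sums == [1, 3, 6, 10]   # the last output is sum()
```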
...@@ -30,6 +30,7 @@ The full documentation can be found in the library: :ref:`Scan <lib_scan>`.

    import theano
    import theano.tensor as T
    theano.config.warn.subtensor_merge_bug = False

    k = T.iscalar("k"); A = T.vector("A")
...@@ -54,8 +55,10 @@ The full documentation can be found in the library: :ref:`Scan <lib_scan>`.

.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T
    theano.config.warn.subtensor_merge_bug = False

    coefficients = theano.tensor.vector("coefficients")
    x = T.scalar("x"); max_coefficients_supported = 10000
...
...@@ -9,14 +9,14 @@ Configuration Settings and Compiling Modes

Configuration
=============

The ``config`` module contains several *attributes* that modify Theano's behavior. Many of these
attributes are examined during the import of the ``theano`` module and several are assumed to be
read-only.

*As a rule, the attributes in the* ``config`` *module should not be modified inside the user code.*

Theano's code comes with default values for these attributes, but you can
override them from your ``.theanorc`` file, and override those values in turn by
the :envvar:`THEANO_FLAGS` environment variable.

The order of precedence is:
...@@ -110,6 +110,8 @@ time the execution using the command line ``time python file.py``.

.. TODO: To be resolved:
.. Solution said:
.. You will need to use: ``theano.config.floatX`` and ``ndarray.astype("str")``
.. Why the latter portion?
...@@ -119,10 +121,10 @@ time the execution using the command line ``time python file.py``.

* Apply the Theano flag ``floatX=float32`` (through ``theano.config.floatX``) in your code.
* Cast inputs before storing them into a shared variable.
* Circumvent the automatic cast of *int32* with *float32* to *float64*:

  * Insert manual casts in your code or use *[u]int{8,16}*.
  * Insert a manual cast around the mean operator (this involves division by length, which is an *int64*).
  * Notice that a new casting mechanism is being developed.
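The *int32*-with-*float32* pitfall in the bullets above can be seen directly in NumPy, which applies the same promotion rule, along with the manual casts that avoid it:

```python
import numpy as np

i = np.arange(4, dtype=np.int32)
f = np.ones(4, dtype=np.float32)

# int32 combined with float32 is silently promoted to float64:
assert (i + f).dtype == np.float64

# A manual cast keeps the computation in float32:
assert (i.astype(np.float32) + f).dtype == np.float32

# A small integer type ([u]int{8,16}) also avoids the promotion:
assert (i.astype(np.int16) + f).dtype == np.float32
```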
-------------------------------------------
...@@ -130,7 +132,7 @@ time the execution using the command line ``time python file.py``.

Mode
====

Every time :func:`theano.function <function.function>` is called,
the symbolic relationships between the input and output Theano *variables*
are optimized and compiled. The way this compilation occurs
is controlled by the value of the ``mode`` parameter.
...@@ -139,9 +141,9 @@ Theano defines the following modes by name:

- ``'FAST_COMPILE'``: Apply just a few graph optimizations and only use Python implementations.
- ``'FAST_RUN'``: Apply all optimizations, and use C implementations where possible.
- ``'DEBUG_MODE'``: Verify the correctness of all optimizations, and compare C and Python
  implementations. This mode can take much longer than the other modes, but can identify
  several kinds of problems.
- ``'PROFILE_MODE'``: Same optimizations as ``FAST_RUN``, but also print some profiling information.

The default mode is typically ``FAST_RUN``, but it can be controlled via
...@@ -152,18 +154,18 @@ which can be overridden by passing the keyword argument to

================= =============================================================== ===============================================================================
short name        Full constructor                                                What does it do?
================= =============================================================== ===============================================================================
``FAST_COMPILE``  ``compile.mode.Mode(linker='py', optimizer='fast_compile')``    Python implementations only, quick and cheap graph transformations
``FAST_RUN``      ``compile.mode.Mode(linker='c|py', optimizer='fast_run')``      C implementations where available, all available graph transformations.
``DEBUG_MODE``    ``compile.debugmode.DebugMode()``                               Both implementations where available, all available graph transformations.
``PROFILE_MODE``  ``compile.profilemode.ProfileMode()``                           C implementations where available, all available graph transformations, print profile information.
================= =============================================================== ===============================================================================
Linkers
=======

A mode is composed of 2 things: an optimizer and a linker. Some modes,
like ``PROFILE_MODE`` and ``DEBUG_MODE``, add logic around the optimizer and
linker. ``PROFILE_MODE`` and ``DEBUG_MODE`` use their own linker.

You can select which linker to use with the Theano flag :attr:`config.linker`.
Here is a table to compare the different linkers.
...@@ -184,8 +186,8 @@ DebugMode no yes VERY HIGH Make many checks on what

.. [#gc] Garbage collection of intermediate results during computation.
         Otherwise, the memory space used by the ops is kept between
         Theano function calls, in order not to
         reallocate memory, which lowers the overhead (makes it faster...).
.. [#cpy1] Default
.. [#cpy2] Deprecated
...@@ -201,10 +203,10 @@ While normally you should use the ``FAST_RUN`` or ``FAST_COMPILE`` mode,

it is useful at first (especially when you are defining new kinds of
expressions or new optimizations) to run your code using the DebugMode
(available via ``mode='DEBUG_MODE'``). The DebugMode is designed to
run several self-checks and assertions that can help diagnose
possible programming errors leading to incorrect output. Note that
``DEBUG_MODE`` is much slower than ``FAST_RUN`` or ``FAST_COMPILE``, so
use it only during development (not when you launch 1000 processes on a
cluster!).
...@@ -225,14 +227,16 @@ DebugMode is used as follows:

If any problem is detected, DebugMode will raise an exception according to
what went wrong, either at call time (``f(5)``) or compile time
(``f = theano.function(x, 10*x, mode='DEBUG_MODE')``). These exceptions
should *not* be ignored; talk to your local Theano guru or email the
users list if you cannot make the exception go away.
Some kinds of errors can only be detected for certain input value combinations.
In the example above, there is no way to guarantee that a future call to, say,
``f(-1)``, won't cause a problem. DebugMode is not a silver bullet.

.. TODO: repair the following link

If you instantiate DebugMode using the constructor (see :class:`DebugMode`)
rather than the keyword ``DEBUG_MODE``, you can configure its behaviour via
...@@ -277,7 +281,7 @@ implementation only, should use the gof.PerformLinker (or "py" for

short). On the other hand, a user wanting to profile his graph using C
implementations wherever possible should use the ``gof.OpWiseCLinker``
(or "c|py"). For testing the speed of your code we would recommend
using the ``fast_run`` optimizer and the ``gof.OpWiseCLinker`` linker.
Compiling your Graph with ProfileMode
-------------------------------------
...@@ -300,7 +304,7 @@ the desired timing information, indicating where your graph is spending most

of its time. This is best shown through an example. Let's use our logistic
regression example.

Compiling the module with ``ProfileMode`` and calling ``profmode.print_summary()``
generates the following output:

.. code-block:: python
...@@ -352,14 +356,14 @@ generates the following output:
This output has two components. In the first section called
*Apply-wise summary*, timing information is provided for the worst
offending ``Apply`` nodes. This corresponds to the individual op applications
within your graph that took the longest to execute (so if you use
``dot`` twice, you will see two entries there). In the second portion,
the *Op-wise summary*, the execution time of all ``Apply`` nodes executing
the same op are grouped together and the total execution time per op
is shown (so if you use ``dot`` twice, you will see only one entry
there, corresponding to the sum of the time spent in each of them).
Finally, notice that the ``ProfileMode`` also shows which ops were running a C
implementation.
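The relationship between the two summaries can be sketched in a few lines of plain Python (the per-node timing data below is made up for illustration; this is not ``ProfileMode``'s internal code):

```python
from collections import defaultdict

# Hypothetical per-Apply timings: (op name, seconds) for each Apply node.
apply_times = [("dot", 3.4), ("dot", 2.1), ("exp", 0.8)]

# Apply-wise summary: one entry per Apply node, worst offenders first.
apply_wise = sorted(apply_times, key=lambda t: t[1], reverse=True)

# Op-wise summary: total time of all Apply nodes running the same op.
op_wise = defaultdict(float)
for op_name, seconds in apply_times:
    op_wise[op_name] += seconds

print(apply_wise)     # two separate "dot" entries
print(dict(op_wise))  # a single "dot" entry with the summed time
```

Using ``dot`` twice thus yields two entries in the first summary but only one in the second.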
.. _tutorial_printing_drawing:
==============================
Printing/Drawing Theano graphs
==============================
.. TODO: repair the defective links in the next paragraph
Theano provides two functions (:func:`theano.pp` and
:func:`theano.printing.debugprint`) to print a graph to the terminal before or after
compilation. These two functions print expression graphs in different ways:
:func:`pp` is more compact and math-like, :func:`debugprint` is more verbose.
Theano also provides :func:`pydotprint` that creates a *png* image of the function.
You can read about them in :ref:`libdoc_printing`.
Consider again the logistic regression example, but notice the additional printing instructions.
The following output depicts the pre- and post-compilation graphs.
.. code-block:: python
import numpy
import theano
import theano.tensor as T
rng = numpy.random
N = 400
feats = 784
D = (rng.randn(N, feats).astype(theano.config.floatX),
rng.randint(size=N,low=0, high=2).astype(theano.config.floatX))
training_steps = 10000
# Declare Theano symbolic variables
x = T.matrix("x")
y = T.vector("y")
w = theano.shared(rng.randn(feats).astype(theano.config.floatX), name="w")
b = theano.shared(numpy.asarray(0., dtype=theano.config.floatX), name="b")
x.tag.test_value = D[0]
y.tag.test_value = D[1]
#print "Initial model:"
#print w.get_value(), b.get_value()
# Construct Theano expression graph
p_1 = 1 / (1 + T.exp(-T.dot(x, w)-b)) # Probability of having a one
prediction = p_1 > 0.5 # The prediction that is done: 0 or 1
xent = -y*T.log(p_1) - (1-y)*T.log(1-p_1) # Cross-entropy
cost = xent.mean() + 0.01*(w**2).sum() # The cost to optimize
gw,gb = T.grad(cost, [w,b])
# Compile expressions to functions
train = theano.function(
inputs=[x,y],
outputs=[prediction, xent],
updates={w:w-0.01*gw, b:b-0.01*gb},
name = "train")
predict = theano.function(inputs=[x], outputs=prediction,
name = "predict")
if any( [x.op.__class__.__name__=='Gemv' for x in
train.maker.fgraph.toposort()]):
print 'Used the cpu'
elif any( [x.op.__class__.__name__=='GpuGemm' for x in
train.maker.fgraph.toposort()]):
print 'Used the gpu'
else:
print 'ERROR, not able to tell if theano used the cpu or the gpu'
print train.maker.fgraph.toposort()
for i in range(training_steps):
pred, err = train(D[0], D[1])
#print "Final model:"
#print w.get_value(), b.get_value()
print "target values for D"
print D[1]
print "prediction on D"
print predict(D[0])
# Print the picture graphs
# after compilation
theano.printing.pydotprint(predict,
outfile="pics/logreg_pydotprint_predic.png",
var_with_name_simple=True)
# before compilation
theano.printing.pydotprint_variables(prediction,
outfile="pics/logreg_pydotprint_prediction.png",
var_with_name_simple=True)
theano.printing.pydotprint(train,
outfile="pics/logreg_pydotprint_train.png",
var_with_name_simple=True)
Pretty Printing
===============
``theano.printing.pprint(variable)``
>>> theano.printing.pprint(prediction) # (pre-compilation)
gt((TensorConstant{1} / (TensorConstant{1} + exp(((-(x \\dot w)) - b)))),TensorConstant{0.5})
Debug Printing
==============
``theano.printing.debugprint({fct, variable, list of variables})``
>>> theano.printing.debugprint(prediction) # (pre-compilation)
Elemwise{gt,no_inplace} [@181772236] ''
|Elemwise{true_div,no_inplace} [@181746668] ''
| |InplaceDimShuffle{x} [@181746412] ''
| | |TensorConstant{1} [@181745836]
| |Elemwise{add,no_inplace} [@181745644] ''
| | |InplaceDimShuffle{x} [@181745420] ''
| | | |TensorConstant{1} [@181744844]
| | |Elemwise{exp,no_inplace} [@181744652] ''
| | | |Elemwise{sub,no_inplace} [@181744012] ''
| | | | |Elemwise{neg,no_inplace} [@181730764] ''
| | | | | |dot [@181729676] ''
| | | | | | |x [@181563948]
| | | | | | |w [@181729964]
| | | | |InplaceDimShuffle{x} [@181743788] ''
| | | | | |b [@181730156]
|InplaceDimShuffle{x} [@181771788] ''
| |TensorConstant{0.5} [@181771148]
>>> theano.printing.debugprint(predict) # (post-compilation)
Elemwise{Composite{neg,{sub,{{scalar_sigmoid,GT},neg}}}} [@183160204] '' 2
|dot [@183018796] '' 1
| |x [@183000780]
| |w [@183000812]
|InplaceDimShuffle{x} [@183133580] '' 0
| |b [@183000876]
|TensorConstant{[ 0.5]} [@183084108]
Picture Printing
================
>>> theano.printing.pydotprint_variables(prediction) # (pre-compilation)
.. image:: ../hpcs2011_tutorial/pics/logreg_pydotprint_prediction.png
:width: 800 px
Notice that ``pydotprint()`` requires *Graphviz* and Python's ``pydot``.
>>> theano.printing.pydotprint(predict) # (post-compilation)
.. image:: ../hpcs2011_tutorial/pics/logreg_pydotprint_predic.png
:width: 800 px
>>> theano.printing.pydotprint(train) # This is a small train example!
.. image:: ../hpcs2011_tutorial/pics/logreg_pydotprint_train.png
:width: 1500 px
Some general Remarks
=====================

.. TODO: This discussion is awkward. Even with this beneficial reordering (28 July 2012) its purpose and message are unclear.

Limitations
-----------

Theano offers a good amount of flexibility, but has some limitations too.
How then can you write your algorithm to make the most of what Theano can do?

- *While*- or *for*-Loops within an expression graph are supported, but only via
  the :func:`theano.scan` op (which puts restrictions on how the loop body can
  interact with the rest of the graph).

- Neither *goto* nor *recursion* is supported or planned within expression graphs.
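As a rough intuition for the loop restriction (this is not Theano's actual API), ``scan`` behaves like a cumulative fold: the loop body is a pure function of the current element and the previous state, which is what allows it to live inside an expression graph. A plain-Python analogue:

```python
def scan_like(step, sequence, initial):
    """Apply step(element, state) -> new_state over sequence,
    returning every intermediate state, like a cumulative fold."""
    states = []
    state = initial
    for element in sequence:
        state = step(element, state)
        states.append(state)
    return states

# A cumulative sum expressed as a side-effect-free "loop body".
print(scan_like(lambda x, s: s + x, [1, 2, 3, 4], 0))  # [1, 3, 6, 10]
```

The body cannot reach outside its arguments, which is precisely the kind of restriction ``scan`` places on how the loop interacts with the rest of the graph.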
Currently, information regarding shape is used in two ways in Theano:
`Op.infer_shape <http://deeplearning.net/software/theano/extending/cop.html#Op.infer_shape>`_
method.

Example:

.. code-block:: python

    ...
Shape Inference Problem
=======================

Theano propagates information about shape in the graph. Sometimes this
can lead to errors. Consider this example:

.. code-block:: python

    ...
example), an inferred shape is computed directly, without executing
the computation itself (there is no ``join`` in the first output or debugprint).
This makes the computation of the shape faster, but it can also hide errors. In
this example, the computation of the shape of the output of ``join`` is done only
based on the first input Theano variable, which leads to an error.

This might happen with other ops such as ``elemwise`` and ``dot``, for example.
Indeed, to perform some optimizations (for speed or stability, for instance),
Theano assumes that the computation is correct and consistent
in the first place, as it does here.
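The pitfall can be illustrated with a toy shape-inference routine (a deliberate simplification, not Theano's real implementation): if the inferred shape of a ``join`` along axis 0 trusts the first input's shape for every other axis, a mismatch in the remaining inputs goes undetected.

```python
def join_shape_first_input_only(axis, shapes):
    """Toy inference: take the first input's shape and only sum the
    joined axis -- mismatches on the other axes are silently ignored."""
    out = list(shapes[0])
    out[axis] = sum(s[axis] for s in shapes)
    return tuple(out)

# The second input has 3 columns instead of 2: actually executing the
# join would fail, but the inferred shape is computed without noticing.
print(join_shape_first_input_only(0, [(2, 2), (4, 3)]))  # (6, 2)
```

The inferred shape looks plausible even though the computation itself is inconsistent, which is exactly how the error stays hidden.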
You can detect those problems by running the code without this
optimization, using the Theano flag
``optimizer_excluding=local_shape_to_shape_i``. You can also obtain the
same effect by running in the modes ``FAST_COMPILE`` (it will not apply this
optimization, nor most other optimizations) or ``DEBUG_MODE`` (it will test
before and after all optimizations, which is much slower).
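For instance, the flag can be passed through the ``THEANO_FLAGS`` environment variable when launching your script (``train.py`` is a hypothetical file name here):

```shell
THEANO_FLAGS='optimizer_excluding=local_shape_to_shape_i' python train.py
```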
Currently, specifying a shape is not as easy and flexible as we wish, and we plan an
upgrade. Here is the current state of what can be done:

- You can pass the shape info directly to the ``ConvOp`` created
  when calling ``conv2d``. You simply set the parameters ``image_shape``
  and ``filter_shape`` inside the call. They must be tuples of 4
  elements. For example:

  .. code-block:: python

      theano.tensor.nnet.conv2d(..., image_shape=(7,3,5,5), filter_shape=(2,3,4,4))
- You can use the ``SpecifyShape`` op to add shape information anywhere in the
  graph. This allows some optimizations to be performed. In the following example,
  it makes it possible to precompute the Theano function to a constant.
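As a side note on what those 4-element tuples mean, the output shape of a 2D convolution can be computed by hand; the sketch below assumes the default ``'valid'`` border mode and is not Theano's code:

```python
def conv2d_valid_output_shape(image_shape, filter_shape):
    """(batch, channels, rows, cols) convolved with
    (nfilters, channels, frows, fcols) in 'valid' mode."""
    batch, channels, rows, cols = image_shape
    nfilters, fchannels, frows, fcols = filter_shape
    assert channels == fchannels, "channel counts must agree"
    # 'valid' mode: the filter must fit entirely inside the image.
    return (batch, nfilters, rows - frows + 1, cols - fcols + 1)

print(conv2d_valid_output_shape((7, 3, 5, 5), (2, 3, 4, 4)))  # (7, 2, 2, 2)
```

Knowing both tuples in advance is what lets the op check consistency and pick a faster implementation.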
Future Plans
============

The parameter "constant shape" will be added to ``theano.shared()``. This is probably
the most frequent occurrence with ``shared`` variables. It will make the code
simpler and will make it possible to check that the shape does not change when
updating the ``shared`` variable.
relations using symbolic placeholders (**variables**). When writing down
these expressions you use operations like ``+``, ``-``, ``**``,
``sum()``, ``tanh()``. All these are represented internally as **ops**.
An **op** represents a certain computation on some type of inputs
producing some type of output. You can see it as a *function definition*
in most programming languages.

Theano builds internally a graph structure composed of interconnected
Take for example the following code:

.. code-block:: python

    x = T.dmatrix('x')
    y = x*2.

If you enter ``type(y.owner)`` you get ``<class 'theano.gof.graph.Apply'>``,
which is the apply node that connects the op and the inputs to get this
output. You can now print the name of the op that is applied to get
*y*:

>>> y.owner.op.name
'Elemwise{mul,no_inplace}'

Hence, an elementwise multiplication is used to compute *y*. This
multiplication is done between the inputs:

>>> len(y.owner.inputs)
InplaceDimShuffle{x,x}.0

Note that the second input is not 2 as we would have expected. This is
because 2 was first :term:`broadcasted <broadcasting>` to a matrix of the
same shape as *x*. This is done by using the op ``DimShuffle``:

>>> type(y.owner.inputs[1])
<class 'theano.tensor.basic.TensorVariable'>
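The owner/inputs structure explored above can be mimicked with two tiny classes (a deliberately minimal sketch, not Theano's actual ``Variable`` and ``Apply`` classes):

```python
class Variable:
    def __init__(self, name=None, owner=None):
        self.name = name
        self.owner = owner  # the Apply node that produced this Variable


class Apply:
    def __init__(self, op_name, inputs):
        self.op_name = op_name   # stands in for the real op object
        self.inputs = inputs
        self.output = Variable(owner=self)


x = Variable(name="x")
two = Variable(name="2")
y = Apply("mul", [x, two]).output

print(y.owner.op_name)                    # which op produced y
print([v.name for v in y.owner.inputs])   # walk back to its inputs
```

Following ``owner`` and ``inputs`` pointers like this is exactly how one traverses a Theano graph from outputs back to inputs.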
Using the chain rule,
these gradients can be composed in order to obtain the expression of the
gradient of the graph's output with respect to the graph's inputs.

A following section of this tutorial will examine the topic of differentiation
in greater detail.
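Composing per-op gradients amounts to the chain rule; a minimal numeric illustration in plain Python (not Theano's grad machinery):

```python
import math

# f(x) = exp(x**2): two "ops" (square, then exp), each with a local gradient.
def f(x):
    return math.exp(x ** 2)

def grad_f(x):
    square = x ** 2
    d_square_dx = 2 * x               # local gradient of the first op
    d_exp_dsquare = math.exp(square)  # local gradient of the second op
    return d_exp_dsquare * d_square_dx  # chain rule: compose them

# Check against a central finite-difference approximation.
eps = 1e-6
numeric = (f(1.5 + eps) - f(1.5 - eps)) / (2 * eps)
print(abs(grad_f(1.5) - numeric) < 1e-4)  # True
```

Each op only needs to know its own local derivative; the graph structure dictates how those pieces multiply together.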
Optimizations
=============

When compiling a Theano function, what you give to the
:func:`theano.function <function.function>` is actually a graph
(starting from the output variables you can traverse the graph up to
the input variables). While this graph structure shows how to compute
the output from the input, it also offers the possibility to improve the
way this computation is carried out. The way optimizations work in