Commit 00183e72 authored by Olivier Delalleau

Merge pull request #905 from nouiz/add_exerc_docu_rebase

Documentation improvements
@@ -19,7 +19,7 @@ I wrote a new optimization, but it's not getting used...
 Remember that you have to register optimizations with the :ref:`optdb`
 for them to get used by the normal modes like FAST_COMPILE, FAST_RUN,
-and DEBUG_MODE.
+and DebugMode.
 I wrote a new optimization, and it changed my results even though I'm pretty sure it is correct.
...
@@ -168,7 +168,7 @@ not modify any of the inputs.
 TODO: EXPLAIN DESTROYMAP and VIEWMAP BETTER AND GIVE EXAMPLE.
 When developing an Op, you should run computations in DebugMode, by using
-argument ``mode='DEBUG_MODE'`` to ``theano.function``. DebugMode is
+argument ``mode='DebugMode'`` to ``theano.function``. DebugMode is
 slow, but it can catch many common violations of the Op contract.
 TODO: Like what? How? Talk about Python vs. C too.
...
@@ -6,15 +6,15 @@ Extending Theano
 ================
-This documentation is for users who want to extend Theano with new Types, new
+This advanced tutorial is for users who want to extend Theano with new Types, new
 Operations (Ops), and new graph optimizations.
 Along the way, it also introduces many aspects of how Theano works, so it is
 also good for you if you are interested in getting more under the hood with
 Theano itself.
-Before tackling this tutorial, it is highly recommended to read the
-:ref:`tutorial`.
+Before tackling this more advanced presentation, it is highly recommended to read the
+introductory :ref:`Tutorial<tutorial>`.
 The first few pages will walk you through the definition of a new :ref:`type`,
 ``double``, and basic arithmetic :ref:`operations <op>` on that Type. We
...
@@ -289,7 +289,7 @@ Example:
 f = T.function([a,b],[c],mode='FAST_RUN')
 m = theano.Module()
-minstance = m.make(mode='DEBUG_MODE')
+minstance = m.make(mode='DebugMode')
 Whenever possible, unit tests should omit this parameter. Leaving
 out the mode will ensure that unit tests use the default mode.
@@ -306,7 +306,7 @@ type this:
 THEANO_FLAGS='mode=FAST_COMPILE' nosetests
 THEANO_FLAGS='mode=FAST_RUN' nosetests
-THEANO_FLAGS='mode=DEBUG_MODE' nosetests
+THEANO_FLAGS='mode=DebugMode' nosetests
 .. _random_value_in_tests:
...
 .. _glossary:
-Glossary of terminology
-=======================
+Glossary
+========
 .. glossary::
...
@@ -190,12 +190,10 @@ Here is the state of that vision as of 24 October 2011 (after Theano release
 * Will provide better support for GPU on Windows and use an OpenCL backend on CPU.
 * Loops work, but not all related optimizations are currently done.
-* The cvm linker allows lazy evaluation. It works, but some work is still
-  needed before enabling it by default.
-* All tests pass with linker=cvm?
-* How to have `DEBUG_MODE` check it? Right now, DebugMode checks the computation non-lazily.
-* The profiler used by cvm is less complete than `PROFILE_MODE`.
+* The cvm linker allows lazy evaluation. It is the current default linker.
+* How to have `DebugMode` check it? Right now, DebugMode checks the computation non-lazily.
+* The profiler used by cvm is less complete than `ProfileMode`.
 * SIMD parallelism on the CPU comes from the compiler.
 * Multi-core parallelism is only supported for gemv and gemm, and only
...
@@ -29,7 +29,7 @@ DebugMode can be used as follows:
 x = tensor.dvector('x')
-f = theano.function([x], 10*x, mode='DEBUG_MODE')
+f = theano.function([x], 10*x, mode='DebugMode')
 f(5)
 f(0)
@@ -42,7 +42,7 @@ It can also be used by passing a DebugMode instance as the mode, as in
 If any problem is detected, DebugMode will raise an exception according to
 what went wrong, either at call time (``f(5)``) or compile time (
-``f = theano.function(x, 10*x, mode='DEBUG_MODE')``). These exceptions
+``f = theano.function(x, 10*x, mode='DebugMode')``). These exceptions
 should *not* be ignored; talk to your local Theano guru or email the
 users list if you cannot make the exception go away.
@@ -51,7 +51,7 @@ In the example above, there is no way to guarantee that a future call to, say,
 ``f(-1)`` won't cause a problem. DebugMode is not a silver bullet.
 If you instantiate DebugMode using the constructor ``compile.DebugMode``
-rather than the keyword ``DEBUG_MODE`` you can configure its behaviour via
+rather than the keyword ``DebugMode`` you can configure its behaviour via
 constructor arguments.
 Reference
@@ -133,7 +133,7 @@ Reference
-The keyword version of DebugMode (which you get by using ``mode='DEBUG_MODE'``)
+The keyword version of DebugMode (which you get by using ``mode='DebugMode'``)
 is quite strict, and can raise several different Exception types.
 The following are DebugMode exceptions you might encounter:
@@ -200,7 +200,7 @@ The following are DebugMode exceptions you might encounter:
 in the same order when run several times in a row. This can happen if any
 steps are ordered by ``id(object)`` somehow, such as via the default object
 hash function. A stochastic optimization invalidates the pattern of work
-whereby we debug in DEBUG_MODE and then run the full-size jobs in FAST_RUN.
+whereby we debug in DebugMode and then run the full-size jobs in FAST_RUN.
 .. class:: InvalidValueError(DebugModeError)
...
+.. _libdoc_compile_mode:
 ======================================
 :mod:`mode` -- controlling compilation
 ======================================
@@ -17,9 +20,10 @@ Theano defines the following modes by name:
 - ``'FAST_COMPILE'``: Apply just a few graph optimizations and only use Python implementations.
 - ``'FAST_RUN'``: Apply all optimizations, and use C implementations where possible.
-- ``'DEBUG_MODE'``: Verify the correctness of all optimizations, and compare C and python
-  implementations. This mode can take much longer than the other modes,
-  but can identify many kinds of problems.
+- ``'DebugMode'``: A mode for debugging. See :ref:`DebugMode <debugmode>` for details.
+- ``'ProfileMode'``: A mode for profiling. See :ref:`ProfileMode <profilemode>` for details.
+- ``'DEBUG_MODE'``: Deprecated. Use the string DebugMode.
+- ``'PROFILE_MODE'``: Deprecated. Use the string ProfileMode.
 The default mode is typically ``FAST_RUN``, but it can be controlled via the
 configuration variable :attr:`config.mode`, which can be
...
@@ -13,7 +13,7 @@
 Guide
 =====
-The config module contains many attributes that modify Theano's behavior. Many of these
+The config module contains many ``attributes`` that modify Theano's behavior. Many of these
 attributes are consulted during the import of the ``theano`` module and many are assumed to be
 read-only.
...
@@ -13,7 +13,7 @@
 .. toctree::
    :maxdepth: 1
-   fgraph
+   fg
    toolbox
    type
...
@@ -12,18 +12,18 @@
 Guide
 ======
-Symbolic printing: the Print() Op
----------------------------------
+Printing during execution
+-------------------------
 Intermediate values in a computation cannot be printed in
 the normal python way with the print statement, because Theano has no *statements*.
-Instead there is the `Print` Op.
+Instead there is the :class:`Print` Op.
 >>> x = T.dvector()
->>> hello_world_op = Print('hello world')
+>>> hello_world_op = printing.Print('hello world')
 >>> printed_x = hello_world_op(x)
 >>> f = function([x], printed_x)
->>> f([1,2,3])
+>>> f([1, 2, 3])
 >>> # output: "hello world __str__ = [ 1.  2.  3.]"
 If you print more than one thing in a function like `f`, they will not
@@ -39,15 +39,15 @@ Printing graphs
 ---------------
 Theano provides two functions (:func:`theano.pp` and
-:func:`theano.debugprint`) to print a graph to the terminal before or after
+:func:`theano.printing.debugprint`) to print a graph to the terminal before or after
 compilation. These two functions print expression graphs in different ways:
 :func:`pp` is more compact and math-like, :func:`debugprint` is more verbose.
-Theano also provides :func:`pydotprint` that creates a png image of the function.
+Theano also provides :func:`theano.printing.pydotprint` that creates a png image of the function.
 1) The first is :func:`theano.pp`.
 >>> x = T.dscalar('x')
->>> y = x**2
+>>> y = x ** 2
 >>> gy = T.grad(y, x)
 >>> pp(gy) # print out the gradient prior to optimization
 '((fill((x ** 2), 1.0) * 2) * (x ** (2 - 1)))'
@@ -71,56 +71,63 @@ iteration number or other kinds of information in the name.
 To make graphs legible, :func:`pp` hides some Ops that are actually in the graph. For example,
 automatic DimShuffles are not shown.
-2) The second function to print a graph is :func:`theano.printing.debugprint(variable_or_function, depth=-1)`
+2) The second function to print a graph is :func:`theano.printing.debugprint`
 >>> theano.printing.debugprint(f.maker.fgraph.outputs[0])
-Elemwise{mul,no_inplace} 46950805397392
- 2.0 46950805310800
- x 46950804895504
+Elemwise{mul,no_inplace} [@A] ''
+ |TensorConstant{2.0} [@B]
+ |x [@C]
 Each line printed represents a Variable in the graph.
-The line `` x 46950804895504`` means the variable named 'x' at memory
-location 46950804895504. If you accidentally have two variables called 'x' in
-your graph, their different memory locations will be your clue.
+The line ``|x [@C]`` means the variable named ``x`` with debugprint identifier
+[@C] is an input of the Elemwise. If you accidentally have two variables called ``x`` in
+your graph, their different debugprint identifiers will be your clue.
-The line `` 2.0 46950805310800`` means that there is a constant 2.0 at the
-given memory location.
+The line ``|TensorConstant{2.0} [@B]`` means that there is a constant 2.0
+with this debugprint identifier.
-The line `` Elemwise{mul,no_inplace} 46950805397392`` is indented less than
+The line ``Elemwise{mul,no_inplace} [@A] ''`` is indented less than
 the other ones, because it means there is a variable computed by multiplying
 the other (more indented) ones together.
+The ``|`` symbols are just there to help read big graphs. They group
+together the inputs to a node.
 Sometimes, you'll see a Variable but not the inputs underneath. That can
 happen when that Variable has already been printed. Where else has it been
-printed? Look for the memory address using the Find feature of your text
+printed? Look for the debugprint identifier using the Find feature of your text
 editor.
 >>> theano.printing.debugprint(gy)
-Elemwise{mul} 46950804894224
- Elemwise{mul} 46950804735120
-  Elemwise{second,no_inplace} 46950804626128
-   Elemwise{pow,no_inplace} 46950804625040
-    x 46950658736720
-    2 46950804039760
-   1.0 46950804625488
-  2 46950804039760
- Elemwise{pow} 46950804737616
-  x 46950658736720
-  Elemwise{sub} 46950804736720
-   2 46950804039760
-   InplaceDimShuffle{} 46950804736016
-    1 46950804735760
+Elemwise{mul} [@A] ''
+ |Elemwise{mul} [@B] ''
+ | |Elemwise{second,no_inplace} [@C] ''
+ | | |Elemwise{pow,no_inplace} [@D] ''
+ | | | |x [@E]
+ | | | |TensorConstant{2} [@F]
+ | | |TensorConstant{1.0} [@G]
+ | |TensorConstant{2} [@F]
+ |Elemwise{pow} [@H] ''
+ | |x [@E]
+ | |Elemwise{sub} [@I] ''
+ | | |TensorConstant{2} [@F]
+ | | |InplaceDimShuffle{} [@J] ''
+ | | | |TensorConstant{1} [@K]
 >>> theano.printing.debugprint(gy, depth=2)
-Elemwise{mul} 46950804894224
- Elemwise{mul} 46950804735120
- Elemwise{pow} 46950804737616
+Elemwise{mul} [@A] ''
+ |Elemwise{mul} [@B] ''
+ |Elemwise{pow} [@C] ''
 If the depth parameter is provided, it limits the number of levels that are
 shown.
-3) The function :func:`theano.printing.pydotprint(fct, outfile=SOME_DEFAULT_VALUE)` will print a compiled theano function to a png file.
+3) The function :func:`theano.printing.pydotprint` will print a compiled theano function to a png file.
 In the image, Apply nodes (the applications of ops) are shown as boxes and variables are shown as ovals.
 The number at the end of each label indicates graph position.
@@ -170,10 +177,13 @@ Reference
 running the function will print the value that `x` takes in the graph.
-.. function:: theano.printing.pp(*args)
-   TODO
-.. autofunction:: theano.printing.debugprint
+.. autofunction:: theano.printing.debugprint
+.. function:: theano.pp(*args)
+   Just a shortcut to :func:`theano.printing.pp`
+.. autofunction:: theano.printing.pp(*args)
+.. autofunction:: theano.printing.pydotprint
@@ -136,19 +136,35 @@ arange must have its length specified at creation time.
 Simple accumulation into a scalar, ditching lambda
 --------------------------------------------------
-This should be fairly self-explanatory.
+Although this example would seem almost self-explanatory, it stresses a
+pitfall to be careful of: the initial output state that is supplied, that is
+``outputs_info``, must be of a **shape similar to that of the output variable**
+generated at each iteration and, moreover, it **must not involve an implicit
+downcast** of the latter.
 .. code-block:: python
+    import numpy as np
+    import theano
+    import theano.tensor as T
     up_to = T.iscalar("up_to")
     # define a named function, rather than using lambda
     def accumulate_by_adding(arange_val, sum_to_date):
        return sum_to_date + arange_val
+    seq = T.arange(up_to)
+    # An unauthorized implicit downcast from the dtype of 'seq', to that of
+    # 'T.as_tensor_variable(0)' which is of dtype 'int8' by default would occur
+    # if this instruction were to be used instead of the next one:
+    # outputs_info = T.as_tensor_variable(0)
+    outputs_info = T.as_tensor_variable(np.asarray(0, seq.dtype))
     scan_result, scan_updates = theano.scan(fn=accumulate_by_adding,
-                                            outputs_info=T.as_tensor_variable(0),
-                                            sequences=T.arange(up_to))
+                                            outputs_info=outputs_info,
+                                            sequences=seq)
     triangular_sequence = theano.function(inputs=[up_to], outputs=scan_result)
     # test
@@ -157,7 +173,6 @@ Simple accumulation into a scalar, ditching lambda
 print [n * (n + 1) // 2 for n in xrange(some_num)]
 Another simple example
 ----------------------
...
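The downcast pitfall this hunk documents can be reproduced without Theano. Below is a plain NumPy sketch: the loop only mimics what ``scan`` computes above (the helper names mirror the snippet, but the loop itself is an illustration, not Theano code). With an ``int32`` accumulator the partial sums are exact, while an ``int8`` accumulator — the dtype an untyped ``0`` would get — wraps once the running sum passes 127.

```python
import numpy as np

def accumulate_by_adding(arange_val, sum_to_date):
    return sum_to_date + arange_val

def triangular_sequence_np(up_to, dtype):
    # Re-apply the step function, collecting each partial sum,
    # with the sequence and the accumulator both held in `dtype`.
    acc = dtype(0)
    out = []
    for v in np.arange(up_to, dtype=dtype):
        acc = accumulate_by_adding(v, acc)
        out.append(acc)
    return np.array(out)

# Correctly typed initial state: exact partial sums (0 + 1 + ... + 19 = 190).
print(triangular_sequence_np(20, np.int32)[-1])  # 190

# The pitfall: with an int8 accumulator, every partial sum is forced back
# into int8, so the total wraps around (190 mod 256 interpreted as signed).
with np.errstate(over='ignore'):
    bad = triangular_sequence_np(20, np.int8)
print(bad[-1])  # -66, not 190
```

This is exactly why the patched snippet builds ``outputs_info`` with ``np.asarray(0, seq.dtype)`` instead of a bare ``0``.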
 .. currentmodule:: tensor
+.. _libdoc_basic_tensor:
 ===========================
  Basic Tensor Functionality
 ===========================
@@ -532,7 +534,7 @@ dimensions, see :meth:`_tensor_py_operators.dimshuffle`.
-.. function:: shape_padright(x,n_ones = 1)
+.. function:: shape_padright(x, n_ones=1)
 Reshape `x` by right padding the shape with `n_ones` 1s. Note that all
 these new dimensions will be broadcastable. To make them non-broadcastable
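The reshape described here can be previewed with plain NumPy (a sketch of the semantics only; ``shape_padright_np`` is a hypothetical helper written for this example, not part of Theano):

```python
import numpy as np

def shape_padright_np(x, n_ones=1):
    # Append `n_ones` length-1 axes on the right, mirroring what
    # tensor.shape_padright does symbolically.
    return x.reshape(x.shape + (1,) * n_ones)

a = np.arange(6).reshape(2, 3)
print(shape_padright_np(a).shape)     # (2, 3, 1)
print(shape_padright_np(a, 2).shape)  # (2, 3, 1, 1)
```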
@@ -597,7 +599,7 @@ dimensions, see :meth:`_tensor_py_operators.dimshuffle`.
 Create a matrix by filling the shape of `a` with `b`
-.. function:: eye(n, m = None, k = 0, dtype=theano.config.floatX)
+.. function:: eye(n, m=None, k=0, dtype=theano.config.floatX)
 :param n: number of rows in output (value or theano scalar)
 :param m: number of columns in output (value or theano scalar)
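For concrete arguments the result matches ``numpy.eye``, which can be used to preview the effect of the ``k`` offset (a NumPy stand-in for the symbolic op):

```python
import numpy as np

# n=3 rows, m=4 columns, k=1 places the ones on the first superdiagonal.
print(np.eye(3, 4, k=1))
# [[0. 1. 0. 0.]
#  [0. 0. 1. 0.]
#  [0. 0. 0. 1.]]
```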
@@ -1065,11 +1067,11 @@ Mathematical
 Returns a variable representing the exponential of a, i.e. e^a.
-.. function:: maximum(a,b)
+.. function:: maximum(a, b)
 Returns a variable representing the maximum element by element of a and b
-.. function:: minimum(a,b)
+.. function:: minimum(a, b)
 Returns a variable representing the minimum element by element of a and b
...
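The element-by-element behaviour described here is the same as NumPy's, so a quick concrete check is possible (NumPy stands in for the symbolic `tensor` versions):

```python
import numpy as np

a = np.array([1, 5, 3])
b = np.array([4, 2, 3])
# Each output element is the max (resp. min) of the corresponding pair.
print(np.maximum(a, b))  # [4 5 3]
print(np.minimum(a, b))  # [1 2 3]
```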
 .. _adding:
-========================================
-Baby steps - Adding two numbers together
-========================================
+====================
+Baby Steps - Algebra
+====================
-Adding two Scalars
+Adding two scalars
 ==================
-So, to get us started with Theano and get a feel of what we're working with,
+To get us started with Theano and get a feel of what we're working with,
 let's make a simple function: add two numbers together. Here is how you do
 it:
...
@@ -34,12 +33,12 @@ Let's break this down into several steps. The first step is to define
 two symbols (*Variables*) representing the quantities that you want
 to add. Note that from now on, we will use the term
 *Variable* to mean "symbol" (in other words,
-``x``, ``y``, ``z`` are all *Variable* objects). The output of the function
-``f`` is a ``numpy.ndarray`` with zero dimensions.
+*x*, *y*, *z* are all *Variable* objects). The output of the function
+*f* is a ``numpy.ndarray`` with zero dimensions.
 If you are following along and typing into an interpreter, you may have
 noticed that there was a slight delay in executing the ``function``
-instruction. Behind the scenes, ``f`` was being compiled into C code.
+instruction. Behind the scenes, *f* was being compiled into C code.
 .. note::
@@ -52,12 +51,10 @@ instruction. Behind the scenes, *f* was being compiled into C code.
 >>> x = theano.tensor.ivector()
 >>> y = -x
-``x`` and ``y`` are both Variables, i.e. instances of the
+*x* and *y* are both Variables, i.e. instances of the
 ``theano.gof.graph.Variable`` class. The
-type of both ``x`` and ``y`` is ``theano.tensor.ivector``.
+type of both *x* and *y* is ``theano.tensor.ivector``.
--------------------------------------------
 **Step 1**
@@ -68,9 +65,9 @@ In Theano, all symbols must be typed. In particular, ``T.dscalar``
 is the type we assign to "0-dimensional arrays (`scalar`) of doubles
 (`d`)". It is a Theano :ref:`type`.
-``dscalar`` is not a class. Therefore, neither ``x`` nor ``y``
+``dscalar`` is not a class. Therefore, neither *x* nor *y*
 are actually instances of ``dscalar``. They are instances of
-:class:`TensorVariable`. ``x`` and ``y``
+:class:`TensorVariable`. *x* and *y*
 are, however, assigned the theano Type ``dscalar`` in their ``type``
 field, as you can see here:
@@ -83,52 +80,49 @@ TensorType(float64, scalar)
 >>> x.type is T.dscalar
 True
-You can learn more about the structures in Theano in :ref:`graphstructures`.
 By calling ``T.dscalar`` with a string argument, you create a
 *Variable* representing a floating-point scalar quantity with the
 given name. If you provide no argument, the symbol will be unnamed. Names
 are not required, but they can help debugging.
+More will be said in a moment regarding Theano's inner structure. You
+could also learn more by looking into :ref:`graphstructures`.
--------------------------------------------
 **Step 2**
-The second step is to combine ``x`` and ``y`` into their sum ``z``:
+The second step is to combine *x* and *y* into their sum *z*:
 >>> z = x + y
-``z`` is yet another *Variable* which represents the addition of
-``x`` and ``y``. You can use the :ref:`pp <libdoc_printing>`
-function to pretty-print out the computation associated to ``z``.
+*z* is yet another *Variable* which represents the addition of
+*x* and *y*. You can use the :ref:`pp <libdoc_printing>`
+function to pretty-print out the computation associated to *z*.
 >>> print pp(z)
 (x + y)
--------------------------------------------
 **Step 3**
-The last step is to create a function taking ``x`` and ``y`` as inputs
-and giving ``z`` as output:
+The last step is to create a function taking *x* and *y* as inputs
+and giving *z* as output:
 >>> f = function([x, y], z)
 The first argument to :func:`function <function.function>` is a list of Variables
 that will be provided as inputs to the function. The second argument
 is a single Variable *or* a list of Variables. For either case, the second
-argument is what we want to see as output when we apply the function.
-``f`` may then be used like a normal Python function.
+argument is what we want to see as output when we apply the function. *f* may
+then be used like a normal Python function.
-Adding two Matrices
+Adding two matrices
 ===================
 You might already have guessed how to do this. Indeed, the only change
-from the previous example is that you need to instantiate ``x`` and
-``y`` using the matrix Types:
+from the previous example is that you need to instantiate *x* and
+*y* using the matrix Types:
 .. If you modify this code, also change :
 .. theano/tests/test_tutorial.py:T_adding.test_adding_2
@@ -138,14 +132,14 @@ from the previous example is that you need to instantiate *x* and
 >>> z = x + y
 >>> f = function([x, y], z)
-``dmatrix`` is the Type for matrices of doubles. And then we can use
+``dmatrix`` is the Type for matrices of doubles. Then we can use
 our new function on 2D arrays:
 >>> f([[1, 2], [3, 4]], [[10, 20], [30, 40]])
 array([[ 11.,  22.],
        [ 33.,  44.]])
-The variable is a numpy array. We can also use numpy arrays directly as
+The variable is a NumPy array. We can also use NumPy arrays directly as
 inputs:
 >>> import numpy
@@ -159,18 +153,36 @@ by :ref:`broadcasting <libdoc_tensor_broadcastable>`.
 The following types are available:
-* **byte**: bscalar, bvector, bmatrix, brow, bcol, btensor3, btensor4
-* **32-bit integers**: iscalar, ivector, imatrix, irow, icol, itensor3, itensor4
-* **64-bit integers**: lscalar, lvector, lmatrix, lrow, lcol, ltensor3, ltensor4
-* **float**: fscalar, fvector, fmatrix, frow, fcol, ftensor3, ftensor4
-* **double**: dscalar, dvector, dmatrix, drow, dcol, dtensor3, dtensor4
-* **complex**: cscalar, cvector, cmatrix, crow, ccol, ctensor3, ctensor4
+* **byte**: ``bscalar, bvector, bmatrix, brow, bcol, btensor3, btensor4``
+* **16-bit integers**: ``wscalar, wvector, wmatrix, wrow, wcol, wtensor3, wtensor4``
+* **32-bit integers**: ``iscalar, ivector, imatrix, irow, icol, itensor3, itensor4``
+* **64-bit integers**: ``lscalar, lvector, lmatrix, lrow, lcol, ltensor3, ltensor4``
+* **float**: ``fscalar, fvector, fmatrix, frow, fcol, ftensor3, ftensor4``
+* **double**: ``dscalar, dvector, dmatrix, drow, dcol, dtensor3, dtensor4``
+* **complex**: ``cscalar, cvector, cmatrix, crow, ccol, ctensor3, ctensor4``
-The previous list is not exhaustive. A guide to all types compatible
-with numpy arrays may be found :ref:`here <libdoc_tensor_creation>`.
+The previous list is not exhaustive and a guide to all types compatible
+with NumPy arrays may be found here: :ref:`tensor creation<libdoc_tensor_creation>`.
 .. note::
    You, the user---not the system architecture---have to choose whether your
    program will use 32- or 64-bit integers (``i`` prefix vs. the ``l`` prefix)
    and floats (``f`` prefix vs. the ``d`` prefix).
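The dtype behind each letter prefix can be spelled out with NumPy dtypes (an illustrative mapping assembled from the list above; the dictionary is this example's own, not a Theano API):

```python
import numpy as np

# Letter prefix -> element dtype for the constructors listed above.
prefix_dtype = {
    'b': np.int8,       # byte
    'w': np.int16,      # 16-bit integer
    'i': np.int32,      # 32-bit integer
    'l': np.int64,      # 64-bit integer
    'f': np.float32,    # float
    'd': np.float64,    # double
    'c': np.complex64,  # complex
}
for prefix, dt in sorted(prefix_dtype.items()):
    print(prefix, np.dtype(dt).name)
```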
+-------------------------------------------
+**Exercise**
+.. code-block:: python
+    import theano
+    a = theano.tensor.vector()  # declare variable
+    out = a + a ** 10           # build symbolic expression
+    f = theano.function([a], out)  # compile function
+    print f([0, 1, 2])  # prints `array([0, 2, 1026])`
+Modify and execute this code to compute this expression: a ** 2 + b ** 2 + 2 * a * b.
+:download:`Solution<adding_solution_1.py>`
+#!/usr/bin/env python
+# Theano tutorial
+# Solution to Exercise in section 'Baby Steps - Algebra'
+import theano
+a = theano.tensor.vector()  # declare variable
+b = theano.tensor.vector()  # declare variable
+out = a ** 2 + b ** 2 + 2 * a * b  # build symbolic expression
+f = theano.function([a, b], out)   # compile function
+print f([1, 2], [4, 5])  # prints [ 25.  49.]
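The exercise and its solution can be sanity-checked numerically with plain NumPy, since ``a ** 2 + b ** 2 + 2 * a * b`` equals ``(a + b) ** 2`` elementwise (NumPy stands in for the compiled Theano function here):

```python
import numpy as np

# The starter expression: a + a ** 10 on [0, 1, 2] gives 0, 2 and 1026.
a = np.array([0., 1., 2.])
print(a + a ** 10)

# The solution expression on the tutorial's test inputs gives 25 and 49,
# matching the printed result of f([1, 2], [4, 5]).
a = np.array([1., 2.])
b = np.array([4., 5.])
out = a ** 2 + b ** 2 + 2 * a * b
print(out)
print(np.allclose(out, (a + b) ** 2))  # the binomial identity holds
```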
...@@ -5,53 +5,52 @@ ...@@ -5,53 +5,52 @@
Understanding Memory Aliasing for Speed and Correctness
=======================================================

The aggressive reuse of memory is one of the ways through which Theano makes
code fast, and it is important for the correctness and speed of your program
that you understand how Theano might alias buffers.

This section describes the principles based on which Theano handles memory,
and explains when you might want to alter the default behaviour of some
functions and methods for faster performance.

The Memory Model: Two Spaces
============================

There are some simple principles that guide Theano's handling of memory. The
main idea is that there is a pool of memory managed by Theano, and Theano
tracks changes to values in that pool.

* Theano manages its own memory space, which typically does not overlap with
  the memory of normal Python variables that non-Theano code creates.
* Theano functions only modify buffers that are in Theano's memory space.
* Theano's memory space includes the buffers allocated to store ``shared``
  variables and the temporaries used to evaluate functions.
* Physically, Theano's memory space may be spread across the host, one or
  more GPU devices, and in the future may even include objects on a remote
  machine.
* The memory allocated for a ``shared`` variable buffer is unique: it is
  never aliased to another ``shared`` variable.
* Theano's managed memory is constant while Theano functions are not running
  and Theano's library code is not running.
* The default behaviour of a function is to return user-space values for
  outputs, and to expect user-space values for inputs.

The distinction between Theano-managed memory and user-managed memory can be
broken down by some Theano functions (e.g. ``shared``, ``get_value`` and the
constructors for ``In`` and ``Out``) by using a ``borrow=True`` flag.
This can make those methods faster (by avoiding copy operations) at the
expense of risking subtle bugs in the overall program (by aliasing memory).

The rest of this section is aimed at helping you to understand when it is
safe to use the ``borrow=True`` argument and reap the benefits of faster code.
Borrowing when Creating Shared Variables
========================================

A ``borrow`` argument can be provided to the shared-variable constructor.

.. code-block:: python

    s_default = theano.shared(np_array)
    s_false = theano.shared(np_array, borrow=False)
    s_true = theano.shared(np_array, borrow=True)
By default (*s_default*) and when explicitly setting ``borrow=False``, the
shared variable we construct gets a [deep] copy of *np_array*. So changes we
subsequently make to *np_array* have no effect on our shared variable.
.. code-block:: python

    s_true.get_value()  # -> array([2.0, 2.0])
If we are running this with the CPU as the device,
then changes we make to *np_array* *right away* will show up in
``s_true.get_value()``
because NumPy arrays are mutable, and *s_true* is using the *np_array*
object as its internal buffer.

However, this aliasing of *np_array* and *s_true* is not guaranteed to occur,
and may occur only temporarily even if it occurs at all.

It is not guaranteed to occur because if Theano is using a GPU device, then
the ``borrow`` flag has no effect. It may occur only temporarily because
if we call a Theano function that updates the value of *s_true* the aliasing
relationship *may* or *may not* be broken (the function is allowed to
update the ``shared`` variable by modifying its buffer, which will preserve
the aliasing, or by changing which buffer the variable points to, which
will terminate the aliasing).

*Take home message:*

It is a safe practice (and a good idea) to use ``borrow=True`` in a
``shared`` variable constructor when the ``shared`` variable stands for a
large object (in terms of memory footprint) and you do not want to create
copies of it in memory.

It is not a reliable technique to use ``borrow=True`` to modify ``shared``
variables through side-effect, because with some devices (e.g. GPU devices)
this technique will not work.
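On a CPU device, the ``borrow`` flag behaves like the difference between copying and aliasing a NumPy array. A plain-NumPy sketch of the *s_false* / *s_true* behaviour above (illustration only, no Theano involved):

```python
import numpy as np

np_array = np.ones(2)
s_false = np_array.copy()  # like borrow=False: gets its own buffer
s_true = np_array          # like borrow=True on CPU: shares the buffer

np_array += 1              # modify the original in place

print(s_false)  # unchanged: still an array of 1.0s
print(s_true)   # aliased: now an array of 2.0s
```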
Borrowing when Accessing Value of Shared Variables
==================================================

Retrieving
----------

A ``borrow`` argument can also be used to control how a ``shared`` variable's
value is retrieved.

.. If you modify this code, also change :
But both of these calls might create copies of the internal memory.

The reason that ``borrow=True`` might still make a copy is that the internal
representation of a ``shared`` variable might not be what you expect. When
you create a ``shared`` variable by passing a NumPy array for example, then
``get_value()`` must return a NumPy array too. That's how Theano can make the
GPU use transparent. But when you are using a GPU (or in the future perhaps a
remote machine), then the ``numpy.ndarray`` is not the internal
representation of your data.

If you really want Theano to return its internal representation *and never
copy it* then you should use the ``return_internal_type=True`` argument to
``get_value``. It will never cast the internal object (always return in
constant time), but might return various datatypes depending on contextual
factors (e.g. the compute device, the dtype of the NumPy array).
.. code-block:: python
It is possible to use ``borrow=False`` in conjunction with
``return_internal_type=True``.
This is primarily for internal debugging, not for typical use.
For the transparent use of the different types of optimization Theano can
make, the policy is that ``get_value()`` by default always returns the same
object type it received when the ``shared`` variable was created. So if you
manually created data on the GPU and created a ``shared`` variable on the GPU
with this data, ``get_value`` will always return GPU data even when
``return_internal_type=False``.
*Take home message:*

It is safe (and sometimes much faster) to use ``get_value(borrow=True)`` when
your code does not modify the return value. Do not use this to modify a
``shared`` variable by side-effect, because it will make your code
device-dependent. Modification of GPU variables through this sort of
side-effect is impossible.
Assigning
---------

``Shared`` variables also have a ``set_value`` method that can accept an
optional ``borrow=True`` argument. The semantics are similar to those of
creating a new ``shared`` variable: ``borrow=False`` is the default and
``borrow=True`` means that Theano *may* reuse the buffer you provide as the
internal storage for the variable.

A standard pattern for manually updating the value of a ``shared`` variable
is as follows:
.. code-block:: python

    s.set_value(
        some_inplace_fn(s.get_value(borrow=True)),
        borrow=True)
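The in-place intent of this pattern can be pictured with plain NumPy: filling a preallocated array in place reuses its buffer, whereas rebinding the name would allocate a fresh one (an illustrative sketch; the variable names are made up):

```python
import numpy as np

buf = np.empty(1000)   # allocate the workspace once
addr = buf.__array_interface__['data'][0]  # address of the underlying buffer

chunk = np.random.rand(1000)
buf[...] = chunk       # in-place fill: the same buffer is reused

# the data pointer is unchanged, so no new allocation happened
assert buf.__array_interface__['data'][0] == addr
```

Writing ``buf = chunk`` instead would rebind the name and discard the preallocated buffer, which is the allocation churn the pattern above avoids.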
This pattern works regardless of the computing device, and when the latter
makes it possible to expose Theano's internal variables without a copy, then
it proceeds as fast as an in-place update.

When ``shared`` variables are allocated on the GPU, the transfers to and from
the GPU device memory can be costly. Here are a few tips to ensure fast and
efficient use of GPU memory and bandwidth:

* Prior to Theano 0.3.1, ``set_value`` did not work in-place on the GPU. This
  meant that, sometimes, GPU memory for the new value would be allocated
  before the old memory was released. If you're running near the limits of
  GPU memory, this could cause you to run out of GPU memory unnecessarily.

  *Solution*: update to a newer version of Theano.

* If you are going to swap several chunks of data in and out of a ``shared``
  variable repeatedly, you will want to reuse the memory that you allocated
  the first time if possible - it is both faster and more memory efficient.

  *Solution*: upgrade to a recent version of Theano (>0.3.0) and consider
  padding your source data to make sure that every chunk is the same size.
* It is also worth mentioning that current GPU copying routines support only
  contiguous memory. So Theano must make the value you provide *C-contiguous*
  prior to copying it. This can require an extra copy of the data on the
  host.

  *Solution*: make sure that the value you assign to a
  CudaNdarraySharedVariable is *already* C-contiguous.

(Further information on the current implementation of the GPU version of
``set_value()`` can be found here: :ref:`libdoc_cuda_var`)
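Whether a value is C-contiguous can be checked, and fixed, on the NumPy side before assigning it (a small sketch, not Theano-specific):

```python
import numpy as np

a = np.ones((4, 4))
col = a[:, ::2]                # strided view: not C-contiguous
c = np.ascontiguousarray(col)  # makes a C-contiguous copy

print(col.flags['C_CONTIGUOUS'])  # False
print(c.flags['C_CONTIGUOUS'])    # True
```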
Borrowing when Constructing Function Objects
============================================

A ``borrow`` argument can also be provided to the ``In`` and ``Out`` objects
that control how ``theano.function`` handles its argument[s] and return
value[s].

.. If you modify this code, also change :
.. theano/tests/test_tutorial.py:T_aliasing.test_aliasing_3
.. code-block:: python

    import theano, theano.tensor

    x = theano.tensor.matrix()
    y = 2 * x
    f = theano.function([theano.In(x, borrow=True)], theano.Out(y, borrow=True))
Borrowing an input means that Theano will treat the argument you provide as
if it were part of Theano's pool of temporaries. Consequently, your input may
be reused as a buffer (and overwritten!) in the
course of evaluating that function (e.g. ``f``).
Borrowing an output means that Theano will not insist on allocating a fresh
output buffer every time you call the function. It will possibly reuse the
same one as on a previous call, and overwrite the old content. Consequently,
it may overwrite old return values through side-effect.
Those return values may also be overwritten in
the course of evaluating *another compiled function* (for example, the output
may be aliased to a ``shared`` variable). So be careful to use a borrowed
return value right away before calling any more Theano functions.
The default is of course to *not borrow* internal results.

It is also possible to pass a ``return_internal_type=True`` flag to the
``Out`` variable, which has the same interpretation as the
``return_internal_type`` flag to the ``shared`` variable's ``get_value``
function. Unlike ``get_value()``, the combination of
``return_internal_type=True`` and ``borrow=True`` arguments to ``Out()`` is
not guaranteed to avoid copying an output value. They are just hints that
give more flexibility to the compilation and optimization of the graph.
*Take home message:*

When an input *x* to a function is not needed after the function returns and
you would like to make it available to Theano as additional workspace, then
consider marking it with ``In(x, borrow=True)``. It may make the function
faster and reduce its memory requirement.

When a return value *y* is large (in terms of memory footprint), and you only
need to read from it once, right away when it's returned, then consider
marking it with an ``Out(y, borrow=True)``.
Conditions
==========

IfElse vs Switch
----------------

- Both ops build a condition over symbolic variables.
- ``IfElse`` takes a *boolean* condition and two variables as inputs.
- ``Switch`` takes a *tensor* as condition and two variables as inputs.
  ``switch`` is an elementwise operation and is thus more general than
  ``ifelse``.
- Whereas ``switch`` evaluates both *output* variables, ``ifelse`` is lazy
  and only evaluates one variable with respect to the condition.
**Example**

.. code-block:: python

    from theano import tensor as T
    from theano.ifelse import ifelse
    import theano, time, numpy

    a, b = T.scalars('a', 'b')
    x, y = T.matrices('x', 'y')

    z_switch = T.switch(T.lt(a, b), T.mean(x), T.mean(y))
    z_lazy = ifelse(T.lt(a, b), T.mean(x), T.mean(y))

    f_switch = theano.function([a, b, x, y], z_switch,
                               mode=theano.Mode(linker='vm'))
    f_lazyifelse = theano.function([a, b, x, y], z_lazy,
                                   mode=theano.Mode(linker='vm'))

    val1 = 0.
    val2 = 1.
    big_mat1 = numpy.ones((10000, 1000))
    big_mat2 = numpy.ones((10000, 1000))

    n_times = 10

    tic = time.clock()
    for i in xrange(n_times):
        f_switch(val1, val2, big_mat1, big_mat2)
    print 'time spent evaluating both values %f sec' % (time.clock() - tic)

    tic = time.clock()
    for i in xrange(n_times):
        f_lazyifelse(val1, val2, big_mat1, big_mat2)
    print 'time spent evaluating one value %f sec' % (time.clock() - tic)
In this example, the ``IfElse`` op spends less time (about half as much) than
``Switch``, since it computes only one of the two variables.
.. code-block:: python

    time spent evaluating one value 0.3500 sec
Unless ``linker='vm'`` or ``linker='cvm'`` is used, ``ifelse`` will compute
both variables and take the same computation time as ``switch``. Although the
linker is not currently set to ``cvm`` by default, it will be in the near
future.

There is no automatic optimization replacing a ``switch`` with a
broadcasted scalar by an ``ifelse``, as this is not always faster. See
this `ticket <http://www.assembla.com/spaces/theano/tickets/764>`_.
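The eager-versus-lazy distinction has a plain-Python analogue (a sketch, not Theano): ``numpy.where`` is like ``switch`` in that both branch values must be computed before it is called, while a Python conditional expression is like ``ifelse`` in that only the chosen branch is ever evaluated:

```python
import numpy as np

a, b = 0.0, 1.0
x = np.ones((100, 100))
y = np.zeros((100, 100))

# eager: both x.mean() and y.mean() are computed before np.where runs
z_switch = np.where(a < b, x.mean(), y.mean())

# lazy: only the selected branch, x.mean(), is computed here
z_lazy = x.mean() if a < b else y.mean()
```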
Debugging Theano: FAQ and Troubleshooting
=========================================
There are many kinds of bugs that might come up in a computer program.
This page is structured as a FAQ. It provides recipes to tackle common
problems, and introduces some of the tools that we use to find problems in
our own Theano code, and even (it happens) in Theano's internals, in
:ref:`using_debugmode`.
Isolating the Problem/Testing Theano Compiler
---------------------------------------------

You can run your Theano function in a :ref:`DebugMode <using_debugmode>`.
This tests the Theano optimizations and helps to find where NaN, inf and
other problems come from.
Using Test Values
-----------------

As of v.0.4.0, Theano has a new mechanism by which graphs are executed
on-the-fly, before a ``theano.function`` is ever compiled. Since optimizations
haven't been applied at this stage, it is easier for the user to locate the
source of some bug. This functionality is enabled through the config flag
``theano.config.compute_test_value``. Its use is best shown through the
following example.
.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T

    # this is the default, meaning on-the-fly evaluation is inactive
    theano.config.compute_test_value = 'off'

    # configure shared variables
    W1val = numpy.random.rand(2, 10, 10).astype(theano.config.floatX)
    W1 = theano.shared(W1val, 'W1')
    W2val = numpy.random.rand(15, 20).astype(theano.config.floatX)
    W2 = theano.shared(W2val, 'W2')

    # input which will be of shape (5, 10)
    x = T.matrix('x')

    # transform the shared variable in some way. Theano does not
    # know off hand that the matrix func_of_W1 has shape (20, 10)
    func_of_W1 = W1.dimshuffle(2, 0, 1).flatten(2).T

    # source of error: dot product of 5x10 with 20x10
    h1 = T.dot(x, func_of_W1)

    # do more stuff
    h2 = T.dot(h1, W2.T)

    # compile and call the actual function
    f = theano.function([x], h2)
    f(numpy.random.rand(5, 10))
Running the above code generates the following error message:
    _dot22(x, <TensorType(float64, matrix)>), [_dot22.0],
    _dot22(x, InplaceDimShuffle{1,0}.0), 'Sequence id of Apply node=4')
Needless to say, the above is not very informative and does not provide much
in the way of guidance. However, by instrumenting the code ever so slightly,
we can get Theano to reveal the exact source of the error.
.. code-block:: python

    ...

    # input which will be of shape (5, 10)
    x = T.matrix('x')

    # provide Theano with a default test-value
    x.tag.test_value = numpy.random.rand(5, 10)
In the above, we are tagging the symbolic matrix *x* with a special test
value. This allows Theano to evaluate symbolic expressions on-the-fly (by
calling the ``perform`` method of each op), as they are being defined. Sources
of error can thus be identified with much more precision and much earlier in
the compilation pipeline. For example, running the above code yields the
following error message, which properly identifies *line 23* as the culprit.
.. code-block:: bash
...@@ -120,33 +121,33 @@ following error message, which properly identifies line 23 as the culprit.
z[0] = numpy.asarray(numpy.dot(x, y))
ValueError: ('matrices are not aligned', (5, 10), (20, 10))
The ``compute_test_value`` mechanism works as follows:

* Theano constants and shared variables are used as is. No need to instrument them.
* A Theano *variable* (e.g. ``dmatrix``, ``vector``, etc.) should be
  given a special test value through the attribute ``tag.test_value``.
* Theano automatically instruments intermediate results. As such, any quantity
  derived from *x* will be given a ``tag.test_value`` automatically.
``compute_test_value`` can take the following values:

* ``off``: Default behavior. This debugging mechanism is inactive.
* ``raise``: Compute test values on the fly. Any variable for which a test
  value is required, but not provided by the user, is treated as an error. An
  exception is raised accordingly.
* ``warn``: Idem, but a warning is issued instead of an exception.
* ``ignore``: Silently ignore the computation of intermediate test values, if a
  variable is missing a test value.
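The class of error that test values surface early can be reproduced directly in plain NumPy (an illustrative sketch, independent of Theano's machinery):

```python
import numpy

# the tagged test value for x, shape (5, 10)
x_val = numpy.random.rand(5, 10)
# a weight matrix whose shape (20, 10) is misaligned with x_val
W_val = numpy.random.rand(20, 10)

# numpy.dot raises immediately on misaligned shapes -- the same check
# that compute_test_value='raise' performs while the graph is being built
try:
    numpy.dot(x_val, W_val)
except ValueError as e:
    print("caught:", e)
```

With test values in place, Theano runs this check at graph-construction time rather than after compilation, so the traceback points at the offending line of your script.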
.. note::

   This feature is currently incompatible with ``Scan`` and with ops
   which do not implement a ``perform`` method.
How do I Print an Intermediate Value in a Function/Method?
----------------------------------------------------------

Theano provides a ``Print`` op to do this.
.. code-block:: python
...@@ -158,15 +159,15 @@ Theano provides a 'Print' Op to do this.
f_with_print = theano.function([x], x_printed * 5)
# this runs the graph without any printing
assert numpy.all(f([1, 2, 3]) == [5, 10, 15])
# this runs the graph with the message and value printed
assert numpy.all(f_with_print([1, 2, 3]) == [5, 10, 15])
Since Theano runs your program in a topological order, you won't have precise
control over the order in which multiple ``Print()`` ops are evaluated. For a more
precise inspection of what's being computed where, when, and how, see the
discussion in :ref:`faq_wraplinker`.
.. warning::
...@@ -177,40 +178,50 @@ precise inspection of what's being computed where, when, and how, see the
   to remove them to know if this is the cause or not.
How do I Print a Graph (before or after compilation)?
-----------------------------------------------------

.. TODO: dead links in the next paragraph
Theano provides two functions (:func:`theano.pp` and
:func:`theano.printing.debugprint`) to print a graph to the terminal before or after
compilation. These two functions print expression graphs in different ways:
:func:`pp` is more compact and math-like, while :func:`debugprint` is more verbose.
Theano also provides :func:`theano.printing.pydotprint`, which creates a PNG image of the function.
You can read about them in :ref:`libdoc_printing`.
The Function I Compiled is Too Slow, What's Up?
-----------------------------------------------

First, make sure you're running in ``FAST_RUN`` mode. Even though
``FAST_RUN`` is the default mode, make certain by passing ``mode='FAST_RUN'``
explicitly to ``theano.function`` (or ``theano.make``) or by setting
:attr:`config.mode` to ``FAST_RUN``.
Second, try the Theano :ref:`using_profilemode`. This will tell you which
``Apply`` nodes and which ops are eating up your CPU cycles.

Tips:
* Use the flag ``floatX=float32`` to require type *float32* instead of *float64*;
  use the Theano constructors ``matrix()``, ``vector()``, ... instead of
  ``dmatrix()``, ``dvector()``, ..., since the former respect the ``floatX``
  setting while the latter are fixed to *float64*.
* Check in the profile output that there is no ``Dot`` op in the post-compilation
  graph while you are multiplying two matrices of the same type. ``Dot`` should be
  optimized to ``dot22`` when the inputs are matrices of the same type. This can
  still happen when using ``floatX=float32`` if one of the inputs of the graph is
  of type *float64*.
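The dtype-mixing pitfall from the second tip can be observed directly in NumPy, whose upcasting rule is the one at work here (a plain-NumPy sketch, independent of Theano):

```python
import numpy

a = numpy.random.rand(3, 4).astype('float32')
b = numpy.random.rand(4, 2)  # NumPy defaults to float64

# same-type inputs keep float32 (on the Theano side, dot22 can apply)
print(numpy.dot(a, a.T).dtype)  # float32
# mixing float32 with float64 silently upcasts the whole result
print(numpy.dot(a, b).dtype)    # float64
```

A single stray *float64* input anywhere in the graph is therefore enough to push a multiplication back to *float64* and keep the specialized op from applying.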
.. _faq_wraplinker:

How do I Step through a Compiled Function with the WrapLinker?
--------------------------------------------------------------

This is not exactly an FAQ, but the doc is here for now...

It's pretty easy to roll your own evaluation mode.
Check out this one:
...@@ -225,37 +236,37 @@ Check out this one:
wrap_linker = theano.gof.WrapLinkerMany([theano.gof.OpWiseCLinker()], [print_eval])
super(PrintEverythingMode, self).__init__(wrap_linker, optimizer='fast_run')
When you use ``mode=PrintEverythingMode()`` as the mode for ``Function`` or ``Method``,
then you should see (potentially a lot of) output. Every ``Apply`` node will be printed out,
along with its position in the graph, the arguments to ``perform`` or
``c_code``, and the output it computed.
>>> x = T.dscalar('x')
>>> f = function([x], [5 * x], mode=PrintEverythingMode())
>>> f(3)
>>> # print: 0 Elemwise{mul,no_inplace}(5, x) [array(5, dtype=int8), array(3.0)] [array(15.0)]
>>> # print: [array(15.0)]
Admittedly, this may be a huge amount of
output to read through if you are using big tensors... but you can choose to
put logic inside the *print_eval* function that would, for example, print
something out only if a certain kind of op were used, at a certain program
position, or only if a particular value showed up in one of the inputs or outputs.
Use your imagination :)
.. TODO: documentation for link.WrapLinkerMany

This can be a really powerful debugging tool. Note the call to *fn* inside the
call to *print_eval*; without it, the graph wouldn't get computed at all!
How to Use pdb
--------------
In the majority of cases, you won't be executing from the interactive shell
but from a set of Python scripts. In such cases, the use of the Python
debugger can come in handy, especially as your models become more complex.
Intermediate results don't necessarily have a clear name, and you can get
exceptions which are hard to decipher, due to the "compiled" nature of the
functions.

Consider this example script ("ex.py"):
...@@ -269,16 +280,16 @@ Consider this example script ("ex.py"):
a = T.dmatrix('a')
b = T.dmatrix('b')
f = theano.function([a, b], [a * b])
# matrices chosen so dimensions are unsuitable for multiplication
mat1 = numpy.arange(12).reshape((3, 4))
mat2 = numpy.arange(25).reshape((5, 5))
f(mat1, mat2)
This example is simple enough to debug by hand, but it serves for
illustrative purposes. As the matrices can't be multiplied element-wise
(unsuitable shapes), we get the following exception:
.. code-block:: text
...@@ -290,12 +301,12 @@ illustrative purposes. As the matrices can't be element-wise multiplied
File "/u/username/Theano/theano/gof/link.py", line 267, in streamline_default_f
File "/u/username/Theano/theano/gof/cc.py", line 1049, in execute
ValueError: ('Input dimension mis-match. (input[0].shape[0] = 3, input[1].shape[0] = 5)', Elemwise{mul,no_inplace}(a, b), Elemwise{mul,no_inplace}(a, b))
The call stack contains some useful information to trace back the source
of the error. There's the script where the compiled function was called --
but if you're using (improperly parameterized) prebuilt modules, the error
might originate from ops in these modules, not this script. The last line
tells us about the op that caused the exception. In this case it's a "mul"
involving variables with names "a" and "b". But suppose we instead had an
intermediate result to which we hadn't given a name.
After learning a few things about the graph structure in Theano, we can use
...@@ -328,7 +339,7 @@ explore around the graph.
That graph is purely symbolic (no data, just symbols to manipulate it
abstractly). To get information about the actual parameters, you explore the
"thunk" objects, which bind the storage for the inputs (and outputs) with
the function itself (a "thunk" is a concept related to closures). Here, to
get the current node's first input's shape, you'd therefore do "p
thunk.inputs[0][0].shape", which prints out "(3, 4)".
...
...@@ -2,11 +2,19 @@

.. _basictutexamples:

=============
More Examples
=============
At this point it would be wise to begin familiarizing yourself
more systematically with Theano's fundamental objects and operations by browsing
this section of the library: :ref:`libdoc_basic_tensor`.
As the tutorial unfolds, you should also gradually acquaint yourself with the
other relevant areas of the library and with the relevant sections of the
documentation entry page.
Logistic Function
=================
Here's another straightforward example, though a bit more elaborate
...@@ -61,12 +69,12 @@ array([[ 0.5 , 0.73105858],
[ 0.26894142, 0.11920292]])
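For reference, the logistic function is just :math:`s(x) = 1 / (1 + e^{-x})` applied elementwise; a plain-NumPy sketch (an illustrative equivalent, not the Theano graph version) reproduces the values above:

```python
import numpy

def logistic(x):
    # elementwise logistic: s(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + numpy.exp(-x))

out = logistic(numpy.array([[0.0, 1.0], [-1.0, -2.0]]))
print(out)  # approximately [[0.5, 0.73105858], [0.26894142, 0.11920292]]
```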
Computing More Than One Thing at the Same Time
==============================================

Theano supports functions with multiple outputs. For example, we can
compute the :ref:`elementwise <libdoc_tensor_elementwise>` difference, absolute difference, and
squared difference between two matrices *a* and *b* at the same time:
.. If you modify this code, also change :
.. theano/tests/test_tutorial.py:T_examples.test_examples_3
...@@ -82,7 +90,7 @@ squared difference between two matrices ``a`` and ``b`` at the same time:
shortcut for allocating symbolic variables that we will often use in the
tutorials.

When we use the function *f*, it returns the three variables (the printing
was reformatted for readability):

>>> f([[1, 1], [1, 1]], [[0, 1], [2, 3]])
...@@ -94,9 +102,7 @@ was reformatted for readability):
[ 1., 4.]])]
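For comparison, the three outputs can be sketched in plain NumPy (an illustrative equivalent of the compiled function, using the same inputs as above):

```python
import numpy

a = numpy.array([[1., 1.], [1., 1.]])
b = numpy.array([[0., 1.], [2., 3.]])

diff = a - b                # elementwise difference
abs_diff = numpy.abs(diff)  # absolute difference
diff_squared = diff ** 2    # squared difference
print(diff)          # [[ 1.  0.] [-1. -2.]]
print(abs_diff)      # [[ 1.  0.] [ 1.  2.]]
print(diff_squared)  # [[ 1.  0.] [ 1.  4.]]
```

The Theano version computes all three in a single pass over the compiled graph, which is the point of multiple outputs.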
Setting a Default Value for an Argument
=======================================
Let's say you want to define a function that adds two numbers, except
...@@ -117,11 +123,11 @@ array(35.0)
This makes use of the :ref:`Param <function_inputs>` class, which allows
you to specify properties of your function's parameters in greater detail. Here we
give a default value of 1 for *y* by creating a ``Param`` instance with
its ``default`` field set to 1.

Inputs with default values must follow inputs without default
values (as with Python's functions). There can be multiple inputs with default
values. These parameters can be set positionally or by name, as in standard Python:
...@@ -143,18 +149,21 @@ array(34.0)
array(33.0)
.. note::

   ``Param`` does not know the name of the local variables *y* and *w*
   that are passed as arguments. The symbolic variable objects have name
   attributes (set by ``dscalars`` in the example above) and *these* are the
   names of the keyword parameters in the functions that we build. This is
   the mechanism at work in ``Param(y, default=1)``. In the case of ``Param(w,
   default=2, name='w_by_name')``, we override the symbolic variable's name
   attribute with a name to be used for this function.

You may like to see :ref:`Function <usingfunction>` in the library for more detail.
.. _functionstateexample: .. _functionstateexample:
Using Shared Variables
======================
It is also possible to make a function with an internal state. For
...@@ -162,7 +171,7 @@ example, let's say we want to make an accumulator: at the beginning,
the state is initialized to zero. Then, on each function call, the state
is incremented by the function's argument.

First, let's define the *accumulator* function. It adds its argument to the
internal state, and returns the old state value.
.. If you modify this code, also change :
...@@ -174,24 +183,24 @@ internal state, and returns the old state value.
>>> accumulator = function([inc], state, updates=[(state, state+inc)])
This code introduces a few new concepts. The ``shared`` function constructs
so-called :ref:`shared variables <libdoc_compile_shared>`.
These are hybrid symbolic and non-symbolic variables whose value may be shared
between multiple functions. Shared variables can be used in symbolic expressions
just like the objects returned by ``dmatrices(...)``, but they also have an internal
value that defines the value taken by this symbolic variable in *all* the
functions that use it. It is called a *shared* variable because its value is
shared between many functions. The value can be accessed and modified by the
``.get_value()`` and ``.set_value()`` methods. We will come back to this soon.
The other new thing in this code is the ``updates`` parameter of ``function``.
``updates`` must be supplied with a list of pairs of the form (shared-variable, new expression).
It can also be a dictionary whose keys are shared variables and whose values are
the new expressions. Either way, it means "whenever this function runs, it
will replace the ``.value`` of each shared variable with the result of the
corresponding expression". Above, our accumulator replaces ``state``'s value with the sum
of the state and the increment amount.

Let's try it out!
.. If you modify this code, also change :
.. theano/tests/test_tutorial.py:T_examples.test_examples_8
...@@ -216,7 +225,7 @@ array(-1)
array(2)
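The accumulator's semantics can be mimicked in plain Python with a closure over mutable state (a sketch of the behaviour only; in Theano the new value is applied by the ``updates`` mechanism after the outputs are computed):

```python
def make_accumulator(initial=0):
    state = {'value': initial}  # stands in for the shared variable

    def accumulator(inc):
        old = state['value']        # the function returns the old state...
        state['value'] = old + inc  # ...and then applies state <- state + inc
        return old

    return accumulator

acc = make_accumulator()
print(acc(1))    # 0
print(acc(300))  # 1
```

As with the Theano version, the state persists across calls because it lives outside the function itself.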
As we mentioned above, you can define more than one function to use the same
shared variable. These functions can all update the value.

.. If you modify this code, also change :
.. theano/tests/test_tutorial.py:T_examples.test_examples_8
...@@ -228,13 +237,13 @@ array(2)
array(0)
You might be wondering why the updates mechanism exists. You can always
achieve a similar result by returning the new expressions and working with
them in NumPy as usual. The updates mechanism can be a syntactic convenience,
but it is mainly there for efficiency. Updates to shared variables can
sometimes be done more quickly using in-place algorithms (e.g. low-rank matrix
updates). Also, Theano has more control over where and how shared variables are
allocated, which is one of the important elements of getting good performance
on the :ref:`GPU <using_gpu>`.

It may happen that you expressed some formula using a shared variable, but
you do *not* want to use its value. In this case, you can use the
...@@ -254,15 +263,15 @@ array(7)
>>> state.get_value()  # old state still there, but we didn't use it
array(0)
The ``givens`` parameter can be used to replace any symbolic variable, not just a
shared variable. You can replace constants and expressions in general. Be
careful, though, not to allow the expressions introduced by a ``givens``
substitution to be co-dependent: the order of substitution is not defined, so
the substitutions have to work in any order.

In practice, a good way of thinking about ``givens`` is as a mechanism
that allows you to replace any part of your formula with a different
expression that evaluates to a tensor of the same shape and dtype.
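The idea behind ``givens`` can be sketched as a substitution over a toy expression tree (a hypothetical `substitute` helper for illustration, not Theano's actual graph machinery):

```python
def substitute(expr, givens):
    # replace a whole sub-expression if it appears in the givens mapping
    if expr in givens:
        return givens[expr]
    # recurse into compound expressions of the form (op, left, right)
    if isinstance(expr, tuple):
        op, left, right = expr
        return (op, substitute(left, givens), substitute(right, givens))
    return expr

# replace the symbolic 'state' with a plain value, leaving 'inc' intact
expr = ('+', ('*', 'state', 2), 'inc')
print(substitute(expr, {'state': 10}))  # ('+', ('*', 10, 2), 'inc')
```

Theano performs the analogous rewrite on the compiled graph, which is why each replacement must evaluate to a tensor of the same shape and dtype as the variable it replaces.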
.. _using_random_numbers:

...@@ -272,17 +281,19 @@ Using Random Numbers

Because in Theano you first express everything symbolically and
afterwards compile this expression to get functions,
using pseudo-random numbers is not as straightforward as it is in
NumPy, though also not too complicated.
The way to think about putting randomness into Theano's computations is
to put random variables in your graph. Theano will allocate a NumPy
RandomStream object (a random number generator) for each such
variable, and draw from it as necessary. We will call this sort of
sequence of random numbers a *random stream*. *Random streams* are at
their core shared variables, so the observations on shared variables
hold here as well. Theano's random objects are defined and implemented in
:ref:`RandomStreams <libdoc_tensor_shared_randomstreams>` and, at a lower level,
in :ref:`RandomStreamsBase <libdoc_tensor_raw_random>`.
Brief Example
-------------

Here's a brief example. The setup code is:
...@@ -303,7 +314,9 @@ Here's a brief example. The setup code is:
Here, *rv_u* represents a random stream of 2x2 matrices of draws from a uniform
distribution. Likewise, *rv_n* represents a random stream of 2x2 matrices of
draws from a normal distribution. The distributions that are implemented are
defined in :class:`RandomStreams` and, at a lower level, in :ref:`raw_random <libdoc_tensor_raw_random>`.

.. TODO: repair the latter reference on RandomStreams
Now let's use these objects. If we call f(), we get random uniform numbers.
The internal state of the random number generator is automatically updated,
...@@ -313,22 +326,22 @@ so we get different random numbers every time.
>>> f_val1 = f()  # different numbers from f_val0
When we add the extra argument ``no_default_updates=True`` to When we add the extra argument ``no_default_updates=True`` to
``function`` (as in ``g``), then the random number generator state is ``function`` (as in *g*), then the random number generator state is
not affected by calling the returned function. So for example, calling not affected by calling the returned function. So, for example, calling
``g`` multiple times will return the same numbers. *g* multiple times will return the same numbers.
>>> g_val0 = g()  # different numbers from f_val0 and f_val1
>>> g_val1 = g()  # same numbers as g_val0!
An important remark is that a random variable is drawn at most once during any
single function execution. So the *nearly_zeros* function is guaranteed to
return approximately 0 (except for rounding error) even though the *rv_u*
random variable appears three times in the output expression.
>>> nearly_zeros = function([], rv_u + rv_u - 2 * rv_u)
Seeding Streams
---------------

Random variables can be seeded individually or collectively.
>>> srng.seed(902340)  # seeds rv_u and rv_n with different seeds each
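The effect of seeding can be illustrated with NumPy alone, since the
generators behind a ``RandomStreams`` object are NumPy ``RandomState``
instances under the hood (a minimal sketch, independent of Theano):

```python
import numpy

# Seeding a generator makes its stream of draws reproducible.
rng = numpy.random.RandomState(902340)
first = rng.uniform(size=(2, 2))

rng.seed(902340)  # re-seeding with the same value restarts the stream
second = rng.uniform(size=(2, 2))

assert (first == second).all()
```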
Sharing Streams Between Functions
---------------------------------

As usual for shared variables, the random number generators used for random
variables are common between functions. So our *nearly_zeros* function will
update the state of the generators used in function *f* above.
For example:
>>> v2 = f()  # v2 != v1
Other Random Distributions
--------------------------

There are :ref:`other distributions implemented <libdoc_tensor_raw_random>`.
.. _logistic_regression:
A Real Example: Logistic Regression
===================================
The preceding elements are featured in this more realistic example. It will be used repeatedly.
.. code-block:: python
import numpy
import theano
import theano.tensor as T
rng = numpy.random
N = 400
feats = 784
D = (rng.randn(N, feats), rng.randint(size=N, low=0, high=2))
training_steps = 10000
# Declare Theano symbolic variables
x = T.matrix("x")
y = T.vector("y")
w = theano.shared(rng.randn(feats), name="w")
b = theano.shared(0., name="b")
print "Initial model:"
print w.get_value(), b.get_value()
# Construct Theano expression graph
p_1 = 1 / (1 + T.exp(-T.dot(x, w) - b)) # Probability that target = 1
prediction = p_1 > 0.5 # The prediction thresholded
xent = -y * T.log(p_1) - (1-y) * T.log(1-p_1) # Cross-entropy loss function
cost = xent.mean() + 0.01 * (w ** 2).sum()  # The cost to minimize
gw, gb = T.grad(cost, [w, b])               # Compute the gradient of the cost
                                            # (we shall return to this in a
                                            # following section of this tutorial)
# Compile
train = theano.function(
    inputs=[x, y],
    outputs=[prediction, xent],
    updates={w: w - 0.1 * gw, b: b - 0.1 * gb})
predict = theano.function(inputs=[x], outputs=prediction)

# Train
for i in range(training_steps):
    pred, err = train(D[0], D[1])

print "Final model:"
print w.get_value(), b.get_value()
print "target values for D:", D[1]
print "prediction on D:", predict(D[0])
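For comparison, the same model and update rule can be sketched in plain NumPy,
with the gradients derived by hand; this is exactly the work that ``T.grad``
and the ``updates`` dictionary automate above (a sketch using smaller, made-up
sizes):

```python
import numpy

rng = numpy.random.RandomState(0)
N, feats = 40, 8                 # smaller than the Theano example
x = rng.randn(N, feats)
y = rng.randint(size=N, low=0, high=2)
w = rng.randn(feats)
b = 0.0

def cost(w, b):
    p_1 = 1 / (1 + numpy.exp(-(x.dot(w) + b)))
    xent = -y * numpy.log(p_1) - (1 - y) * numpy.log(1 - p_1)
    return xent.mean() + 0.01 * (w ** 2).sum()

before = cost(w, b)
for i in range(100):
    p_1 = 1 / (1 + numpy.exp(-(x.dot(w) + b)))
    gw = x.T.dot(p_1 - y) / N + 0.02 * w   # hand-derived gradient wrt w
    gb = (p_1 - y).mean()                  # hand-derived gradient wrt b
    w -= 0.1 * gw
    b -= 0.1 * gb

assert cost(w, b) < before                 # gradient descent lowered the cost
```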
.. _extending_theano:

================
Extending Theano
================
Theano Graphs
=============
- Theano works with symbolic graphs.
- Those graphs are bipartite graphs (graphs with two types of nodes).
- The two types of nodes are ``Apply`` and ``Variable`` nodes.
- Each ``Apply`` node has a link to the op that it executes.

Inputs and outputs are lists of Theano variables.
.. image:: ../hpcs2011_tutorial/pics/apply_node.png
    :width: 500 px
.. note::

    This tutorial does not cover how to make an op that returns a view or
    modifies the values in its inputs. Thus, all ops created with the
    instructions described here MUST return newly allocated memory or reuse
    the memory provided in the parameter ``output_storage`` of the
    :func:`perform` function. See :ref:`views_and_inplace` for an explanation
    of how to do this.

    If your op returns a view or changes the value of its inputs
    without doing as prescribed in that page, Theano will run, but will
    return correct results for some graphs and wrong results for others.

    It is recommended that you run your tests in DebugMode (Theano flag
    ``mode=DebugMode``), since it verifies whether your op behaves correctly
    in this regard.
.. note::

    See the :ref:`dev_start_guide` for information about the versioning
    framework (git and GitHub), the development workflow, and how to make a
    quality contribution.
Op Contract
===========
.. code-block:: python

        pass

    # C implementation: [see theano web site for other functions]
    def c_code(...):
        # ...
        pass

    # other implementations (PyCUDA, ...):

    def grad(self, inputs, g):
        pass

    def R_op(self, inputs, eval_points):
        pass

    def infer_shape(node, (i0_shapes, ...))
.. ../extending/op.txt
There are two mandatory methods that one needs to implement.
The first one is :func:`make_node`. The second one
describes the computations that are required to be done
at run time. Currently there are two different possibilities:
implement the :func:`perform`
and/or :func:`c_code <Op.c_code>` methods (and other related :ref:`c methods
<cop>`), or the :func:`make_thunk` method. ``perform`` allows
you to easily wrap an existing Python function in Theano. ``c_code``
and the related methods allow the op to generate C code that will be
compiled and linked by Theano. On the other hand, ``make_thunk``
will be called only once during compilation and should generate
a ``thunk``: a standalone function that, when called, performs the wanted
computations. This is useful if you want to generate code and compile it
yourself. For example, this allows you to use PyCUDA to compile GPU code.
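The calling convention of ``perform`` can be sketched without any Theano
machinery: each output gets a one-element list in ``output_storage``, and the
method stores a freshly allocated result in that cell, leaving the inputs
untouched (a minimal illustration of the convention, not Theano's actual
implementation):

```python
def perform(inputs, output_storage):
    # inputs: the computed input values; output_storage: one single-cell
    # list per output, into which the result must be stored.
    x, = inputs
    z = output_storage[0]
    z[0] = x * 2  # newly allocated result; the input is not modified

storage = [[None]]      # one output, not yet computed
perform([21], storage)
assert storage[0][0] == 42
```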
There are also two methods whose implementation is highly recommended. They are
needed in order to merge duplicate computations involving your op. So if you
do not want Theano to execute your op multiple times with the same inputs,
do implement them. Those methods are :func:`__eq__` and
:func:`__hash__`.
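The role of these two methods can be seen with plain Python: instances that
compare equal and hash alike are recognized as duplicates of one another,
which is what lets duplicate applications of a stateless op be merged (a
sketch; Theano's merge optimization itself involves more than this):

```python
class DoubleOp(object):
    """Stateless op: every instance stands for the same computation."""
    def __eq__(self, other):
        return type(self) == type(other)

    def __hash__(self):
        return hash(type(self))

a, b = DoubleOp(), DoubleOp()
assert a == b                # distinct instances compare equal...
assert hash(a) == hash(b)    # ...and hash alike,
assert len({a, b}) == 1      # so duplicates collapse, e.g. in a set
```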
The :func:`infer_shape` method allows Theano to infer the shape of some
variable somewhere in the middle of the computational graph without actually
computing the outputs (when possible).
This can be helpful if one only needs the shape of the output instead of the
actual outputs.
The :func:`grad` method is required if you want to differentiate some cost
whose computation includes your op.
The :func:`__str__` method is useful in order to provide a more meaningful
string representation of your op.
The :func:`R_op` method is needed if you want ``theano.tensor.Rop`` to
work with your op.
Op Example
==========
.. code-block:: python
    def grad(self, inputs, output_grads):
        return [output_grads[0] * 2]

    def R_op(self, inputs, eval_points):
        # R_op can receive None as eval_points.
        # That means there is no differentiable path through that input.
        # If this implies that you cannot compute some outputs,
            return eval_points
        return self.grad(inputs, eval_points)
You can try it as follows:
.. code-block:: python
print inp
print out
How To Test It
==============

Theano has some functionalities to simplify testing. These help test the
``infer_shape``, ``grad`` and ``R_op`` methods. Put the following code
in a file and execute it with the ``theano-nose`` program.
Basic Tests
-----------
Basic tests are done by simply using the op and checking that it
returns the right answer. If you detect an error, you must raise an
exception. You can use the ``assert`` keyword to automatically raise an
``AssertionError``.
.. code-block:: python
# Compare the result computed to the expected value.
assert numpy.allclose(inp * 2, out)
Testing the infer_shape
-----------------------
When a class inherits from the ``InferShapeTester`` class, it gets the
``self._compile_and_check`` method that tests the op's ``infer_shape``
method. It tests that the op gets optimized out of the graph if only
the shape of the output is needed and not the output
itself. Additionally, it checks that the optimized graph computes
the correct shape, by comparing it to the actual shape of the computed
output.

``self._compile_and_check`` compiles a Theano function. It takes as
parameters the lists of input and output Theano variables, as would be
provided to ``theano.function``, and a list of real values to pass to the
compiled function (do not use symmetric shapes, e.g. (3, 3),
as they can easily hide errors). It also takes the op class as a parameter
in order to verify that no instance of it appears in the shape-optimized graph.
If there is an error, the function raises an exception. If you want to
see it fail, you can implement an incorrect ``infer_shape``.
self.op_class)
Testing the gradient
--------------------
The function :ref:`verify_grad <validating_grad>`
verifies the gradient of an op or Theano graph. It compares the
analytic (symbolically computed) gradient and the numeric
gradient (computed through the Finite Difference Method).
[numpy.random.rand(5, 7, 2)])
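The idea behind ``verify_grad`` can be sketched in plain NumPy: compare the
analytic derivative against a centered finite-difference estimate (a
simplified scalar sketch; ``verify_grad`` itself perturbs every element of the
input tensors):

```python
import numpy

def f(x):
    return 2 * x          # the function computed by DoubleOp

def analytic_grad(x):
    return 2.0            # d(2x)/dx, as the op's grad method would report

def numeric_grad(f, x, eps=1e-6):
    # centered finite differences: (f(x + eps) - f(x - eps)) / (2 * eps)
    return (f(x + eps) - f(x - eps)) / (2 * eps)

x = 0.7
assert abs(analytic_grad(x) - numeric_grad(f, x)) < 1e-6
```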
Testing the Rop
---------------
.. TODO: repair defective links in the following paragraph
The class :class:`RopLop_checker` defines the functions
:func:`RopLop_checker.check_mat_rop_lop`, :func:`RopLop_checker.check_rop_lop`
and :func:`RopLop_checker.check_nondiff_rop`. These allow you to test the
implementation of the Rop method of a particular op.
For instance, to verify the Rop method of the DoubleOp, you can use this:
.. code-block:: python
def test_double_rop(self):
    self.check_rop_lop(DoubleRop()(self.x), self.in_shape)
Testing GPU Ops
---------------
Ops to be executed on the GPU should inherit from ``theano.sandbox.cuda.GpuOp``
and not ``theano.Op``. This allows Theano to distinguish them. Currently, we
use this to test if the NVIDIA driver works correctly with our sum reduction code on the
GPU.
Running Your Tests
==================
To perform your tests, you may select either one of the three following methods:

theano-nose
-----------

The method of choice to conduct tests is to run the file ``theano-nose``. In a
regular Theano installation, the latter will be on the operating system's path
and directly accessible from any folder. Otherwise, it can be accessed in the
``Theano/bin`` folder. The following command lines may be used for the
corresponding purposes:
* ``theano-nose --theano``: Run every test found in Theano's path.
* ``theano-nose folder_name``: Run every test found in the folder *folder_name*.
* ``theano-nose test_file.py``: Run every test found in the file *test_file.py*.
The following are particularly useful for development purposes since they call for
particular classes or even for particular tests:
* ``theano-nose test_file.py:test_DoubleRop``: Run every test found inside the class *test_DoubleRop*.
* ``theano-nose test_file.py:test_DoubleRop.test_double_op``: Run only the test *test_double_op*
in the class *test_DoubleRop*.
Help with the use and functionalities of ``theano-nose`` may be obtained by
running it with the command line parameter ``--help`` (``-h``).
nosetests
---------
The command ``nosetests`` can also be used. Although it lacks the useful
functionalities that ``theano-nose`` provides, ``nosetests`` can be called similarly
to ``theano-nose`` from any folder in Python's path like so:
``nosetests [suffix similar to the above]``.
More documentation on ``nosetests`` is available here:
`nosetests <http://readthedocs.org/docs/nose/en/latest/>`_.
In-file
-------
One may also add a block of code similar to the following at the end of the
file containing a specific test of interest and run the file. In this example,
the test *test_double_rop* in the class *test_DoubleRop* would be performed.
.. code-block:: python
t.setUp()
t.test_double_rop()
We recommend that, when you execute a test file directly, you run all the
tests in that file. This can be done by adding this at the end of your test
files:

.. code-block:: python

    if __name__ == '__main__':
        unittest.main()
Exercise
========
Run the code of the *DoubleOp* example above.
Modify and execute to compute: x * y.
Modify and execute the example to return two outputs: x + y and x - y.
You can omit the Rop functions. Try to implement the testing apparatus described above.
(Notice that Theano's current *elemwise fusion* optimization is
only applicable to computations involving a single output. Hence, to gain
efficiency over the basic solution that is asked here, the two operations would
have to be jointly optimized explicitly in the code.)
SciPy
-----
Don't forget to call the parent ``setUp`` function.
For more details see :ref:`random_value_in_tests`.
:download:`Solution<extending_theano_solution_1.py>`
Final Note
==========
A more extensive discussion of this section's content may be found in the
advanced tutorial :ref:`Extending Theano<extending>`.

See :ref:`metadocumentation` for some information on how to generate
the documentation.
#!/usr/bin/env python
# Theano tutorial
# Solution to Exercise in section 'Extending Theano'
import unittest
import theano
# 1. Op returns x * y
class ProdOp(theano.Op):
    def __eq__(self, other):
        return type(self) == type(other)

    def __hash__(self):
        return hash(type(self))

    def __str__(self):
        return self.__class__.__name__

    def make_node(self, x, y):
        x = theano.tensor.as_tensor_variable(x)
        y = theano.tensor.as_tensor_variable(y)
        outdim = x.ndim
        output = (theano.tensor.TensorType
                  (dtype=theano.scalar.upcast(x.dtype, y.dtype),
                   broadcastable=[False] * outdim)())
        return theano.Apply(self, inputs=[x, y], outputs=[output])

    def perform(self, node, inputs, output_storage):
        x, y = inputs
        z = output_storage[0]
        z[0] = x * y

    def infer_shape(self, node, i0_shapes):
        return [i0_shapes[0]]

    def grad(self, inputs, output_grads):
        return [output_grads[0] * inputs[1], output_grads[0] * inputs[0]]
# 2. Op returns x + y and x - y

class SumDiffOp(theano.Op):
    def __eq__(self, other):
        return type(self) == type(other)

    def __hash__(self):
        return hash(type(self))

    def __str__(self):
        return self.__class__.__name__

    def make_node(self, x, y):
        x = theano.tensor.as_tensor_variable(x)
        y = theano.tensor.as_tensor_variable(y)
        outdim = x.ndim
        output1 = (theano.tensor.TensorType
                   (dtype=theano.scalar.upcast(x.dtype, y.dtype),
                    broadcastable=[False] * outdim)())
        output2 = (theano.tensor.TensorType
                   (dtype=theano.scalar.upcast(x.dtype, y.dtype),
                    broadcastable=[False] * outdim)())
        return theano.Apply(self, inputs=[x, y], outputs=[output1, output2])

    def perform(self, node, inputs, output_storage):
        x, y = inputs
        z1, z2 = output_storage
        z1[0] = x + y
        z2[0] = x - y

    def infer_shape(self, node, i0_shapes):
        return [i0_shapes[0], i0_shapes[0]]

    def grad(self, inputs, output_grads):
        og1, og2 = output_grads
        if og1 is None:
            og1 = theano.tensor.zeros_like(og2)
        if og2 is None:
            og2 = theano.tensor.zeros_like(og1)
        return [og1 + og2, og1 - og2]
# 3. Testing apparatus
import numpy
from theano.gof import Op, Apply
from theano import tensor, function, printing
from theano.tests import unittest_tools as utt
class TestProdOp(utt.InferShapeTester):

    rng = numpy.random.RandomState(43)

    def setUp(self):
        super(TestProdOp, self).setUp()
        self.op_class = ProdOp  # case 1

    def test_perform(self):
        x = theano.tensor.matrix()
        y = theano.tensor.matrix()
        f = theano.function([x, y], self.op_class()(x, y))
        x_val = numpy.random.rand(5, 4)
        y_val = numpy.random.rand(5, 4)
        out = f(x_val, y_val)
        assert numpy.allclose(x_val * y_val, out)

    def test_gradient(self):
        utt.verify_grad(self.op_class(), [numpy.random.rand(5, 4),
                                          numpy.random.rand(5, 4)],
                        n_tests=1, rng=TestProdOp.rng)

    def test_infer_shape(self):
        x = tensor.dmatrix()
        y = tensor.dmatrix()
        self._compile_and_check([x, y], [self.op_class()(x, y)],
                                [numpy.random.rand(5, 6),
                                 numpy.random.rand(5, 6)],
                                self.op_class)
class TestSumDiffOp(utt.InferShapeTester):

    rng = numpy.random.RandomState(43)

    def setUp(self):
        super(TestSumDiffOp, self).setUp()
        self.op_class = SumDiffOp

    def test_perform(self):
        x = theano.tensor.matrix()
        y = theano.tensor.matrix()
        f = theano.function([x, y], self.op_class()(x, y))
        x_val = numpy.random.rand(5, 4)
        y_val = numpy.random.rand(5, 4)
        out = f(x_val, y_val)
        assert numpy.allclose([x_val + y_val, x_val - y_val], out)

    def test_gradient(self):
        def output_0(x, y):
            return self.op_class()(x, y)[0]

        def output_1(x, y):
            return self.op_class()(x, y)[1]

        utt.verify_grad(output_0, [numpy.random.rand(5, 4),
                                   numpy.random.rand(5, 4)],
                        n_tests=1, rng=TestSumDiffOp.rng)
        utt.verify_grad(output_1, [numpy.random.rand(5, 4),
                                   numpy.random.rand(5, 4)],
                        n_tests=1, rng=TestSumDiffOp.rng)

    def test_infer_shape(self):
        x = tensor.dmatrix()
        y = tensor.dmatrix()
        # adapt the choice of the next instruction to the op under test
        self._compile_and_check([x, y], self.op_class()(x, y),
                                [numpy.random.rand(5, 6),
                                 numpy.random.rand(5, 6)],
                                self.op_class)
if __name__ == "__main__":
    unittest.main()
Frequently Asked Questions
==========================
TypeError: object of type 'TensorVariable' has no len()
-------------------------------------------------------
If you receive the following error, it is because the Python function
*__len__* cannot be implemented on Theano variables:

.. code-block:: python

    TypeError: object of type 'TensorVariable' has no len()

Python requires that *__len__* returns an integer, yet this cannot be done as
Theano's variables are symbolic. However, ``var.shape[0]`` can be used as a
workaround.

This error message cannot be made more explicit because the relevant aspects
of Python's internals cannot be modified.
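Python's constraint can be demonstrated directly: ``len()`` requires
``__len__`` to return an integer, so any object whose length is only known
symbolically cannot satisfy it (a minimal illustration with a hypothetical
class, unrelated to Theano's actual ``TensorVariable``):

```python
class SymbolicLength(object):
    """Stands in for a variable whose length is not a concrete integer."""
    def __len__(self):
        return "n"  # not an integer: Python itself rejects this

try:
    len(SymbolicLength())
    raised = False
except TypeError:
    raised = True

assert raised  # len() refuses a non-integer __len__ result
```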
Faster gcc optimization
-----------------------
You can enable faster gcc optimization with the ``cxxflags``. This list of flags was suggested on the mailing list::

    cxxflags=-march=native -O3 -ffast-math -ftree-loop-distribution -funroll-loops -ftracer

Use it at your own risk. Some people warned that the ``-ftree-loop-distribution`` optimization resulted in wrong results in the past.
Also the ``-march=native`` flag must be used with care if you have NFS. In that case, you MUST set the compiledir to a local path of the computer.
Related Projects
----------------

We try to list other Theano-related projects on this
`wiki page <https://github.com/Theano/Theano/wiki/Related-projects>`_.
"What are Theano's Limitations?"
--------------------------------
Theano offers a good amount of flexibility, but has some limitations too.
You must answer for yourself the following question: How can my algorithm be cleverly written
so as to make the most of what Theano can do?
Here is a list of some of the known limitations:
- *While*- or *for*-loops within an expression graph are supported, but only via
the :func:`theano.scan` op (which puts restrictions on how the loop body can
interact with the rest of the graph).
- Neither *goto* nor *recursion* is supported or planned within expression graphs.
PyCUDA/CUDAMat/Gnumpy compatibility
===================================
PyCUDA
======
Currently, PyCUDA and Theano have different objects to store GPU
data. The two implementations do not support the same set of features.
Theano's implementation is called *CudaNdarray*; it supports
*strides*, but only the *float32* dtype. PyCUDA's implementation
is called *GPUArray*; it doesn't support *strides*, but it can deal with
all NumPy and CUDA dtypes.

We are currently working on having the same base object for both that will
also mimic NumPy. Until this is ready, here is some information on how to
use both objects in the same script.
Transfer
--------
You can use the ``theano.misc.pycuda_utils`` module to convert GPUArray to and
from CudaNdarray. The functions ``to_cudandarray(x, copyif=False)`` and
``to_gpuarray(x)`` return a new object that occupies the same memory space
as the original; otherwise, they raise a *ValueError*. Because GPUArrays don't
support strides, if the CudaNdarray is strided, we could copy it to
have a non-strided copy. The resulting GPUArray won't share the same
memory region. If you want this behavior, set ``copyif=True`` in
``to_gpuarray``.
Compiling with PyCUDA
---------------------
You can use PyCUDA to compile CUDA functions that work directly on
CudaNdarrays. Here is an example from the file
``theano/misc/tests/test_pycuda_theano_simple.py``:

.. code-block:: python

import sys

import numpy
import theano
import theano.sandbox.cuda as cuda_ndarray
import theano.misc.pycuda_init
import pycuda
import pycuda.driver as drv
import pycuda.gpuarray
def test_pycuda_theano():
"""Simple example with pycuda function and Theano CudaNdarray object."""
from pycuda.compiler import SourceModule
mod = SourceModule("""
__global__ void multiply_them(float *dest, float *a, float *b)
{
const int i = threadIdx.x;
dest[i] = a[i] * b[i];
}
""")
multiply_them = mod.get_function("multiply_them")
a = numpy.random.randn(100).astype(numpy.float32)
b = numpy.random.randn(100).astype(numpy.float32)
# Test with Theano object
ga = cuda_ndarray.CudaNdarray(a)
gb = cuda_ndarray.CudaNdarray(b)
dest = cuda_ndarray.CudaNdarray.zeros(a.shape)
multiply_them(dest, ga, gb,
block=(400, 1, 1), grid=(1, 1))
assert (numpy.asarray(dest) == a * b).all()
Theano Op using a PyCUDA function
---------------------------------

You can use a GPU function compiled with PyCUDA in a Theano op:

.. code-block:: python

    import numpy
    import theano
    import theano.misc.pycuda_init
    from pycuda.compiler import SourceModule
    import theano.sandbox.cuda as cuda


    class PyCUDADoubleOp(theano.Op):
        def __eq__(self, other):
            return type(self) == type(other)

        def __hash__(self):
            return hash(type(self))

        def __str__(self):
            return self.__class__.__name__

        def make_node(self, inp):
            inp = cuda.basic_ops.gpu_contiguous(
                cuda.basic_ops.as_cuda_ndarray_variable(inp))
            assert inp.dtype == "float32"
            return theano.Apply(self, [inp], [inp.type()])

        def make_thunk(self, node, storage_map, _, _2):
            mod = SourceModule("""
    __global__ void my_fct(float * i0, float * o0, int size) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < size) {
            o0[i] = i0[i] * 2;
        }
    }""")
            pycuda_fct = mod.get_function("my_fct")
            inputs = [storage_map[v] for v in node.inputs]
            outputs = [storage_map[v] for v in node.outputs]

            def thunk():
                z = outputs[0]
                if z[0] is None or z[0].shape != inputs[0][0].shape:
                    z[0] = cuda.CudaNdarray.zeros(inputs[0][0].shape)
                grid = (int(numpy.ceil(inputs[0][0].size / 512.)), 1)
                pycuda_fct(inputs[0][0], z[0], numpy.intc(inputs[0][0].size),
                           block=(512, 1, 1), grid=grid)
            return thunk
CUDAMat
=======

There are functions for conversion between CUDAMat objects and Theano's
CudaNdArray objects. They obey the same principles as Theano's PyCUDA
functions and can be found in ``theano.misc.cudamat_utils.py``.

.. TODO: this statement is unclear:

WARNING: There is a peculiar problem associated with stride/shape with those
converters. In order to work, the test needs a *transpose* and *reshape*...

Gnumpy
======

There are conversion functions between Gnumpy *garray* objects and Theano
CudaNdArray objects. They are also similar to Theano's PyCUDA functions and
can be found in ``theano.misc.gnumpy_utils.py``.
...@@ -6,24 +6,26 @@
Derivatives in Theano
=====================

Computing Gradients
===================

Now let's use Theano for a slightly more sophisticated task: create a
function which computes the derivative of some expression *y* with
respect to its parameter *x*. To do this we will use the macro ``T.grad``.
For instance, we can compute the
gradient of :math:`x^2` with respect to :math:`x`. Note that:
:math:`d(x^2)/dx = 2 \cdot x`.

.. TODO: fix the vertical positioning of the expressions in the preceding paragraph

Here is the code to compute this gradient:
.. If you modify this code, also change :
.. theano/tests/test_tutorial.py:T_examples.test_examples_4

>>> from theano import pp
>>> x = T.dscalar('x')
>>> y = x ** 2
>>> gy = T.grad(y, x)
>>> pp(gy)  # print out the gradient prior to optimization
'((fill((x ** 2), 1.0) * 2) * (x ** (2 - 1)))'
...@@ -33,10 +35,10 @@ array(8.0)

>>> f(94.2)
array(188.40000000000001)

In this example, we can see from ``pp(gy)`` that we are computing
the correct symbolic gradient.
``fill((x ** 2), 1.0)`` means to make a matrix of the same shape as
``x ** 2`` and fill it with ``1.0``.
.. note::

    The optimizer simplifies the symbolic gradient expression. You can see

...@@ -56,7 +58,7 @@ logistic is: :math:`ds(x)/dx = s(x) \cdot (1 - s(x))`.

.. figure:: dlogistic.png

    A plot of the gradient of the logistic function, with *x* on the x-axis
    and :math:`ds(x)/dx` on the y-axis.

...@@ -71,133 +73,137 @@ logistic is: :math:`ds(x)/dx = s(x) \cdot (1 - s(x))`.

array([[ 0.25      ,  0.19661193],
       [ 0.19661193,  0.10499359]])
In general, for any **scalar** expression *s*, ``T.grad(s, w)`` provides
the Theano expression for computing :math:`\frac{\partial s}{\partial w}`. In
this way Theano can be used for doing **efficient** symbolic differentiation
(as the expression returned by ``T.grad`` will be optimized during
compilation), even for functions with many inputs. (See `automatic
differentiation <http://en.wikipedia.org/wiki/Automatic_differentiation>`_
for a description of symbolic differentiation.)
.. note::

    The second argument of ``T.grad`` can be a list, in which case the
    output is also a list. The order in both lists is important: element
    *i* of the output list is the gradient of the first argument of
    ``T.grad`` with respect to the *i*-th element of the list given as second argument.

    The first argument of ``T.grad`` has to be a scalar (a tensor
    of size 1). For more information on the semantics of the arguments of
    ``T.grad`` and details about the implementation, see
    :ref:`this <libdoc_gradient>` section of the library.

    Additional information on the inner workings of differentiation may also be
    found in the more advanced tutorial :ref:`Extending Theano <extending>`.
Computing the Jacobian
======================

In Theano's parlance, the term *Jacobian* designates the tensor comprising the
first partial derivatives of the output of a function with respect to its inputs.
(This is a generalization of the so-called Jacobian matrix in Mathematics.)
Theano implements the :func:`theano.gradient.jacobian` macro that does all
that is needed to compute the Jacobian. The following text explains how
to do it manually.

In order to manually compute the Jacobian of some function *y* with
respect to some parameter *x* we need to use ``scan``. What we
do is to loop over the entries in *y* and compute the gradient of
*y[i]* with respect to *x*.
.. note::

    ``scan`` is a generic op in Theano that allows writing in a symbolic
    manner all kinds of recurrent equations. While creating
    symbolic loops (and optimizing them for performance) is a hard task,
    effort is being done to improve the performance of ``scan``. We
    shall return to :ref:`scan <tutloop>` later in this tutorial.
>>> x = T.dvector('x')
>>> y = x ** 2
>>> J, updates = theano.scan(lambda i, y, x: T.grad(y[i], x), sequences=T.arange(y.shape[0]), non_sequences=[y, x])
>>> f = function([x], J, updates=updates)
>>> f([4, 4])
array([[ 8.,  0.],
       [ 0.,  8.]])
What we do in this code is to generate a sequence of *ints* from *0* to
``y.shape[0]`` using ``T.arange``. Then we loop through this sequence, and
at each step, we compute the gradient of element *y[i]* with respect to
*x*. ``scan`` automatically concatenates all these rows, generating a
matrix which corresponds to the Jacobian.
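To build intuition for what the ``scan`` loop above produces, here is a plain-NumPy sketch (illustration only, not Theano code) that constructs the same Jacobian row by row for the elementwise expression *y = x\*\*2*:

```python
import numpy as np

x = np.array([4.0, 4.0])

# Row i of the Jacobian is the gradient of y[i] = x[i] ** 2 with
# respect to the whole vector x: 2 * x[i] at position i, zero elsewhere.
rows = []
for i in range(x.shape[0]):
    row = np.zeros_like(x)
    row[i] = 2.0 * x[i]
    rows.append(row)
J = np.stack(rows)

print(J)  # [[ 8.  0.]
          #  [ 0.  8.]]
```

Stacking the per-output gradient rows is exactly what ``scan`` does symbolically when it concatenates the results of ``T.grad(y[i], x)``.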
.. note::

    There are some pitfalls to be aware of regarding ``T.grad``. One of them is that you
    cannot re-write the above expression of the Jacobian as
    ``theano.scan(lambda y_i, x: T.grad(y_i, x), sequences=y, non_sequences=x)``,
    even though from the documentation of ``scan`` this
    seems possible. The reason is that *y_i* will not be a function of
    *x* anymore, while *y[i]* still is.
Computing the Hessian
=====================

In Theano, the term *Hessian* has the usual mathematical meaning: it is the
matrix comprising the second-order partial derivatives of a function with scalar
output and vector input. Theano implements the :func:`theano.gradient.hessian`
macro that does all that is needed to compute the Hessian. The following text
explains how to do it manually.

You can compute the Hessian manually, similarly to the Jacobian. The only
difference is that now, instead of computing the Jacobian of some expression
*y*, we compute the Jacobian of ``T.grad(cost, x)``, where *cost* is some
scalar.
>>> x = T.dvector('x')
>>> y = x ** 2
>>> cost = y.sum()
>>> gy = T.grad(cost, x)
>>> H, updates = theano.scan(lambda i, gy, x: T.grad(gy[i], x), sequences=T.arange(gy.shape[0]), non_sequences=[gy, x])
>>> f = function([x], H, updates=updates)
>>> f([4, 4])
array([[ 2.,  0.],
       [ 0.,  2.]])
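As a numerical sanity check (a NumPy sketch, not part of Theano's API), the same Hessian can be approximated row by row with finite differences of the gradient *g(x) = 2x*, mirroring how the ``scan`` loop differentiates ``gy[i]``:

```python
import numpy as np

def g(xv):
    # gradient of cost = sum(x ** 2)
    return 2.0 * xv

x = np.array([4.0, 4.0])
eps = 1e-6
n = x.size
H = np.empty((n, n))
for i in range(n):
    e = np.zeros(n)
    e[i] = eps
    # row i: numerical derivative of the gradient along direction e_i
    H[i] = (g(x + e) - g(x - e)) / (2.0 * eps)

print(H)  # close to [[2. 0.], [0. 2.]]
```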
Jacobian times a Vector
=======================

Sometimes we can express the algorithm in terms of Jacobians times vectors,
or vectors times Jacobians. Compared to evaluating the Jacobian and then
doing the product, there are methods that compute the desired results while
avoiding actual evaluation of the Jacobian. This can bring about significant
performance gains. A description of one such algorithm can be found here:

* Barak A. Pearlmutter, "Fast Exact Multiplication by the Hessian", *Neural
  Computation, 1994*

While in principle we would want Theano to identify these patterns automatically for us,
in practice, implementing such optimizations in a generic manner is extremely
difficult. Therefore, we provide special functions dedicated to these tasks.
R-operator
----------

The *R operator* is built to evaluate the product between a Jacobian and a
vector, namely :math:`\frac{\partial f(x)}{\partial x} v`. The formulation
can be extended even to *x* being a matrix, or a tensor in general, in which
case the Jacobian becomes a tensor and the product becomes some kind
of tensor product. Because in practice we end up needing to compute such
expressions in terms of weight matrices, Theano supports this more generic
form of the operation. In order to evaluate the *R-operation* of
expression *y*, with respect to *x*, multiplying the Jacobian with *v*,
you need to do something similar to this:
>>> W = T.dmatrix('W')
>>> V = T.dmatrix('V')
>>> x = T.dvector('x')
>>> y = T.dot(x, W)
>>> JV = T.Rop(y, W, V)
>>> f = theano.function([W, V, x], JV)
>>> f([[1, 1], [1, 1]], [[2, 2], [2, 2]], [0, 1])
array([ 2.,  2.])

:ref:`List <R_op_list>` of Ops that implement Rop.
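For this particular example the R-operation can be checked by hand: since *y = x·W*, the derivative of *y[j]* with respect to *W[i,k]* is *x[i]* when *j = k* and zero otherwise, so multiplying the Jacobian by *V* collapses to the vector-matrix product *x·V*. A small NumPy sketch of this check (illustration only, not Theano code):

```python
import numpy as np

x = np.array([0.0, 1.0])
V = np.array([[2.0, 2.0], [2.0, 2.0]])

# d y[j] / d W[i, k] = x[i] * (j == k), so the Jacobian-times-V
# product reduces to x . V; it has the shape of the output y
JV = x.dot(V)

print(JV)  # [ 2.  2.]
```

This matches the ``array([ 2.,  2.])`` returned by the Theano example above.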
...@@ -205,51 +211,50 @@ array([ 2., 2.])

L-operator
----------

Similarly to the *R-operator*, the *L-operator* computes a *row* vector times
the Jacobian. The mathematical formula would be :math:`v \frac{\partial
f(x)}{\partial x}`. The *L-operator* is also supported for generic tensors
(not only for vectors). Similarly, it can be implemented as follows:
>>> W = T.dmatrix('W')
>>> v = T.dvector('v')
>>> x = T.dvector('x')
>>> y = T.dot(x, W)
>>> VJ = T.Lop(y, W, v)
>>> f = theano.function([v, x], VJ)
>>> f([2, 2], [0, 1])
array([[ 0.,  0.],
       [ 2.,  2.]])
.. note::

    *v*, the *point of evaluation*, differs between the *L-operator* and the *R-operator*.
    For the *L-operator*, the point of evaluation needs to have the same shape
    as the output, whereas for the *R-operator* this point should
    have the same shape as the input parameter. Furthermore, the results of these two
    operations differ. The result of the *L-operator* is of the same shape
    as the input parameter, while the result of the *R-operator* has a shape similar
    to that of the output.
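The L-operation for the same *y = x·W* can also be verified by hand: contracting *v* with the Jacobian *d y[j] / d W[i,k] = x[i]* (when *j = k*) leaves the outer product of *x* and *v*, which has the shape of the input parameter *W*, as the note above states. A NumPy sketch of this check (illustration only, not Theano code):

```python
import numpy as np

x = np.array([0.0, 1.0])
v = np.array([2.0, 2.0])

# sum_j v[j] * d y[j] / d W[i, k] = x[i] * v[k], i.e. an outer product;
# the result has the same shape as the input parameter W
vJ = np.outer(x, v)

print(vJ)  # [[ 0.  0.]
           #  [ 2.  2.]]
```

This matches the ``array([[ 0.,  0.], [ 2.,  2.]])`` returned by the Theano example above.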
Hessian times a Vector
======================

If you need to compute the *Hessian times a vector*, you can make use of the
above-defined operators to do it more efficiently than actually computing
the exact Hessian and then performing the product. Due to the symmetry of the
Hessian matrix, you have two options that will
give you the same result, though these options might exhibit differing performance.
Hence, we suggest profiling the methods before using either one of the two:

>>> x = T.dvector('x')
>>> v = T.dvector('v')
>>> y = T.sum(x ** 2)
>>> gy = T.grad(y, x)
>>> vH = T.grad(T.sum(gy * v), x)
>>> f = theano.function([x, v], vH)
>>> f([4, 4], [2, 2])
array([ 4.,  4.])
...@@ -257,10 +262,26 @@ or, making use of the *R-operator*:

>>> x = T.dvector('x')
>>> v = T.dvector('v')
>>> y = T.sum(x ** 2)
>>> gy = T.grad(y, x)
>>> Hv = T.Rop(gy, x, v)
>>> f = theano.function([x, v], Hv)
>>> f([4, 4], [2, 2])
array([ 4.,  4.])
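Both variants compute the directional derivative of the gradient along *v*. A numerical sketch in plain NumPy (not Theano code) makes this concrete: for *y = sum(x\*\*2)* the gradient is *2x*, and perturbing it along *v* recovers *Hv = 2v*:

```python
import numpy as np

def grad_y(xv):
    # gradient of y = sum(x ** 2)
    return 2.0 * xv

x = np.array([4.0, 4.0])
v = np.array([2.0, 2.0])
eps = 1e-6

# R-operator applied to the gradient == directional derivative of grad_y along v
Hv = (grad_y(x + eps * v) - grad_y(x - eps * v)) / (2.0 * eps)

print(Hv)  # close to [ 4.  4.]
```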
Final Pointers
==============

* The ``grad`` function works symbolically: it receives and returns Theano variables.
* ``grad`` can be compared to a macro, since it can be applied repeatedly.
* Only scalar costs can be directly handled by ``grad``; arrays are handled through repeated applications.
* Built-in functions allow computing *vector times Jacobian* and *vector times Hessian* efficiently.
* Work is in progress on the optimizations required to compute efficiently the full
  Jacobian and the Hessian matrix, as well as the *Jacobian times vector*.
...@@ -5,20 +5,21 @@

Tutorial
========

Let us start an interactive session (e.g. with ``python`` or ``ipython``) and import Theano.

>>> from theano import *

Several of the symbols you will need to use are in the ``tensor`` subpackage
of Theano. Let us import that subpackage under a handy name like
``T`` (the tutorials will frequently use this convention).

>>> import theano.tensor as T

If that succeeded, you are ready for the tutorial; otherwise, check your
installation (see :ref:`install`).

Throughout the tutorial, bear in mind that there is a :ref:`glossary` as well
as *index* and *modules* links in the upper-right corner of each page to help
you out.
.. toctree::

...@@ -27,18 +28,18 @@ you out.

    numpy
    adding
    examples
    symbolic_graphs
    printing_drawing
    gradients
    modes
    loading_and_saving
    conditions
    loop
    sparse
    using_gpu
    gpu_data_convert
    aliasing
    shape_info
    debug_faq
    extending_theano
    faq
...@@ -6,8 +6,8 @@ Loading and Saving

==================

Python's standard way of saving class instances and reloading them
is the pickle_ mechanism. Many Theano objects can be *serialized* (and
*deserialized*) by ``pickle``; however, a limitation of ``pickle`` is that
it does not save the code or data of a class along with the instance of
the class being serialized. As a result, reloading objects created by a
previous version of a class can be really problematic.

...@@ -24,7 +24,7 @@ as you would in the course of any other Python program.

.. _pickle: http://docs.python.org/library/pickle.html

The Basics of Pickling
======================

The two modules ``pickle`` and ``cPickle`` have the same functionalities, but
...@@ -45,7 +45,7 @@ You can serialize (or *save*, or *pickle*) objects to a file with

.. note::

    If you want your saved object to be stored efficiently, don't forget
    to use ``cPickle.HIGHEST_PROTOCOL``. The resulting file can be
    dozens of times smaller than with the default protocol.
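As a minimal sketch of choosing the protocol (shown with the plain ``pickle`` module; under Python 2 the same calls work with ``cPickle``):

```python
import pickle

# some plain data standing in for a model's parameters
obj = {"W": [[1.0, 0.0], [0.0, 1.0]], "b": [0.0, 0.0]}

default = pickle.dumps(obj)                                    # default protocol
compact = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)  # binary protocol

# both byte strings deserialize to an equal object
assert pickle.loads(default) == obj
assert pickle.loads(compact) == obj
```

The size advantage of ``HIGHEST_PROTOCOL`` becomes significant for large NumPy arrays such as weight matrices.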
.. note::

...@@ -81,7 +81,7 @@ For more details about pickle's usage, see

`Python documentation <http://docs.python.org/library/pickle.html#usage>`_.
Short-Term Serialization
========================

If you are confident that the class instance you are serializing will be

...@@ -114,7 +114,7 @@ For instance, you can define functions along the lines of:

        self.training_set = cPickle.load(file(self.training_set_file, 'rb'))
Long-Term Serialization
=======================

If the implementation of the class you want to save is quite unstable, for

...@@ -126,7 +126,7 @@ maybe defining the attributes you want to save, rather than the ones you

don't.

For instance, if the only parameters you want to save are a weight
matrix *W* and a bias *b*, you can define:

.. code-block:: python

...@@ -138,8 +138,8 @@ matrix ``W`` and a bias ``b``, you can define:

        self.W = W
        self.b = b
If at some point in time *W* is renamed to *weights* and *b* to
*bias*, the older pickled files will still be usable, if you update these
functions to reflect the change in name:

.. code-block:: python

...@@ -152,6 +152,6 @@ functions to reflect the change in name:

        self.weights = W
        self.bias = b
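Putting the two snippets together, here is a self-contained sketch (a hypothetical ``Model`` class, for illustration) showing that the stored state keeps the old ``(W, b)`` layout while the live attributes use the new names, so the rename stays invisible to pickling:

```python
import pickle

class Model(object):
    def __init__(self, W, b):
        self.weights = W   # attribute renamed from W
        self.bias = b      # attribute renamed from b

    def __getstate__(self):
        # keep pickling the old (W, b) layout so files written
        # before the rename remain loadable
        return (self.weights, self.bias)

    def __setstate__(self, state):
        W, b = state
        self.weights = W
        self.bias = b

m = Model([[1.0, 2.0]], [0.5])
m2 = pickle.loads(pickle.dumps(m, protocol=pickle.HIGHEST_PROTOCOL))
assert m2.weights == [[1.0, 2.0]] and m2.bias == [0.5]
```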
For more information on advanced use of ``pickle`` and its internals, see Python's
pickle_ documentation.

...@@ -4,4 +4,94 @@
Loop
====

Scan
====

- A general form of *recurrence*, which can be used for looping.
- *Reduction* and *map* (loop over the leading dimensions) are special cases of ``scan``.
- You ``scan`` a function along some input sequence, producing an output at each time-step.
- The function can see the *previous K time-steps* of your function.
- ``sum()`` could be computed by scanning the *z + x(i)* function over a list, given an initial state of *z = 0*.
- Often a *for* loop can be expressed as a ``scan()`` operation, and ``scan`` is the closest that Theano comes to looping.
- Advantages of using ``scan`` over *for* loops:

  - The number of iterations can be part of the symbolic graph.
  - It minimizes GPU transfers (if a GPU is involved).
  - It computes gradients through sequential steps.
  - It is slightly faster than a *for* loop in Python with a compiled Theano function.
  - It can lower the overall memory usage by detecting the actual amount of memory needed.

The full documentation can be found in the library: :ref:`Scan <lib_scan>`.
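As a mental model only (plain Python, none of Theano's graph machinery), the recurrence that ``scan`` expresses resembles an accumulating loop over a sequence:

```python
def scan_like(fn, sequence, outputs_info):
    # sketch of scan's simplest recurrence: each step sees the
    # previous output and the current element of the sequence
    out = outputs_info
    results = []
    for elem in sequence:
        out = fn(out, elem)
        results.append(out)
    return results

# sum() as a scan: fold z + x(i) over the list, starting from z = 0
partial_sums = scan_like(lambda z, x_i: z + x_i, [1, 2, 3, 4], 0)
print(partial_sums)  # [1, 3, 6, 10]
```

Unlike this Python sketch, the real ``scan`` builds a symbolic graph, so the loop can run on the GPU and be differentiated through.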
**Scan Example: Computing pow(A, k)**

.. code-block:: python

    import theano
    import theano.tensor as T

    theano.config.warn.subtensor_merge_bug = False

    k = T.iscalar("k")
    A = T.vector("A")

    def inner_fct(prior_result, A):
        return prior_result * A

    # Symbolic description of the result
    result, updates = theano.scan(fn=inner_fct,
                                  outputs_info=T.ones_like(A),
                                  non_sequences=A, n_steps=k)

    # Scan has provided us with A ** 1 through A ** k.  Keep only the last
    # value. Scan notices this and does not waste memory saving them.
    final_result = result[-1]

    power = theano.function(inputs=[A, k], outputs=final_result,
                            updates=updates)

    print power(range(10), 2)
    # [  0.   1.   4.   9.  16.  25.  36.  49.  64.  81.]
**Scan Example: Calculating a Polynomial**

.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T

    theano.config.warn.subtensor_merge_bug = False

    coefficients = theano.tensor.vector("coefficients")
    x = T.scalar("x")
    max_coefficients_supported = 10000

    # Generate the components of the polynomial
    full_range = theano.tensor.arange(max_coefficients_supported)
    components, updates = theano.scan(fn=lambda coeff, power, free_var:
                                      coeff * (free_var ** power),
                                      outputs_info=None,
                                      sequences=[coefficients, full_range],
                                      non_sequences=x)
    polynomial = components.sum()
    calculate_polynomial = theano.function(inputs=[coefficients, x],
                                           outputs=polynomial)

    test_coeff = numpy.asarray([1, 0, 2], dtype=numpy.float32)
    print calculate_polynomial(test_coeff, 3)
    # 19.0
-------------------------------------------

**Exercise**

Run both examples.

Modify and execute the polynomial example to have the reduction done by ``scan``.

:download:`Solution<loop_solution_1.py>`
#!/usr/bin/env python
# Theano tutorial
# Solution to Exercise in section 'Loop'
import numpy
import theano
import theano.tensor as tt
# 1. First example
theano.config.warn.subtensor_merge_bug = False
k = tt.iscalar("k")
A = tt.vector("A")
def inner_fct(prior_result, A):
return prior_result * A
# Symbolic description of the result
result, updates = theano.scan(fn=inner_fct,
outputs_info=tt.ones_like(A),
non_sequences=A, n_steps=k)
# Scan has provided us with A ** 1 through A ** k. Keep only the last
# value. Scan notices this and does not waste memory saving them.
final_result = result[-1]
power = theano.function(inputs=[A, k], outputs=final_result,
updates=updates)
print power(range(10), 2)
# [ 0. 1. 4. 9. 16. 25. 36. 49. 64. 81.]
# 2. Second example
coefficients = tt.vector("coefficients")
x = tt.scalar("x")
max_coefficients_supported = 10000
# Generate the components of the polynomial
full_range = tt.arange(max_coefficients_supported)
components, updates = theano.scan(fn=lambda coeff, power, free_var:
coeff * (free_var ** power),
sequences=[coefficients, full_range],
outputs_info=None,
non_sequences=x)
polynomial = components.sum()
calculate_polynomial1 = theano.function(inputs=[coefficients, x],
outputs=polynomial)
test_coeff = numpy.asarray([1, 0, 2], dtype=numpy.float32)
print calculate_polynomial1(test_coeff, 3)
# 19.0
# 3. Reduction performed inside scan
theano.config.warn.subtensor_merge_bug = False
coefficients = tt.vector("coefficients")
x = tt.scalar("x")
max_coefficients_supported = 10000
# Generate the components of the polynomial
full_range = tt.arange(max_coefficients_supported)
outputs_info = tt.as_tensor_variable(numpy.asarray(0, 'float64'))
components, updates = theano.scan(fn=lambda coeff, power, prior_value, free_var:
prior_value + (coeff * (free_var ** power)),
sequences=[coefficients, full_range],
outputs_info=outputs_info,
non_sequences=x)
polynomial = components[-1]
calculate_polynomial = theano.function(inputs=[coefficients, x],
outputs=polynomial, updates=updates)
test_coeff = numpy.asarray([1, 0, 2], dtype=numpy.float32)
print calculate_polynomial(test_coeff, 3)
# 19.0
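The third solution above threads a running sum through the loop via ``outputs_info``; the same accumulation pattern looks like this in plain Python (illustrative only):

```python
def polynomial_reduce(coefficients, x):
    # prior_value plays the role of the carried output (outputs_info);
    # each step adds one term to it, and only the last value is kept,
    # mirroring components[-1] in the scan-based solution.
    prior_value = 0.0
    for power, coeff in enumerate(coefficients):
        prior_value = prior_value + coeff * (x ** power)
    return prior_value
```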
.. _using_modes:

==========================================
Configuration Settings and Compiling Modes
==========================================
Configuration
=============
The ``config`` module contains several *attributes* that modify Theano's behavior. Many of these
attributes are examined during the import of the ``theano`` module and several are assumed to be
read-only.
*As a rule, the attributes in the* ``config`` *module should not be modified inside the user code.*
Theano's code comes with default values for these attributes, but you can
override them from your ``.theanorc`` file, and override those values in turn by
the :envvar:`THEANO_FLAGS` environment variable.
The order of precedence is:
1. an assignment to theano.config.<property>
2. an assignment in :envvar:`THEANO_FLAGS`
3. an assignment in the .theanorc file (or the file indicated in :envvar:`THEANORC`)
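The precedence order can be mimicked with a small resolver (a hypothetical sketch for illustration; ``effective_value`` is not part of Theano):

```python
def effective_value(name, runtime=None, env_flags=None, rcfile=None,
                    default=None):
    # The highest-priority source that defines the attribute wins:
    # 1. a theano.config.<property> assignment      (runtime)
    # 2. an assignment in THEANO_FLAGS              (env_flags)
    # 3. an assignment in the .theanorc file        (rcfile)
    # falling back to Theano's built-in default.
    for source in (runtime, env_flags, rcfile):
        if source and name in source:
            return source[name]
    return default
```

For example, a ``floatX`` set in ``THEANO_FLAGS`` overrides the one in ``.theanorc``, but is itself overridden by an assignment in user code.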
You can display the current/effective configuration at any time by printing
theano.config. For example, to see a list of all active configuration
variables, type this from the command-line:
.. code-block:: bash
python -c 'import theano; print theano.config' | less
For more detail, see :ref:`Configuration <libdoc_config>` in the library.
-------------------------------------------
**Exercise**
Consider the logistic regression:
.. code-block:: python
import numpy
import theano
import theano.tensor as T
rng = numpy.random
N = 400
feats = 784
D = (rng.randn(N, feats).astype(theano.config.floatX),
rng.randint(size=N,low=0, high=2).astype(theano.config.floatX))
training_steps = 10000
# Declare Theano symbolic variables
x = T.matrix("x")
y = T.vector("y")
w = theano.shared(rng.randn(feats).astype(theano.config.floatX), name="w")
b = theano.shared(numpy.asarray(0., dtype=theano.config.floatX), name="b")
x.tag.test_value = D[0]
y.tag.test_value = D[1]
#print "Initial model:"
#print w.get_value(), b.get_value()
# Construct Theano expression graph
p_1 = 1 / (1 + T.exp(-T.dot(x, w)-b)) # Probability of having a one
prediction = p_1 > 0.5 # The prediction that is done: 0 or 1
xent = -y*T.log(p_1) - (1-y)*T.log(1-p_1) # Cross-entropy
cost = xent.mean() + 0.01*(w**2).sum() # The cost to optimize
gw,gb = T.grad(cost, [w,b])
# Compile expressions to functions
train = theano.function(
inputs=[x,y],
outputs=[prediction, xent],
updates={w:w-0.01*gw, b:b-0.01*gb},
name = "train")
predict = theano.function(inputs=[x], outputs=prediction,
name = "predict")
if any([x.op.__class__.__name__ in ['Gemv', 'CGemv', 'Gemm', 'CGemm'] for x in
train.maker.fgraph.toposort()]):
print 'Used the cpu'
elif any([x.op.__class__.__name__ in ['GpuGemm', 'GpuGemv'] for x in
train.maker.fgraph.toposort()]):
print 'Used the gpu'
else:
print 'ERROR, not able to tell if theano used the cpu or the gpu'
print train.maker.fgraph.toposort()
for i in range(training_steps):
pred, err = train(D[0], D[1])
#print "Final model:"
#print w.get_value(), b.get_value()
print "target values for D"
print D[1]
print "prediction on D"
print predict(D[0])
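For comparison, the numeric recipe that this graph implements can be written in plain NumPy with hand-derived gradients (a sketch on a smaller random problem, not the Theano code itself; sizes and step count are reduced so it runs quickly):

```python
import numpy

rng = numpy.random.RandomState(0)
N, feats, steps, lr = 50, 100, 200, 0.05
X = rng.randn(N, feats)
y = rng.randint(0, 2, N).astype('float64')
w = 0.01 * rng.randn(feats)
b = 0.0

def cost(w, b):
    p_1 = 1.0 / (1.0 + numpy.exp(-(X.dot(w) + b)))
    p_1 = numpy.clip(p_1, 1e-9, 1 - 1e-9)        # guard the logs
    xent = -y * numpy.log(p_1) - (1 - y) * numpy.log(1 - p_1)
    return xent.mean() + 0.01 * (w ** 2).sum()

initial_cost = cost(w, b)
for _ in range(steps):
    p_1 = 1.0 / (1.0 + numpy.exp(-(X.dot(w) + b)))
    gw = X.T.dot(p_1 - y) / N + 0.02 * w         # grad of mean xent + L2
    gb = (p_1 - y).mean()
    w -= lr * gw                                  # the `updates` dict above
    b -= lr * gb
final_cost = cost(w, b)
```

In the Theano version, ``T.grad`` derives ``gw`` and ``gb`` automatically and the ``updates`` dictionary applies the same descent step.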
Modify and execute this example to run on CPU (the default) with floatX=float32 and
time the execution using the command line ``time python file.py``. Save your code
as it will be useful later on.
.. Note::
* Apply the Theano flag ``floatX=float32`` (through ``theano.config.floatX``) in your code.
* Cast inputs before storing them into a shared variable.
* Circumvent the automatic cast of *int32* with *float32* to *float64*:
* Insert manual cast in your code or use *[u]int{8,16}*.
* Insert manual cast around the mean operator (this involves division by length, which is an *int64*).
* Notice that a new casting mechanism is being developed.
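The upcasting pitfall mentioned above can be seen directly in NumPy: combining *int32* with *float32* promotes to *float64*, so an explicit cast is needed to stay in single precision (this only illustrates the promotion rule, not Theano's casting machinery):

```python
import numpy

a = numpy.arange(3, dtype='int32')
b = numpy.ones(3, dtype='float32')

# int32 combined with float32 silently upcasts to float64
upcast_dtype = (a * b).dtype

# an explicit cast keeps the computation in single precision
kept_dtype = (a.astype('float32') * b).dtype
```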
:download:`Solution<modes_solution_1.py>`
-------------------------------------------
Mode
====

Every time :func:`theano.function <function.function>` is called,
the symbolic relationships between the input and output Theano *variables*
are optimized and compiled. The way this compilation occurs
is controlled by the value of the ``mode`` parameter.
Theano defines the following modes by name:
- ``'FAST_COMPILE'``: Apply just a few graph optimizations and only use Python implementations.
- ``'FAST_RUN'``: Apply all optimizations, and use C implementations where possible.
- ``'DebugMode'``: Verify the correctness of all optimizations, and compare C and Python
  implementations. This mode can take much longer than the other modes, but can identify
  several kinds of problems.
- ``'ProfileMode'``: Apply the same optimizations as ``FAST_RUN``, but print some profiling information.

The default mode is typically ``FAST_RUN``, but it can be controlled via
the configuration variable :attr:`config.mode`,
which can be overridden by passing the keyword argument to
================= =============================================================== ===============================================================================
short name        Full constructor                                                What does it do?
================= =============================================================== ===============================================================================
``FAST_COMPILE``  ``compile.mode.Mode(linker='py', optimizer='fast_compile')``    Python implementations only, quick and cheap graph transformations
``FAST_RUN``      ``compile.mode.Mode(linker='cvm', optimizer='fast_run')``       C implementations where available, all available graph transformations.
``DebugMode``     ``compile.debugmode.DebugMode()``                               Both implementations where available, all available graph transformations.
``ProfileMode``   ``compile.profilemode.ProfileMode()``                           C implementations where available, all available graph transformations, print profile information.
================= =============================================================== ===============================================================================
Linkers
=======

A mode is composed of 2 things: an optimizer and a linker. Some modes,
like ``ProfileMode`` and ``DebugMode``, add logic around the optimizer and
linker. ``ProfileMode`` and ``DebugMode`` use their own linker.

You can select which linker to use with the Theano flag :attr:`config.linker`.
Here is a table to compare the different linkers.
============= ========= ================= ========= ===
linker        gc [#gc]_ Raise error by op Overhead  Definition
============= ========= ================= ========= ===
cvm           yes       yes               "++"      As c|py, but the runtime algo to execute the code is in C
cvm_nogc      no        yes               "+"       As cvm, but without gc
c|py [#cpy1]_ yes       yes               "+++"     Try C code. If none exists for an op, use Python
c|py_nogc     no        yes               "++"      As c|py, but without gc
c             no        yes               "+"       Use only C code (if none available for an op, raise an error)
py            yes       yes               "+++"     Use only Python code
c&py [#cpy2]_ no        yes               "+++++"   Use C and Python code
ProfileMode   no        no                "++++"    Compute some extra profiling info
DebugMode     no        yes               VERY HIGH Make many checks on what Theano computes
============= ========= ================= ========= ===
.. [#gc] Garbage collection of intermediate results during computation.
   Otherwise, the memory used by the ops is kept between
   Theano function calls, in order not to
   reallocate memory, and lower the overhead (make it faster...).

.. [#cpy1] Default

.. [#cpy2] Deprecated
For more detail, see :ref:`Mode<libdoc_compile_mode>` in the library.
.. _using_debugmode:

Using DebugMode
===============
While normally you should use the ``FAST_RUN`` or ``FAST_COMPILE`` mode,
it is useful at first (especially when you are defining new kinds of
expressions or new optimizations) to run your code using the DebugMode
(available via ``mode='DebugMode'``). The DebugMode is designed to
run several self-checks and assertions that can help diagnose
possible programming errors leading to incorrect output. Note that
``DebugMode`` is much slower than ``FAST_RUN`` or ``FAST_COMPILE``, so
use it only during development (not when you launch 1000 processes on a
cluster!).
DebugMode is used as follows:

.. code-block:: python

    x = T.dvector('x')

    f = theano.function([x], 10 * x, mode='DebugMode')

    f([5])
    f([0])
If any problem is detected, DebugMode will raise an exception according to
what went wrong, either at call time (*f(5)*) or compile time (
``f = theano.function(x, 10 * x, mode='DebugMode')``). These exceptions
should *not* be ignored; talk to your local Theano guru or email the
users list if you cannot make the exception go away.

Some kinds of errors can only be detected for certain input value combinations.
In the example above, there is no way to guarantee that a future call to,
say, *f(-1)*, won't cause a problem. DebugMode is not a silver bullet.
.. TODO: repair the following link
If you instantiate DebugMode using the constructor (see :class:`DebugMode`)
rather than the keyword ``DebugMode``, you can configure its behaviour via
constructor arguments. The keyword version of DebugMode (which you get by
using ``mode='DebugMode'``) is quite strict.
For more detail, see :ref:`DebugMode<debugmode>` in the library.
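The core idea of DebugMode, running more than one implementation and comparing the results, can be sketched as a wrapper (a hypothetical helper for illustration, not Theano's API):

```python
def cross_checked(reference_impl, fast_impl, tol=1e-9):
    # Evaluate both implementations on every call, the way DebugMode
    # compares the Python and C versions of an op, and fail loudly on
    # any disagreement instead of silently returning a bad value.
    def wrapper(x):
        expected = reference_impl(x)
        actual = fast_impl(x)
        if abs(expected - actual) > tol:
            raise ValueError("implementations disagree: %r vs %r"
                             % (expected, actual))
        return actual
    return wrapper

f = cross_checked(lambda x: 10.0 * x, lambda x: x * 10.0)
# f(5.0) returns 50.0; a buggy fast_impl would raise instead
```

As with DebugMode, the cost is running everything at least twice, which is why such checking belongs in development, not production.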
.. _using_profilemode:

ProfileMode
===========
Besides checking for errors, another important task is to profile your
code. For this Theano uses a special mode called ProfileMode which has
to be passed as an argument to :func:`theano.function <function.function>`.
Using the ProfileMode is a three-step process.
.. note::

    To switch the default mode to ProfileMode, set the Theano flag
    :attr:`config.mode` to ProfileMode. In that case, when the Python
    process exits, it will automatically print the profiling
    information on the standard output.

    The memory profile of the output of each ``apply`` node can be enabled with the
    Theano flag :attr:`config.ProfileMode.profile_memory`.
For more detail, see :ref:`ProfileMode <profilemode>` in the library.
Creating a ProfileMode Instance
-------------------------------

First create a ProfileMode instance:
>>> from theano import ProfileMode
>>> profmode = theano.ProfileMode(optimizer='fast_run', linker=theano.gof.OpWiseCLinker())
implementation only, should use the gof.PerformLinker (or "py" for
short). On the other hand, a user wanting to profile his graph using C
implementations wherever possible should use the ``gof.OpWiseCLinker``
(or "c|py"). For testing the speed of your code we would recommend
using the ``fast_run`` optimizer and the ``gof.OpWiseCLinker`` linker.
Compiling your Graph with ProfileMode
-------------------------------------

Once the ProfileMode instance is created, simply compile your graph as you
would normally, by specifying the mode parameter.
>>> minst = m.make(mode=profmode)
Retrieving Timing Information
-----------------------------

Once your graph is compiled, simply run the program or operation you wish to
profile, then call ``profmode.print_summary()``. This will provide you with
the desired timing information, indicating where your graph is spending most
of its time. This is best shown through an example. Let's use our logistic
regression example.
Compiling the module with ``ProfileMode`` and calling ``profmode.print_summary()``
generates the following output:

.. code-block:: python
"""
This output has two components. In the first section, called the
*Apply-wise summary*, timing information is provided for the worst
offending ``Apply`` nodes. This corresponds to individual op applications
within your graph which took longest to execute (so if you use
``dot`` twice, you will see two entries there). In the second portion,
the *Op-wise summary*, the execution times of all ``Apply`` nodes executing
the same op are grouped together and the total execution time per op
is shown (so if you use ``dot`` twice, you will see only one entry
there corresponding to the sum of the time spent in each of them).

Finally, notice that ``ProfileMode`` also shows which ops were running a C
implementation.
For more detail, see :ref:`ProfileMode<libdoc_compile_mode>` in the library.
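The difference between the two summaries can be illustrated with plain Python over made-up timing records (the op names and numbers are hypothetical, not real profiler output):

```python
from collections import defaultdict

# (op name, seconds) for each individual Apply node execution
records = [("dot", 0.80), ("dot", 0.65), ("exp", 0.10), ("add", 0.05)]

# Apply-wise view: each application listed on its own, worst first
apply_wise = sorted(records, key=lambda r: r[1], reverse=True)

# Op-wise view: all applications of the same op summed together
op_wise = defaultdict(float)
for name, seconds in records:
    op_wise[name] += seconds

# "dot" appears twice Apply-wise, but once (0.80 + 0.65 s) Op-wise
```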
#!/usr/bin/env python
# Theano tutorial
# Solution to Exercise in section 'Configuration Settings and Compiling Modes'
import numpy
import theano
import theano.tensor as tt
theano.config.floatX = 'float32'
rng = numpy.random
N = 400
feats = 784
D = (rng.randn(N, feats).astype(theano.config.floatX),
rng.randint(size=N, low=0, high=2).astype(theano.config.floatX))
training_steps = 10000
# Declare Theano symbolic variables
x = tt.matrix("x")
y = tt.vector("y")
w = theano.shared(rng.randn(feats).astype(theano.config.floatX), name="w")
b = theano.shared(numpy.asarray(0., dtype=theano.config.floatX), name="b")
x.tag.test_value = D[0]
y.tag.test_value = D[1]
#print "Initial model:"
#print w.get_value(), b.get_value()
# Construct Theano expression graph
p_1 = 1 / (1 + tt.exp(-tt.dot(x, w) - b)) # Probability of having a one
prediction = p_1 > 0.5 # The prediction that is done: 0 or 1
xent = -y * tt.log(p_1) - (1 - y) * tt.log(1 - p_1) # Cross-entropy
cost = tt.cast(xent.mean(), 'float32') + \
0.01 * (w ** 2).sum() # The cost to optimize
gw, gb = tt.grad(cost, [w, b])
# Compile expressions to functions
train = theano.function(
inputs=[x, y],
outputs=[prediction, xent],
updates={w: w - 0.01 * gw, b: b - 0.01 * gb},
name="train")
predict = theano.function(inputs=[x], outputs=prediction,
name="predict")
if any([x.op.__class__.__name__ in ['Gemv', 'CGemv', 'Gemm', 'CGemm'] for x in
train.maker.fgraph.toposort()]):
print 'Used the cpu'
elif any([x.op.__class__.__name__ in ['GpuGemm', 'GpuGemv'] for x in
train.maker.fgraph.toposort()]):
print 'Used the gpu'
else:
print 'ERROR, not able to tell if theano used the cpu or the gpu'
print train.maker.fgraph.toposort()
for i in range(training_steps):
pred, err = train(D[0], D[1])
#print "Final model:"
#print w.get_value(), b.get_value()
print "target values for D"
print D[1]
print "prediction on D"
print predict(D[0])
where each example has dimension 5. If this were the input of a
neural network, then the weights from the input to the first hidden
layer would represent a matrix of size (5, #hid).

Consider this array:
>>> numpy.asarray([[1., 2], [3, 4], [5, 6]])
array([[ 1.,  2.],
       [ 3.,  4.],
       [ 5.,  6.]])

This is a 3x2 matrix, i.e. there are 3 rows and 2 columns.
To access the entry in the 3rd row (row #2) and the 1st column (column #0):

>>> numpy.asarray([[1., 2], [3, 4], [5, 6]])[2, 0]
5.0
array([2., 4., 6.])
The smaller array ``b`` (actually a scalar here, which works like a 0-d array) in this case is *broadcasted* to the same size
as ``a`` during the multiplication. This trick is often useful in
simplifying how expressions are written. More detail about *broadcasting*
can be found in the `numpy user guide <http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html>`__.
.. _tutorial_printing_drawing:
==============================
Printing/Drawing Theano graphs
==============================
.. TODO: repair the defective links in the next paragraph
Theano provides two functions (:func:`theano.pp` and
:func:`theano.printing.debugprint`) to print a graph to the terminal before or after
compilation. These two functions print expression graphs in different ways:
:func:`pp` is more compact and math-like, :func:`debugprint` is more verbose.
Theano also provides :func:`pydotprint` that creates a *png* image of the function.
You can read about them in :ref:`libdoc_printing`.
Consider again the logistic regression example, but notice the additional printing instructions.
The following output depicts the pre- and post-compilation graphs.
.. code-block:: python
import numpy
import theano
import theano.tensor as T
rng = numpy.random
N = 400
feats = 784
D = (rng.randn(N, feats).astype(theano.config.floatX),
rng.randint(size=N,low=0, high=2).astype(theano.config.floatX))
training_steps = 10000
# Declare Theano symbolic variables
x = T.matrix("x")
y = T.vector("y")
w = theano.shared(rng.randn(feats).astype(theano.config.floatX), name="w")
b = theano.shared(numpy.asarray(0., dtype=theano.config.floatX), name="b")
x.tag.test_value = D[0]
y.tag.test_value = D[1]
#print "Initial model:"
#print w.get_value(), b.get_value()
# Construct Theano expression graph
p_1 = 1 / (1 + T.exp(-T.dot(x, w) - b)) # Probability of having a one
prediction = p_1 > 0.5 # The prediction that is done: 0 or 1
xent = -y * T.log(p_1) - (1 - y) * T.log(1 - p_1) # Cross-entropy
cost = xent.mean() + 0.01 * (w ** 2).sum() # The cost to optimize
gw,gb = T.grad(cost, [w, b])
# Compile expressions to functions
train = theano.function(
inputs=[x, y],
outputs=[prediction, xent],
updates={w: w - 0.01 * gw, b: b - 0.01 * gb},
name="train")
predict = theano.function(inputs=[x], outputs=prediction,
name="predict")
if any( [x.op.__class__.__name__=='Gemv' for x in
train.maker.fgraph.toposort()]):
print 'Used the cpu'
elif any( [x.op.__class__.__name__=='GpuGemm' for x in
train.maker.fgraph.toposort()]):
print 'Used the gpu'
else:
print 'ERROR, not able to tell if theano used the cpu or the gpu'
print train.maker.fgraph.toposort()
for i in range(training_steps):
pred, err = train(D[0], D[1])
#print "Final model:"
#print w.get_value(), b.get_value()
print "target values for D"
print D[1]
print "prediction on D"
print predict(D[0])
# Print the picture graphs
# after compilation
theano.printing.pydotprint(predict,
outfile="pics/logreg_pydotprint_predic.png",
var_with_name_simple=True)
# before compilation
theano.printing.pydotprint_variables(prediction,
outfile="pics/logreg_pydotprint_prediction.png",
var_with_name_simple=True)
theano.printing.pydotprint(train,
outfile="pics/logreg_pydotprint_train.png",
var_with_name_simple=True)
Pretty Printing
===============
``theano.printing.pprint(variable)``
>>> theano.printing.pprint(prediction) # (pre-compilation)
gt((TensorConstant{1} / (TensorConstant{1} + exp(((-(x \\dot w)) - b)))),TensorConstant{0.5})
Debug Printing
==============
``theano.printing.debugprint({fct, variable, list of variables})``
>>> theano.printing.debugprint(prediction) # (pre-compilation)
Elemwise{gt,no_inplace} [@181772236] ''
|Elemwise{true_div,no_inplace} [@181746668] ''
| |InplaceDimShuffle{x} [@181746412] ''
| | |TensorConstant{1} [@181745836]
| |Elemwise{add,no_inplace} [@181745644] ''
| | |InplaceDimShuffle{x} [@181745420] ''
| | | |TensorConstant{1} [@181744844]
| | |Elemwise{exp,no_inplace} [@181744652] ''
| | | |Elemwise{sub,no_inplace} [@181744012] ''
| | | | |Elemwise{neg,no_inplace} [@181730764] ''
| | | | | |dot [@181729676] ''
| | | | | | |x [@181563948]
| | | | | | |w [@181729964]
| | | | |InplaceDimShuffle{x} [@181743788] ''
| | | | | |b [@181730156]
|InplaceDimShuffle{x} [@181771788] ''
| |TensorConstant{0.5} [@181771148]
>>> theano.printing.debugprint(predict) # (post-compilation)
Elemwise{Composite{neg,{sub,{{scalar_sigmoid,GT},neg}}}} [@183160204] '' 2
|dot [@183018796] '' 1
| |x [@183000780]
| |w [@183000812]
|InplaceDimShuffle{x} [@183133580] '' 0
| |b [@183000876]
|TensorConstant{[ 0.5]} [@183084108]
Picture Printing
================
>>> theano.printing.pydotprint_variables(prediction) # (pre-compilation)
.. image:: ../hpcs2011_tutorial/pics/logreg_pydotprint_prediction.png
:width: 800 px
Notice that ``pydotprint()`` requires *Graphviz* and Python's ``pydot``.
>>> theano.printing.pydotprint(predict) # (post-compilation)
.. image:: ../hpcs2011_tutorial/pics/logreg_pydotprint_predic.png
:width: 800 px
>>> theano.printing.pydotprint(train) # This is a small train example!
.. image:: ../hpcs2011_tutorial/pics/logreg_pydotprint_train.png
:width: 1500 px
Python tutorial
***************

In this documentation, we suppose that the reader knows Python. Here is a small list of Python
tutorials/exercises if you need to learn it or only need a refresher:
* `Python Challenge <http://www.pythonchallenge.com/>`__
* `Dive into Python <http://diveintopython.net/>`__
.. _tutorial_general_remarks:
====================
Some General Remarks
====================
Theano offers quite a bit of flexibility, but has some limitations too.
How should you write your algorithm to make the most of what Theano can do?
Limitations
-----------
- While- or for-Loops within an expression graph are supported, but only via
the :func:`theano.scan` op (which puts restrictions on how the loop body can
interact with the rest of the graph).
- Neither ``goto`` nor recursion is supported or planned within expression graphs.
.. _shape_info:

==========================================
How Shape Information is Handled by Theano
==========================================
It is not possible to strictly enforce the shape of a Theano variable when
building a graph, since the particular value provided at run-time for a parameter of a
Theano function may determine the shape of the Theano variables in its graph.

Currently, information regarding shape is used in two ways in Theano:
- When the exact shape is known, we use it to generate faster c code for - To generate faster C code for the 2d convolution on the CPU and the GPU,
the 2d convolution on the cpu and gpu. when the exact output shape is known in advance.
- To remove computations in the graph when we only want to know the - To remove computations in the graph when we only want to know the
shape, but not the actual value of a variable. This is done with the shape, but not the actual value of a variable. This is done with the
`Op.infer_shape <http://deeplearning.net/software/theano/extending/cop.html#Op.infer_shape>`_ `Op.infer_shape <http://deeplearning.net/software/theano/extending/cop.html#Op.infer_shape>`_
method. method.
Example:

.. code-block:: python

    import theano

    x = theano.tensor.matrix('x')
    f = theano.function([x], (x ** 2).shape)
    theano.printing.debugprint(f)
    #MakeVector [@43860304] '' 2
    # |Shape_i{0} [@43424912] '' 1
@@ -32,15 +32,15 @@
    # |Shape_i{1} [@43797968] '' 0
    # | |x [@43423568]
The output of this compiled function does not contain any multiplication
or power. Theano has removed them to compute directly the shape of the
output.
Shape Inference Problem
=======================
Theano propagates information about shape in the graph. Sometimes this
can lead to errors. Consider this example:

.. code-block:: python

@@ -48,9 +48,9 @@
    import numpy
    import theano

    x = theano.tensor.matrix('x')
    y = theano.tensor.matrix('y')
    z = theano.tensor.join(0, x, y)
    xv = numpy.random.rand(5, 4)
    yv = numpy.random.rand(3, 3)
    f = theano.function([x, y], z.shape)
    theano.printing.debugprint(f)
@@ -83,61 +83,61 @@
    # |y [@44540304]
    f(xv, yv)
    # Raises a dimensions mismatch error.
As you can see, when asking only for the shape of some computation (``join`` in the
example), an inferred shape is computed directly, without executing
the computation itself (there is no ``join`` in the first output of debugprint).
This makes the computation of the shape faster, but it can also hide errors. In
this example, the computation of the shape of the output of ``join`` is done only
based on the first input Theano variable, which leads to an error.
This might happen with other ops such as ``elemwise`` and ``dot``, for example.
Indeed, to perform some optimizations (for speed or stability, for instance),
Theano assumes that the computation is correct and consistent
in the first place, as it does here.
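To see how trusting the first input can hide a mismatch, here is a toy pure-Python sketch (this is *not* Theano's actual ``infer_shape`` code; the helper name ``infer_join_shape`` is invented) of join-style shape inference that sums sizes along the join axis and trusts the first input's sizes everywhere else:

```python
# Toy sketch (not Theano's real infer_shape): infer the shape of a join
# by summing sizes along the join axis and trusting the FIRST input's
# sizes on every other axis -- consistency is assumed, never checked.
def infer_join_shape(axis, input_shapes):
    out = list(input_shapes[0])
    out[axis] = sum(shape[axis] for shape in input_shapes)
    return tuple(out)

# (5, 4) joined with (3, 3) along axis 0: inference happily answers (8, 4),
# although actually executing the join would fail because 4 != 3.
print(infer_join_shape(0, [(5, 4), (3, 3)]))
```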
You can detect those problems by running the code without this
optimization, using the Theano flag
``optimizer_excluding=local_shape_to_shape_i``. You can also obtain the
same effect by running in the modes ``FAST_COMPILE`` (it will not apply this
optimization, nor most other optimizations) or ``DebugMode`` (it will test
before and after all optimizations, which is much slower).
Specifying Exact Shape
======================

Currently, specifying a shape is not as easy and flexible as we wish and we plan some
upgrade. Here is the current state of what can be done:
- You can pass the shape info directly to the ``ConvOp`` created
  when calling ``conv2d``. You simply set the parameters ``image_shape``
  and ``filter_shape`` inside the call. They must be tuples of 4
  elements. For example:

  .. code-block:: python

      theano.tensor.nnet.conv2d(..., image_shape=(7, 3, 5, 5), filter_shape=(2, 3, 4, 4))
- You can use the ``SpecifyShape`` op to add shape information anywhere in the
  graph. This allows some optimizations to be performed. In the following example,
  this makes it possible to precompute the Theano function to a constant.

  .. code-block:: python

      import theano

      x = theano.tensor.matrix()
      x_specify_shape = theano.tensor.specify_shape(x, (2, 2))
      f = theano.function([x], (x_specify_shape ** 2).shape)
      theano.printing.debugprint(f)
      # [2 2] [@72791376]
Future Plans
============

The parameter "constant shape" will be added to ``theano.shared()``. This is probably
the most frequent occurrence with ``shared`` variables. It will make the code
simpler and will make it possible to check that the shape does not change when
updating the ``shared`` variable.
@@ -4,9 +4,6 @@

Sparse
======

In general, *sparse* matrices provide the same functionality as regular
matrices. The difference lies in the way the elements of *sparse* matrices are
represented and stored in memory. Only the non-zero elements of the latter are stored.
@@ -5,27 +5,31 @@

Graph Structures
================

Theano Graphs
=============

Debugging or profiling code written in Theano is not that simple if you
do not know what goes on under the hood. This chapter is meant to
introduce you to a required minimum of the inner workings of Theano.
For more detail see :ref:`extending`.

The first step in writing Theano code is to write down all mathematical
relations using symbolic placeholders (**variables**). When writing down
these expressions you use operations like ``+``, ``-``, ``**``,
``sum()``, ``tanh()``. All these are represented internally as **ops**.
An *op* represents a certain computation on some type of inputs
producing some type of output. You can see it as a *function definition*
in most programming languages.

Theano builds internally a graph structure composed of interconnected
**variable** nodes, **op** nodes and **apply** nodes. An
*apply* node represents the application of an *op* to some
*variables*. It is important to draw the difference between the
definition of a computation represented by an *op* and its application
to some actual data, which is represented by the *apply* node. For more
detail about these building blocks refer to :ref:`variable`, :ref:`op`,
:ref:`apply`. Here is an example of a graph:
**Code**

@@ -50,9 +54,9 @@

   WARNING: hyper-links and ref's seem to break the PDF build when placed
   into this figure caption.

Arrows in this figure represent references to the
Python objects pointed at. The blue
box is an :ref:`Apply` node. Red boxes are :ref:`Variable` nodes. Green
circles are :ref:`Ops <op>`. Purple boxes are :ref:`Types <type>`.
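As a rough illustration of that pointer structure, here is a minimal pure-Python sketch (these are stand-ins, *not* Theano's real classes) of how variable, op and apply nodes reference each other:

```python
# Minimal stand-ins (NOT Theano's classes) for the three node kinds:
class Op(object):
    def __init__(self, name):
        self.name = name

class Variable(object):
    def __init__(self, name, owner=None):
        self.name = name
        self.owner = owner          # the Apply node that produced it, if any

class Apply(object):
    def __init__(self, op, inputs):
        self.op = op
        self.inputs = inputs
        # the op "applied" to the inputs yields new output variables
        self.outputs = [Variable(op.name + '_out', owner=self)]

x = Variable('x')
two = Variable('2')
y = Apply(Op('mul'), [x, two]).outputs[0]
# walking back from the output recovers the op and the inputs
assert y.owner.op.name == 'mul' and y.owner.inputs[0] is x
```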
@@ -63,17 +67,17 @@

Take for example the following code:

.. code-block:: python

    x = T.dmatrix('x')
    y = x * 2.
If you enter ``type(y.owner)`` you get ``<class 'theano.gof.graph.Apply'>``,
which is the apply node that connects the op and the inputs to get this
output. You can now print the name of the op that is applied to get
*y*:

>>> y.owner.op.name
'Elemwise{mul,no_inplace}'

Hence, an elementwise multiplication is used to compute *y*. This
multiplication is done between the inputs:
>>> len(y.owner.inputs)

@@ -85,7 +89,7 @@
InplaceDimShuffle{x,x}.0
Note that the second input is not 2 as we would have expected. This is
because 2 was first :term:`broadcasted <broadcasting>` to a matrix of
the same shape as *x*. This is done by using the op ``DimShuffle``:
>>> type(y.owner.inputs[1])
<class 'theano.tensor.basic.TensorVariable'>
@@ -97,9 +101,9 @@
[2.0]
Starting from this graph structure it is easier to understand how
*automatic differentiation* proceeds and how the symbolic relations
can be *optimized* for performance or stability.
Automatic Differentiation
=========================

@@ -107,16 +111,19 @@
Having the graph structure, computing automatic differentiation is
simple. The only thing :func:`tensor.grad` has to do is to traverse the
graph from the outputs back towards the inputs through all *apply*
nodes (*apply* nodes are those that define which computations the
graph does). For each such *apply* node, its *op* defines
how to compute the *gradient* of the node's outputs with respect to its
inputs. Note that if an *op* does not provide this information,
it is assumed that the *gradient* is not defined.
Using the
`chain rule <http://en.wikipedia.org/wiki/Chain_rule>`_,
these gradients can be composed in order to obtain the expression of the
*gradient* of the graph's output with respect to the graph's inputs.

A following section of this tutorial will examine the topic of :ref:`differentiation <tutcomputinggrads>`
in greater detail.
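The chain-rule composition can be illustrated with a small sketch (a generic reverse-mode walk over a chain of unary ops, *not* :func:`tensor.grad` itself; the function name is invented): each node reports the local derivative of its output with respect to its input, and walking from the output back to the input multiplies them together.

```python
# Toy reverse-mode walk (not tensor.grad): each "op" in a chain reports
# the local derivative of its output w.r.t. its input, and the chain
# rule composes them from the output back towards the input.
def backprop_chain(local_derivatives):
    grad = 1.0
    for d in reversed(local_derivatives):
        grad *= d
    return grad

# f(g(x)) with g(x) = 2x (so g' = 2) and f(u) = u**2 at u = g(3) = 6
# (so f' = 2u = 12): d f(g(x)) / dx = 12 * 2 = 24
print(backprop_chain([2.0, 12.0]))
```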
Optimizations
=============

@@ -124,7 +131,7 @@
When compiling a Theano function, what you give to the
:func:`theano.function <function.function>` is actually a graph
(starting from the output variables you can traverse the graph up to
the input variables). While this graph structure shows how to compute
the output from the input, it also offers the possibility to improve the
way this computation is carried out. The way optimizations work in
@@ -135,4 +142,27 @@
identical subgraphs and ensure that the same values are not computed
twice, or reformulate parts of the graph to a GPU-specific version.

For example, one (simple) optimization that Theano uses is to replace
the pattern :math:`\frac{xy}{y}` by :math:`x`.
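An optimization of this kind is just a pattern match and replace on the expression graph. Here is a toy sketch (an invented tuple-based expression encoding, *not* Theano's optimizer) of that particular rewrite:

```python
# Toy sketch of a rewrite rule (not Theano's optimizer): on expressions
# encoded as nested tuples, replace the pattern (x * y) / y by x.
def simplify_div_of_mul(expr):
    if (isinstance(expr, tuple) and expr[0] == 'div'
            and isinstance(expr[1], tuple) and expr[1][0] == 'mul'
            and expr[1][2] == expr[2]):
        return expr[1][1]       # (x * y) / y  ->  x
    return expr

print(simplify_div_of_mul(('div', ('mul', 'x', 'y'), 'y')))
```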
Further information regarding the optimization :ref:`process <optimization>`
and the specific :ref:`optimizations <optimizations>` that are applicable
is available in the library documentation and on the entrance page of the documentation, respectively.
**Example**
Symbolic programming involves a change of paradigm: it will become clearer
as we apply it. Consider the following example of optimization:
>>> import theano
>>> a = theano.tensor.vector("a") # declare symbolic variable
>>> b = a + a ** 10 # build symbolic expression
>>> f = theano.function([a], b) # compile function
>>> print f([0, 1, 2]) # prints `array([0,2,1026])`
====================================================== =====================================================
Unoptimized graph Optimized graph
====================================================== =====================================================
.. image:: ../hpcs2011_tutorial/pics/f_unoptimized.png .. image:: ../hpcs2011_tutorial/pics/f_optimized.png
====================================================== =====================================================
@@ -5,13 +5,16 @@

Using the GPU
=============

For an introductory discussion of *Graphical Processing Units* (GPU) and their use for
intensive parallel computation purposes, see `GPGPU <http://en.wikipedia.org/wiki/GPGPU>`_.

One of Theano's design goals is to specify computations at an
abstract level, so that the internal function compiler has a lot of flexibility
about how to carry out those computations. One of the ways we take advantage of
this flexibility is in carrying out calculations on an Nvidia graphics card when
the device present in the computer is CUDA-enabled.

Setting Up CUDA
---------------

If you have not done so already, you will need to install Nvidia's
@@ -41,6 +44,7 @@

file and run it.

.. code-block:: python

    rng = numpy.random.RandomState(22)
    x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
    f = function([], T.exp(x))
    print f.maker.fgraph.toposort()
    t0 = time.time()
    for i in xrange(iters):
        r = f()
@@ -52,38 +56,46 @@
    else:
        print 'Used the gpu'
The program just computes the ``exp()`` of a bunch of random numbers.
Note that we use the ``shared`` function to
make sure that the input *x* is stored on the graphics device.
.. the following figures have been measured twice on BART3 on Aug 2nd 2012 with no other job running simultaneously
If I run this program (in check1.py) with ``device=cpu``, my computer takes a little over 3 seconds,
whereas on the GPU it takes just over 0.64 seconds. The GPU will not always produce the exact
same floating-point numbers as the CPU. As a benchmark, a loop that calls ``numpy.exp(x.get_value())`` takes about 46 seconds.
.. code-block:: text

    $ THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python check1.py
    [Elemwise{exp,no_inplace}(<TensorType(float32, vector)>)]
    Looping 1000 times took 3.06635117531 seconds
    Result is [ 1.23178029 1.61879337 1.52278066 ..., 2.20771813 2.29967761
      1.62323284]
    Used the cpu

    $ THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python check1.py
    Using gpu device 0: GeForce GTX 580
    [GpuElemwise{exp,no_inplace}(<CudaNdarrayType(float32, vector)>), HostFromGpu(GpuElemwise{exp,no_inplace}.0)]
    Looping 1000 times took 0.638810873032 seconds
    Result is [ 1.23178029 1.61879349 1.52278066 ..., 2.20771813 2.29967761
      1.62323296]
    Used the gpu
Note that GPU operations in Theano require for now ``floatX`` to be *float32* (see also below).

Returning a Handle to Device-Allocated Data
-------------------------------------------
The speedup is not greater in the preceding example because the function is
returning its result as a NumPy ndarray, which has already been copied from the
device to the host for your convenience. This is what makes it so easy to swap in ``device=gpu``, but
if you don't mind less portability, you might gain a bigger speedup by changing
the graph to express a computation with a GPU-stored result. The ``gpu_from_host``
op means "copy the input from the host to the GPU" and it is optimized away
after ``T.exp(x)`` is replaced by a GPU version of ``exp()``.
.. If you modify this code, also change :
.. theano/tests/test_tutorial.py:T_using_gpu.test_using_gpu_2

@@ -101,6 +113,7 @@
    rng = numpy.random.RandomState(22)
    x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
    f = function([], sandbox.cuda.basic_ops.gpu_from_host(T.exp(x)))
    print f.maker.fgraph.toposort()
    t0 = time.time()
    for i in xrange(iters):
        r = f()
@@ -117,32 +130,42 @@

The output from this program is

.. code-block:: text

    $ THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python check2.py
    Using gpu device 0: GeForce GTX 580
    [GpuElemwise{exp,no_inplace}(<CudaNdarrayType(float32, vector)>)]
    Looping 1000 times took 0.34898686409 seconds
    Result is <CudaNdarray object at 0x6a7a5f0>
    Numpy result is [ 1.23178029 1.61879349 1.52278066 ..., 2.20771813 2.29967761
      1.62323296]
    Used the gpu
Here we've shaved off about 50% of the run-time by simply not copying the
resulting array back to the host.

The object returned by each function call is now not a NumPy array but a
"CudaNdarray" which can be converted to a NumPy ndarray by the normal
NumPy casting mechanism.
Running the GPU at Full Speed
------------------------------

To really get maximum performance in this simple example, we need to use an
:class:`Out <function.Out>` instance with the flag ``borrow=True`` to tell Theano not to copy
the output it returns to us. This is because Theano pre-allocates memory for internal use
(like working buffers), and by default will never return a result that is aliased to one of
its internal buffers: instead, it will copy the buffers associated to outputs into newly
allocated memory at each function call. This is to ensure that subsequent function calls will
not overwrite previously computed outputs. Although this is normally what you want, our last
example was so simple that it had the unwanted side-effect of really slowing things down.
..
    TODO:
    The story here about copying and working buffers is misleading and potentially not correct
    ... why exactly does borrow=True cut 75% of the runtime ???
    Answer by Olivier D: it sounds correct to me -- memory allocations must be slow.
.. If you modify this code, also change :
.. theano/tests/test_tutorial.py:T_using_gpu.test_using_gpu_3

.. code-block:: python

@@ -152,7 +175,7 @@
    import numpy
    import time

    vlen = 10 * 30 * 768  # 10 x # cores x # threads per core
    iters = 1000

    rng = numpy.random.RandomState(22)
@@ -160,6 +183,7 @@
    f = function([],
            Out(sandbox.cuda.basic_ops.gpu_from_host(T.exp(x)),
                borrow=True))
    print f.maker.fgraph.toposort()
    t0 = time.time()
    for i in xrange(iters):
        r = f()
@@ -172,34 +196,51 @@
    else:
        print 'Used the gpu'
Running this version of the code takes just over 0.05 seconds, that is 60x faster than
the CPU implementation!
With the flag ``borrow=False``:

.. code-block:: text

    $ THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python using_gpu_solution_1.py
    Using gpu device 0: GeForce GTX 580
    [GpuElemwise{exp,no_inplace}(<CudaNdarrayType(float32, vector)>)]
    Looping 1000 times took 0.31614613533 seconds
    Result is <CudaNdarray object at 0x77e9270>
    Numpy result is [ 1.23178029 1.61879349 1.52278066 ..., 2.20771813 2.29967761
      1.62323296]
    Used the gpu

With the flag ``borrow=True``:

.. code-block:: text

    $ THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python using_gpu_solution_1.py
    Using gpu device 0: GeForce GTX 580
    [GpuElemwise{exp,no_inplace}(<CudaNdarrayType(float32, vector)>)]
    Looping 1000 times took 0.0502779483795 seconds
    Result is <CudaNdarray object at 0x83e5cb0>
    Numpy result is [ 1.23178029 1.61879349 1.52278066 ..., 2.20771813 2.29967761
      1.62323296]
    Used the gpu
This version of the code including the flag ``borrow=True`` is slightly less safe, because if we had saved
the *r* returned from one function call, we would have to take care and remember that its value might
be over-written by a subsequent function call. Although ``borrow=True`` makes a dramatic difference
in this example, be careful! The advantage of ``borrow=True`` is much weaker in larger graphs, and
there is a lot of potential for making a mistake by failing to account for the resulting memory aliasing.
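The aliasing hazard can be reproduced with a deliberately unsafe pure-Python toy (this is *not* Theano code; the buffer and function names are invented): a function that returns its internal working buffer behaves like a ``borrow=True`` output.

```python
# Toy illustration of the borrow=True hazard (plain Python, not Theano):
# this function returns its internal working buffer instead of a copy,
# just like a borrowed Theano output.
_buffer = [0.0, 0.0, 0.0]

def double(x):
    for i, v in enumerate(x):
        _buffer[i] = v * 2   # compute in place into the shared buffer
    return _buffer           # the caller gets an alias, not a copy

r1 = double([1.0, 1.0, 1.0])
saved = list(r1)             # a defensive copy keeps the value safe...
r2 = double([0.0, 1.0, 2.0])
# ...but r1 itself now silently holds the SECOND call's result:
assert r1 is r2 and r1 == [0.0, 2.0, 4.0] and saved == [2.0, 2.0, 2.0]
```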
What Can Be Accelerated on the GPU
----------------------------------
The performance characteristics will change as we continue to optimize our
implementations, and vary from device to device, but to give a rough idea of
what to expect right now:
* Only computations * Only computations
with float32 data-type can be accelerated. Better support for float64 is expected in upcoming hardware but with *float32* data-type can be accelerated. Better support for *float64* is expected in upcoming hardware but
float64 computations are still relatively slow (Jan 2010). *float64* computations are still relatively slow (Jan 2010).
* Matrix * Matrix
multiplication, convolution, and large element-wise operations can be multiplication, convolution, and large element-wise operations can be
accelerated a lot (5-50x) when arguments are large enough to keep 30 accelerated a lot (5-50x) when arguments are large enough to keep 30
...@@ -208,7 +249,7 @@ what to expect right now:
  dimension-shuffling and constant-time reshaping will be equally fast on GPU
  as on CPU.
* Summation
  over rows/columns of tensors can be a little slower on the GPU than on the CPU.
* Copying
  of large quantities of data to and from a device is relatively slow, and
  often cancels most of the advantage of one or two accelerated functions on
...@@ -216,38 +257,358 @@ what to expect right now:
  the device pay off.
Tips for Improving Performance on GPU
-------------------------------------
* Consider
  adding ``floatX=float32`` to your ``.theanorc`` file if you plan to do a lot of
  GPU work.
* Prefer
  constructors like ``matrix``, ``vector`` and ``scalar`` to ``dmatrix``, ``dvector`` and
  ``dscalar`` because the former will give you *float32* variables when
  ``floatX=float32``.
* Ensure
  that your output variables have a *float32* dtype and not *float64*. The
  more *float32* variables are in your graph, the more work the GPU can do for
  you.
* Minimize
  transfers to the GPU device by using ``shared`` *float32* variables to store
  frequently-accessed data (see :func:`shared()<shared.shared>`). When using
  the GPU, *float32* tensor ``shared`` variables are stored on the GPU by default to
  eliminate transfer time for GPU ops using those variables.
* If you aren't happy with the performance you see, try building your functions with
  ``mode='ProfileMode'``. This should print some timing information at program
  termination. Is time being used sensibly? If an op or Apply is
  taking more time than its share, and you know something about GPU
  programming, have a look at how it's implemented in theano.sandbox.cuda.
  Check the line similar to *Spent Xs(X%) in cpu op, Xs(X%) in gpu op and Xs(X%) in transfer op*.
  This can tell you whether not enough of your graph is on the GPU or whether there
  is too much memory transfer.
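As a concrete illustration of the first tip, a minimal ``.theanorc`` might look like the fragment below (``device=gpu`` assumes a CUDA-capable card is available; the section name follows Theano's config-file layout):

```ini
[global]
floatX = float32
device = gpu
```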
Changing the Value of Shared Variables
--------------------------------------

To change the value of a ``shared`` variable, e.g. to provide new data to process,
use ``shared_variable.set_value(new_value)``. For a lot more detail about this,
see :ref:`aliasing`.
-------------------------------------------
**Exercise**
Consider again the logistic regression:
.. code-block:: python
import numpy
import theano
import theano.tensor as T
rng = numpy.random
N = 400
feats = 784
D = (rng.randn(N, feats).astype(theano.config.floatX),
rng.randint(size=N,low=0, high=2).astype(theano.config.floatX))
training_steps = 10000
# Declare Theano symbolic variables
x = T.matrix("x")
y = T.vector("y")
w = theano.shared(rng.randn(feats).astype(theano.config.floatX), name="w")
b = theano.shared(numpy.asarray(0., dtype=theano.config.floatX), name="b")
x.tag.test_value = D[0]
y.tag.test_value = D[1]
#print "Initial model:"
#print w.get_value(), b.get_value()
# Construct Theano expression graph
p_1 = 1 / (1 + T.exp(-T.dot(x, w)-b)) # Probability of having a one
prediction = p_1 > 0.5 # The prediction that is done: 0 or 1
xent = -y*T.log(p_1) - (1-y)*T.log(1-p_1) # Cross-entropy
cost = xent.mean() + 0.01*(w**2).sum() # The cost to optimize
gw,gb = T.grad(cost, [w,b])
# Compile expressions to functions
train = theano.function(
inputs=[x,y],
outputs=[prediction, xent],
updates={w:w-0.01*gw, b:b-0.01*gb},
name = "train")
predict = theano.function(inputs=[x], outputs=prediction,
name = "predict")
if any([x.op.__class__.__name__ in ['Gemv', 'CGemv', 'Gemm', 'CGemm'] for x in
        train.maker.fgraph.toposort()]):
    print 'Used the cpu'
elif any([x.op.__class__.__name__ in ['GpuGemm', 'GpuGemv'] for x in
          train.maker.fgraph.toposort()]):
    print 'Used the gpu'
else:
    print 'ERROR, not able to tell if theano used the cpu or the gpu'
    print train.maker.fgraph.toposort()

for i in range(training_steps):
    pred, err = train(D[0], D[1])
#print "Final model:"
#print w.get_value(), b.get_value()
print "target values for D"
print D[1]
print "prediction on D"
print predict(D[0])
Modify and execute this example to run on GPU with ``floatX=float32`` and
time it using the command line ``time python file.py``. (Of course, you may use some of your answer
to the exercise in section :ref:`Configuration Settings and Compiling Mode<using_modes>`.)
Is there an increase in speed from CPU to GPU?
Where does it come from? (Use ``ProfileMode``)
What can be done to further increase the speed of the GPU version? Put your ideas to test.
.. Note::
* Only 32 bit floats are currently supported (development is in progress).
* ``Shared`` variables with *float32* dtype are by default moved to the GPU memory space.
* There is a limit of one GPU per process.
* Use the Theano flag ``device=gpu`` to require use of the GPU device.
* Use ``device=gpu{0, 1, ...}`` to specify which GPU if you have more than one.
* Apply the Theano flag ``floatX=float32`` (through ``theano.config.floatX``) in your code.
* Cast inputs before storing them into a ``shared`` variable.
* Circumvent the automatic cast of *int32* with *float32* to *float64*:
* Insert manual cast in your code or use *[u]int{8,16}*.
* Insert manual cast around the mean operator (this involves division by length, which is an *int64*).
* Notice that a new casting mechanism is being developed.
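The *int32* / *float32* promotion mentioned in the note can be observed with plain NumPy, whose array promotion rules agree with Theano's on this point: combining the two dtypes yields *float64*, while a manual cast keeps everything in *float32*.

```python
import numpy

i = numpy.ones(3, dtype='int32')
f = numpy.ones(3, dtype='float32')

print((i + f).dtype)                    # float64: int32 combined with float32 is promoted
print((i.astype('float32') + f).dtype)  # float32: the manual cast avoids the promotion
```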
:download:`Solution<using_gpu_solution_1.py>`
-------------------------------------------
Software for Directly Programming a GPU
---------------------------------------
Leaving aside Theano, which is a meta-programmer, there are:
* **CUDA**: GPU programming API by NVIDIA based on extension to C (CUDA C)
* Vendor-specific
* Numeric libraries (BLAS, RNG, FFT) are maturing.
* **OpenCL**: multi-vendor version of CUDA
* More general, standardized.
* Fewer libraries, lesser spread.
* **PyCUDA**: Python bindings to the CUDA driver interface, allowing access to Nvidia's CUDA parallel
  computation API from Python
* Convenience:
Makes it easy to do GPU meta-programming from within Python.
Abstractions to compile low-level CUDA code from Python (``pycuda.driver.SourceModule``).
GPU memory buffer (``pycuda.gpuarray.GPUArray``).
Helpful documentation.
* Completeness: Binding to all of CUDA's driver API.
* Automatic error checking: All CUDA errors are automatically translated into Python exceptions.
* Speed: PyCUDA's base layer is written in C++.
* Good memory management of GPU objects:
Object cleanup tied to lifetime of objects (RAII, 'Resource Acquisition Is Initialization').
Makes it much easier to write correct, leak- and crash-free code.
PyCUDA knows about dependencies (e.g. it won't detach from a context before all memory
allocated in it is also freed).
(This is adapted from PyCUDA's `documentation <http://documen.tician.de/pycuda/index.html>`_
and Andreas Kloeckner's `website <http://mathema.tician.de/software/pycuda>`_ on PyCUDA.)
* **PyOpenCL**: PyCUDA for OpenCL
Learning to Program with PyCUDA
-------------------------------
If you are already proficient in the C programming language, you
may easily leverage your knowledge by learning, first, to program a GPU with the
CUDA extension to C (CUDA C) and, second, to use PyCUDA to access the CUDA
API with a Python wrapper.
The following resources will assist you in this learning process:
* **CUDA API and CUDA C: Introductory**
* `NVIDIA's slides <http://www.sdsc.edu/us/training/assets/docs/NVIDIA-02-BasicsOfCUDA.pdf>`_
* `Stein's (NYU) slides <http://www.cs.nyu.edu/manycores/cuda_many_cores.pdf>`_
* **CUDA API and CUDA C: Advanced**
* `MIT IAP2009 CUDA <https://sites.google.com/site/cudaiap2009/home>`_
(full coverage: lectures, leading Kirk-Hwu textbook, examples, additional resources)
* `Course U. of Illinois <http://courses.engr.illinois.edu/ece498/al/index.html>`_
(full lectures, Kirk-Hwu textbook)
* `NVIDIA's knowledge base <http://www.nvidia.com/content/cuda/cuda-developer-resources.html>`_
(extensive coverage, levels from introductory to advanced)
* `practical issues <http://stackoverflow.com/questions/2392250/understanding-cuda-grid-dimensions-block-dimensions-and-threads-organization-s>`_
(on the relationship between grids, blocks and threads; see also linked and related issues on same page)
* `CUDA optimisation <http://www.gris.informatik.tu-darmstadt.de/cuda-workshop/slides.html>`_
* **PyCUDA: Introductory**
* `Kloeckner's slides <http://www.gputechconf.com/gtcnew/on-demand-gtc.php?sessionTopic=&searchByKeyword=kloeckner&submit=&select=+&sessionEvent=2&sessionYear=2010&sessionFormat=3>`_
* `Kloeckner's website <http://mathema.tician.de/software/pycuda>`_
* **PyCUDA: Advanced**
* `PyCUDA documentation website <http://documen.tician.de/pycuda/>`_
The following examples give a foretaste of programming a GPU with PyCUDA. Once
you feel competent enough, you may try your hand at the corresponding exercises.
**Example: PyCUDA**
.. code-block:: python
# (from PyCUDA's documentation)
import pycuda.autoinit
import pycuda.driver as drv
import numpy
from pycuda.compiler import SourceModule
mod = SourceModule("""
__global__ void multiply_them(float *dest, float *a, float *b)
{
const int i = threadIdx.x;
dest[i] = a[i] * b[i];
}
""")
multiply_them = mod.get_function("multiply_them")
a = numpy.random.randn(400).astype(numpy.float32)
b = numpy.random.randn(400).astype(numpy.float32)
dest = numpy.zeros_like(a)
multiply_them(
drv.Out(dest), drv.In(a), drv.In(b),
block=(400,1,1), grid=(1,1))
assert numpy.allclose(dest, a*b)
print dest
-------------------------------------------
**Exercise**
Run the preceding example.
Modify and execute to work for a matrix of shape (20, 10).
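As a hint for the launch-configuration arithmetic in that exercise, here is a sketch under the assumption of one thread per element and the era-typical limit of 512 threads per block: a (20, 10) matrix has 200 elements, so a single block suffices.

```python
import numpy

shape = (20, 10)
size = int(numpy.prod(shape))            # 200 elements, one thread per element
threads_per_block = 512                  # typical per-block limit on hardware of that era
blocks = int(numpy.ceil(size / float(threads_per_block)))
print(size, blocks)                      # 200 elements fit in 1 block
```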
-------------------------------------------
.. _pyCUDA_theano:
**Example: Theano + PyCUDA**
.. code-block:: python
import numpy, theano
import theano.misc.pycuda_init
from pycuda.compiler import SourceModule
import theano.sandbox.cuda as cuda
class PyCUDADoubleOp(theano.Op):
    def __eq__(self, other):
        return type(self) == type(other)

    def __hash__(self):
        return hash(type(self))

    def __str__(self):
        return self.__class__.__name__

    def make_node(self, inp):
        inp = cuda.basic_ops.gpu_contiguous(
            cuda.basic_ops.as_cuda_ndarray_variable(inp))
        assert inp.dtype == "float32"
        return theano.Apply(self, [inp], [inp.type()])

    def make_thunk(self, node, storage_map, _, _2):
        mod = SourceModule("""
    __global__ void my_fct(float * i0, float * o0, int size) {
        int i = blockIdx.x*blockDim.x + threadIdx.x;
        if(i<size){
            o0[i] = i0[i]*2;
        }
    }""")
        pycuda_fct = mod.get_function("my_fct")
        inputs = [storage_map[v] for v in node.inputs]
        outputs = [storage_map[v] for v in node.outputs]

        def thunk():
            z = outputs[0]
            if z[0] is None or z[0].shape != inputs[0][0].shape:
                z[0] = cuda.CudaNdarray.zeros(inputs[0][0].shape)
            grid = (int(numpy.ceil(inputs[0][0].size / 512.)), 1)
            pycuda_fct(inputs[0][0], z[0], numpy.intc(inputs[0][0].size),
                       block=(512, 1, 1), grid=grid)
        return thunk
Use this code to test it:
>>> x = theano.tensor.fmatrix()
>>> f = theano.function([x], PyCUDADoubleOp()(x))
>>> xv=numpy.ones((4,5), dtype="float32")
>>> assert numpy.allclose(f(xv), xv*2)
>>> print numpy.asarray(f(xv))
-------------------------------------------
**Exercise**
Run the preceding example.
Modify and execute to multiply two matrices: *x* * *y*.
Modify and execute to return two outputs: *x + y* and *x - y*.
(Notice that Theano's current *elemwise fusion* optimization is
only applicable to computations involving a single output. Hence, to gain
efficiency over the basic solution that is asked here, the two operations would
have to be jointly optimized explicitly in the code.)
Modify and execute to support *stride* (i.e. so as not to constrain the input to be *C-contiguous*).
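The *C-contiguous* constraint in that last exercise can be illustrated with NumPy alone: slicing with a step produces a strided view, and ``numpy.ascontiguousarray`` is one way to recover a contiguous buffer (at the price of a copy).

```python
import numpy

a = numpy.ones((4, 6), dtype='float32')
b = a[:, ::2]                     # strided view: every other column

print(a.flags['C_CONTIGUOUS'])    # True
print(b.flags['C_CONTIGUOUS'])    # False: elements are no longer adjacent in memory

c = numpy.ascontiguousarray(b)    # explicit copy with a contiguous layout
print(c.flags['C_CONTIGUOUS'])    # True
```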
#!/usr/bin/env python
# Theano tutorial
# Solution to Exercise in section 'Using the GPU'
# 1. Raw results
#
# same code as in mode_solution_1 but run with following command lines:
# THEANO_FLAGS=mode=FAST_RUN,device=gpu time python program_name.py
# THEANO_FLAGS=mode=FAST_RUN,device=cpu time python program_name.py
# for GPU and CPU respectively
# typical time: 20 sec (CPU), 10 sec (GPU)
import numpy
import theano
import theano.tensor as tt
from theano import sandbox, Out
theano.config.floatX = 'float32'
rng = numpy.random
N = 400
feats = 784
D = (rng.randn(N, feats).astype(theano.config.floatX),
rng.randint(size=N, low=0, high=2).astype(theano.config.floatX))
training_steps = 10000
# Declare Theano symbolic variables
x = tt.matrix("x")
y = tt.vector("y")
w = theano.shared(rng.randn(feats).astype(theano.config.floatX), name="w")
b = theano.shared(numpy.asarray(0., dtype=theano.config.floatX), name="b")
x.tag.test_value = D[0]
y.tag.test_value = D[1]
#print "Initial model:"
#print w.get_value(), b.get_value()
# Construct Theano expression graph
p_1 = 1 / (1 + tt.exp(-tt.dot(x, w) - b))  # Probability of having a one
prediction = p_1 > 0.5 # The prediction that is done: 0 or 1
xent = -y * tt.log(p_1) - (1 - y) * tt.log(1 - p_1) # Cross-entropy
cost = tt.cast(xent.mean(), 'float32') + \
0.01 * (w ** 2).sum() # The cost to optimize
gw, gb = tt.grad(cost, [w, b])
"""
# Compile expressions to functions
train = theano.function(
inputs=[x, y],
outputs=[Out(theano.sandbox.cuda.basic_ops.gpu_from_host(tt.cast(prediction, 'float32')),borrow=True), Out(theano.sandbox.cuda.basic_ops.gpu_from_host(tt.cast(xent, 'float32')), borrow=True)],
updates={w: w - 0.01 * gw, b: b - 0.01 * gb},
name="train")
predict = theano.function(inputs=[x], outputs=Out(theano.sandbox.cuda.basic_ops.gpu_from_host(tt.cast(prediction, 'float32')), borrow=True),
name="predict")
"""
# Compile expressions to functions
train = theano.function(
inputs=[x, y],
outputs=[prediction, xent],
updates={w: w - 0.01 * gw, b: b - 0.01 * gb},
name="train")
predict = theano.function(inputs=[x], outputs=prediction,
name="predict")
if any([x.op.__class__.__name__ in ['Gemv', 'CGemv', 'Gemm', 'CGemm'] for x in
        train.maker.fgraph.toposort()]):
    print 'Used the cpu'
elif any([x.op.__class__.__name__ in ['GpuGemm', 'GpuGemv'] for x in
          train.maker.fgraph.toposort()]):
    print 'Used the gpu'
else:
    print 'ERROR, not able to tell if theano used the cpu or the gpu'
    print train.maker.fgraph.toposort()

for i in range(training_steps):
    pred, err = train(D[0], D[1])
#print "Final model:"
#print w.get_value(), b.get_value()
print "target values for D"
print D[1]
print "prediction on D"
print predict(D[0])
"""
# 2. Profiling
#
# same code as above but run with following command lines:
# THEANO_FLAGS=mode=ProfileMode,device=gpu python program_name.py
# THEANO_FLAGS=mode=ProfileMode,device=cpu python program_name.py
# for GPU and CPU
# 2.1 Profiling output for CPU computations
$ THEANO_FLAGS=mode=ProfileMode,device=cpu python program_name.py
Used the cpu
target values for D
prediction on D
Used the cpu
target values for D
prediction on D
ProfileMode.print_summary()
---------------------------
Time since import 12.586s
Theano compile time: 0.000s (0.0% since import)
Optimization time: 0.000s
Linker time: 0.000s
Theano fct call 5.147s (40.9% since import)
Theano Op time 3.595s 28.6%(since import) 69.8%(of fct call)
Theano function overhead in ProfileMode 1.552s 12.3%(since import) 30.2%(of fct call)
20002 Theano fct call, 0.000s per call
Rest of the time since import 7.440s 59.1%
Theano fct summary:
<% total fct time> <total time> <time per call> <nb call> <fct name>
49.9% 2.567s 2.57e-04s 10000 train
0.0% 0.000s 1.24e-04s 1 predict
0.0% 0.000s 1.26e-04s 1 predict
50.1% 2.579s 2.58e-04s 10000 train
Single Op-wise summary:
<% of local_time spent on this kind of Op> <cumulative %> <self seconds> <cumulative seconds> <time per call> [*] <nb_call> <nb_op> <nb_apply> <Op name>
59.3% 59.3% 2.133s 2.133s 5.33e-05s * 40002 1 6 <class 'theano.tensor.blas_c.CGemv'>
34.4% 93.8% 1.238s 3.371s 6.19e-06s * 200002 11 22 <class 'theano.tensor.elemwise.Elemwise'>
2.8% 96.6% 0.100s 3.471s 2.51e-06s * 40002 1 6 <class 'theano.tensor.basic.Alloc'>
2.1% 98.7% 0.075s 3.546s 1.26e-06s * 60002 2 8 <class 'theano.tensor.elemwise.DimShuffle'>
0.7% 99.3% 0.024s 3.571s 6.11e-07s * 40002 1 6 <class 'theano.tensor.opt.Shape_i'>
0.7% 100.0% 0.024s 3.595s 1.18e-06s * 20000 1 2 <class 'theano.tensor.elemwise.Sum'>
... (remaining 0 single Op account for 0.00%(0.00s) of the runtime)
(*) Op is running a c implementation
Op-wise summary:
<% of local_time spent on this kind of Op> <cumulative %> <self seconds> <cumulative seconds> <time per call> [*] <nb_call> <nb apply> <Op name>
59.3% 59.3% 2.133s 2.133s 5.33e-05s * 40002 6 CGemv{inplace}
18.1% 77.4% 0.650s 2.783s 3.25e-05s * 20000 2 Elemwise{Composite{[Composite{[Composite{[sub(mul(i0, i1), neg(i2))]}(i0, scalar_softplus(i1), mul(i2, i3))]}(i0, i1, i2, scalar_softplus(i3))]}}
6.4% 83.9% 0.231s 3.014s 1.16e-05s * 20000 2 Elemwise{Composite{[Composite{[Composite{[Composite{[mul(i0, add(i1, i2))]}(i0, neg(i1), true_div(i2, i3))]}(i0, mul(i1, i2, i3), i4, i5)]}(i0, i1, i2, exp(i3), i4, i5)]}}[(0, 0)]
4.0% 87.8% 0.142s 3.157s 7.11e-06s * 20000 2 Elemwise{ScalarSigmoid{output_types_preference=transfer_type{0}}}[(0, 0)]
2.8% 90.6% 0.100s 3.257s 2.51e-06s * 40002 6 Alloc
1.4% 92.1% 0.052s 3.309s 1.30e-06s * 40002 6 InplaceDimShuffle{x}
1.1% 93.1% 0.038s 3.347s 1.92e-06s * 20000 2 Elemwise{Cast{float32}}
1.1% 94.2% 0.038s 3.386s 1.91e-06s * 20000 2 Elemwise{sub,no_inplace}
1.0% 95.2% 0.036s 3.421s 1.79e-06s * 20000 2 Elemwise{gt,no_inplace}
0.8% 96.0% 0.029s 3.450s 1.44e-06s * 20000 2 Elemwise{Composite{[sub(neg(i0), i1)]}}[(0, 0)]
0.8% 96.8% 0.028s 3.479s 1.42e-06s * 20000 2 Elemwise{neg,no_inplace}
0.7% 97.5% 0.024s 3.503s 6.11e-07s * 40002 6 Shape_i{0}
0.7% 98.1% 0.024s 3.527s 1.18e-06s * 20000 2 Sum
0.6% 98.8% 0.023s 3.550s 1.16e-06s * 20000 2 InplaceDimShuffle{1,0}
0.6% 99.4% 0.023s 3.573s 1.15e-06s * 20000 2 Elemwise{Composite{[sub(i0, mul(i1, i2))]}}[(0, 0)]
0.6% 100.0% 0.022s 3.595s 1.08e-06s * 20000 2 Elemwise{inv,no_inplace}
0.0% 100.0% 0.000s 3.595s 1.19e-05s * 2 2 Elemwise{Composite{[Composite{[Composite{[Composite{[GT(scalar_sigmoid(i0), i1)]}(neg(i0), i1)]}(sub(i0, i1), i2)]}(neg(i0), i1, i2)]}}
... (remaining 0 Op account for 0.00%(0.00s) of the runtime)
(*) Op is running a c implementation
Apply-wise summary:
<% of local_time spent at this position> <cumulative %%> <apply time> <cumulative seconds> <time per call> [*] <nb_call> <Apply position> <Apply Op name>
14.9% 14.9% 0.536s 0.536s 5.36e-05s * 10000 7 CGemv{inplace}(Alloc.0, TensorConstant{1.0}, x, w, TensorConstant{1.0})
14.9% 29.8% 0.534s 1.070s 5.34e-05s * 10000 18 CGemv{inplace}(w, TensorConstant{-0.00999999977648}, x.T, Elemwise{Composite{[Composite{[Composite{[Composite{[mul(i0, add(i1, i2))]}(i0, neg(i1), true_div(i2, i3))]}(i0, mul(i1, i2, i3), i4, i5)]}(i0, i1, i2, exp(i3), i4, i5)]}}[(0, 0)].0, TensorConstant{0.999800026417})
14.8% 44.6% 0.532s 1.602s 5.32e-05s * 10000 7 CGemv{inplace}(Alloc.0, TensorConstant{1.0}, x, w, TensorConstant{1.0})
14.7% 59.3% 0.530s 2.132s 5.30e-05s * 10000 18 CGemv{inplace}(w, TensorConstant{-0.00999999977648}, x.T, Elemwise{Composite{[Composite{[Composite{[Composite{[mul(i0, add(i1, i2))]}(i0, neg(i1), true_div(i2, i3))]}(i0, mul(i1, i2, i3), i4, i5)]}(i0, i1, i2, exp(i3), i4, i5)]}}[(0, 0)].0, TensorConstant{0.999800026417})
9.1% 68.4% 0.327s 2.460s 3.27e-05s * 10000 13 Elemwise{Composite{[Composite{[Composite{[sub(mul(i0, i1), neg(i2))]}(i0, scalar_softplus(i1), mul(i2, i3))]}(i0, i1, i2, scalar_softplus(i3))]}}(y, Elemwise{Composite{[sub(neg(i0), i1)]}}[(0, 0)].0, Elemwise{sub,no_inplace}.0, Elemwise{neg,no_inplace}.0)
9.0% 77.4% 0.323s 2.783s 3.23e-05s * 10000 13 Elemwise{Composite{[Composite{[Composite{[sub(mul(i0, i1), neg(i2))]}(i0, scalar_softplus(i1), mul(i2, i3))]}(i0, i1, i2, scalar_softplus(i3))]}}(y, Elemwise{Composite{[sub(neg(i0), i1)]}}[(0, 0)].0, Elemwise{sub,no_inplace}.0, Elemwise{neg,no_inplace}.0)
3.2% 80.6% 0.116s 2.899s 1.16e-05s * 10000 16 Elemwise{Composite{[Composite{[Composite{[Composite{[mul(i0, add(i1, i2))]}(i0, neg(i1), true_div(i2, i3))]}(i0, mul(i1, i2, i3), i4, i5)]}(i0, i1, i2, exp(i3), i4, i5)]}}[(0, 0)](Elemwise{ScalarSigmoid{output_types_preference=transfer_type{0}}}[(0, 0)].0, Alloc.0, y, Elemwise{Composite{[sub(neg(i0), i1)]}}[(0, 0)].0, Elemwise{sub,no_inplace}.0, Elemwise{Cast{float32}}.0)
3.2% 83.9% 0.116s 3.014s 1.16e-05s * 10000 16 Elemwise{Composite{[Composite{[Composite{[Composite{[mul(i0, add(i1, i2))]}(i0, neg(i1), true_div(i2, i3))]}(i0, mul(i1, i2, i3), i4, i5)]}(i0, i1, i2, exp(i3), i4, i5)]}}[(0, 0)](Elemwise{ScalarSigmoid{output_types_preference=transfer_type{0}}}[(0, 0)].0, Alloc.0, y, Elemwise{Composite{[sub(neg(i0), i1)]}}[(0, 0)].0, Elemwise{sub,no_inplace}.0, Elemwise{Cast{float32}}.0)
2.0% 85.8% 0.071s 3.086s 7.12e-06s * 10000 14 Elemwise{ScalarSigmoid{output_types_preference=transfer_type{0}}}[(0, 0)](Elemwise{neg,no_inplace}.0)
2.0% 87.8% 0.071s 3.156s 7.09e-06s * 10000 14 Elemwise{ScalarSigmoid{output_types_preference=transfer_type{0}}}[(0, 0)](Elemwise{neg,no_inplace}.0)
0.9% 88.8% 0.034s 3.190s 3.38e-06s * 10000 12 Alloc(Elemwise{inv,no_inplace}.0, Shape_i{0}.0)
0.9% 89.7% 0.034s 3.224s 3.37e-06s * 10000 12 Alloc(Elemwise{inv,no_inplace}.0, Shape_i{0}.0)
0.5% 90.2% 0.019s 3.243s 1.93e-06s * 10000 8 Elemwise{Cast{float32}}(InplaceDimShuffle{x}.0)
0.5% 90.8% 0.019s 3.262s 1.92e-06s * 10000 4 Elemwise{sub,no_inplace}(TensorConstant{(1,) of 1.0}, y)
0.5% 91.3% 0.019s 3.282s 1.90e-06s * 10000 4 Elemwise{sub,no_inplace}(TensorConstant{(1,) of 1.0}, y)
... (remaining 35 Apply instances account for 8.71%(0.31s) of the runtime)
(*) Op is running a c implementation
Profile of Theano functions memory:
(This check only the output of each apply node. It don't check the temporary memory used by the op in the apply node.)
We skipped 4 theano function(s). Each of them used less then 1024B(theano flags ProfileMode.min_memory_size) of total intermediate memory size
Here are tips to potentially make your code run faster
(if you think of new ones, suggest them on the mailing list).
Test them first, as they are not guaranteed to always provide a speedup.
Sorry, no tip for today.
# 2.2 Profiling output for GPU computations
$ THEANO_FLAGS=mode=ProfileMode,device=gpu python program_name.py
Using gpu device 0: GeForce GTX 580
Used the gpu
target values for D
prediction on D
Used the gpu
target values for D
prediction on D
ProfileMode.print_summary()
---------------------------
Time since import 25.682s
Theano compile time: 0.000s (0.0% since import)
Optimization time: 0.000s
Linker time: 0.000s
Theano fct call 17.052s (66.4% since import)
Theano Op time 14.548s 56.6%(since import) 85.3%(of fct call)
Theano function overhead in ProfileMode 2.505s 9.8%(since import) 14.7%(of fct call)
20002 Theano fct call, 0.001s per call
Rest of the time since import 8.630s 33.6%
Theano fct summary:
<% total fct time> <total time> <time per call> <nb call> <fct name>
50.0% 8.526s 8.53e-04s 10000 train
0.0% 0.001s 1.09e-03s 1 predict
50.0% 8.524s 8.52e-04s 10000 train
0.0% 0.001s 1.10e-03s 1 predict
Single Op-wise summary:
<% of local_time spent on this kind of Op> <cumulative %> <self seconds> <cumulative seconds> <time per call> [*] <nb_call> <nb_op> <nb_apply> <Op name>
54.8% 54.8% 7.968s 7.968s 1.33e-04s 60002 1 8 <class 'theano.sandbox.cuda.basic_ops.GpuFromHost'>
16.2% 71.0% 2.358s 10.325s 1.47e-05s * 160002 9 18 <class 'theano.sandbox.cuda.basic_ops.GpuElemwise'>
12.3% 83.3% 1.795s 12.120s 4.49e-05s * 40002 1 6 <class 'theano.sandbox.cuda.blas.GpuGemv'>
7.0% 90.4% 1.024s 13.144s 2.56e-05s 40002 1 6 <class 'theano.sandbox.cuda.basic_ops.HostFromGpu'>
5.0% 95.4% 0.728s 13.872s 1.82e-05s * 40002 1 6 <class 'theano.sandbox.cuda.basic_ops.GpuAlloc'>
2.1% 97.4% 0.300s 14.171s 1.50e-05s * 20000 1 2 <class 'theano.sandbox.cuda.basic_ops.GpuSum'>
1.3% 98.7% 0.189s 14.360s 3.15e-06s * 60002 3 8 <class 'theano.sandbox.cuda.basic_ops.GpuDimShuffle'>
0.6% 99.4% 0.094s 14.454s 2.35e-06s * 40002 2 6 <class 'theano.tensor.elemwise.Elemwise'>
0.3% 99.7% 0.048s 14.503s 1.21e-06s * 40002 1 6 <class 'theano.tensor.opt.Shape_i'>
0.3% 100.0% 0.045s 14.548s 2.25e-06s * 20000 1 2 <class 'theano.tensor.elemwise.DimShuffle'>
... (remaining 0 single Op account for 0.00%(0.00s) of the runtime)
(*) Op is running a c implementation
Op-wise summary:
<% of local_time spent on this kind of Op> <cumulative %> <self seconds> <cumulative seconds> <time per call> [*] <nb_call> <nb apply> <Op name>
54.8% 54.8% 7.968s 7.968s 1.33e-04s 60002 8 GpuFromHost
12.3% 67.1% 1.795s 9.763s 4.49e-05s * 40002 6 GpuGemv{inplace}
7.0% 74.1% 1.024s 10.786s 2.56e-05s 40002 6 HostFromGpu
5.0% 79.1% 0.728s 11.514s 1.82e-05s * 40002 6 GpuAlloc
2.3% 81.4% 0.334s 11.848s 1.67e-05s * 20000 2 GpuElemwise{Composite{[Composite{[Composite{[Composite{[mul(i0, add(i1, i2))]}(i0, neg(i1), true_div(i2, i3))]}(i0, mul(i1, i2, i3), i4, i5)]}(i0, i1, i2, exp(i3), i4, i5)]}}[(0, 0)]
2.2% 83.6% 0.319s 12.167s 1.59e-05s * 20000 2 GpuElemwise{Composite{[Composite{[Composite{[sub(mul(i0, i1), neg(i2))]}(i0, scalar_softplus(i1), mul(i2, i3))]}(i0, i1, i2, scalar_softplus(i3))]},no_inplace}
2.1% 85.7% 0.301s 12.468s 1.50e-05s * 20000 2 GpuElemwise{neg,no_inplace}
2.1% 87.8% 0.300s 12.768s 1.50e-05s * 20000 2 GpuSum{1}
2.0% 89.8% 0.292s 13.060s 1.46e-05s * 20000 2 GpuElemwise{inv,no_inplace}
1.9% 91.7% 0.283s 13.343s 1.42e-05s * 20000 2 GpuElemwise{Composite{[sub(neg(i0), i1)]}}[(0, 0)]
1.9% 93.7% 0.281s 13.625s 1.41e-05s * 20000 2 GpuElemwise{sub,no_inplace}
1.9% 95.5% 0.273s 13.898s 1.37e-05s * 20000 2 GpuElemwise{ScalarSigmoid{output_types_preference=transfer_type{0}}}[(0, 0)]
1.9% 97.4% 0.273s 14.171s 1.37e-05s * 20000 2 GpuElemwise{Composite{[sub(i0, mul(i1, i2))]}}[(0, 0)]
1.0% 98.4% 0.141s 14.313s 7.06e-06s * 20002 4 GpuDimShuffle{x}
0.4% 98.8% 0.057s 14.370s 2.87e-06s * 20002 4 Elemwise{gt,no_inplace}
0.3% 99.1% 0.048s 14.418s 1.21e-06s * 40002 6 Shape_i{0}
0.3% 99.4% 0.045s 14.463s 2.25e-06s * 20000 2 InplaceDimShuffle{x}
0.3% 99.7% 0.037s 14.500s 1.83e-06s * 20000 2 Elemwise{Cast{float32}}
0.2% 99.8% 0.025s 14.525s 1.24e-06s * 20000 2 GpuDimShuffle{0}
0.2% 100.0% 0.023s 14.548s 1.14e-06s * 20000 2 GpuDimShuffle{1,0}
... (remaining 1 Op account for 0.00%(0.00s) of the runtime)
(*) Op is running a c implementation
Apply-wise summary:
<% of local_time spent at this position> <cumulative %%> <apply time> <cumulative seconds> <time per call> [*] <nb_call> <Apply position> <Apply Op name>
24.0% 24.0% 3.493s 3.493s 3.49e-04s 10000 1 GpuFromHost(x)
23.9% 47.9% 3.479s 6.972s 3.48e-04s 10000 1 GpuFromHost(x)
4.3% 52.3% 0.629s 7.602s 6.29e-05s * 10000 24 GpuGemv{inplace}(w, TensorConstant{-0.00999999977648}, GpuDimShuffle{1,0}.0, GpuElemwise{Composite{[Composite{[Composite{[Composite{[mul(i0, add(i1, i2))]}(i0, neg(i1), true_div(i2, i3))]}(i0, mul(i1, i2, i3), i4, i5)]}(i0, i1, i2, exp(i3), i4, i5)]}}[(0, 0)].0, TensorConstant{0.999800026417})
4.3% 56.6% 0.629s 8.231s 6.29e-05s * 10000 24 GpuGemv{inplace}(w, TensorConstant{-0.00999999977648}, GpuDimShuffle{1,0}.0, GpuElemwise{Composite{[Composite{[Composite{[Composite{[mul(i0, add(i1, i2))]}(i0, neg(i1), true_div(i2, i3))]}(i0, mul(i1, i2, i3), i4, i5)]}(i0, i1, i2, exp(i3), i4, i5)]}}[(0, 0)].0, TensorConstant{0.999800026417})
1.8% 58.4% 0.269s 8.499s 2.69e-05s * 10000 9 GpuGemv{inplace}(GpuAlloc.0, TensorConstant{1.0}, GpuFromHost.0, w, TensorConstant{1.0})
1.8% 60.3% 0.268s 8.767s 2.68e-05s * 10000 9 GpuGemv{inplace}(GpuAlloc.0, TensorConstant{1.0}, GpuFromHost.0, w, TensorConstant{1.0})
1.8% 62.1% 0.266s 9.033s 2.66e-05s 10000 18 HostFromGpu(GpuElemwise{Composite{[Composite{[Composite{[sub(mul(i0, i1), neg(i2))]}(i0, scalar_softplus(i1), mul(i2, i3))]}(i0, i1, i2, scalar_softplus(i3))]},no_inplace}.0)
1.8% 63.9% 0.262s 9.296s 2.62e-05s 10000 18 HostFromGpu(GpuElemwise{Composite{[Composite{[Composite{[sub(mul(i0, i1), neg(i2))]}(i0, scalar_softplus(i1), mul(i2, i3))]}(i0, i1, i2, scalar_softplus(i3))]},no_inplace}.0)
1.8% 65.7% 0.260s 9.555s 2.60e-05s 10000 3 GpuFromHost(y)
1.8% 67.5% 0.258s 9.813s 2.58e-05s 10000 3 GpuFromHost(y)
1.7% 69.2% 0.248s 10.061s 2.48e-05s 10000 20 HostFromGpu(GpuElemwise{ScalarSigmoid{output_types_preference=transfer_type{0}}}[(0, 0)].0)
1.7% 70.9% 0.247s 10.309s 2.47e-05s 10000 20 HostFromGpu(GpuElemwise{ScalarSigmoid{output_types_preference=transfer_type{0}}}[(0, 0)].0)
1.6% 72.5% 0.238s 10.547s 2.38e-05s 10000 12 GpuFromHost(Elemwise{Cast{float32}}.0)
1.6% 74.1% 0.237s 10.785s 2.37e-05s 10000 12 GpuFromHost(Elemwise{Cast{float32}}.0)
1.3% 75.4% 0.185s 10.969s 1.85e-05s * 10000 6 GpuAlloc(CudaNdarrayConstant{[ 1.58212732e-09]}, Shape_i{0}.0)
... (remaining 53 Apply instances account for 24.60%(3.58s) of the runtime)
(*) Op is running a c implementation
Some info useful for gpu:
Spent 1.211s(8.324%) in cpu Op, 13.337s(91.676%) in gpu Op and 0.000s(0.000%) transfert Op
Theano function input that are float64
<fct name> <input name> <input type> <str input>
List of apply that don't have float64 as input but have float64 in outputs
(Useful to know if we forgot some cast when using floatX=float32 or gpu code)
<Apply> <Apply position> <fct name> <inputs type> <outputs type>
Profile of Theano functions memory:
(This check only the output of each apply node. It don't check the temporary memory used by the op in the apply node.)
We skipped 4 theano function(s). Each of them used less then 1024B(theano flags ProfileMode.min_memory_size) of total intermediate memory size
Here are tips to potentially make your code run faster
(if you think of new ones, suggest them on the mailing list).
Test them first, as they are not guaranteed to always provide a speedup.
Sorry, no tip for today.
# 3. Conclusions
Facts:
Examine and compare the 'Single Op-wise' summaries for the CPU and GPU runs. The transfer ops 'GpuFromHost' and 'HostFromGpu'
by themselves consume a large amount of extra time. Furthermore, notice that each of the GPU ops consumes more time than its
CPU counterpart. An additional experiment also confirms that adding an 'Out' instance (with borrow=True) in the GPU version
brings only a minor improvement in this situation.
Tentative conclusion:
The large number of external training steps (10000), each paying the host/GPU transfer cost, generates disproportionate GPU
overhead.
Tentative solution:
Include the training steps inside the definition of the Theano function.
Implement this solution and put it to the test.
"""
...@@ -30,7 +30,7 @@ _logger = logging.getLogger("theano.printing") ...@@ -30,7 +30,7 @@ _logger = logging.getLogger("theano.printing")
def debugprint(obj, depth=-1, print_type=False, def debugprint(obj, depth=-1, print_type=False,
file=None, ids='CHAR', stop_on_name=False): file=None, ids='CHAR', stop_on_name=False):
"""Print a computation graph to file """Print a computation graph as text to stdout or a file.
:type obj: Variable, Apply, or Function instance :type obj: Variable, Apply, or Function instance
:param obj: symbolic thing to print :param obj: symbolic thing to print
...@@ -56,12 +56,12 @@ def debugprint(obj, depth=-1, print_type=False, ...@@ -56,12 +56,12 @@ def debugprint(obj, depth=-1, print_type=False,
The first part of the text identifies whether it is an input The first part of the text identifies whether it is an input
(if a name or type is printed) or the output of some Apply (in which case (if a name or type is printed) or the output of some Apply (in which case
the Op is printed). the Op is printed).
The second part of the text is the memory location of the Variable. The second part of the text is an identifier of the Variable.
If print_type is True, we add a part containing the type of the Variable If print_type is True, we add a part containing the type of the Variable
If a Variable is encountered multiple times in the depth-first search, If a Variable is encountered multiple times in the depth-first search,
it is only printed recursively the first time. Later, just the Variable it is only printed recursively the first time. Later, just the Variable
and its memory location are printed. identifier is printed.
If an Apply has multiple outputs, then a '.N' suffix will be appended If an Apply has multiple outputs, then a '.N' suffix will be appended
to the Apply's identifier, to indicate which output a line corresponds to. to the Apply's identifier, to indicate which output a line corresponds to.
...@@ -461,7 +461,9 @@ pprint.assign(lambda pstate, r: hasattr(pstate, 'target') ...@@ -461,7 +461,9 @@ pprint.assign(lambda pstate, r: hasattr(pstate, 'target')
LeafPrinter()) LeafPrinter())
pp = pprint pp = pprint
"""
Print to the terminal a math-like expression.
"""
# colors not used: orange, amber#FFBF00, purple, pink, # colors not used: orange, amber#FFBF00, purple, pink,
# used by default: green, blue, grey, red # used by default: green, blue, grey, red
...@@ -530,7 +532,7 @@ def pydotprint(fct, outfile=None, ...@@ -530,7 +532,7 @@ def pydotprint(fct, outfile=None,
blue boxes are outputs variables of the graph blue boxes are outputs variables of the graph
grey boxes are variables that are not outputs and are not used grey boxes are variables that are not outputs and are not used
red ellipses are transfers from/to the gpu (ops with names GpuFromHost, red ellipses are transfers from/to the gpu (ops with names GpuFromHost,
HostFromGpu) HostFromGpu)
""" """
if colorCodes is None: if colorCodes is None:
......
...@@ -197,11 +197,12 @@ def scan(fn, ...@@ -197,11 +197,12 @@ def scan(fn,
* ``initial`` -- Theano variable that represents the initial * ``initial`` -- Theano variable that represents the initial
state of a given output. In case the output is not computed state of a given output. In case the output is not computed
recursively (think of a map) and does not require a initial recursively (think of a map) and does not require an initial
state this field can be skiped. Given that only the previous state this field can be skipped. Given that (only) the previous
time step of the output is used by ``fn`` the initial state time step of the output is used by ``fn``, the initial state
should have the same shape as the output. If multiple time **should have the same shape** as the output and **should not
taps are used, the initial state should have one extra involve a downcast** of the data type of the output. If multiple
time taps are used, the initial state should have one extra
dimension that should cover all the possible taps. For example dimension that should cover all the possible taps. For example
if we use ``-5``, ``-2`` and ``-1`` as past taps, at step 0, if we use ``-5``, ``-2`` and ``-1`` as past taps, at step 0,
``fn`` will require (by an abuse of notation) ``output[-5]``, ``fn`` will require (by an abuse of notation) ``output[-5]``,
......
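As a concrete illustration of the shape requirement described above (plain Python with hypothetical values, not Theano's actual indexing code): with past taps ``-5``, ``-2`` and ``-1``, the initial state needs a leading dimension of ``abs(min(taps)) == 5``, and at step 0 tap ``t`` reads entry ``5 + t`` of that state:

```python
taps = [-5, -2, -1]                 # past taps used by fn

# One extra leading dimension, large enough to cover the oldest tap:
n_init = abs(min(taps))             # == 5
# 5 entries, each with the shape (and dtype) of one output time step:
initial = [[0.0] * 3 for _ in range(n_init)]

# At step 0, tap t (the doc's "output[t]") reads initial[n_init + t]:
step0 = {t: initial[n_init + t] for t in taps}

assert step0[-5] is initial[0]      # oldest tap -> first entry
assert step0[-1] is initial[-1]     # most recent tap -> last entry
```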
...@@ -797,6 +797,7 @@ class T_using_gpu(unittest.TestCase): ...@@ -797,6 +797,7 @@ class T_using_gpu(unittest.TestCase):
rng = numpy.random.RandomState(22) rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX)) x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x)) f = function([], T.exp(x))
# print f.maker.fgraph.toposort()
t0 = time.time() t0 = time.time()
for i in xrange(iters): for i in xrange(iters):
r = f() r = f()
...@@ -813,7 +814,6 @@ class T_using_gpu(unittest.TestCase): ...@@ -813,7 +814,6 @@ class T_using_gpu(unittest.TestCase):
assert numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]) assert numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()])
def test_using_gpu_2(self): def test_using_gpu_2(self):
if theano.config.device.find('gpu') > -1: if theano.config.device.find('gpu') > -1:
...@@ -829,6 +829,7 @@ class T_using_gpu(unittest.TestCase): ...@@ -829,6 +829,7 @@ class T_using_gpu(unittest.TestCase):
rng = numpy.random.RandomState(22) rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX)) x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], sandbox.cuda.basic_ops.gpu_from_host(T.exp(x))) f = function([], sandbox.cuda.basic_ops.gpu_from_host(T.exp(x)))
# print f.maker.fgraph.toposort()
t0 = time.time() t0 = time.time()
for i in xrange(iters): for i in xrange(iters):
r = f() r = f()
...@@ -844,9 +845,6 @@ class T_using_gpu(unittest.TestCase): ...@@ -844,9 +845,6 @@ class T_using_gpu(unittest.TestCase):
assert not numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]) assert not numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()])
def test_using_gpu_3(self): def test_using_gpu_3(self):
if theano.config.device.find('gpu') >-1: if theano.config.device.find('gpu') >-1:
...@@ -864,6 +862,7 @@ class T_using_gpu(unittest.TestCase): ...@@ -864,6 +862,7 @@ class T_using_gpu(unittest.TestCase):
f = function([], f = function([],
Out(sandbox.cuda.basic_ops.gpu_from_host(T.exp(x)), Out(sandbox.cuda.basic_ops.gpu_from_host(T.exp(x)),
borrow=True)) borrow=True))
# print f.maker.fgraph.toposort()
t0 = time.time() t0 = time.time()
for i in xrange(iters): for i in xrange(iters):
r = f() r = f()
......