Commit 00183e72 authored by Olivier Delalleau

Merge pull request #905 from nouiz/add_exerc_docu_rebase

Documentation improvements
@@ -19,7 +19,7 @@ I wrote a new optimization, but it's not getting used...
 Remember that you have to register optimizations with the :ref:`optdb`
 for them to get used by the normal modes like FAST_COMPILE, FAST_RUN,
-and DEBUG_MODE.
+and DebugMode.
 I wrote a new optimization, and it changed my results even though I'm pretty sure it is correct.
...
@@ -168,7 +168,7 @@ not modify any of the inputs.
 TODO: EXPLAIN DESTROYMAP and VIEWMAP BETTER AND GIVE EXAMPLE.
 When developing an Op, you should run computations in DebugMode, by using
-argument ``mode='DEBUG_MODE'`` to ``theano.function``. DebugMode is
+argument ``mode='DebugMode'`` to ``theano.function``. DebugMode is
 slow, but it can catch many common violations of the Op contract.
 TODO: Like what? How? Talk about Python vs. C too.
...
@@ -6,15 +6,15 @@ Extending Theano
 ================
-This documentation is for users who want to extend Theano with new Types, new
+This advanced tutorial is for users who want to extend Theano with new Types, new
 Operations (Ops), and new graph optimizations.
 Along the way, it also introduces many aspects of how Theano works, so it is
 also good for you if you are interested in getting more under the hood with
 Theano itself.
-Before tackling this tutorial, it is highly recommended to read the
-:ref:`tutorial`.
+Before tackling this more advanced presentation, it is highly recommended to read the
+introductory :ref:`Tutorial<tutorial>`.
 The first few pages will walk you through the definition of a new :ref:`type`,
 ``double``, and basic arithmetic :ref:`operations <op>` on that Type. We
...
@@ -289,7 +289,7 @@ Example:
 f = T.function([a,b],[c],mode='FAST_RUN')
 m = theano.Module()
-minstance = m.make(mode='DEBUG_MODE')
+minstance = m.make(mode='DebugMode')
 Whenever possible, unit tests should omit this parameter. Leaving
 out the mode will ensure that unit tests use the default mode.
@@ -306,7 +306,7 @@ type this:
 THEANO_FLAGS='mode=FAST_COMPILE' nosetests
 THEANO_FLAGS='mode=FAST_RUN' nosetests
-THEANO_FLAGS='mode=DEBUG_MODE' nosetests
+THEANO_FLAGS='mode=DebugMode' nosetests
 .. _random_value_in_tests:
...
 .. _glossary:
-Glossary of terminology
-=======================
+Glossary
+========
 .. glossary::
...
@@ -190,12 +190,10 @@ Here is the state of that vision as of 24 October 2011 (after Theano release
 * Will provide better support for GPU on Windows and use an OpenCL backend on CPU.
 * Loops work, but not all related optimizations are currently done.
-* The cvm linker allows lazy evaluation. It works, but some work is still
-  needed before enabling it by default.
-  * All tests pass with linker=cvm?
-  * How to have `DEBUG_MODE` check it? Right now, DebugMode checks the computation non-lazily.
-  * The profiler used by cvm is less complete than `PROFILE_MODE`.
+* The cvm linker allows lazy evaluation. It is the current default linker.
+  * How to have `DebugMode` check it? Right now, DebugMode checks the computation non-lazily.
+  * The profiler used by cvm is less complete than `ProfileMode`.
 * SIMD parallelism on the CPU comes from the compiler.
 * Multi-core parallelism is only supported for gemv and gemm, and only
...
@@ -29,7 +29,7 @@ DebugMode can be used as follows:
 x = tensor.dvector('x')
-f = theano.function([x], 10*x, mode='DEBUG_MODE')
+f = theano.function([x], 10*x, mode='DebugMode')
 f(5)
 f(0)
@@ -42,7 +42,7 @@ It can also be used by passing a DebugMode instance as the mode, as in
 If any problem is detected, DebugMode will raise an exception according to
 what went wrong, either at call time (``f(5)``) or compile time (
-``f = theano.function(x, 10*x, mode='DEBUG_MODE')``). These exceptions
+``f = theano.function(x, 10*x, mode='DebugMode')``). These exceptions
 should *not* be ignored; talk to your local Theano guru or email the
 users list if you cannot make the exception go away.
@@ -51,7 +51,7 @@ In the example above, there is no way to guarantee that a future call to say,
 ``f(-1)`` won't cause a problem. DebugMode is not a silver bullet.
 If you instantiate DebugMode using the constructor ``compile.DebugMode``
-rather than the keyword ``DEBUG_MODE`` you can configure its behaviour via
+rather than the keyword ``DebugMode`` you can configure its behaviour via
 constructor arguments.
 Reference
@@ -133,7 +133,7 @@ Reference
-The keyword version of DebugMode (which you get by using ``mode='DEBUG_MODE'``)
+The keyword version of DebugMode (which you get by using ``mode='DebugMode'``)
 is quite strict, and can raise several different Exception types.
 The following are DebugMode exceptions you might encounter:
@@ -200,7 +200,7 @@ The following are DebugMode exceptions you might encounter:
 in the same order when run several times in a row. This can happen if any
 steps are ordered by ``id(object)`` somehow, such as via the default object
 hash function. A stochastic optimization invalidates the pattern of work
-whereby we debug in DEBUG_MODE and then run the full-size jobs in FAST_RUN.
+whereby we debug in DebugMode and then run the full-size jobs in FAST_RUN.
 .. class:: InvalidValueError(DebugModeError)
...
+.. _libdoc_compile_mode:
 ======================================
 :mod:`mode` -- controlling compilation
 ======================================
@@ -17,9 +20,10 @@ Theano defines the following modes by name:
 - ``'FAST_COMPILE'``: Apply just a few graph optimizations and only use Python implementations.
 - ``'FAST_RUN'``: Apply all optimizations, and use C implementations where possible.
-- ``'DEBUG_MODE'``: Verify the correctness of all optimizations, and compare C and python
-  implementations. This mode can take much longer than the other modes,
-  but can identify many kinds of problems.
+- ``'DebugMode'``: A mode for debugging. See :ref:`DebugMode <debugmode>` for details.
+- ``'ProfileMode'``: A mode for profiling. See :ref:`ProfileMode <profilemode>` for details.
+- ``'DEBUG_MODE'``: Deprecated. Use the string DebugMode.
+- ``'PROFILE_MODE'``: Deprecated. Use the string ProfileMode.
 The default mode is typically ``FAST_RUN``, but it can be controlled via the
 configuration variable :attr:`config.mode`, which can be
...
@@ -13,7 +13,7 @@
 Guide
 =====
-The config module contains many attributes that modify Theano's behavior. Many of these
+The config module contains many ``attributes`` that modify Theano's behavior. Many of these
 attributes are consulted during the import of the ``theano`` module and many are assumed to be
 read-only.
...
@@ -13,7 +13,7 @@
 .. toctree::
    :maxdepth: 1
-   fgraph
+   fg
    toolbox
    type
...
@@ -12,18 +12,18 @@
 Guide
 ======
-Symbolic printing: the Print() Op
----------------------------------
+Printing during execution
+-------------------------
 Intermediate values in a computation cannot be printed in
 the normal python way with the print statement, because Theano has no *statements*.
-Instead there is the `Print` Op.
+Instead there is the :class:`Print` Op.
 >>> x = T.dvector()
->>> hello_world_op = Print('hello world')
+>>> hello_world_op = printing.Print('hello world')
 >>> printed_x = hello_world_op(x)
 >>> f = function([x], printed_x)
->>> f([1,2,3])
+>>> f([1, 2, 3])
 >>> # output: "hello world __str__ = [ 1.  2.  3.]"
 If you print more than one thing in a function like `f`, they will not
@@ -39,15 +39,15 @@ Printing graphs
 ---------------
 Theano provides two functions (:func:`theano.pp` and
-:func:`theano.debugprint`) to print a graph to the terminal before or after
+:func:`theano.printing.debugprint`) to print a graph to the terminal before or after
 compilation. These two functions print expression graphs in different ways:
 :func:`pp` is more compact and math-like, :func:`debugprint` is more verbose.
-Theano also provides :func:`pydotprint` that creates a png image of the function.
+Theano also provides :func:`theano.printing.pydotprint` that creates a png image of the function.
 1) The first is :func:`theano.pp`.
 >>> x = T.dscalar('x')
->>> y = x**2
+>>> y = x ** 2
 >>> gy = T.grad(y, x)
 >>> pp(gy)  # print out the gradient prior to optimization
 '((fill((x ** 2), 1.0) * 2) * (x ** (2 - 1)))'
@@ -71,56 +71,63 @@ iteration number or other kinds of information in the name.
 To make graphs legible, :func:`pp` hides some Ops that are actually in the graph. For example,
 automatic DimShuffles are not shown.
-2) The second function to print a graph is :func:`theano.printing.debugprint(variable_or_function, depth=-1)`
+2) The second function to print a graph is :func:`theano.printing.debugprint`
 >>> theano.printing.debugprint(f.maker.fgraph.outputs[0])
-Elemwise{mul,no_inplace} 46950805397392
- 2.0 46950805310800
- x 46950804895504
+Elemwise{mul,no_inplace} [@A] ''
+ |TensorConstant{2.0} [@B]
+ |x [@C]
 Each line printed represents a Variable in the graph.
-The line `` x 46950804895504`` means the variable named 'x' at memory
-location 46950804895504. If you accidentally have two variables called 'x' in
-your graph, their different memory locations will be your clue.
+The line ``|x [@C]`` means the variable named ``x`` with debugprint identifier
+[@C] is an input of the Elemwise. If you accidentally have two variables called ``x`` in
+your graph, their different debugprint identifiers will be your clue.
-The line `` 2.0 46950805310800`` means that there is a constant 2.0 at the
-given memory location.
+The line ``|TensorConstant{2.0} [@B]`` means that there is a constant 2.0
+with this debugprint identifier.
-The line `` Elemwise{mul,no_inplace} 46950805397392`` is indented less than
+The line ``Elemwise{mul,no_inplace} [@A] ''`` is indented less than
 the other ones, because it means there is a variable computed by multiplying
 the other (more indented) ones together.
+The ``|`` symbols are just there to help read big graphs. They group
+together inputs to a node.
 Sometimes, you'll see a Variable but not the inputs underneath. That can
 happen when that Variable has already been printed. Where else has it been
-printed? Look for the memory address using the Find feature of your text
+printed? Look for the debugprint identifier using the Find feature of your text
 editor.
 >>> theano.printing.debugprint(gy)
-Elemwise{mul} 46950804894224
- Elemwise{mul} 46950804735120
-  Elemwise{second,no_inplace} 46950804626128
-   Elemwise{pow,no_inplace} 46950804625040
-    x 46950658736720
-    2 46950804039760
-   1.0 46950804625488
-  2 46950804039760
- Elemwise{pow} 46950804737616
-  x 46950658736720
-  Elemwise{sub} 46950804736720
-   2 46950804039760
-   InplaceDimShuffle{} 46950804736016
-    1 46950804735760
+Elemwise{mul} [@A] ''
+ |Elemwise{mul} [@B] ''
+ | |Elemwise{second,no_inplace} [@C] ''
+ | | |Elemwise{pow,no_inplace} [@D] ''
+ | | | |x [@E]
+ | | | |TensorConstant{2} [@F]
+ | | |TensorConstant{1.0} [@G]
+ | |TensorConstant{2} [@F]
+ |Elemwise{pow} [@H] ''
+ | |x [@E]
+ | |Elemwise{sub} [@I] ''
+ | | |TensorConstant{2} [@F]
+ | | |InplaceDimShuffle{} [@J] ''
+ | | | |TensorConstant{1} [@K]
 >>> theano.printing.debugprint(gy, depth=2)
-Elemwise{mul} 46950804894224
- Elemwise{mul} 46950804735120
- Elemwise{pow} 46950804737616
+Elemwise{mul} [@A] ''
+ |Elemwise{mul} [@B] ''
+ |Elemwise{pow} [@C] ''
 If the depth parameter is provided, it limits the number of levels that are
 shown.
-3) The function :func:`theano.printing.pydotprint(fct, outfile=SOME_DEFAULT_VALUE)` will print a compiled theano function to a png file.
+3) The function :func:`theano.printing.pydotprint` will print a compiled theano function to a png file.
 In the image, Apply nodes (the applications of ops) are shown as boxes and variables are shown as ovals.
 The number at the end of each label indicates graph position.
@@ -170,10 +177,13 @@ Reference
 running the function will print the value that `x` takes in the graph.
-.. function:: theano.printing.pp(*args)
+.. autofunction:: theano.printing.debugprint
 .. function:: theano.pp(*args)
-   TODO
+   Just a shortcut to :func:`theano.printing.pp`
+.. autofunction:: theano.printing.pp(*args)
-.. autofunction:: theano.printing.debugprint
+.. autofunction:: theano.printing.pydotprint
@@ -136,19 +136,35 @@ arange must have its length specified at creation time.
 Simple accumulation into a scalar, ditching lambda
 -------------------------------------------------
-This should be fairly self-explanatory.
+Although this example would seem almost self-explanatory, it stresses a
+pitfall to be careful of: the initial output state that is supplied, that is
+``outputs_info``, must be of a **shape similar to that of the output variable**
+generated at each iteration and moreover, it **must not involve an implicit
+downcast** of the latter.
 .. code-block:: python
+    import numpy as np
+    import theano
+    import theano.tensor as T
     up_to = T.iscalar("up_to")
     # define a named function, rather than using lambda
     def accumulate_by_adding(arange_val, sum_to_date):
         return sum_to_date + arange_val
+    seq = T.arange(up_to)
+    # An unauthorized implicit downcast from the dtype of 'seq', to that of
+    # 'T.as_tensor_variable(0)' which is of dtype 'int8' by default would occur
+    # if this instruction were to be used instead of the next one:
+    # outputs_info = T.as_tensor_variable(0)
+    outputs_info = T.as_tensor_variable(np.asarray(0, seq.dtype))
     scan_result, scan_updates = theano.scan(fn=accumulate_by_adding,
-                                            outputs_info=T.as_tensor_variable(0),
-                                            sequences=T.arange(up_to))
+                                            outputs_info=outputs_info,
+                                            sequences=seq)
     triangular_sequence = theano.function(inputs=[up_to], outputs=scan_result)
     # test
@@ -157,7 +173,6 @@ This should be fairly self-explanatory.
     print [n * (n + 1) // 2 for n in xrange(some_num)]
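For reference, the scan above simply threads a running sum through the sequence. A plain-Python sketch of the same recurrence (no Theano involved; ``triangular_sequence`` here is an ordinary function standing in for the compiled one) behaves identically:

```python
def accumulate_by_adding(arange_val, sum_to_date):
    # Same step function as in the scan example above.
    return sum_to_date + arange_val

def triangular_sequence(up_to):
    # Mimic theano.scan: feed each sequence element and the previous
    # output into the step function, collecting every output.
    outputs = []
    sum_to_date = 0  # plays the role of outputs_info
    for arange_val in range(up_to):
        sum_to_date = accumulate_by_adding(arange_val, sum_to_date)
        outputs.append(sum_to_date)
    return outputs

print(triangular_sequence(9))  # [0, 1, 3, 6, 10, 15, 21, 28, 36]
```

The collected outputs are the triangular numbers, matching the test's ``n * (n + 1) // 2`` check.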
 Another simple example
 ----------------------
...
 .. currentmodule:: tensor
+.. _libdoc_basic_tensor:
 ===========================
 Basic Tensor Functionality
 ===========================
@@ -532,7 +534,7 @@ dimensions, see :meth:`_tensor_py_operators.dimshuffle`.
-.. function:: shape_padright(x,n_ones = 1)
+.. function:: shape_padright(x, n_ones=1)
    Reshape `x` by right padding the shape with `n_ones` 1s. Note that all
    these new dimensions will be broadcastable. To make them non-broadcastable
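The padding behaviour described above can be sketched with plain NumPy; ``shape_padright_np`` is a hypothetical stand-in for illustration, not Theano's implementation:

```python
import numpy as np

def shape_padright_np(x, n_ones=1):
    # Append n_ones length-1 dimensions to the right of x's shape,
    # mirroring the description of shape_padright above.
    return np.reshape(x, np.shape(x) + (1,) * n_ones)

a = np.arange(6).reshape(2, 3)
print(shape_padright_np(a).shape)     # (2, 3, 1)
print(shape_padright_np(a, 2).shape)  # (2, 3, 1, 1)
```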
@@ -597,7 +599,7 @@ dimensions, see :meth:`_tensor_py_operators.dimshuffle`.
 Create a matrix by filling the shape of `a` with `b`
-.. function:: eye(n, m = None, k = 0, dtype=theano.config.floatX)
+.. function:: eye(n, m=None, k=0, dtype=theano.config.floatX)
    :param n: number of rows in output (value or theano scalar)
    :param m: number of columns in output (value or theano scalar)
@@ -1065,11 +1067,11 @@ Mathematical
 Returns a variable representing the exponential of a, i.e. e^a.
-.. function:: maximum(a,b)
+.. function:: maximum(a, b)
 Returns a variable representing the maximum, element by element, of a and b
-.. function:: minimum(a,b)
+.. function:: minimum(a, b)
 Returns a variable representing the minimum, element by element, of a and b
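The element-by-element semantics match NumPy's ``np.maximum``/``np.minimum``, which can serve as a quick sanity check outside Theano:

```python
import numpy as np

a = np.array([1, 5, 3])
b = np.array([4, 2, 3])
# Elementwise max/min, mirroring the semantics described above;
# NumPy is used here as a concrete stand-in for the symbolic ops.
print(np.maximum(a, b))  # [4 5 3]
print(np.minimum(a, b))  # [1 2 3]
```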
...
 .. _adding:
-========================================
-Baby steps - Adding two numbers together
-========================================
+====================
+Baby Steps - Algebra
+====================
-Adding two scalars
+Adding two Scalars
 ==================
-So, to get us started with Theano and get a feel of what we're working with,
+To get us started with Theano and get a feel of what we're working with,
 let's make a simple function: add two numbers together. Here is how you do
 it:
@@ -34,12 +33,12 @@ Let's break this down into several steps. The first step is to define
 two symbols (*Variables*) representing the quantities that you want
 to add. Note that from now on, we will use the term
 *Variable* to mean "symbol" (in other words,
-``x``, ``y``, ``z`` are all *Variable* objects). The output of the function
-``f`` is a ``numpy.ndarray`` with zero dimensions.
+*x*, *y*, *z* are all *Variable* objects). The output of the function
+*f* is a ``numpy.ndarray`` with zero dimensions.
 If you are following along and typing into an interpreter, you may have
 noticed that there was a slight delay in executing the ``function``
-instruction. Behind the scenes, ``f`` was being compiled into C code.
+instruction. Behind the scenes, *f* was being compiled into C code.
 .. note:
@@ -52,12 +51,10 @@ instruction. Behind the scenes, ``f`` was being compiled into C code.
 >>> x = theano.tensor.ivector()
 >>> y = -x
-``x`` and ``y`` are both Variables, i.e. instances of the
+*x* and *y* are both Variables, i.e. instances of the
 ``theano.gof.graph.Variable`` class. The
-type of both ``x`` and ``y`` is ``theano.tensor.ivector``.
+type of both *x* and *y* is ``theano.tensor.ivector``.
--------------------------------------------
 **Step 1**
@@ -68,9 +65,9 @@ In Theano, all symbols must be typed. In particular, ``T.dscalar``
 is the type we assign to "0-dimensional arrays (`scalar`) of doubles
 (`d`)". It is a Theano :ref:`type`.
-``dscalar`` is not a class. Therefore, neither ``x`` nor ``y``
+``dscalar`` is not a class. Therefore, neither *x* nor *y*
 are actually instances of ``dscalar``. They are instances of
-:class:`TensorVariable`. ``x`` and ``y``
+:class:`TensorVariable`. *x* and *y*
 are, however, assigned the theano Type ``dscalar`` in their ``type``
 field, as you can see here:
@@ -83,52 +80,49 @@ TensorType(float64, scalar)
 >>> x.type is T.dscalar
 True
+You can learn more about the structures in Theano in :ref:`graphstructures`.
 By calling ``T.dscalar`` with a string argument, you create a
 *Variable* representing a floating-point scalar quantity with the
 given name. If you provide no argument, the symbol will be unnamed. Names
 are not required, but they can help debugging.
-More will be said in a moment regarding Theano's inner structure. You
-could also learn more by looking into :ref:`graphstructures`.
--------------------------------------------
 **Step 2**
-The second step is to combine ``x`` and ``y`` into their sum ``z``:
+The second step is to combine *x* and *y* into their sum *z*:
 >>> z = x + y
-``z`` is yet another *Variable* which represents the addition of
-``x`` and ``y``. You can use the :ref:`pp <libdoc_printing>`
-function to pretty-print out the computation associated to ``z``.
+*z* is yet another *Variable* which represents the addition of
+*x* and *y*. You can use the :ref:`pp <libdoc_printing>`
+function to pretty-print out the computation associated to *z*.
 >>> print pp(z)
 (x + y)
--------------------------------------------
 **Step 3**
-The last step is to create a function taking ``x`` and ``y`` as inputs
-and giving ``z`` as output:
+The last step is to create a function taking *x* and *y* as inputs
+and giving *z* as output:
 >>> f = function([x, y], z)
 The first argument to :func:`function <function.function>` is a list of Variables
 that will be provided as inputs to the function. The second argument
 is a single Variable *or* a list of Variables. For either case, the second
-argument is what we want to see as output when we apply the function.
+argument is what we want to see as output when we apply the function. *f* may
+then be used like a normal Python function.
-``f`` may then be used like a normal Python function.
-Adding two matrices
+Adding two Matrices
 ===================
 You might already have guessed how to do this. Indeed, the only change
-from the previous example is that you need to instantiate ``x`` and
-``y`` using the matrix Types:
+from the previous example is that you need to instantiate *x* and
+*y* using the matrix Types:
 .. If you modify this code, also change :
 .. theano/tests/test_tutorial.py:T_adding.test_adding_2
@@ -138,14 +132,14 @@ from the previous example is that you need to instantiate *x* and
 >>> z = x + y
 >>> f = function([x, y], z)
-``dmatrix`` is the Type for matrices of doubles. And then we can use
+``dmatrix`` is the Type for matrices of doubles. Then we can use
 our new function on 2D arrays:
 >>> f([[1, 2], [3, 4]], [[10, 20], [30, 40]])
 array([[ 11.,  22.],
        [ 33.,  44.]])
-The variable is a numpy array. We can also use numpy arrays directly as
+The variable is a NumPy array. We can also use NumPy arrays directly as
 inputs:
 >>> import numpy
@@ -159,18 +153,36 @@ by :ref:`broadcasting <libdoc_tensor_broadcastable>`.
 The following types are available:
-* **byte**: bscalar, bvector, bmatrix, brow, bcol, btensor3, btensor4
-* **32-bit integers**: iscalar, ivector, imatrix, irow, icol, itensor3, itensor4
-* **64-bit integers**: lscalar, lvector, lmatrix, lrow, lcol, ltensor3, ltensor4
-* **float**: fscalar, fvector, fmatrix, frow, fcol, ftensor3, ftensor4
-* **double**: dscalar, dvector, dmatrix, drow, dcol, dtensor3, dtensor4
-* **complex**: cscalar, cvector, cmatrix, crow, ccol, ctensor3, ctensor4
+* **byte**: ``bscalar, bvector, bmatrix, brow, bcol, btensor3, btensor4``
+* **16-bit integers**: ``wscalar, wvector, wmatrix, wrow, wcol, wtensor3, wtensor4``
+* **32-bit integers**: ``iscalar, ivector, imatrix, irow, icol, itensor3, itensor4``
+* **64-bit integers**: ``lscalar, lvector, lmatrix, lrow, lcol, ltensor3, ltensor4``
+* **float**: ``fscalar, fvector, fmatrix, frow, fcol, ftensor3, ftensor4``
+* **double**: ``dscalar, dvector, dmatrix, drow, dcol, dtensor3, dtensor4``
+* **complex**: ``cscalar, cvector, cmatrix, crow, ccol, ctensor3, ctensor4``
-The previous list is not exhaustive. A guide to all types compatible
-with numpy arrays may be found :ref:`here <libdoc_tensor_creation>`.
+The previous list is not exhaustive and a guide to all types compatible
+with NumPy arrays may be found here: :ref:`tensor creation<libdoc_tensor_creation>`.
 .. note::
    You, the user---not the system architecture---have to choose whether your
    program will use 32- or 64-bit integers (``i`` prefix vs. the ``l`` prefix)
    and floats (``f`` prefix vs. the ``d`` prefix).
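Assuming the usual Theano conventions, the one-letter prefixes in the list above correspond to NumPy dtypes roughly as follows (a sketch for orientation, not an official table):

```python
import numpy as np

# Sketch: prefix -> NumPy dtype for the constructors listed above
# (b = byte, w = 16-bit, i = 32-bit, l = 64-bit,
#  f = float, d = double, c = complex).
prefix_dtype = {
    'b': np.int8,
    'w': np.int16,
    'i': np.int32,
    'l': np.int64,
    'f': np.float32,
    'd': np.float64,
    'c': np.complex64,
}

# So the data held by an ivector is int32, independent of the platform:
iv = np.array([0, 1, 2], dtype=prefix_dtype['i'])
print(iv.dtype)  # int32
```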
-------------------------------------------
**Exercise**

.. code-block:: python

    import theano
    a = theano.tensor.vector()  # declare variable
    out = a + a ** 10           # build symbolic expression
    f = theano.function([a], out)  # compile function
    print f([0, 1, 2])  # prints `array([0, 2, 1026])`

Modify and execute this code to compute this expression: a ** 2 + b ** 2 + 2 * a * b.

:download:`Solution<adding_solution_1.py>`

#!/usr/bin/env python
# Theano tutorial
# Solution to Exercise in section 'Baby Steps - Algebra'
import theano
a = theano.tensor.vector()  # declare variable
b = theano.tensor.vector()  # declare variable
out = a ** 2 + b ** 2 + 2 * a * b  # build symbolic expression
f = theano.function([a, b], out)  # compile function
print f([1, 2], [4, 5])  # prints [ 25.  49.]
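As a side note, the exercise's expression is just the expansion of ``(a + b) ** 2``, which is easy to verify with plain NumPy (outside Theano):

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([4.0, 5.0])
# a**2 + b**2 + 2ab is the binomial expansion of (a + b)**2.
out = a ** 2 + b ** 2 + 2 * a * b
print(out)           # [25. 49.]
print((a + b) ** 2)  # [25. 49.]
```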
Diff collapsed.
@@ -4,53 +4,56 @@
 Conditions
 ==========
-**IfElse vs switch**
+IfElse vs Switch
+================
-- Build condition over symbolic variables.
-- IfElse Op takes a `boolean` condition and two variables to compute as input.
-- Switch take a `tensor` as condition and two variables to compute as input.
-- Switch is an elementwise operation. It is more general than IfElse.
-- While Switch Op evaluates both 'output' variables, IfElse Op is lazy and only
-  evaluates one variable respect to the condition.
+- Both ops build a condition over symbolic variables.
+- ``IfElse`` takes a *boolean* condition and two variables as inputs.
+- ``Switch`` takes a *tensor* as condition and two variables as inputs.
+  ``switch`` is an elementwise operation and is thus more general than ``ifelse``.
+- Whereas ``switch`` evaluates both *output* variables, ``ifelse`` is lazy and only
+  evaluates one variable with respect to the condition.
**Example** **Example**
.. code-block:: python .. code-block:: python
from theano import tensor as T from theano import tensor as T
from theano.ifelse import ifelse from theano.ifelse import ifelse
import theano, time, numpy import theano, time, numpy
a,b = T.scalars('a','b') a,b = T.scalars('a', 'b')
x,y = T.matrices('x','y') x,y = T.matrices('x', 'y')
z_switch = T.switch(T.lt(a,b), T.mean(x), T.mean(y)) z_switch = T.switch(T.lt(a, b), T.mean(x), T.mean(y))
z_lazy = ifelse(T.lt(a,b), T.mean(x), T.mean(y)) z_lazy = ifelse(T.lt(a, b), T.mean(x), T.mean(y))
f_switch = theano.function([a,b,x,y], z_switch, f_switch = theano.function([a, b, x, y], z_switch,
mode=theano.Mode(linker='vm')) mode=theano.Mode(linker='vm'))
f_lazyifelse = theano.function([a,b,x,y], z_lazy, f_lazyifelse = theano.function([a, b, x, y], z_lazy,
mode=theano.Mode(linker='vm')) mode=theano.Mode(linker='vm'))
val1 = 0. val1 = 0.
val2 = 1. val2 = 1.
big_mat1 = numpy.ones((10000,1000)) big_mat1 = numpy.ones((10000, 1000))
big_mat2 = numpy.ones((10000,1000)) big_mat2 = numpy.ones((10000, 1000))
n_times = 10 n_times = 10
tic = time.clock() tic = time.clock()
for i in xrange(n_times): for i in xrange(n_times):
f_switch(val1, val2, big_mat1, big_mat2) f_switch(val1, val2, big_mat1, big_mat2)
print 'time spent evaluating both values %f sec'%(time.clock()-tic) print 'time spent evaluating both values %f sec' % (time.clock() - tic)
tic = time.clock() tic = time.clock()
for i in xrange(n_times): for i in xrange(n_times):
f_lazyifelse(val1, val2, big_mat1, big_mat2) f_lazyifelse(val1, val2, big_mat1, big_mat2)
print 'time spent evaluating one value %f sec'%(time.clock()-tic) print 'time spent evaluating one value %f sec' % (time.clock() - tic)
In this example, IfElse Op spend less time (about an half) than Switch In this example, the ``IfElse`` op spends less time (about half as much) than ``Switch``
since it computes only one variable instead of both. since it computes only one variable out of the two.
.. code-block:: python .. code-block:: python
...@@ -59,11 +62,10 @@ since it computes only one variable instead of both. ...@@ -59,11 +62,10 @@ since it computes only one variable instead of both.
time spent evaluating one value 0.3500 sec time spent evaluating one value 0.3500 sec
It is actually important to use ``linker='vm'`` or ``linker='cvm'``, Unless ``linker='vm'`` or ``linker='cvm'`` are used, ``ifelse`` will compute both
otherwise IfElse will compute both variables and take the same computation variables and take the same computation time as ``switch``. Although the linker
time as the Switch Op. The linker is not currently set by default to 'cvm' but is not currently set by default to ``cvm``, it will be in the near future.
it will be in a near future.
There is not an optimization to automatically change a switch with a There is no automatic optimization replacing a ``switch`` with a
broadcasted scalar to an ifelse, as this is not always the faster. See broadcasted scalar to an ``ifelse``, as this is not always faster. See
this `ticket <http://www.assembla.com/spaces/theano/tickets/764>`_. this `ticket <http://www.assembla.com/spaces/theano/tickets/764>`_.
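The lazy-versus-eager distinction above is not specific to Theano; a minimal plain-Python sketch (with hypothetical helper names) shows why the eager form always pays for both branches:

```python
calls = []

def expensive(name, value):
    # Stand-in for a costly computation; records that it ran.
    calls.append(name)
    return value

def eager_switch(cond, a, b):
    # Like T.switch: both branch values were already computed by the caller.
    return a if cond else b

def lazy_ifelse(cond, a_fn, b_fn):
    # Like ifelse: only the selected branch is ever evaluated.
    return a_fn() if cond else b_fn()

r1 = eager_switch(True, expensive("x", 1), expensive("y", 2))
r2 = lazy_ifelse(True, lambda: expensive("x2", 1), lambda: expensive("y2", 2))
print(r1, r2, calls)  # 1 1 ['x', 'y', 'x2']
```

Both helpers return the same value; only the recorded calls differ, which is exactly the cost the timing example measures.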
#!/usr/bin/env python
# Theano tutorial
# Solution to Exercise in section 'Extending Theano'
import unittest

import theano


# 1. Op returns x * y
class ProdOp(theano.Op):
    def __eq__(self, other):
        return type(self) == type(other)

    def __hash__(self):
        return hash(type(self))

    def __str__(self):
        return self.__class__.__name__

    def make_node(self, x, y):
        x = theano.tensor.as_tensor_variable(x)
        y = theano.tensor.as_tensor_variable(y)
        outdim = x.ndim
        output = (theano.tensor.TensorType
                  (dtype=theano.scalar.upcast(x.dtype, y.dtype),
                   broadcastable=[False] * outdim)())
        return theano.Apply(self, inputs=[x, y], outputs=[output])

    def perform(self, node, inputs, output_storage):
        x, y = inputs
        z = output_storage[0]
        z[0] = x * y

    def infer_shape(self, node, i0_shapes):
        return [i0_shapes[0]]

    def grad(self, inputs, output_grads):
        return [output_grads[0] * inputs[1], output_grads[0] * inputs[0]]


# 2. Op returns x + y and x - y
class SumDiffOp(theano.Op):
    def __eq__(self, other):
        return type(self) == type(other)

    def __hash__(self):
        return hash(type(self))

    def __str__(self):
        return self.__class__.__name__

    def make_node(self, x, y):
        x = theano.tensor.as_tensor_variable(x)
        y = theano.tensor.as_tensor_variable(y)
        outdim = x.ndim
        output1 = (theano.tensor.TensorType
                   (dtype=theano.scalar.upcast(x.dtype, y.dtype),
                    broadcastable=[False] * outdim)())
        output2 = (theano.tensor.TensorType
                   (dtype=theano.scalar.upcast(x.dtype, y.dtype),
                    broadcastable=[False] * outdim)())
        return theano.Apply(self, inputs=[x, y], outputs=[output1, output2])

    def perform(self, node, inputs, output_storage):
        x, y = inputs
        z1, z2 = output_storage
        z1[0] = x + y
        z2[0] = x - y

    def infer_shape(self, node, i0_shapes):
        return [i0_shapes[0], i0_shapes[0]]

    def grad(self, inputs, output_grads):
        og1, og2 = output_grads
        if og1 is None:
            og1 = theano.tensor.zeros_like(og2)
        if og2 is None:
            og2 = theano.tensor.zeros_like(og1)
        return [og1 + og2, og1 - og2]


# 3. Testing apparatus
import numpy
from theano.gof import Op, Apply
from theano import tensor, function, printing
from theano.tests import unittest_tools as utt


class TestProdOp(utt.InferShapeTester):

    rng = numpy.random.RandomState(43)

    def setUp(self):
        super(TestProdOp, self).setUp()
        self.op_class = ProdOp  # case 1

    def test_perform(self):
        x = theano.tensor.matrix()
        y = theano.tensor.matrix()
        f = theano.function([x, y], self.op_class()(x, y))
        x_val = numpy.random.rand(5, 4)
        y_val = numpy.random.rand(5, 4)
        out = f(x_val, y_val)
        assert numpy.allclose(x_val * y_val, out)

    def test_gradient(self):
        utt.verify_grad(self.op_class(), [numpy.random.rand(5, 4),
                                          numpy.random.rand(5, 4)],
                        n_tests=1, rng=TestProdOp.rng)

    def test_infer_shape(self):
        x = tensor.dmatrix()
        y = tensor.dmatrix()
        self._compile_and_check([x, y], [self.op_class()(x, y)],
                                [numpy.random.rand(5, 6),
                                 numpy.random.rand(5, 6)],
                                self.op_class)


class TestSumDiffOp(utt.InferShapeTester):

    rng = numpy.random.RandomState(43)

    def setUp(self):
        super(TestSumDiffOp, self).setUp()
        self.op_class = SumDiffOp

    def test_perform(self):
        x = theano.tensor.matrix()
        y = theano.tensor.matrix()
        f = theano.function([x, y], self.op_class()(x, y))
        x_val = numpy.random.rand(5, 4)
        y_val = numpy.random.rand(5, 4)
        out = f(x_val, y_val)
        assert numpy.allclose([x_val + y_val, x_val - y_val], out)

    def test_gradient(self):
        def output_0(x, y):
            return self.op_class()(x, y)[0]

        def output_1(x, y):
            return self.op_class()(x, y)[1]

        utt.verify_grad(output_0, [numpy.random.rand(5, 4),
                                   numpy.random.rand(5, 4)],
                        n_tests=1, rng=TestSumDiffOp.rng)
        utt.verify_grad(output_1, [numpy.random.rand(5, 4),
                                   numpy.random.rand(5, 4)],
                        n_tests=1, rng=TestSumDiffOp.rng)

    def test_infer_shape(self):
        x = tensor.dmatrix()
        y = tensor.dmatrix()
        # adapt the choice of the next instruction to the op under test
        self._compile_and_check([x, y], self.op_class()(x, y),
                                [numpy.random.rand(5, 6),
                                 numpy.random.rand(5, 6)],
                                self.op_class)


if __name__ == "__main__":
    unittest.main()
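The gradient rule returned by ``SumDiffOp.grad`` above, ``[og1 + og2, og1 - og2]``, can be checked by hand with a scalar finite-difference sketch in plain Python, independent of Theano:

```python
# For z1 = x + y and z2 = x - y and a cost c(z1, z2), the chain rule
# gives dc/dx = dc/dz1 + dc/dz2 and dc/dy = dc/dz1 - dc/dz2.

def cost(x, y):
    z1, z2 = x + y, x - y
    return 3.0 * z1 + 5.0 * z2  # so dc/dz1 = 3 and dc/dz2 = 5

og1, og2 = 3.0, 5.0
analytic = (og1 + og2, og1 - og2)  # the rule implemented in grad()

# Central finite differences at an arbitrary point.
eps = 1e-6
x0, y0 = 1.0, 2.0
num_dx = (cost(x0 + eps, y0) - cost(x0 - eps, y0)) / (2 * eps)
num_dy = (cost(x0, y0 + eps) - cost(x0, y0 - eps)) / (2 * eps)
print(analytic, num_dx, num_dy)  # (8.0, -2.0), with numeric values close by
```

This is essentially what ``utt.verify_grad`` automates for the tests above.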
...@@ -8,33 +8,46 @@ Frequently Asked Questions

TypeError: object of type 'TensorVariable' has no len()
-------------------------------------------------------

If you receive the following error, it is because the Python function ``__len__``
cannot be implemented on Theano variables:

.. code-block:: python

TypeError: object of type 'TensorVariable' has no len()

Python requires that ``__len__`` return an integer, which cannot be done since
Theano's variables are symbolic. However, ``var.shape[0]`` can be used as a workaround.

This error message cannot be made more explicit because the relevant aspects of
Python's internals cannot be modified.
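The constraint can be reproduced in a few lines of plain Python: ``len()`` insists on an integer, so any placeholder object returned by a toy ``__len__`` is rejected.

```python
class Symbolic(object):
    """Toy stand-in for a symbolic variable whose length is not yet known."""
    def __len__(self):
        return "unknown"  # not an int, so Python rejects it

try:
    len(Symbolic())
except TypeError as e:
    print("TypeError:", e)
```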
Faster gcc optimization
-----------------------

You can enable faster gcc optimization with the ``cxxflags``. This list of flags was suggested on the mailing list::

    cxxflags=-march=native -O3 -ffast-math -ftree-loop-distribution -funroll-loops -ftracer

Use it at your own risk. Some people have warned that the ``-ftree-loop-distribution`` optimization produced wrong results in the past.

Also the ``-march=native`` flag must be used with care if you have NFS. In that case, you MUST set the compiledir to a local path of the computer.
Related Projects
----------------

We try to list other Theano-related projects in this `wiki page <https://github.com/Theano/Theano/wiki/Related-projects>`_.
"What are Theano's Limitations?"
--------------------------------
Theano offers a good amount of flexibility, but has some limitations too.
You must answer for yourself the following question: How can my algorithm be cleverly written
so as to make the most of what Theano can do?
Here is a list of some of the known limitations:
- *While*- or *for*-Loops within an expression graph are supported, but only via
the :func:`theano.scan` op (which puts restrictions on how the loop body can
interact with the rest of the graph).
- Neither *goto* nor *recursion* is supported or planned within expression graphs.
...@@ -7,54 +7,130 @@ PyCUDA/CUDAMat/Gnumpy compatibility

PyCUDA
======

Currently, PyCUDA and Theano have different objects to store GPU
data. The two implementations do not support the same set of features.
Theano's implementation is called *CudaNdarray* and supports
*strides*, but only the *float32* dtype. PyCUDA's implementation
is called *GPUArray* and doesn't support *strides*. However, it can deal with
all NumPy and CUDA dtypes.

We are currently working on a common base object for both that will
also mimic NumPy. Until this is ready, here is some information on how to
use both objects in the same script.

Transfer
--------

You can use the ``theano.misc.pycuda_utils`` module to convert GPUArray to and
from CudaNdarray. The functions ``to_cudandarray(x, copyif=False)`` and
``to_gpuarray(x)`` return a new object that occupies the same memory space
as the original. Because GPUArrays don't support strides, a strided
CudaNdarray cannot be converted in place: by default ``to_gpuarray`` raises a
*ValueError* on strided input, and with ``copyif=True`` it returns a
non-strided copy that does not share the same memory region.
Compiling with PyCUDA
---------------------

You can use PyCUDA to compile CUDA functions that work directly on
CudaNdarrays. Here is an example from the file
``theano/misc/tests/test_pycuda_theano_simple.py``:

.. code-block:: python

import sys

import numpy
import theano
import theano.sandbox.cuda as cuda_ndarray
import theano.misc.pycuda_init
import pycuda
import pycuda.driver as drv
import pycuda.gpuarray


def test_pycuda_theano():
    """Simple example with pycuda function and Theano CudaNdarray object."""
    from pycuda.compiler import SourceModule
    mod = SourceModule("""
__global__ void multiply_them(float *dest, float *a, float *b)
{
    const int i = threadIdx.x;
    dest[i] = a[i] * b[i];
}
""")

    multiply_them = mod.get_function("multiply_them")

    a = numpy.random.randn(100).astype(numpy.float32)
    b = numpy.random.randn(100).astype(numpy.float32)

    # Test with Theano object
    ga = cuda_ndarray.CudaNdarray(a)
    gb = cuda_ndarray.CudaNdarray(b)
    dest = cuda_ndarray.CudaNdarray.zeros(a.shape)
    multiply_them(dest, ga, gb,
                  block=(400, 1, 1), grid=(1, 1))
    assert (numpy.asarray(dest) == a * b).all()

Theano Op using a PyCUDA function
---------------------------------

You can use a GPU function compiled with PyCUDA in a Theano op:

.. code-block:: python

import numpy, theano
import theano.misc.pycuda_init
from pycuda.compiler import SourceModule
import theano.sandbox.cuda as cuda


class PyCUDADoubleOp(theano.Op):
    def __eq__(self, other):
        return type(self) == type(other)

    def __hash__(self):
        return hash(type(self))

    def __str__(self):
        return self.__class__.__name__

    def make_node(self, inp):
        inp = cuda.basic_ops.gpu_contiguous(
            cuda.basic_ops.as_cuda_ndarray_variable(inp))
        assert inp.dtype == "float32"
        return theano.Apply(self, [inp], [inp.type()])

    def make_thunk(self, node, storage_map, _, _2):
        mod = SourceModule("""
__global__ void my_fct(float * i0, float * o0, int size) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < size) {
        o0[i] = i0[i] * 2;
    }
}""")
        pycuda_fct = mod.get_function("my_fct")
        inputs = [storage_map[v] for v in node.inputs]
        outputs = [storage_map[v] for v in node.outputs]

        def thunk():
            z = outputs[0]
            if z[0] is None or z[0].shape != inputs[0][0].shape:
                z[0] = cuda.CudaNdarray.zeros(inputs[0][0].shape)
            grid = (int(numpy.ceil(inputs[0][0].size / 512.)), 1)
            pycuda_fct(inputs[0][0], z[0], numpy.intc(inputs[0][0].size),
                       block=(512, 1, 1), grid=grid)
        return thunk
CUDAMat
=======

There are conversion functions between CUDAMat objects and Theano's CudaNdarray objects.
They follow the same principles as the PyCUDA converters and can be found in
``theano.misc.cudamat_utils.py``.

.. TODO: this statement is unclear:

WARNING: There is a peculiar problem associated with stride/shape with those converters.
In order to work, the test needs a *transpose* and *reshape*...

Gnumpy
======

There are conversion functions between Gnumpy *garray* objects and Theano CudaNdarray objects.
They are also similar to the PyCUDA converters and can be found in ``theano.misc.gnumpy_utils.py``.
...@@ -5,20 +5,21 @@
Tutorial
========

Let us start an interactive session (e.g. with ``python`` or ``ipython``) and import Theano.

>>> from theano import *

Several of the symbols you will need to use are in the ``tensor`` subpackage
of Theano. Let us import that subpackage under a handy name like
``T`` (the tutorials will frequently use this convention).

>>> import theano.tensor as T

If that succeeded you are ready for the tutorial, otherwise check your
installation (see :ref:`install`).

Throughout the tutorial, bear in mind that there is a :ref:`glossary` as well
as *index* and *modules* links in the upper-right corner of each page to help
you out.
.. toctree::
...@@ -27,18 +28,18 @@ you out.

    numpy
    adding
    examples
    symbolic_graphs
    printing_drawing
    gradients
    modes
    loading_and_saving
    conditions
    loop
    sparse
    using_gpu
    gpu_data_convert
    aliasing
    shape_info
    debug_faq
    extending_theano
    faq
...@@ -6,8 +6,8 @@ Loading and Saving
==================

Python's standard way of saving class instances and reloading them
is the pickle_ mechanism. Many Theano objects can be *serialized* (and
*deserialized*) by ``pickle``; however, a limitation of ``pickle`` is that
it does not save the code or data of a class along with the instance of
the class being serialized. As a result, reloading objects created by a
previous version of a class can be really problematic.

...@@ -24,7 +24,7 @@ as you would in the course of any other Python program.

.. _pickle: http://docs.python.org/library/pickle.html

The Basics of Pickling
======================

The two modules ``pickle`` and ``cPickle`` have the same functionalities, but
...@@ -45,7 +45,7 @@ You can serialize (or *save*, or *pickle*) objects to a file with

.. note::
    If you want your saved object to be stored efficiently, don't forget
    to use ``cPickle.HIGHEST_PROTOCOL``. The resulting file can be
    dozens of times smaller than with the default protocol.

.. note::
...@@ -81,7 +81,7 @@ For more details about pickle's usage, see
`Python documentation <http://docs.python.org/library/pickle.html#usage>`_.
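To see the size difference mentioned in the note above, compare the protocols directly (shown here with Python 3's ``pickle``, which absorbed ``cPickle``):

```python
import pickle

data = {"weights": list(range(1000))}
text_proto = pickle.dumps(data, 0)  # oldest, text-based protocol
binary_proto = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)
print(len(text_proto), len(binary_proto))  # the binary form is much smaller
```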
Short-Term Serialization
========================

If you are confident that the class instance you are serializing will be
...@@ -114,7 +114,7 @@ For instance, you can define functions along the lines of:

        self.training_set = cPickle.load(file(self.training_set_file, 'rb'))

Long-Term Serialization
=======================

If the implementation of the class you want to save is quite unstable, for
...@@ -126,7 +126,7 @@ maybe defining the attributes you want to save, rather than the ones you
don't.

For instance, if the only parameters you want to save are a weight
matrix *W* and a bias *b*, you can define:

.. code-block:: python

...@@ -138,8 +138,8 @@ matrix ``W`` and a bias ``b``, you can define:
        self.W = W
        self.b = b

If at some point in time *W* is renamed to *weights* and *b* to
*bias*, the older pickled files will still be usable, if you update these
functions to reflect the change in name:

.. code-block:: python

...@@ -152,6 +152,6 @@ functions to reflect the change in name:
        self.weights = W
        self.bias = b
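Putting the two pieces together, here is a self-contained sketch of the rename-tolerant pattern with a hypothetical ``Model`` class (Python 3 ``pickle``):

```python
import pickle

class Model(object):
    def __init__(self):
        self.weights = [0.1, 0.2]  # formerly named W
        self.bias = 0.0            # formerly named b

    def __getstate__(self):
        # Keep writing the old attribute names so old readers still work.
        return {'W': self.weights, 'b': self.bias}

    def __setstate__(self, d):
        # Map the old pickled names onto the new attributes.
        self.weights = d['W']
        self.bias = d['b']

m = pickle.loads(pickle.dumps(Model()))
print(m.weights, m.bias)  # [0.1, 0.2] 0.0
```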
For more information on advanced use of ``pickle`` and its internals, see Python's
pickle_ documentation.
...@@ -4,4 +4,94 @@
Loop
====

Scan
====
- A general form of *recurrence*, which can be used for looping.
- *Reduction* and *map* (loop over the leading dimensions) are special cases of ``scan``.
- You ``scan`` a function along some input sequence, producing an output at each time-step.
- The function can see the *previous K time-steps* of your function.
- ``sum()`` could be computed by scanning the *z + x(i)* function over a list, given an initial state of *z=0*.
- Often a *for* loop can be expressed as a ``scan()`` operation, and ``scan`` is the closest that Theano comes to looping.
- Advantages of using ``scan`` over *for* loops:

  - The number of iterations can be part of the symbolic graph.
  - It minimizes GPU transfers (if a GPU is involved).
  - It computes gradients through sequential steps.
  - It is slightly faster than a *for* loop in Python with a compiled Theano function.
  - It can lower the overall memory usage by detecting the actual amount of memory needed.
The full documentation can be found in the library: :ref:`Scan <lib_scan>`.
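To make the recurrence concrete before the Theano examples, here is a minimal plain-Python model of what ``scan`` does (a hypothetical helper; it has none of the real op's symbolic-graph or gradient machinery):

```python
def scan(fn, n_steps, outputs_info, non_sequences):
    """Apply fn(previous_output, *non_sequences) n_steps times."""
    outputs = []
    prev = outputs_info
    for _ in range(n_steps):
        prev = fn(prev, *non_sequences)
        outputs.append(prev)
    return outputs

# Elementwise A ** k by repeated multiplication, here with k = 2.
A = [0.0, 1.0, 2.0, 3.0]
result = scan(lambda prev, a: [p * x for p, x in zip(prev, a)],
              n_steps=2, outputs_info=[1.0] * len(A), non_sequences=[A])
print(result[-1])  # [0.0, 1.0, 4.0, 9.0]
```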
**Scan Example: Computing pow(A, k)**
.. code-block:: python
import theano
import theano.tensor as T
theano.config.warn.subtensor_merge_bug = False
k = T.iscalar("k")
A = T.vector("A")
def inner_fct(prior_result, A):
    return prior_result * A
# Symbolic description of the result
result, updates = theano.scan(fn=inner_fct,
outputs_info=T.ones_like(A),
non_sequences=A, n_steps=k)
# Scan has provided us with A ** 1 through A ** k. Keep only the last
# value. Scan notices this and does not waste memory saving them.
final_result = result[-1]
power = theano.function(inputs=[A, k], outputs=final_result,
updates=updates)
print power(range(10), 2)
# [  0.   1.   4.   9.  16.  25.  36.  49.  64.  81.]
**Scan Example: Calculating a Polynomial**
.. code-block:: python
import numpy
import theano
import theano.tensor as T
theano.config.warn.subtensor_merge_bug = False
coefficients = theano.tensor.vector("coefficients")
x = T.scalar("x")
max_coefficients_supported = 10000
# Generate the components of the polynomial
full_range = theano.tensor.arange(max_coefficients_supported)
components, updates = theano.scan(fn=lambda coeff, power, free_var:
coeff * (free_var ** power),
outputs_info=None,
sequences=[coefficients, full_range],
non_sequences=x)
polynomial = components.sum()
calculate_polynomial = theano.function(inputs=[coefficients, x],
outputs=polynomial)
test_coeff = numpy.asarray([1, 0, 2], dtype=numpy.float32)
print calculate_polynomial(test_coeff, 3)
# 19.0
-------------------------------------------
**Exercise**
Run both examples.
Modify and execute the polynomial example to have the reduction done by ``scan``.
:download:`Solution<loop_solution_1.py>`
#!/usr/bin/env python
# Theano tutorial
# Solution to Exercise in section 'Loop'
import numpy
import theano
import theano.tensor as tt
# 1. First example
theano.config.warn.subtensor_merge_bug = False
k = tt.iscalar("k")
A = tt.vector("A")
def inner_fct(prior_result, A):
    return prior_result * A
# Symbolic description of the result
result, updates = theano.scan(fn=inner_fct,
outputs_info=tt.ones_like(A),
non_sequences=A, n_steps=k)
# Scan has provided us with A ** 1 through A ** k. Keep only the last
# value. Scan notices this and does not waste memory saving them.
final_result = result[-1]
power = theano.function(inputs=[A, k], outputs=final_result,
updates=updates)
print power(range(10), 2)
# [ 0. 1. 4. 9. 16. 25. 36. 49. 64. 81.]
# 2. Second example
coefficients = tt.vector("coefficients")
x = tt.scalar("x")
max_coefficients_supported = 10000
# Generate the components of the polynomial
full_range = tt.arange(max_coefficients_supported)
components, updates = theano.scan(fn=lambda coeff, power, free_var:
coeff * (free_var ** power),
sequences=[coefficients, full_range],
outputs_info=None,
non_sequences=x)
polynomial = components.sum()
calculate_polynomial1 = theano.function(inputs=[coefficients, x],
outputs=polynomial)
test_coeff = numpy.asarray([1, 0, 2], dtype=numpy.float32)
print calculate_polynomial1(test_coeff, 3)
# 19.0
# 3. Reduction performed inside scan
theano.config.warn.subtensor_merge_bug = False
coefficients = tt.vector("coefficients")
x = tt.scalar("x")
max_coefficients_supported = 10000
# Generate the components of the polynomial
full_range = tt.arange(max_coefficients_supported)
outputs_info = tt.as_tensor_variable(numpy.asarray(0, 'float64'))
components, updates = theano.scan(fn=lambda coeff, power, prior_value, free_var:
prior_value + (coeff * (free_var ** power)),
sequences=[coefficients, full_range],
outputs_info=outputs_info,
non_sequences=x)
polynomial = components[-1]
calculate_polynomial = theano.function(inputs=[coefficients, x],
outputs=polynomial, updates=updates)
test_coeff = numpy.asarray([1, 0, 2], dtype=numpy.float32)
print calculate_polynomial(test_coeff, 3)
# 19.0
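For comparison, the reduction performed by the third scan above is just a running accumulation; in plain Python it reads:

```python
# state <- state + coeff * x ** power, accumulated over the coefficients,
# evaluating 1 + 0 * x + 2 * x ** 2 at x = 3.
coefficients = [1.0, 0.0, 2.0]
x = 3.0
state = 0.0
for power, coeff in enumerate(coefficients):
    state = state + coeff * (x ** power)
print(state)  # 19.0
```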
#!/usr/bin/env python
# Theano tutorial
# Solution to Exercise in section 'Configuration Settings and Compiling Modes'
import numpy
import theano
import theano.tensor as tt
theano.config.floatX = 'float32'
rng = numpy.random
N = 400
feats = 784
D = (rng.randn(N, feats).astype(theano.config.floatX),
rng.randint(size=N, low=0, high=2).astype(theano.config.floatX))
training_steps = 10000
# Declare Theano symbolic variables
x = tt.matrix("x")
y = tt.vector("y")
w = theano.shared(rng.randn(feats).astype(theano.config.floatX), name="w")
b = theano.shared(numpy.asarray(0., dtype=theano.config.floatX), name="b")
x.tag.test_value = D[0]
y.tag.test_value = D[1]
#print "Initial model:"
#print w.get_value(), b.get_value()
# Construct Theano expression graph
p_1 = 1 / (1 + tt.exp(-tt.dot(x, w) - b))  # Probability of having a one
prediction = p_1 > 0.5 # The prediction that is done: 0 or 1
xent = -y * tt.log(p_1) - (1 - y) * tt.log(1 - p_1) # Cross-entropy
cost = tt.cast(xent.mean(), 'float32') + \
0.01 * (w ** 2).sum() # The cost to optimize
gw, gb = tt.grad(cost, [w, b])
# Compile expressions to functions
train = theano.function(
inputs=[x, y],
outputs=[prediction, xent],
updates={w: w - 0.01 * gw, b: b - 0.01 * gb},
name="train")
predict = theano.function(inputs=[x], outputs=prediction,
name="predict")
if any([x.op.__class__.__name__ in ['Gemv', 'CGemv', 'Gemm', 'CGemm'] for x in
        train.maker.fgraph.toposort()]):
    print 'Used the cpu'
elif any([x.op.__class__.__name__ in ['GpuGemm', 'GpuGemv'] for x in
          train.maker.fgraph.toposort()]):
    print 'Used the gpu'
else:
    print 'ERROR, not able to tell if theano used the cpu or the gpu'
    print train.maker.fgraph.toposort()
for i in range(training_steps):
    pred, err = train(D[0], D[1])
#print "Final model:"
#print w.get_value(), b.get_value()
print "target values for D"
print D[1]
print "prediction on D"
print predict(D[0])
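The forward pass that ``predict`` compiles can be followed by hand; here is a plain-Python sketch on a tiny hypothetical batch (the weights are chosen arbitrarily, not trained):

```python
import math

w = [0.5, -0.25]  # hypothetical trained weights
b = 0.1           # hypothetical trained bias

def predict_one(features):
    s = sum(f * wi for f, wi in zip(features, w)) + b
    p_1 = 1.0 / (1.0 + math.exp(-s))  # probability of class 1
    return 1 if p_1 > 0.5 else 0      # threshold at 0.5

print([predict_one(f) for f in [[1.0, 1.0], [-2.0, 2.0]]])  # [1, 0]
```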
...@@ -24,7 +24,7 @@ where each example has dimension 5. If this would be the input of a
neural network then the weights from the input to the first hidden
layer would represent a matrix of size (5, #hid).

Consider this array:
>>> numpy.asarray([[1., 2], [3, 4], [5, 6]])
array([[ 1.,  2.],
...@@ -37,7 +37,7 @@ This is a 3x2 matrix, i.e. there are 3 rows and 2 columns.

To access the entry in the 3rd row (row #2) and the 1st column (column #0):

>>> numpy.asarray([[1., 2], [3, 4], [5, 6]])[2, 0]
5.0
...@@ -61,5 +61,5 @@ array([2., 4., 6.]) ...@@ -61,5 +61,5 @@ array([2., 4., 6.])
The smaller array ``b`` (actually a scalar here, which works like a 0-d array) in this case is *broadcasted* to the same size The smaller array ``b`` (actually a scalar here, which works like a 0-d array) in this case is *broadcasted* to the same size
as ``a`` during the multiplication. This trick is often useful in as ``a`` during the multiplication. This trick is often useful in
simplifying how expression are written. More details about *broadcasting* simplifying how expression are written. More detail about *broadcasting*
can be found at `numpy user guide <http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html>`__. can be found in the `numpy user guide <http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html>`__.
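As a concrete, self-contained illustration (plain NumPy, independent of Theano; the array values here are chosen for the example), broadcasting also works between a matrix and a vector, not only between an array and a scalar:

```python
import numpy

a = numpy.asarray([[1., 2.], [3., 4.], [5., 6.]])  # shape (3, 2)
row = numpy.asarray([10., 20.])                    # shape (2,)

# `row` is broadcasted across the 3 rows of `a` before the addition,
# just as the scalar ``b`` above is broadcasted to the shape of ``a``.
result = a + row
assert result.shape == (3, 2)
assert (result == numpy.asarray([[11., 22.], [13., 24.], [15., 26.]])).all()
```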
.. _tutorial_printing_drawing:
==============================
Printing/Drawing Theano graphs
==============================
.. TODO: repair the defective links in the next paragraph
Theano provides two functions (:func:`theano.pp` and
:func:`theano.printing.debugprint`) to print a graph to the terminal before or after
compilation. These two functions print expression graphs in different ways:
:func:`pp` is more compact and math-like, :func:`debugprint` is more verbose.
Theano also provides :func:`pydotprint` that creates a *png* image of the function.
You can read about them in :ref:`libdoc_printing`.
Consider again the logistic regression example, but notice the additional printing instructions.
The following output depicts the pre- and post-compilation graphs.
.. code-block:: python
import numpy
import theano
import theano.tensor as T
rng = numpy.random
N = 400
feats = 784
D = (rng.randn(N, feats).astype(theano.config.floatX),
rng.randint(size=N, low=0, high=2).astype(theano.config.floatX))
training_steps = 10000
# Declare Theano symbolic variables
x = T.matrix("x")
y = T.vector("y")
w = theano.shared(rng.randn(feats).astype(theano.config.floatX), name="w")
b = theano.shared(numpy.asarray(0., dtype=theano.config.floatX), name="b")
x.tag.test_value = D[0]
y.tag.test_value = D[1]
#print "Initial model:"
#print w.get_value(), b.get_value()
# Construct Theano expression graph
p_1 = 1 / (1 + T.exp(-T.dot(x, w) - b)) # Probability of having a one
prediction = p_1 > 0.5 # The prediction that is done: 0 or 1
xent = -y * T.log(p_1) - (1 - y) * T.log(1 - p_1) # Cross-entropy
cost = xent.mean() + 0.01 * (w ** 2).sum() # The cost to optimize
gw, gb = T.grad(cost, [w, b])
# Compile expressions to functions
train = theano.function(
inputs=[x, y],
outputs=[prediction, xent],
updates={w: w - 0.01 * gw, b: b - 0.01 * gb},
name="train")
predict = theano.function(inputs=[x], outputs=prediction,
name="predict")
if any([x.op.__class__.__name__ == 'Gemv' for x in
train.maker.fgraph.toposort()]):
print 'Used the cpu'
elif any([x.op.__class__.__name__ == 'GpuGemm' for x in
train.maker.fgraph.toposort()]):
print 'Used the gpu'
else:
print 'ERROR, not able to tell if theano used the cpu or the gpu'
print train.maker.fgraph.toposort()
for i in range(training_steps):
pred, err = train(D[0], D[1])
#print "Final model:"
#print w.get_value(), b.get_value()
print "target values for D"
print D[1]
print "prediction on D"
print predict(D[0])
# Print the picture graphs
# after compilation
theano.printing.pydotprint(predict,
outfile="pics/logreg_pydotprint_predic.png",
var_with_name_simple=True)
# before compilation
theano.printing.pydotprint_variables(prediction,
outfile="pics/logreg_pydotprint_prediction.png",
var_with_name_simple=True)
theano.printing.pydotprint(train,
outfile="pics/logreg_pydotprint_train.png",
var_with_name_simple=True)
Pretty Printing
===============
``theano.printing.pprint(variable)``
>>> theano.printing.pprint(prediction) # (pre-compilation)
gt((TensorConstant{1} / (TensorConstant{1} + exp(((-(x \dot w)) - b)))),TensorConstant{0.5})
Debug Printing
==============
``theano.printing.debugprint({fct, variable, list of variables})``
>>> theano.printing.debugprint(prediction) # (pre-compilation)
Elemwise{gt,no_inplace} [@181772236] ''
|Elemwise{true_div,no_inplace} [@181746668] ''
| |InplaceDimShuffle{x} [@181746412] ''
| | |TensorConstant{1} [@181745836]
| |Elemwise{add,no_inplace} [@181745644] ''
| | |InplaceDimShuffle{x} [@181745420] ''
| | | |TensorConstant{1} [@181744844]
| | |Elemwise{exp,no_inplace} [@181744652] ''
| | | |Elemwise{sub,no_inplace} [@181744012] ''
| | | | |Elemwise{neg,no_inplace} [@181730764] ''
| | | | | |dot [@181729676] ''
| | | | | | |x [@181563948]
| | | | | | |w [@181729964]
| | | | |InplaceDimShuffle{x} [@181743788] ''
| | | | | |b [@181730156]
|InplaceDimShuffle{x} [@181771788] ''
| |TensorConstant{0.5} [@181771148]
>>> theano.printing.debugprint(predict) # (post-compilation)
Elemwise{Composite{neg,{sub,{{scalar_sigmoid,GT},neg}}}} [@183160204] '' 2
|dot [@183018796] '' 1
| |x [@183000780]
| |w [@183000812]
|InplaceDimShuffle{x} [@183133580] '' 0
| |b [@183000876]
|TensorConstant{[ 0.5]} [@183084108]
Picture Printing
================
>>> theano.printing.pydotprint_variables(prediction) # (pre-compilation)
.. image:: ../hpcs2011_tutorial/pics/logreg_pydotprint_prediction.png
:width: 800 px
Notice that ``pydotprint()`` requires *Graphviz* and Python's ``pydot``.
>>> theano.printing.pydotprint(predict) # (post-compilation)
.. image:: ../hpcs2011_tutorial/pics/logreg_pydotprint_predic.png
:width: 800 px
>>> theano.printing.pydotprint(train) # This is a small train example!
.. image:: ../hpcs2011_tutorial/pics/logreg_pydotprint_train.png
:width: 1500 px
Python tutorial
***************

In this documentation, we suppose that the reader knows Python. Here is a small list of Python
tutorials/exercises if you need to learn it or only need a refresher:

* `Python Challenge <http://www.pythonchallenge.com/>`__
* `Dive into Python <http://diveintopython.net/>`__
.. _tutorial_general_remarks:
=====================
Some General Remarks
=====================
Theano offers quite a bit of flexibility, but has some limitations too.
How should you write your algorithm to make the most of what Theano can do?
Limitations
-----------
- While- or for-loops within an expression graph are supported, but only via
the :func:`theano.scan` op (which puts restrictions on how the loop body can
interact with the rest of the graph).
- Neither ``goto`` nor recursion is supported or planned within expression graphs.
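To illustrate the style of looping that ``scan`` imposes (a plain-Python analogy, not the actual ``theano.scan`` API; the ``scan_like`` function and its parameters are invented for this sketch): the loop body becomes a function that maps the previous state to the next one, and a combinator applies it repeatedly, rather than the graph containing an explicit ``for``.

```python
def scan_like(step, outputs_info, n_steps):
    """Apply `step` repeatedly, collecting each intermediate state."""
    outputs = []
    state = outputs_info
    for _ in range(n_steps):
        state = step(state)
        outputs.append(state)
    return outputs

# Compute successive powers of 2 by doubling the previous state.
assert scan_like(lambda prev: prev * 2, 1, 5) == [2, 4, 8, 16, 32]
```

The restriction mentioned above follows from this shape: the loop body can only see what is passed to it as state, not arbitrary parts of the surrounding graph.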
.. _shape_info:

==========================================
How Shape Information is Handled by Theano
==========================================

It is not possible to strictly enforce the shape of a Theano variable when
building a graph, since the particular value provided at run-time for a parameter of a
Theano function may determine the shape of the Theano variables in its graph.

Currently, information regarding shape is used in two ways in Theano:

- To generate faster C code for the 2d convolution on the CPU and the GPU,
  when the exact output shape is known in advance.
- To remove computations in the graph when we only want to know the
  shape, but not the actual value of a variable. This is done with the
  `Op.infer_shape <http://deeplearning.net/software/theano/extending/cop.html#Op.infer_shape>`_
  method.

Example:

.. code-block:: python

import theano
x = theano.tensor.matrix('x')
f = theano.function([x], (x ** 2).shape)
theano.printing.debugprint(f)
#MakeVector [@43860304] ''   2
# |Shape_i{0} [@43424912] ''   1
# |Shape_i{1} [@43797968] ''   0
# | |x [@43423568]
The output of this compiled function does not contain any multiplication
or power; Theano has removed them to directly compute the shape of the
output.

Shape Inference Problem
=======================

Theano propagates information about shape in the graph. Sometimes this
can lead to errors. Consider this example:
.. code-block:: python

import numpy
import theano
x = theano.tensor.matrix('x')
y = theano.tensor.matrix('y')
z = theano.tensor.join(0, x, y)
xv = numpy.random.rand(5, 4)
yv = numpy.random.rand(3, 3)
f = theano.function([x, y], z.shape)
theano.printing.debugprint(f)
# |y [@44540304]
f(xv, yv)
# Raises a dimension mismatch error.
As you can see, when asking only for the shape of some computation (``join`` in the
example), an inferred shape is computed directly, without executing
the computation itself (there is no ``join`` in the first output or debugprint).
This makes the computation of the shape faster, but it can also hide errors. In
this example, the computation of the shape of the output of ``join`` is done only
based on the first input Theano variable, which leads to an error.
This might happen with other ops such as ``elemwise`` and ``dot``, for example.
Indeed, to perform some optimizations (for speed or stability, for instance),
Theano assumes that the computation is correct and consistent
in the first place, as it does here.
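For contrast, here is a small sketch in plain NumPy (not Theano): because NumPy computes values eagerly, the same mismatched concatenation fails immediately instead of being hidden by shape inference. The shapes (5, 4) and (3, 3) mirror the example above.

```python
import numpy

xv = numpy.random.rand(5, 4)
yv = numpy.random.rand(3, 3)

# NumPy checks the non-joined dimensions eagerly (4 != 3 here), so the
# error cannot be masked by an inferred shape as in the Theano example.
try:
    numpy.concatenate([xv, yv], axis=0)
    raised = False
except ValueError:
    raised = True
assert raised
```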
You can detect those problems by running the code without this
optimization, using the Theano flag
``optimizer_excluding=local_shape_to_shape_i``. You can also obtain the
same effect by running in the modes ``FAST_COMPILE`` (it will not apply this
optimization, nor most other optimizations) or ``DebugMode`` (it will test
before and after all optimizations (much slower)).
Specifying Exact Shape
======================

Currently, specifying a shape is not as easy and flexible as we wish, and we plan to
improve it. Here is the current state of what can be done:

- You can pass the shape info directly to the ``ConvOp`` created
  when calling ``conv2d``. You simply set the parameters ``image_shape``
  and ``filter_shape`` inside the call. They must be tuples of 4
  elements. For example:
.. code-block:: python

theano.tensor.nnet.conv2d(..., image_shape=(7, 3, 5, 5), filter_shape=(2, 3, 4, 4))
- You can use the ``SpecifyShape`` op to add shape information anywhere in the
  graph. This enables Theano to perform some optimizations. In the following example,
  this makes it possible to precompute the Theano function to a constant.

.. code-block:: python

import theano
x = theano.tensor.matrix()
x_specify_shape = theano.tensor.specify_shape(x, (2, 2))
f = theano.function([x], (x_specify_shape ** 2).shape)
theano.printing.debugprint(f)
# [2 2] [@72791376]
Future Plans
============

The parameter "constant shape" will be added to ``theano.shared()``. This is probably
the most frequent occurrence with ``shared`` variables. It will make the code
simpler and will make it possible to check that the shape does not change when
updating the ``shared`` variable.
Sparse
======

In general, *sparse* matrices provide the same functionality as regular
matrices. The difference lies in the way the elements of *sparse* matrices are
represented and stored in memory. Only the non-zero elements of a sparse matrix are stored.
Graph Structures
================

Debugging or profiling code written in Theano is not that simple if you
do not know what goes on under the hood. This chapter is meant to
introduce you to a required minimum of the inner workings of Theano.
For more detail see :ref:`extending`.

The first step in writing Theano code is to write down all mathematical
relations using symbolic placeholders (**variables**). When writing down
these expressions you use operations like ``+``, ``-``, ``**``,
``sum()``, ``tanh()``. All these are represented internally as **ops**.
An *op* represents a certain computation on some type of inputs
producing some type of output. You can see it as a *function definition*
in most programming languages.

Theano builds internally a graph structure composed of interconnected
**variable** nodes, **op** nodes and **apply** nodes. An
*apply* node represents the application of an *op* to some
*variables*. It is important to distinguish between the
definition of a computation, represented by an *op*, and its application
to some actual data, represented by the *apply* node. For more
detail about these building blocks refer to :ref:`variable`, :ref:`op`,
:ref:`apply`. Here is an example of a graph:
**Code**

WARNING: hyper-links and ref's seem to break the PDF build when placed
into this figure caption.

Arrows in this figure represent references to the
Python objects pointed at. The blue
box is an :ref:`Apply` node. Red boxes are :ref:`Variable` nodes. Green
circles are :ref:`Ops <op>`. Purple boxes are :ref:`Types <type>`.
Take for example the following code:

.. code-block:: python

x = T.dmatrix('x')
y = x * 2.

If you enter ``type(y.owner)`` you get ``<class 'theano.gof.graph.Apply'>``,
which is the apply node that connects the op and the inputs to get this
output. You can now print the name of the op that is applied to get
*y*:

>>> y.owner.op.name
'Elemwise{mul,no_inplace}'

Hence, an elementwise multiplication is used to compute *y*. This
multiplication is done between the inputs:

>>> len(y.owner.inputs)
InplaceDimShuffle{x,x}.0

Note that the second input is not 2 as we would have expected. This is
because 2 was first :term:`broadcasted <broadcasting>` to a matrix of
same shape as *x*. This is done by using the op ``DimShuffle``:

>>> type(y.owner.inputs[1])
<class 'theano.tensor.basic.TensorVariable'>

[2.0]
Starting from this graph structure it is easier to understand how
*automatic differentiation* proceeds and how the symbolic relations
can be *optimized* for performance or stability.


Automatic Differentiation
=========================
Having the graph structure, computing automatic differentiation is
simple. The only thing :func:`tensor.grad` has to do is to traverse the
graph from the outputs back towards the inputs through all *apply*
nodes (*apply* nodes are those that define which computations the
graph does). For each such *apply* node, its *op* defines
how to compute the *gradient* of the node's outputs with respect to its
inputs. Note that if an *op* does not provide this information,
it is assumed that the *gradient* is not defined.
Using the
`chain rule <http://en.wikipedia.org/wiki/Chain_rule>`_
these gradients can be composed in order to obtain the expression of the
*gradient* of the graph's output with respect to the graph's inputs.
A following section of this tutorial will examine the topic of :ref:`differentiation<tutcomputinggrads>`
in greater detail.
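The traversal-plus-chain-rule idea can be sketched in a few lines of plain Python. This is NOT Theano's implementation; the names ``Node``, ``mul``, ``add`` and ``backprop`` are invented for this toy example, and the traversal assumes each node's gradient is complete before it is popped (which holds for this small graph).

```python
class Node(object):
    def __init__(self, value, parents=(), local_grads=()):
        self.value = value              # result of the forward computation
        self.parents = parents          # input Nodes (the "apply" structure)
        self.local_grads = local_grads  # d(output)/d(input) for each parent
        self.grad = 0.0

def mul(x, y):
    return Node(x.value * y.value, (x, y), (y.value, x.value))

def add(x, y):
    return Node(x.value + y.value, (x, y), (1.0, 1.0))

def backprop(output):
    # Traverse from the output back towards the inputs, applying the
    # chain rule at each "apply" node.
    output.grad = 1.0
    stack = [output]
    while stack:
        node = stack.pop()
        for parent, local in zip(node.parents, node.local_grads):
            parent.grad += local * node.grad
            stack.append(parent)

# z = x * y + x, so dz/dx = y + 1 and dz/dy = x
x = Node(3.0)
y = Node(4.0)
z = add(mul(x, y), x)
backprop(z)
assert z.value == 15.0
assert x.grad == 5.0   # y + 1
assert y.grad == 3.0   # x
```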
Optimizations
=============
When compiling a Theano function, what you give to the
:func:`theano.function <function.function>` is actually a graph
(starting from the output variables you can traverse the graph up to
the input variables). While this graph structure shows how to compute
the output from the input, it also offers the possibility to improve the
way this computation is carried out. The way optimizations work in
identical subgraphs and ensure that the same values are not computed
twice or reformulate parts of the graph to a GPU specific version.

For example, one (simple) optimization that Theano uses is to replace
the pattern :math:`\frac{xy}{y}` by :math:`x`.
Further information regarding the optimization :ref:`process <optimization>` and the
specific :ref:`optimizations <optimizations>` that are applicable is available in the
library documentation and on the entrance page of the documentation.
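To give the flavor of such a rewrite (a toy sketch, not Theano's optimizer; the nested-tuple representation and the ``simplify`` function are invented for this example), here is the :math:`\frac{xy}{y} \rightarrow x` pattern applied to expressions encoded as tuples:

```python
def simplify(expr):
    """Recursively apply the rewrite ('div', ('mul', x, y), y) -> x."""
    if not isinstance(expr, tuple):
        return expr  # a leaf: a variable name or a constant
    op, left, right = expr
    left, right = simplify(left), simplify(right)
    if op == 'div' and isinstance(left, tuple) and left[0] == 'mul' and left[2] == right:
        return left[1]  # (x * y) / y  ->  x
    return (op, left, right)

# (x * y) / y simplifies to x; an expression without the pattern is unchanged.
assert simplify(('div', ('mul', 'x', 'y'), 'y')) == 'x'
assert simplify(('add', 'x', 'y')) == ('add', 'x', 'y')
```

A real optimizer works the same way in spirit: it matches a pattern in the graph and substitutes an equivalent, cheaper subgraph.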
**Example**
Symbolic programming involves a change of paradigm: it will become clearer
as we apply it. Consider the following example of optimization:
>>> import theano
>>> a = theano.tensor.vector("a") # declare symbolic variable
>>> b = a + a ** 10 # build symbolic expression
>>> f = theano.function([a], b) # compile function
>>> print f([0, 1, 2]) # prints `array([0,2,1026])`
======================================================  ====================================================
Unoptimized graph                                       Optimized graph
======================================================  ====================================================
.. image:: ../hpcs2011_tutorial/pics/f_unoptimized.png  .. image:: ../hpcs2011_tutorial/pics/f_optimized.png
======================================================  ====================================================