Commit edfd9f24 authored by Eric Larsen, committed by Frederic

Correct Theano's tutorial: one more round of corrections

Parent dba02a39
......@@ -33,12 +33,12 @@ Let's break this down into several steps. The first step is to define
two symbols (*Variables*) representing the quantities that you want
to add. Note that from now on, we will use the term
*Variable* to mean "symbol" (in other words,
``x``, ``y``, ``z`` are all *Variable* objects). The output of the function
``f`` is a ``numpy.ndarray`` with zero dimensions.
*x*, *y*, *z* are all *Variable* objects). The output of the function
*f* is a ``numpy.ndarray`` with zero dimensions.
If you are following along and typing into an interpreter, you may have
noticed that there was a slight delay in executing the ``function``
instruction. Behind the scene, ``f`` was being compiled into C code.
instruction. Behind the scenes, *f* was being compiled into C code.
.. note::
......@@ -51,9 +51,9 @@ instruction. Behind the scene, ``f`` was being compiled into C code.
>>> x = theano.tensor.ivector()
>>> y = -x
``x`` and ``y`` are both Variables, i.e. instances of the
*x* and *y* are both Variables, i.e. instances of the
``theano.gof.graph.Variable`` class. The
type of both ``x`` and ``y`` is ``theano.tensor.ivector``.
type of both *x* and *y* is ``theano.tensor.ivector``.
**Step 1**
......@@ -65,9 +65,9 @@ In Theano, all symbols must be typed. In particular, ``T.dscalar``
is the type we assign to "0-dimensional arrays (`scalar`) of doubles
(`d`)". It is a Theano :ref:`type`.
``dscalar`` is not a class. Therefore, neither ``x`` nor ``y``
``dscalar`` is not a class. Therefore, neither *x* nor *y*
are actually instances of ``dscalar``. They are instances of
:class:`TensorVariable`. ``x`` and ``y``
:class:`TensorVariable`. *x* and *y*
are, however, assigned the Theano Type ``dscalar`` in their ``type``
field, as you can see here:
......@@ -91,13 +91,13 @@ could also learn more by looking into :ref:`graphstructures`.
**Step 2**
The second step is to combine ``x`` and ``y`` into their sum ``z``:
The second step is to combine *x* and *y* into their sum *z*:
>>> z = x + y
``z`` is yet another *Variable* which represents the addition of
``x`` and ``y``. You can use the :ref:`pp <libdoc_printing>`
function to pretty-print out the computation associated to ``z``.
*z* is yet another *Variable* which represents the addition of
*x* and *y*. You can use the :ref:`pp <libdoc_printing>`
function to pretty-print out the computation associated to *z*.
>>> print pp(z)
(x + y)
......@@ -105,15 +105,15 @@ function to pretty-print out the computation associated to ``z``.
**Step 3**
The last step is to create a function taking ``x`` and ``y`` as inputs
and giving ``z`` as output:
The last step is to create a function taking *x* and *y* as inputs
and giving *z* as output:
>>> f = function([x, y], z)
The first argument to :func:`function <function.function>` is a list of Variables
that will be provided as inputs to the function. The second argument
is a single Variable *or* a list of Variables. For either case, the second
argument is what we want to see as output when we apply the function. ``f`` may
argument is what we want to see as output when we apply the function. *f* may
then be used like a normal Python function.
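The build-then-apply workflow of these three steps can be mimicked in plain Python. This is a toy sketch, not Theano's API: ``function`` here is a hypothetical stand-in that simply closes over an expression.

```python
def function(inputs, output_expr):
    """Toy 'compile' step: return a callable that evaluates output_expr.

    inputs is a list of variable names; output_expr is a callable
    taking an environment dict that maps names to values.
    """
    def f(*args):
        env = dict(zip(inputs, args))
        return output_expr(env)
    return f

# 'symbolic' variables are just names in this sketch
x, y = 'x', 'y'
z = lambda env: env['x'] + env['y']   # z stands for the expression x + y

f = function([x, y], z)
print(f(2, 3))  # 5
```

The real ``theano.function`` does far more (type checking, optimization, C compilation), but the calling pattern is the same: declare inputs, build an expression, compile, then call.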
......@@ -121,8 +121,8 @@ Adding two Matrices
===================
You might already have guessed how to do this. Indeed, the only change
from the previous example is that you need to instantiate ``x`` and
``y`` using the matrix Types:
from the previous example is that you need to instantiate *x* and
*y* using the matrix Types:
.. If you modify this code, also change :
.. theano/tests/test_tutorial.py:T_adding.test_adding_2
......@@ -153,12 +153,12 @@ by :ref:`broadcasting <libdoc_tensor_broadcastable>`.
The following types are available:
* **byte**: bscalar, bvector, bmatrix, brow, bcol, btensor3, btensor4
* **32-bit integers**: iscalar, ivector, imatrix, irow, icol, itensor3, itensor4
* **64-bit integers**: lscalar, lvector, lmatrix, lrow, lcol, ltensor3, ltensor4
* **float**: fscalar, fvector, fmatrix, frow, fcol, ftensor3, ftensor4
* **double**: dscalar, dvector, dmatrix, drow, dcol, dtensor3, dtensor4
* **complex**: cscalar, cvector, cmatrix, crow, ccol, ctensor3, ctensor4
* **byte**: ``bscalar, bvector, bmatrix, brow, bcol, btensor3, btensor4``
* **32-bit integers**: ``iscalar, ivector, imatrix, irow, icol, itensor3, itensor4``
* **64-bit integers**: ``lscalar, lvector, lmatrix, lrow, lcol, ltensor3, ltensor4``
* **float**: ``fscalar, fvector, fmatrix, frow, fcol, ftensor3, ftensor4``
* **double**: ``dscalar, dvector, dmatrix, drow, dcol, dtensor3, dtensor4``
* **complex**: ``cscalar, cvector, cmatrix, crow, ccol, ctensor3, ctensor4``
The previous list is not exhaustive and a guide to all types compatible
with NumPy arrays may be found here: :ref:`tensor creation<libdoc_tensor_creation>`.
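As a mnemonic, the one-letter prefixes in the list above map onto NumPy dtype names roughly as follows (a sketch derived from that list; see the linked tensor-creation reference for the authoritative mapping):

```python
# prefix -> NumPy dtype name, following the list above
PREFIX_DTYPE = {
    'b': 'int8',       # byte
    'i': 'int32',      # 32-bit integer
    'l': 'int64',      # 64-bit integer
    'f': 'float32',    # float
    'd': 'float64',    # double
    'c': 'complex64',  # complex
}

def dtype_of(type_name):
    """Guess the dtype of a constructor name such as 'dmatrix'."""
    return PREFIX_DTYPE[type_name[0]]

print(dtype_of('dscalar'))  # float64
```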
......
......@@ -8,11 +8,11 @@ IfElse vs Switch
================
- Both Ops build a condition over symbolic variables.
- ``IfElse`` takes a `boolean` condition and two variables as inputs.
- ``Switch`` takes a `tensor` as condition and two variables as inputs.
- Both ops build a condition over symbolic variables.
- ``IfElse`` takes a *boolean* condition and two variables as inputs.
- ``Switch`` takes a *tensor* as condition and two variables as inputs.
``switch`` is an elementwise operation and is thus more general than ``ifelse``.
- Whereas ``switch`` evaluates both 'output' variables, ``ifelse`` is lazy and only
- Whereas ``switch`` evaluates both *output* variables, ``ifelse`` is lazy and only
evaluates one variable with respect to the condition.
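The eager/lazy distinction can be illustrated in plain Python, independently of Theano (a sketch; the hypothetical ``ifelse`` takes callables so that the unselected branch is never run):

```python
def switch(cond, a, b):
    # elementwise and eager: a and b are fully computed before selection
    return [x if c else y for c, x, y in zip(cond, a, b)]

def ifelse(cond, then_fn, else_fn):
    # lazy: only the branch selected by the condition is evaluated
    return then_fn() if cond else else_fn()

print(switch([True, False, True], [1, 2, 3], [10, 20, 30]))  # [1, 20, 3]
print(ifelse(True, lambda: 'cheap', lambda: 1 / 0))  # cheap; 1/0 never runs
```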
**Example**
......@@ -52,7 +52,7 @@ IfElse vs Switch
f_lazyifelse(val1, val2, big_mat1, big_mat2)
print 'time spent evaluating one value %f sec'%(time.clock()-tic)
In this example, the ``IfElse`` Op spends less time (about half as much) than ``Switch``
In this example, the ``IfElse`` op spends less time (about half as much) than ``Switch``
since it computes only one variable out of the two.
.. code-block:: python
......@@ -64,7 +64,7 @@ since it computes only one variable out of the two.
Unless ``linker='vm'`` or ``linker='cvm'`` are used, ``ifelse`` will compute both
variables and take the same computation time as ``switch``. Although the linker
is not currently set by default to 'cvm', it will be in the near future.
is not currently set by default to ``cvm``, it will be in the near future.
There is no automatic optimization replacing a ``switch`` with a
broadcasted scalar to an ``ifelse``, as this is not always faster. See
......
......@@ -74,7 +74,7 @@ Computing More than one Thing at the Same Time
Theano supports functions with multiple outputs. For example, we can
compute the :ref:`elementwise <libdoc_tensor_elementwise>` difference, absolute difference, and
squared difference between two matrices ``a`` and ``b`` at the same time:
squared difference between two matrices *a* and *b* at the same time:
.. If you modify this code, also change :
.. theano/tests/test_tutorial.py:T_examples.test_examples_3
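In spirit, the three outputs correspond to something like the following plain-Python sketch (the real example applies these operations elementwise to matrices):

```python
def diffs(a, b):
    d = a - b
    # difference, absolute difference, squared difference -- all at once
    return d, abs(d), d ** 2

print(diffs(7.0, 4.0))  # (3.0, 3.0, 9.0)
```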
......@@ -123,7 +123,7 @@ array(35.0)
This makes use of the :ref:`Param <function_inputs>` class which allows
you to specify properties of your function's parameters with greater detail. Here we
give a default value of 1 for ``y`` by creating a ``Param`` instance with
give a default value of 1 for *y* by creating a ``Param`` instance with
its ``default`` field set to 1.
Inputs with default values must follow inputs without default
......@@ -149,7 +149,7 @@ array(34.0)
array(33.0)
.. note::
``Param`` does not know the name of the local variables ``y`` and ``w``
``Param`` does not know the name of the local variables *y* and *w*
that are passed as arguments. The symbolic variable objects have name
attributes (set by ``dscalars`` in the example above) and *these* are the
names of the keyword parameters in the functions that we build. This is
......@@ -171,7 +171,7 @@ example, let's say we want to make an accumulator: at the beginning,
the state is initialized to zero. Then, on each function call, the state
is incremented by the function's argument.
First let's define the ``accumulator`` function. It adds its argument to the
First let's define the *accumulator* function. It adds its argument to the
internal state, and returns the old state value.
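The behaviour being described, mutable state shared across calls, looks like this in plain Python (a sketch of the semantics, not of Theano shared variables):

```python
def make_accumulator():
    state = {'value': 0}          # plays the role of the shared variable
    def accumulator(inc):
        old = state['value']      # remember the old state value...
        state['value'] += inc     # ...add the argument to the state...
        return old                # ...and return the old value
    return accumulator

acc = make_accumulator()
print(acc(1))    # 0
print(acc(10))   # 1
print(acc(300))  # 11
```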
.. If you modify this code, also change :
......@@ -187,13 +187,13 @@ so-called :ref:`shared variables<libdoc_compile_shared>`.
These are hybrid symbolic and non-symbolic variables whose value may be shared
between multiple functions. Shared variables can be used in symbolic expressions just like
the objects returned by ``dmatrices(...)`` but they also have an internal
value, that defines the value taken by this symbolic variable in *all* the
value that defines the value taken by this symbolic variable in *all* the
functions that use it. It is called a *shared* variable because its value is
shared between many functions. The value can be accessed and modified by the
``.get_value()`` and ``.set_value()`` methods. We will come back to this soon.
The other new thing in this code is the ``updates`` parameter of function.
The updates is a list of pairs of the form (shared-variable, new expression).
The other new thing in this code is the ``updates`` parameter of ``function``.
``updates`` must be supplied with a list of pairs of the form (shared-variable, new expression).
It can also be a dictionary whose keys are shared-variables and values are
the new expressions. Either way, it means "whenever this function runs, it
will replace the ``.value`` of each shared variable with the result of the
......@@ -241,9 +241,9 @@ achieve a similar result by returning the new expressions, and working with
them in NumPy as usual. The updates mechanism can be a syntactic convenience,
but it is mainly there for efficiency. Updates to shared variables can
sometimes be done more quickly using in-place algorithms (e.g. low-rank matrix
updates). Also, theano has more control over where and how shared variables are
updates). Also, Theano has more control over where and how shared variables are
allocated, which is one of the important elements of getting good performance
on the GPU.
on the :ref:`GPU<using_gpu>`.
It may happen that you expressed some formula using a shared variable, but
you do *not* want to use its value. In this case, you can use the
......@@ -326,16 +326,16 @@ so we get different random numbers every time.
>>> f_val1 = f() #different numbers from f_val0
When we add the extra argument ``no_default_updates=True`` to
``function`` (as in ``g``), then the random number generator state is
``function`` (as in *g*), then the random number generator state is
not affected by calling the returned function. So, for example, calling
``g`` multiple times will return the same numbers.
*g* multiple times will return the same numbers.
>>> g_val0 = g() # different numbers from f_val0 and f_val1
>>> g_val1 = g() # same numbers as g_val0!
An important remark is that a random variable is drawn at most once during any
single function execution. So the ``nearly_zeros`` function is guaranteed to
return approximately 0 (except for rounding error) even though the ``rv_u``
single function execution. So the *nearly_zeros* function is guaranteed to
return approximately 0 (except for rounding error) even though the *rv_u*
random variable appears three times in the output expression.
>>> nearly_zeros = function([], rv_u + rv_u - 2 * rv_u)
......@@ -363,8 +363,8 @@ Sharing Streams Between Functions
---------------------------------
As usual for shared variables, the random number generators used for random
variables are common between functions. So our ``nearly_zeros`` function will
update the state of the generators used in function ``f`` above.
variables are common between functions. So our *nearly_zeros* function will
update the state of the generators used in function *f* above.
For example:
......@@ -416,8 +416,9 @@ The preceding elements are featured in this more realistic example. It will be
prediction = p_1 > 0.5 # The prediction thresholded
xent = -y*T.log(p_1) - (1-y)*T.log(1-p_1) # Cross-entropy loss function
cost = xent.mean() + 0.01*(w**2).sum() # The cost to minimize
gw,gb = T.grad(cost, [w,b]) # Compute the gradient of the cost:
# we shall return to this
gw,gb = T.grad(cost, [w,b]) # Compute the gradient of the cost
# (we shall return to this in a
# following section of this tutorial)
# Compile
train = theano.function(
......
......@@ -8,12 +8,12 @@ Extending Theano
Theano Graphs
-------------
- Theano works with symbolic graphs
- Those graphs are bi-partite graphs (graph with 2 types of nodes)
- The 2 types of nodes are Apply and Variable nodes
- Each Apply node has a link to the Op that it executes
- Theano works with symbolic graphs.
- Those graphs are bi-partite graphs (graph with 2 types of nodes).
- The 2 types of nodes are Apply and Variable nodes.
- Each Apply node has a link to the op that it executes.
Inputs and Outputs are lists of Theano variables
Inputs and Outputs are lists of Theano variables.
.. image:: ../hpcs2011_tutorial/pics/apply_node.png
:width: 500 px
......@@ -93,12 +93,12 @@ The first one is :func:`make_node`. The second one
would describe the computations that are required to be done
at run time. Currently there are 2 different possibilities:
implement the :func:`perform`
and/or :func:`c_code <Op.c_code>` (and other related :ref:`c methods
<cop>`), or the :func:`make_thunk` method. The ``perform`` allows
to easily wrap an existing Python function into Theano. The ``c_code``
and/or :func:`c_code <Op.c_code>` methods (and other related :ref:`c methods
<cop>`), or the :func:`make_thunk` method. ``perform`` allows
to easily wrap an existing Python function into Theano. ``c_code``
and related methods allow the op to generate C code that will be
compiled and linked by Theano. On the other hand, the ``make_thunk``
method will be called only once during compilation and should generate
compiled and linked by Theano. On the other hand, ``make_thunk``
will be called only once during compilation and should generate
a ``thunk``: a standalone function that, when called, performs the desired computations.
This is useful if you want to generate code and compile it yourself. For
example, this allows you to use PyCUDA to compile GPU code.
......@@ -117,7 +117,7 @@ The :func:`grad` method is required if you want to differentiate some cost whose
includes your op.
The :func:`__str__` method is useful in order to provide a more meaningful
string representation of your Op.
string representation of your op.
The :func:`R_op` method is needed if you want ``theano.tensor.Rop`` to
work with your op.
......@@ -185,9 +185,9 @@ in a file and execute it with the ``nosetests`` program.
**Basic Tests**
Basic tests are done by you just by using the Op and checking that it
Basic tests are done by you just by using the op and checking that it
returns the right answer. If you detect an error, you must raise an
exception. You can use the `assert` keyword to automatically raise an
exception. You can use the ``assert`` keyword to automatically raise an
``AssertionError``.
.. code-block:: python
......@@ -211,10 +211,10 @@ exception. You can use the `assert` keyword to automatically raise an
**Testing the infer_shape**
When a class inherits from the ``InferShapeTester`` class, it gets the
``self._compile_and_check`` method that tests the Op ``infer_shape``
method. It tests that the Op gets optimized out of the graph if only
``self._compile_and_check`` method that tests the op's ``infer_shape``
method. It tests that the op gets optimized out of the graph if only
the shape of the output is needed and not the output
itself. Additionally, it checks that such an optimized graph computes
itself. Additionally, it checks that the optimized graph computes
the correct shape, by comparing it to the actual shape of the computed
output.
......@@ -222,8 +222,8 @@ output.
parameters the lists of input and output Theano variables, as would be
provided to ``theano.function``, and a list of real values to pass to the
compiled function (don't use shapes that are symmetric, e.g. (3, 3),
as they can easily to hide errors). It also takes the Op class to
verify that no Ops of that type appear in the shape-optimized graph.
as they can easily hide errors). It also takes the op class as a parameter to
verify that no instance of it appears in the shape-optimized graph.
If there is an error, the function raises an exception. If you want to
see it fail, you can implement an incorrect ``infer_shape``.
......@@ -248,7 +248,7 @@ see it fail, you can implement an incorrect ``infer_shape``.
**Testing the gradient**
The function :ref:`verify_grad <validating_grad>`
verifies the gradient of an Op or Theano graph. It compares the
verifies the gradient of an op or Theano graph. It compares the
analytic (symbolically computed) gradient and the numeric
gradient (computed through the Finite Difference Method).
......@@ -266,13 +266,12 @@ the multiplication by 2).
.. TODO: repair defective links in the following paragraph
The class :class:`RopLop_checker`, give the functions
:func:`RopLop_checker.check_mat_rop_lop`,
:func:`RopLop_checker.check_rop_lop` and
:func:`RopLop_checker.check_nondiff_rop` that allow to test the
implementation of the Rop method of one Op.
The class :class:`RopLop_checker` defines the functions
:func:`RopLop_checker.check_mat_rop_lop`, :func:`RopLop_checker.check_rop_lop` and
:func:`RopLop_checker.check_nondiff_rop`. These allow you to test the
implementation of the Rop method of a particular op.
To verify the Rop method of the DoubleOp, you can use this:
For instance, to verify the Rop method of the DoubleOp, you can use this:
.. code-block:: python
......@@ -290,7 +289,7 @@ Running your tests
You can run ``nosetests`` in the Theano folder to run all of Theano's
tests, including yours if they are somewhere in the directory
structure. You can run ``nosetests test_file.py`` to run only the
structure. For instance, you can run ``nosetests test_file.py`` to run only the
tests in that file. You can run ``nosetests
test_file.py:test_DoubleRop`` to run only the tests inside that test
class. You can run ``nosetests
......@@ -298,7 +297,7 @@ test_file.py:test_DoubleRop.test_double_op`` to run only one
particular test. More `nosetests
<http://readthedocs.org/docs/nose/en/latest/>`_ documentation.
You can also add this at the end of the test file:
You can also add this block at the end of the test file and run the file:
.. code-block:: python
......@@ -311,14 +310,13 @@ You can also add this at the end of the test file:
**Testing GPU Ops**
Ops that execute on the GPU should inherit from the
``theano.sandbox.cuda.GpuOp`` and not ``theano.Op``. This allows Theano
to make the distinction between both. Currently, we use this to test
if the NVIDIA driver works correctly with our sum reduction code on the
Ops to be executed on the GPU should inherit from ``theano.sandbox.cuda.GpuOp``
and not ``theano.Op``. This allows Theano to distinguish them. Currently, we
use this to test if the NVIDIA driver works correctly with our sum reduction code on the
GPU.
A more extensive discussion than this section's may be found in the advanced
A more extensive discussion of this section's topic may be found in the advanced
tutorial :ref:`Extending Theano<extending>`
-------------------------------------------
......
......@@ -8,19 +8,17 @@ Frequently Asked Questions
TypeError: object of type 'TensorVariable' has no len()
-------------------------------------------------------
If you receive this error:
If you receive the following error, it is because the Python function *__len__* cannot
be implemented on Theano variables:
.. code-block:: python
TypeError: object of type 'TensorVariable' has no len()
We can't implement the __len__ function on Theano Variables. This is
because Python requires that this function returns an integer, but we
can't do this as we are working with symbolic variables. You can use
`var.shape[0]` as a workaround.
Python requires that *__len__* return an integer, which cannot be done with Theano's symbolic variables. However, `var.shape[0]` can be used as a workaround.
Also we can't change the above error message into a more explicit one
because of some other Python internal behavior that can't be modified.
This error message cannot be made more explicit because the relevant aspects of Python's
internals cannot be modified.
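A minimal illustration of the constraint, using a hypothetical class rather than Theano's:

```python
class SymbolicVector(object):
    """A stand-in for a symbolic variable whose length is unknown."""
    def __len__(self):
        # Python requires __len__ to return an int; a symbolic length
        # cannot satisfy that, so raising TypeError is the only option.
        raise TypeError("object of type 'SymbolicVector' has no len()")

try:
    n = len(SymbolicVector())
except TypeError as e:
    msg = str(e)

print(msg)  # object of type 'SymbolicVector' has no len()
```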
Faster gcc optimization
......
......@@ -9,13 +9,13 @@ PyCUDA
Currently, PyCUDA and Theano have different objects to store GPU
data. The two implementations do not support the same set of features.
Theano's implementation is called CudaNdarray and supports
Theano's implementation is called *CudaNdarray* and supports
*strides*. It also only supports the *float32* dtype. PyCUDA's implementation
is called GPUArray and doesn't support *strides*. However, it can deal with
is called *GPUArray* and doesn't support *strides*. However, it can deal with
all NumPy and CUDA dtypes.
We are currently working on having the same base object that will
mimic Numpy. Until this is ready, here is some information on how to
We are currently working on having the same base object for both that will
also mimic Numpy. Until this is ready, here is some information on how to
use both objects in the same script.
Transfer
......@@ -24,8 +24,8 @@ Transfer
You can use the ``theano.misc.pycuda_utils`` module to convert GPUArray to and
from CudaNdarray. The functions ``to_cudandarray(x, copyif=False)`` and
``to_gpuarray(x)`` return a new object that occupies the same memory space
as the original. Otherwise it raises a ValueError. Because GPUArrays don't
support *strides*, if the CudaNdarray is strided, we could copy it to
as the original. Otherwise it raises a *ValueError*. Because GPUArrays don't
support strides, if the CudaNdarray is strided, we could copy it to
have a non-strided copy. The resulting GPUArray won't share the same
memory region. If you want this behavior, set ``copyif=True`` in
``to_gpuarray``.
......@@ -122,13 +122,15 @@ CUDAMat
There are functions for conversion between CUDAMat objects and Theano's CudaNdArray objects.
They obey the same principles as Theano's PyCUDA functions and can be found in
theano.misc.cudamat_utils.py
``theano.misc.cudamat_utils.py``.
WARNING: There is a strange problem associated with stride/shape with those converters.
In order to work, the test needs a transpose and reshape...
.. TODO: this statement is unclear:
WARNING: There is a peculiar problem associated with stride/shape with those converters.
In order to work, the test needs a *transpose* and *reshape*...
Gnumpy
======
There are conversion functions between Gnumpy ``garray`` objects and Theano CudaNdArray objects.
They are also similar to Theano's PyCUDA functions and can be found in theano.misc.gnumpy_utils.py.
There are conversion functions between Gnumpy *garray* objects and Theano CudaNdArray objects.
They are also similar to Theano's PyCUDA functions and can be found in ``theano.misc.gnumpy_utils.py``.
......@@ -10,12 +10,14 @@ Computing Gradients
===================
Now let's use Theano for a slightly more sophisticated task: create a
function which computes the derivative of some expression ``y`` with
respect to its parameter ``x``. To do this we will use the macro ``T.grad``.
function which computes the derivative of some expression *y* with
respect to its parameter *x*. To do this we will use the macro ``T.grad``.
For instance, we can compute the
gradient of :math:`x^2` with respect to :math:`x`. Note that:
:math:`d(x^2)/dx = 2 \cdot x`.
.. TODO: fix the vertical positioning of the expressions in the preceding paragraph
Here is the code to compute this gradient:
.. If you modify this code, also change :
......@@ -36,7 +38,7 @@ array(188.40000000000001)
In this example, we can see from ``pp(gy)`` that we are computing
the correct symbolic gradient.
``fill((x ** 2), 1.0)`` means to make a matrix of the same shape as
``x ** 2`` and fill it with 1.0.
``x ** 2`` and fill it with *1.0*.
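The symbolic result can be cross-checked numerically with a central finite difference (a generic sketch, not part of Theano):

```python
def numeric_grad(f, x, eps=1e-6):
    # central finite-difference approximation of df/dx at x
    return (f(x + eps) - f(x - eps)) / (2 * eps)

g = numeric_grad(lambda v: v ** 2, 94.2)
print(g)  # close to 2 * 94.2 = 188.4, matching the symbolic gradient
```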
.. note::
The optimizer simplifies the symbolic gradient expression. You can see
......@@ -56,7 +58,7 @@ logistic is: :math:`ds(x)/dx = s(x) \cdot (1 - s(x))`.
.. figure:: dlogistic.png
A plot of the gradient of the logistic function, with x on the x-axis
A plot of the gradient of the logistic function, with *x* on the x-axis
and :math:`ds(x)/dx` on the y-axis.
......@@ -71,17 +73,17 @@ logistic is: :math:`ds(x)/dx = s(x) \cdot (1 - s(x))`.
array([[ 0.25 , 0.19661193],
[ 0.19661193, 0.10499359]])
In general, for any **scalar** expression ``s``, ``T.grad(s, w)`` provides
In general, for any **scalar** expression *s*, ``T.grad(s, w)`` provides
the Theano expression for computing :math:`\frac{\partial s}{\partial w}`. In
this way Theano can be used for doing **efficient** symbolic differentiation
(as the expression return by ``T.grad`` will be optimized during compilation), even for
(as the expression returned by ``T.grad`` will be optimized during compilation), even for
functions with many inputs (see `automatic differentiation <http://en.wikipedia.org/wiki/Automatic_differentiation>`_ for a description
of symbolic differentiation).
.. note::
The second argument of ``T.grad`` can be a list, in which case the
output is also a list. The order in both lists is important, element
output is also a list. The order in both lists is important: element
*i* of the output list is the gradient of the first argument of
``T.grad`` with respect to the *i*-th element of the list given as second argument.
The first argument of ``T.grad`` has to be a scalar (a tensor
......@@ -95,14 +97,17 @@ of symbolic differentiation).
Computing the Jacobian
======================
Theano implements :func:`theano.gradient.jacobian` macro that does all
what is needed to compute the Jacobian. The following text explains how
In Theano's parlance, the term *Jacobian* designates the tensor comprising the
first partial derivatives of the output of a function with respect to its inputs.
(This is a generalization of the so-called Jacobian matrix in mathematics.)
Theano implements the :func:`theano.gradient.jacobian` macro that does all
that is needed to compute the Jacobian. The following text explains how
to do it manually.
In order to manually compute the Jacobian of some function ``y`` with
respect to some parameter ``x`` we need to use ``scan``. What we
do is to loop over the entries in ``y`` and compute the gradient of
``y[i]`` with respect to ``x``.
In order to manually compute the Jacobian of some function *y* with
respect to some parameter *x* we need to use ``scan``. What we
do is to loop over the entries in *y* and compute the gradient of
*y[i]* with respect to *x*.
.. note::
......@@ -110,7 +115,7 @@ do is to loop over the entries in ``y`` and compute the gradient of
manner all kinds of recurrent equations. While creating
symbolic loops (and optimizing them for performance) is a hard task,
effort is being done for improving the performance of ``scan``. We
shall return to ``scan`` in a moment.
shall return to ``scan`` later in this tutorial.
>>> x = T.dvector('x')
>>> y = x**2
......@@ -120,31 +125,33 @@ do is to loop over the entries in ``y`` and compute the gradient of
array([[ 8., 0.],
[ 0., 8.]])
What we do in this code is to generate a sequence of ints from ``0`` to
What we do in this code is to generate a sequence of *ints* from *0* to
``y.shape[0]`` using ``T.arange``. Then we loop through this sequence, and
at each step, we compute the gradient of element ``y[[i]`` with respect to
``x``. ``scan`` automatically concatenates all these rows, generating a
at each step, we compute the gradient of element *y[i]* with respect to
*x*. ``scan`` automatically concatenates all these rows, generating a
matrix which corresponds to the Jacobian.
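The same loop-over-outputs idea can be mimicked numerically in plain Python, using finite differences in place of symbolic gradients (a sketch, not Theano code):

```python
def jacobian_fd(f, x, eps=1e-6):
    """Approximate the Jacobian of f at x, one column per input entry."""
    fx = f(x)
    jac = [[0.0] * len(x) for _ in fx]
    for j in range(len(x)):
        xp = list(x); xp[j] += eps
        xm = list(x); xm[j] -= eps
        fp, fm = f(xp), f(xm)
        for i in range(len(fx)):
            jac[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return jac

square = lambda xs: [v ** 2 for v in xs]
J = jacobian_fd(square, [4.0, 4.0])
# J is close to [[8, 0], [0, 8]], matching the scan-based result above
```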
.. note::
There are some pitfalls to be aware of regarding ``T.grad``. One of them is that you
cannot re-write the above expression of the jacobian as
cannot re-write the above expression of the Jacobian as
``theano.scan(lambda y_i,x: T.grad(y_i,x), sequences=y,
non_sequences=x)``, even though from the documentation of scan this
seems possible. The reason is that ``y_i`` will not be a function of
``x`` anymore, while ``y[i]`` still is.
seems possible. The reason is that *y_i* will not be a function of
*x* anymore, while *y[i]* still is.
Computing the Hessian
=====================
Theano implements :func:`theano.gradient.hessian` macro that does all
In Theano, the term *Hessian* has the usual mathematical meaning: it is the
matrix comprising the second order partial derivatives of a function with scalar
output and vector input. Theano implements the :func:`theano.gradient.hessian` macro that does all
that is needed to compute the Hessian. The following text explains how
to do it manually.
You can compute the Hessian manually similarly to the Jacobian. The only
difference is that now, instead of computing the Jacobian of some expression
``y``, we compute the Jacobian of ``T.grad(cost,x)``, where ``cost`` is some
*y*, we compute the Jacobian of ``T.grad(cost,x)``, where *cost* is some
scalar.
......@@ -181,12 +188,12 @@ R-operator
The *R operator* is built to evaluate the product between a Jacobian and a
vector, namely :math:`\frac{\partial f(x)}{\partial x} v`. The formulation
can be extended even for `x` being a matrix, or a tensor in general, case in
can be extended even to the case where *x* is a matrix, or a tensor in general, in
which case the Jacobian also becomes a tensor and the product becomes some kind
of tensor product. Because in practice we end up needing to compute such
expressions in terms of weight matrices, Theano supports this more generic
form of the operation. In order to evaluate the *R-operation* of
expression ``y``, with respect to ``x``, multiplying the Jacobian with ``v``
expression *y*, with respect to *x*, multiplying the Jacobian with *v*
you need to do something similar to this:
......@@ -221,19 +228,19 @@ array([[ 0., 0.],
.. note::
`v`, the point of evaluation, differs between the *L-operator* and the *R-operator*.
`v`, the *point of evaluation*, differs between the *L-operator* and the *R-operator*.
For the *L-operator*, the point of evaluation needs to have the same shape
as the output, whereas for the *R-operator* this point should
have the same shape as the input parameter. Furthermore, the results of these two
operations differ. The result of the *L-operator* is of the same shape
as the input parameter, while the result of the *R-operator* has a shape similar
to the output.
to that of the output.
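The shape bookkeeping can be summarized with a plain matrix sketch (assume *J* is the m-by-n Jacobian of a function from R^n to R^m):

```python
J = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]   # Jacobian of a function R^3 -> R^2 (m=2, n=3)

def rop(J, v):
    # R-operator: J v -- v has the input's shape, result has the output's
    return [sum(a * b for a, b in zip(row, v)) for row in J]

def lop(v, J):
    # L-operator: v^T J -- v has the output's shape, result has the input's
    return [sum(v[i] * J[i][j] for i in range(len(J)))
            for j in range(len(J[0]))]

print(len(rop(J, [1.0, 1.0, 1.0])))  # 2, the output dimension
print(len(lop([1.0, 1.0], J)))       # 3, the input dimension
```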
Hessian times a Vector
======================
If you need to compute the Hessian times a vector, you can make use of the
If you need to compute the *Hessian times a vector*, you can make use of the
above-defined operators to do it more efficiently than actually computing
the exact Hessian and then performing the product. Due to the symmetry of the
Hessian matrix, you have two options that will
......@@ -267,7 +274,7 @@ Final Pointers
==============
* The ``grad`` function works symbolically: it receives and returns a Theano variables.
* The ``grad`` function works symbolically: it receives and returns Theano variables.
* ``grad`` can be compared to a macro since it can be applied repeatedly.
......@@ -276,5 +283,5 @@ Final Pointers
* Built-in functions make it possible to compute efficiently *vector times Jacobian* and *vector times Hessian*.
* Work is in progress on the optimizations required to compute efficiently the full
Jacobian and Hessian matrices and the *Jacobian times vector* expression.
Jacobian and the Hessian matrix as well as the *Jacobian times vector*.
......@@ -6,8 +6,8 @@ Loading and Saving
==================
Python's standard way of saving class instances and reloading them
is the pickle_ mechanism. Many Theano objects can be serialized (and
deserialized) by ``pickle``, however, a limitation of ``pickle`` is that
is the pickle_ mechanism. Many Theano objects can be *serialized* (and
*deserialized*) by ``pickle``; however, a limitation of ``pickle`` is that
it does not save the code or data of a class along with the instance of
the class being serialized. As a result, reloading objects created by a
previous version of a class can be really problematic.
......@@ -126,7 +126,7 @@ maybe defining the attributes you want to save, rather than the ones you
don't.
For instance, if the only parameters you want to save are a weight
matrix ``W`` and a bias ``b``, you can define:
matrix *W* and a bias *b*, you can define:
.. code-block:: python
......@@ -138,8 +138,8 @@ matrix ``W`` and a bias ``b``, you can define:
self.W = W
self.b = b
If at some point in time ``W`` is renamed to ``weights`` and ``b`` to
``bias``, the older pickled files will still be usable, if you update these
If at some point in time *W* is renamed to *weights* and *b* to
*bias*, the older pickled files will still be usable if you update these
functions to reflect the change in name:
.. code-block:: python
......@@ -152,6 +152,6 @@ functions to reflect the change in name:
self.weights = W
self.bias = b
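Putting the pieces together, here is a minimal self-contained sketch of the renaming trick with plain ``pickle`` (the class name ``Layer`` is illustrative, not from Theano):

```python
import pickle

class Layer(object):
    def __init__(self, W, b):
        self.weights = W              # formerly self.W
        self.bias = b                 # formerly self.b

    # Save under the *old* attribute names, so files written by
    # either version of the class remain interchangeable.
    def __getstate__(self):
        return {'W': self.weights, 'b': self.bias}

    # Restore from the old names into the new attributes.
    def __setstate__(self, state):
        self.weights = state['W']
        self.bias = state['b']

layer = Layer([[1.0, 2.0]], 0.5)
restored = pickle.loads(pickle.dumps(layer))
```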
For more information on advanced use of pickle and its internals, see Python's
For more information on advanced use of ``pickle`` and its internals, see Python's
pickle_ documentation.
......@@ -9,10 +9,10 @@ Scan
====
- A general form of *recurrence*, which can be used for looping.
- *Reduction* and *map* (loop over the leading dimensions) are special cases of scan.
- You 'scan' a function along some input sequence, producing an output at each time-step.
- *Reduction* and *map* (loop over the leading dimensions) are special cases of ``scan``.
- You ``scan`` a function along some input sequence, producing an output at each time-step.
- The function can see the *previous K time-steps* of your function.
- ``sum()`` could be computed by scanning the z + x(i) function over a list, given an initial state of ``z=0``.
- ``sum()`` could be computed by scanning the *z + x(i)* function over a list, given an initial state of *z=0*.
- Often a *for* loop can be expressed as a ``scan()`` operation, and ``scan`` is the closest that Theano comes to looping.
- Advantages of using ``scan`` over *for* loops:
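The looping pattern behind ``scan`` can be sketched in plain Python (this is a conceptual model, not the ``theano.scan`` API):

```python
def scan_like(step, sequence, initial):
    """Apply `step` along `sequence`, threading a state through it,
    and collect the state at every time-step."""
    state = initial
    outputs = []
    for x_i in sequence:
        state = step(state, x_i)
        outputs.append(state)
    return outputs

# sum() as a scan: the step function is z + x(i), initial state z = 0.
partial_sums = scan_like(lambda z, x_i: z + x_i, [1, 2, 3, 4], 0)
# partial_sums is [1, 3, 6, 10]; the final state is the sum, 10.
```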
......@@ -30,6 +30,7 @@ The full documentation can be found in the library: :ref:`Scan <lib_scan>`.
import theano
import theano.tensor as T
theano.config.warn.subtensor_merge_bug = False
k = T.iscalar("k"); A = T.vector("A")
......@@ -54,8 +55,10 @@ The full documentation can be found in the library: :ref:`Scan <lib_scan>`.
.. code-block:: python
import numpy
import theano
import theano.tensor as T
theano.config.warn.subtensor_merge_bug = False
coefficients = theano.tensor.vector("coefficients")
x = T.scalar("x"); max_coefficients_supported = 10000
......
......@@ -9,14 +9,14 @@ Configuration Settings and Compiling Modes
Configuration
=============
The ``config`` module contains several ``attributes`` that modify Theano's behavior. Many of these
The ``config`` module contains several *attributes* that modify Theano's behavior. Many of these
attributes are examined during the import of the ``theano`` module and several are assumed to be
read-only.
*As a rule, the attributes in the* ``config`` *module should not be modified inside the user code.*
Theano's code comes with default values for these attributes, but you can
override them from your .theanorc file, and override those values in turn by
override them from your ``.theanorc`` file, and override those values in turn by
the :envvar:`THEANO_FLAGS` environment variable.
The order of precedence is:
......@@ -110,6 +110,8 @@ time the execution using the command line ``time python file.py``.
.. TODO: To be resolved:
.. Solution said:
.. You will need to use: ``theano.config.floatX`` and ``ndarray.astype("str")``
.. Why the latter portion?
......@@ -119,10 +121,10 @@ time the execution using the command line ``time python file.py``.
* Apply the Theano flag ``floatX=float32`` (through ``theano.config.floatX``) in your code.
* Cast inputs before storing them into a shared variable.
* Circumvent the automatic cast of int32 with float32 to float64:
* Circumvent the automatic cast of *int32* with *float32* to *float64*:
* Insert manual cast in your code or use [u]int{8,16}.
* Insert manual cast around the mean operator (this involves division by length, which is an int64).
* Insert manual cast in your code or use *[u]int{8,16}*.
* Insert manual cast around the mean operator (this involves division by length, which is an *int64*).
* Notice that a new casting mechanism is being developed.
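The casts above matter because of standard type-promotion rules, which NumPy applies as well; a small sketch:

```python
import numpy as np

i32 = np.ones(3, dtype=np.int32)
f32 = np.ones(3, dtype=np.float32)

# int32 combined with float32 is promoted all the way to float64 ...
promoted = (i32 * f32).dtype                    # float64

# ... whereas [u]int{8,16} fit inside float32 and stay there ...
small = (i32.astype(np.int16) * f32).dtype      # float32

# ... and a manual cast also keeps the computation in float32.
manual = (i32.astype(np.float32) * f32).dtype   # float32
```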
-------------------------------------------
......@@ -130,7 +132,7 @@ time the execution using the command line ``time python file.py``.
Mode
====
Everytime :func:`theano.function <function.function>` is called
Everytime :func:`theano.function <function.function>` is called,
the symbolic relationships between the input and output Theano *variables*
are optimized and compiled. The way this compilation occurs
is controlled by the value of the ``mode`` parameter.
......@@ -139,9 +141,9 @@ Theano defines the following modes by name:
- ``'FAST_COMPILE'``: Apply just a few graph optimizations and only use Python implementations.
- ``'FAST_RUN'``: Apply all optimizations, and use C implementations where possible.
- ``'DEBUG_MODE'``: Verify the correctness of all optimizations, and compare C and Python
implementations. This mode can take much longer than the other modes,
but can identify many kinds of problems.
- ``'DEBUG_MODE'``: Verify the correctness of all optimizations, and compare C and Python
implementations. This mode can take much longer than the other modes, but can identify
several kinds of problems.
- ``'PROFILE_MODE'``: Same optimizations as ``FAST_RUN``, but prints some profiling information.
The default mode is typically ``FAST_RUN``, but it can be controlled via
......@@ -152,18 +154,18 @@ which can be overridden by passing the keyword argument to
================= =============================================================== ===============================================================================
short name Full constructor What does it do?
================= =============================================================== ===============================================================================
FAST_COMPILE ``compile.mode.Mode(linker='py', optimizer='fast_compile')`` Python implementations only, quick and cheap graph transformations
FAST_RUN ``compile.mode.Mode(linker='c|py', optimizer='fast_run')`` C implementations where available, all available graph transformations.
DEBUG_MODE ``compile.debugmode.DebugMode()`` Both implementations where available, all available graph transformations.
PROFILE_MODE ``compile.profilemode.ProfileMode()`` C implementations where available, all available graph transformations, print profile information.
``FAST_COMPILE`` ``compile.mode.Mode(linker='py', optimizer='fast_compile')`` Python implementations only, quick and cheap graph transformations
``FAST_RUN`` ``compile.mode.Mode(linker='c|py', optimizer='fast_run')`` C implementations where available, all available graph transformations.
``DEBUG_MODE`` ``compile.debugmode.DebugMode()`` Both implementations where available, all available graph transformations.
``PROFILE_MODE`` ``compile.profilemode.ProfileMode()`` C implementations where available, all available graph transformations, print profile information.
================= =============================================================== ===============================================================================
Linkers
=======
A mode is composed of 2 things: an optimizer and a linker. Some modes,
like PROFILE_MODE and DEBUG_MODE, add logic around the optimizer and
linker. PROFILE_MODE and DEBUG_MODE use their own linker.
like ``PROFILE_MODE`` and ``DEBUG_MODE``, add logic around the optimizer and
linker. ``PROFILE_MODE`` and ``DEBUG_MODE`` use their own linker.
You can select which linker to use with the Theano flag :attr:`config.linker`.
Here is a table to compare the different linkers.
......@@ -184,8 +186,8 @@ DebugMode no yes VERY HIGH Make many checks on what
.. [#gc] Garbage collection of intermediate results during computation.
Otherwise, the memory space used by the ops is kept between
Theano function calls, in order not to
reallocate memory, and lower the overhead (make it faster...)
.. [#cpy1] default
reallocate memory, and lower the overhead (make it faster...).
.. [#cpy1] Default
.. [#cpy2] Deprecated
......@@ -201,10 +203,10 @@ While normally you should use the ``FAST_RUN`` or ``FAST_COMPILE`` mode,
it is useful at first (especially when you are defining new kinds of
expressions or new optimizations) to run your code using the DebugMode
(available via ``mode='DEBUG_MODE'``). The DebugMode is designed to
do several self-checks and assertations that can help to diagnose
possible programming errors that can lead to incorect output. Note that
``DEBUG_MODE`` is much slower then ``FAST_RUN`` or ``FAST_COMPILE`` so
use it only during development (not when you launch 1000 process on a
run several self-checks and assertions that can help diagnose
possible programming errors leading to incorrect output. Note that
``DEBUG_MODE`` is much slower than ``FAST_RUN`` or ``FAST_COMPILE`` so
use it only during development (not when you launch 1000 processes on a
cluster!).
......@@ -225,14 +227,16 @@ DebugMode is used as follows:
If any problem is detected, DebugMode will raise an exception according to
what went wrong, either at call time (``f(5)``) or compile time (
what went wrong, either at call time (*f(5)*) or compile time (
``f = theano.function(x, 10*x, mode='DEBUG_MODE')``). These exceptions
should *not* be ignored; talk to your local Theano guru or email the
users list if you cannot make the exception go away.
Some kinds of errors can only be detected for certain input value combinations.
In the example above, there is no way to guarantee that a future call to say,
``f(-1)`` won't cause a problem. DebugMode is not a silver bullet.
In the example above, there is no way to guarantee that a future call to, say,
*f(-1)*, won't cause a problem. DebugMode is not a silver bullet.
.. TODO: repair the following link
If you instantiate DebugMode using the constructor (see :class:`DebugMode`)
rather than the keyword ``DEBUG_MODE`` you can configure its behaviour via
......@@ -277,7 +281,7 @@ implementation only, should use the gof.PerformLinker (or "py" for
short). On the other hand, a user wanting to profile his graph using C
implementations wherever possible should use the ``gof.OpWiseCLinker``
(or "c|py"). For testing the speed of your code we would recommend
using the 'fast_run' optimizer and ``gof.OpWiseCLinker`` linker.
using the ``fast_run`` optimizer and the ``gof.OpWiseCLinker`` linker.
Compiling your Graph with ProfileMode
-------------------------------------
......@@ -300,7 +304,7 @@ the desired timing information, indicating where your graph is spending most
of its time. This is best shown through an example. Let's use our logistic
regression example.
Compiling the module with ProfileMode and calling ``profmode.print_summary()``
Compiling the module with ``ProfileMode`` and calling ``profmode.print_summary()``
generates the following output:
.. code-block:: python
......@@ -352,14 +356,14 @@ generates the following output:
This output has two components. In the first section called
*Apply-wise summary*, timing information is provided for the worst
offending Apply nodes. This corresponds to individual Op applications
offending ``Apply`` nodes. This corresponds to individual op applications
within your graph which took longest to execute (so if you use
``dot`` twice, you will see two entries there). In the second portion,
the *Op-wise summary*, the execution time of all Apply nodes executing
the same Op are grouped together and the total execution time per Op
the *Op-wise summary*, the execution time of all ``Apply`` nodes executing
the same op are grouped together and the total execution time per op
is shown (so if you use ``dot`` twice, you will see only one entry
there corresponding to the sum of the time spent in each of them).
Finally, notice that the ProfileMode also shows which Ops were running a C
Finally, notice that the ``ProfileMode`` also shows which ops were running a C
implementation.
......
.. _tutorial_printing_drawing:
==============================
Printing/Drawing Theano graphs
==============================
.. TODO: repair the defective links in the next paragraph
Theano provides two functions (:func:`theano.pp` and
:func:`theano.printing.debugprint`) to print a graph to the terminal before or after
compilation. These two functions print expression graphs in different ways:
:func:`pp` is more compact and math-like, while :func:`debugprint` is more verbose.
Theano also provides :func:`pydotprint` that creates a *png* image of the function.
You can read about them in :ref:`libdoc_printing`.
Consider again the logistic regression example, but notice the additional printing instructions.
The following output depicts the pre- and post-compilation graphs.
.. code-block:: python
import numpy
import theano
import theano.tensor as T
rng = numpy.random
N = 400
feats = 784
D = (rng.randn(N, feats).astype(theano.config.floatX),
rng.randint(size=N, low=0, high=2).astype(theano.config.floatX))
training_steps = 10000
# Declare Theano symbolic variables
x = T.matrix("x")
y = T.vector("y")
w = theano.shared(rng.randn(feats).astype(theano.config.floatX), name="w")
b = theano.shared(numpy.asarray(0., dtype=theano.config.floatX), name="b")
x.tag.test_value = D[0]
y.tag.test_value = D[1]
#print "Initial model:"
#print w.get_value(), b.get_value()
# Construct Theano expression graph
p_1 = 1 / (1 + T.exp(-T.dot(x, w)-b)) # Probability that target = 1
prediction = p_1 > 0.5 # The prediction that is done: 0 or 1
xent = -y*T.log(p_1) - (1-y)*T.log(1-p_1) # Cross-entropy
cost = xent.mean() + 0.01*(w**2).sum() # The cost to optimize
gw,gb = T.grad(cost, [w,b])
# Compile expressions to functions
train = theano.function(
inputs=[x,y],
outputs=[prediction, xent],
updates={w:w-0.01*gw, b:b-0.01*gb},
name = "train")
predict = theano.function(inputs=[x], outputs=prediction,
name = "predict")
if any( [x.op.__class__.__name__=='Gemv' for x in
train.maker.fgraph.toposort()]):
print 'Used the cpu'
elif any( [x.op.__class__.__name__=='GpuGemm' for x in
train.maker.fgraph.toposort()]):
print 'Used the gpu'
else:
print 'ERROR, not able to tell if theano used the cpu or the gpu'
print train.maker.fgraph.toposort()
for i in range(training_steps):
pred, err = train(D[0], D[1])
#print "Final model:"
#print w.get_value(), b.get_value()
print "target values for D"
print D[1]
print "prediction on D"
print predict(D[0])
# Print the picture graphs
# after compilation
theano.printing.pydotprint(predict,
outfile="pics/logreg_pydotprint_predic.png",
var_with_name_simple=True)
# before compilation
theano.printing.pydotprint_variables(prediction,
outfile="pics/logreg_pydotprint_prediction.png",
var_with_name_simple=True)
theano.printing.pydotprint(train,
outfile="pics/logreg_pydotprint_train.png",
var_with_name_simple=True)
Pretty Printing
===============
``theano.printing.pprint(variable)``
>>> theano.printing.pprint(prediction) # (pre-compilation)
gt((TensorConstant{1} / (TensorConstant{1} + exp(((-(x \\dot w)) - b)))),TensorConstant{0.5})
Debug Printing
==============
``theano.printing.debugprint({fct, variable, list of variables})``
>>> theano.printing.debugprint(prediction) # (pre-compilation)
Elemwise{gt,no_inplace} [@181772236] ''
|Elemwise{true_div,no_inplace} [@181746668] ''
| |InplaceDimShuffle{x} [@181746412] ''
| | |TensorConstant{1} [@181745836]
| |Elemwise{add,no_inplace} [@181745644] ''
| | |InplaceDimShuffle{x} [@181745420] ''
| | | |TensorConstant{1} [@181744844]
| | |Elemwise{exp,no_inplace} [@181744652] ''
| | | |Elemwise{sub,no_inplace} [@181744012] ''
| | | | |Elemwise{neg,no_inplace} [@181730764] ''
| | | | | |dot [@181729676] ''
| | | | | | |x [@181563948]
| | | | | | |w [@181729964]
| | | | |InplaceDimShuffle{x} [@181743788] ''
| | | | | |b [@181730156]
|InplaceDimShuffle{x} [@181771788] ''
| |TensorConstant{0.5} [@181771148]
>>> theano.printing.debugprint(predict) # (post-compilation)
Elemwise{Composite{neg,{sub,{{scalar_sigmoid,GT},neg}}}} [@183160204] '' 2
|dot [@183018796] '' 1
| |x [@183000780]
| |w [@183000812]
|InplaceDimShuffle{x} [@183133580] '' 0
| |b [@183000876]
|TensorConstant{[ 0.5]} [@183084108]
Picture Printing
================
>>> theano.printing.pydotprint_variables(prediction) # (pre-compilation)
.. image:: ../hpcs2011_tutorial/pics/logreg_pydotprint_prediction.png
:width: 800 px
Notice that ``pydotprint()`` requires *Graphviz* and Python's ``pydot``.
>>> theano.printing.pydotprint(predict) # (post-compilation)
.. image:: ../hpcs2011_tutorial/pics/logreg_pydotprint_predic.png
:width: 800 px
>>> theano.printing.pydotprint(train) # This is a small train example!
.. image:: ../hpcs2011_tutorial/pics/logreg_pydotprint_train.png
:width: 1500 px
......@@ -5,15 +5,19 @@
Some General Remarks
=====================
Theano offers quite a bit of flexibility, but has some limitations too.
How should you write your algorithm to make the most of what Theano can do?
.. TODO: This discussion is awkward. Even with this beneficial reordering (28 July 2012) its purpose and message are unclear.
Limitations
-----------
- While- or for-Loops within an expression graph are supported, but only via
Theano offers a good amount of flexibility, but has some limitations too.
How then can you write your algorithm to make the most of what Theano can do?
- *While*- or *for*-loops within an expression graph are supported, but only via
the :func:`theano.scan` op (which puts restrictions on how the loop body can
interact with the rest of the graph).
- Neither ``goto`` nor recursion is supported or planned within expression graphs.
- Neither *goto* nor *recursion* is supported or planned within expression graphs.
......@@ -18,7 +18,7 @@ Currently, information regarding shape is used in two ways in Theano:
`Op.infer_shape <http://deeplearning.net/software/theano/extending/cop.html#Op.infer_shape>`_
method.
ex:
Example:
.. code-block:: python
......@@ -40,7 +40,7 @@ Shape Inference Problem
=======================
Theano propagates information about shape in the graph. Sometimes this
can lead to errors. For example:
can lead to errors. Consider this example:
.. code-block:: python
......@@ -90,19 +90,19 @@ example), an inferred shape is computed directly, without executing
the computation itself (there is no ``join`` in the first output or debugprint).
This makes the computation of the shape faster, but it can also hide errors. In
the example, the computation of the shape of the output of ``join`` is done only
this example, the computation of the shape of the output of ``join`` is done only
based on the first input Theano variable, which leads to an error.
This might happen with other ops such as elemwise, dot, ...
This might happen with other ops such as ``elemwise`` and ``dot``, for example.
Indeed, to perform some optimizations (for speed or stability, for instance),
Theano assumes that the computation is correct and consistent
in the first place, as it does here.
You can detect those problems by running the code without this
optimization, with the Theano flag
optimization, using the Theano flag
``optimizer_excluding=local_shape_to_shape_i``. You can also obtain the
same effect by running in the modes FAST_COMPILE (it will not apply this
optimization, nor most other optimizations) or DEBUG_MODE (it will test
same effect by running in the modes ``FAST_COMPILE`` (it will not apply this
optimization, nor most other optimizations) or ``DEBUG_MODE`` (it will test
before and after all optimizations, which is much slower).
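The flag can be set for a single run from the shell through :envvar:`THEANO_FLAGS` (the script name ``test_shapes.py`` below is a placeholder):

```shell
# Disable only the shape-to-shape_i optimization for this run:
THEANO_FLAGS='optimizer_excluding=local_shape_to_shape_i' python test_shapes.py

# Or switch the whole run to a more conservative mode:
THEANO_FLAGS='mode=FAST_COMPILE' python test_shapes.py
```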
......@@ -113,15 +113,15 @@ Currently, specifying a shape is not as easy and flexible as we wish and we plan
an upgrade. Here is the current state of what can be done:
- You can pass the shape info directly to the ``ConvOp`` created
when calling conv2d. You simply add the parameters image_shape
and filter_shape to the call. They must be tuples of 4
when calling ``conv2d``. You simply set the parameters ``image_shape``
and ``filter_shape`` inside the call. They must be tuples of 4
elements. For example:
.. code-block:: python
theano.tensor.nnet.conv2d(..., image_shape=(7,3,5,5), filter_shape=(2,3,4,4))
- You can use the SpecifyShape op to add shape information anywhere in the
- You can use the ``SpecifyShape`` op to add shape information anywhere in the
graph. This allows Theano to perform some optimizations. In the following example,
this makes it possible to precompute the Theano function to a constant.
......@@ -138,6 +138,6 @@ Future Plans
============
The parameter "constant shape" will be added to ``theano.shared()``. This is probably
the most frequent case with ``shared variables``. This will make the code
the most frequent occurrence with ``shared`` variables. It will make the code
simpler and will make it possible to check that the shape does not change when
updating the shared variable.
updating the ``shared`` variable.
......@@ -19,7 +19,7 @@ relations using symbolic placeholders (**variables**). When writing down
these expressions you use operations like ``+``, ``-``, ``**``,
``sum()``, ``tanh()``. All these are represented internally as **ops**.
An **op** represents a certain computation on some type of inputs
producing some type of output. You can see it as a function definition
producing some type of output. You can see it as a *function definition*
in most programming languages.
Theano builds internally a graph structure composed of interconnected
......@@ -69,15 +69,15 @@ Take for example the following code:
x = T.dmatrix('x')
y = x*2.
If you print `type(y.owner)`` you get ``<class 'theano.gof.graph.Apply'>``,
If you enter ``type(y.owner)`` you get ``<class 'theano.gof.graph.Apply'>``,
which is the apply node that connects the op and the inputs to get this
output. You can now print the name of the op that is applied to get
``y``:
*y*:
>>> y.owner.op.name
'Elemwise{mul,no_inplace}'
Hence, an elementwise multiplication is used to compute ``y``. This
Hence, an elementwise multiplication is used to compute *y*. This
multiplication is done between the inputs:
>>> len(y.owner.inputs)
......@@ -89,7 +89,7 @@ InplaceDimShuffle{x,x}.0
Note that the second input is not 2 as we would have expected. This is
because 2 was first :term:`broadcasted <broadcasting>` to a matrix of
same shape as x. This is done by using the op ``DimShuffle`` :
same shape as *x*. This is done by using the op ``DimShuffle``:
>>> type(y.owner.inputs[1])
<class 'theano.tensor.basic.TensorVariable'>
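Theano's broadcasting follows the same rules as NumPy's, so the virtual expansion of the constant can be sketched directly in NumPy (illustrative only, no Theano involved):

```python
import numpy as np

x = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# The constant 2 behaves as if it were first expanded ("dimshuffled")
# to a matrix of the same shape as x, then multiplied elementwise.
y = x * 2.0
# y is [[2., 4.], [6., 8.]]
```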
......@@ -122,7 +122,7 @@ Using the
these gradients can be composed in order to obtain the expression of the
gradient of the graph's output with respect to the graph's inputs.
A coming section of this tutorial will address the topic of differentiation
A following section of this tutorial will examine the topic of differentiation
in greater detail.
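As a rough illustration of that composition (plain Python arithmetic, not Theano's ``grad``), take f(x) = (2x)^2; the chain rule multiplies the local gradient contributed by each op:

```python
# Forward pass through two "ops": u = 2*x, then f = u**2.
x = 3.0
u = 2.0 * x
f = u ** 2             # f(x) = (2x)^2 = 36 at x = 3

# Backward pass: each op contributes its local gradient, and the
# chain rule composes them: df/dx = (df/du) * (du/dx) = 2u * 2 = 8x.
df_du = 2.0 * u
du_dx = 2.0
df_dx = df_du * du_dx  # 24 at x = 3
```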
......@@ -131,7 +131,7 @@ Optimizations
When compiling a Theano function, what you give to the
:func:`theano.function <function.function>` is actually a graph
(starting from the outputs variables you can traverse the graph up to
(starting from the output variables you can traverse the graph up to
the input variables). While this graph structure shows how to compute
the output from the input, it also offers the possibility to improve the
way this computation is carried out. The way optimizations work in
......