Commit c8b10686, authored by Joseph Turian

Merged more changes into basic tutorial

Parent 61e31ecc
...@@ -6,3 +6,7 @@ Computation of the Gradient ...@@ -6,3 +6,7 @@ Computation of the Gradient
=========================== ===========================
WRITEME WRITEME
Describe what is happening in general when you compute the gradient
Give examples with varying shapes
...@@ -4,7 +4,7 @@ Making arithmetic Ops on double ...@@ -4,7 +4,7 @@ Making arithmetic Ops on double
=============================== ===============================
Now that we have a ``double`` type, we have yet to use it to perform Now that we have a ``double`` type, we have yet to use it to perform
computations. We'll start with defining multiplication. computations. We'll start by defining multiplication.
What is an Op? What is an Op?
...@@ -16,12 +16,12 @@ function definition in most programming languages. From a list of ...@@ -16,12 +16,12 @@ function definition in most programming languages. From a list of
input :ref:`Results <result>` and an Op, you can build an :ref:`apply` input :ref:`Results <result>` and an Op, you can build an :ref:`apply`
node representing the application of the Op to the inputs. node representing the application of the Op to the inputs.
It is important to understand the distinction between the definition It is important to understand the distinction between an Op (the
of a function (an Op) and the application of a function (an Apply definition of a function) and an Apply node (the application of a
node). If you were to interpret the Python language using Theano's function). If you were to interpret the Python language using Theano's
structures, code going like ``def f(x): ...`` would produce an Op for structures, code going like ``def f(x): ...`` would produce an Op for
``f`` whereas code like ``a = f(x)`` or ``g(f(4), 5)`` would produce ``f`` whereas code like ``a = f(x)`` or ``g(f(4), 5)`` would produce an
an Apply node involving the ``f`` Op. Apply node involving the ``f`` Op.
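The Op versus Apply distinction can be sketched in plain Python. This is a toy model only: ``ToyOp`` and ``ToyApply`` are hypothetical names invented for illustration and are not Theano's actual classes or API. The sketch assumes nothing beyond the idea in the text, namely that calling a definition records an application rather than computing a value.

```python
# Toy model only: hypothetical stand-ins, not Theano's real Op/Apply classes.
class ToyOp:
    """The definition of a function, analogous to ``def f(x): ...``."""

    def __init__(self, name, fn):
        self.name = name
        self.fn = fn

    def __call__(self, *inputs):
        # Calling the definition computes nothing; it records an application.
        return ToyApply(self, inputs)


class ToyApply:
    """One application of an Op to specific inputs, analogous to ``f(4)``."""

    def __init__(self, op, inputs):
        self.op = op
        self.inputs = inputs

    def evaluate(self):
        # Evaluate nested applications first, then apply this node's Op.
        args = [i.evaluate() if isinstance(i, ToyApply) else i
                for i in self.inputs]
        return self.op.fn(*args)


f = ToyOp("f", lambda x: x * 2)
g = ToyOp("g", lambda a, b: a + b)

node = g(f(4), 5)        # two Apply nodes, built from two Ops
result = node.evaluate()
print(result)            # 13
```

There are two Ops (``f`` and ``g``) but the expression ``g(f(4), 5)`` creates two separate application nodes; applying ``f`` again elsewhere would create a third node without creating a new Op.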
......
...@@ -25,10 +25,11 @@ array(28.4) ...@@ -25,10 +25,11 @@ array(28.4)
Let's break this down into several steps. The first step is to define Let's break this down into several steps. The first step is to define
two symbols, or Results, representing the quantities that you want to two symbols, or Results, representing the quantities that you want
add. Note that from now on, we will use the term :term:`Result` to to add. Note that from now on, we will use the term :term:`Result`
mean "symbol" (in other words, ``x``, ``y``, ``z`` are all Result to mean "symbol" (in other words, ``x``, ``y``, ``z`` are all Result
objects). objects). The output of the function ``f`` is a :api:`numpy.ndarray`
with zero dimensions.
If you are following along and typing into an interpreter, you may have If you are following along and typing into an interpreter, you may have
noticed that there was a slight delay in executing the ``function`` noticed that there was a slight delay in executing the ``function``
...@@ -80,7 +81,7 @@ The second step is to combine ``x`` and ``y`` into their sum ``z``: ...@@ -80,7 +81,7 @@ The second step is to combine ``x`` and ``y`` into their sum ``z``:
``z`` is yet another :term:`Result` which represents the addition of ``z`` is yet another :term:`Result` which represents the addition of
``x`` and ``y``. You can use the :api:`pp <theano.printing.pp>` ``x`` and ``y``. You can use the :api:`pp <theano.printing.pp>`
function to print out the computation associated to ``z``. function to pretty-print out the computation associated to ``z``.
>>> print pp(z) >>> print pp(z)
x + y x + y
...@@ -146,9 +147,9 @@ with numpy arrays may be found :ref:`here <predefinedtypes>`. ...@@ -146,9 +147,9 @@ with numpy arrays may be found :ref:`here <predefinedtypes>`.
.. note:: .. note::
Watch out for the distinction between 32 and 64 bit integers (i You the user---not the system architecture---choose whether your
prefix vs the l prefix) and between 32 and 64 bit floats (f prefix program will use 32- or 64-bit integers (i prefix vs the l prefix)
vs the d prefix). and floats (f prefix vs the d prefix).
**Next:** `More examples`_ **Next:** `More examples`_
......
...@@ -61,7 +61,7 @@ Theano supports functions with multiple outputs. For example, we can ...@@ -61,7 +61,7 @@ Theano supports functions with multiple outputs. For example, we can
compute the :term:`elementwise` difference, absolute difference, and compute the :term:`elementwise` difference, absolute difference, and
squared difference between two matrices ``x`` and ``y`` at the same time: squared difference between two matrices ``x`` and ``y`` at the same time:
>>> x, y = T.dmatrices('xy') >>> x, y = T.dmatrices('x', 'y')
>>> diff = x - y >>> diff = x - y
>>> abs_diff = abs(diff) >>> abs_diff = abs(diff)
>>> diff_squared = diff**2 >>> diff_squared = diff**2
...@@ -96,12 +96,22 @@ Here is code to compute this gradient: ...@@ -96,12 +96,22 @@ Here is code to compute this gradient:
>>> x = T.dscalar('x') >>> x = T.dscalar('x')
>>> y = x**2 >>> y = x**2
>>> gy = T.grad(y, x) >>> gy = T.grad(y, x)
>>> pp(gy)
'fill(x ** 2, 1.0) * 2 * x ** (2 - 1)'
>>> f = function([x], gy) >>> f = function([x], gy)
>>> f(4) >>> f(4)
array(8.0) array(8.0)
>>> f(94.2) >>> f(94.2)
array(188.40000000000001) array(188.40000000000001)
In the example above, we can see from ``pp(gy)`` that we are computing
the correct symbolic gradient.
``fill(x ** 2, 1.0)`` means to make a matrix of the same shape as ``x **
2`` and fill it with 1.0.
.. note::
The optimizer will simplify the symbolic gradient expression.
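As a quick sanity check, the unsimplified expression printed by ``pp(gy)`` can be evaluated by hand in plain Python. This sketch does not use Theano; it assumes the scalar case, where ``fill(x ** 2, 1.0)`` reduces to the single value 1.0, and ``unsimplified_grad`` is a hypothetical helper name.

```python
def unsimplified_grad(x):
    # Scalar stand-in for fill(x ** 2, 1.0): same shape as x ** 2, filled with 1.0.
    ones_like_y = 1.0
    # Mirrors the printed expression: fill(x ** 2, 1.0) * 2 * x ** (2 - 1)
    return ones_like_y * 2 * x ** (2 - 1)


# The expression agrees with the familiar derivative 2 * x.
for x in (4.0, 94.2):
    assert unsimplified_grad(x) == 2 * x

print(unsimplified_grad(4.0))  # 8.0
```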
We can also compute the gradient of complex expressions such as the We can also compute the gradient of complex expressions such as the
logistic function defined above. It turns out that the derivative of the logistic function defined above. It turns out that the derivative of the
logistic is: :math:`ds(x)/dx = s(x) \cdot (1 - s(x))`. logistic is: :math:`ds(x)/dx = s(x) \cdot (1 - s(x))`.
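The closed form above can be checked numerically without Theano. The sketch below assumes the standard logistic :math:`s(x) = 1 / (1 + e^{-x})` and compares :math:`s(x) \cdot (1 - s(x))` against a central finite difference; the function names and the step size ``h`` are illustrative choices.

```python
import math


def s(x):
    # Standard logistic function (assumed to match the one defined earlier).
    return 1.0 / (1.0 + math.exp(-x))


def ds_closed_form(x):
    # The derivative claimed in the text: s(x) * (1 - s(x)).
    return s(x) * (1.0 - s(x))


def ds_numeric(x, h=1e-6):
    # Central finite-difference approximation of ds/dx.
    return (s(x + h) - s(x - h)) / (2.0 * h)


for x in (-2.0, 0.0, 0.5, 3.0):
    assert abs(ds_closed_form(x) - ds_numeric(x)) < 1e-8

print(ds_closed_form(0.0))  # 0.25
```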
...@@ -141,7 +151,7 @@ Let's say you want to define a function that adds two numbers, except ...@@ -141,7 +151,7 @@ Let's say you want to define a function that adds two numbers, except
that if you only provide one number, the other input is assumed to be that if you only provide one number, the other input is assumed to be
one. You can do it like this: one. You can do it like this:
>>> x, y = T.dscalars('xy') >>> x, y = T.dscalars('x', 'y')
>>> z = x + y >>> z = x + y
>>> f = function([x, (y, 1)], z) >>> f = function([x, (y, 1)], z)
>>> f(33) >>> f(33)
...@@ -153,6 +163,26 @@ The syntax is that if one of the elements in the list of inputs is a ...@@ -153,6 +163,26 @@ The syntax is that if one of the elements in the list of inputs is a
pair, the input is the first element of the pair and the second pair, the input is the first element of the pair and the second
element is its default value. Here ``y``'s default value is set to 1. element is its default value. Here ``y``'s default value is set to 1.
Inputs with default values should (must?) follow inputs without default
values. There can be multiple inputs with default values. Defaults can
be set positionally or by name, as in standard Python:
>>> x, y, w = T.dscalars('x', 'y', 'w')
>>> z = (x + y) * w
>>> f = function([x, (y, 1), (w, 2)], z)
>>> f(33)
array(68.0)
>>> f(33, 2)
array(70.0)
>>> f(33, 0, 1)
array(33.0)
>>> f(33, w=1)
array(34.0)
>>> f(33, w=1, y=0)
array(33.0)
>>> f(33, w=1, 2)
<type 'exceptions.SyntaxError'>: non-keyword arg after keyword arg (<ipython console>, line 1)
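For comparison, the session above behaves like an ordinary Python function with default arguments. The sketch below is a plain-Python analogue, not Theano code; ``f_plain`` is a hypothetical name, and the values are returned as floats rather than zero-dimensional arrays.

```python
def f_plain(x, y=1, w=2):
    # Same arithmetic as z = (x + y) * w, with y defaulting to 1 and w to 2.
    return float((x + y) * w)


assert f_plain(33) == 68.0              # both defaults used
assert f_plain(33, 2) == 70.0           # y set positionally
assert f_plain(33, 0, 1) == 33.0        # y and w set positionally
assert f_plain(33, w=1) == 34.0         # w set by name, y keeps its default
assert f_plain(33, w=1, y=0) == 33.0    # both set by name, in any order
```

As in the last line of the session, ``f_plain(33, w=1, 2)`` is rejected by the Python parser itself, before the function is ever called.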
.. _functionstateexample: .. _functionstateexample:
...@@ -173,13 +203,17 @@ First let's define the accumulator function: ...@@ -173,13 +203,17 @@ First let's define the accumulator function:
>>> accumulator = function([(inc, 1), ((state, new_state), 0)], new_state) >>> accumulator = function([(inc, 1), ((state, new_state), 0)], new_state)
The first argument is a pair. As we saw in the previous section, this The first argument is a pair. As we saw in the previous section, this
means that ``inc`` is an input with a default value of 1. The means that ``inc`` is an input with a default value of 1. The second
second argument has syntax that creates an internal state or argument has syntax that creates an internal state. The syntax is
closure. The syntax is ``((state_result, new_state_result), ``((state_result, new_state_result), initial_value)``.
initial_value)``. What this means is that every time ``accumulator`` The internal storage associated with ``state_result`` is initialized to
is called, the value of the internal ``state`` will be replaced ``initial_value``. Every time ``accumulator`` is called, the value
by the value computed as ``new_state``. In this case, the state will of the internal ``state`` will be replaced by the value computed as
be replaced by the result of incrementing it by ``inc``. ``new_state``. In this case, the state will be replaced by the result
of incrementing it by ``inc``.
We recommend (insist?) that internal state arguments occur after any
plain arguments and arguments with default values.
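The state-update behaviour described above can be mimicked with a small plain-Python class. This is only a rough analogy under stated assumptions: ``ToyAccumulator`` is a hypothetical name, ``new_state`` is assumed to be ``state + inc``, and nothing here reflects how Theano stores state internally.

```python
class ToyAccumulator:
    """Plain-Python analogue of ((state, new_state), initial_value)."""

    def __init__(self, initial_value=0):
        # Internal storage is initialized to initial_value.
        self.state = initial_value

    def __call__(self, inc=1):
        # Compute new_state from the current state, then replace the state.
        new_state = self.state + inc
        self.state = new_state
        return new_state


acc = ToyAccumulator()
first = acc()       # inc defaults to 1
second = acc()      # state carries over between calls
third = acc(10)     # explicit increment
print(first, second, third)  # 1 2 12
```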
There is no limit to how many states you can have. You can add an There is no limit to how many states you can have. You can add an
arbitrary number of elements to the input list which correspond to the arbitrary number of elements to the input list which correspond to the
......