Merged more changes into basic tutorial

c8b10686 · Joseph Turian · 61e31ecc
--- a/doc/advanced/gradient.txt
+++ b/doc/advanced/gradient.txt
@@ -6,3 +6,7 @@ Computation of the Gradient
 ===========================
 WRITEME
+Describe what is happening in general when you compute the gradient
+Give examples with varying shapes
--- a/doc/tutorials/advanced/ex1/op.txt
+++ b/doc/tutorials/advanced/ex1/op.txt
@@ -4,7 +4,7 @@ Making arithmetic Ops on double
 ===============================
 Now that we have a ``double`` type, we have yet to use it to perform
-computations. We'll start with defining multiplication.
+computations. We'll start by defining multiplication.
 What is an Op?
@@ -16,12 +16,12 @@ function definition in most programming languages. From a list of
 input :ref:`Results <result>` and an Op, you can build an :ref:`apply`
 node representing the application of the Op to the inputs.
-It is important to understand the distinction between the definition
+It is important to understand the distinction between an Op (the
-of a function (an Op) and the application of a function (an Apply
+definition of a function) and an Apply node (the application of a
-node). If you were to interpret the Python language using Theano's
+function). If you were to interpret the Python language using Theano's
 structures, code going like ``def f(x): ...`` would produce an Op for
-``f`` whereas code like ``a = f(x)`` or ``g(f(4), 5)`` would produce
+``f`` whereas code like ``a = f(x)`` or ``g(f(4), 5)`` would produce an
-an Apply node involving the ``f`` Op.
+Apply node involving the ``f`` Op.
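
The Op versus Apply distinction described in the hunk above can be sketched in plain Python. This is only an illustration: the ``Op`` and ``Apply`` classes below are hypothetical stand-ins written for this note, not Theano's real classes, and the ``evaluate`` method is an assumption made so that the sketch can run on its own.

```python
# Hypothetical stand-ins for illustration only; these are NOT Theano's classes.
class Op:
    """The definition of a function: it knows how to compute, but holds no inputs."""

    def __init__(self, name, fn):
        self.name = name
        self.fn = fn

    def __call__(self, *inputs):
        # Applying the Op computes nothing; it only records an Apply node.
        return Apply(self, inputs)


class Apply:
    """The application of an Op to specific inputs."""

    def __init__(self, op, inputs):
        self.op = op
        self.inputs = inputs

    def evaluate(self):
        # Evaluate nested Apply nodes first, then call the Op's function.
        args = [i.evaluate() if isinstance(i, Apply) else i for i in self.inputs]
        return self.op.fn(*args)


mul = Op("mul", lambda a, b: a * b)  # plays the role of ``def f(x): ...``
node = mul(mul(2.0, 3.0), 4.0)       # plays the role of ``g(f(4), 5)``
result = node.evaluate()
```

Here a single ``mul`` Op is shared by two Apply nodes, which is the point of the paragraph: the definition exists once, while each use of it creates a new node in the graph.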

--- a/doc/tutorials/basic/adding.txt
+++ b/doc/tutorials/basic/adding.txt
@@ -25,10 +25,11 @@ array(28.4)
 Let's break this down into several steps. The first step is to define
-two symbols, or Results, representing the quantities that you want to
+two symbols, or Results, representing the quantities that you want
-add. Note that from now on, we will use the term :term:`Result` to
+to add. Note that from now on, we will use the term :term:`Result`
-mean "symbol" (in other words, ``x``, ``y``, ``z`` are all Result
+to mean "symbol" (in other words, ``x``, ``y``, ``z`` are all Result
-objects).
+objects). The output of the function ``f`` is a :api:`numpy.ndarray`
+with zero dimensions.
 If you are following along and typing into an interpreter, you may have
 noticed that there was a slight delay in executing the ``function``
@@ -80,7 +81,7 @@ The second step is to combine ``x`` and ``y`` into their sum ``z``:
 ``z`` is yet another :term:`Result` which represents the addition of
 ``x`` and ``y``. You can use the :api:`pp <theano.printing.pp>`
-function to print out the computation associated to ``z``.
+function to pretty-print the computation associated with ``z``.
 >>> print pp(z)
 x + y
@@ -146,9 +147,9 @@ with numpy arrays may be found :ref:`here <predefinedtypes>`.
 .. note::
-   Watch out for the distinction between 32 and 64 bit integers (i
+   You the user---not the system architecture---choose whether your
-   prefix vs the l prefix) and between 32 and 64 bit floats (f prefix
+   program will use 32- or 64-bit integers (i prefix vs the l prefix)
-   vs the d prefix).
+   and floats (f prefix vs the d prefix).
 **Next:** `More examples`_
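
The note in the hunk above says that the user picks 32- or 64-bit storage. A minimal sketch of why that choice matters, using only Python's standard ``struct`` module rather than Theano: the format codes below (``f``, ``d``, ``i``, ``q``) are ``struct``'s own codes and are an assumption of this example, not Theano's type prefixes.

```python
import struct


def round_trip(fmt, value):
    """Pack ``value`` using a struct format code, then unpack it again."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


# The same Python float stored at two different widths.
as_f32 = round_trip("<f", 0.1)  # 32-bit float: 0.1 is rounded to fewer bits
as_f64 = round_trip("<d", 0.1)  # 64-bit float: same precision as a Python float

# Storage sizes in bytes for 32- and 64-bit floats and integers.
f32_size = struct.calcsize("<f")
f64_size = struct.calcsize("<d")
i32_size = struct.calcsize("<i")
i64_size = struct.calcsize("<q")
```

The 64-bit round trip returns the value unchanged, while the 32-bit one returns a nearby but different number, so results computed with the ``f`` types can differ slightly from those computed with the ``d`` types.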

--- a/doc/tutorials/basic/examples.txt
+++ b/doc/tutorials/basic/examples.txt
@@ -61,7 +61,7 @@ Theano supports functions with multiple outputs. For example, we can
 compute the :term:`elementwise` difference, absolute difference, and
 squared difference between two matrices ``x`` and ``y`` at the same time:
->>> x, y = T.dmatrices('xy')
+>>> x, y = T.dmatrices('x', 'y')
 >>> diff = x - y
 >>> abs_diff = abs(diff)
 >>> diff_squared = diff**2
@@ -96,12 +96,22 @@ Here is code to compute this gradient:
 >>> x = T.dscalar('x')
 >>> y = x**2
 >>> gy = T.grad(y, x)
+>>> pp(gy)
+'fill(x ** 2, 1.0) * 2 * x ** (2 - 1)'
 >>> f = function([x], gy)
 >>> f(4)
 array(8.0)
 >>> f(94.2)
 array(188.40000000000001)
+In the example above, we can see from ``pp(gy)`` that we are computing
+the correct symbolic gradient.
+``fill(x ** 2, 1.0)`` means to make an array of the same shape as
+``x ** 2`` and fill it with 1.0.
+.. note::
+    The optimizer will simplify the symbolic gradient expression.
 We can also compute the gradient of complex expressions such as the
 logistic function defined above. It turns out that the derivative of the
 logistic is: :math:`ds(x)/dx = s(x) \cdot (1 - s(x))`.
@@ -141,7 +151,7 @@ Let's say you want to define a function that adds two numbers, except
 that if you only provide one number, the other input is assumed to be
 one. You can do it like this:
->>> x, y = T.dscalars('xy')
+>>> x, y = T.dscalars('x', 'y')
 >>> z = x + y
 >>> f = function([x, (y, 1)], z)
 >>> f(33)
@@ -153,6 +163,26 @@ The syntax is that if one of the elements in the list of inputs is a
 pair, the input is the first element of the pair and the second
 element is its default value. Here ``y``'s default value is set to 1.
+Inputs with default values should (must?) follow inputs without default
+values.  There can be multiple inputs with default values. Defaults can
+be set positionally or by name, as in standard Python:
+>>> x, y, w = T.dscalars('x', 'y', 'w')
+>>> z = (x + y) * w
+>>> f = function([x, (y, 1), (w, 2)], z)
+>>> f(33)
+array(68.0)
+>>> f(33, 2)
+array(70.0)
+>>> f(33, 0, 1)
+array(33.0)
+>>> f(33, w=1)
+array(34.0)
+>>> f(33, w=1, y=0)
+array(33.0)
+>>> f(33, w=1, 2)
+<type 'exceptions.SyntaxError'>: non-keyword arg after keyword arg (<ipython console>, line 1)
 .. _functionstateexample:
@@ -173,13 +203,17 @@ First let's define the accumulator function:
 >>> accumulator = function([(inc, 1), ((state, new_state), 0)], new_state)
 The first argument is a pair. As we saw in the previous section, this
-means that ``inc`` is an input with a default value of 1. The
+means that ``inc`` is an input with a default value of 1. The second
-second argument has syntax that creates an internal state or
+argument has syntax that creates an internal state.  The syntax is
-closure. The syntax is ``((state_result, new_state_result),
+``((state_result, new_state_result), initial_value)``.
-initial_value)``. What this means is that every time ``accumulator``
+The internal storage associated with ``state_result`` is initialized to
-is called, the value of the internal ``state`` will be replaced
+``initial_value``.  Every time ``accumulator`` is called, the value
-by the value computed as ``new_state``. In this case, the state will
+of the internal ``state`` will be replaced by the value computed as
-be replaced by the result of incrementing it by ``inc``.
+``new_state``. In this case, the state will be replaced by the result
+of incrementing it by ``inc``.
+We recommend (insist?) that internal state arguments occur after any
+plain arguments and arguments with default values.
 There is no limit to how many states you can have. You can add an
 arbitrary number of elements to the input list which correspond to the