updated basic tutorial to use shared variables

b968046f · James Bergstra · d95468cb · b968046f
--- a/doc/basic_tutorial/examples.txt
+++ b/doc/basic_tutorial/examples.txt
@@ -59,13 +59,18 @@ Computing more than one thing at the same time

 Theano supports functions with multiple outputs. For example, we can
 compute the :term:`elementwise` difference, absolute difference, and
-squared difference between two matrices ``x`` and ``y`` at the same time:
+squared difference between two matrices ``a`` and ``b`` at the same time:

->>> x, y = T.dmatrices('x', 'y')
->>> diff = x - y
+>>> a, b = T.dmatrices('a', 'b')
+>>> diff = a - b
 >>> abs_diff = abs(diff)
 >>> diff_squared = diff**2
->>> f = function([x, y], [diff, abs_diff, diff_squared])
+>>> f = function([a, b], [diff, abs_diff, diff_squared])
+
+.. note::
+   ``dmatrices`` returns one symbolic variable for each name that you
+   provide.  It's a shortcut for allocating symbolic variables that we will
+   often use in the tutorials.

 When we use the function, it will return the three variables (the printing
 was reformatted for readability):
@@ -78,9 +83,6 @@ was reformatted for readability):
 array([[ 1.,  0.],
        [ 1.,  4.]])]

-Also note the call to ``dmatrices``. This is a shortcut, use it wisely
-;)
-

 Computing gradients
 ===================
@@ -153,40 +155,49 @@ one. You can do it like this:

 >>> x, y = T.dscalars('x', 'y')
 >>> z = x + y
->>> f = function([x, In(y, value = 1)], z)
+>>> f = function([x, Param(y, default=1)], z)
 >>> f(33)
 array(34.0)
 >>> f(33, 2)
 array(35.0)

-This makes use of the :ref:`In <function_inputs>` class which allows
-you to specify properties of your inputs with greater detail. Here we
-give a default value of 1 for ``y`` by creating an In instance with
-its value field set to 1.
+This makes use of the :ref:`Param <function_inputs>` class which allows
+you to specify properties of your function's parameters in greater detail. Here we
+give a default value of 1 for ``y`` by creating a ``Param`` instance with
+its ``default`` field set to 1.

-Inputs with default values should follow inputs without default
-values.  There can be multiple inputs with default values. Defaults can
+Inputs with default values must follow inputs without default
+values (as in Python functions).  There can be multiple inputs with default values. These parameters can
 be set positionally or by name, as in standard Python:

 >>> x, y, w = T.dscalars('x', 'y', 'w')
 >>> z = (x + y) * w
->>> f = function([x, In(y, value = 1), In(w, value = 2)], z)
+>>> f = function([x, Param(y, default=1), Param(w, default=2, name='w_by_name')], z)
 >>> f(33)
 array(68.0)
 >>> f(33, 2)
 array(70.0)
 >>> f(33, 0, 1)
 array(33.0)
->>> f(33, w=1)
+>>> f(33, w_by_name=1)
 array(34.0)
->>> f(33, w=1, y=0)
+>>> f(33, w_by_name=1, y=0)
 array(33.0)

+.. note::
+   ``Param`` does not know the names of the local variables ``y`` and ``w``
+   that are passed as arguments.  The symbolic variable objects have name
+   attributes (set by ``dscalars`` in the example above) and *these* are the
+   names of the keyword parameters in the functions that we build.  This is
+   the mechanism at work in ``Param(y, default=1)``.  In the case of ``Param(w,
+   default=2, name='w_by_name')``, we override the symbolic variable's name
+   attribute with a name to be used for this function.
+

 .. _functionstateexample:

-Making a function with state
-============================
+Including values in a symbolic graph
+====================================

 It is also possible to make a function with an internal state. For
 example, let's say we want to make an accumulator: at the beginning,
@@ -194,59 +205,87 @@ the state is initialized to zero. Then, on each function call, the state
 is incremented by the function's argument. We'll also make it so that
 the increment has a default value of 1.

-First let's define the accumulator function:
-
->>> inc = T.scalar('inc')
->>> state = T.scalar('state_name')
->>> new_state = state + inc
->>> accumulator = function([In(inc, value = 1), In(state, value = 0, update = new_state)], new_state)
-
-The first argument, as seen in the previous section, defines a default
-value of 1 for ``inc``. The second argument adds another argument to
-In, ``update``, which works as follows: every time ``accumulator`` is
-called, the value of the internal ``state`` will be replaced by the
-value computed as ``new_state``. In this case, the state will be
-replaced by the result of incrementing it by ``inc``.
-
-.. We recommend (insist?) that internal state arguments occur after any plain
-   arguments and arguments with default values.
-
-There is no limit to how many states you can have and you can name
-them however you like as long as the name does not conflict with the
-names of other inputs.
-
-Anyway, let's try it out! The state can be accessed using the square
-brackets notation ``[]``. You may access the state either by using
-the :ref:`variable` representing it or the name of that
-:ref:`variable`. In our example we can access the state either with the
-``state`` object or the string 'state_name'.
-
->>> accumulator[state]
-array(0.0)
->>> accumulator['state_name']
-array(0.0)
-
-Here we use the accumulator and check that the state is correct each
-time:
-
->>> accumulator()
-array(1.0)
->>> accumulator['state_name']
-array(1.0)
+First let's define the ``accumulator`` function. It adds its argument to the
+internal state, and returns the old state value.
+
+>>> state = shared(0)
+>>> inc = T.iscalar('inc')
+>>> accumulator = function([inc], state, updates=[(state, state+inc)])
+
+This code introduces a few new concepts.  The ``shared`` function constructs
+so-called *shared variables*.  These are hybrid symbolic and non-symbolic
+variables.  Shared variables can be used in symbolic expressions just like
+the objects returned by ``dmatrices(...)`` but they also have a ``.value``
+property that defines the value taken by this symbolic variable in *all* the
+functions that use it.  It is called a *shared* variable because its value is
+shared between many functions.  We'll come back to this soon.
+
+The other new thing in this code is the ``updates`` parameter of ``function``.
+``updates`` is a list of pairs of the form (shared variable, new expression).
+It can also be a dictionary whose keys are shared variables and whose values
+are the new expressions.  Either way, it means "whenever this function runs, it
+will replace the ``.value`` of each shared variable with the result of the
+corresponding expression".  Above, our accumulator replaces the value of
+``state`` with the sum of the state and the increment amount.
+
+Anyway, let's try it out!
+
+>>> state.value
+array(0)
+>>> accumulator(1)
+array(0)
+>>> state.value
+array(1)
 >>> accumulator(300)
-array(301.0)
->>> accumulator['state_name']
-array(301.0)
-
-It is possible to reset the state. This is done
-by assigning to the state using the square brackets
-notation:
-
->>> accumulator['state_name'] = 5
->>> accumulator(0.9)
-array(5.9000000000000004)
->>> accumulator['state_name']
-array(5.9000000000000004)
+array(1)
+>>> state.value
+array(301)
+
+It is possible to reset the state. Just assign to the ``.value`` property:
+
+>>> state.value = -1
+>>> accumulator(3)
+array(-1)
+>>> state.value
+array(2)
+
+As we mentioned above, you can define more than one function to use the same
+shared variable.  These functions can both update the value.
+
+>>> decrementor = function([inc], state, updates=[(state, state-inc)])
+>>> decrementor(2)
+array(2)
+>>> state.value
+array(0)
+
+You might be wondering why the updates mechanism exists.  You can always
+achieve a similar thing by returning the new expressions, and working with
+them in NumPy as usual.  The updates mechanism can be a syntactic convenience,
+but it is mainly there for efficiency.  Updates to shared variables can
+sometimes be done more quickly using in-place algorithms (e.g. low-rank matrix
+updates).  Also, Theano has more control over where and how shared variables are
+allocated, which is one of the important elements of getting good performance
+on the GPU.
+
+It may happen that you have constructed a symbolic graph on top of a
+shared variable, but you do *not* want to use its value. In this case, you can use the
+``givens`` parameter of ``function`` which replaces a particular node in a graph
+for the purpose of one particular function.
+
+>>> fn_of_state = state * 2 + inc
+>>> non_shared_state = state.type()
+>>> skip_shared = function([inc, non_shared_state], fn_of_state,
+...                        givens=[(state, non_shared_state)])
+>>> skip_shared(1, 3)  # we're using 3 for the state, not state.value
+array(7)
+>>> state.value        # old state still there, but we didn't use it
+array(0)
+
+The ``givens`` parameter can be used to replace any symbolic variable, not just
+a shared variable.  In general, you can replace constants and expressions too.
+Be careful, though, not to let the expressions introduced by a ``givens``
+substitution depend on one another: the order of substitution is not defined,
+so the substitutions have to work in any order.


 Mode