Commit 1739dda0 authored by Mikhail Korobov

DOC fixed Python 3 compatibility issues in Tutorial and Library Reference

* use Python 2/3 compatible syntax for print;
* use range instead of xrange (creating a list of 100..1000 ints in Python 2 is not a big deal)
Parent 12aa9519
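The two idioms the commit message refers to can be sketched in a few lines; both spellings behave identically on Python 2 and Python 3 (a minimal illustration, not taken from the diff below):

```python
from __future__ import print_function  # no-op on Python 3; makes print a function on Python 2

# print(...) now works the same under both interpreters
print("hello", "world")

# range() returns a list on Python 2 and a lazy sequence on Python 3;
# for a few hundred ints, materializing the list is harmless either way
total = 0
for i in range(100):
    total += i
print(total)  # 4950
```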
......@@ -19,7 +19,7 @@
Inputs
======
The ``inputs`` argument to ``theano.function`` is a list, containing the ``Variable`` instances for which values will be specified at the time of the function call. But inputs can be more than just Variables.
``In`` instances let us attach properties to ``Variables`` to tell function more about how to use them.
......@@ -50,7 +50,7 @@ The ``inputs`` argument to ``theano.function`` is a list, containing the ``Varia
compiled function to modify the Python object being used as the
default value. The default value is ``False``.
``strict``: Bool (default: ``False`` ). ``True`` means that the value
you pass for this input must have exactly the right type. Otherwise, it
may be cast automatically to the proper type.
......@@ -90,7 +90,7 @@ Since we provided a ``value`` for ``s`` and ``x``, we can call it with just a va
>>> inc(5) # update s with 10+3*5
[]
->>> print inc[s]
+>>> print(inc[s])
25.0
The effect of this call is to increment the storage associated to ``s`` in ``inc`` by 15.
......@@ -100,18 +100,18 @@ If we pass two arguments to ``inc``, then we override the value associated to
>>> inc(3, 4) # update s with 25 + 3*4
[]
->>> print inc[s]
+>>> print(inc[s])
37.0
->>> print inc[x] # the override value of 4 was only temporary
+>>> print(inc[x]) # the override value of 4 was only temporary
3.0
If we pass three arguments to ``inc``, then we override the value associated
with ``x`` and ``u`` and ``s``.
Since ``s``'s value is updated on every call, the old value of ``s`` will be ignored and then replaced.
>>> inc(3, 4, 7) # update s with 7 + 3*4
[]
->>> print inc[s]
+>>> print(inc[s])
19.0
We can also assign to ``inc[s]`` directly:
......@@ -286,13 +286,13 @@ The ``outputs`` argument to function can be one of
- a Variable or ``Out`` instance, or
- a list of Variables or ``Out`` instances.
An ``Out`` instance is a structure that lets us attach options to individual output ``Variable`` instances,
similarly to how ``In`` lets us attach options to individual input ``Variable`` instances.
**Out(variable, borrow=False)** returns an ``Out`` instance:
* ``borrow``
If ``True``, a reference to function's internal storage
is OK. A value returned for this output might be clobbered by running
the function again, but the function might be faster.
......
......@@ -35,7 +35,7 @@ variables, type this from the command-line:
.. code-block:: bash
-python -c 'import theano; print theano.config' | less
+python -c 'import theano; print(theano.config)' | less
Environment Variables
=====================
......@@ -98,7 +98,7 @@ import theano and print the config variable, as in:
.. code-block:: bash
-python -c 'import theano; print theano.config' | less
+python -c 'import theano; print(theano.config)' | less
.. attribute:: device
......@@ -465,7 +465,7 @@ import theano and print the config variable, as in:
Default: 'ignore'
If there is a CPU op in the computational graph, depending on its value;
this flag can either raise a warning, an exception or stop the
compilation with pdb.
.. attribute:: on_shape_error
......@@ -525,7 +525,7 @@ import theano and print the config variable, as in:
This is a Python format string that specifies the subdirectory
of ``config.base_compiledir`` in which to store platform-dependent
compiled modules. To see a list of all available substitution keys,
-run ``python -c "import theano; print theano.config"``, and look
+run ``python -c "import theano; print(theano.config)"``, and look
for compiledir_format.
This flag's value cannot be modified during the program execution.
......@@ -871,9 +871,9 @@ import theano and print the config variable, as in:
.. attribute:: print_test_value
Bool value, default: False
If ``'True'``, Theano will override the '__str__' method of its variables
to also print the tag.test_value when this is available.
.. attribute:: reoptimize_unpickled_function
......
......@@ -24,8 +24,8 @@ More precisely, if *A* is a tensor you want to compute
.. code-block:: python
result = 1
-for i in xrange(k):
-    result = result * A
+for i in range(k):
+    result = result * A
There are three things here that we need to handle: the initial value
assigned to ``result``, the accumulation of results in ``result``, and
......@@ -57,8 +57,8 @@ The equivalent Theano code would be:
# compiled function that returns A**k
power = theano.function(inputs=[A,k], outputs=final_result, updates=updates)
-print power(range(10),2)
-print power(range(10),4)
+print(power(range(10),2))
+print(power(range(10),4))
.. testoutput::
......@@ -121,8 +121,8 @@ from a list of its coefficients:
# Test
test_coefficients = numpy.asarray([1, 0, 2], dtype=numpy.float32)
test_value = 3
-print calculate_polynomial(test_coefficients, test_value)
-print 1.0 * (3 ** 0) + 0.0 * (3 ** 1) + 2.0 * (3 ** 2)
+print(calculate_polynomial(test_coefficients, test_value))
+print(1.0 * (3 ** 0) + 0.0 * (3 ** 1) + 2.0 * (3 ** 2))
.. testoutput::
......@@ -188,7 +188,7 @@ downcast** of the latter.
some_num = 15
print(triangular_sequence(some_num))
print([n * (n + 1) // 2 for n in xrange(some_num)])
.. testoutput::
[ 0 1 3 6 10 15 21 28 36 45 55 66 78 91 105]
......@@ -513,8 +513,8 @@ value ``max_value``.
f = theano.function([max_value], values)
-print f(45)
+print(f(45))
.. testoutput::
[ 2. 4. 8. 16. 32. 64.]
......
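The two scan examples in the hunks above (repeated multiplication for ``A**k`` and polynomial evaluation from coefficients) have direct pure-Python equivalents; a minimal sketch without Theano (the function names mirror the docs, but the bodies are plain Python, not the scan implementation):

```python
def power(a, k):
    # repeated elementwise multiplication, like the scan loop computing A**k
    result = [1] * len(a)
    for _ in range(k):
        result = [r * x for r, x in zip(result, a)]
    return result

def calculate_polynomial(coefficients, x):
    # sum of coefficient * x**position, like the polynomial scan example
    return sum(c * x ** i for i, c in enumerate(coefficients))

print(power(range(10), 2))                 # [0, 1, 4, ..., 81]
print(calculate_polynomial([1, 0, 2], 3))  # 1*3**0 + 0*3**1 + 2*3**2 = 19
```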
......@@ -7,7 +7,7 @@ Baby Steps - Algebra
Adding two Scalars
==================
To get us started with Theano and get a feel of what we're working with,
let's make a simple function: add two numbers together. Here is how you do
it:
......@@ -28,9 +28,9 @@ True
Let's break this down into several steps. The first step is to define
two symbols (*Variables*) representing the quantities that you want
to add. Note that from now on, we will use the term
*Variable* to mean "symbol" (in other words,
*x*, *y*, *z* are all *Variable* objects). The output of the function
*f* is a ``numpy.ndarray`` with zero dimensions.
If you are following along and typing into an interpreter, you may have
......@@ -44,10 +44,10 @@ instruction. Behind the scene, *f* was being compiled into C code.
using Theano. The symbolic inputs that you operate on are
*Variables* and what you get from applying various operations to
these inputs are also *Variables*. For example, when I type
>>> x = theano.tensor.ivector()
>>> y = -x
*x* and *y* are both Variables, i.e. instances of the
``theano.gof.graph.Variable`` class. The
type of both *x* and *y* is ``theano.tensor.ivector``.
......@@ -97,7 +97,7 @@ The second step is to combine *x* and *y* into their sum *z*:
function to pretty-print out the computation associated to *z*.
>>> from theano import pp
->>> print pp(z)
+>>> print(pp(z))
(x + y)
......
......@@ -279,22 +279,24 @@ For GPU graphs, this borrowing can have a major speed impact. See the following
Out(sandbox.cuda.basic_ops.gpu_from_host(tensor.exp(x)),
borrow=True))
t0 = time.time()
-for i in xrange(iters):
+for i in range(iters):
r = f1()
t1 = time.time()
no_borrow = t1 - t0
t0 = time.time()
-for i in xrange(iters):
+for i in range(iters):
r = f2()
t1 = time.time()
-print 'Looping', iters, 'times took', no_borrow, 'seconds without borrow',
-print 'and', t1 - t0, 'seconds with borrow.'
+print(
+    "Looping %s times took %s seconds without borrow "
+    "and %s seconds with borrow" % (iters, no_borrow, (t1 - t0))
+)
if numpy.any([isinstance(x.op, tensor.Elemwise) and
('Gpu' not in type(x.op).__name__)
for x in f1.maker.fgraph.toposort()]):
-print 'Used the cpu'
+print('Used the cpu')
else:
-print 'Used the gpu'
+print('Used the gpu')
Which produces this output:
......
......@@ -26,7 +26,7 @@ IfElse vs Switch
a,b = T.scalars('a', 'b')
x,y = T.matrices('x', 'y')
z_switch = T.switch(T.lt(a, b), T.mean(x), T.mean(y))
z_lazy = ifelse(T.lt(a, b), T.mean(x), T.mean(y))
......@@ -43,14 +43,14 @@ IfElse vs Switch
n_times = 10
tic = time.clock()
-for i in xrange(n_times):
+for i in range(n_times):
f_switch(val1, val2, big_mat1, big_mat2)
-print 'time spent evaluating both values %f sec' % (time.clock() - tic)
+print('time spent evaluating both values %f sec' % (time.clock() - tic))
tic = time.clock()
-for i in xrange(n_times):
+for i in range(n_times):
f_lazyifelse(val1, val2, big_mat1, big_mat2)
-print 'time spent evaluating one value %f sec' % (time.clock() - tic)
+print('time spent evaluating one value %f sec' % (time.clock() - tic))
.. testoutput::
:hide:
......
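The timing difference in the hunk above comes from ``switch`` evaluating both branches while ``ifelse`` evaluates only the one that is taken; the semantics can be sketched in plain Python (hypothetical helper names for illustration, not Theano's API):

```python
def eager_switch(cond, a, b):
    # like T.switch: both a and b were already computed by the caller
    return a if cond else b

def lazy_ifelse(cond, a_fn, b_fn):
    # like ifelse: only the chosen thunk runs
    return a_fn() if cond else b_fn()

evaluated = []

def branch(name, value):
    # record which branch bodies actually execute
    evaluated.append(name)
    return value

# eager: both arguments are computed before selection
eager_switch(True, branch("mean_x", 1.0), branch("mean_y", 2.0))
# lazy: only the taken branch is computed
lazy_ifelse(True, lambda: branch("mean_x", 1.0), lambda: branch("mean_y", 2.0))
print(evaluated)  # ['mean_x', 'mean_y', 'mean_x']
```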
......@@ -300,10 +300,10 @@ Tips:
"Why does my GPU function seem to be slow?"
-------------------------------------------
When you compile a theano function, if you do not get the speedup that you expect over the
CPU performance of the same code, it is oftentimes because some Ops are running
on CPU instead of GPU. If that is the case, you can use assert_no_cpu_op to check whether there
is a CPU Op in your computational graph. assert_no_cpu_op can take one of the following three
options:
* ``warn``: Raise a warning
......@@ -314,7 +314,7 @@ options:
It is possible to use this mode by providing the flag in THEANO_FLAGS, such as:
``THEANO_FLAGS="float32,device=gpu,assert_no_cpu_op='raise'" python test.py``
But note that this optimization will not catch all the CPU Ops; it might miss some
Ops.
.. _faq_monitormode:
......@@ -328,13 +328,15 @@ shows how to print all inputs and outputs:
.. testcode::
from __future__ import print_function
import theano
def inspect_inputs(i, node, fn):
-    print i, node, "input(s) value(s):", [input[0] for input in fn.inputs],
+    print(i, node, "input(s) value(s):", [input[0] for input in fn.inputs],
+          end='')
def inspect_outputs(i, node, fn):
-    print "output(s) value(s):", [output[0] for output in fn.outputs]
+    print("output(s) value(s):", [output[0] for output in fn.outputs])
x = theano.tensor.dscalar('x')
f = theano.function([x], [5 * x],
......@@ -376,10 +378,10 @@ can be achieved as follows:
for output in fn.outputs:
if (not isinstance(output[0], numpy.random.RandomState) and
numpy.isnan(output[0]).any()):
-print '*** NaN detected ***'
+print('*** NaN detected ***')
theano.printing.debugprint(node)
-print 'Inputs : %s' % [input[0] for input in fn.inputs]
-print 'Outputs: %s' % [output[0] for output in fn.outputs]
+print('Inputs : %s' % [input[0] for input in fn.inputs])
+print('Outputs: %s' % [output[0] for output in fn.outputs])
break
x = theano.tensor.dscalar('x')
......
......@@ -96,7 +96,7 @@ The full documentation can be found in the library: :ref:`Scan <lib_scan>`.
x_res = np.zeros((5, 2), dtype=theano.config.floatX)
x_res[0] = np.tanh(x.dot(w) + y[0].dot(u) + p[4].dot(v))
for i in range(1, 5):
x_res[i] = np.tanh(x_res[i - 1].dot(w) + y[i].dot(u) + p[4-i].dot(v))
print(x_res)
.. testoutput::
......@@ -230,8 +230,8 @@ The full documentation can be found in the library: :ref:`Scan <lib_scan>`.
x_res[1] = x[1].dot(u) + x_res[0].dot(v) + np.tanh(x_res[0].dot(w) + b)
x_res[2] = x_res[0].dot(u) + x_res[1].dot(v) + np.tanh(x_res[1].dot(w) + b)
for i in range(2, 10):
x_res[i] = (x_res[i - 2].dot(u) + x_res[i - 1].dot(v) +
np.tanh(x_res[i - 1].dot(w) + b))
print(x_res)
.. testoutput::
......@@ -277,7 +277,7 @@ The full documentation can be found in the library: :ref:`Scan <lib_scan>`.
x = np.eye(5, dtype=theano.config.floatX)[0]
w = np.eye(5, 3, dtype=theano.config.floatX)
w[2] = np.ones((3), dtype=theano.config.floatX)
-print compute_jac_t(w, x)[0]
+print(compute_jac_t(w, x)[0])
# compare with numpy
print(((1 - np.tanh(x.dot(w)) ** 2) * w).T)
......@@ -351,7 +351,7 @@ Note that we need to iterate over the indices of ``y`` and not over the elements
[ 0. , 0. ],
[ 0.76159416, 0.76159416]])]
Note that if you want to use a random variable ``d`` that will not be updated through scan loops, you should pass this variable as a ``non_sequences`` argument.
**Scan Example: Computing pow(A, k)**
......@@ -412,7 +412,7 @@ Note that if you want to use a random variable ``d`` that will not be updated th
outputs=polynomial)
test_coeff = numpy.asarray([1, 0, 2], dtype=numpy.float32)
-print calculate_polynomial(test_coeff, 3)
+print(calculate_polynomial(test_coeff, 3))
.. testoutput::
......
......@@ -31,7 +31,7 @@ variables, type this from the command-line:
.. code-block:: bash
-python -c 'import theano; print theano.config' | less
+python -c 'import theano; print(theano.config)' | less
For more detail, see :ref:`Configuration <libdoc_config>` in the library.
......@@ -44,7 +44,7 @@ Exercise
Consider the logistic regression:
.. testcode::
import numpy
import theano
import theano.tensor as T
......@@ -102,23 +102,23 @@ Consider the logistic regression:
.. testoutput::
:hide:
:options: +ELLIPSIS
Used the cpu
target values for D
...
prediction on D
...
Modify and execute this example to run on CPU (the default) with floatX=float32 and
time the execution using the command line ``time python file.py``. Save your code
as it will be useful later on.
.. Note::
* Apply the Theano flag ``floatX=float32`` (through ``theano.config.floatX``) in your code.
* Cast inputs before storing them into a shared variable.
* Circumvent the automatic cast of *int32* with *float32* to *float64*:
* Insert manual cast in your code or use *[u]int{8,16}*.
* Insert manual cast around the mean operator (this involves division by length, which is an *int64*).
* Note that a new casting mechanism is being developed.
......@@ -139,7 +139,7 @@ Theano defines the following modes by name:
- ``'FAST_COMPILE'``: Apply just a few graph optimizations and only use Python implementations. So GPU is disabled.
- ``'FAST_RUN'``: Apply all optimizations and use C implementations where possible.
- ``'DebugMode'``: Verify the correctness of all optimizations, and compare C and Python
implementations. This mode can take much longer than the other modes, but can identify
several kinds of problems.
- ``'ProfileMode'`` (deprecated): Same optimization as FAST_RUN, but print some profiling information.
......@@ -263,7 +263,7 @@ ProfileMode
Besides checking for errors, another important task is to profile your
code. For this Theano uses a special mode called ProfileMode which has
to be passed as an argument to :func:`theano.function <function.function>`.
Using the ProfileMode is a three-step process.
.. note::
......@@ -273,7 +273,7 @@ Using the ProfileMode is a three-step process.
process exits, it will automatically print the profiling
information on the standard output.
The memory profile of the output of each ``apply`` node can be enabled with the
Theano flag :attr:`config.ProfileMode.profile_memory`.
For more detail, see :ref:`ProfileMode <profilemode>` in the library.
......
......@@ -5,7 +5,7 @@ Sparse
======
In general, *sparse* matrices provide the same functionality as regular
matrices. The difference lies in the way the elements of *sparse* matrices are
represented and stored in memory. Only the non-zero elements of the latter are stored.
This has some potential advantages: first, this
may obviously lead to reduced memory usage and, second, clever
......@@ -13,7 +13,7 @@ storage methods may lead to reduced computation time through the use of
sparse specific algorithms. We usually refer to the generically stored matrices
as *dense* matrices.
Theano's sparse package provides efficient algorithms, but its use is not recommended
in all cases or for all matrices. As an obvious example, consider the case where
the *sparsity proportion* is very low. The *sparsity proportion* refers to the
ratio of the number of zero elements to the number of all elements in a matrix.
......@@ -30,7 +30,7 @@ ways to represent them in memory. This is usually designated by the so-called ``
of the matrix. Since Theano's sparse matrix package is based on the SciPy
sparse package, complete information about sparse matrices can be found
in the SciPy documentation. Like SciPy, Theano does not implement sparse formats for
arrays with a number of dimensions different from two.
So far, Theano implements two ``formats`` of sparse matrix: ``csc`` and ``csr``.
Those are almost identical except that ``csc`` is based on the *columns* of the
......@@ -82,14 +82,14 @@ the rows and with a matrix that have a lower number of rows, ``csr`` format is a
If shape[0] > shape[1], use ``csr`` format. Otherwise, use ``csc``.
Sometimes, since the sparse module is young, ops do not exist for both formats. So here is
what may be the most relevant rule:
.. note::
Use the format compatible with the ops in your computation graph.
The documentation about the ops and their supported format may be found in
the :ref:`Sparse Library Reference <libdoc_sparse>`.
Handling Sparse in Theano
......@@ -123,7 +123,7 @@ an example that performs a full cycle from sparse to sparse:
Properties and Construction
---------------------------
Although sparse variables do not allow direct access to their properties,
this can be accomplished using the ``csm_properties`` function. This will return
a tuple of one-dimensional ``tensor`` variables that represents the internal characteristics
of the sparse matrix.
......@@ -138,11 +138,11 @@ a ``csr`` one.
>>> y = sparse.CSR(data, indices, indptr, shape)
>>> f = theano.function([x], y)
>>> a = sp.csc_matrix(np.asarray([[0, 1, 1], [0, 0, 0], [1, 0, 0]]))
->>> print a.toarray()
+>>> print(a.toarray())
[[0 1 1]
[0 0 0]
[1 0 0]]
->>> print f(a).toarray()
+>>> print(f(a).toarray())
[[0 0 1]
[1 0 0]
[1 0 0]]
......@@ -165,11 +165,11 @@ provide a structured gradient. More explication below.
>>> y = sparse.structured_add(x, 2)
>>> f = theano.function([x], y)
>>> a = sp.csc_matrix(np.asarray([[0, 0, -1], [0, -2, 1], [3, 0, 0]], dtype='float32'))
->>> print a.toarray()
+>>> print(a.toarray())
[[ 0. 0. -1.]
[ 0. -2. 1.]
[ 3. 0. 0.]]
->>> print f(a).toarray()
+>>> print(f(a).toarray())
[[ 0. 0. 1.]
[ 0. 0. 3.]
[ 5. 0. 0.]]
......
......@@ -11,24 +11,24 @@ Theano Graphs
Debugging or profiling code written in Theano is not that simple if you
do not know what goes on under the hood. This chapter is meant to
introduce you to a required minimum of the inner workings of Theano.
For more detail see :ref:`extending`.
The first step in writing Theano code is to write down all mathematical
relations using symbolic placeholders (**variables**). When writing down
these expressions you use operations like ``+``, ``-``, ``**``,
``sum()``, ``tanh()``. All these are represented internally as **ops**.
An *op* represents a certain computation on some type of inputs
producing some type of output. You can see it as a *function definition*
in most programming languages.
Theano builds internally a graph structure composed of interconnected
**variable** nodes, **op** nodes and **apply** nodes. An
*apply* node represents the application of an *op* to some
*variables*. It is important to draw the difference between the
definition of a computation represented by an *op* and its application
to some actual data which is represented by the *apply* node. For more
detail about these building blocks refer to :ref:`variable`, :ref:`op`,
:ref:`apply`. Here is an example of a graph:
......@@ -43,9 +43,9 @@ detail about these building blocks refer to :ref:`variable`, :ref:`op`,
**Diagram**
.. _tutorial-graphfigure:
.. figure:: apply.png
:align: center
Interaction between instances of Apply (blue), Variable (red), Op (green),
......@@ -55,7 +55,7 @@ detail about these building blocks refer to :ref:`variable`, :ref:`op`,
WARNING: hyper-links and ref's seem to break the PDF build when placed
into this figure caption.
Arrows in this figure represent references to the
Python objects pointed at. The blue
box is an :ref:`Apply` node. Red boxes are :ref:`Variable` nodes. Green
circles are :ref:`Ops <op>`. Purple boxes are :ref:`Types <type>`.
......@@ -69,9 +69,9 @@ Take for example the following code:
>>> x = theano.tensor.dmatrix('x')
>>> y = x * 2.
If you enter ``type(y.owner)`` you get ``<class 'theano.gof.graph.Apply'>``,
which is the apply node that connects the op and the inputs to get this
output. You can now print the name of the op that is applied to get
*y*:
>>> y.owner.op.name
......@@ -87,8 +87,8 @@ x
>>> y.owner.inputs[1]
DimShuffle{x,x}.0
Note that the second input is not 2 as we would have expected. This is
because 2 was first :term:`broadcasted <broadcasting>` to a matrix of
same shape as *x*. This is done by using the op ``DimShuffle`` :
>>> type(y.owner.inputs[1])
......@@ -101,9 +101,9 @@ same shape as *x*. This is done by using the op ``DimShuffle`` :
[TensorConstant{2.0}]
Starting from this graph structure it is easier to understand how
*automatic differentiation* proceeds and how the symbolic relations
can be *optimized* for performance or stability.
Automatic Differentiation
......@@ -113,13 +113,13 @@ Having the graph structure, computing automatic differentiation is
simple. The only thing :func:`tensor.grad` has to do is to traverse the
graph from the outputs back towards the inputs through all *apply*
nodes (*apply* nodes are those that define which computations the
graph does). For each such *apply* node, its *op* defines
how to compute the *gradient* of the node's outputs with respect to its
inputs. Note that if an *op* does not provide this information,
it is assumed that the *gradient* is not defined.
Using the
`chain rule <http://en.wikipedia.org/wiki/Chain_rule>`_
these gradients can be composed in order to obtain the expression of the
*gradient* of the graph's output with respect to the graph's inputs.
A following section of this tutorial will examine the topic of :ref:`differentiation<tutcomputinggrads>`
......@@ -133,20 +133,20 @@ When compiling a Theano function, what you give to the
:func:`theano.function <function.function>` is actually a graph
(starting from the output variables you can traverse the graph up to
the input variables). While this graph structure shows how to compute
the output from the input, it also offers the possibility to improve the
way this computation is carried out. The way optimizations work in
Theano is by identifying and replacing certain patterns in the graph
with other specialized patterns that produce the same results but are either
faster or more stable. Optimizations can also detect
identical subgraphs and ensure that the same values are not computed
twice or reformulate parts of the graph to a GPU specific version.
For example, one (simple) optimization that Theano uses is to replace
the pattern :math:`\frac{xy}{y}` by *x.*
Further information regarding the optimization
:ref:`process<optimization>` and the specific :ref:`optimizations<optimizations>` that are applicable
is respectively available in the library and on the entrance page of the documentation.
**Example**
......@@ -158,7 +158,7 @@ as we apply it. Consider the following example of optimization:
>>> a = theano.tensor.vector("a") # declare symbolic variable
>>> b = a + a ** 10 # build symbolic expression
>>> f = theano.function([a], b) # compile function
->>> print f([0, 1, 2])  # prints `array([0,2,1026])`
+>>> print(f([0, 1, 2]))  # prints `array([0,2,1026])`
[ 0. 2. 1026.]
>>> theano.printing.pydotprint(b, outfile="./pics/symbolic_graph_unopt.png", var_with_name_simple=True) # doctest: +SKIP
The output file is available at ./pics/symbolic_graph_unopt.png
......
......@@ -48,7 +48,7 @@ file and run it.
f = function([], T.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
-for i in xrange(iters):
+for i in range(iters):
r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
......@@ -124,7 +124,7 @@ after the ``T.exp(x)`` is replaced by a GPU version of ``exp()``.
f = function([], sandbox.cuda.basic_ops.gpu_from_host(T.exp(x)))
print(f.maker.fgraph.toposort())
t0 = time.time()
-for i in xrange(iters):
+for i in range(iters):
r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
......@@ -229,7 +229,7 @@ Tips for Improving Performance on GPU
enable all of them with the `nvcc.flags=--use_fast_math` Theano
flag or you can enable them individually as in this example:
`nvcc.flags=-ftz=true --prec-div=false`.
* To investigate whether all the Ops in the computational graph are running on GPU.
It is possible to debug or check your code by providing a value to `assert_no_cpu_op`
flag, i.e. `warn`, for warning `raise` for raising an error or `pdb` for putting a breakpoint
in the computational graph if there is a CPU Op.
......@@ -326,7 +326,7 @@ Consider again the logistic regression:
.. testoutput::
:hide:
:options: +ELLIPSIS
Used the cpu
target values for D
...
......@@ -405,7 +405,7 @@ into a file and run it.
f = function([], tensor.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
-for i in xrange(iters):
+for i in range(iters):
r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
......@@ -473,7 +473,7 @@ the GPU object directly. The following code is modifed to do just that.
f = function([], sandbox.gpuarray.basic_ops.gpu_from_host(tensor.exp(x)))
print(f.maker.fgraph.toposort())
t0 = time.time()
-for i in xrange(iters):
+for i in range(iters):
r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
......@@ -495,7 +495,7 @@ The output is
.. testoutput::
:hide:
:options: +ELLIPSIS, +SKIP
Using device cuda0: ...
[GpuElemwise{exp,no_inplace}(<GpuArray<float64>>)]
Looping 1000 times took ... seconds
......