Commit c86c72f4, authored by Eric Larsen, committed by Frederic

Correct Theano's tutorial: typos and layout

Parent 3bffa49b
 .. _glossary:
-Glossary of terminology
-=======================
+Glossary
+========
 .. glossary::
......
 .. _adding:
 ====================
-Baby steps - Algebra
+Baby Steps - Algebra
 ====================
-Adding two scalars
+Adding two Scalars
 ==================
 So, to get us started with Theano and get a feel of what we're working with,
@@ -117,7 +117,7 @@ argument is what we want to see as output when we apply the function.
 ``f`` may then be used like a normal Python function.
-Adding two matrices
+Adding two Matrices
 ===================
 You might already have guessed how to do this. Indeed, the only change
......
@@ -7,39 +7,39 @@ Understanding Memory Aliasing for Speed and Correctness
 The aggressive reuse of memory is one of the ways Theano makes code fast, and
 it's important for the correctness and speed of your program that you understand
-which buffers Theano might alias to which others.
+which buffers Theano might alias to which other.
-This file describes the principles for how Theano treats memory, and explains
-when you might want to change the default behaviour of some functions and
+This section describes the principles based on which Theano treats memory, and explains
+when you might want to alter the default behaviour of some functions and
 methods for faster performance.
-The memory model: 2 spaces
-==========================
+The Memory Model: Two Spaces
+============================
 There are some simple principles that guide Theano's treatment of memory. The
 main idea is that there is a pool of memory managed by Theano, and Theano tracks
 changes to values in that pool.
-1. Theano manages its own memory space, which typically does not overlap with
-the memory of normal python variables that non-Theano code creates.
+* Theano manages its own memory space, which typically does not overlap with
+the memory of normal Python variables that non-Theano code creates.
-1. Theano Functions only modify buffers that are in Theano's memory space.
+* Theano functions only modify buffers that are in Theano's memory space.
-1. Theano's memory space includes the buffers allocated to store shared
-variables and the temporaries used to evaluate Functions.
+* Theano's memory space includes the buffers allocated to store shared
+variables and the temporaries used to evaluate functions.
-1. Physically, Theano's memory space may be spread across the host, a GPU
+* Physically, Theano's memory space may be spread across the host, a GPU
 device(s), and in the future may even include objects on a remote machine.
-1. The memory allocated for a shared variable buffer is unique: it is never
+* The memory allocated for a shared variable buffer is unique: it is never
 aliased to another shared variable.
-1. Theano's managed memory is constant while Theano Functions are not running
-and Theano library code is not running.
+* Theano's managed memory is constant while Theano functions are not running
+and Theano's library code is not running.
-1. The default behaviour of Function is to return user-space values for
+* The default behaviour of a function is to return user-space values for
 outputs, and to expect user-space values for inputs.
 The distinction between Theano-managed memory and user-managed memory can be
 broken down by some Theano functions (e.g. shared, get_value and the
@@ -49,9 +49,9 @@ operations) at the expense of risking subtle bugs in the overall program (by
 aliasing memory).
 The rest of this section is aimed at helping you to understand when it is safe
-to use the ``borrow=True`` argument and reap the benefit of faster code.
+to use the ``borrow=True`` argument and reap the benefits of faster code.
-Borrowing when creating shared variables
+Borrowing when Creating Shared Variables
 ========================================
 A ``borrow`` argument can be provided to the shared-variable constructor.
@@ -109,7 +109,7 @@ It is not a reliable technique to use ``borrow=True`` to modify shared variables
 by side-effect, because with some devices (e.g. GPU devices) this technique will
 not work.
-Borrowing when accessing value of shared variables
+Borrowing when Accessing Value of Shared Variables
 ==================================================
 Retrieving
@@ -139,7 +139,7 @@ The reason that ``borrow=True`` might still make a copy is that the internal
 representation of a shared variable might not be what you expect. When you
 create a shared variable by passing a numpy array for example, then ``get_value()``
 must return a numpy array too. That's how Theano can make the GPU use
-transparent. But when you are using a GPU (or in future perhaps a remote machine), then the numpy.ndarray
+transparent. But when you are using a GPU (or in the future perhaps a remote machine), then the numpy.ndarray
 is not the internal representation of your data.
 If you really want Theano to return its internal representation *and never copy it*
 then you should use the ``return_internal_type=True`` argument to
@@ -213,7 +213,7 @@ be costly. Here are a few tips to ensure fast and efficient use of GPU memory a
 here: :ref:`libdoc_cuda_var`)
-Retrieving and assigning via the .value property
+Retrieving and Assigning via the .value Property
 ------------------------------------------------
 Shared variables have a ``.value`` property that is connected to ``get_value``
@@ -234,7 +234,7 @@ potential impact on your code, use the ``.get_value`` and ``.set_value`` methods
 directly with appropriate flags.
-Borrowing when constructing Function objects
+Borrowing when Constructing Function Objects
 ============================================
 A ``borrow`` argument can also be provided to the ``In`` and ``Out`` objects
@@ -276,6 +276,7 @@ hints that give more flexibility to the compilation and optimization of the
 graph.
 *Take home message:*
 When an input ``x`` to a function is not needed after the function returns and you
 would like to make it available to Theano as additional workspace, then consider
 marking it with ``In(x, borrow=True)``. It may make the function faster and
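The copy-versus-alias distinction that ``borrow`` controls can be illustrated with a plain NumPy sketch (an analogy only, not the Theano API):

```python
import numpy as np

# A "user-space" buffer.
user_array = np.zeros(4)

# borrow=False behaviour: the managed value is an independent copy,
# so later changes to user_array do not leak into it.
managed_copy = user_array.copy()

# borrow=True behaviour: the managed value aliases the caller's buffer,
# so mutating one is visible through the other.
managed_alias = user_array  # no copy; both names share one buffer

user_array[0] = 1.0
print(managed_copy[0])   # 0.0 -- the copy is isolated
print(managed_alias[0])  # 1.0 -- the alias sees the change
```

The copy is what makes the default behaviour safe; the alias is what makes borrowing fast but risky when the caller keeps mutating its buffer.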
......
@@ -4,15 +4,15 @@
 Conditions
 ==========
-IfElse vs switch
+IfElse vs Switch
 ================
-- Build condition over symbolic variables.
+- Both Ops build a condition over symbolic variables.
-- IfElse Op takes a `boolean` condition and two variables to compute as input.
+- ``IfElse`` takes a `boolean` condition and two variables as inputs.
-- Switch take a `tensor` as condition and two variables to compute as input.
+- ``Switch`` takes a `tensor` as condition and two variables as inputs.
-- Switch is an elementwise operation. It is more general than IfElse.
+  ``switch`` is an elementwise operation and it is more general than ``ifelse``.
-- While Switch Op evaluates both 'output' variables, IfElse Op is lazy and only
+- Whereas ``switch`` evaluates both 'output' variables, ``ifelse`` is lazy and only
 evaluates one variable respect to the condition.
 **Example**
@@ -62,11 +62,10 @@ since it computes only one variable instead of both.
 time spent evaluating one value 0.3500 sec
-It is actually important to use ``linker='vm'`` or ``linker='cvm'``,
-otherwise IfElse will compute both variables and take the same computation
-time as the Switch Op. The linker is not currently set by default to 'cvm' but
+Unless ``linker='vm'`` or ``linker='cvm'`` are used, ``ifelse`` will compute both variables and take the same computation
+time as ``switch``. The linker is not currently set by default to 'cvm' but
 it will be in a near future.
-There is not an optimization to automatically change a switch with a
-broadcasted scalar to an ifelse, as this is not always the faster. See
+There is not an optimization automatically replacing a ``switch`` with a
+broadcasted scalar to an ``ifelse``, as this is not always faster. See
 this `ticket <http://www.assembla.com/spaces/theano/tickets/764>`_.
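The eager-elementwise versus lazy-branch distinction in this hunk has a rough NumPy analogy (not the Theano Ops themselves, just the evaluation pattern): ``numpy.where`` selects elementwise after both candidate arrays are computed, while a Python ``if`` evaluates only one branch.

```python
import numpy as np

a = np.array([1.0, -2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

# switch-like: an elementwise choice; both candidate arrays are
# fully computed before the selection happens.
elementwise = np.where(a > 0, a * 2, b * 2)

# ifelse-like: one scalar condition, and only the chosen branch
# is ever evaluated.
if a.sum() > 0:
    lazy = a * 2
else:
    lazy = b * 2

print(elementwise)  # [ 2. 40.  6.]
print(lazy)         # [ 2. -4.  6.]
```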
@@ -6,15 +6,16 @@ Debugging Theano: FAQ and Troubleshooting
 =========================================
 There are many kinds of bugs that might come up in a computer program.
-This page is structured as an FAQ. It should provide recipes to tackle common
+This page is structured as a FAQ. It should provide recipes to tackle common
 problems, and introduce some of the tools that we use to find problems in our
 Theano code, and even (it happens) in Theano's internals, such as
 :ref:`using_debugmode`.
-Isolating the problem/Testing Theano compiler
+Isolating the Problem/Testing Theano Compiler
 ---------------------------------------------
-You can run your Theano function in a DebugMode(:ref:`using_debugmode`). This test the Theano optimizations and help to find where NaN, inf and other problem come from.
+You can run your Theano function in a DebugMode(:ref:`using_debugmode`).
+This tests the Theano optimizations and helps to find where NaN, inf and other problems come from.
 Using Test Values
@@ -102,7 +103,7 @@ can get Theano to give us the exact source of the error.
 # provide Theano with a default test-value
 x.tag.test_value = numpy.random.rand(5,10)
-In the above, we're tagging the symbolic matrix ``x`` with a special test
+In the above, we are tagging the symbolic matrix ``x`` with a special test
 value. This allows Theano to evaluate symbolic expressions on-the-fly (by
 calling the ``perform`` method of each Op), as they are being defined. Sources
 of error can thus be identified with much more precision and much earlier in
@@ -122,8 +123,8 @@ following error message, which properly identifies line 23 as the culprit.
 The compute_test_value mechanism works as follows:
-* Theano Constants and SharedVariable are used as is. No need to instrument them.
+* Theano ``constants`` and ``shared variables`` are used as is. No need to instrument them.
-* A Theano ``Variable`` (i.e. ``dmatrix``, ``vector``, etc.) should be
+* A Theano ``variable`` (i.e. ``dmatrix``, ``vector``, etc.) should be
 given a special test value through the attribute ``tag.test_value``.
 * Theano automatically instruments intermediate results. As such, any quantity
 derived from ``x`` will be given a `tag.test_value` automatically.
@@ -139,11 +140,11 @@ The compute_test_value mechanism works as follows:
 variable is missing a test value.
 .. note::
-This feature is currently not compatible with ``Scan`` and also with Ops
+This feature is currently incompatible with ``Scan`` and also with Ops
 which do not implement a ``perform`` method.
-How do I print an intermediate value in a Function/Method?
+How do I Print an Intermediate Value in a Function/Method?
 ----------------------------------------------------------
 Theano provides a 'Print' Op to do this.
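The idea behind the Print Op — an identity operation that reports its input as a side effect when evaluated — can be sketched in plain Python (an analogy, not the Theano ``Print`` class itself):

```python
def print_op(label, value):
    """Identity function that reports the value flowing through it."""
    print(label, value)
    return value

# The wrapped value participates in the computation unchanged.
result = print_op("x squared:", 3 ** 2) + 1
print(result)  # prints "x squared: 9" first, then 10
```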
@@ -177,7 +178,7 @@ precise inspection of what's being computed where, when, and how, see the
 to remove them to know if this is the cause or not.
-How do I print a graph (before or after compilation)?
+How do I Print a Graph (before or after compilation)?
 ----------------------------------------------------------
 Theano provides two functions (:func:`theano.pp` and
@@ -190,7 +191,7 @@ You can read about them in :ref:`libdoc_printing`.
-The function I compiled is too slow, what's up?
+The Function I Compiled is Too Slow, what's up?
 -----------------------------------------------
 First, make sure you're running in FAST_RUN mode.
 FAST_RUN is the default mode, but make sure by passing ``mode='FAST_RUN'``
@@ -207,10 +208,10 @@ Tips:
 .. _faq_wraplinker:
-How do I step through a compiled function with the WrapLinker?
+How do I Step through a Compiled Function with the WrapLinker?
 --------------------------------------------------------------
-This is not exactly an FAQ, but the doc is here for now...
+This is not exactly a FAQ, but the doc is here for now...
 It's pretty easy to roll-your-own evaluation mode.
 Check out this one:
@@ -248,7 +249,7 @@ Use your imagination :)
 This can be a really powerful debugging tool.
 Note the call to ``fn`` inside the call to ``print_eval``; without it, the graph wouldn't get computed at all!
-How to use pdb ?
+How to Use pdb ?
 ----------------
 In the majority of cases, you won't be executing from the interactive shell
@@ -294,7 +295,7 @@ The call stack contains a few useful informations to trace back the source
 of the error. There's the script where the compiled function was called --
 but if you're using (improperly parameterized) prebuilt modules, the error
 might originate from ops in these modules, not this script. The last line
-tells us about the Op that caused the exception. In thise case it's a "mul"
+tells us about the Op that caused the exception. In this case it's a "mul"
 involving Variables name "a" and "b". But suppose we instead had an
 intermediate result to which we hadn't given a name.
......
@@ -2,11 +2,11 @@
 .. _basictutexamples:
 =============
-More examples
+More Examples
 =============
-Logistic function
+Logistic Function
 =================
 Here's another straightforward example, though a bit more elaborate
@@ -61,7 +61,7 @@ array([[ 0.5 , 0.73105858],
 [ 0.26894142, 0.11920292]])
-Computing more than one thing at the same time
+Computing More than one Thing at the Same Time
 ==============================================
 Theano supports functions with multiple outputs. For example, we can
@@ -94,7 +94,7 @@ was reformatted for readability):
 [ 1., 4.]])]
-Setting a default value for an argument
+Setting a Default Value for an Argument
 =======================================
 Let's say you want to define a function that adds two numbers, except
@@ -152,7 +152,7 @@ array(33.0)
 .. _functionstateexample:
-Using shared variables
+Using Shared Variables
 ======================
 It is also possible to make a function with an internal state. For
@@ -227,7 +227,7 @@ array(0)
 You might be wondering why the updates mechanism exists. You can always
 achieve a similar thing by returning the new expressions, and working with
-them in numpy as usual. The updates mechanism can be a syntactic convenience,
+them in NumPy as usual. The updates mechanism can be a syntactic convenience,
 but it is mainly there for efficiency. Updates to shared variables can
 sometimes be done more quickly using in-place algorithms (e.g. low-rank matrix
 updates). Also, theano has more control over where and how shared variables are
@@ -252,15 +252,15 @@ array(7)
 >>> state.get_value() # old state still there, but we didn't use it
 array(0)
-The givens parameter can be used to replace any symbolic variable, not just a
+The ``givens`` parameter can be used to replace any symbolic variable, not just a
 shared variable. You can replace constants, and expressions, in general. Be
-careful though, not to allow the expressions introduced by a givens
+careful though, not to allow the expressions introduced by a ``givens``
 substitution to be co-dependent, the order of substitution is not defined, so
 the substitutions have to work in any order.
 In practice, a good way of thinking about the ``givens`` is as a mechanism
 that allows you to replace any part of your formula with a different
-expression that evaluates to a tensor of same shape and dtype. ``givens``
+expression that evaluates to a tensor of same shape and dtype.
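The substitution idea behind ``givens`` can be mimicked in plain Python (a loose analogy under the assumption that a formula is a function of its named parts, not Theano's actual graph rewriting):

```python
def formula(state, inc):
    # Stand-in for the compiled expression state + inc.
    return state + inc

default_state = 0

# Normal call: uses the "shared" default value.
print(formula(default_state, 1))  # 1

# givens-like call: the state sub-expression is replaced by another
# value of the same shape/dtype, without touching the default.
print(formula(100, 1))            # 101
print(default_state)              # 0 -- the default is unchanged
```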
 .. _using_random_numbers:
@@ -270,17 +270,17 @@ Using Random Numbers
 Because in Theano you first express everything symbolically and
 afterwards compile this expression to get functions,
 using pseudo-random numbers is not as straightforward as it is in
-numpy, though also not too complicated.
+NumPy, though also not too complicated.
 The way to think about putting randomness into Theano's computations is
-to put random variables in your graph. Theano will allocate a numpy
+to put random variables in your graph. Theano will allocate a NumPy
 RandomStream object (a random number generator) for each such
 variable, and draw from it as necessary. We will call this sort of
 sequence of random numbers a *random stream*. *Random streams* are at
 their core shared variables, so the observations on shared variables
 hold here as well.
-Brief example
+Brief Example
 -------------
 Here's a brief example. The setup code is:
@@ -325,8 +325,8 @@ random variable appears three times in the output expression.
 >>> nearly_zeros = function([], rv_u + rv_u - 2 * rv_u)
-Seedings Streams
-----------------
+Seeding Streams
+---------------
 Random variables can be seeded individually or collectively.
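Seeding behaviour is easy to check with NumPy's own generators, which back Theano's random streams (a NumPy-level sketch, not the ``RandomStreams`` API):

```python
import numpy as np

# Two generators seeded identically produce the same stream...
rng_a = np.random.RandomState(902340)
rng_b = np.random.RandomState(902340)
draws_a = rng_a.uniform(size=3)
draws_b = rng_b.uniform(size=3)
print(np.allclose(draws_a, draws_b))  # True

# ...while reseeding restarts the stream from the beginning.
rng_a.seed(902340)
print(np.allclose(rng_a.uniform(size=3), draws_a))  # True
```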
@@ -344,7 +344,7 @@ of the random variables.
 >>> srng.seed(902340) # seeds rv_u and rv_n with different seeds each
-Sharing Streams between Functions
+Sharing Streams Between Functions
 ---------------------------------
 As usual for shared variables, the random number generators used for random
@@ -362,7 +362,7 @@ For example:
 >>> v2 = f() # v2 != v1
-Others Random Distributions
+Other Random Distributions
 ---------------------------
 There are :ref:`other distributions implemented <libdoc_tensor_raw_random>`.
@@ -371,7 +371,7 @@ There are :ref:`other distributions implemented <libdoc_tensor_raw_random>`.
 .. _logistic_regression:
-A Real example: Logistic Regression
+A Real Example: Logistic Regression
 ===================================
 The preceding elements are put to work in this more realistic example. It will be used repeatedly.
......
...@@ -5,7 +5,7 @@ ...@@ -5,7 +5,7 @@
Extending Theano Extending Theano
**************** ****************
Theano graphs Theano Graphs
------------- -------------
- Theano works with symbolic graphs - Theano works with symbolic graphs
...@@ -40,7 +40,6 @@ Inputs and Outputs are lists of Theano variables ...@@ -40,7 +40,6 @@ Inputs and Outputs are lists of Theano variables
See :ref:`dev_start_guide` for information about git, github, the See :ref:`dev_start_guide` for information about git, github, the
development workflow and how to make a quality contribution. development workflow and how to make a quality contribution.
Op contract
----------- -----------
...@@ -96,13 +95,13 @@ at run time. Currently there are 2 different possibilites: ...@@ -96,13 +95,13 @@ at run time. Currently there are 2 different possibilites:
implement the :func:`perform` implement the :func:`perform`
and/or :func:`c_code <Op.c_code>` (and other related :ref:`c methods and/or :func:`c_code <Op.c_code>` (and other related :ref:`c methods
<cop>`), or the :func:`make_thunk` method. The ``perform`` allows <cop>`), or the :func:`make_thunk` method. The ``perform`` allows
to easily wrap an existing python function into Theano. The ``c_code`` to easily wrap an existing Python function into Theano. The ``c_code``
and related methods allow the op to generate c code that will be and related methods allow the op to generate C code that will be
compiled and linked by Theano. On the other hand, the ``make_thunk`` compiled and linked by Theano. On the other hand, the ``make_thunk``
method will be called only once during compilation and should generate method will be called only once during compilation and should generate
a ``thunk``: a standalone function that when called will do the wanted computations. a ``thunk``: a standalone function that when called will do the wanted computations.
This is useful if you want to generate code and compile it yourself. For This is useful if you want to generate code and compile it yourself. For
example, this allows you to use PyCUDA to compile gpu code. example, this allows you to use PyCUDA to compile GPU code.
Also there are 2 methods that are highly recommended to be implemented. They are Also there are 2 methods that are highly recommended to be implemented. They are
needed in order to merge duplicate computations involving your op. So if you needed in order to merge duplicate computations involving your op. So if you
...@@ -110,7 +109,7 @@ do not want Theano to execute your op multiple times with the same inputs, ...@@ -110,7 +109,7 @@ do not want Theano to execute your op multiple times with the same inputs,
do implement them. Those methods are :func:`__eq__` and do implement them. Those methods are :func:`__eq__` and
:func:`__hash__`. :func:`__hash__`.
The :func:`infer_shape` method allows to infer shape of some variable, somewhere in the The :func:`infer_shape` method allows to infer the shape of some variable, somewhere in the
middle of the computational graph without actually computing the outputs (when possible). middle of the computational graph without actually computing the outputs (when possible).
This could be helpful if one only needs the shape of the output instead of the actual outputs. This could be helpful if one only needs the shape of the output instead of the actual outputs.
...@@ -123,7 +122,7 @@ string representation of your Op. ...@@ -123,7 +122,7 @@ string representation of your Op.
The :func:`R_op` method is needed if you want `theano.tensor.Rop` to The :func:`R_op` method is needed if you want `theano.tensor.Rop` to
work with your op. work with your op.
Op example Op Example
---------- ----------
.. code-block:: python .. code-block:: python
...@@ -164,7 +163,7 @@ Op example ...@@ -164,7 +163,7 @@ Op example
return eval_points return eval_points
return self.grad(inputs, eval_points) return self.grad(inputs, eval_points)
Try it! Try it!:
.. code-block:: python .. code-block:: python
...@@ -177,15 +176,14 @@ Try it! ...@@ -177,15 +176,14 @@ Try it!
print inp print inp
print out print out
How to test it How To Test it
-------------- --------------
Theano has some functions to simplify testing. These help test the Theano has some functions to simplify testing. These help test the
``infer_shape``, ``grad`` and ``R_op`` methods. Put the following code ``infer_shape``, ``grad`` and ``R_op`` methods. Put the following code
in a file and execute it with the ``nosetests`` program. in a file and execute it with the ``nosetests`` program.
**Basic Tests**

Basic tests are done by you just by using the Op and checking that it
returns the right answer. If you detect an error, you must raise an
@@ -210,8 +208,7 @@ exception. You can use the `assert` keyword to automatically raise an
        # Compare the result computed to the expected value.
        assert numpy.allclose(inp * 2, out)

**Testing the infer_shape**

When a class inherits from the ``InferShapeTester`` class, it gets the
`self._compile_and_check` method that tests the Op ``infer_shape``
@@ -248,8 +245,7 @@ see it fail, you can implement an incorrect ``infer_shape``.
            # Op that should be removed from the graph.
            self.op_class)

**Testing the gradient**

The function :ref:`verify_grad <validating_grad>`
verifies the gradient of an Op or Theano graph. It compares the
@@ -266,8 +262,7 @@ the multiplication by 2).
        theano.tests.unittest_tools.verify_grad(self.op,
                                                [numpy.random.rand(5, 7, 2)])
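The idea behind such a gradient check can be sketched in plain Python (this is an illustrative stand-in, not Theano's ``verify_grad`` implementation): compare the analytic gradient against a central finite-difference estimate. The ``double`` op and helper names below are hypothetical.

```python
def double(xs):
    """The 'op' under test: elementwise doubling."""
    return [2.0 * x for x in xs]

def analytic_grad(xs, out_grads):
    """Analytic gradient of `double`: d(2x)/dx = 2, chained with out_grads."""
    return [2.0 * g for g in out_grads]

def numeric_grad(f, xs, out_grads, eps=1e-6):
    """Central-difference estimate of the same vector-Jacobian product."""
    grads = []
    for i in range(len(xs)):
        plus = list(xs); plus[i] += eps
        minus = list(xs); minus[i] -= eps
        fp, fm = f(plus), f(minus)
        # dot the perturbed outputs with out_grads
        grads.append(sum(g * (a - b) / (2 * eps)
                         for g, a, b in zip(out_grads, fp, fm)))
    return grads

xs = [0.3, -1.2, 2.5]
gs = [1.0, 1.0, 1.0]
exact = analytic_grad(xs, gs)
approx = numeric_grad(double, xs, gs)
assert all(abs(a - b) < 1e-4 for a, b in zip(exact, approx))
```

If the analytic gradient were wrong, the final assertion would fail; this mirrors what a gradient-verification utility reports.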
**Testing the Rop**

The class :class:`RopLop_checker` provides the functions
:func:`RopLop_checker.check_mat_rop_lop`,
@@ -310,16 +305,28 @@ You can also add this at the end of the test file:
    t.setUp()
    t.test_double_rop()
**Testing GPU Ops**

Ops that execute on the GPU should inherit from
``theano.sandbox.cuda.GpuOp`` and not ``theano.Op``. This allows Theano
to distinguish between them. Currently, we use this to test whether
the NVIDIA driver works correctly with our sum reduction code on the
GPU.
-------------------------------------------

**Exercise**

Run the code in the file double_op.py.

Modify and execute to compute: x * y.

Modify and execute the example to return 2 outputs: x + y and x - y
(our current element-wise fusion generates computation with only 1 output).
SciPy
-----

@@ -363,14 +370,8 @@ don't forget to call the parent ``setUp`` function.
For more details see :ref:`random_value_in_tests`.
Documentation
......
.. _gpu_data_convert:

===================================
PyCUDA/CUDAMat/gnumpy compatibility
===================================

PyCUDA
======

Currently, PyCUDA and Theano have different objects to store GPU
data. The two implementations do not support the same set of features.
Theano's implementation is called CudaNdarray and supports
*strides*. It supports only the float32 dtype. PyCUDA's implementation
is called GPUArray and doesn't support *strides*. However, it can deal with
all NumPy and CUDA dtypes.

We are currently working on having the same base object that will
mimic NumPy. Until this is ready, here is some information on how to
@@ -23,8 +24,8 @@ Transfer
You can use the `theano.misc.pycuda_utils` module to convert GPUArray to and
from CudaNdarray. The functions `to_cudandarray(x, copyif=False)` and
`to_gpuarray(x)` return a new object that occupies the same memory space
as the original. Because GPUArrays don't support *strides*, if the
CudaNdarray is strided, we can copy it to obtain a non-strided copy;
the resulting GPUArray then won't share the same memory region. If you
want this behavior, set `copyif=True` in `to_gpuarray`; otherwise a
ValueError is raised.
@@ -33,7 +34,7 @@ Compiling with PyCUDA
---------------------

You can use PyCUDA to compile CUDA functions that work directly on
CudaNdarrays. Here is an example from the file `theano/misc/tests/test_pycuda_theano_simple.py`:

.. code-block:: python

@@ -75,7 +76,7 @@ CudaNdarray. Here is an example from the file `theano/misc/tests/test_pycuda_the
Theano op using PyCUDA function
-------------------------------

You can use a GPU function compiled with PyCUDA in a Theano op. Here is an example:

.. code-block:: python

@@ -119,15 +120,15 @@ You can use gpu function compiled with PyCUDA in a Theano op. Here is an example
CUDAMat
=======

There are functions for conversion between CUDAMat and Theano CudaNdArray objects.
They obey the same principles as PyCUDA's functions and can be found in
theano.misc.cudamat_utils.py.

WARNING: There is a strange problem associated with stride/shape with those converters.
To work, the test needs a transpose and reshape...

gnumpy
======

There are conversion functions between gnumpy garray objects and Theano CudaNdArrays.
They are also similar to PyCUDA's and can be found in theano.misc.gnumpy_utils.py.
@@ -6,7 +6,7 @@
Derivatives in Theano
=====================

Computing Gradients
===================

Now let's use Theano for a slightly more sophisticated task: create a
@@ -16,7 +16,7 @@ For instance, we can compute the
gradient of :math:`x^2` with respect to :math:`x`. Note that:
:math:`d(x^2)/dx = 2 \cdot x`.

Here is the code to compute this gradient:

.. If you modify this code, also change :
.. theano/tests/test_tutorial.py:T_examples.test_examples_4
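As a quick plain-Python sanity check (not Theano code), a central finite difference recovers the same derivative :math:`2x`:

```python
def f(x):
    return x ** 2

def approx_grad(f, x, eps=1e-6):
    """Central-difference approximation of df/dx at x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# The numerical estimate matches 2*x at several points.
for x in (0.0, 1.0, -3.0, 94.2):
    assert abs(approx_grad(f, x) - 2 * x) < 1e-3
```

Theano's ``T.grad`` produces this derivative symbolically instead of numerically, so no step size or approximation error is involved.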
@@ -74,15 +74,14 @@ array([[ 0.25 , 0.19661193],
In general, for any **scalar** expression ``s``, ``T.grad(s, w)`` provides
the Theano expression for computing :math:`\frac{\partial s}{\partial w}`. In
this way Theano can be used for doing **efficient** symbolic differentiation
(as the expression returned by ``T.grad`` will be optimized during compilation), even for
functions with many inputs (see `automatic differentiation <http://en.wikipedia.org/wiki/Automatic_differentiation>`_ for a description
of symbolic differentiation).

.. note::

    The second argument of ``T.grad`` can be a list, in which case the
    output is also a list. The order in both lists is important: element
    *i* of the output list is the gradient of the first argument of
    ``T.grad`` with respect to the *i*-th element of the list given as second argument.
    The first argument of ``T.grad`` has to be a scalar (a tensor
@@ -90,7 +89,6 @@ of symbolic differentiation).
``T.grad`` and details about the implementation, see :ref:`this <libdoc_gradient>`.
Computing the Jacobian
======================

@@ -105,10 +103,10 @@ do is to loop over the entries in ``y`` and compute the gradient of
.. note::

    ``scan`` is a generic op in Theano that allows writing in a symbolic
    manner all kinds of recurrent equations. While creating
    symbolic loops (and optimizing them for performance) is a hard task,
    effort is ongoing to improve the performance of ``scan``. For more
    information about how to use this op, see :ref:`this <lib_scan>`.

@@ -120,15 +118,15 @@ do is to loop over the entries in ``y`` and compute the gradient of
array([[ 8.,  0.],
       [ 0.,  8.]])

What we do in this code is to generate a sequence of ints from ``0`` to
``y.shape[0]`` using ``T.arange``. Then we loop through this sequence, and
at each step, we compute the gradient of element ``y[i]`` with respect to
``x``. ``scan`` automatically concatenates all these rows, generating a
matrix which corresponds to the Jacobian.
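The same row-by-row construction can be mimicked numerically in plain Python (an illustrative sketch, not Theano code), here for the elementwise map ``y = x**2`` whose Jacobian is diagonal with entries ``2*x``:

```python
def jacobian(f, x, eps=1e-6):
    """Finite-difference Jacobian J[j][i] = d f(x)[j] / d x[i]."""
    y0 = f(x)
    cols = []
    for i in range(len(x)):
        xp = list(x)
        xp[i] += eps
        cols.append([(yj - y0j) / eps for yj, y0j in zip(f(xp), y0)])
    # transpose the columns into rows, like scan stacking one row per output
    return [[cols[i][j] for i in range(len(x))] for j in range(len(y0))]

square = lambda xs: [v * v for v in xs]
J = jacobian(square, [4.0, 4.0])
# diagonal entries approximate 2*x = 8, off-diagonal entries are 0
assert all(abs(J[j][i] - (8.0 if i == j else 0.0)) < 1e-3
           for j in range(2) for i in range(2))
```

The symbolic ``scan`` version computes exactly the same matrix, but without the finite-difference approximation.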
.. note::

    There are a few pitfalls to be aware of regarding ``T.grad``. One of them is that you
    cannot re-write the above expression of the Jacobian as
    ``theano.scan(lambda y_i,x: T.grad(y_i,x), sequences=y,
    non_sequences=x)``, even though from the documentation of scan this
    seems possible. The reason is that ``y_i`` will not be a function of
@@ -142,7 +140,7 @@ Theano implements :func:`theano.gradient.hessian` macro that does all
that is needed to compute the Hessian. The following text explains how
to do it manually.

You can compute the Hessian manually similarly to the Jacobian. The only
difference is that now, instead of computing the Jacobian of some expression
``y``, we compute the Jacobian of ``T.grad(cost,x)``, where ``cost`` is some
scalar.
@@ -159,34 +157,33 @@ array([[ 2., 0.],
       [ 0.,  2.]])

Jacobian times a Vector
=======================

Sometimes we can express the algorithm in terms of Jacobians times vectors,
or vectors times Jacobians. Compared to evaluating the Jacobian and then
doing the product, there are methods that compute the desired results while
avoiding actual evaluation of the Jacobian. This can bring about significant
performance gains. A description of one such algorithm can be found here:

* Barak A. Pearlmutter, "Fast Exact Multiplication by the Hessian", *Neural
  Computation, 1994*

While in principle we would want Theano to identify these patterns automatically for us,
in practice, implementing such optimizations in a generic manner is extremely
difficult. Therefore, we offer special functions dedicated to these tasks.
R-operator
----------

The *R operator* is built to evaluate the product between a Jacobian and a
vector, namely :math:`\frac{\partial f(x)}{\partial x} v`. The formulation
can be extended even to the case where `x` is a matrix, or a tensor in general,
in which case the Jacobian also becomes a tensor and the product becomes some
kind of tensor product. Because in practice we end up needing to compute such
expressions in terms of weight matrices, Theano supports this more generic
form of the operation. In order to evaluate the *R-operation* of
expression ``y`` with respect to ``x``, multiplying the Jacobian by ``v``,
you need to do something similar to this:

@@ -205,10 +202,10 @@ array([ 2., 2.])
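Numerically, the Jacobian-times-vector product amounts to a single directional derivative, which can be sketched in plain Python (not Theano code; the linear map ``W`` here is just an illustration):

```python
def jvp(f, x, v, eps=1e-6):
    """Approximate (df/dx) @ v at x with one directional central difference."""
    xp = [xi + eps * vi for xi, vi in zip(x, v)]
    xm = [xi - eps * vi for xi, vi in zip(x, v)]
    return [(a - b) / (2 * eps) for a, b in zip(f(xp), f(xm))]

# f(x) = W x with W = [[1, 2], [3, 4]]; for a linear map the Jacobian is W.
W = [[1.0, 2.0], [3.0, 4.0]]
f = lambda x: [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

v = [1.0, 1.0]
result = jvp(f, [0.5, -0.5], v)
# W @ v = [3, 7], obtained without ever forming the Jacobian explicitly
assert all(abs(r - e) < 1e-4 for r, e in zip(result, [3.0, 7.0]))
```

This is the key point of the R-operator: one extra function evaluation per product, instead of one per input dimension.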
L-operator
----------

Similar to the *R-operator*, the *L-operator* computes a *row* vector times
the Jacobian. The mathematical formula would be :math:`v \frac{\partial
f(x)}{\partial x}`. As for the *R-operator*, the *L-operator* is supported
for generic tensors (not only for vectors). Similarly, it can be implemented as
follows:

>>> W = T.dmatrix('W')
@@ -226,21 +223,21 @@ array([[ 0., 0.],
`v`, the evaluation point, differs between the *L-operator* and the *R-operator*.
For the *L-operator*, the evaluation point needs to have the same shape
as the output, while for the *R-operator* the evaluation point should
have the same shape as the input parameter. Also, the results of these two
operations differ. The result of the *L-operator* is of the same shape
as the input parameter, while the result of the *R-operator* is the same
as the output.
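The shape rules above can be checked with a plain-Python numerical stand-in (not Theano code): ``v`` times the Jacobian equals the gradient of the scalar ``sum(v * f(x))`` with respect to ``x``, so the result has the input's shape.

```python
def vjp(f, x, v, eps=1e-6):
    """Approximate v @ (df/dx) at x, entry by entry."""
    def scalar(z):
        # v has the same shape as the output of f
        return sum(vi * yi for vi, yi in zip(v, f(z)))
    out = []
    for i in range(len(x)):
        xp = list(x); xp[i] += eps
        xm = list(x); xm[i] -= eps
        out.append((scalar(xp) - scalar(xm)) / (2 * eps))
    return out

# f(x) = W x; v @ W picks out a combination of W's rows.
W = [[1.0, 2.0], [3.0, 4.0]]
f = lambda x: [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

v = [1.0, 0.0]          # same shape as the output of f
result = vjp(f, [0.2, 0.7], v)
# v @ W is the first row of W: [1, 2]; the result has the input's shape
assert all(abs(r - e) < 1e-4 for r, e in zip(result, [1.0, 2.0]))
```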
Hessian times a Vector
======================

If you need to compute the Hessian times a vector, you can make use of the
above-defined operators to do it more efficiently than actually computing
the exact Hessian and then performing the product. Due to the symmetry of the
Hessian matrix, you have two options that will
give you the same result, though they might exhibit different performance.
Hence, we suggest profiling the methods before using either of the two:

>>> x = T.dvector('x')
@@ -265,18 +262,18 @@ or, making use of the *R-operator*:
array([ 4., 4.])
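A plain-Python numerical check of the idea (not Theano code): for the cost ``sum(x ** 2)`` the Hessian is :math:`2I`, so the Hessian-times-vector product is simply ``2 * v``. Approximating it as a directional derivative of the gradient never forms the Hessian.

```python
def grad(x):
    """Analytic gradient of the cost sum(x ** 2)."""
    return [2.0 * xi for xi in x]

def hessian_times_vector(grad_fn, x, v, eps=1e-6):
    """Directional central difference of the gradient along v."""
    xp = [xi + eps * vi for xi, vi in zip(x, v)]
    xm = [xi - eps * vi for xi, vi in zip(x, v)]
    return [(a - b) / (2 * eps) for a, b in zip(grad_fn(xp), grad_fn(xm))]

hv = hessian_times_vector(grad, [1.0, 1.0], [2.0, 2.0])
# H @ v = 2 * v = [4, 4] for this cost
assert all(abs(a - b) < 1e-4 for a, b in zip(hv, [4.0, 4.0]))
```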
Final Pointers
==============

* The ``grad`` function works symbolically: it takes and returns a Theano variable.
* It can be compared to a macro since it can be applied repeatedly.
* It directly handles scalar costs only.
* Built-in functions allow efficient computation of vector times Jacobian and vector times Hessian.
* Work is in progress on the optimizations required to compute efficiently the full
  Jacobian and Hessian matrices and the Jacobian times vector.
@@ -24,7 +24,7 @@ as you would in the course of any other Python program.
.. _pickle: http://docs.python.org/library/pickle.html

The Basics of Pickling
======================

The two modules ``pickle`` and ``cPickle`` have the same functionalities, but
@@ -45,7 +45,7 @@ You can serialize (or *save*, or *pickle*) objects to a file with
.. note::

    If you want your saved object to be stored efficiently, don't forget
    to use ``cPickle.HIGHEST_PROTOCOL``. The resulting file can be
    dozens of times smaller than with the default protocol.

.. note::

@@ -81,7 +81,7 @@ For more details about pickle's usage, see
`Python documentation <http://docs.python.org/library/pickle.html#usage>`_.
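A minimal runnable illustration of the points above, using the standard ``pickle`` module (``cPickle`` is its C implementation under Python 2; the ``data`` dictionary is just a placeholder object):

```python
import pickle

data = {"weights": [0.1, 0.2, 0.3], "bias": 0.0}

# Serialize with the highest protocol for a more compact byte stream.
blob = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

# Deserialize and check that an equal (but distinct) object comes back.
restored = pickle.loads(blob)
assert restored == data
assert restored is not data
```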
Short-Term Serialization
========================

If you are confident that the class instance you are serializing will be
@@ -114,7 +114,7 @@ For instance, you can define functions along the lines of:
        self.training_set = cPickle.load(file(self.training_set_file, 'rb'))

Long-Term Serialization
=======================

If the implementation of the class you want to save is quite unstable, for
@@ -138,7 +138,7 @@ matrix ``W`` and a bias ``b``, you can define:
        self.W = W
        self.b = b

If at some point in time ``W`` is renamed to ``weights`` and ``b`` to
``bias``, the older pickled files will still be usable if you update these
functions to reflect the change in name:
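The pattern can be sketched with ``__getstate__``/``__setstate__`` in a runnable form (the class name here is hypothetical, and plain ``pickle`` stands in for ``cPickle``): the state dictionary keeps the historical names ``W`` and ``b``, so old pickles remain loadable after the attributes are renamed.

```python
import pickle

class Layer(object):
    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def __getstate__(self):
        # Keep writing the historical attribute names.
        return {"W": self.weights, "b": self.bias}

    def __setstate__(self, state):
        # Accept the historical names when reading old pickles.
        self.weights = state["W"]
        self.bias = state["b"]

layer = Layer([1.0, 2.0], 0.5)
copy = pickle.loads(pickle.dumps(layer))
assert copy.weights == [1.0, 2.0] and copy.bias == 0.5
```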
......
@@ -8,17 +8,17 @@ Loop
Scan
====

- A general form of *recurrence*, which can be used for looping.
- *Reduction* and *map* (loop over the leading dimensions) are special cases of scan.
- You 'scan' a function along some input sequence, producing an output at each time-step.
- The function can see the *previous K time-steps* of your function.
- ``sum()`` could be computed by scanning the ``z + x(i)`` function over a list, given an initial state of ``z=0``.
- Often a for-loop can be expressed as a ``scan()`` operation, and ``scan`` is the closest that Theano comes to looping.
- Advantages of using ``scan`` over for loops:

  - The number of iterations can be part of the symbolic graph.
  - Minimizes GPU transfers (if a GPU is involved).
  - Computes gradients through sequential steps.
  - Slightly faster than using a for loop in Python with a compiled Theano function.
  - Can lower the overall memory usage by detecting the actual amount of memory needed.
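The behaviour described above can be modelled in plain Python (this is not the Theano ``scan`` API, just its conceptual shape): step a function along a sequence, carrying a state, and collect one output per time-step. With the step ``z + x(i)`` and initial state ``z=0`` it reduces to ``sum()``.

```python
def python_scan(step, sequence, initial_state):
    """Plain-Python model of scan: carry a state, emit one output per step."""
    state, outputs = initial_state, []
    for x in sequence:
        state = step(state, x)
        outputs.append(state)
    return outputs

outs = python_scan(lambda z, x: z + x, [1, 2, 3, 4], 0)
assert outs == [1, 3, 6, 10]          # running sums at each time-step
assert outs[-1] == sum([1, 2, 3, 4])  # the last output is the reduction
```

Theano's ``scan`` additionally builds this loop into the symbolic graph, so gradients can flow through the sequential steps.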
@@ -81,8 +81,9 @@ The full documentation can be found in the library: :ref:`Scan <lib_scan>`.
**Exercise**

Run both examples.

Modify and execute the polynomial example to have the reduction done by ``scan``.

-------------------------------------------
@@ -2,15 +2,15 @@
.. _using_modes:

==========================================
Configuration Settings and Compiling Modes
==========================================

Configuration
=============

The ``config`` module contains several attributes that modify Theano's behavior. Many of these
attributes are examined during the import of the ``theano`` module and several are assumed to be
read-only.

*As a rule, the attributes in this module should not be modified by user code.*
@@ -38,7 +38,7 @@ variables, type this from the command-line:
**Exercise**

Consider the logistic regression:

.. code-block:: python

@@ -63,7 +63,6 @@ Consider once again the logistic regression:
    # Construct Theano expression graph
    p_1 = 1 / (1 + T.exp(-T.dot(x, w)-b))   # Probability of having a one
    prediction = p_1 > 0.5                  # The prediction that is done: 0 or 1
@@ -91,7 +90,6 @@ Consider once again the logistic regression:
    print train.maker.fgraph.toposort()
    for i in range(training_steps):
        pred, err = train(D[0], D[1])
@@ -105,19 +103,24 @@ Consider once again the logistic regression:
Modify and execute this example to run on CPU (the default) with floatX=float32 and
time the execution using the command line ``time python file.py``.

.. TODO: To be resolved:
.. You will need to use: ``theano.config.floatX`` and ``ndarray.astype("str")``
.. Why the latter portion?
.. Note::

    * Apply the Theano flag ``floatX=float32`` through ``theano.config.floatX`` in your code.
    * Cast inputs before storing them into a shared variable.
    * Circumvent the automatic cast of int32 with float32 to float64:

      * Insert manual cast in your code or use [u]int{8,16}.
      * Insert manual cast around the mean operator (this involves division by length, which is an int64).

    * Notice that a new casting mechanism is being developed.

-------------------------------------------
@@ -237,25 +240,25 @@ is quite strict.
ProfileMode
===========

Besides checking for errors, another important task is to profile your
code. For this Theano uses a special mode called ProfileMode which has
to be passed as an argument to :func:`theano.function <function.function>`.
Using the ProfileMode is a three-step process.

.. note::

    To switch the default accordingly, set the Theano flag
    :attr:`config.mode` to ProfileMode. In that case, when the Python
    process exits, it will automatically print the profiling
    information on the standard output.

The memory profile of the output of each ``apply`` node can be enabled with the
Theano flag :attr:`config.ProfileMode.profile_memory`.
Creating a ProfileMode Instance
-------------------------------

First create a ProfileMode instance:

>>> from theano import ProfileMode
>>> profmode = theano.ProfileMode(optimizer='fast_run', linker=theano.gof.OpWiseCLinker())
@@ -270,7 +273,7 @@ implementations wherever possible should use the ``gof.OpWiseCLinker``
using the 'fast_run' optimizer and ``gof.OpWiseCLinker`` linker.
Compiling your Graph with ProfileMode
-------------------------------------

Once the ProfileMode instance is created, simply compile your graph as you
would normally, by specifying the mode parameter.

@@ -282,17 +285,13 @@ would normally, by specifying the mode parameter.
>>> minst = m.make(mode=profmode)
Retrieving Timing Information
-----------------------------
Once your graph is compiled, simply run the program or operation you wish to
profile, then call ``profmode.print_summary()``. This will provide you with
the desired timing information, indicating where your graph is spending most
of its time. This is best shown through an example. Let's use our logistic
regression example.
Compiling the module with ProfileMode and calling ``profmode.print_summary()``
generates the following output:

@@ -344,16 +343,16 @@ generates the following output:

"""
This output has two components. In the first section, called the
*Apply-wise summary*, timing information is provided for the worst
offending Apply nodes. This corresponds to the individual Op applications
within your graph which took the longest to execute (so if you use
``dot`` twice, you will see two entries there). In the second portion,
the *Op-wise summary*, the execution times of all Apply nodes executing
the same Op are grouped together and the total execution time per Op
is shown (so if you use ``dot`` twice, you will see only one entry
there, corresponding to the sum of the time spent in each of them).

Finally, notice that the ProfileMode also shows which Ops were running a C
implementation.
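The relationship between the two summaries can be illustrated with a small, hypothetical sketch in plain Python (not part of Theano): grouping per-Apply timings by Op reproduces the Op-wise totals.

```python
from collections import defaultdict

# Hypothetical per-Apply timings: two separate applications of "dot"
# plus one elementwise exp.
apply_times = [
    ("dot", 0.031),
    ("elemwise{exp}", 0.012),
    ("dot", 0.025),
]

# Apply-wise summary: one entry per application, worst offenders first.
apply_wise = sorted(apply_times, key=lambda t: t[1], reverse=True)

# Op-wise summary: total time per Op, so both "dot" applications
# collapse into a single entry.
op_wise = defaultdict(float)
for op, t in apply_times:
    op_wise[op] += t
```

Here ``apply_wise`` keeps two separate ``dot`` entries while ``op_wise`` has a single ``dot`` entry holding their summed time, mirroring the two sections of the printed summary.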
@@ -24,7 +24,7 @@ where each example has dimension 5. If this would be the input of a

neural network then the weights from the input to the first hidden
layer would represent a matrix of size (5, #hid).
Consider this array:
>>> numpy.asarray([[1., 2], [3, 4], [5, 6]])
array([[ 1., 2.],

@@ -61,5 +61,5 @@ array([2., 4., 6.])

The smaller array ``b`` (actually a scalar here, which works like a 0-d array) in this case is *broadcasted* to the same size
as ``a`` during the multiplication. This trick is often useful in
simplifying how expressions are written. More details about *broadcasting*
can be found in the `numpy user guide <http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html>`__.
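Broadcasting is not limited to scalars: arrays of different but compatible shapes are stretched in the same way. The following sketch is plain numpy, independent of Theano, and reuses the (3, 2) array from above:

```python
import numpy

a = numpy.asarray([[1., 2], [3, 4], [5, 6]])  # shape (3, 2)
b = numpy.asarray([10., 100])                 # shape (2,)

# b is broadcast along the first axis, as if it were tiled to (3, 2):
# each row of a is multiplied elementwise by [10., 100.].
c = a * b
```

The result ``c`` has shape (3, 2), with first row ``[10., 200.]``.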
@@ -5,7 +5,8 @@

Python tutorial
***************

In this documentation, we suppose that the reader knows Python. Here is a small list of Python
tutorials/exercises if you need to learn it or only need a refresher:

* `Python Challenge <http://www.pythonchallenge.com/>`__
* `Dive into Python <http://diveintopython.net/>`__
.. _shape_info:

==========================================
How Shape Information is Handled by Theano
==========================================

It is not possible to strictly enforce the shape of a Theano variable when
building a graph, since the particular value provided for a parameter of
theano.function can change the shape of any Theano variable in its graph.

Currently, information regarding shape is used in two ways in Theano:

- When the exact output shape is known, to generate faster C code for
  the 2d convolution on the CPU and GPU.

- To remove computations in the graph when we only want to know the
  shape, but not the actual value of a variable. This is done with the
@@ -32,11 +32,11 @@ Currently shape informations are used for 2 things in Theano:

# |Shape_i{1} [@43797968] '' 0
# | |x [@43423568]

The output of this compiled function does not contain any multiplication
or power: Theano has removed them to compute the shape of the
output directly.
Shape Inference Problem
=======================

Theano propagates shape information in the graph. Sometimes this
@@ -83,20 +83,20 @@ can lead to errors. For example:

# |y [@44540304]

f(xv,yv)
# Raises a dimensions mismatch error.
As you can see, when asking only for the shape of some computation (``join`` in the
example), an inferred shape is computed directly, without executing
the computation itself (there is no ``join`` in the output of debugprint).
This makes the computation of the shape faster, but it can also hide errors. In
the example, the computation of the shape of ``join`` is done on the first
Theano variable in the ``join`` computation and not on the others.
This might happen with other ops such as ``elemwise`` and ``dot``.
Indeed, to make some optimizations (for speed or stability, for instance),
Theano assumes that the computation is correct and consistent
in the first place, as it does here.
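To make the pitfall concrete, here is a hypothetical pure-Python sketch (not Theano code) of a shape-inference rule for ``join`` along axis 0 that, like the optimization described above, trusts the first input and never compares the remaining dimensions:

```python
def inferred_join_shape(shapes, axis=0):
    """Hypothetical inference rule: sum the joined axis, and take every
    other dimension from the FIRST input without checking the rest."""
    first = list(shapes[0])
    first[axis] = sum(s[axis] for s in shapes)
    return tuple(first)

# Consistent inputs: the inferred shape matches what join would produce.
ok = inferred_join_shape([(2, 5), (3, 5)])

# Inconsistent inputs: actually executing the join would raise a
# dimension mismatch error, but the inference silently reports a shape
# based on the first input alone, hiding the error.
hidden_error = inferred_join_shape([(2, 5), (3, 7)])
```

Both calls return ``(5, 5)``, which is exactly why asking only for the shape can mask an input mismatch that the real computation would reject.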
You can detect those problems by running the code without this
optimization, with the Theano flag

@@ -106,23 +106,23 @@ optimization, nor most other optimizations) or DEBUG_MODE (it will test

before and after all optimizations (much slower)).
Specifying Exact Shape
======================

Currently, specifying a shape is not as easy and flexible as we want, and we
plan some upgrades. Here is the current state of what can be done:
- You can pass the shape info directly to the `ConvOp` created
  when calling ``conv2d``. You simply add the parameters image_shape
  and filter_shape to the call. They must be tuples of 4
  elements. For example:

  .. code-block:: python

      theano.tensor.nnet.conv2d(..., image_shape=(7,3,5,5), filter_shape=(2,3,4,4))
- You can use the SpecifyShape op to add shape info anywhere in the
  graph. This enables some optimizations. In the following example,
  it allows Theano to precompute the function's output as a constant.

  .. code-block:: python

      theano.printing.debugprint(f)
      # [2 2] [@72791376]
Future Plans
============

- Add the parameter "constant shape" to theano.shared(). This is probably
  the most frequent case with shared variables. This will make the code
  simpler and will make it possible to check that the shape does not change when
  updating the shared variable.
@@ -12,7 +12,7 @@ Theano Graphs

Debugging or profiling code written in Theano is not that simple if you
do not know what goes on under the hood. This chapter is meant to
introduce you to a required minimum of the inner workings of Theano;
for more detail, see :ref:`extending`.
The first step in writing Theano code is to write down all mathematical
relations using symbolic placeholders (**variables**). When writing down

@@ -28,8 +28,8 @@ Theano builds internally a graph structure composed of interconnected

**variables**. It is important to distinguish between the
definition of a computation, represented by an **op**, and its application
to some actual data, represented by the **apply** node. For more
detail about these building blocks see :ref:`variable`, :ref:`op`,
:ref:`apply`. Here is an example of a graph:
**Code**

@@ -77,7 +77,7 @@ output. You can now print the name of the op that is applied to get

>>> y.owner.op.name
'Elemwise{mul,no_inplace}'
Hence, an elementwise multiplication is used to compute ``y``. This
multiplication is done between the inputs:

>>> len(y.owner.inputs)

@@ -101,9 +101,9 @@ same shape as x. This is done by using the op ``DimShuffle`` :

[2.0]
Starting from this graph structure it is easier to understand how
*automatic differentiation* proceeds and how the symbolic relations
can be *optimized* for performance or stability.
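The structure just described (a variable pointing to the apply node that produced it, which in turn points to an op and to its input variables) can be mimicked with a minimal, hypothetical sketch in plain Python; none of these classes are Theano's actual implementation:

```python
class Op:
    """The definition of a computation (e.g. elementwise multiplication)."""
    def __init__(self, name):
        self.name = name

class Apply:
    """One application of an Op to concrete input variables."""
    def __init__(self, op, inputs, output):
        self.op, self.inputs = op, inputs
        output.owner = self  # link the result variable back to this node

class Variable:
    """A symbolic placeholder; owner is None for graph inputs."""
    def __init__(self, name=None):
        self.name, self.owner = name, None

def mul(a, b):
    out = Variable()
    Apply(Op("Elemwise{mul,no_inplace}"), [a, b], out)
    return out

x = Variable("x")
y = mul(x, Variable("two"))
```

With this sketch, ``y.owner.op.name`` and ``len(y.owner.inputs)`` behave like the doctest shown earlier: the output variable knows which apply node produced it, and the apply node knows its op and its two inputs.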
Automatic Differentiation
=========================

@@ -159,4 +159,4 @@ Consider the following example of optimization:
.. image:: ../hpcs2011_tutorial/pics/f_unoptimized.png
.. image:: ../hpcs2011_tutorial/pics/f_optimized.png

Symbolic programming involves a paradigm shift: it is best to use it in order to understand it.