Fix doc typos and spelling errors. Make spelling of "optimize" consistent.

01351615 · Adrian Keet · 6b594491
--- a/doc/dev_start_guide.txt
+++ b/doc/dev_start_guide.txt
@@ -18,7 +18,7 @@ sure it is up to date and see if nobody else is working on it. Also,
 we can sometimes provides more information about it.  There is also
 the label `NeedSomeoneToFinish
 <https://github.com/Theano/Theano/labels/NeedSomeoneToFinish>`_ that is
-interresting to check. The difficulty level is variable.
+interesting to check. The difficulty level is variable.

 Resources
 =========
@@ -79,10 +79,10 @@ tests passed.

 Just because the tests run automatically does not mean you shouldn't
 run them yourself to make sure everything is all right.  You can run
-only the portion you are modifying to go faster and have travis to
+only the portion you are modifying to go faster and have Travis to
 make sure there are no global impacts.

-Also, if you are changing GPU code, travis doesn't test that, because
+Also, if you are changing GPU code, Travis doesn't test that, because
 there are no GPUs on the test nodes.

 To run the test suite with the default options, see 
@@ -128,7 +128,7 @@ To setup VIM:

    pip install "flake8<3"

-   .. warning:: Starting version 3.0.0, flake8 changed its dependancies and 
+   .. warning:: Starting version 3.0.0, flake8 changed its dependencies and
      moved its Python API to a legacy module, breaking Theano's flake8 tests.
      We recommend using a version prior to 3.  

@@ -395,7 +395,7 @@ patch Theano, you should work in another branch, like described in the
 Configure Git
 -------------

-On your local machine, you need to configure git with basic informations:
+On your local machine, you need to configure git with basic information:

 .. code-block:: bash


--- a/doc/library/compile/debugmode.txt
+++ b/doc/library/compile/debugmode.txt
@@ -133,7 +133,8 @@ Reference
        Initialize member variables.

        If any of these arguments (except optimizer) is not None, it overrides the class default.
-        The linker arguments is not used. It is set their to allow Mode.requiring() and some other fct to work with DebugMode too.
+        The linker arguments is not used. It is set there to allow
+        Mode.requiring() and some other functions to work with DebugMode too.




--- a/doc/library/compile/nanguardmode.txt
+++ b/doc/library/compile/nanguardmode.txt
@@ -14,7 +14,7 @@ Guide
 =====


-The NanGuardMode aims to prevent the model from outputing NaNs or Infs. It has
+The NanGuardMode aims to prevent the model from outputting NaNs or Infs. It has
 a number of self-checks, which can help to find out which apply node is
 generating those incorrect outputs. It provides automatic detection of 3 types
 of abnormal values: NaNs, Infs, and abnormally big values.

--- a/doc/library/compile/opfromgraph.txt
+++ b/doc/library/compile/opfromgraph.txt
@@ -12,7 +12,7 @@ encapsulate a Theano graph in an op.

 This can be used to encapsulate some functionality in one block. It is
 useful to scale Theano compilation for regular bigger graphs when we
-reuse that encapsulated fonctionality with different inputs many
+reuse that encapsulated functionality with different inputs many
 times. Due to this encapsulation, it can make Theano compilation phase
 faster for graphs with many nodes.


--- a/doc/library/config.txt
+++ b/doc/library/config.txt
@@ -109,7 +109,7 @@ import theano and print the config variable, as in:

    Default device for computations. If ``'cuda*``, change the default to try
    to move computation to the GPU using CUDA libraries. If ``'opencl*'``,
-    the openCL libraries will be used. To let the driver select the device,
+    the OpenCL libraries will be used. To let the driver select the device,
    use ``'cuda'`` or ``'opencl'``. If we are not able to use the GPU,
    either we fall back on the CPU, or an error is raised, depending
    on the :attr:`force_device` flag.
@@ -236,7 +236,7 @@ import theano and print the config variable, as in:
    less inplaces are allowed, but it makes the compilation faster.

    The interaction of which one give the lower peak memory usage is complicated and
-    not predictable, so if you are close to the peak memory usage, triyng both
+    not predictable, so if you are close to the peak memory usage, trying both
    could give you a small gain.

 .. attribute:: openmp
@@ -440,7 +440,7 @@ import theano and print the config variable, as in:

    .. note::

-        The clipping at 95% can be bypassed by specifing the exact
+        The clipping at 95% can be bypassed by specifying the exact
        number of megabytes. If more then 95% are needed, it will try
        automatically to get more memory. But this can cause
        fragmentation, see note above.
@@ -892,8 +892,8 @@ import theano and print the config variable, as in:
    one of the other ``config.numpy.seterr_*`` overrides it), but this behaviour
    can change between numpy releases.

-    This flag sets the default behaviour for all kinds of floating-pont
-    errors, and it can be overriden for specific errors by setting one
+    This flag sets the default behaviour for all kinds of floating-point
+    errors, and it can be overridden for specific errors by setting one
    (or more) of the flags below.

    This flag's value cannot be modified during the program execution.

--- a/doc/library/d3viz/index.txt
+++ b/doc/library/d3viz/index.txt
@@ -45,7 +45,7 @@ web-browsers. ``d3viz`` allows

 .. note::
  
-    This userguide is also avaible as
+    This userguide is also available as
    :download:`IPython notebook <index.ipynb>`.

 As an example, consider the following multilayer perceptron with one

--- a/doc/library/gpuarray/extra.txt
+++ b/doc/library/gpuarray/extra.txt
@@ -4,7 +4,7 @@
 Utility functions
 =================

-Optimisation
+Optimization
 ------------

 .. automodule:: theano.gpuarray.opt_util

--- a/doc/library/gpuarray/op.txt
+++ b/doc/library/gpuarray/op.txt
@@ -7,8 +7,8 @@ List of gpuarray Ops implemented
 .. moduleauthor:: LISA

 Normally you should not call directly those Ops! Theano should
-automatically transform cpu ops to their gpu equivalent. So this list
-is just useful to let people know what is implemented on the gpu.
+automatically transform CPU ops to their GPU equivalent. So this list
+is just useful to let people know what is implemented on the GPU.

 Basic Op
 ========

--- a/doc/library/printing.txt
+++ b/doc/library/printing.txt
@@ -145,7 +145,7 @@ Green ovals are inputs to the graph and blue ovals are outputs.

 If your graph uses shared variables, those shared
 variables will appear as inputs.  Future versions of the :func:`pydotprint`
-may distinguish these inplicit inputs from explicit inputs.
+may distinguish these implicit inputs from explicit inputs.

 If you give updates arguments when creating your function, these are added as
 extra inputs and outputs to the graph.  

--- a/doc/library/scan.txt
+++ b/doc/library/scan.txt
@@ -335,7 +335,7 @@ function, then ``a.value`` will always remain 1, ``b`` will always be 2 and
 ``c`` will always be ``12``.

 The second observation is that if we use shared variables ( ``W``, ``bvis``,
-``bhid``) but we do not iterate over them (ie scan doesn't really need to know
+``bhid``) but we do not iterate over them (i.e. scan doesn't really need to know
 anything in particular about them, just that they are used inside the
 function applied at each step) you do not need to pass them as arguments.
 Scan will find them on its own and add them to the graph.
@@ -430,7 +430,7 @@ variables passed explicitly to ``OneStep`` and to scan:
                             dtype=theano.config.floatX)

    # The new scan, adding strict=True to the original call, and passing
-    # expicitly W, bvis and bhid.
+    # explicitly W, bvis and bhid.
    values, updates = theano.scan(OneStep,
                                  outputs_info=sample,
                                  non_sequences=[W, bvis, bhid],
@@ -523,7 +523,7 @@ the compiled function, the numpy array given to represent this sequence
 should be large enough to cover this values. Assume that we compile the
 above function, and we give as ``u`` the array ``uvals = [0,1,2,3,4,5,6,7,8]``.
 By abusing notations, scan will consider ``uvals[0]`` as ``u[-4]``, and
-will start scaning from ``uvals[4]`` towards the end.
+will start scanning from ``uvals[4]`` towards the end.


 Conditional ending of Scan
@@ -572,7 +572,7 @@ This section presents the ``scan_checkpoints`` function. In short, this
 function reduces the memory usage of scan (at the cost of more computation
 time) by not keeping in memory all the intermediate time steps of the loop,
 and recomputing them when computing the gradients. This function is therefore
-only useful if you need to compute the gradient of the ouptut of scan with
+only useful if you need to compute the gradient of the output of scan with
 respect to its inputs, and shouldn't be used otherwise.

 Before going more into the details, here are its current limitations:
@@ -582,8 +582,8 @@ Before going more into the details, here are its current limitations:
 * It only accepts sequences of the same length.
 * If ``n_steps`` is specified, it has the same value as the length of any
  sequences.
-* It is signly-recurrent, meaning that only the previous time step can be used
-  to compute the current one (ie ``h[t]`` can only depend on ``h[t-1]``). In
+* It is singly-recurrent, meaning that only the previous time step can be used
+  to compute the current one (i.e. ``h[t]`` can only depend on ``h[t-1]``). In
  other words, ``taps`` can not be used in ``sequences`` and ``outputs_info``.

 Often, in order to be able to compute the gradients through scan operations,
@@ -652,7 +652,7 @@ This one is simple but still worth pointing out. Theano is able to
 automatically recognize and optimize many computation patterns. However, there
 are patterns that Theano doesn't optimize because doing so would change the
 user interface (such as merging shared variables together into a single one,
-for instance). Additionaly, Theano doesn't catch every case that it could
+for instance). Additionally, Theano doesn't catch every case that it could
 optimize and so it remains useful for performance that the user defines an
 efficient graph in the first place. This is also the case, and sometimes even
 more so, for the graph inside of Scan. This is because it will be executed

--- a/doc/library/sparse/index.txt
+++ b/doc/library/sparse/index.txt
@@ -44,7 +44,7 @@ attributes: ``data``, ``indices``, ``indptr`` and ``shape``.
  * The ``shape`` attribute is exactly the same as the ``shape``
    attribute of a dense (i.e. generic) matrix. It can be explicitly
    specified at the creation of a sparse matrix if it cannot be
-    infered from the first three attributes.
+    inferred from the first three attributes.


 CSC Matrix
@@ -173,7 +173,7 @@ List of Implemented Operations
      The grad implemented is regular.
    - :func:`col_scale <theano.sparse.basic.col_scale>` to multiply by a vector along the columns.
      The grad implemented is structured.
-    - :func:`row_slace <theano.sparse.basic.row_scale>` to multiply by a vector along the rows.
+    - :func:`row_scale <theano.sparse.basic.row_scale>` to multiply by a vector along the rows.
      The grad implemented is structured.

 - Monoid (Element-wise operation with only one sparse input).

--- a/doc/library/tensor/basic.txt
+++ b/doc/library/tensor/basic.txt
@@ -269,7 +269,7 @@ For additional information, see the :func:`shared() <shared.shared>` documentati

 .. _libdoc_tensor_autocasting:

-Finally, when you use a numpy ndarry or a Python number together with
+Finally, when you use a numpy ndarray or a Python number together with
 :class:`TensorVariable` instances in arithmetic expressions, the result is a
 :class:`TensorVariable`. What happens to the ndarray or the number?
 Theano requires that the inputs to all expressions be Variable instances, so
@@ -893,7 +893,7 @@ Reductions
    :Parameter: *keepdims* - (boolean) If this is set to True, the axis which is reduced is
 		left in the result as a dimension with size one. With this option, the result
 		will broadcast correctly against the original tensor.
-    :Returns: the maxium value along a given axis and its index.
+    :Returns: the maximum value along a given axis and its index.

    if axis=None, Theano 0.5rc1 or later: max_and_argmax over the flattened tensor (like numpy)
                  older: then axis is assumed to be ndim(x)-1
@@ -1209,7 +1209,7 @@ Casting
    Cast any tensor `x` to a Tensor of the same shape, but with a different
    numerical type `dtype`.

-    This is not a reinterpret cast, but a coersion cast, similar to
+    This is not a reinterpret cast, but a coercion cast, similar to
    ``numpy.asarray(x, dtype=dtype)``.

    .. testcode:: cast

--- a/doc/library/tensor/nnet/index.txt
+++ b/doc/library/tensor/nnet/index.txt
@@ -9,7 +9,7 @@
   :synopsis: various ops relating to neural networks
 .. moduleauthor:: LISA

-Theano was originally developped for machine learning applications, particularly
+Theano was originally developed for machine learning applications, particularly
 for the topic of deep learning. As such, our lab has developed many functions
 and ops which are particular to neural networks and deep learning.


--- a/doc/optimizations.txt
+++ b/doc/optimizations.txt
@@ -19,7 +19,7 @@ If you would like to add an additional optimization, refer to
 When compiling, we can make a tradeoff between compile-time and run-time.
 Faster compile times will result in fewer optimizations being applied, hence generally slower run-times.
 For making this tradeoff when compiling, we provide a set of 4 optimization modes, 'o1' to 'o4', where 'o1' leads to fastest compile-time and 'o4' leads to fastest run-time in general.
-For an even faster run-time, we could disable assertions (which could be time comsuming) for valid user inputs, using the optimization mode 'unsafe', but this is, as the name suggests, unsafe.
+For an even faster run-time, we could disable assertions (which could be time consuming) for valid user inputs, using the optimization mode 'unsafe', but this is, as the name suggests, unsafe.
 (Also see note at :ref:`unsafe_optimization`.)

 ..  note::
@@ -120,7 +120,7 @@ Optimization                                              o4             o3  o2
        This optimization reorders such graphs so that all increments can be
        done inplace.  
        
-        ``inc_subensor(a,b,idx) + inc_subtensor(a,c,idx) -> inc_subtensor(inc_subtensor(a,b,idx),c,idx)``
+        ``inc_subtensor(a,b,idx) + inc_subtensor(a,c,idx) -> inc_subtensor(inc_subtensor(a,b,idx),c,idx)``

        See :func:`local_IncSubtensor_serialize`

@@ -285,7 +285,7 @@ Optimization                                              o4             o3  o2
        For the fastest possible Theano, this optimization can be enabled by
 	setting ``optimizer_including=local_remove_all_assert`` which will
 	remove all assertions in the graph for checking user inputs are valid.
-        Use this optimization if you are sure everthing is valid in your graph.
+        Use this optimization if you are sure everything is valid in your graph.
 	
 	See :ref:`unsafe_optimization`

--- a/doc/tutorial/debug_faq.txt
+++ b/doc/tutorial/debug_faq.txt
@@ -279,7 +279,7 @@ Theano provides a 'Print' op to do this.
    this is a very important value __str__ = [ 1.  2.  3.]

 Since Theano runs your program in a topological order, you won't have precise
-control over the order in which multiple ``Print()`` ops are evaluted.  For a more
+control over the order in which multiple ``Print()`` ops are evaluated.  For a more
 precise inspection of what's being computed where, when, and how, see the discussion
 :ref:`faq_monitormode`.

@@ -437,7 +437,7 @@ optimizations. The first is a speed optimization that merges elemwise
 operations together. This makes it harder to know which particular
 elemwise causes the problem. The second optimization makes some ops'
 outputs overwrite their inputs. So, if an op creates a bad output, you
-will not be able to see the input that was overwriten in the ``post_func``
+will not be able to see the input that was overwritten in the ``post_func``
 function. To disable those optimizations (with a Theano version after
 0.6rc3), define the MonitorMode like this:

@@ -606,5 +606,5 @@ Then send us filename.
 Breakpoint during Theano function execution
 -------------------------------------------

-You can set breakpoing during the execution of a Theano function with
+You can set a breakpoint during the execution of a Theano function with
 :class:`PdbBreakpoint <theano.tests.breakpoint.PdbBreakpoint>`.
--- a/doc/tutorial/examples.txt
+++ b/doc/tutorial/examples.txt
@@ -347,7 +347,7 @@ RandomStream object (a random number generator) for each such
 variable, and draw from it as necessary. We will call this sort of
 sequence of random numbers a *random stream*. *Random streams* are at
 their core shared variables, so the observations on shared variables
-hold here as well. Theanos's random objects are defined and implemented in
+hold here as well. Theano's random objects are defined and implemented in
 :ref:`RandomStreams<libdoc_tensor_shared_randomstreams>` and, at a lower level,
 in :ref:`RandomStreamsBase<libdoc_tensor_raw_random>`.


--- a/doc/tutorial/faq_tutorial.txt
+++ b/doc/tutorial/faq_tutorial.txt
@@ -7,7 +7,7 @@ Frequently Asked Questions
 How to update a subset of weights?
 ==================================
 If you want to update only a subset of a weight matrix (such as
-some rows or some columns) that are used in the forward propogation
+some rows or some columns) that are used in the forward propagation
 of each iteration, then the cost function should be defined in a way
 that it only depends on the subset of weights that are used in that
 iteration.

--- a/doc/tutorial/gradients.txt
+++ b/doc/tutorial/gradients.txt
@@ -148,7 +148,7 @@ matrix which corresponds to the Jacobian.
 Computing the Hessian
 =====================

-In Theano, the term *Hessian* has the usual mathematical acception: It is the 
+In Theano, the term *Hessian* has the usual mathematical meaning: It is the 
 matrix comprising the second order partial derivative of a function with scalar
 output and vector input. Theano implements :func:`theano.gradient.hessian` macro that does all
 that is needed to compute the Hessian. The following text explains how

--- a/doc/tutorial/multi_cores.txt
+++ b/doc/tutorial/multi_cores.txt
@@ -14,7 +14,7 @@ CPU.
 BLAS operation
 ==============

-BLAS is an interface for some mathematic operations between two
+BLAS is an interface for some mathematical operations between two
 vectors, a vector and a matrix or two matrices (e.g. the dot product
 between vector/matrix and matrix/matrix). Many different
 implementations of that interface exist and some of them are

--- a/doc/tutorial/nan_tutorial.txt
+++ b/doc/tutorial/nan_tutorial.txt
@@ -21,7 +21,7 @@ Most frequently, the cause would be that some of the hyperparameters, especially
 learning rates, are set incorrectly. A high learning rate can blow up your whole
 model into NaN outputs even within one epoch of training. So the first and
 easiest solution is try to lower it. Keep halving your learning rate until you
-start to get resonable output values.
+start to get reasonable output values.

 Other hyperparameters may also play a role. For example, are your training
 algorithms involve regularization terms? If so, are their corresponding
@@ -73,7 +73,7 @@ chance that something is wrong with your algorithm. Go back to the mathematics
 and find out if everything is derived correctly.


-Cuda Specific Option
+CUDA Specific Option
 --------------------

 The Theano flag ``nvcc.fastmath=True`` can genarate NaN. Don't set
@@ -85,6 +85,6 @@ this flag while debugging NaN.
 NaN Introduced by AllocEmpty
 -----------------------------------------------

-AllocEmpty is used by many operation such as scan to allocate some memory without properly clearing it. The reason for that is that the allocated memory will subsequently be overwritten. However, this can sometimes introduce NaN depending on the operation and what was previously stored in the memory it is working on. For instance, trying to zero out memory  using a multipication before applying an operation could cause NaN if NaN is already present in the memory, since `0 * NaN => NaN`.
+AllocEmpty is used by many operation such as scan to allocate some memory without properly clearing it. The reason for that is that the allocated memory will subsequently be overwritten. However, this can sometimes introduce NaN depending on the operation and what was previously stored in the memory it is working on. For instance, trying to zero out memory  using a multiplication before applying an operation could cause NaN if NaN is already present in the memory, since `0 * NaN => NaN`.

 Using ``optimizer_including=alloc_empty_to_zeros`` replaces `AllocEmpty` by `Alloc{0}`, which is helpful to diagnose where NaNs come from. Please note that when running in `NanGuardMode`, this optimizer is not included by default. Therefore, it might be helpful to use them both together. 
--- a/doc/tutorial/numpy.txt
+++ b/doc/tutorial/numpy.txt
@@ -57,7 +57,7 @@ Numpy does *broadcasting* of arrays of different shapes during
 arithmetic operations. What this means in general is that the smaller 
 array (or scalar) is *broadcasted* across the larger array so that they have
 compatible shapes. The example below shows an instance of
-*broadcastaing*:
+*broadcasting*:

 >>> a = numpy.asarray([1.0, 2.0, 3.0])
 >>> b = 2.0

--- a/doc/tutorial/shape_info.txt
+++ b/doc/tutorial/shape_info.txt
@@ -96,8 +96,8 @@ optimization, nor most other optimizations) or ``DebugMode`` (it will test
 before and after all optimizations (much slower)).


-Specifing Exact Shape
-=====================
+Specifying Exact Shape
+======================

 Currently, specifying a shape is not as easy and flexible as we wish and we plan some
 upgrade.  Here is the current state of what can be done:

--- a/doc/tutorial/sparse.txt
+++ b/doc/tutorial/sparse.txt
@@ -66,14 +66,14 @@ and rows. They have both the same attributes: ``data``, ``indices``, ``indptr``
    sparse matrix.

  * The ``shape`` attribute is exactly the same as the ``shape`` attribute of a dense (i.e. generic)
-    matrix. It can be explicitly specified at the creation of a sparse matrix if it cannot be infered
+    matrix. It can be explicitly specified at the creation of a sparse matrix if it cannot be inferred
    from the first three attributes.

 Which format should I use?
 --------------------------

 At the end, the format does not affect the length of the ``data`` and ``indices`` attributes. They are both
-completly fixed by the number of elements you want to store. The only thing that changes with the format
+completely fixed by the number of elements you want to store. The only thing that changes with the format
 is ``indptr``. In ``csc`` format, the matrix is compressed along columns so a lower number of columns will
 result in less memory use. On the other hand, with the ``csr`` format, the matrix is compressed along
 the rows and with a matrix that have a lower number of rows, ``csr`` format is a better choice. So here is the rule:
@@ -83,7 +83,7 @@ the rows and with a matrix that have a lower number of rows, ``csr`` format is a
    If shape[0] > shape[1], use ``csc`` format. Otherwise, use ``csr``.

 Sometimes, since the sparse module is young, ops does not exist for both format. So here is
-what may be the most relevent rule:
+what may be the most relevant rule:

 .. note::


--- a/doc/tutorial/using_gpu.txt
+++ b/doc/tutorial/using_gpu.txt
@@ -477,7 +477,7 @@ The following resources will assist you in this learning process:
  * `practical issues <http://stackoverflow.com/questions/2392250/understanding-cuda-grid-dimensions-block-dimensions-and-threads-organization-s>`_
    (on the relationship between grids, blocks and threads; see also linked and related issues on same page)

-  * `CUDA optimisation <http://www.gris.informatik.tu-darmstadt.de/cuda-workshop/slides.html>`_
+  * `CUDA optimization <http://www.gris.informatik.tu-darmstadt.de/cuda-workshop/slides.html>`_

 * **PyCUDA: Introductory**


--- a/doc/tutorial/using_multi_gpu.txt
+++ b/doc/tutorial/using_multi_gpu.txt
@@ -36,7 +36,7 @@ The mapping from context names to devices is done through the
    dev0->cuda0;dev1->cuda1

 Let's break it down.  First there is a list of mappings.  Each of
-these mappings is separeted by a semicolon ';'.  There can be any
+these mappings is separated by a semicolon ';'.  There can be any
 number of such mappings, but in the example above we have two of them:
 `dev0->cuda0` and `dev1->cuda1`.