Commit 01351615 authored by Adrian Keet

Fix doc typos and spelling errors. Make spelling of "optimize" consistent.

Parent 6b594491
@@ -18,7 +18,7 @@ sure it is up to date and see if nobody else is working on it. Also,
we can sometimes provide more information about it. There is also
the label `NeedSomeoneToFinish
<https://github.com/Theano/Theano/labels/NeedSomeoneToFinish>`_ that is
interesting to check. The difficulty level is variable.

Resources
=========
@@ -79,10 +79,10 @@ tests passed.
Just because the tests run automatically does not mean you shouldn't
run them yourself to make sure everything is all right. You can run
only the portion you are modifying to go faster and let Travis
make sure there are no global impacts.

Also, if you are changing GPU code, Travis doesn't test that, because
there are no GPUs on the test nodes.

To run the test suite with the default options, see
@@ -128,7 +128,7 @@ To set up VIM:

    pip install "flake8<3"

.. warning:: Starting with version 3.0.0, flake8 changed its dependencies and
   moved its Python API to a legacy module, breaking Theano's flake8 tests.
   We recommend using a version prior to 3.
@@ -395,7 +395,7 @@ patch Theano, you should work in another branch, as described in the

Configure Git
-------------

On your local machine, you need to configure git with basic information:

.. code-block:: bash

......
@@ -133,7 +133,8 @@ Reference
Initialize member variables.
If any of these arguments (except optimizer) is not None, it overrides the class default.
The linker argument is not used. It is set there to allow
Mode.requiring() and some other functions to work with DebugMode too.
......
@@ -14,7 +14,7 @@ Guide
=====

The NanGuardMode aims to prevent the model from outputting NaNs or Infs. It has
a number of self-checks, which can help to find out which apply node is
generating those incorrect outputs. It provides automatic detection of 3 types
of abnormal values: NaNs, Infs, and abnormally big values.
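The three checks can be sketched in plain NumPy; this is an illustration of the idea only (the ``1e10`` cutoff is an arbitrary stand-in, not NanGuardMode's actual threshold):

```python
import numpy as np

# Illustration of the three checks; the 1e10 cutoff is an arbitrary
# stand-in, not NanGuardMode's actual threshold.
def nan_guard(values, big=1e10):
    a = np.asarray(values, dtype=float)
    finite = a[np.isfinite(a)]
    return {
        "has_nan": bool(np.isnan(a).any()),
        "has_inf": bool(np.isinf(a).any()),
        "has_big": bool((np.abs(finite) > big).any()),
    }

report = nan_guard([1.0, np.nan, 2.0])
```

NanGuardMode itself runs such checks on the inputs and outputs of each apply node, which is what lets it point at the node that first produced the bad value.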
......
@@ -12,7 +12,7 @@ encapsulate a Theano graph in an op.
This can be used to encapsulate some functionality in one block. It is
useful to scale Theano compilation for larger graphs when we
reuse that encapsulated functionality with different inputs many
times. Due to this encapsulation, it can make the Theano compilation phase
faster for graphs with many nodes.
......
@@ -109,7 +109,7 @@ import theano and print the config variable, as in:
Default device for computations. If ``'cuda*'``, change the default to try
to move computation to the GPU using CUDA libraries. If ``'opencl*'``,
the OpenCL libraries will be used. To let the driver select the device,
use ``'cuda'`` or ``'opencl'``. If we are not able to use the GPU,
either we fall back on the CPU, or an error is raised, depending
on the :attr:`force_device` flag.
@@ -236,7 +236,7 @@ import theano and print the config variable, as in:
fewer inplace operations are allowed, but it makes the compilation faster.

The interaction determining which one gives the lower peak memory usage is complicated and
not predictable, so if you are close to the peak memory usage, trying both
could give you a small gain.

.. attribute:: openmp
@@ -440,7 +440,7 @@ import theano and print the config variable, as in:
.. note::

   The clipping at 95% can be bypassed by specifying the exact
   number of megabytes. If more than 95% are needed, it will try
   automatically to get more memory. But this can cause
   fragmentation; see the note above.
@@ -892,8 +892,8 @@ import theano and print the config variable, as in:
one of the other ``config.numpy.seterr_*`` overrides it), but this behaviour
can change between numpy releases.

This flag sets the default behaviour for all kinds of floating-point
errors, and it can be overridden for specific errors by setting one
(or more) of the flags below.

This flag's value cannot be modified during program execution.
......
@@ -45,7 +45,7 @@ web-browsers. ``d3viz`` allows
.. note::

   This user guide is also available as
   :download:`IPython notebook <index.ipynb>`.

As an example, consider the following multilayer perceptron with one
......
@@ -4,7 +4,7 @@
Utility functions
=================

Optimization
------------

.. automodule:: theano.gpuarray.opt_util
......
@@ -7,8 +7,8 @@ List of gpuarray Ops implemented
.. moduleauthor:: LISA

Normally you should not call those Ops directly! Theano should
automatically transform CPU ops to their GPU equivalents. So this list
is just useful to let people know what is implemented on the GPU.

Basic Op
========
......
@@ -145,7 +145,7 @@ Green ovals are inputs to the graph and blue ovals are outputs.
If your graph uses shared variables, those shared
variables will appear as inputs. Future versions of :func:`pydotprint`
may distinguish these implicit inputs from explicit inputs.

If you give updates arguments when creating your function, these are added as
extra inputs and outputs to the graph.
......
@@ -335,7 +335,7 @@ function, then ``a.value`` will always remain 1, ``b`` will always be 2 and
``c`` will always be ``12``.

The second observation is that if we use shared variables (``W``, ``bvis``,
``bhid``) but we do not iterate over them (i.e. scan doesn't really need to know
anything in particular about them, just that they are used inside the
function applied at each step), you do not need to pass them as arguments.
Scan will find them on its own and add them to the graph.
@@ -430,7 +430,7 @@ variables passed explicitly to ``OneStep`` and to scan:
                             dtype=theano.config.floatX)

    # The new scan, adding strict=True to the original call, and passing
    # W, bvis and bhid explicitly.
    values, updates = theano.scan(OneStep,
                                  outputs_info=sample,
                                  non_sequences=[W, bvis, bhid],
@@ -523,7 +523,7 @@ the compiled function, the numpy array given to represent this sequence
should be large enough to cover these values. Assume that we compile the
above function, and we give as ``u`` the array ``uvals = [0,1,2,3,4,5,6,7,8]``.
By abuse of notation, scan will consider ``uvals[0]`` as ``u[-4]``, and
will start scanning from ``uvals[4]`` towards the end.
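The alignment can be emulated in plain Python (no Theano involved) to see why scanning starts at ``uvals[4]``:

```python
# Pure-Python emulation of the alignment: uvals[0] plays the role of
# u[-4], so the first "current" element scan sees is uvals[4].
uvals = [0, 1, 2, 3, 4, 5, 6, 7, 8]
tap = -4                     # the past value u[t-4] is needed at each step

steps = []
for t in range(-tap, len(uvals)):
    steps.append((uvals[t + tap], uvals[t]))   # (u[t-4], u[t]) at each step
```

Only five steps are taken over a nine-element array, because the first four elements are consumed as the initial tap history.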
Conditional ending of Scan
@@ -572,7 +572,7 @@ This section presents the ``scan_checkpoints`` function. In short, this
function reduces the memory usage of scan (at the cost of more computation
time) by not keeping in memory all the intermediate time steps of the loop,
and recomputing them when computing the gradients. This function is therefore
only useful if you need to compute the gradient of the output of scan with
respect to its inputs, and shouldn't be used otherwise.

Before going into more detail, here are its current limitations:
@@ -582,8 +582,8 @@ Before going more into the details, here are its current limitations:
* It only accepts sequences of the same length.
* If ``n_steps`` is specified, it has the same value as the length of any
  sequence.
* It is singly-recurrent, meaning that only the previous time step can be used
  to compute the current one (i.e. ``h[t]`` can only depend on ``h[t-1]``). In
  other words, ``taps`` cannot be used in ``sequences`` and ``outputs_info``.

Often, in order to be able to compute the gradients through scan operations,
@@ -652,7 +652,7 @@ This one is simple but still worth pointing out. Theano is able to
automatically recognize and optimize many computation patterns. However, there
are patterns that Theano doesn't optimize because doing so would change the
user interface (such as merging shared variables together into a single one,
for instance). Additionally, Theano doesn't catch every case that it could
optimize and so it remains useful for performance that the user defines an
efficient graph in the first place. This is also the case, and sometimes even
more so, for the graph inside of Scan. This is because it will be executed
......
@@ -44,7 +44,7 @@ attributes: ``data``, ``indices``, ``indptr`` and ``shape``.
* The ``shape`` attribute is exactly the same as the ``shape``
  attribute of a dense (i.e. generic) matrix. It can be explicitly
  specified at the creation of a sparse matrix if it cannot be
  inferred from the first three attributes.
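For illustration only (the values below are made up), a dense matrix can be rebuilt from those four attributes of a CSC matrix like this:

```python
import numpy as np

# Made-up CSC attributes for a 3x2 matrix, just to illustrate the layout.
data = np.array([7.0, 8.0, 9.0])   # stored values
indices = np.array([0, 2, 1])      # row index of each stored value
indptr = np.array([0, 2, 3])       # column j owns data[indptr[j]:indptr[j+1]]
shape = (3, 2)

dense = np.zeros(shape)
for j in range(shape[1]):                       # walk the columns
    for k in range(indptr[j], indptr[j + 1]):   # stored entries of column j
        dense[indices[k], j] = data[k]
```

Note how ``shape`` is the only attribute that cannot always be recovered from the other three: a trailing all-zero row or column leaves no trace in ``data``, ``indices`` or ``indptr``.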
CSC Matrix
@@ -173,7 +173,7 @@ List of Implemented Operations
  The grad implemented is regular.
- :func:`col_scale <theano.sparse.basic.col_scale>` to multiply by a vector along the columns.
  The grad implemented is structured.
- :func:`row_scale <theano.sparse.basic.row_scale>` to multiply by a vector along the rows.
  The grad implemented is structured.
- Monoid (Element-wise operation with only one sparse input).
......
@@ -269,7 +269,7 @@ For additional information, see the :func:`shared() <shared.shared>` documentati
.. _libdoc_tensor_autocasting:

Finally, when you use a numpy ndarray or a Python number together with
:class:`TensorVariable` instances in arithmetic expressions, the result is a
:class:`TensorVariable`. What happens to the ndarray or the number?
Theano requires that the inputs to all expressions be Variable instances, so
@@ -893,7 +893,7 @@ Reductions
:Parameter: *keepdims* - (boolean) If this is set to True, the axis which is reduced is
   left in the result as a dimension with size one. With this option, the result
   will broadcast correctly against the original tensor.
:Returns: the maximum value along a given axis and its index.
   if axis=None, Theano 0.5rc1 or later: max_and_argmax over the flattened tensor (like numpy)
   older: then axis is assumed to be ndim(x)-1
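The *keepdims* behaviour can be checked with NumPy's ``max``/``argmax`` as a stand-in for ``max_and_argmax``:

```python
import numpy as np

# keepdims illustrated with NumPy's max/argmax standing in for Theano's
# max_and_argmax.
x = np.array([[1, 5, 3],
              [4, 2, 6]])

m = x.max(axis=1, keepdims=True)   # shape (2, 1): reduced axis kept as size one
am = x.argmax(axis=1)              # index of the maximum along axis 1

mask = (x == m)                    # broadcasts correctly thanks to keepdims
```

Without ``keepdims``, ``m`` would have shape ``(2,)`` and the comparison against the ``(2, 3)`` tensor would align the wrong way (or fail).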
@@ -1209,7 +1209,7 @@ Casting
Cast any tensor `x` to a Tensor of the same shape, but with a different
numerical type `dtype`.

This is not a reinterpret cast, but a coercion cast, similar to
``numpy.asarray(x, dtype=dtype)``.
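The NumPy expression above can be tried on its own to see that a coercion cast converts values rather than reinterpreting bits:

```python
import numpy as np

# A coercion cast converts values; it does not reinterpret the bits.
x = np.array([1.7, 2.2, 3.9])
y = np.asarray(x, dtype="int32")   # fractional parts truncated toward zero
```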
.. testcode:: cast
......
@@ -9,7 +9,7 @@
:synopsis: various ops relating to neural networks
.. moduleauthor:: LISA

Theano was originally developed for machine learning applications, particularly
for the topic of deep learning. As such, our lab has developed many functions
and ops which are particular to neural networks and deep learning.
......
@@ -19,7 +19,7 @@ If you would like to add an additional optimization, refer to
When compiling, we can make a tradeoff between compile-time and run-time.
Faster compile times will result in fewer optimizations being applied, hence generally slower run-times.
For making this tradeoff when compiling, we provide a set of 4 optimization modes, 'o1' to 'o4', where 'o1' leads to the fastest compile-time and 'o4' leads to the fastest run-time in general.
For an even faster run-time, we could disable assertions (which could be time consuming) for valid user inputs, using the optimization mode 'unsafe', but this is, as the name suggests, unsafe.
(Also see the note at :ref:`unsafe_optimization`.)

.. note::
@@ -120,7 +120,7 @@ Optimization o4 o3 o2
This optimization reorders such graphs so that all increments can be
done inplace.

``inc_subtensor(a,b,idx) + inc_subtensor(a,c,idx) -> inc_subtensor(inc_subtensor(a,b,idx),c,idx)``

See :func:`local_IncSubtensor_serialize`
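A NumPy sketch of why the serialized form works, with ``np.add.at`` standing in for ``inc_subtensor``: chaining two increments at the same indices is the same as adding ``b + c`` there in one shot, so the increments can run back to back inplace.

```python
import numpy as np

# np.add.at plays the role of inc_subtensor: chained increments at the
# same indices equal a single combined increment.
a = np.arange(5, dtype=float)
b = np.array([1.0, 2.0])
c = np.array([10.0, 20.0])
idx = np.array([1, 3])

chained = a.copy()
np.add.at(chained, idx, b)      # inc_subtensor(a, b, idx)
np.add.at(chained, idx, c)      # inc_subtensor(..., c, idx)

combined = a.copy()
np.add.at(combined, idx, b + c)
```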
@@ -285,7 +285,7 @@ Optimization o4 o3 o2
For the fastest possible Theano, this optimization can be enabled by
setting ``optimizer_including=local_remove_all_assert``, which will
remove all assertions in the graph that check user inputs are valid.
Use this optimization if you are sure everything is valid in your graph.

See :ref:`unsafe_optimization`
@@ -279,7 +279,7 @@ Theano provides a 'Print' op to do this.
this is a very important value __str__ = [ 1. 2. 3.]

Since Theano runs your program in a topological order, you won't have precise
control over the order in which multiple ``Print()`` ops are evaluated. For a more
precise inspection of what's being computed where, when, and how, see the discussion
:ref:`faq_monitormode`.
@@ -437,7 +437,7 @@ optimizations. The first is a speed optimization that merges elemwise
operations together. This makes it harder to know which particular
elemwise causes the problem. The second optimization makes some ops'
outputs overwrite their inputs. So, if an op creates a bad output, you
will not be able to see the input that was overwritten in the ``post_func``
function. To disable those optimizations (with a Theano version after
0.6rc3), define the MonitorMode like this:
@@ -606,5 +606,5 @@ Then send us filename.
Breakpoint during Theano function execution
-------------------------------------------

You can set a breakpoint during the execution of a Theano function with
:class:`PdbBreakpoint <theano.tests.breakpoint.PdbBreakpoint>`.
@@ -347,7 +347,7 @@ RandomStream object (a random number generator) for each such
variable, and draw from it as necessary. We will call this sort of
sequence of random numbers a *random stream*. *Random streams* are at
their core shared variables, so the observations on shared variables
hold here as well. Theano's random objects are defined and implemented in
:ref:`RandomStreams<libdoc_tensor_shared_randomstreams>` and, at a lower level,
in :ref:`RandomStreamsBase<libdoc_tensor_raw_random>`.
......
@@ -7,7 +7,7 @@ Frequently Asked Questions
How to update a subset of weights?
==================================

If you want to update only a subset of a weight matrix (such as
some rows or some columns) that is used in the forward propagation
of each iteration, then the cost function should be defined so
that it only depends on the subset of weights that are used in that
iteration.
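A NumPy sketch of this idea (all names and values below are made up for illustration): only the rows used in the current iteration receive an update.

```python
import numpy as np

# Hypothetical names: W is the full weight matrix, rows is the subset
# used in this iteration, grad is the gradient w.r.t. W[rows] only.
W = np.ones((5, 3))
rows = np.array([0, 2])
grad = np.full((2, 3), 0.5)
lr = 0.1

W[rows] -= lr * grad        # rows 1, 3 and 4 are left untouched
```

In Theano, the equivalent is to take the subtensor ``W[rows]``, build the cost from it, and update with ``inc_subtensor``/``set_subtensor`` so that only that slice is written back.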
......
@@ -148,7 +148,7 @@ matrix which corresponds to the Jacobian.
Computing the Hessian
=====================

In Theano, the term *Hessian* has the usual mathematical meaning: it is the
matrix comprising the second-order partial derivatives of a function with scalar
output and vector input. Theano implements the :func:`theano.gradient.hessian` macro that does all
that is needed to compute the Hessian. The following text explains how
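This is not Theano's symbolic macro, but the same definition can be checked numerically with finite differences; for f(x) = x·x the Hessian is 2I:

```python
import numpy as np

# Not Theano's symbolic theano.gradient.hessian: a finite-difference
# sketch of the same definition (scalar output, vector input).
def hessian_fd(f, x, eps=1e-4):
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i = np.zeros(n); e_i[i] = eps
            e_j = np.zeros(n); e_j[j] = eps
            # Forward-difference approximation of d2f / dx_i dx_j
            H[i, j] = (f(x + e_i + e_j) - f(x + e_i)
                       - f(x + e_j) + f(x)) / eps ** 2
    return H

# f(x) = x . x has Hessian 2 * I
H = hessian_fd(lambda v: float(v @ v), np.array([1.0, 2.0, 3.0]))
```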
......
@@ -14,7 +14,7 @@ CPU.
BLAS operation
==============

BLAS is an interface for some mathematical operations between two
vectors, a vector and a matrix, or two matrices (e.g. the dot product
between vector/matrix and matrix/matrix). Many different
implementations of that interface exist and some of them are
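The three flavours of products can be illustrated with NumPy, which delegates these calls to whatever BLAS implementation it was built against:

```python
import numpy as np

# vector/vector, matrix/vector and matrix/matrix products via NumPy,
# which forwards them to its underlying BLAS.
v = np.array([1.0, 2.0])
M = np.array([[1.0, 0.0],
              [0.0, 3.0]])

vec_vec = v @ v    # dot product (BLAS level 1)
mat_vec = M @ v    # matrix-vector product (level 2)
mat_mat = M @ M    # matrix-matrix product (level 3)
```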
......
@@ -21,7 +21,7 @@ Most frequently, the cause would be that some of the hyperparameters, especially
learning rates, are set incorrectly. A high learning rate can blow up your whole
model into NaN outputs even within one epoch of training. So the first and
easiest solution is to try to lower it. Keep halving your learning rate until you
start to get reasonable output values.
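A toy illustration of this advice (a hypothetical one-parameter "model", not a real training loop): on f(w) = w**2, gradient descent diverges or oscillates when the rate is too high, and halving it until the run settles finds a workable value.

```python
import math

# Toy illustration (hypothetical one-parameter model): gradient descent
# on f(w) = w**2 misbehaves when the learning rate is too high; halving
# it until the run settles finds a workable value.
def final_w(lr, steps=50):
    w = 1.0
    for _ in range(steps):
        w -= lr * 2 * w          # gradient of w**2 is 2w
    return w

lr = 4.0                         # deliberately too high
while not math.isfinite(final_w(lr)) or abs(final_w(lr)) >= 1.0:
    lr /= 2                      # keep halving, as the text suggests
```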
Other hyperparameters may also play a role. For example, do your training
algorithms involve regularization terms? If so, are their corresponding
@@ -73,7 +73,7 @@ chance that something is wrong with your algorithm. Go back to the mathematics
and find out if everything is derived correctly.

CUDA Specific Option
--------------------

The Theano flag ``nvcc.fastmath=True`` can generate NaN. Don't set
@@ -85,6 +85,6 @@ this flag while debugging NaN.
NaN Introduced by AllocEmpty
----------------------------

AllocEmpty is used by many operations, such as scan, to allocate some memory without properly clearing it. The reason for that is that the allocated memory will subsequently be overwritten. However, this can sometimes introduce NaN depending on the operation and what was previously stored in the memory it is working on. For instance, trying to zero out memory using a multiplication before applying an operation could cause NaN if NaN is already present in the memory, since `0 * NaN => NaN`.

Using ``optimizer_including=alloc_empty_to_zeros`` replaces `AllocEmpty` by `Alloc{0}`, which is helpful to diagnose where NaNs come from. Please note that when running in `NanGuardMode`, this optimizer is not included by default. Therefore, it might be helpful to use them both together.
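The failure mode can be reproduced in a few lines of NumPy:

```python
import numpy as np

# Multiplying by zero does not clear a NaN already sitting in the
# buffer, while overwriting with zeros does.
buf = np.array([1.0, np.nan, 3.0])   # stand-in for uninitialized memory
zeroed_by_mul = buf * 0.0            # 0 * NaN => NaN survives
zeroed_by_fill = np.zeros_like(buf)  # what Alloc{0} gives you instead
```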
...@@ -57,7 +57,7 @@ Numpy does *broadcasting* of arrays of different shapes during
arithmetic operations. What this means in general is that the smaller
array (or scalar) is *broadcasted* across the larger array so that they have
compatible shapes. The example below shows an instance of
*broadcasting*:
>>> a = numpy.asarray([1.0, 2.0, 3.0])
>>> b = 2.0
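Continuing the doctest above, a minimal sketch of what broadcasting produces (the multiplication line is an illustration, not part of the original excerpt):

```python
import numpy

a = numpy.asarray([1.0, 2.0, 3.0])
b = 2.0
# The scalar b is broadcast across every element of a.
c = a * b
print(c)  # → [2. 4. 6.]
```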
......
...@@ -96,8 +96,8 @@ optimization, nor most other optimizations) or ``DebugMode`` (it will test
before and after all optimizations (much slower)).
Specifying Exact Shape
======================
Currently, specifying a shape is not as easy and flexible as we wish and we plan some
upgrades. Here is the current state of what can be done:
......
...@@ -66,14 +66,14 @@ and rows. They have both the same attributes: ``data``, ``indices``, ``indptr``
sparse matrix.
* The ``shape`` attribute is exactly the same as the ``shape`` attribute of a dense (i.e. generic)
  matrix. It can be explicitly specified at the creation of a sparse matrix if it cannot be inferred
  from the first three attributes.
Which format should I use?
--------------------------
In the end, the format does not affect the length of the ``data`` and ``indices`` attributes. They are both
completely fixed by the number of elements you want to store. The only thing that changes with the format
is ``indptr``. In ``csc`` format, the matrix is compressed along columns so a lower number of columns will
result in less memory use. On the other hand, with the ``csr`` format, the matrix is compressed along
the rows and with a matrix that has a lower number of rows, ``csr`` format is a better choice. So here is the rule:
...@@ -83,7 +83,7 @@ the rows and with a matrix that have a lower number of rows, ``csr`` format is a
If shape[0] > shape[1], use ``csc`` format. Otherwise, use ``csr``.
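As an illustrative sketch in pure Python (not the Theano or scipy sparse API), the three attributes and the rule above can be made concrete; ``dense_to_csr`` and ``pick_format`` are hypothetical helper names:

```python
def dense_to_csr(rows):
    """Build the ``data``/``indices``/``indptr`` arrays of a CSR matrix."""
    data, indices, indptr = [], [], [0]
    for row in rows:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)   # non-zero values, stored row by row
                indices.append(col)  # column index of each stored value
        indptr.append(len(data))     # one entry per row, plus the leading 0
    return data, indices, indptr

def pick_format(shape):
    """Rule of thumb from the text: more rows than columns -> csc."""
    return 'csc' if shape[0] > shape[1] else 'csr'

data, indices, indptr = dense_to_csr([[1, 0, 2],
                                      [0, 0, 3]])
print(data, indices, indptr)  # [1, 2, 3] [0, 2, 2] [0, 2, 3]
print(pick_format((2, 3)))    # csr
```

Note how ``data`` and ``indices`` depend only on the number of stored elements, while ``indptr`` has one entry per compressed dimension, which is exactly why the row/column ratio drives the format choice.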
Sometimes, since the sparse module is young, ops do not exist for both formats. So here is
what may be the most relevant rule:
.. note::
......
...@@ -477,7 +477,7 @@ The following resources will assist you in this learning process:
* `practical issues <http://stackoverflow.com/questions/2392250/understanding-cuda-grid-dimensions-block-dimensions-and-threads-organization-s>`_
  (on the relationship between grids, blocks and threads; see also linked and related issues on same page)
* `CUDA optimization <http://www.gris.informatik.tu-darmstadt.de/cuda-workshop/slides.html>`_
* **PyCUDA: Introductory**
......
...@@ -36,7 +36,7 @@ The mapping from context names to devices is done through the
    dev0->cuda0;dev1->cuda1
Let's break it down. First there is a list of mappings. Each of
these mappings is separated by a semicolon ';'. There can be any
number of such mappings, but in the example above we have two of them:
`dev0->cuda0` and `dev1->cuda1`.
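A minimal sketch of how such a mapping string decomposes (illustrative only; ``parse_contexts`` is a hypothetical name, not Theano's actual parser):

```python
def parse_contexts(spec):
    """Split 'name->device' pairs separated by semicolons into a dict."""
    mappings = {}
    for item in spec.split(';'):
        name, device = item.split('->')
        mappings[name] = device
    return mappings

print(parse_contexts('dev0->cuda0;dev1->cuda1'))
# {'dev0': 'cuda0', 'dev1': 'cuda1'}
```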
......