Commit ea3d6101 authored by Rami Al-Rfou

Merge branch 'master' of https://github.com/Theano/Theano into grad_advinc_subtensor

.. _NEWS:
Updates in the Trunk since the last release:
=============
Release Notes
=============
Theano 0.6rc2 (November 21st, 2012)
===================================
Highlights:
* Fixed a few regressions introduced in 0.6rc1.
* A few new features.
* Speed ups.
* Scan fixes.
* Crash fixes.
* A few small interface changes.
Committers for this rc2 only:
Razvan Pascanu
Pascal Lamblin
Frederic Bastien
Ian Goodfellow
Jeremiah Lowin
Caglar Gulcehre
Jey Kottalam
Matthew Rocklin
abalkin
Regressions in 0.6rc1 fixed:
* Fix the scan gradient dtype issue. In 0.6rc1, some upcasts were inserted. (Razvan P.)
* grad() now behaves as before 0.6rc1 for floats, i.e. the gradient dtype inside the graph will be the same as the input dtype. If you ask for the gradient directly, it will return the computed dtype. (Pascal L.)
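The dtype behavior at stake can be illustrated with plain NumPy (an illustration only; the regression concerned dtypes inside Theano's scan gradient graphs, not NumPy arrays):

```python
import numpy

# Mixing float32 with float64 upcasts the result to float64.  Upcasts of
# this kind were inserted into scan gradients in 0.6rc1; the fix keeps the
# gradient dtype inside the graph equal to the input dtype.
upcast_dtype = (numpy.float32(1.0) * numpy.float64(1.0)).dtype
same_dtype = (numpy.float32(1.0) * numpy.float32(2.0)).dtype
```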
Wrong result fixes:
* Scan in some cases did not return the correct results. (Razvan P., reported by Jeremiah L.)
This happened when a state had only negative taps and the output of the state was a function of some sequence.
If you had multiple states, there was no problem.
* Fixed a bug in Scan with multiple outputs,
where one output would sometimes overwrite another one. (Razvan P.)
* Clip.grad treated the gradient with respect to the clipping boundary as always 0. (Ian G.)
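As a rough sketch of the recurrence pattern involved (pure Python invented for this example, not Theano's scan implementation), a single state with a -1 tap whose output depends on a sequence looks like this:

```python
# state[t] = fn(sequence[t], state[t-1]) -- one state, one negative (-1) tap.
# This only illustrates the semantics of the case that was fixed.
def scan_neg_tap(fn, sequence, init_state):
    state = init_state
    outputs = []
    for s in sequence:
        state = fn(s, state)
        outputs.append(state)
    return outputs

# Example: a cumulative sum expressed as such a recurrence.
result = scan_neg_tap(lambda s, prev: prev + s, [1, 2, 3, 4], 0)
```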
Interface changes:
* We no longer support unaligned ndarrays in Python code. (Frederic B.)
We did not support them in C code, and supporting them in Python code
made the detection harder.
* We now officially support only scipy >= 0.7.2 and numpy >= 1.5.0. (Frederic B.)
We weren't and aren't testing with older versions.
* theano.sparse.SparseType is available even when scipy is not. (Frederic B.)
* Fixed an issue where members of the consider_constant grad parameter
were treated differently from Constant variables. (Ian G.)
* Removed the g_cost parameter of theano.grad(). (Ian G.)
Use the new, more powerful known_grads parameter instead.
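The idea behind known_grads is to seed backpropagation at intermediate variables instead of at a cost. Numerically it is just the chain rule; a scalar sketch (plain Python invented for this example, not the Theano API):

```python
# Suppose y = x**2, and we are told dcost/dy = 3.0 at x = 2.0 without
# knowing the cost itself (this is what known_grads={y: g_y} expresses).
x = 2.0
g_y = 3.0          # known gradient on the intermediate variable y
dy_dx = 2.0 * x    # derivative of y = x**2 at x
g_x = g_y * dy_dx  # chain rule: dcost/dx = dcost/dy * dy/dx
```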
NumPy interface support:
* theano.tensor.where is an alias for theano.tensor.switch, to support NumPy semantics. (Ian G.)
* TensorVariable objects now have dot, argmin, argmax, clip, conj, repeat, trace, std, round,
ravel and argsort methods and the real and imag properties, like numpy.ndarray objects.
The functionality was already available in Theano. (abalkin)
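For reference, the NumPy semantics that the where/switch alias follows (shown here with NumPy itself, not Theano):

```python
import numpy

cond = numpy.array([True, False, True])
a = numpy.array([1, 2, 3])
b = numpy.array([10, 20, 30])

# Pick elements from a where cond is True, else from b;
# theano.tensor.where mirrors numpy.where in this way.
out = numpy.where(cond, a, b)
```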
Speed ups:
* A C version of the SoftMax op. (Razvan P.)
There was already C code for the softmax-with-bias case.
* Faster GpuIncSubtensor. (Ian G.)
* Faster copy on the GPU for 4d tensors. (Ian G.)
* The fix of flatten's infer_shape re-enables an optimization. (Pascal L.)
The bug was introduced in 0.6rc1.
* Enable inc_subtensor on the GPU when updating it with a float64 dtype. (Ian G.)
It was causing an optimization warning.
* Make DeepCopy reuse preallocated memory. (Frederic B.)
* Move the convolution to the GPU when the image shape and logical image shape differ. (Frederic Bastien)
* C code for the View op. (Razvan P., Pascal L.)
New features:
* Added a monitoring mode "MonitorMode" as a debugging tool. (Olivier D.)
* Allow integer axes when keepdims==True. (Jeremiah Lowin)
* Added the erfinv and erfcinv ops. (Jey Kottalam)
* Added tensor.batched_dot(). (Caglar Gulcehre)
It uses scan behind the scenes, but makes doing this easier.
* theano.get_constant_value(x). (Frederic B.)
This tries to obtain the value of x as a constant int.
It does some constant folding to try to convert x into an int.
Used by some optimizations.
* Added theano.tensor.io.{MPIRecv,MPIRecvWait,MPISend,MPISendWait}. (Matthew Rocklin)
Theano does not use them automatically. It is up to you to use them and split your computation.
* Added theano.sandbox.linalg.eig. (abalkin)
* Started some support for Python 3. (abalkin)
setup.py supports Python 3 now.
It calls 2to3 during the setup.
Python 3 is not fully supported, as we did not update the C code.
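A toy sketch of the kind of constant folding mentioned for get_constant_value (illustrative only; the real function walks Theano graphs, and this toy expression tree and helper are invented for the example):

```python
# A toy expression is either an int or an (op, left, right) tuple.
def fold_constant(expr):
    """Try to reduce expr to a constant int; raise TypeError if impossible."""
    if isinstance(expr, int):
        return expr
    op, left, right = expr
    lv = fold_constant(left)
    rv = fold_constant(right)
    if op == '+':
        return lv + rv
    if op == '*':
        return lv * rv
    raise TypeError('cannot fold %r' % (op,))

value = fold_constant(('+', 2, ('*', 3, 4)))
```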
Crash fixes:
* Fix a crash related to scan.grad due to the new mechanism. (Ian G.)
* Fix an optimization warning. Now it gets optimized. (Frederic B.)
* Fix a crash introduced in 0.6rc1 in theano.grad. (Ian G.)
* Fix a crash introduced in 0.6rc1 in the grad of scan. (Razvan P.)
* Fix a crash introduced in 0.6rc1 in the grad of clip. (Ian G.)
Also implement the gradient on the min/max bounds.
* Fix a crash in the grad of tensor.switch for int. (Ian G.)
* Fix a crash when mixing shared variables on the GPU and sparse dot. (Pascal L.)
* Fix a crash where sparse.dot would sometimes return a dtype number
that is equivalent but not the one expected. (Pascal L., reported by Rami Al-Rfou)
* Better error messages. (Ian G.)
* Move all sparse random functions back to the sandbox, as they don't have a state inside Theano. (Pascal L.)
They were moved out of the sandbox in 0.6rc1.
* LoadFromDisk now only supports some memmap modes. (Pascal L.)
Otherwise, it was causing errors, segmentation faults or wrong results.
* Fix an import problem on PiCloud. (Jeremiah Lowin)
You need to use the c|py linker with the default
environment. Otherwise, you need to create your own environment.
* Fix a crash during optimization when we take a subtensor of a constant with a non-constant index. (Ian G.)
* Better handling and error messages for gradients on integers. (Ian G.)
* Fixed a crash where Scan assumed all TypeErrors raised by the grad function were due to undefined gradients. (Ian G.)
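The memmap modes in question are NumPy's; a small illustration of a read-only map (the temporary path is created here just for the example, and which subset of modes LoadFromDisk accepts is a Theano implementation detail):

```python
import os
import tempfile

import numpy

# Save a small array to disk, then map it back read-only ('r' mode).
path = os.path.join(tempfile.mkdtemp(), 'data.npy')
numpy.save(path, numpy.arange(5, dtype='float64'))
mapped = numpy.load(path, mmap_mode='r')  # a read-only numpy.memmap
values = mapped.tolist()
```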
https://github.com/Theano/Theano/wiki/Devnews
Other:
* Doc typo fixes, Doc updates, Better error messages: Olivier D., David W.F., Frederic B., James B., Matthew Rocklin, Ian G., abalkin.
=============
Release Notes
......
......@@ -72,7 +183,7 @@ Deprecation:
This was a predecessor of SharedVariable with a less pythonic philosophy.
Interface changes:
* Now the base version requirements are numpy >= 1.5.0 and the optional scipy >= 0.7.2.
* In Theano 0.5, we removed the deprecated sharedvar.value property.
Now we raise an error if you access it. (Frederic B.)
* theano.function does not accept duplicate inputs, so function([x, x], ...)
......
......@@ -53,7 +53,7 @@ copyright = '2008--2012, LISA lab'
# The short X.Y version.
version = '0.6'
# The full version, including alpha/beta/rc tags.
release = '0.6rc2'
# There are two options for replacing |today|: either, you set today to some
# non-false value, then it is used:
......
......@@ -249,6 +249,8 @@ following methods:
1) They must be Variable instances.
2) When they are types that have dtypes, they must never have an integer dtype.
The output gradients passed *to* Op.grad will also obey these constraints.
Integers are a tricky subject. Integers are the main reason for having DisconnectedType,
NullType or zero gradient. When you have an integer as an argument to your grad method,
recall the definition of a derivative to help you decide what value to return:
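As a numerical illustration of that definition (NumPy code invented for this example), an integer-valued function such as floor has derivative 0 away from its jump points, which is one way to see why integer outputs lead to zero or disconnected gradients:

```python
import numpy

def numeric_derivative(f, x, eps=1e-4):
    # Centered finite difference approximating the limit definition.
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

# floor() is integer-valued: between jumps its derivative is exactly 0.
d = numeric_derivative(numpy.floor, 0.5)
```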
......
......@@ -57,8 +57,8 @@ Theano also provides :func:`theano.printing.pydotprint` that creates a png image
The parameter in T.dscalar('x') in the first line is the name of this variable
in the graph. This name is used when printing the graph to make it more readable.
If no name is provided the variable x is printed as its type as returned by
x.type(). In this example - <TensorType(float64, scalar)>.
The name parameter can be any string. There are no naming restrictions:
in particular, you can have many variables with the same name.
......@@ -86,7 +86,7 @@ The line ``|x [@C`` means the variable named ``x`` with debugprint identifier
your graph, their different debugprint identifier will be your clue.
The line ``|TensorConstant{2.0} [@B]`` means that there is a constant 2.0
with this debugprint identifier.
The line ``Elemwise{mul,no_inplace} [@A] ''`` is indented less than
the other ones, because it means there is a variable computed by multiplying
......@@ -121,7 +121,7 @@ Elemwise{mul} [@A] ''
|Elemwise{mul} [@B] ''
|Elemwise{pow} [@C] ''
If the depth parameter is provided, it limits the number of levels that are
shown.
......
......@@ -14,7 +14,7 @@ own Theano code, and even (it happens) in Theano's internals, in
Isolating the Problem/Testing Theano Compiler
---------------------------------------------
You can run your Theano function in a :ref:`DebugMode<using_debugmode>`.
This tests the Theano optimizations and helps to find where NaN, inf and other problems come from.
......@@ -56,12 +56,12 @@ following example.
# compile and call the actual function
f = theano.function([x], h2)
f(numpy.random.rand(5, 10))
Running the above code generates the following error message:
.. code-block:: bash
Definition in:
File "/u/desjagui/workspace/PYTHON/theano/gof/opt.py", line 1102, in apply
lopt_change = self.process_node(fgraph, node, lopt)
File "/u/desjagui/workspace/PYTHON/theano/gof/opt.py", line 882, in process_node
......@@ -83,8 +83,8 @@ Running the above code generates the following error message:
thunk()
File "/u/desjagui/workspace/PYTHON/Theano/theano/gof/cc.py", line 1111, in execute
raise exc_type, exc_value, exc_trace
ValueError: ('Shape mismatch: x has 10 cols but y has 20 rows',
_dot22(x, <TensorType(float64, matrix)>), [_dot22.0],
_dot22(x, InplaceDimShuffle{1,0}.0), 'Sequence id of Apply node=4')
Needless to say, the above is not very informative and does not provide much in
......@@ -114,7 +114,7 @@ following error message, which properly identifies *line 23* as the culprit.
Traceback (most recent call last):
File "test2.py", line 23, in <module>
h1 = T.dot(x,func_of_W1)
File "/u/desjagui/workspace/PYTHON/Theano/theano/gof/op.py", line 360, in __call__
node.op.perform(node, input_vals, output_storage)
File "/u/desjagui/workspace/PYTHON/Theano/theano/tensor/basic.py", line 4458, in perform
......@@ -167,8 +167,8 @@ Theano provides a 'Print' op to do this.
Since Theano runs your program in a topological order, you won't have precise
control over the order in which multiple ``Print()`` ops are evaluted. For a more
precise inspection of what's being computed where, when, and how, see the discussion
:ref:`faq_monitormode`.
.. warning::
......@@ -196,7 +196,7 @@ You can read about them in :ref:`libdoc_printing`.
"The Function I Compiled is Too Slow, what's up?"
-------------------------------------------------
First, make sure you're running in ``FAST_RUN`` mode. Even though
``FAST_RUN`` is the default mode, insist by passing ``mode='FAST_RUN'``
to ``theano.function`` (or ``theano.make``) or by setting :attr:`config.mode`
to ``FAST_RUN``.
......@@ -206,7 +206,7 @@ Second, try the Theano :ref:`using_profilemode`. This will tell you which
Tips:
* Use the flags ``floatX=float32`` to require type *float32* instead of *float64*;
Use the Theano constructors matrix(),vector(),... instead of dmatrix(), dvector(),...
since they respectively involve the default types *float32* and *float64*.
* Check in the ``profile`` mode that there is no ``Dot`` op in the post-compilation
......@@ -216,48 +216,79 @@ Tips:
of type *float64*.
.. _faq_monitormode:
"How do I Step through a Compiled Function?"
--------------------------------------------
This is not exactly a FAQ, but the doc is here for now...
You can use ``MonitorMode`` to inspect the inputs and outputs of each
node being executed when the function is called. The code snippet below
shows how to print all inputs and outputs:

.. code-block:: python

    import theano

    def inspect_inputs(i, node, fn):
        print i, node, [input[0] for input in fn.inputs],

    def inspect_outputs(i, node, fn):
        print [output[0] for output in fn.outputs]

    x = theano.tensor.dscalar('x')
    f = theano.function([x], [5 * x],
                        mode=theano.compile.MonitorMode(
                            pre_func=inspect_inputs,
                            post_func=inspect_outputs))
    f(3)
    # The code will print the following:
    # 0 Elemwise{mul,no_inplace}(TensorConstant{5.0}, x) [array(5.0), array(3.0)] [array(15.0)]

When using these ``inspect_inputs`` and ``inspect_outputs`` functions
with ``MonitorMode``, you should see [potentially a lot of] printed output.
Every ``Apply`` node will be printed out, along with its position in the
graph, the arguments to the functions ``perform`` or ``c_code`` and the
output it computed.
Admittedly, this may be a huge amount of output to read through if you
are using big tensors... but you can choose to add logic that would, for
instance, print something out only if a certain kind of op were used, at
a certain program position, or only if a particular value showed up in
one of the inputs or outputs.
Use your imagination :)
A typical example is to detect when NaN values are added into computations, which
can be achieved as follows:
.. code-block:: python
    import numpy
    import theano

    def detect_nan(i, node, fn):
        for output in fn.outputs:
            if numpy.isnan(output[0]).any():
                print '*** NaN detected ***'
                theano.printing.debugprint(node)
                print 'Inputs : %s' % [input[0] for input in fn.inputs]
                print 'Outputs: %s' % [output[0] for output in fn.outputs]
                break

    x = theano.tensor.dscalar('x')
    f = theano.function([x], [theano.tensor.log(x) * x],
                        mode=theano.compile.MonitorMode(
                            post_func=detect_nan))
    f(0)  # log(0) * 0 = -inf * 0 = NaN

    # The code above will print:
    # *** NaN detected ***
    # Elemwise{Composite{[mul(log(i0), i0)]}} [@A] ''
    # |x [@B]
    # Inputs : [array(0.0)]
    # Outputs: [array(nan)]
.. TODO: documentation for link.WrapLinkerMany
How to Use pdb
--------------
......
......@@ -153,6 +153,13 @@ short name Full constructor
``ProfileMode`` ``compile.profilemode.ProfileMode()`` C implementations where available, all available graph transformations, print profile information.
================= =============================================================== ===============================================================================
.. Note::
For debugging purposes, there also exists a ``MonitorMode`` (which has no
short name). It can be used to step through the execution of a function:
see :ref:`the debugging FAQ<faq_monitormode>` for details.
Linkers
=======
......
......@@ -14,13 +14,13 @@ try:
except ImportError:
from distutils.core import setup
try:
    from distutils.command.build_py import build_py_2to3 \
        as build_py
    from distutils.command.build_scripts import build_scripts_2to3 \
        as build_scripts
except ImportError:
    from distutils.command.build_py import build_py
    from distutils.command.build_scripts import build_scripts
CLASSIFIERS = """\
......@@ -55,7 +55,7 @@ PLATFORMS = ["Windows", "Linux", "Solaris", "Mac OS-X", "Unix"]
MAJOR = 0
MINOR = 6
MICRO = 0
SUFFIX = "rc2"  # Should be blank except for rc's, betas, etc.
ISRELEASED = False
VERSION = '%d.%d.%d%s' % (MAJOR, MINOR, MICRO, SUFFIX)
......
......@@ -21,6 +21,8 @@ from module import *
import debugmode # register DEBUG_MODE
from debugmode import DebugMode
from monitormode import MonitorMode
from profilemode import ProfileMode
from theano.compile.sharedvalue import shared, shared_constructor, SharedVariable
......
......@@ -55,9 +55,12 @@ class OpFromGraph(gof.Op):
if grad_depth > 0:
output_grads = [t() for t in self.output_types]
# OpFromGraph doesn't implement a connection_pattern, so for now we regard
# all inputs and outputs as connected. This will compute the right numerical
# value for the gradients but could fail to raise the disconnected inputs error
# in some cases.
gs = G.grad(cost=None, known_grads=dict(zip(self.outputs, output_grads)),
wrt=self.inputs, disconnected_inputs='ignore')
self.grad_ops = []
for g in gs:
if g is None:
......
# Note: this code was initially copied from the 'pyutools' package by its
# original author, and re-licensed under Theano's license.
import theano
from theano.compile.mode import Mode
class MonitorMode(Mode):
    """
    `MonitorMode` is a debug mode to easily step through function execution.

    Its default behavior is to behave like the 'FAST_RUN' mode. By providing
    either a `pre_func` (called before a node is executed) or a `post_func`
    (called after a node is executed) monitoring function, the user can inspect
    node behavior.

    A typical use case is to detect the introduction of NaN values in a graph.
    For an example of such a use case, see doc/tutorial/debug_faq.txt.
    """

    def __init__(self, pre_func=None, post_func=None, optimizer='fast_run'):
        """
        Constructor.

        :param pre_func: A function to call before executing a thunk, with
            arguments:
            - the thunk index
            - the Apply node
            - the thunk to be called
        :param post_func: A function to call after executing a thunk, with the
            same three arguments as `pre_func`.
        :param optimizer: The optimizer to use. One may use for instance
            'fast_compile' to skip optimizations.
        """
        self.pre_func = pre_func
        self.post_func = post_func
        wrap_linker = theano.gof.WrapLinkerMany([theano.gof.OpWiseCLinker()],
                                                [self.eval])
        super(MonitorMode, self).__init__(wrap_linker, optimizer=optimizer)

    def eval(self, i, node, fn):
        """
        The method that calls the thunk `fn`.
        """
        if self.pre_func is not None:
            self.pre_func(i, node, fn)
        fn()
        if self.post_func is not None:
            self.post_func(i, node, fn)
import numpy
import theano
def test_detect_nan():
    """
    Test the code snippet example that detects NaN values.
    """
    nan_detected = [False]

    def detect_nan(i, node, fn):
        for output in fn.outputs:
            if numpy.isnan(output[0]).any():
                print '*** NaN detected ***'
                theano.printing.debugprint(node)
                print 'Inputs : %s' % [input[0] for input in fn.inputs]
                print 'Outputs: %s' % [output[0] for output in fn.outputs]
                nan_detected[0] = True
                break

    x = theano.tensor.dscalar('x')
    f = theano.function([x], [theano.tensor.log(x) * x],
                        mode=theano.compile.MonitorMode(
                            post_func=detect_nan))
    f(0)  # log(0) * 0 = -inf * 0 = NaN
    assert nan_detected[0]
......@@ -13,9 +13,11 @@ import warnings
_logger = logging.getLogger('theano.gradient')
import numpy # for numeric_grad
np = numpy
import theano
from itertools import izip
from theano import gof
from theano.gof import Variable
from theano.gof.python25 import all
......@@ -317,9 +319,6 @@ def Lop(f, wrt, eval_points, consider_constant=None,
coordinates of the tensor element in the last
If `f` is a list/tuple, then return a list/tuple with the results.
"""
if type(eval_points) not in (list, tuple):
eval_points = [eval_points]
......@@ -333,50 +332,15 @@ def Lop(f, wrt, eval_points, consider_constant=None,
f = list(f)
grads = list(eval_points)
if not isinstance(wrt, (list, tuple)):
wrt = [wrt]
assert len(f) == len(grads)
known = dict(izip(f, grads))
ret = grad(cost=None, known_grads=known,
consider_constant=consider_constant, wrt=wrt,
disconnected_inputs=disconnected_inputs)
return format_as(using_list, using_tuple, ret)
......@@ -385,14 +349,13 @@ def Lop(f, wrt, eval_points, consider_constant=None,
# Gradient
#########################
def grad(cost, wrt, consider_constant=None,
         disconnected_inputs='raise', add_names=True,
         known_grads=None, return_disconnected='zero'):
"""
:type cost: Scalar (0-dimensional) Variable.
May optionally be None if known_grads is provided.
:type wrt: Variable or list of Variables.
:param consider_constant: a list of expressions not to backpropagate
through
......@@ -402,13 +365,27 @@ def grad(cost, wrt, g_cost=None, consider_constant=None,
(or if all links are non-differentiable). The possible values are:
- 'ignore': considers that the gradient on these parameters is zero.
- 'warn': consider the gradient zero, and print a warning.
- 'raise': raise DisconnectedInputError.
:type add_names: bool
:param add_names: If True, variables generated by grad will be named
(d<cost.name>/d<wrt.name>) provided that both cost and wrt have
names
:type known_grads: dict
:param known_grads: If not None, a dictionary mapping variables to their
gradients. This is useful in the case where you know the
gradient on some variables but do not know the original
cost.
:type return_disconnected: string
:param return_disconnected:
'zero' : If wrt[i] is disconnected, return value i will be
wrt[i].zeros_like()
'None' : If wrt[i] is disconnected, return value i will be
None
'Disconnected' : returns variables of type DisconnectedType
:rtype: Variable or list/tuple of Variables (depending upon `wrt`)
:return: symbolic expression of gradient of `cost` with respect to `wrt`.
......@@ -422,29 +399,17 @@ def grad(cost, wrt, g_cost=None, consider_constant=None,
if tensor is None:
from theano import tensor
if cost is None:
assert known_grads is not None
if cost is not None and isinstance(cost.type, NullType):
raise ValueError("Can't differentiate a NaN cost."
"cost is NaN because " + \
cost.type.why_null)
if cost is not None and cost.ndim != 0:
raise TypeError("cost must be a scalar.")
if consider_constant is None:
consider_constant = []
else:
# error checking on consider_constant: verify that it is a collection
# of theano variables
# this is important, if someone accidentally passes a nested data
# structure with theano variables at the leaves, only the root will
# be properly considered constant
if not hasattr(consider_constant, '__iter__'):
raise TypeError('consider_constant must be an iterable collection,'
' got ' + str(type(consider_constant)))
for elem in consider_constant:
if not isinstance(elem, gof.Variable):
raise TypeError('Elements of consider_constant must be '
'variables, but got ' + str(type(elem)))
if isinstance(wrt, set):
raise TypeError("wrt must not be a set. sets have no defined "
......@@ -461,83 +426,99 @@ def grad(cost, wrt, g_cost=None, consider_constant=None,
raise TypeError("Expected Variable, got " + str(elem) +
" of type "+str(type(elem)))
outputs = []
if cost is not None:
outputs.append(cost)
if known_grads is not None:
outputs.extend(known_grads.keys())
var_to_node_to_idx = _populate_var_to_node_to_idx(
outputs, wrt, consider_constant)
# build a dict mapping var to the gradient of cost with respect to var
grad_dict = {}
if known_grads is None:
known_grads = {}
# The gradient of the cost is 1 unless specified otherwise by known_grads.
if cost is not None:
if cost in known_grads:
g_cost = known_grads[cost]
else:
g_cost = _float_ones_like(cost)
# g_cost may be Disconnected or NullType. A creative use of the function,
# sure, but nonetheless one we can and should support. So before we try
# to cast it make sure it even has a dtype
if hasattr(g_cost.type, 'dtype') and cost.type.dtype not in tensor.discrete_dtypes:
# Here we enforce the constraint that floating point variables have
# the same dtype as their gradient.
g_cost = g_cost.astype(cost.type.dtype)
# DO NOT enforce g_cost to be 0 if cost is an integer.
# This is to be enforced by the Op.grad method for the Op that outputs cost.
assert getattr(g_cost.type, 'dtype', None) not in tensor.discrete_dtypes
grad_dict[cost] = g_cost
for var in known_grads:
g_var = known_grads[var]
if not hasattr(g_var, 'type'):
raise TypeError('output grads must be theano variables. '
'Ambiguous whether %s should be made into tensor'
' or sparse theano variable' % str(type(g_var)))
grad_dict[cost] = g_cost
if not isinstance(g_var.type, (NullType, DisconnectedType)) and 'float' \
not in str(g_var.type.dtype):
raise TypeError("Gradients must always be NullType, "
"DisconnectedType, or continuous, but grad was "
"given a known_grad of type "+str(g_var.type))
# variables in consider_constant are marked disconnected: no gradient
# flows through them
for const in consider_constant:
grad_dict[const] = DisconnectedType()()
# DO NOT check that these gradients are equal to 0 if var is int
# The gradient is allowed to be non-zero on var in that case
# Ops outputting var should not backpropagate its gradient further
# but that is enforced elsewhere (grep for only_connected_to_int)
# variables that do not influence the cost have zero gradient.
# if wrt is such a variable, populate the grad_dict with this info
# so that wrt not being in var_to_node_to_idx won't cause an error below
# according to the flag, possibly raise an error if wrt is disconnected
for elem in wrt:
if elem not in var_to_node_to_idx and elem is not cost:
grad_dict[var] = g_var
def handle_disconnected(var):
message = ("grad method was asked to compute the gradient "
"with respect to a variable that is not part of "
"the computational graph of the cost, or is used "
"only by a non-differentiable operator: %s" % elem)
"only by a non-differentiable operator: %s" % var)
if disconnected_inputs == 'ignore':
pass
elif disconnected_inputs == 'warn':
warnings.warn(message, stacklevel=2)
elif disconnected_inputs == 'raise':
raise ValueError(message)
raise DisconnectedInputError(message)
else:
raise ValueError("Invalid value for keyword "
"'disconnected_inputs', valid values are "
"'ignore', 'warn' and 'raise'.")
# variables that do not influence the cost have zero gradient.
# if wrt is such a variable, populate the grad_dict with this info
# so that wrt not being in var_to_node_to_idx won't cause an error below
# according to the flag, possibly raise an error if wrt is disconnected
for elem in wrt:
if elem not in var_to_node_to_idx and elem is not cost \
and elem not in grad_dict:
handle_disconnected(elem)
grad_dict[elem] = DisconnectedType()()
cost_name = None
if add_names:
if add_names and cost is not None:
cost_name = cost.name
# Make sure we didn't initialize the grad_dict with any ints
# for non-int outputs
# The gradient may NEVER be an int, even if the variable is an int.
# Read the Op contract and talk to Ian Goodfellow before changing this!
for var in grad_dict:
g = grad_dict[var]
if (hasattr(g.type, 'dtype') and
getattr(var.type, 'dtype', '') in tensor.float_dtypes):
if hasattr(g.type, 'dtype'):
assert g.type.dtype in tensor.float_dtypes
rval = _populate_grad_dict(var_to_node_to_idx,
@@ -545,7 +526,13 @@ def grad(cost, wrt, g_cost=None, consider_constant=None,
for i in xrange(len(rval)):
if isinstance(rval[i].type, DisconnectedType):
rval[i] = _float_zeros_like(wrt[i])
handle_disconnected(rval[i])
if return_disconnected == 'zero':
rval[i] = _float_zeros_like(wrt[i])
elif return_disconnected == 'None':
rval[i] = None
else:
assert return_disconnected == 'Disconnected'
if using_tuple:
rval = tuple(rval)
@@ -592,15 +579,18 @@ def _node_to_pattern(node):
return connection_pattern
def _populate_var_to_node_to_idx(outputs, wrt):
def _populate_var_to_node_to_idx(outputs, wrt, consider_constant):
"""
Common code shared between grad and grad_sources_inputs
Helper function for grad function.
outputs: a list of variables we want to take gradients of
wrt: a list of variables we want to take the gradient with
respect to.
consider_constant: a list of variables not to backpropagate
through.
returns:
var_to_app_to_idx:
@@ -622,8 +612,30 @@ def _populate_var_to_node_to_idx(outputs, wrt):
This set is exactly the set of variables that connect
the variables in wrt to the cost being differentiated.
(A variable in consider_constant is not a function of
anything)
"""
# Validate and format consider_constant
if consider_constant is None:
consider_constant = []
else:
# error checking on consider_constant: verify that it is a collection
# of theano variables
# this is important, if someone accidentally passes a nested data
# structure with theano variables at the leaves, only the root will
# be properly considered constant
try:
iter(consider_constant)
except TypeError:
raise TypeError('consider_constant must be an iterable collection,'
' got ' + str(type(consider_constant)))
for elem in consider_constant:
if not isinstance(elem, gof.Variable):
raise TypeError('Elements of consider_constant must be '
'variables, but got ' + str(type(elem)))
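The validation above, as a self-contained sketch (hypothetical names; `is_variable` stands in for the `isinstance(elem, gof.Variable)` check): accept `None` or any iterable of variables, reject everything else loudly.

```python
def check_consider_constant(consider_constant, is_variable=lambda v: True):
    # None means "no constants"; normalize to an empty list.
    if consider_constant is None:
        return []
    # Reject non-iterables early, so a single variable passed bare
    # (instead of wrapped in a list) fails with a clear message.
    try:
        iter(consider_constant)
    except TypeError:
        raise TypeError('consider_constant must be an iterable collection,'
                        ' got ' + str(type(consider_constant)))
    for elem in consider_constant:
        if not is_variable(elem):
            raise TypeError('Elements of consider_constant must be '
                            'variables, but got ' + str(type(elem)))
    return list(consider_constant)
```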
# var_to_app_to_idx[var][node] = [i,j] means node has
# var as input at positions i and j
var_to_app_to_idx = {}
@@ -638,9 +650,17 @@ def _populate_var_to_node_to_idx(outputs, wrt):
accounted_for = set([])
def account_for(var):
# Don't visit the same variable twice
if var in accounted_for:
return
accounted_for.add(var)
# Constants are not a function of anything
if var in consider_constant:
return
# Recursively add the variables that this variable is
# a function of.
if var.owner is not None:
app = var.owner
@@ -699,11 +719,22 @@ def _populate_var_to_node_to_idx(outputs, wrt):
return var_to_app_to_idx
class NullTypeGradError(TypeError):
"""
Raised when grad encounters a NullType.
"""
class DisconnectedInputError(ValueError):
"""
Raised when grad is asked to compute the gradient
with respect to a disconnected input and
disconnected_inputs='raise'.
"""
def _populate_grad_dict(var_to_node_to_idx,
grad_dict, wrt, cost_name=None):
"""
Common code shared between grad_sources_inputs and grad
Helper function for grad function.
var_to_node_to_idx: a dictionary mapping a variable to
a second dictionary.
@@ -711,14 +742,12 @@ def _populate_grad_dict(var_to_node_to_idx,
this variable to the variable's index in the apply
node's input list
grad_dict: a dictionary mapping variables to their gradients
should be populated by grad or grad_sources_inputs
grad should set gradients to DisconnectedType()() for
variables to be considered constant, set the
gradient for the cost variable to g_cost, etc.
both should set the gradient for disconnected
grad_dict: A dictionary mapping variables to their gradients.
Should be populated by the grad function, which should:
- Set the gradient with respect to the cost to 1
- Load all gradients from known_grads, possibly overriding
the cost
- Set the gradient for disconnected
inputs to a variable with type DisconnectedType()
wrt: the minimal set of variables that must be included in grad_dict
@@ -757,8 +786,42 @@ def _populate_grad_dict(var_to_node_to_idx,
input_to_outputs in connection_pattern
]
if True in inputs_connected:
# At least one input of this op is connected to the cost so we must
# List of bools indicating if each output has an integer dtype
output_is_int = [hasattr(output.type, 'dtype') and
output.type.dtype in theano.tensor.discrete_dtypes
for output in node.outputs]
# List of bools indicating if each output gradient is NullType
ograd_is_nan = [isinstance(output.type, NullType)
for output in output_grads]
# List of bools indicating if each input is connected to the cost only
# through outputs whose gradient is NullType
only_connected_to_nan = [(True not in
[in_to_out and out_to_cost and not out_nan
for in_to_out, out_to_cost, out_nan in
zip(in_to_outs, outputs_connected, ograd_is_nan)])
for in_to_outs in connection_pattern]
if True not in inputs_connected:
# All outputs of this op are disconnected so we can skip
# Calling the op's grad method and report that the inputs
# are disconnected
# (The op's grad method could do this too, but this saves the
# implementer the trouble of worrying about this case)
input_grads = [DisconnectedType()() for ipt in inputs]
elif False not in only_connected_to_nan:
# All inputs are only connected to nan gradients, so we don't
# need to bother calling the grad method. We know the gradient
# with respect to all connected inputs is nan.
input_grads = []
for connected in inputs_connected:
if connected:
input_grads.append(NullType()())
else:
input_grads.append(DisconnectedType()())
else:
# At least one input of this op is connected to the cost, and
# not all output gradients are undefined, so we must
# call the op's grad method
# Each Op's grad function requires inputs and output_grads
@@ -779,38 +842,46 @@ def _populate_grad_dict(var_to_node_to_idx,
inputs = [try_to_copy_if_needed(ipt) for ipt in inputs]
# Build a list of output gradients with the same dtype as
# the corresponding output variable.
# If an output is of a float dtype, we want to cast the
# output gradient into the same dtype, to avoid having a
# gradient graph with double precision (taking more memory,
# and more computation).
# If an output is of an integer dtype, then we ensure the
# output gradient is zero, and that zero can be represented
# in the same int dtype.
# If an output gradient is a NullType or DisconnectedType,
# then it will not have a dtype, and it will not be changed.
# If an output is of an integer dtype, then we just leave it
# alone.
# DO NOT force integer variables to have zero grad. This causes
# bugs where we fail to detect disconnected or undefined gradients.
# DO NOT force integer variables to have integer dtype. This is
# a violation of the op contract.
new_output_grads = []
for o, og in zip(node.outputs, output_grads):
o_dt = getattr(o.type, 'dtype', None)
og_dt = getattr(og.type, 'dtype', None)
if og_dt and o_dt in theano.tensor.discrete_dtypes:
new_output_grads.append(o.zeros_like())
elif o_dt and og_dt and o_dt != og_dt:
if o_dt not in theano.tensor.discrete_dtypes and og_dt and o_dt != og_dt:
new_output_grads.append(og.astype(o_dt))
else:
new_output_grads.append(og)
# Make sure that, if new_output_grads[i] has a dtype:
# - it is the same dtype as outputs[i]
# - if the dtype is an int, then new_output_grads[i] is 0.
# Make sure that, if new_output_grads[i] has a floating point dtype,
# it is the same dtype as outputs[i]
for o, ng in zip(node.outputs, new_output_grads):
o_dt = getattr(o.type, 'dtype', None)
ng_dt = getattr(ng.type, 'dtype', None)
if ng_dt:
if ng_dt is not None and o_dt not in theano.tensor.discrete_dtypes:
assert ng_dt == o_dt
if ng_dt in theano.tensor.discrete_dtypes:
assert theano.get_constant_value(ng) == 0
# Someone who had obviously not read the Op contract tried
# to modify this part of the function.
# If you ever think it is a good idea to make an integer
# valued gradient, please
# 1) Read the Op contract again
# 2) Talk to Ian Goodfellow
# (Both of these sources will tell you not to do it)
for ng in new_output_grads:
assert getattr(ng.type, 'dtype', None) not in theano.tensor.discrete_dtypes
input_grads = node.op.grad(inputs, new_output_grads)
@@ -821,13 +892,6 @@ def _populate_grad_dict(var_to_node_to_idx,
if len(input_grads) != len(inputs):
raise ValueError(("%s returned the wrong number of" +\
" gradient terms.") % str(node.op))
else:
# All outputs of this op are disconnected so we can skip
# Calling the op's grad method and report that the inputs
# are disconnected
# (The op's grad method could do this too, but this saves the
# implementer the trouble of worrying about this case)
input_grads = [DisconnectedType()() for ipt in inputs]
# must convert to list in case the op returns a tuple
# we won't be able to post-process out the Nones if it does that
@@ -835,18 +899,15 @@ def _populate_grad_dict(var_to_node_to_idx,
# Do type checking on the result
#List of bools indicating if each output is an integer dtype
output_is_int = [hasattr(output.type, 'dtype') and
output.type.dtype in theano.tensor.discrete_dtypes
for output in node.outputs]
#List of bools indicating if each input only has integer outputs
# List of bools indicating if each input only has integer outputs
only_connected_to_int = [(True not in
[in_to_out and out_to_cost and not out_int
for in_to_out, out_to_cost, out_int in
zip(in_to_outs, outputs_connected, output_is_int)])
for in_to_outs in connection_pattern]
for i, term in enumerate(input_grads):
# Disallow Nones
@@ -863,6 +924,7 @@ def _populate_grad_dict(var_to_node_to_idx,
'the grad_undefined or grad_unimplemented helper '
'functions.') % node.op)
if not isinstance(term.type,
(NullType, DisconnectedType)):
if term.type.dtype not in theano.tensor.float_dtypes:
@@ -870,19 +932,18 @@ def _populate_grad_dict(var_to_node_to_idx,
' returned an integer-valued variable.'
' (Input index %d, dtype %s)' % (i,
term.type.dtype))
if only_connected_to_nan[i]:
assert isinstance(term.type, NullType)
if only_connected_to_int[i]:
# This term has only integer outputs and we know
# it's not undefined or disconnected
# The only other valid thing it can be is 0
no_constant_value = True
try:
constant_value = theano.get_constant_value(term)
no_constant_value = False
except TypeError:
pass
if no_constant_value:
is_zero = _is_zero(term)
assert is_zero in ['yes', 'no', 'maybe']
if is_zero == 'maybe':
msg = "%s.grad returned %s of type %s for input"
msg += " %d. This input's only connections to "
msg += "the cost through this op are via "
@@ -896,8 +957,7 @@ def _populate_grad_dict(var_to_node_to_idx,
msg = msg % (str(node.op), str(term),
str(type(term)), i)
raise ValueError(msg)
if constant_value != 0:
if is_zero == 'no':
msg = "%s.grad returned %s of type %s for input"
msg += " %d. Since this input is only connected "
msg += "to integer-valued outputs, it should "
@@ -905,7 +965,7 @@ def _populate_grad_dict(var_to_node_to_idx,
msg += "%s."
msg = msg % (str(node.op), str(term), str(type(term)),
i, str(constant_value))
i, str(theano.get_constant_value(term)))
raise ValueError(msg)
@@ -961,7 +1021,7 @@ def _populate_grad_dict(var_to_node_to_idx,
type(term)))
if isinstance(term.type, NullType):
raise TypeError("tensor.grad "
raise NullTypeGradError("tensor.grad "
"encountered a NaN. " +\
term.type.why_null)
@@ -997,113 +1057,6 @@ def _populate_grad_dict(var_to_node_to_idx,
return rval
def grad_sources_inputs(sources, graph_inputs):
"""
Used to compute the gradient of a cost with respect to all the
variables between graph_input and cost, but in the special
case where you don't know the cost, you only know its gradient
on a set of intermediate values.
A gradient source is a pair (``v``, ``g_v``), in which ``v`` is
a `Variable`, and ``g_v`` is a `Variable` that is a gradient wrt
``v``. More specifically, ``g_v`` is the gradient of an external
scalar cost, ``cost`` (that is not explicitly used), wrt ``v``.
This function traverses the graph backward from the ``r`` sources,
calling ``op.grad(...)`` for all ops with some non-None gradient
on an output, to compute gradients of ``cost`` wrt intermediate
variables and ``graph_inputs``.
The ``op.grad(...)`` functions are called like this:
.. code-block:: python
op.grad(op.inputs[:], [total_gradient(v) for v in op.outputs])
This call to ``op.grad`` should return a list or tuple: one symbolic
gradient per input. These gradients represent the gradients of
the same implicit ``cost`` mentioned above, wrt ``op.inputs``. Note
that this is **not** the same as the gradient of ``op.outputs`` wrt
``op.inputs``.
If ``op`` has a single input, then ``op.grad`` should return a list
or tuple of length 1.
For each input wrt to which ``op`` is not differentiable, it should
return ``None`` instead of a `Variable` instance.
If a source ``r`` receives a gradient from another source ``r2``,
then the effective gradient on ``r`` is the sum of both gradients.
:type sources: list of pairs of Variable: (v, gradient-on-v) to
initialize the total_gradient dictionary
:param sources: gradients to back-propagate using chain rule
:type graph_inputs: list of Variable
:param graph_inputs: variables considered to be constant
(do not backpropagate through them)
:rtype: dictionary whose keys and values are of type Variable
:return: mapping from each Variable encountered in the backward
traversal to the gradient with respect to that Variable.
It is assumed that there is some objective J shared between all members of
sources, so that for each v, gradient-on-v is the gradient of J with
respect to v
"""
outputs, output_grads = zip(*sources)
for output_grad in output_grads:
if not hasattr(output_grad, 'type'):
raise TypeError('output grads must be theano variables.'
'Ambiguous whether %s should be made into tensor'
' or sparse theano variable' % str(type(output_grad)))
if graph_inputs is None:
graph_inputs = gof.graph.inputs(outputs)
wrt = graph_inputs
var_to_node_to_idx = _populate_var_to_node_to_idx(outputs, wrt)
# build a dict mapping var to the gradient of cost with respect to var
grad_dict = {}
for output, output_grad in sources:
# The gradient of the cost should always be 0 if the cost is of
# discrete (integer) dtype.
if getattr(output.type, 'dtype', '') not in theano.tensor.float_dtypes:
output_grad = output.zeros_like()
else:
# Cast the provided gradient so that it has the same dtype
# as the cost.
output_grad = output_grad.astype(output.type.dtype)
grad_dict[output] = output_grad
# variables that do not influence the cost have zero gradient.
# if wrt is such a variable, populate the grad_dict with this info
# so that wrt not being in var_to_node_to_idx won't cause an error below
# according to the flag, possibly raise an error if wrt is disconnected
for elem in wrt:
if elem not in var_to_node_to_idx and elem not in outputs:
grad_dict[elem] = DisconnectedType()()
_populate_grad_dict(var_to_node_to_idx,
grad_dict, wrt)
# post-process out the DisconnectedTypes
for key in grad_dict:
if isinstance(grad_dict[key].type, DisconnectedType):
if hasattr(key, 'zeros_like'):
grad_dict[key] = _float_zeros_like(key)
return grad_dict
def _float_zeros_like(x):
""" Like zeros_like, but forces the object to have a
a floating point dtype """
@@ -1634,3 +1587,32 @@ def hessian(cost, wrt, consider_constant=None,
"script that generated the error)")
hessians.append(hess)
return format_as(using_list, using_tuple, hessians)
def _is_zero(x):
"""
Returns 'yes', 'no', or 'maybe' indicating whether x
is always 0.
'maybe' means that x is an expression that is complicated enough
that we can't tell that it simplifies to 0.
"""
if not hasattr(x, 'type'):
return 'yes' if np.all(x == 0.) else 'no'
if isinstance(x.type, NullType):
return 'no'
if isinstance(x.type, DisconnectedType):
return 'yes'
no_constant_value = True
try:
constant_value = theano.get_constant_value(x)
no_constant_value = False
except TypeError:
pass
if no_constant_value:
return 'maybe'
if constant_value != 0.:
return 'no'
return 'yes'
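A plain-Python illustration of the three-valued contract above (`is_zero_sketch` is a hypothetical stand-in, not the Theano implementation; `None` plays the role of an expression whose constant value `theano.get_constant_value` cannot recover):

```python
def is_zero_sketch(x):
    # None stands in for "not a compile-time constant": answer 'maybe'.
    if x is None:
        return 'maybe'
    # Raw numbers and small sequences can be checked directly.
    values = x if isinstance(x, (list, tuple)) else [x]
    return 'yes' if all(v == 0 for v in values) else 'no'
```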
@@ -40,7 +40,7 @@ def debugprint(obj, depth=-1, print_type=False,
:type depth: integer
:param depth: print graph to this depth (-1 for unlimited)
:type print_type: boolean
:param print_type: wether to print the type of printed objects
:param print_type: whether to print the type of printed objects
:type file: None, 'str', or file-like object
:param file: print to this file ('str' means to return a string)
:type ids: str
@@ -531,11 +531,11 @@ def pydotprint(fct, outfile=None,
label each edge between an input and the Apply node with the
input's index.
green boxes are inputs variables to the graph
blue boxes are outputs variables of the graph
grey boxes are variables that are not outputs and are not used
Green boxes are inputs variables to the graph,
blue boxes are outputs variables of the graph,
grey boxes are variables that are not outputs and are not used,
red ellipses are transfers from/to the gpu (ops with names GpuFromHost,
HostFromGpu)
HostFromGpu).
"""
if colorCodes is None:
@@ -221,7 +221,8 @@ class Scan(PureOp):
'following error has been encountered: The '
'%s %s (argument number %d) has dtype '
'%s and %d dimension(s). The corresponding slice %s '
'however has dtype %s and %d dimension(s). This '
'however has dtype %s and %d dimension(s) (it should '
'have the same dtype and one fewer dimensions). This '
'should never happen, please '
'report to theano-dev mailing list'
)
@@ -1261,11 +1262,9 @@ class Scan(PureOp):
if x in diff_inputs]
for x in consider_inps:
try:
_gmp = gradient.grad_sources_inputs(
[(y, g_y)],
[x])
gmp[x] = _gmp[x]
except TypeError:
gmp[x] = gradient.grad(cost=None,
known_grads={y: g_y}, wrt=x)
except gradient.NullTypeGradError:
# It means the gradient is undefined (which implies
# it is connected)
gmp[x] = x
@@ -1374,11 +1373,21 @@ class Scan(PureOp):
self.inner_nitsot_outs(self_outputs))
def compute_gradient(y, g_y):
gmp = gradient.grad_sources_inputs(
[(y, g_y)],
[x for x in theano.gof.graph.inputs([y])
if x in diff_inputs])
return [gmp.get(p, None) for p in diff_inputs]
if 'int' in str(g_y.dtype):
raise TypeError("Gradients may never be integers but g_y "
"has type "+str(g_y.type))
wrt = [x for x in theano.gof.graph.inputs([y])
if x in diff_inputs]
grads = gradient.grad(
cost=None,
known_grads={y: g_y},
wrt=wrt, consider_constant=wrt,
disconnected_inputs='ignore',
return_disconnected='None')
gmp = dict(zip(wrt, grads))
rval = [gmp.get(p, None) for p in diff_inputs]
return rval
dC_dinps_t = [None for inp in diff_inputs]
disconnected_dC_dinps_t = [True for inp in diff_inputs]
dC_dXts = []
@@ -464,13 +464,27 @@ def _allclose(a, b, rtol=None, atol=None):
return numpy.allclose(a, b, atol=atol_, rtol=rtol_)
class NotConstantError(TypeError):
"""
Raised by get_constant_value if called on something that is
not constant.
For now it is a TypeError, to maintain the old interface
that get_constant_value should raise a TypeError in this
situation. However, this is unsafe because get_constant_value
could inadvertently raise a TypeError if it has a bug.
So we should eventually make NotConstantError derive
from Exception directly, and modify all code that uses
get_constant_value to catch this more specific exception.
"""
pass
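A minimal sketch of the migration pattern described in the docstring above (hypothetical names, plain Python): deriving the new exception from TypeError keeps old `except TypeError` handlers working while letting new code catch the specific class.

```python
class NotConstantSketch(TypeError):
    """Stand-in for NotConstantError while it still subclasses TypeError."""

def get_constant_sketch(v):
    # Only plain numbers count as constants in this toy version.
    if not isinstance(v, (int, float)):
        raise NotConstantSketch('not a constant', v)
    return v

# Old-style caller: still works, because the new class IS a TypeError.
try:
    get_constant_sketch('x')
except TypeError:
    pass

# New-style caller: catches the more specific exception.
try:
    get_constant_sketch('x')
except NotConstantSketch:
    pass
```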
def get_constant_value(v):
"""return the constant scalar(0-D) value underlying variable `v`
If v is the output of dimshuffles, fills, allocs, rebroadcasts, cast
this function digs through them.
If `v` is not some view of constant data, then raise a TypeError.
If `v` is not some view of constant data, then raise a NotConstantError.
:note: There may be another function similar to this one in the
code, but I'm not sure where it is.
@@ -490,7 +504,7 @@ def get_constant_value(v):
numpy.complex(data) # works for all numeric scalars
return data
except Exception:
raise TypeError(
raise NotConstantError(
'v.data is non-numeric, non-scalar, or has more than one'
' unique value', v)
if v.owner:
@@ -518,9 +532,17 @@ def get_constant_value(v):
v.owner.op.perform(v.owner, [const], ret)
return ret[0][0]
if isinstance(v.owner.op, Subtensor) and v.ndim == 0:
if isinstance(v.owner.inputs[0], TensorConstant):
return v.owner.inputs[0].data.__getitem__(
# This condition depends on Subtensor always embedding constant
# indices in the Op rather than making them inputs to the Apply node
if isinstance(v.owner.inputs[0], TensorConstant) and \
len(v.owner.inputs) == 1:
try:
return v.owner.inputs[0].data.__getitem__(
tuple(v.owner.op.idx_list))
except IndexError:
raise IndexError(str(tuple(v.owner.op.idx_list))+" is not a valid index into " + \
str(v.owner.inputs[0].data))
# The index list 'idx_list' should have a length equal to the
# number of dimensions of the input.
@@ -1614,6 +1636,9 @@ class _tensor_py_operators:
def flatten(self, ndim=1):
return flatten(self, ndim)
def ravel(self):
return flatten(self)
# CASTING
def astype(self, dtype):
return cast(self, dtype)
@@ -1712,6 +1737,8 @@ class _tensor_py_operators:
def __rdot__(right, left):
return dot(left, right)
dot = __dot__
def sum(self, axis=None, dtype=None, keepdims=False):
"""See `theano.tensor.sum`"""
return sum(self, axis=axis, dtype=dtype, keepdims=keepdims)
@@ -1736,6 +1763,10 @@ class _tensor_py_operators:
"""See `theano.tensor.var`"""
return var(self, axis, keepdims=keepdims)
def std(self, axis=None, keepdims=False):
"""See `theano.tensor.std`"""
return std(self, axis, keepdims=keepdims)
def min(self, axis=None, keepdims=False):
"""See `theano.tensor.min`"""
return min(self, axis, keepdims=keepdims)
@@ -1744,6 +1775,40 @@ class _tensor_py_operators:
"""See `theano.tensor.max`"""
return max(self, axis, keepdims=keepdims)
def argmin(self, axis=None, keepdims=False):
"""See `theano.tensor.argmin`"""
return argmin(self, axis, keepdims=keepdims)
def argmax(self, axis=None, keepdims=False):
"""See `theano.tensor.argmax`"""
return argmax(self, axis, keepdims=keepdims)
def argsort(self, axis=-1, kind='quicksort', order=None):
"""See `theano.tensor.sort.argsort`"""
from theano.tensor.sort import argsort
return argsort(self, axis, kind, order)
def clip(self, a_min, a_max):
"Clip (limit) the values in an array."
return clip(self, a_min, a_max)
def conj(self):
"""See `theano.tensor.conj`"""
return conj(self)
def repeat(self, repeats, axis=None):
"""See `theano.tensor.repeat`"""
from theano.tensor.extra_ops import repeat
return repeat(self, repeats, axis)
def round(self, mode="half_away_from_zero"):
"""See `theano.tensor.round`"""
return round(self, mode)
def trace(self):
from theano.sandbox.linalg import trace
return trace(self)
# TO TRUMP NUMPY OPERATORS
__array_priority__ = 1000
@@ -2949,12 +3014,12 @@ def psi(a):
@_scal_elemwise_with_nfunc('real', 1, -1)
def real(z):
"""Return real component of complex-valued tensor `z`"""
_tensor_py_operators.real = property(real)
@_scal_elemwise_with_nfunc('imag', 1, -1)
def imag(z):
"""Return imaginary component of complex-valued tensor `z`"""
_tensor_py_operators.imag = property(imag)
@_scal_elemwise_with_nfunc('angle', 1, -1)
def angle(z):
@@ -3782,7 +3847,7 @@ class AdvancedIndexingError(TypeError):
class Subtensor(Op):
"""Return a subtensor view
The inputs array is the tensor x, followed by scalar integer variables.
The inputs array is the tensor x, followed by scalar integer types.
TODO: WRITEME: how are the scalar integer variables formatted?
This class uses a relatively complex internal representation of the inputs
@@ -3791,7 +3856,7 @@ class Subtensor(Op):
idx_list: instance variable TODO: WRITEME: is this a list or a tuple?
(old docstring gives two conflicting
descriptions)
elements are either integers, theano scalars, or slices.
elements are either integers, theano scalar types, or slices.
one element per "explicitly named dimension"
TODO: WRITEME: what is an "explicitly named dimension" ?
@@ -3800,7 +3865,11 @@ class Subtensor(Op):
if slice:
start/stop/step members of each slice are integer indices
into the inputs array or None
integer indices be actual integers or theano scalars
integer indices may be actual integers or theano scalar types
Note that the idx_list defines the Op, so two Subtensor instances are
considered to be different Ops if they have different idx_list fields.
This means that the entries in it are theano Types, not theano Variables.
@todo: add support for advanced tensor indexing (in Subtensor_dx too).
@@ -3818,6 +3887,17 @@ class Subtensor(Op):
@staticmethod
def collapse(idxs, cond):
"""
idxs: a list of indices or slices.
cond: a callable that returns a bool
returns: idxs, with the slices flattened out into a list.
if cond is true for an entry, does not flatten it.
"""
ret = []
def helper(entry):
@@ -3830,10 +3910,20 @@ class Subtensor(Op):
for idx in idxs:
helper(idx)
return ret
@staticmethod
def convert(entry, slice_ok=True):
"""
The "idx_list" field is unique to each Subtensor instance.
It is not unique to each Apply node, so it should not refer to
specific Variables. This method changes references to Variables
into references to Types.
TODO: WRITEME: This method also accepts "entry" already being a Type;
when would that happen?
"""
invalid_scal_types = [scal.float64, scal.float32]
scal_types = [scal.int64, scal.int32, scal.int16, scal.int8]
tensor_types = [lscalar, iscalar, wscalar, bscalar]
@@ -722,20 +722,19 @@ class Elemwise(Op):
def _bgrad(self, inputs, ograds):
# returns grad, with respect to broadcasted versions of inputs
# Gradients (especially on the final costs) don't have to be symbolic
# e.g., ograds will be [ 1. ] if your objective is c and the output
# of the current apply node is c
ograds = map(as_tensor_variable, ograds)
prev_setting = theano.config.compute_test_value
try:
theano.config.compute_test_value = 'off'
scalar_inputs = [Scalar(dtype=t.type.dtype)() for t in inputs]
scalar_ograds = [Scalar(dtype=ograd.type.dtype)()
for ograd in ograds]
def as_scalar(t):
if isinstance(t.type, (NullType, DisconnectedType)):
return t
return Scalar(t.type.dtype)()
scalar_inputs = map(as_scalar, inputs)
scalar_ograds = map(as_scalar, ograds)
scalar_igrads = self.scalar_op.grad(scalar_inputs, scalar_ograds)
for igrad in scalar_igrads:
assert igrad is not None
@@ -801,10 +801,9 @@ class ConvOp(OpenMPOp):
# mimic what happens inside theano.grad: get the input gradient
# of the final cost wrt all variables involved.
tmp_gmap = theano.gradient.grad_sources_inputs(
[(node, gz)], [inputs, kerns])
return theano.gradient.grad(cost=None,
known_grads={node: gz}, wrt=[inputs, kerns])
return [tmp_gmap[inputs], tmp_gmap[kerns]]
if self.dx not in (1, 2) or self.dy not in (1, 2):
raise NotImplementedError(
......
@@ -1046,7 +1046,7 @@ class T_CrossentropyCategorical1Hot(utt.InferShapeTester):
# Verify the gradient when providing output gradient
h = theano.function([x, y, a],
T.grad(expr, x, g_cost=a * x.sum()), mode=mode)
T.grad(expr, x, known_grads={expr:a * x.sum()}), mode=mode)
try:
assert 8 <= len(h.maker.fgraph.toposort()) <= 17
validate_grad_graph(h)
@@ -14,7 +14,7 @@ builtin_min = __builtin__.min
from nose.plugins.skip import SkipTest
import numpy
from numpy.testing import dec
from numpy.testing import dec, assert_array_equal, assert_allclose
from numpy.testing.noseclasses import KnownFailureTest
import theano
@@ -7001,6 +7001,85 @@ class TestInferShape(utt.InferShapeTester):
[tile(adtens4, aivec_val, ndim)],
[adtens4_val], Tile)
class TestTensorInstanceMethods(unittest.TestCase):
def setUp(self):
self.vars = matrices('X', 'Y')
self.vals = [rand(2, 2), rand(2, 2)]
def test_argmin(self):
X, _ = self.vars
x, _ = self.vals
assert_array_equal(X.argmin().eval({X: x}), x.argmin())
def test_argmax(self):
X, _ = self.vars
x, _ = self.vals
assert_array_equal(X.argmax().eval({X: x}), x.argmax())
def test_argsort(self):
X, _ = self.vars
x, _ = self.vals
assert_array_equal(X.argsort().eval({X: x}), x.argsort())
assert_array_equal(X.argsort(1).eval({X: x}), x.argsort(1))
def test_clip(self):
X, Y = self.vars
x, y = self.vals
Z = X.clip(0.5 - Y, 0.5 + Y)
z = x.clip(0.5 - y, 0.5 + y)
assert_array_equal(Z.eval({X: x, Y: y}), z)
def test_dot(self):
X, Y = self.vars
x, y = self.vals
assert_array_equal(x.dot(y), X.dot(Y).eval({X: x, Y: y}))
Z = X.dot(Y)
z = x.dot(y)
assert_array_equal(x.dot(z), X.dot(Z).eval({X: x, Z: z}))
def test_real_imag(self):
X, Y = self.vars
x, y = self.vals
Z = X + Y * 1j
z = x + y * 1j
assert_array_equal(Z.real.eval({Z: z}), x)
assert_array_equal(Z.imag.eval({Z: z}), y)
def test_conj(self):
X, Y = self.vars
x, y = self.vals
Z = X + Y * 1j
z = x + y * 1j
assert_array_equal(Z.conj().eval({Z: z}), z.conj())
def test_round(self):
X, _ = self.vars
x, _ = self.vals
assert_array_equal(X.round().eval({X: x}), x.round())
def test_std(self):
X, _ = self.vars
x, _ = self.vals
# std() is implemented as theano tree and does not pass its
# args directly to numpy. This sometimes results in small
# difference, so we use allclose test.
assert_allclose(X.std().eval({X: x}), x.std())
def test_repeat(self):
X, _ = self.vars
x, _ = self.vals
assert_array_equal(X.repeat(2).eval({X: x}), x.repeat(2))
def test_trace(self):
X, _ = self.vars
x, _ = self.vals
assert_array_equal(X.trace().eval({X: x}), x.trace())
def test_ravel(self):
X, _ = self.vars
x, _ = self.vals
assert_array_equal(X.ravel().eval({X: x}), x.ravel())
if __name__ == '__main__':
......
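The comment in `test_std` above explains why it uses `assert_allclose` rather than `assert_array_equal`: `X.std()` is built as a graph of simpler Theano ops instead of one direct NumPy call, so the result can differ from `numpy.std` in the last bits. The same effect is easy to reproduce in pure NumPy by composing std from its definition (purely illustrative):

```python
import numpy as np

x = np.random.RandomState(0).rand(2, 2)
# std composed from its definition, as a graph of simpler ops would be
composed = np.sqrt(np.mean((x - x.mean()) ** 2))
# bitwise equality may fail; equality up to rounding holds
assert np.allclose(composed, x.std())
```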
@@ -6,7 +6,6 @@ import unittest
 import theano
 from theano import gof
-from theano.gradient import grad_sources_inputs
 from theano import gradient
 from theano.tensor.nnet.Conv3D import conv3D
 from theano import config
@@ -16,6 +15,16 @@ from theano.gof.null_type import NullType
 one = theano.tensor.as_tensor_variable(1.)
+
+def grad_sources_inputs(sources, inputs):
+    """
+    This implements the old grad_sources_inputs function in terms of
+    the new interface so the tests don't need to be rewritten.
+    """
+    if inputs is None:
+        inputs = theano.gof.graph.inputs([source[0] for source in sources])
+    return dict(zip(inputs, theano.gradient.grad(cost=None,
+        known_grads=dict(sources), wrt=inputs, consider_constant=inputs)))
+
 class testgrad_sources_inputs(unittest.TestCase):
     def test_retNone1(self):
@@ -369,35 +378,6 @@ class test_grad(unittest.TestCase):
         # If we made it to here without an exception, then the
         # connection_pattern functionality worked correctly
-    def test_sum_disconnected(self):
-        # Tests that we can add DisconnectedType to other terms correctly
-        x = theano.tensor.scalar()
-        y = x * 2.
-        z = x + 1.
-        cost = y + z
-        theano.tensor.grad(cost, x, consider_constant=[y, z])
-        # In an earlier version of theano, the above line would have failed
-        # while trying to add two DisconnectedTypes
-
-    def test_output_grad_on_int(self):
-        # If the g_cost argument is specified when x has a discrete dtype,
-        # g_cost should be equivalent to 0.
-        x = theano.tensor.iscalar('x')
-        y = x * 2
-        # Should work:
-        c0 = theano.tensor.constant(0)
-        theano.grad(y, x, g_cost=c0)
-        theano.grad(y, x, g_cost=y.zeros_like())
-        theano.grad(y, x, g_cost=y.zeros_like().astype('float64'))
-        # Should raise ValueError
-        c1 = theano.tensor.constant(1)
-        self.assertRaises(ValueError, theano.grad, y, x, g_cost=c1)
-        s0 = theano.shared(np.zeros((), dtype='int8'))
-        self.assertRaises(ValueError, theano.grad, y, x, g_cost=s0)
-
     def test_downcast_dtype(self):
         # Test that the gradient of a cost wrt a float32 variable does not
         # get upcasted to float64.
@@ -418,6 +398,161 @@ class test_grad(unittest.TestCase):
         # be downcasted to float32, so dc_dx should also be float32
         assert dc_dx.dtype == 'float32'
+
+    def test_grad_constant(self):
+        # Test that the gradient handles Constants and consider_constant
+        # variables consistently.
+        x = theano.tensor.scalar()
+        y = theano.tensor.scalar()
+        z_x = x + y
+        z_one = one + y
+        g_x = theano.tensor.grad(z_x, x, consider_constant=[x])
+        g_one = theano.tensor.grad(z_one, one)
+        f = theano.function([x, y], [g_x, g_one])
+        g_x, g_one = f(1, .5)
+        if not np.allclose(g_x, g_one):
+            raise AssertionError("Gradient using consider_constant is " +
+                                 str(g_x) +
+                                 " but gradient with respect to the same"
+                                 " Constant is " + str(g_one))
+
+
+def test_known_grads():
+    # Tests that the grad method with no known_grads
+    # matches what happens if you put its own known_grads
+    # in for each variable
+    full_range = theano.tensor.arange(10)
+    x = theano.tensor.scalar('x')
+    t = theano.tensor.iscalar('t')
+    ft = full_range[t]
+    ft.name = 'ft'
+    coeffs = theano.tensor.vector('c')
+    ct = coeffs[t]
+    ct.name = 'ct'
+    p = x ** ft
+    p.name = 'p'
+    y = ct * p
+    y.name = 'y'
+    cost = theano.tensor.sqr(y)
+    cost.name = 'cost'
+
+    layers = [[cost],
+              [y],
+              [ct, p],
+              [ct, x, ft],
+              [coeffs, t, full_range, x]]
+
+    inputs = [coeffs, t, x]
+    rng = np.random.RandomState([2012, 11, 15])
+    values = [rng.randn(10), rng.randint(10), rng.randn()]
+    values = [np.cast[ipt.dtype](value)
+              for ipt, value in zip(inputs, values)]
+
+    true_grads = theano.tensor.grad(cost, inputs,
+                                    disconnected_inputs='ignore')
+    true_grads = theano.function(inputs, true_grads)
+    true_grads = true_grads(*values)
+
+    for layer in layers:
+        print 'Testing by separately computing', layer
+        first = theano.tensor.grad(cost, layer, disconnected_inputs='ignore')
+        known = dict(zip(layer, first))
+        full = theano.tensor.grad(cost=None, known_grads=known,
+                                  wrt=inputs, disconnected_inputs='ignore')
+        full = theano.function(inputs, full)
+        full = full(*values)
+        assert len(true_grads) == len(full)
+        for a, b, var in zip(true_grads, full, inputs):
+            if not np.allclose(a, b):
+                print 'Failure'
+                print a
+                print b
+                print var
+                print layer
+                for v in known:
+                    print v, ':', theano.function(inputs, known[v])(*values)
+                assert False
+
+
+def test_dxdx():
+    # Tests that the gradient of a scalar with respect to itself is 1.
+    # We use an integer in this case because people keep changing this
+    # gradient to be 0 on integers, but according to our interpretation
+    # of the gradient as defined in the Op contract, it should be 1.
+    # If you feel the need to change this unit test you are probably
+    # modifying the Op contract and should definitely get the approval
+    # of multiple people on theano-dev.
+    x = theano.tensor.iscalar()
+    g = theano.tensor.grad(x, x)
+    g = g.eval({x: 12})
+    assert np.allclose(g, 1.)
+
+
+def test_known_grads_integers():
+    # Tests that known_grads works on integers
+    x = theano.tensor.iscalar()
+    g_expected = theano.tensor.scalar()
+    g_grad = theano.gradient.grad(cost=None,
+                                  known_grads={x: g_expected},
+                                  wrt=x)
+    f = theano.function([g_expected], g_grad)
+    gv = np.cast[theano.config.floatX](.6)
+    g_actual = f(gv)
+    assert np.allclose(g_actual, gv)
+
+
+def test_undefined_cost_grad():
+    # Tests that if we say the cost is not differentiable via the
+    # known_grads mechanism, it is treated as such by the rest of the
+    # system.
+    # This is so that Ops that are built around minigraphs like
+    # OpFromGraph and scan can implement Op.grad by passing ograds
+    # to known_grads.
+    x = theano.tensor.iscalar()
+    y = theano.tensor.iscalar()
+    cost = x + y
+    assert cost.dtype in theano.tensor.discrete_dtypes
+    try:
+        theano.tensor.grad(cost, [x, y],
+                           known_grads={cost: NullType()()})
+    except theano.gradient.NullTypeGradError:
+        return
+    raise AssertionError("An undefined gradient has been ignored.")
+
+
+def test_disconnected_cost_grad():
+    # Tests that if we say the cost is disconnected via the
+    # known_grads mechanism, it is treated as such by the rest of the
+    # system.
+    # This is so that Ops that are built around minigraphs like
+    # OpFromGraph and scan can implement Op.grad by passing ograds
+    # to known_grads.
+    x = theano.tensor.iscalar()
+    y = theano.tensor.iscalar()
+    cost = x + y
+    assert cost.dtype in theano.tensor.discrete_dtypes
+    try:
+        theano.tensor.grad(cost, [x, y],
+                           known_grads={cost: gradient.DisconnectedType()()},
+                           disconnected_inputs='raise')
+    except theano.gradient.DisconnectedInputError:
+        return
+    raise AssertionError("A disconnected gradient has been ignored.")
+
 if __name__ == '__main__':
     unittest.main()
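`test_known_grads` above checks that stopping the gradient at an intermediate "layer" and restarting from those known gradients reproduces the direct gradient. The same two-step chain rule for the scalar case `cost = (c * x**f) ** 2` can be written out in plain NumPy (the helper names are hypothetical, for illustration only):

```python
import numpy as np

def direct_grads(c, x, f):
    # d cost / d c and d cost / d x computed in one pass.
    y = c * x ** f
    return 2.0 * y * x ** f, 2.0 * y * c * f * x ** (f - 1)

def layered_grads(c, x, f):
    # Stop at the intermediate y, then push the "known" gradient
    # g_y = d cost / d y down to the inputs, as known_grads does.
    y = c * x ** f
    g_y = 2.0 * y
    return g_y * x ** f, g_y * c * f * x ** (f - 1)

assert np.allclose(direct_grads(0.5, 1.5, 3), layered_grads(0.5, 1.5, 3))
```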
@@ -341,15 +341,9 @@ class test_RopLop(RopLop_checker):
         rop_out2 = tensor.Rop((m, v, m + v), [m, v], [m_, v_])
         assert isinstance(rop_out2, tuple)
         assert len(rop_out2) == 3
-        lop_out1 = tensor.Lop([m, v, m + v], (m, v), [m_, v_])
-        assert isinstance(lop_out1, tuple)
-        assert len(lop_out1) == 2
-        lop_out2 = tensor.Lop((m, v, m + v), [m, v], [m_, v_])
-        assert isinstance(lop_out2, list)
-        assert len(lop_out2) == 2
         all_outs = []
-        for o in rop_out1, rop_out2, lop_out1, lop_out2:
+        for o in rop_out1, rop_out2:
            all_outs.extend(o)
         f = theano.function([m, v, m_, v_], all_outs)
         f(mval, vval, m_val, v_val)
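`tensor.Rop`, exercised by the assertions above, computes the Jacobian-times-vector (directional) derivative of an expression. A finite-difference sketch of the same quantity in NumPy (`numeric_rop` is an illustrative approximation, not the Theano operator):

```python
import numpy as np

def numeric_rop(f, x, v, eps=1e-6):
    # Directional derivative of f at x along v, i.e. J(x).dot(v),
    # approximated by a forward finite difference.
    return (f(x + eps * v) - f(x)) / eps

f = lambda x: x ** 2  # elementwise square, Jacobian = diag(2 * x)
out = numeric_rop(f, np.array([1.0, 2.0]), np.array([1.0, 0.0]))
# out is close to [2., 0.]
```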