Commit d95e876d authored by nouiz

Merge pull request #899 from goodfeli/rebase_fix_grad

Rebase fix grad
......@@ -98,6 +98,31 @@ following methods:
lifetime of self. Op instances should be immutable in this
sense.
.. function:: connection_pattern():
Optional (but in extremely rare cases needed to have it work with
{tensor,sparse}.grad).
Returns a list of bools the same length as the op's inputs list.
True signifies that the elements of that input have an effect on the
op's output.
False signifies that they do not -- in other words, the op acts only
on the input's metadata, such as its shape.
If no connection_pattern is implemented, tensor.grad will assume
it is a list containing only True.
Failing to implement this function for an op that needs it can
result in tensor.grad erroneously reporting that a gradient is
undefined. Returning 0 for this input in the grad method is not
the same as specifying that the elements of this input are not
connected to the output. If the gradient with respect to the
op's output is NaN but the elements of the input are not connected
to it, then the NaN never enters into the expression for the
gradient.
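For illustration, a minimal sketch of the contract described above (the op, its semantics, and the bare-bones base class are hypothetical stand-ins, not real Theano classes):

```python
class Op(object):
    """Minimal stand-in for theano.gof.Op, only for illustration."""

class ReshapeLike(Op):
    """Hypothetical op whose output copies the elements of its first
    input into the shape of its second input.  The second input
    contributes only metadata (its shape), so its elements are
    disconnected from the output."""
    def connection_pattern(self):
        # One bool per input, as documented above:
        # True  -> the input's elements affect the output
        # False -> only metadata (e.g. the shape) is used
        return [True, False]

assert ReshapeLike().connection_pattern() == [True, False]
```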
.. function:: grad(inputs, output_gradients)
Optional (but needed to have it work with {tensor,sparse}.grad()).
......@@ -106,31 +131,62 @@ following methods:
symbolically in this method. Both ``inputs`` and ``output_gradients``
are lists of symbolic Theano Variables and those must be operated on using
Theano's symbolic language. The grad method must return a list containing
one Variable for each input. Each returned Variable represents
the gradient with respect to that input computed based on the symbolic gradients with
respect to each output.
If the output is not differentiable with respect to an input
then this method should be defined to return a variable of type
NullType for that input.
If an element of output_gradient is of type theano.gradient.DisconnectedType,
it means that the cost is not a function of this output. If any of the
op's inputs participate in the computation of only disconnected outputs,
then Op.grad should return DisconnectedType variables for those inputs.
If the grad method is not defined, then Theano assumes it has been
forgotten. Symbolic differentiation will fail on a graph that
includes this Op.
It must be understood that the Op's grad method is not meant to return the
gradient of the Op's output. theano.tensor.grad computes gradients; Op.grad
is a helper function that computes terms that appear in gradients.
If an Op has a single vector-valued output y and a single vector-valued input x,
then the grad method will be passed x and a second vector z. Define J to be
the Jacobian of y with respect to x. The Op's grad method should return
dot(J.T,z). When theano.tensor.grad calls the grad method, it will set z to
be the gradient of the cost C with respect to y. If this op is the only op
that acts on x, then dot(J.T,z) is the gradient of C with respect to x.
If there are other ops that act on x, theano.tensor.grad will have to add up
the terms of x's gradient contributed by the other op's grad method.
In practice, an op's input and output are rarely implemented as single vectors.
Even if an op's output consists of a list containing a scalar, a sparse matrix,
and a 4D tensor, you can think of these objects as being formed by rearranging
a vector. Likewise for the input. In this view, the values computed by the grad
method still represent a Jacobian-vector product.
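As a concrete numeric sketch of the dot(J.T, z) contract (using a hypothetical elementwise square op and plain Python lists in place of symbolic Theano variables):

```python
# Hypothetical elementwise op: y_i = x_i ** 2, so the Jacobian J is
# diagonal with entries 2 * x_i, and dot(J.T, z) reduces elementwise
# to 2 * x_i * z_i -- exactly what the grad method should return.
def square_op_grad(x, z):
    return [2.0 * xi * zi for xi, zi in zip(x, z)]

x = [1.0, 2.0, 3.0]
z = [0.5, 1.0, -1.0]    # gradient of the cost C with respect to y

# check against an explicitly constructed Jacobian-vector product
n = len(x)
J = [[2.0 * x[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
JTz = [sum(J[i][k] * z[i] for i in range(n)) for k in range(n)]
assert square_op_grad(x, z) == JTz
```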
In practice, it is probably not a good idea to explicitly construct the Jacobian,
which might be very large and very sparse. However, the returned value should
be equal to the Jacobian-vector product.
So long as you implement this product correctly, you need not understand what
theano.tensor.grad is doing, but for the curious the mathematical justification
is as follows:
In essence, the grad method must simply implement through symbolic Variables
and operations the chain rule of differential calculus. The chain rule
is the mathematical procedure that allows one to calculate the total derivative
:math:`\frac{d C}{d x}` of the final scalar symbolic Variable C with respect to a
primitive symbolic Variable x found in the list ``inputs``.
The grad method does this using ``output_gradients``, which provides the total
derivative :math:`\frac{d C}{d f}` of C with respect to each symbolic Variable f
returned by the Op, together with the knowledge of the total derivative :math:`\frac{d f}{d x}` of the
latter with respect to the primitive Variable (this has to be computed).
In mathematics, the total derivative of a scalar variable (C) with respect to a vector of
scalar variables (x), i.e. the gradient, is customarily represented as the
row vector of the partial derivatives, whereas the total derivative of a vector of
scalar variables (f) with respect to another (x), is customarily represented by the matrix of
......
......@@ -150,24 +150,6 @@ def std_fgraph(input_specs, output_specs, accept_inplace = False):
std_fgraph.features = [gof.toolbox.PreserveNames]
class UncomputableFeature(gof.Feature):
"""A feature that ensures the graph never contains any
uncomputable nodes. This check must be made at compile time
rather than runtime in order to make sure that NaN nodes are
not optimized out. It must be done as a Feature so that
the fgraph will continually check that optimizations have
not introduced any uncomputable nodes."""
def on_attach(self, fgraph):
for node in fgraph.nodes:
self.on_import(fgraph, node)
def on_import(self, fgraph, node):
gof.op.raise_if_uncomputable(node)
std_fgraph.features.append(UncomputableFeature)
class AliasedMemoryError(Exception):
"""Memory is aliased that should not be"""
pass
......
......@@ -11,7 +11,7 @@ import toolbox
from python25 import all
from theano import config
import warnings
NullType = None
class InconsistencyError(Exception):
"""
......@@ -211,6 +211,9 @@ class FunctionGraph(utils.object2):
### import ###
def __import_r__(self, variables):
global NullType
if NullType is None:
from null_type import NullType
# Imports the owners of the variables
r_owner_done = set(self.nodes)
for node in [r.owner for r in variables if r.owner is not None]:
......@@ -219,6 +222,8 @@ class FunctionGraph(utils.object2):
self.__import__(node)
for r in variables:
if r.owner is None and not isinstance(r, graph.Constant) and r not in self.inputs:
if isinstance(r.type, NullType):
raise TypeError("Computation graph contains a NaN. " + r.type.why_null)
raise MissingInputError("Undeclared input", r)
if not getattr(r, 'fgraph', None) is self:
self.__setup_r__(r)
......
from theano.gof.type import Type
class NullType(Type):
"""
A type that allows no values. Used to represent expressions
that are undefined, either because they do not exist mathematically
or because the code to generate the expression has not been
implemented yet.
"""
def __init__(self, why_null='(no explanation given)'):
"""
why_null: A string explaining why this variable
can't take on any values
"""
self.why_null = why_null
def filter(self, data, strict=False, allow_downcast=None):
raise ValueError("No values may be assigned to a NullType")
def filter_variable(self, other):
raise ValueError("No values may be assigned to a NullType")
def may_share_memory(a, b):
return False
def values_eq(a, b, force_same_dtype=True):
raise ValueError("NullType has no values to compare")
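A sketch of how such a type behaves in practice (a standalone mock, since the real class lives inside Theano; the why_null message here is invented):

```python
class MockNullType(object):
    """Standalone mock of the NullType above: it carries an
    explanation string and rejects every value."""
    def __init__(self, why_null='(no explanation given)'):
        self.why_null = why_null

    def filter(self, data, strict=False, allow_downcast=None):
        # as in NullType.filter above, no value is ever acceptable
        raise ValueError("No values may be assigned to a NullType")

t = MockNullType(why_null="gradient of this op is not implemented")
try:
    t.filter(3.0)
except ValueError:
    # the explanation survives for error reporting, as in
    # FunctionGraph.__import_r__ above
    print(t.why_null)
```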
......@@ -609,59 +609,6 @@ class Op(utils.object2, PureOp, CLinkerOp):
rval.lazy = False
return rval
class UncomputableOp(Op):
"""
An Op representing an expression that cannot be computed.
theano.function checks that the subgraph it implements
does not contain these ops, and that optimization does not
introduce any such ops.
theano.tensor.grad checks the graphs it returns to ensure
they do not contain these ops.
"""
def __init__(self, exc, msg=""):
"""
exc: the exception type to raise if a subgraph contains
this op.
msg: the message to include in the exception.
"""
self.exc = exc
self.msg = msg
def __eq__(self, other):
return type(self) == type(other)
def __hash__(self):
return hash((type(self)))
def __str__(self):
return "Uncomputable{%s,%s}"%(self.exc,self.msg)
def make_node(self,x):
if x is None:
x = graph.Constant(theano.gof.type.generic,None)
return graph.Apply(self, [x], [x.type()] )
def perform(self, node, inputs, out_storage):
""" This should never be called"""
raise AssertionError("An UncomputableOp should never be compiled, "+\
"and certainly not executed.")
#Note: essentially, this op should just be NaNs_like(inputs[0])
#but 0 * UncomputableOp(x) + y optimizes to just y
#so until we develop a way of symbolically representing a variable
#that is always NaN and implement the logic for 0 * NaN = NaN, etc.
#the only way we can guarantee correctness of a theano function
#is to guarantee that its initial subgraph contained no UncomputableOps
def raise_exc(self):
raise self.exc(self.msg)
def raise_if_uncomputable(node):
if node is not None:
if isinstance(node.op, UncomputableOp):
node.op.raise_exc()
def get_test_value(v):
"""
Extract test value from `v`. Raises AttributeError if there is none.
......
......@@ -20,9 +20,18 @@ from theano import gof
from theano.gof import Variable
from theano.gof.python25 import all
import theano.gof.utils
from theano.gof.null_type import NullType
from theano.printing import min_informative_str
# we can't do "import theano.tensor"
# tensor depends on theano.compile
# theano.compile depends on theano.gradient (this file)
# the reason theano.compile depends on theano.gradient
# is that theano.compile.builders contains the op-from-graph
# functionality and it uses theano.gradient to implement
# the new op's grad method
tensor = None
_msg_retType = 'op.grad(...) returned a non-list'
_msg_badlen = 'op.grad(...) returned wrong number of gradients'
def format_as(use_list, use_tuple, outputs):
......@@ -54,171 +63,7 @@ def format_as(use_list, use_tuple, outputs):
return outputs
def grad_sources_inputs(sources, graph_inputs, warn_type=True):
"""
A gradient source is a pair (``v``, ``g_v``), in which ``v`` is
a `Variable`, and ``g_v`` is a `Variable` that is a gradient wrt
``v``. More specifically, ``g_v`` is the gradient of an external
scalar cost, ``cost`` (that is not explicitly used), wrt ``v``.
This function traverses the graph backward from the ``r`` sources,
calling ``op.grad(...)`` for all ops with some non-None gradient
on an output, to compute gradients of ``cost`` wrt intermediate
variables and ``graph_inputs``.
The ``op.grad(...)`` functions are called like this:
.. code-block:: python
op.grad(op.inputs[:], [total_gradient(v) for v in op.outputs])
This call to ``op.grad`` should return a list or tuple: one symbolic
gradient per input. These gradients represent the gradients of
the same implicit ``cost`` mentioned above, wrt ``op.inputs``. Note
that this is **not** the same as the gradient of ``op.outputs`` wrt
``op.inputs``.
If ``op`` has a single input, then ``op.grad`` should return a list
or tuple of length 1.
For each input wrt to which ``op`` is not differentiable, it should
return ``None`` instead of a `Variable` instance.
If a source ``r`` receives a gradient from another source ``r2``,
then the effective gradient on ``r`` is the sum of both gradients.
:type sources: list of pairs of Variable: (v, gradient-on-v) to
initialize the total_gradient dictionary
:param sources: gradients to back-propagate using chain rule
:type graph_inputs: list of Variable
:param graph_inputs: variables considered to be constant
(do not backpropagate through them)
:type warn_type: bool
:param warn_type: True will trigger warnings via the logging module when
the gradient on an expression has a different type than the original
expression
:rtype: dictionary whose keys and values are of type Variable
:return: mapping from each Variable encountered in the backward
traversal to the gradient with respect to that Variable.
It is assumed that there is some objective J shared between all members of
sources, so that for each v, gradient-on-v is the gradient of J with
respect to v
"""
gmap = {}
for (r, g_r) in sources:
if not hasattr(r, 'type'):
raise TypeError('sources must be Variables', r)
if g_r is not None:
if r in gmap:
gmap[r] = gmap[r] + g_r
else:
gmap[r] = g_r
graph_outputs = gof.utils.uniq([r for r, g in sources])
if graph_inputs is None:
graph_inputs = gof.graph.inputs(graph_outputs)
for node in gof.graph.io_toposort(graph_inputs,
graph_outputs).__reversed__():
g_outputs = [gmap.get(o, None) for o in node.outputs]
#if all output gradients are None, continue
if all(map(lambda x: x is None, g_outputs)): continue
#Disable all grad operation on complex. verify_grad don't
#support them and we don't know we want to handle them.
for var in node.inputs + node.outputs:
if (hasattr(var.type, 'dtype') and "complex" in var.type.dtype):
raise Exception("We do not support grad/Rop/Lop/verify_grad"
" on complex.")
output_arg = g_outputs
input_arg = node.inputs
# Each Op's grad function requires inputs and output_grads
# If the Op destroys any input, but the grad expression uses it,
# then chances are the resulting graph will have a dependency
# cycle. We avoid this cycle by passing (symbolic) copies of
# each destroyed input.
try:
dinputs = [node.inputs[x[0]] for x in node.op.destroy_map.values()]
except AttributeError:
dinputs = []
new_input_arg = []
for input in input_arg:
if input in dinputs and hasattr(input, 'copy'):
new_input_arg.append(input.copy())
else:
new_input_arg.append(input)
input_arg = new_input_arg
#note that this function is not in a try-except block
# the rationale:
# If the op implements grad, then any exception should be passed to
# the caller
# If the op doesn't implement grad, this entire function should fail.
# Other possibilities:
# * return a partial back-prop
#
op_grad = node.op.grad(input_arg, output_arg)
if not isinstance(op_grad, (list, tuple)):
raise ValueError(_msg_retType, node.op)
g_inputs = op_grad
assert isinstance(g_inputs, (list, tuple))
if len(g_inputs) != len(node.inputs):
raise ValueError(_msg_badlen,
node.op,
len(g_inputs),
len(node.inputs))
for ii, (r, g_r) in enumerate(zip(node.inputs, g_inputs)):
if warn_type:
if g_r and (getattr(r, 'type', 0) != getattr(g_r, 'type', 1)):
r_type = getattr(r, 'type', None)
g_r_type = getattr(g_r, 'type', None)
_logger.warning('%s.grad returned a different type (%s) '
'for input %i of type (%s)',
node.op, g_r_type, ii, r_type)
if g_r is not None:
assert r is not None
if r in gmap:
gmap[r] = gmap[r] + g_r
else:
gmap[r] = g_r
return gmap
class GradNotImplementedOp(gof.op.UncomputableOp):
""" An UncomputableOp representing a gradient that hasn't been implemented yet.
"""
def __init__(self, op, x_pos, comment = ""):
"""
op: A theano op whose grad is not implemented for some input
x_pos: An int, giving the index in the op's input list of
a variable for which the gradient is not implemented
(if op has unimplemented gradients for several inputs,
it must still return a separate GradNotImplementedOp for
each)
comment: An optional comment explaining why the gradient isn't
implemented.
"""
assert isinstance(op, gof.Op)
assert isinstance(x_pos, int)
assert x_pos >= 0
super(GradNotImplementedOp,self).__init__(NotImplementedError,
"%s does not implement its gradient with respect to input %d. %s" \
% (str(type(op)), x_pos, comment))
def grad_not_implemented(op, x_pos, x, comment=""):
"""
Return an un-computable symbolic variable of type `x.type`.
......@@ -232,40 +77,14 @@ def grad_not_implemented(op, x_pos, x, comment = ""):
gradient is not implemented.
"""
return (NullType(
(
"This variable is Null because the grad method for "
"input %s (%s) of the %s op is not implemented. %s"
) % (x_pos, x, op, comment)))()
class GradUndefinedError(Exception):
""" An exception raised upon attempts to use an undefined gradient.
"""
class GradUndefinedOp(gof.op.UncomputableOp):
""" An UncomputableOp representing a gradient that is mathematically
undefined.
"""
def __init__(self, op, x_pos, comment = ""):
"""
op: A theano op whose grad is mathematically undefined for
some input
x_pos: An int, giving the index in the op's input list of
a variable for which the gradient is undefined
(if op has undefined gradients for several inputs,
it must still return a separate GradUndefinedOp for
each)
comment: An optional comment explaining why the gradient isn't
defined.
"""
assert isinstance(op, gof.Op)
assert isinstance(x_pos, int)
assert x_pos >= 0
super(GradUndefinedOp,self).__init__(GradUndefinedError,
"the gradient of %s with respect to input %d is mathematically undefined. %s" \
% (str(type(op)), x_pos, comment))
def grad_undefined(op, x_pos, x, comment=""):
"""
Return an un-computable symbolic variable of type `x.type`.
......@@ -279,9 +98,49 @@ def grad_undefined(op, x_pos, x, comment = ""):
gradient is not defined.
"""
return (NullType(
(
"This variable is Null because the grad method for "
"input %s (%s) of the %s op is mathematically undefined. %s"
) % (x_pos, x, op, comment)))()
class DisconnectedType(theano.gof.type.Type):
""" A type indicating that a variable is a result
of taking the gradient of c with respect to x
when c is not a function of x.
A symbolic placeholder for 0, but to convey
the extra information that this gradient is 0
because it is disconnected.
"""
def filter(self, data, strict=False, allow_downcast=None):
raise AssertionError(
(
"If you're assigning to a DisconnectedType you're"
" doing something wrong. It should only be used as"
" a symbolic placeholder."
))
def filter_variable(self, other):
raise AssertionError(
(
"If you're assigning to a DisconnectedType you're"
" doing something wrong. It should only be used as"
" a symbolic placeholder."
))
def may_share_memory(a, b):
return False
def values_eq(a, b, force_same_dtype=True):
raise AssertionError(
(
"If you're assigning to a DisconnectedType you're"
" doing something wrong. It should only be used as"
" a symbolic placeholder."
))
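To illustrate the convention (a standalone sketch; the Alloc-like op and this simplified DisconnectedType stand-in are hypothetical, not the real Theano classes):

```python
class MockDisconnectedType(object):
    """Stand-in for DisconnectedType: marks a gradient that is zero
    because the input is not connected to the cost."""
    def __call__(self):
        return self  # in real Theano, calling a Type builds a Variable

def alloc_like_grad(inputs, output_gradients):
    """Hypothetical grad method for an op whose second input is only
    a shape: the value input receives a real gradient term, the
    shape input is reported as disconnected."""
    g_out, = output_gradients
    return [g_out, MockDisconnectedType()()]

grads = alloc_like_grad(['value', 'shape'], ['g_out'])
assert grads[0] == 'g_out'
assert isinstance(grads[1], MockDisconnectedType)
```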
########################
......@@ -418,7 +277,7 @@ def Rop(f, wrt, eval_points):
def Lop(f, wrt, eval_points, consider_constant=None, warn_type=False,
disconnected_inputs='raise'):
"""
Computes the L operation on `f` wrt to `wrt` evaluated at points given
in `eval_points`. Mathematically this stands for the jacobian of `f` wrt
......@@ -453,10 +312,24 @@ def Lop(f, wrt, eval_points, consider_constant=None, warn_type=False,
if not isinstance(f, (list, tuple)):
f = [f]
inputs = gof.graph.inputs(f)
# make copies of f and grads so we don't modify the client's copy
f = list(f)
grads = list(eval_points)
for elem in consider_constant:
assert elem not in f
f.append(elem)
grads.append(elem.zeros_like())
if not isinstance(wrt, (list, tuple)):
wrt = [wrt]
arg1 = zip(f, grads)
arg2 = list(wrt)
gmap = grad_sources_inputs(
arg1,
arg2,
warn_type=warn_type)
# Note : If p is not in gmap there can be several reasons, among which
......@@ -466,17 +339,16 @@ def Lop(f, wrt, eval_points, consider_constant=None, warn_type=False,
# such subtle cases can be fixed by a more careful implementation of the
# gradient, but for now Theano needs to throw an exception, and make the
# user aware that it does not know how to compute that gradient
ret = []
for p in wrt:
if p in gmap:
ret.append(gmap[p])
else:
message = (
"Lop method was asked to compute the gradient "
"with respect to a variable that is not part of "
"the computational graph of the cost, or is used "
"only by a non-differentiable operator: %s" % p)
if disconnected_inputs == 'ignore':
pass
elif disconnected_inputs == 'warn':
......@@ -484,9 +356,10 @@ def Lop(f, wrt, eval_points, consider_constant=None, warn_type=False,
elif disconnected_inputs == 'raise':
raise ValueError(message)
else:
raise ValueError(
"Invalid value for keyword "
"'disconnected_inputs', valid values are "
"'ignore', 'warn' and 'raise'.")
ret.append(p.zeros_like())
return format_as(using_list, using_tuple, ret)
......@@ -497,7 +370,7 @@ def Lop(f, wrt, eval_points, consider_constant=None, warn_type=False,
#########################
def grad(cost, wrt, g_cost=None, consider_constant=None, warn_type=False,
disconnected_inputs='raise', add_names=True):
"""
:type cost: Scalar (0-dimensional) Variable.
:type wrt: Variable or list of Variables.
......@@ -518,6 +391,11 @@ def grad(cost, wrt, g_cost=None, consider_constant=None, warn_type=False,
- 'warn': consider the gradient zero, and print a warning.
- 'raise': raise an exception.
:type add_names: bool
:param add_names: If True, variables generated by grad will be named
(d<cost.name>/d<wrt.name>) provided that both cost and wrt have
names
:rtype: Variable or list/tuple of Variables (depending upon `wrt`)
:return: symbolic expression of gradient of `cost` with respect to `wrt`.
......@@ -526,14 +404,23 @@ def grad(cost, wrt, g_cost=None, consider_constant=None, warn_type=False,
It returns an object of same type as `wrt`: a list/tuple
or Variable in all cases.
This function is a wrapper around the more general function
``theano.gradient.grad_sources_inputs``.
"""
global tensor
if tensor is None:
from theano import tensor
if isinstance(cost.type, NullType):
raise ValueError("Can't differentiate a NaN cost. "
"cost is NaN because " + cost.type.why_null)
if consider_constant is None:
consider_constant = []
else:
#error checking on consider_constant: verify that it is a collection
# error checking on consider_constant: verify that it is a collection
# of theano variables
# this is important, if someone accidentally passes a nested data
# structure with theano variables at the leaves, only the root will
......@@ -546,47 +433,34 @@ def grad(cost, wrt, g_cost=None, consider_constant=None, warn_type=False,
raise TypeError('Elements of consider_constant must be '
'variables, but got ' + str(type(elem)))
if not isinstance(cost, Variable):
raise TypeError(('In grad(), cost argument should be '
'a Variable.'), cost)
using_list = isinstance(wrt, list)
using_tuple = isinstance(wrt, tuple)
if not using_list and not using_tuple:
wrt = [wrt]
if cost.type.ndim:
raise TypeError(
'In theano.gradient.grad, "cost" argument should be a scalar,'
' but ndim is %i (should be 0). If you want to compute the'
' gradient of the sum of cost, you should use cost.sum().'
% cost.type.ndim)
var_to_node_to_idx = _populate_var_to_node_to_idx([cost])
# build a dict mapping var to the gradient of cost with respect to var
grad_dict = {}
# by default, the gradient of the cost is 1
if g_cost is None:
g_cost = tensor.ones_like(cost)
grad_dict[cost] = g_cost
# the gradient of the constants is 0
for const in consider_constant:
grad_dict[const] = DisconnectedType()()
# variables that do not influence the cost have zero gradient.
# if wrt is such a variable, populate the grad_dict with this info
# so that wrt not being in var_to_node_to_idx won't cause an error below
# according to the flag, possibly raise an error if wrt is disconnected
for elem in wrt:
if elem not in var_to_node_to_idx and elem is not cost:
message = ("grad method was asked to compute the gradient "
"with respect to a variable that is not part of "
"the computational graph of the cost, or is used "
"only by a non-differentiable operator: %s" % elem)
if disconnected_inputs == 'ignore':
pass
elif disconnected_inputs == 'warn':
......@@ -597,20 +471,331 @@ def grad(cost, wrt, g_cost=None, consider_constant=None, warn_type=False,
raise ValueError("Invalid value for keyword "
"'disconnected_inputs', valid values are "
"'ignore', 'warn' and 'raise'.")
grad_dict[elem] = DisconnectedType()()
cost_name = None
if add_names:
cost_name = cost.name
rval = _populate_grad_dict(var_to_node_to_idx,
grad_dict, wrt, warn_type,
cost_name)
for i in xrange(len(rval)):
if isinstance(rval[i].type, DisconnectedType):
rval[i] = wrt[i].zeros_like()
if using_tuple:
rval = tuple(rval)
elif not using_list:
rval, = rval
return rval
def _populate_var_to_node_to_idx(outputs):
"""
Common code shared between grad and grad_sources_inputs
outputs: a list of variables we want to take gradients of
returns:
var_to_node_to_idx: a dictionary mapping a variable to
a second dictionary.
the second dictionary maps apply nodes acting on
this variable to the variable's index in the apply
node's input list
"""
# var_to_node_to_idx[var][node] = [i,j] means node has
# var as input at positions i and j
var_to_node_to_idx = {}
# set of variables or nodes that have been added to their parents
accounted_for = set([])
def account_for(var):
if var in accounted_for:
return
accounted_for.add(var)
if var.owner is not None:
node = var.owner
if node not in accounted_for:
accounted_for.add(node)
for i, ipt in enumerate(node.inputs):
if ipt not in var_to_node_to_idx:
var_to_node_to_idx[ipt] = {}
node_to_idx = var_to_node_to_idx[ipt]
if node not in node_to_idx:
node_to_idx[node] = []
idx = node_to_idx[node]
assert i not in idx
idx.append(i)
account_for(ipt)
for output in outputs:
account_for(output)
return var_to_node_to_idx
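A toy illustration of the mapping this builds (Var and Node are minimal hypothetical stand-ins for Theano's Variable and Apply; the real code also memoizes visited nodes via accounted_for):

```python
class Var(object):
    def __init__(self, owner=None):
        self.owner = owner

class Node(object):
    def __init__(self, inputs):
        self.inputs = inputs

x = Var()
n = Node([x, x])      # one apply node consuming x at positions 0 and 1
y = Var(owner=n)

var_to_node_to_idx = {}

def account_for(var):
    node = var.owner
    if node is None:
        return
    for i, ipt in enumerate(node.inputs):
        # record that `node` uses `ipt` at input position i
        var_to_node_to_idx.setdefault(ipt, {}).setdefault(node, []).append(i)
        account_for(ipt)

account_for(y)
# x is used by n at input positions 0 and 1
assert var_to_node_to_idx[x][n] == [0, 1]
```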
def _populate_grad_dict(var_to_node_to_idx,
grad_dict, wrt, warn_type, cost_name=None):
"""
Common code shared between grad_sources_inputs and grad
var_to_node_to_idx: a dictionary mapping a variable to
a second dictionary.
the second dictionary maps apply nodes acting on
this variable to the variable's index in the apply
node's input list
grad_dict: a dictionary mapping variables to their gradients
should be populated by grad or grad_sources_inputs
grad should set gradients to DisconnectedType()() for
variables to be considered constant, set the
gradient for the cost variable to g_cost, etc.
both should set the gradient for disconnected
inputs to a variable with type DisconnectedType()
wrt: the minimal set of variables that must be included in grad_dict
warn_type: if True, log a warning when a gradient term for a variable
has a different type from that variable
cost_name: The name of the cost being differentiated, optional.
used to name the grad with respect to x as
(d<cost_name>/dx)
returns: a list of gradients corresponding to wrt
"""
# build a dict mapping node to the terms node contributes to each of
# its inputs' gradients
term_dict = {}
# populate term_dict[node] and return it
def access_term_cache(node):
if node not in term_dict:
inputs = node.inputs
# Each Op's grad function requires inputs and output_grads
# If the Op destroys any input, but the grad expression uses it,
# then chances are the resulting graph will have a dependency
# cycle. We avoid this cycle by passing (symbolic) copies of
# each destroyed input.
try:
dinputs = [node.inputs[x[0]] for x in
node.op.destroy_map.values()]
except AttributeError:
dinputs = []
def try_to_copy_if_needed(var):
if var in dinputs and hasattr(var, 'copy'):
return var.copy()
return var
inputs = [try_to_copy_if_needed(ipt) for ipt in inputs]
output_grads = [access_grad_cache(var) for var in node.outputs]
if False in [isinstance(g.type, DisconnectedType)
for g in output_grads]:
# Some outputs of this op are connected to the cost so we must
# call the ops grad method
input_grads = node.op.grad(inputs, output_grads)
if input_grads is None:
raise TypeError("%s.grad returned NoneType, "
"expected iterable." % str(node.op))
if len(input_grads) != len(inputs):
raise ValueError(("%s returned the wrong number of" +\
" gradient terms.") % str(node.op))
else:
# All outputs of this op are disconnected so we can skip
# Calling the op's grad method and report that the inputs
# are disconnected
# (The op's grad method could do this too, but this saves the
# implementer the trouble of worrying about this case)
input_grads = [DisconnectedType()() for ipt in inputs]
# must convert to list in case the op returns a tuple
# we won't be able to post-process out the Nones if it does that
term_dict[node] = list(input_grads)
for i in xrange(len(term_dict[node])):
if term_dict[node][i] is None:
# we don't know what None means. In the past it has been
# used to mean undefined, zero, or disconnected, so for
# now we assume it is zero. Assuming it is zero prevents
# us from disconnecting NaNs above.
# Eventually we should disallow this return type and
# force all ops to return the correct thing:
# raise AssertionError('%s returned None for a gradient '
#                      'term, this is prohibited' % node.op)
term_dict[node][i] = node.inputs[i].zeros_like()
if warn_type:
g_r_type = term_dict[node][i].type
r_type = inputs[i].type
if g_r_type != r_type:
_logger.warning(
'%s.grad returned a different type (%s) '
'for input %i of type (%s)',
node.op, g_r_type, i, r_type)
return term_dict[node]
# populate grad_dict[var] and return it
def access_grad_cache(var):
if var not in grad_dict:
if var in var_to_node_to_idx:
terms = []
node_to_idx = var_to_node_to_idx[var]
for node in node_to_idx:
for idx in node_to_idx[node]:
if hasattr(node.op, 'connection_pattern'):
pattern = node.op.connection_pattern()
if not pattern[idx]:
continue
term = access_term_cache(node)[idx]
if not isinstance(term, gof.Variable):
raise TypeError("%s.grad returned %s, expected"
" Variable instance." % (str(node.op),
type(term)))
if isinstance(term.type, NullType):
raise TypeError("tensor.grad "
"encountered a NaN. " +\
term.type.why_null)
terms.append(term)
# The next line is like sum(terms) but doesn't add an
# extraneous TensorConstant(0).
grad_dict[var] = reduce(lambda x, y: x + y, terms)
if cost_name is not None and var.name is not None:
grad_dict[var].name = '(d%s/d%s)' % (cost_name, var.name)
else:
# this variable isn't connected to the cost in the computational
# graph
grad_dict[var] = DisconnectedType()()
return grad_dict[var]
rval = [access_grad_cache(elem) for elem in wrt]
return rval
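The traversal above sums per-node gradient terms with `reduce` rather than `sum`; a minimal stand-alone sketch of that design choice (helper name hypothetical):

```python
from functools import reduce  # a builtin in Python 2, functools in Python 3

def accumulate_terms(terms):
    # Like sum(terms), but reduce starts from the first term, so no
    # extraneous constant 0 is prepended to the expression.
    return reduce(lambda x, y: x + y, terms)
```

With symbolic variables, `sum(terms)` would build `0 + t0 + t1 + ...`, leaving a spurious TensorConstant(0) in the graph; starting the fold from the first term avoids that.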
def grad_sources_inputs(sources, graph_inputs, warn_type=True):
"""
Used to compute the gradient of a cost with respect to all the
variables between ``graph_inputs`` and the cost, in the special
case where you do not know the cost itself, only its gradient
on a set of intermediate values.
A gradient source is a pair (``v``, ``g_v``), in which ``v`` is
a `Variable`, and ``g_v`` is a `Variable` that is a gradient wrt
``v``. More specifically, ``g_v`` is the gradient of an external
scalar cost, ``cost`` (that is not explicitly used), wrt ``v``.
This function traverses the graph backward from the source variables,
calling ``op.grad(...)`` for all ops with some non-None gradient
on an output, to compute gradients of ``cost`` wrt intermediate
variables and ``graph_inputs``.
The ``op.grad(...)`` functions are called like this:
.. code-block:: python
op.grad(op.inputs[:], [total_gradient(v) for v in op.outputs])
This call to ``op.grad`` should return a list or tuple: one symbolic
gradient per input. These gradients represent the gradients of
the same implicit ``cost`` mentioned above, wrt ``op.inputs``. Note
that this is **not** the same as the gradient of ``op.outputs`` wrt
``op.inputs``.
If ``op`` has a single input, then ``op.grad`` should return a list
or tuple of length 1.
For each input with respect to which ``op`` is not differentiable, it should
return ``None`` instead of a `Variable` instance.
If a source ``r`` receives a gradient from another source ``r2``,
then the effective gradient on ``r`` is the sum of both gradients.
:type sources: list of pairs of Variable: (v, gradient-on-v) to
initialize the total_gradient dictionary
:param sources: gradients to back-propagate using chain rule
:type graph_inputs: list of Variable
:param graph_inputs: variables considered to be constant
(do not backpropagate through them)
:type warn_type: bool
:param warn_type: True will trigger warnings via the logging module when
the gradient on an expression has a different type than the original
expression
:rtype: dictionary whose keys and values are of type Variable
:return: mapping from each Variable encountered in the backward
traversal to the gradient with respect to that Variable.
It is assumed that there is some objective J shared between all members of
sources, so that for each v, gradient-on-v is the gradient of J with
respect to v
"""
outputs, output_grads = zip(*sources)
for output_grad in output_grads:
if not hasattr(output_grad, 'type'):
raise TypeError('output grads must be theano variables. '
'Ambiguous whether %s should be made into a tensor'
' or a sparse theano variable' % str(type(output_grad)))
if graph_inputs is None:
graph_inputs = gof.graph.inputs(outputs)
wrt = graph_inputs
var_to_node_to_idx = _populate_var_to_node_to_idx(outputs)
# build a dict mapping var to the gradient of cost with respect to var
grad_dict = {}
# by default, the gradient of the cost is 1
for output, output_grad in sources:
grad_dict[output] = output_grad
# variables that do not influence the cost have zero gradient.
# if wrt is such a variable, populate the grad_dict with this info
# so that wrt not being in var_to_node_to_idx won't cause an error below
# according to the flag, possibly raise an error if wrt is disconnected
for elem in wrt:
if elem not in var_to_node_to_idx and elem not in outputs:
grad_dict[elem] = DisconnectedType()()
_populate_grad_dict(var_to_node_to_idx,
grad_dict, wrt, warn_type)
# post-process out the DisconnectedTypes
for key in grad_dict:
if isinstance(grad_dict[key].type, DisconnectedType):
if hasattr(key, 'zeros_like'):
grad_dict[key] = key.zeros_like()
return grad_dict
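The `numeric_grad` class below estimates gradients by finite differences; the central-difference idea it relies on can be sketched in pure Python (helper name hypothetical, plain lists instead of ndarrays):

```python
def central_difference_grad(f, x, eps=1e-6):
    # Approximate df/dx_i by perturbing one coordinate at a time:
    # (f(x + eps*e_i) - f(x - eps*e_i)) / (2*eps)
    grads = []
    for i in range(len(x)):
        hi = list(x)
        lo = list(x)
        hi[i] += eps
        lo[i] -= eps
        grads.append((f(hi) - f(lo)) / (2 * eps))
    return grads
```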
class numeric_grad(object):
......@@ -902,7 +1087,7 @@ def verify_grad(fun, pt, n_tests=2, rng=None, eps=None,
as_tensor_variable(p).broadcastable)(name='input %i' % i)
for i, p in enumerate(pt)]
# fun can be either a function or an actual Op instance
o_output = fun(*tensor_pt)
if isinstance(o_output, list):
......@@ -929,6 +1114,7 @@ def verify_grad(fun, pt, n_tests=2, rng=None, eps=None,
return plain
t_r = shared(random_projection())
t_r.name = 'random_projection'
# random projection of o onto t_r
# This sum() is defined above, it's not the builtin sum.
......@@ -936,7 +1122,7 @@ def verify_grad(fun, pt, n_tests=2, rng=None, eps=None,
cost_fn = function(tensor_pt, cost)
# TODO: determine if this is actually needed
g_cost = as_tensor_variable(1.0, name='g_cost')
if cast_to_output_type:
g_cost = cast(g_cost, o_output.dtype)
......@@ -958,10 +1144,11 @@ def verify_grad(fun, pt, n_tests=2, rng=None, eps=None,
num_grad.max_err(analytic_grad, abs_tol, rel_tol)
if max_abs_err > abs_tol and max_rel_err > rel_tol:
raise verify_grad.E_grad(max_arg, max_err_pos,
max_abs_err, max_rel_err, abs_tol, rel_tol)
# get new random projection for next test
if test_num < n_tests - 1:
t_r.set_value(random_projection(), borrow=True)
......
......@@ -456,7 +456,7 @@ def test_elemwise_composite_support_code():
P = T.exp(-(Y - U) ** 2)
epsilon = numpy.asarray(0.001, dtype="float32")
NLL = -T.mean(T.log(P + epsilon)) # SupportCodeError
G = theano.gradient.grad(NLL, wrt=[W])
backup = theano.config.warn.identify_1pexp_bug
theano.config.warn.identify_1pexp_bug = False
......@@ -468,6 +468,7 @@ def test_elemwise_composite_support_code():
topo = f_grad.maker.fgraph.toposort()
assert sum([isinstance(node.op, T.Elemwise) for node in topo]) == 1
# I suspect this was failing in the original branch too
assert sum([isinstance(node.op, tcn.GpuElemwise) for node in topo]) == 1
......
......@@ -258,7 +258,7 @@ class T_Images2Neibs(unittest_tools.InferShapeTester):
def fn(images):
return images2neibs(images, (3, 3), mode='wrap_centered')
self.assertRaises(TypeError, unittest_tools.verify_grad,
fn, [images_val], mode=self.mode)
......@@ -276,7 +276,7 @@ class T_Images2Neibs(unittest_tools.InferShapeTester):
# are not the same.
def fn(images):
return images2neibs(images, (2, 2), (1, 1))
self.assertRaises(TypeError,
unittest_tools.verify_grad, fn, [images_val],
mode=self.mode)
......
......@@ -488,6 +488,9 @@ class _scalar_py_operators:
def __rmod__(self, other): return mod(other, self)
def __rpow__(self, other): return pow(other, self)
def zeros_like(self):
return ScalarConstant(Scalar(str(self.type.dtype)), 0)
class ScalarVariable(_scalar_py_operators, Variable):
pass
......
......@@ -29,6 +29,8 @@ from theano import gof
from theano.tensor import TensorType
from theano import tensor
from theano.tensor.opt import Shape_i
from theano.gradient import grad_undefined
from theano.gradient import DisconnectedType
# from theano.sandbox import cuda
from theano.compile.profiling import ScanProfileStats
......@@ -431,7 +433,7 @@ class Scan(PureOp):
aux_txt += str(k) + ','
aux_txt += '},%s,%s}'
else:
aux_txt += '{%s,%s}'
aux_txt = aux_txt % (name, gpu_str, str(self.name))
return aux_txt
......@@ -1161,6 +1163,17 @@ class Scan(PureOp):
### GRAD FUNCTION
def grad(self, args, g_outs):
# This discards information about whether incoming gradients are 0
# or disconnected from the cost
# TODO: upgrade scan op to report disconnection correctly
def strip_disconnected(g):
if isinstance(g.type, DisconnectedType):
return None
return g
g_outs = [strip_disconnected(g) for g in g_outs]
# 1. forward pass - get the outputs after applying scan
scan_outputs = self(*args)
# 2. make sure they are given as a list
......@@ -1512,7 +1525,7 @@ class Scan(PureOp):
if type(outputs) not in (list, tuple):
outputs = [outputs]
# Re-order the gradients correctly
gradients = [grad_undefined(self, 0, args[0], 'Number of steps')]
offset = (self.n_mit_mot +
self.n_mit_sot +
......@@ -1522,8 +1535,16 @@ class Scan(PureOp):
end = self.n_mit_mot + self.n_mit_sot + self.n_sit_sot
gradients += [x[::-1] for x in outputs[:end]]
start = len(gradients)
gradients += [
grad_undefined(self, x + start, args[x + start],
'Shared Variable with update')
for x in xrange(self.n_shared_outs)]
start = len(gradients)
gradients += [
grad_undefined(self, x + start, args[x + start],
'Dimension of memory buffer for output')
for x in xrange(self.n_nit_sot)]
begin = end
end = begin + n_sitsot_outs
......@@ -1547,7 +1568,8 @@ class Scan(PureOp):
rop_self_outputs = self_outputs
if self.info['n_shared_outs'] > 0:
rop_self_outputs = rop_self_outputs[:-self.info['n_shared_outs']]
rop_outs = tensor.Rop(rop_self_outputs, rop_of_inputs,
inner_eval_points)
if type(rop_outs) not in (list, tuple):
rop_outs = [rop_outs]
# Step 2. Figure out what corresponds to what in the scan
......@@ -1653,7 +1675,7 @@ class Scan(PureOp):
scan_sit_sot = inputs[b:e] + clean_eval_points
inner_sit_sot = self_inputs[ib:ie] + inner_eval_points[ib:ie]
# Shared outs ...
b = e
e = e + self.n_shared_outs
ib = ie
......@@ -1738,7 +1760,7 @@ class Scan(PureOp):
b = e + self.n_nit_sot
e = e + self.n_nit_sot * 2
final_outs += outputs[b:e]
final_outs += [None] * self.n_shared_outs
return final_outs
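The `strip_disconnected` helper at the top of `Scan.grad` collapses disconnection information into `None`; its behavior can be sketched in pure Python (the `Disconnected` class here is a hypothetical stand-in for variables of `DisconnectedType`, not the real Theano class):

```python
class Disconnected(object):
    # hypothetical stand-in for a variable whose type is DisconnectedType
    pass

def strip_disconnected(g):
    # Scan's grad discards whether an incoming gradient was disconnected,
    # mapping it to None (the information loss noted in the TODO above).
    return None if isinstance(g, Disconnected) else g
```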
......
......@@ -1816,10 +1816,12 @@ class T_Scan(unittest.TestCase):
def test_scan_extra_inputs_hessian(self):
x = theano.tensor.vector('x')
A = theano.tensor.matrix('A')
fc1 = theano.shared(0.5, name='fc1')
fc2 = theano.shared(0.9, name='fc2')
y = fc1 * theano.dot(x * x, theano.dot(A, x))
y.name = 'y'
gy = theano.tensor.grad(y, x)
gy.name = 'gy'
hy, updates = theano.scan(
lambda i, gy, x: theano.tensor.grad(gy[i] * fc2, x),
sequences=theano.tensor.arange(gy.shape[0]),
......@@ -1829,7 +1831,9 @@ class T_Scan(unittest.TestCase):
vx = numpy.array([1., 1.], dtype=theano.config.floatX)
vA = numpy.array([[1., 1.], [1., 0.]], dtype=theano.config.floatX)
vR = numpy.array([[3.6, 1.8], [1.8, 0.9]], dtype=theano.config.floatX)
out = f(vx, vA)
assert numpy.allclose(out, vR)
def test_cloning_no_replace_strict_copy_inputs(self):
# This has nothing to do with scan, but it refers to the clone
......@@ -3479,14 +3483,15 @@ def test_compute_test_value():
backup = theano.config.compute_test_value
theano.config.compute_test_value = 'raise'
try:
x = tensor.vector('x')
xv = numpy.ones(3, dtype=theano.config.floatX)
x.tag.test_value = xv
y = theano.shared(numpy.arange(3, dtype=theano.config.floatX), name='y')
z, _ = theano.scan(
fn=lambda u, v: u + v,
sequences=[x, y])
assert not _
z.name = 'z'
# The gradient computation used to crash before 6af465e.
g = tensor.grad(z.sum(), x)
# f = theano.function([x], g)
......
......@@ -7,7 +7,6 @@ http://www-users.cs.umn.edu/~saad/software/SPARSKIT/paper.ps
# TODO
# Automatic methods for determining best sparse format?
from itertools import izip
import sys
import numpy
......@@ -16,14 +15,14 @@ import scipy.sparse
from theano import gof, tensor, compile, scalar, config
from theano.gof.python25 import all
from theano.tensor import blas
from theano.gradient import DisconnectedType
from theano.sparse.utils import hash_from_sparse
import theano.tests.unittest_tools as utt
sparse_formats = ['csc', 'csr']
# TODO: move this decorator to the compile submodule
def register_specialize(lopt, *tags, **kwargs):
compile.optdb['specialize'].register((kwargs and kwargs.pop('name')) or
lopt.__name__, lopt, 'fast_run',
......@@ -256,7 +255,7 @@ def sp_zeros_like(x):
:return: The same as `x` with zero entries
for all elements.
"""
# TODO: don't restrict to CSM formats
_, _, indptr, shape = csm_properties(x)
return CSM(format=x.format)(numpy.array([], dtype=x.type.dtype),
numpy.array([]), tensor.zeros_like(indptr),
......@@ -291,7 +290,7 @@ class _sparse_py_operators:
def __rmul__(left, right):
return mul(left, right)
# extra pseudo-operator symbols
def __dot__(left, right):
return structured_dot(left, right)
......@@ -299,12 +298,12 @@ class _sparse_py_operators:
def __rdot__(right, left):
return structured_dot(left, right)
# N.B. THIS IS COMMENTED OUT ON PURPOSE!!!
# Discussion with Fred & James (at least, and maybe others before)
# we decided that casting from a sparse to dense should be explicit
# because it's usually something you just want to be pretty careful
# about, and not to do by accident.
# def _as_TensorVariable(self):
# return dense_from_sparse(self)
shape = property(lambda self: tensor.shape(dense_from_sparse(self)))
......@@ -441,7 +440,7 @@ class SparseType(gof.Type):
if strict:
raise TypeError("%s is not sparse, or not the right dtype (is %s, "
"expected %s)" % (value, value.dtype, self.dtype))
# The input format could be converted here
if allow_downcast:
sp = self.format_cls[self.format](value, dtype=self.dtype)
else:
......@@ -488,7 +487,7 @@ class SparseType(gof.Type):
return "Sparse[%s, %s]" % (str(self.dtype), str(self.format))
def values_eq_approx(self, a, b, eps=1e-6):
# WARNING: equality comparison of sparse matrices is not fast or easy
# we definitely do not want to be doing this un-necessarily during
# a FAST_RUN computation..
if not scipy.sparse.issparse(a) or not scipy.sparse.issparse(b):
......@@ -504,7 +503,7 @@ class SparseType(gof.Type):
return max(diff.data) < eps
def values_eq(self, a, b):
# WARNING: equality comparison of sparse matrices is not fast or easy
# we definitely do not want to be doing this un-necessarily during
# a FAST_RUN computation..
return scipy.sparse.issparse(a) \
......@@ -619,14 +618,25 @@ class CSMProperties(gof.Op):
out[0][0] = csm.data[self.kmap]
if str(csm.data.dtype) == 'int32':
out[0][0] = theano._asarray(out[0][0], dtype='int32')
# backport
# out[0][0] = csm.data if self.kmap is None else csm.data[self.kmap]
out[1][0] = theano._asarray(csm.indices, dtype='int32')
out[2][0] = theano._asarray(csm.indptr, dtype='int32')
out[3][0] = theano._asarray(csm.shape, dtype='int32')
def grad(self, (csm,), g):
# g[1:] are the gradients on integer outputs, so their Jacobian
# in this op is 0. We thus don't need to worry about what their
# values are.
# if g[0] is disconnected, then this op doesn't contribute
# any gradient anywhere. but we know that at least one of
# g[1:] is connected, or this grad method wouldn't have been
# called, so we should report zeros
if isinstance(g[0].type, DisconnectedType):
return [csm.zeros_like()]
data, indices, indptr, shape = csm_properties(csm)
return [CSM(csm.format)(g[0], indices, indptr, shape)]
# don't make this a function or it breaks some optimizations below
......@@ -662,10 +672,10 @@ class CSM(gof.Op):
:param data: One dimensional tensor representing
the data of the sparse matrix to construct.
:param indices: One dimensional tensor of integers
representing the indices of the sparse
matrix to construct.
:param indptr: One dimensional tensor of integers
representing the index pointer for
the sparse matrix to construct.
:param shape: One dimensional tensor of integers
......@@ -673,9 +683,9 @@ class CSM(gof.Op):
matrix to construct.
:return: A sparse matrix having the properties
specified by the inputs.
:note: The grad method returns a dense vector, so it provides
a regular grad.
"""
......@@ -774,10 +784,10 @@ class CSM(gof.Op):
def grad(self, (x_data, x_indices, x_indptr, x_shape), (g_out,)):
g_data, g_indices, g_indptr, g_shape = csm_properties(g_out)
# unpack the data vector and wrap it as a 1d TensorType
g_data = csm_grad(self.kmap)(x_data, x_indices, x_indptr, x_shape,
g_data, g_indices, g_indptr, g_shape)
return [g_data, DisconnectedType()(), DisconnectedType()(),
DisconnectedType()()]
def infer_shape(self, node, shapes):
if self.kmap is None:
......@@ -1195,7 +1205,7 @@ class GetItemScalar(gof.op.Op):
if isinstance(ind, slice):
raise Exception("GetItemScalar called with a slice as index!")
# in case of indexing using an int instead of a theano variable
elif isinstance(ind, int):
ind = theano.tensor.constant(ind)
input_op += [ind]
......@@ -2026,7 +2036,7 @@ class MulSD(gof.op.Op):
def make_node(self, x, y):
x, y = as_sparse_variable(x), tensor.as_tensor_variable(y)
# upcast the tensor. Is an implicit cast for sparse implemented?
dtype = scalar.upcast(x.type.dtype, y.type.dtype)
if y.type.dtype != dtype:
y = tensor.cast(y, dtype)
......@@ -2049,7 +2059,7 @@ class MulSD(gof.op.Op):
elif len(y.shape) == 2:
# if we have enough memory to fit y, maybe we can fit x.asarray()
# too?
# TODO: change runtime from O(M*N) to O(nonzeros)
M, N = x.shape
assert x.shape == y.shape
......@@ -2810,7 +2820,7 @@ class StructuredDot(gof.Op):
raise ValueError('shape mismatch in StructuredDot.perform',
(a.shape, b.shape))
# variable = a.dot(b)  # deprecated
variable = a * b
if isinstance(node.outputs[0].type, SparseType):
assert _is_sparse(variable)
......@@ -2843,8 +2853,8 @@ class StructuredDot(gof.Op):
raise Exception("a.shape=%s, b.shape=%s, variable.shape=%s"
" ??? I have no idea why" % (a.shape, b.shape, variable.shape))
# The cast is needed as otherwise we hit the bug mentioned in
# the theano._asarray function documentation.
out[0] = theano._asarray(variable, str(variable.dtype))
def grad(self, (a, b), (g_out,)):
......@@ -3229,7 +3239,7 @@ class SamplingDot(gof.op.Op):
if not _is_sparse_variable(p):
raise TypeError(p)
# TODO: use it.
dtype_out = scalar.upcast(x.type.dtype, y.type.dtype, p.type.dtype)
return gof.Apply(self, [x, y, p], [p.type()])
......
......@@ -25,6 +25,7 @@ from theano.tensor.utils import hash_from_ndarray
from theano.scalar import ComplexError, IntegerDivisionError
import theano.scalar.sharedvar
from theano.gradient import grad_undefined
from theano.gradient import DisconnectedType
### set up the external interface
from elemwise import Elemwise, DimShuffle, CAReduce, Sum
......@@ -32,7 +33,7 @@ from elemwise import Elemwise, DimShuffle, CAReduce, Sum
import logging
_logger = logging.getLogger("theano.tensor.basic")
# This is needed as we will hide it later
python_complex = complex
python_any = any
python_all = all
......@@ -47,6 +48,7 @@ continuous_dtypes = map(str, scal.continuous_types)
discrete_dtypes = map(str, scal.discrete_types)
all_dtypes = map(str, scal.all_types)
class ShapeError(Exception):
"""Raised when the shape cannot be computed."""
pass
......@@ -108,7 +110,7 @@ if 0:
transfer the value to the GPU
"""
if hasattr(x, '_as_CudaNdarrayVariable'):
# TODO: pass name and ndim arguments
return x._as_CudaNdarrayVariable()
return as_tensor_variable(x, name, ndim)
......@@ -142,7 +144,7 @@ def as_tensor_variable(x, name=None, ndim=None):
return x._as_TensorVariable() # TODO: pass name and ndim arguments
if isinstance(x, gof.Apply):
# TODO: use Apply's default output mechanism
if len(x.outputs) != 1:
raise ValueError(
"It is ambiguous which output of a multi-output Op has"
......@@ -161,7 +163,7 @@ def as_tensor_variable(x, name=None, ndim=None):
return x
else:
if (x.type.ndim > ndim):
# TODO: strip off leading broadcastable dimensions
raise ValueError(
'TensorType could not be cast to have %i dimensions' %
ndim, x.type)
......@@ -369,7 +371,7 @@ def constant_or_value(x, rtype, name=None, ndim=None, dtype=None):
if len(bcastable) < ndim:
bcastable = [True] * (ndim - len(bcastable)) + bcastable
elif len(bcastable) > ndim:
# TODO: strip off dimensions of size 1
raise ValueError(
'ndarray could not be cast to constant with %i dimensions' %
ndim)
......@@ -394,6 +396,7 @@ def constant(x, name=None, ndim=None, dtype=None):
return constant_or_value(x, rtype=TensorConstant, name=name, ndim=ndim,
dtype=dtype)
def _obj_is_wrappable_as_tensor(x):
try:
constant(x)
......@@ -405,7 +408,7 @@ def _obj_is_wrappable_as_tensor(x):
def _wrap_tensor_into_member(x):
return compile.module.Member(constant(x))
compile.module.register_wrapper(_obj_is_wrappable_as_tensor,
_wrap_tensor_into_member, no_warn=True)
if int(config.tensor.cmp_sloppy) > 1:
......@@ -427,15 +430,15 @@ elif int(config.tensor.cmp_sloppy):
float64_rtol = 1e-4
float64_atol = 1e-3
else:
# If you change those values in a test, don't forget to put them
# back when the test ends, including when the test fails.
float32_atol = 1e-5
float32_rtol = 1e-5
# defaults in numpy.allclose
float64_rtol = 1.0000000000000001e-05
float64_atol = 1e-8
# More strict. At least float32 precision.
float64_rtol = 1.0000000000000001e-06
......@@ -494,9 +497,9 @@ def get_constant_value(v):
shape, val = v.owner.inputs
# fill(a,b) fills the shape of 'a' filled with 'b'
return get_constant_value(val)
# Don't act as the constant_folding optimization here as this
# function is used too early in the optimization phase. This would
# mess with the stabilization optimization.
if isinstance(v.owner.op, Elemwise) and isinstance(
v.owner.op.scalar_op, scal.Cast):
const = get_constant_value(v.owner.inputs[0])
......@@ -529,7 +532,7 @@ def get_constant_value(v):
ret = v.owner.inputs[0].owner.inputs[
v.owner.op.idx_list[0] + 1]
ret = get_constant_value(ret)
# join can implicitly cast its inputs in some cases.
return theano._asarray(ret, dtype=v.type.dtype)
if (v.owner.inputs[0].owner and
isinstance(v.owner.inputs[0].owner.op,
......@@ -542,7 +545,7 @@ def get_constant_value(v):
ret = v.owner.inputs[0].owner.inputs[v.owner.op.idx_list[0]]
ret = get_constant_value(ret)
# MakeVector can implicitly cast its inputs in some cases.
return theano._asarray(ret, dtype=v.type.dtype)
# This is needed when we take the grad as the Shape op
......@@ -747,8 +750,8 @@ class TensorType(Type):
This function is used internally as part of C code generation.
"""
# TODO: add more type correspondences for e.g. int32, int64, float32,
# complex64, etc.
try:
return {
'float32': (float, 'npy_float32', 'NPY_FLOAT32'),
......@@ -786,7 +789,7 @@ class TensorType(Type):
@staticmethod
def values_eq(a, b, force_same_dtype=True):
# TODO: check to see if the shapes must match
# for now, we err on safe side...
if a.shape != b.shape:
return False
......@@ -863,14 +866,14 @@ class TensorType(Type):
# Find places where both a and b have inf of the same sign.
both_inf = a_inf * numpy.isinf(b)
# cmp_elemwise is weird when we have inf and -inf.
# Set it to False.
cmp_elemwise = numpy.where(
both_inf & cmp_elemwise,
a == b,
cmp_elemwise)
# check the sign of the inf
both_inf = numpy.where(both_inf, (a == b), both_inf)
if allow_remove_inf:
......@@ -1244,21 +1247,21 @@ tensor4s, ftensor4s, dtensor4s, itensor4s, ltensor4s = _multi(
class _tensor_py_operators:
# UNARY
def __abs__(self):
return abs_(self)
def __neg__(self):
return neg(self)
# CASTS
#### REMOVED THESE BECAUSE PYTHON appears to require __int__ to return
#### an int. -JB 20081112
#def __int__(self): return convert_to_int32(self)
#def __float__(self): return convert_to_float64(self)
#def __complex__(self): return convert_to_complex128(self)
# COMPARISONS
_is_nonzero = True
def __lt__(self, other):
......@@ -1294,7 +1297,7 @@ class _tensor_py_operators:
else:
raise TypeError("Variable does not support boolean operations.")
# BITWISE
def __invert__(self):
return invert(self)
......@@ -1316,16 +1319,16 @@ class _tensor_py_operators:
def __rxor__(self, other):
return xor(other, self)
# def __iand__(self, other):
# return _and_inplace(self, other)
#
# def __ior__(self, other):
# return _or_inplace(self, other)
#
# def __ixor__(self, other):
# return _xor_inplace(self, other)
# ARITHMETIC - NORMAL
def __add__(self, other):
try:
return add(self, other)
......@@ -1439,7 +1442,7 @@ class _tensor_py_operators:
def __rpow__(self, other):
return pow(other, self)
# TRANSPOSE
T = property(lambda self: transpose(self))
def transpose(self, *axes):
......@@ -1502,10 +1505,9 @@ class _tensor_py_operators:
"""
if ndim is not None:
if not isinstance(ndim, int):
raise ValueError("Expected ndim to be an integer, is "
+ str(type(ndim)))
return reshape(self, shape, ndim=ndim)
......@@ -1542,7 +1544,7 @@ class _tensor_py_operators:
def astype(self, dtype):
return cast(self, dtype)
# SLICING
# Do not define __getslice__ here:
# When calling t[1:], for instance, the arguments passed to __getslice__
# are (1, sys.maxsize), which is a pain to deal with, and can even not be
......@@ -1602,7 +1604,7 @@ class _tensor_py_operators:
return Subtensor(args)(self, *Subtensor.collapse(args,
lambda entry: isinstance(entry, Variable)))
# COPYING
def copy(self):
return tensor_copy(self)
......@@ -1629,7 +1631,7 @@ class _tensor_py_operators:
dtype = property(lambda self: self.type.dtype)
""" The dtype of this tensor. """
# extra pseudo-operator symbols
def __dot__(left, right):
return dot(left, right)
......@@ -1649,7 +1651,7 @@ class _tensor_py_operators:
raise NotImplementedError()
if numpy.isinf(L):
raise NotImplementedError()
# optimizations will/should catch cases like L=1, L=2
return pow(pow(abs_(self), L).sum(axis=axis), 1.0 / L)
def mean(self, axis=None, dtype=None, keepdims=False):
......@@ -1668,7 +1670,7 @@ class _tensor_py_operators:
"""See `theano.tensor.max`"""
return max(self, axis, keepdims=keepdims)
# TO TRUMP NUMPY OPERATORS
__array_priority__ = 1000
def get_constant_value(self):
......@@ -1697,7 +1699,7 @@ class TensorConstantSignature(tuple):
except Exception:
return False
# N.B. compare shape to ensure no broadcasting in ==
if t0 != t1 or d0.shape != d1.shape:
return False
......@@ -1802,7 +1804,6 @@ class TensorConstant(_tensor_py_operators, Constant):
TensorType.Constant = TensorConstant
Tensor = TensorType
......@@ -1816,6 +1817,7 @@ elemwise.TensorConstant = TensorConstant
# Utilities
#########################
def _redefine(real_symbol_value, module='tensor'):
"""Replace the value associated with a function symbol.
......@@ -1872,7 +1874,7 @@ def _scal_elemwise_with_nfunc(nfunc, nin, nout):
if getattr(symbol, '__doc__', False):
rval.__doc__ = symbol.__doc__ + '\n' + rval.__doc__
# for the meaning of this see the ./epydoc script
# it makes epydoc display rval as if it were a function, not an object
rval.__epydoc_asRoutine = symbol
rval.__module__ = 'tensor'
......@@ -1965,7 +1967,7 @@ class ScalarFromTensor(Op):
scalar_from_tensor = ScalarFromTensor()
# to be removed as we get the epydoc routine-documenting thing going
# -JB 20080924
def _conversion(real_value, name):
__oplist_tag(real_value, 'casting')
......@@ -2061,6 +2063,7 @@ def cast(x, dtype):
# Unary Operations
##########################
class Shape(Op):
"""
L{Op} to return the shape of a matrix.
......@@ -2077,13 +2080,13 @@ class Shape(Op):
return self.__class__.__name__
def make_node(self, x):
# Must work for all types that have a shape attribute.
# This will fail at execution time.
x = as_tensor_variable(x)
# Each variable type should implement its .shape attribute
# and have the function infer_shape() implemented in the op that
# converts the type to TensorVariable, so the shape optimizations
# work correctly.
return Apply(self, [x], [lvector()])
def perform(self, node, inp, out_):
......@@ -2094,8 +2097,21 @@ class Shape(Op):
def infer_shape(self, node, in_shapes):
return [[len(in_shapes[0])]]
def connection_pattern(self):
# the grad returns the gradient with respect to the
# elements of a tensor variable
# the elements of the tensor variable do not participate
# in the computation of the shape, so they are not really
# part of the graph
return [False]
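How a gradient traversal could consume `connection_pattern`, defaulting to all-True when the op does not define one, can be sketched as follows. This is a simplified sketch with hypothetical stand-ins, not the real Theano classes:

```python
def connected_input_indices(op, n_inputs):
    # Indices of inputs whose elements influence the output; absent a
    # connection_pattern, assume every input is connected (the
    # documented default).
    if hasattr(op, 'connection_pattern'):
        pattern = op.connection_pattern()
    else:
        pattern = [True] * n_inputs
    return [i for i, connected in enumerate(pattern) if connected]

class FakeShapeOp(object):
    # Shape-like op: the output depends only on metadata, not elements.
    def connection_pattern(self):
        return [False]
```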
def grad(self, inp, grads):
# the grad returns the gradient with respect to the
# elements of a tensor variable
# the elements of the tensor variable do not participate
# in the computation of the shape, so they are not really
# part of the graph
return [None]
def R_op(self, inputs, eval_points):
return [None]
......@@ -2113,7 +2129,7 @@ def old_shape(a):
shape at graph-execution time.
"""
va = as_tensor_variable(a)
# print 'HERE', va, va.type
if None in va.type.shape:
# Some shape components are unknown at this time
return _shape(va)
......@@ -2314,9 +2330,21 @@ class MaxAndArgmax(Op):
x, axis = inp
g_max, g_max_idx = grads
g_max_disconnected = isinstance(g_max.type, DisconnectedType)
g_max_idx_disconnected = isinstance(g_max_idx.type, DisconnectedType)
# if the op is totally disconnected, so are its inputs
if g_max_disconnected and g_max_idx_disconnected:
return [DisconnectedType()(), DisconnectedType()()]
axis_grad = grad_undefined(self, 1, axis,
"argmax is not defined for non-integer axes so"
" argmax(x, axis+eps) is undefined")
# if the max is disconnected but the argmax is not,
# the gradient on its inputs is zero
if g_max_disconnected:
return [x.zeros_like(), axis_grad]
xmax = max(x, axis)
# Raise the g_max and xmax to the same number of dim as the input.
......@@ -2336,7 +2364,7 @@ class MaxAndArgmax(Op):
# Set the grad to the correct position.
g_x = eq(xmax_pad, x) * g_max_pad
return g_x, grad_undefined(self, 1, axis)
return g_x, axis_grad
def __str__(self):
return self.__class__.__name__
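The branch structure of the new `MaxAndArgmax.grad` above can be sketched in plain Python, with a sentinel object standing in for a `DisconnectedType` gradient (all names here are illustrative, not from the commit):

```python
DISCONNECTED = object()  # stand-in for a DisconnectedType instance

def max_and_argmax_grad(g_max, g_max_idx, x_zeros, axis_grad, g_x):
    """Mirror of the branches in MaxAndArgmax.grad: g_x is the ordinary
    elementwise gradient, precomputed by the caller."""
    if g_max is DISCONNECTED and g_max_idx is DISCONNECTED:
        # the op's outputs are totally disconnected, so are its inputs
        return [DISCONNECTED, DISCONNECTED]
    if g_max is DISCONNECTED:
        # max is disconnected but argmax is not: zero gradient on x,
        # and the axis gradient is undefined either way
        return [x_zeros, axis_grad]
    return [g_x, axis_grad]
```

The key point of the change is that "disconnected" and "zero" are distinct answers: a disconnected input is not part of the gradient graph at all, while a zero gradient still participates in it.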
......@@ -2458,7 +2486,7 @@ def min(x, axis=None, keepdims=False):
if str_x_type.startswith('float') or str_x_type in int_dtypes:
return -max(-x, axis=axis, keepdims=keepdims)
else:
#Be careful about unsigned integers, complex
# Be careful about unsigned integers, complex
raise NotImplementedError()
......@@ -2479,7 +2507,7 @@ def argmin(x, axis=None, keepdims=False):
if str_x_type.startswith('float') or str_x_type in int_dtypes:
return argmax(-x, axis=axis, keepdims=keepdims)
else:
#Be careful about unsigned integers, complex
# Be careful about unsigned integers, complex
raise NotImplementedError()
......@@ -2707,7 +2735,7 @@ def sqr(a):
"""square of a"""
#alias to sqr, included to maintain similarity with numpy interface
# alias to sqr, included to maintain similarity with numpy interface
square = sqr
......@@ -2849,7 +2877,8 @@ def complex_from_polar(abs, angle):
# Misc
##########################
#fill, _fill_inplace = _elemwise(scal.second, 'fill',
# fill, _fill_inplace = _elemwise(scal.second, 'fill',
#"""fill WRITEME (elemwise)""")
@_scal_elemwise
def second(a, b):
......@@ -2917,7 +2946,7 @@ class Eye(gof.Op):
return [out_shape]
def grad(self, inp, grads):
return [ grad_undefined(self,i,inp[i]) for i in xrange(3) ]
return [grad_undefined(self, i, inp[i]) for i in xrange(3)]
def __eq__(self, other):
return type(self) == type(other) and self.dtype == other.dtype
......@@ -3092,7 +3121,7 @@ class Alloc(gof.Op):
out[0] = numpy.empty(sh, dtype=v.dtype)
out[0][...] = v # broadcast v to fill us up
else:
#reuse the allocated memory.
# reuse the allocated memory.
out[0][...] = v # broadcast v to fill us up
def c_code(self, node, name, inp, out, sub):
......@@ -3280,12 +3309,12 @@ class Mean(elemwise.CAReduce):
if self.axis is not None:
return super(Op, self).c_code(node, name, inames, onames, sub)
ret = elemwise.CAReduce.c_code(self, node, name, inames, onames, sub)
#TODO: c_code perform support only axis is None
# TODO: c_code perform support only axis is None
return ret + """
*((double *)PyArray_DATA(%s)) /= PyArray_SIZE(%s);
""" % (onames[0], inames[0])
#TODO: implement the grad. When done and tested, you can make this the default
# TODO: implement the grad. When done and tested, you can make this the default
# version.
# def grad(self, (x,), (gout,)):
# import pdb;pdb.set_trace()
......@@ -3379,28 +3408,33 @@ def var(input, axis=None, keepdims=False):
if isinstance(axis, int):
axis = [axis]
#compute the axis-wise mean
# compute the axis-wise mean
mean_input = mean(input, axis, keepdims=True)
#center the input
# center the input
centered_input = input - mean_input
#return the mean sqr
# return the mean sqr
return mean((centered_input ** 2), axis, keepdims=keepdims)
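The three steps of `var()` above (axis-wise mean, centering, mean of squares) reduce, for the 1-D case, to the following plain-Python sketch (`var_1d` is an illustrative name, not part of the commit):

```python
def var_1d(values):
    """Population variance of a 1-D sequence, following the same three
    steps as var() above: mean, center, then mean of the squares."""
    n = len(values)
    mean_input = sum(values) / float(n)            # compute the mean
    centered = [v - mean_input for v in values]    # center the input
    return sum(c * c for c in centered) / float(n)  # return the mean sqr
```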
@constructor
def std(input, axis=None, keepdims=False):
"""
Computes the standard deviation along the given axis(es) of a tensor `input`.
Computes the standard deviation along the given axis(es)
of a tensor `input`.
:param axis: Compute the standard deviation along this axis of the tensor.
:param axis: Compute the standard deviation along this
axis of the tensor.
None means all axes (like numpy).
:type axis: None or int or (list of int) (see `Sum`)
:param keepdims: If this is set to True, the axes which are reduced are
left in the result as dimensions with size one. With this option,
the result will broadcast correctly against the original tensor.
:param keepdims: If this is set to True, the axes
which are reduced are
left in the result as dimensions with size one.
With this option,
the result will broadcast correctly against the
original tensor.
"""
return sqrt(var(input=input, axis=axis, keepdims=keepdims))
......@@ -3423,8 +3457,8 @@ if 0:
type = TensorType(dtype=input.type.dtype,
broadcastable=broadcastable)
#backport
#type = TensorType(dtype=input.type.dtype,
# backport
# type = TensorType(dtype=input.type.dtype,
# broadcastable=[
# False if i==axis else x
# for i, x in enumerate(input.broadcastable)])
......@@ -3859,7 +3893,7 @@ class Subtensor(Op):
exception.subtensor_invalid = True
raise exception
#infer the broadcasting pattern
# infer the broadcasting pattern
padded = (idx_list
+ [slice(None, None, None)] * (x.type.ndim - len(idx_list)))
broadcastable = [bc for p, bc in zip(padded, x.type.broadcastable)
......@@ -3942,7 +3976,7 @@ class Subtensor(Op):
return type(self) == type(other) and self.idx_list == other.idx_list
def __hash__(self):
#TODO: optimize by cache this hash value
# TODO: optimize by cache this hash value
msg = []
for entry in self.idx_list:
if isinstance(entry, slice):
......@@ -3951,8 +3985,8 @@ class Subtensor(Op):
msg += [entry]
idx_list = tuple(msg)
#backport
#idx_list = tuple((entry.start, entry.stop, entry.step)
# backport
# idx_list = tuple((entry.start, entry.stop, entry.step)
# if isinstance(entry, slice)
# else entry
# for entry in self.idx_list)
......@@ -3989,7 +4023,7 @@ class Subtensor(Op):
fail = sub['fail']
init_cmds = [] # initialization for subtensor_spec
is_slice = []
#TODO: change that, it might lead to unexpected results,
# TODO: change that, it might lead to unexpected results,
# see assembla-#767
NONE_CODE = maxsize - 1
......@@ -4040,7 +4074,7 @@ class Subtensor(Op):
for entry in idx_list:
init_entry(entry)
#make sure we used all inputs
# make sure we used all inputs
assert input_pos() == len(inputs), input_pos()
assert len(is_slice) <= node.inputs[0].ndim, node.inputs[0].ndim
......@@ -4213,7 +4247,7 @@ class Subtensor(Op):
}
PyArray_UpdateFlags(xview, NPY_C_CONTIGUOUS|NPY_F_CONTIGUOUS);
""" % locals()
#print rval
# print rval
return rval
@staticmethod
......@@ -4398,7 +4432,7 @@ class IncSubtensor(Op):
msg += [entry]
idx_list = tuple(msg)
#backport
# backport
#idx_list = tuple((entry.start, entry.stop, entry.step)
# if isinstance(entry, slice)
# else entry
......@@ -4675,7 +4709,7 @@ class Split(Op):
def perform(self, node, inputs, outputs):
"""WRITEME"""
x, axis, splits = inputs
#in python 2.4, x.shape[numpy.asarray(1)] don't work.
# in python 2.4, x.shape[numpy.asarray(1)] doesn't work.
if sys.version_info[0:2] == (2, 4) and axis.size == 1:
axis = int(axis)
......@@ -5376,7 +5410,6 @@ class Reshape(Op):
raise ValueError('Cannot reshape input of shape %s to shape %s' %
(x.shape, shp))
def grad(self, inp, grads):
x, shp = inp
g_out, = grads
......@@ -5399,7 +5432,7 @@ class Reshape(Op):
# The following expression leads to cycles in feature_shape,
# because it tries to replace the Shape_i node by the switch
# statement, which depends on Shape_i.
#return [tuple([switch(eq(node.inputs[1][i], -1),
# return [tuple([switch(eq(node.inputs[1][i], -1),
# theano.tensor.opt.Shape_i(i)(node.outputs[0]),
# node.inputs[1][i])
# for i in xrange(self.ndim)]
......@@ -5462,7 +5495,8 @@ class Reshape(Op):
%(shp)s->data + ii * %(shp)s->strides[0]))[0];
}
Py_XDECREF(%(z)s);
%(z)s = (PyArrayObject *) PyArray_Newshape(%(x)s, &newshape, PyArray_CORDER);
%(z)s = (PyArrayObject *) PyArray_Newshape(%(x)s, &newshape,
PyArray_CORDER);
if (!%(z)s)
{
PyErr_Format(PyExc_ValueError,
......@@ -5557,7 +5591,7 @@ def flatten(x, outdim=1):
# """
# Calculates the gradient of the Tile Op.
# """
# #this is so weird, I can't think of how to make this a general thing.
# # this is so weird, I can't think of how to make this a general thing.
# def make_node(self, x, reps, g_out):
# return gof.Apply(self, [x, reps, g_out], [x.type()])
#
......@@ -5645,11 +5679,11 @@ def tile(x, reps, ndim=None):
TODO: expand this.
"""
try:
assert python_all([int(i) == i for i in iter(reps)])
except (TypeError, AssertionError):
raise ValueError("reps argument to tile must be a constant (e.g. "
"tuple, list of integers)")
if len(reps) != x.ndim:
raise ValueError("len(reps) != x.ndim not currently supported")
elif (ndim is not None) and ndim != x.ndim:
......@@ -5663,7 +5697,7 @@ def tile(x, reps, ndim=None):
ndim = len(reps)
# backport
# ndim = len(reps) if ndim is None else ndim #not sure if len(shp) is going
# ndim = len(reps) if ndim is None else ndim # not sure if len(shp) is going
# to work.
if ndim not in tile.op:
tile.op[ndim] = Tile(ndim)
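The `reps` validation performed by `tile()` above boils down to the following check, sketched here as a standalone helper (`check_reps` is a hypothetical name, not in the commit):

```python
def check_reps(reps):
    """Validation as in tile() above: reps must be an iterable of
    values that are exactly representable as integers."""
    try:
        assert all(int(i) == i for i in iter(reps))
    except (TypeError, AssertionError):
        raise ValueError("reps argument to tile must be a constant (e.g. "
                         "tuple, list of integers)")
    return len(reps)
```

A scalar (not iterable) raises `TypeError` inside the `try`, and a non-integer element fails the `assert`; both are reported uniformly as `ValueError`.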
......@@ -6146,7 +6180,7 @@ class AdvancedSubtensor(Op):
def make_node(self, x, *inputs):
x = as_tensor_variable(x)
#FIXME
# FIXME
# Note (9 Jul 2012): what does this 'FIXME' mean? Possibly that the
# current implementation must be generalized? Please specify.
if x.ndim == 2 and len(inputs) == 2:
......@@ -6209,7 +6243,7 @@ class AdvancedSubtensor(Op):
'are too big (>= 2^32 elements). It is possible that '
'out[0] (%s), with shape %s, is not correctly filled.'
% (out[0], out[0].shape))
#return
# return
#raise NotImplementedError()
def grad(self, inputs, grads):
......@@ -6232,8 +6266,8 @@ class AdvancedIncSubtensor(Op):
def __init__(self, inplace=False, set_instead_of_inc=False):
self.inplace = inplace
self.set_instead_of_inc = set_instead_of_inc
#The assert is needed as in the pass the first argument was
#something else that was not used.
# The assert is needed as in the past the first argument was
# something else that was not used.
assert isinstance(inplace, bool)
if self.inplace:
raise NotImplementedError('In place computation is not'
......@@ -6325,6 +6359,7 @@ advanced_inc_subtensor = AdvancedIncSubtensor()
#
# TODO: Dotinv should go here, Eigs, Svd, etc.
class Dot(Op):
"""Compute matrix-matrix, matrix-vector products and vector inner-products.
......@@ -6351,7 +6386,7 @@ class Dot(Op):
numpy_semantics = 0
if numpy_semantics:
#numpy defines dot for tensor pairs with any rank
# numpy defines dot for tensor pairs with any rank
if len(inputs) != 2:
raise TypeError(
"Wrong number of inputs for %s (got %i, expected 2)" %
......@@ -6712,7 +6747,7 @@ def tensordot(x, y=None, axes=2):
return tensordot.op[axes](x, y)
#TODO: tensordot should be function as described in rst docs.
# TODO: tensordot should be function as described in rst docs.
def outer(x, y):
......
......@@ -98,26 +98,26 @@ class Conv3D(theano.Op):
if 'name' in dir(dCdH) and dCdH.name is not None:
dCdH_name = dCdH.name
else:
dCdH_name = 'anon'
dCdH_name = 'anon_dCdH'
if 'name' in dir(V) and V.name is not None:
V_name = V.name
else:
V_name = 'anon'
V_name = 'anon_V'
if 'name' in dir(W) and W.name is not None:
W_name = W.name
else:
W_name = 'anon'
W_name = 'anon_W'
if 'name' in dir(b) and b.name is not None:
b_name = b.name
else:
b_name = 'anon'
b_name = 'anon_b'
dCdV.name = 'Conv3D_dCdV.dCdH='+dCdH_name+',V='+V_name
dCdW.name = 'Conv3D_dCdW.dCdH='+dCdH_name+',V='+V_name+',W='+W_name
dCdb.name = 'Conv3D_dCdb.dCdH='+dCdH_name+',V='+V_name+',W='+W_name+',b='+b_name
dCdV.name = 'Conv3D_dCdV(dCdH='+dCdH_name+',V='+V_name+')'
dCdW.name = 'Conv3D_dCdW(dCdH='+dCdH_name+',V='+V_name+',W='+W_name+')'
dCdb.name = 'Conv3D_dCdb(dCdH='+dCdH_name+',V='+V_name+',W='+W_name+',b='+b_name+')'
......
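The renamed gradient variables above follow a `Base(arg=name,...)` pattern, with per-argument `anon_*` fallbacks replacing the old ambiguous `'anon'`. That naming scheme can be sketched as a small helper (`grad_name` is a hypothetical name, not part of the commit):

```python
def grad_name(base, parts):
    """Build names in the 'Conv3D_dCdW(dCdH=...,V=...,W=...)' style used
    above.  parts is an ordered list of (arg_name, var_name_or_None)
    pairs; None falls back to 'anon_<arg_name>' as the commit does."""
    joined = ','.join('%s=%s' % (k, v if v is not None else 'anon_' + k)
                      for k, v in parts)
    return '%s(%s)' % (base, joined)
```

Distinct `anon_dCdH`, `anon_V`, etc. fallbacks make debug names unambiguous even when none of the inputs were named.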
......@@ -56,22 +56,22 @@ class ConvTransp3D(theano.Op):
if 'name' in dir(dCdR) and dCdR.name is not None:
dCdR_name = dCdR.name
else:
dCdR_name = 'anon'
dCdR_name = 'anon_dCdR'
if 'name' in dir(H) and H.name is not None:
H_name = H.name
else:
H_name = 'anon'
H_name = 'anon_H'
if 'name' in dir(W) and W.name is not None:
W_name = W.name
else:
W_name = 'anon'
W_name = 'anon_W'
if 'name' in dir(b) and b.name is not None:
b_name = b.name
else:
b_name = 'anon'
b_name = 'anon_b'
dCdW.name = 'ConvTransp3D_dCdW.H='+H_name+',dCdR='+dCdR_name+',W='+W_name
......
......@@ -780,9 +780,19 @@ class ConvOp(OpenMPOp):
# build a "node", that should be equivalent to the one given by
# self.make_node, but using conv3D instead of self.
shuffled_inputs = inputs.dimshuffle(0, 2, 3, 'x', 1)
if inputs.name is not None:
shuffled_inputs.name = 'shuffle_for_conv3D(%s)' % inputs.name
flipped_kerns = kerns[:, :, ::-1, ::-1]
if kerns.name is not None:
flipped_kerns.name = 'flipped(%s)' % kerns.name
shuffled_kerns = flipped_kerns.dimshuffle(0, 2, 3, 'x', 1)
if flipped_kerns.name is not None:
shuffled_kerns.name = 'shuffled_for_conv3D(%s)' % flipped_kerns.name
tmp_node = theano.tensor.nnet.conv3D(
V=inputs.dimshuffle(0, 2, 3, 'x', 1),
W=kerns[:, :, ::-1, ::-1].dimshuffle(0, 2, 3, 'x', 1),
V=shuffled_inputs,
W=shuffled_kerns,
b=theano.tensor.alloc(numpy.asarray(0, dtype=kerns.dtype),
kerns.shape[0]),
d=(self.dx, self.dy, 1))
......
......@@ -14,6 +14,7 @@ from theano.compile import optdb
from theano.gof import Apply
from theano.tensor.nnet.sigm import sigmoid, softplus
from theano.gradient import DisconnectedType
############
......@@ -76,6 +77,10 @@ class SoftmaxWithBias(gof.Op):
def grad(self, inp, grads):
x, b = inp
g_sm, = grads
if isinstance(g_sm.type, DisconnectedType):
return [DisconnectedType()(), DisconnectedType()()]
sm = softmax_with_bias(x, b)
dx = softmax_grad(g_sm, sm)
db = tensor.sum(dx, axis=0)
......@@ -710,21 +715,40 @@ class CrossentropySoftmaxArgmax1HotWithBias(gof.Op):
def grad(self, inp, grads):
x, b, y_idx = inp
g_nll, g_sm, g_am = grads
if g_am is not None:
raise NotImplementedError()
elif g_sm is not None:
# There is a gradient w.r.t. the softmax's output itself.
if g_nll is not None or g_am is not None:
raise NotImplementedError()
return softmax_with_bias.grad((x, b, ), (g_sm, )) + (None, )
else:
# There is a gradient w.r.t. the NLL.
assert g_nll is not None
dx_terms = []
db_terms = []
d_idx_terms = []
if not isinstance(g_nll.type, DisconnectedType):
nll, sm = crossentropy_softmax_1hot_with_bias(x, b, y_idx)
#dx = CrossentropySoftmax1HotWithBiasDx()(g_nll, sm, y_idx)
dx = crossentropy_softmax_1hot_with_bias_dx(g_nll, sm, y_idx)
db = tensor.sum(dx, axis=[0])
return dx, db, None
dx_terms.append(dx)
db_terms.append(db)
if not isinstance(g_sm.type, DisconnectedType):
dx, db = softmax_with_bias.grad((x, b), (g_sm, ))
dx_terms.append(dx)
db_terms.append(db)
if not isinstance(g_am.type, DisconnectedType):
dx_terms.append(x.zeros_like())
db_terms.append(b.zeros_like())
d_idx_terms.append(y_idx.zeros_like())
def fancy_sum(terms):
if len(terms) == 0:
return DisconnectedType()()
rval = terms[0]
for term in terms[1:]:
rval = rval + term
return rval
return [fancy_sum(terms) for terms in
        [dx_terms, db_terms, d_idx_terms]]
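The term-collection logic of the new grad method above (accumulate a list of contributions per input, then combine) can be sketched in plain Python, with a sentinel standing in for `DisconnectedType()()`:

```python
DISCONNECTED = object()  # stand-in for DisconnectedType()()

def fancy_sum(terms):
    """Combine gradient terms as in the grad method above: an empty
    term list means the input is disconnected; otherwise sum the
    terms pairwise."""
    if len(terms) == 0:
        return DISCONNECTED
    rval = terms[0]
    for term in terms[1:]:
        rval = rval + term
    return rval
```

An input that received no contribution from any output gradient is thus reported as disconnected, rather than silently getting `None` or zero.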
def c_headers(self):
return ['<iostream>', '<cmath>']
......
......@@ -18,7 +18,9 @@ class TestConv2D(utt.InferShapeTester):
def setUp(self):
super(TestConv2D, self).setUp()
self.input = T.dtensor4('input')
self.input.name = 'default_V'
self.filters = T.dtensor4('filters')
self.filters.name = 'default_filters'
def validate(self, image_shape, filter_shape,
border_mode='valid', subsample=(1, 1),
......@@ -34,7 +36,7 @@ class TestConv2D(utt.InferShapeTester):
N_filter_shape = [T.get_constant_value(T.
as_tensor_variable(x)) for x in filter_shape]
if not input:
if input is None:
input = self.input
if not filters:
filters = self.filters
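The `if not input:` to `if input is None:` fix above matters because truth-testing a symbolic variable is ambiguous. A plain-Python sketch of the pitfall (the class and helper names here are illustrative, not from the commit):

```python
class SymbolicVar(object):
    """Toy stand-in for a symbolic variable: truth-testing is
    ambiguous, so __bool__ refuses to answer, as symbolic
    frameworks commonly do."""
    def __bool__(self):
        raise TypeError("truth value of a symbolic variable is ambiguous")
    __nonzero__ = __bool__  # Python 2 spelling

def pick_input(input, default):
    # the corrected check from the diff: compare against None explicitly
    return default if input is None else input
```

With `if not input:` the caller could never pass a `SymbolicVar` at all; the explicit `is None` comparison sidesteps `__bool__` entirely.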
......@@ -44,11 +46,16 @@ class TestConv2D(utt.InferShapeTester):
# we create a symbolic function so that verify_grad can work
def sym_conv2d(input, filters):
# define theano graph and function
return conv.conv2d(input, filters, image_shape, filter_shape,
input.name = 'input'
filters.name = 'filters'
rval = conv.conv2d(input, filters, image_shape, filter_shape,
border_mode, subsample, unroll_batch=unroll_batch,
unroll_kern=unroll_kern, unroll_patch=unroll_patch)
rval.name = 'conv_output'
return rval
output = sym_conv2d(input, filters)
output.name = 'conv2d(%s,%s)' % (input.name, filters.name)
theano_conv = theano.function([input, filters], output)
# initialize input and compute result
......
......@@ -121,33 +121,49 @@ class TestConv3D(utt.InferShapeTester):
mode.check_py_code = False
self.W = shared(N.ndarray(shape=(1, 1, 1, 1, 1), dtype=floatX))
self.W.name = 'W'
self.b = shared(N.zeros(1, dtype=floatX))
self.b.name = 'b'
self.rb = shared(N.zeros(1, dtype=floatX))
self.rb.name = 'rb'
self.V = shared(N.ndarray(shape=(1, 1, 1, 1, 1), dtype=floatX))
self.V.name = 'V'
self.d = shared(N.ndarray(shape=(3, ), dtype=int))
self.d.name = 'd'
self.H = conv3D(self.V, self.W, self.b, self.d)
self.H.name = 'H'
self.H_func = function([], self.H, mode=mode)
self.H_shape_func = function([], self.H.shape, mode=mode)
self.RShape = T.vector(dtype='int64')
self.RShape.name = 'RShape'
self.otherH = T.TensorType(floatX,
(False, False, False, False, False))(name='otherH')
self.transp = convTransp3D(self.W, self.rb, self.d,
self.otherH, self.RShape)
self.transp.name = 'transp'
self.transp_func = function([self.otherH, self.RShape],
self.transp, mode=mode)
self.R = convTransp3D(self.W, self.rb, self.d, self.H, self.RShape)
self.R.name = 'R'
self.R_func = function([self.RShape], self.R, mode=mode)
self.R_shape_func = function([self.RShape], self.R.shape)
self.reconsObj = T.sum(T.sqr(self.V - self.R))
diff = self.V - self.R
diff.name = 'diff'
sqr = T.sqr(diff)
sqr.name = 'sqr'
self.reconsObj = T.sum(sqr)
self.reconsObj.name = 'reconsObj'
self.reconsObjFunc = function([self.RShape], self.reconsObj, mode=mode)
W_grad = T.grad(self.reconsObj, self.W)
self.gradientsFunc = function([self.RShape],
[T.grad(self.reconsObj, self.W), T.grad(self.reconsObj,
[W_grad, T.grad(self.reconsObj,
self.H), T.grad(self.reconsObj, self.V),
T.grad(self.reconsObj, self.b)], mode=mode)
......
......@@ -2832,16 +2832,16 @@ class Canonizer(gof.LocalOptimizer):
# this canonized graph... if so, we do nothing and wait for
# them to be transformed.
def _bypass_dimshuffle(n):
if isinstance(n.op, DimShuffle) and len(n.outputs[0].clients) <= 1:
return _bypass_dimshuffle(n.outputs[0].clients.__iter__(
).next()[0])
if (isinstance(getattr(n, 'op', None), DimShuffle) and
len(n.outputs[0].clients) <= 1):
return _bypass_dimshuffle(n.outputs[0].clients[0][0])
else:
return n
for c, c_idx in out.clients:
if c == 'output':
continue
if _bypass_dimshuffle(c).op in [self.main, self.inverse,
self.reciprocal]:
if getattr(_bypass_dimshuffle(c), 'op', '') in [
self.main, self.inverse, self.reciprocal]:
return False
# Here we make the canonical version of the graph around this node
......
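The `_bypass_dimshuffle` change above walks past single-client DimShuffle nodes to find the real consumer of a variable. A minimal sketch with a toy node graph (all names here are illustrative, independent of Theano's actual graph classes):

```python
class Node(object):
    def __init__(self, op, clients=()):
        self.op = op
        # clients: list of (client_node, input_index) pairs, as in Theano
        self.clients = list(clients)

DIMSHUFFLE = 'dimshuffle'

def bypass_dimshuffle(n):
    """Follow the chain past single-client DimShuffle nodes, as the
    Canonizer hunk above does, returning the first 'real' consumer."""
    if n.op == DIMSHUFFLE and len(n.clients) == 1:
        return bypass_dimshuffle(n.clients[0][0])
    return n
```

The commit's version additionally guards `n.op` with `getattr`, since graph outputs may not carry an `op` attribute at all.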
......@@ -2023,6 +2023,10 @@ class T_max_and_argmax(unittest.TestCase):
because there is no differentiable path from cost to the input and
not because of an error of the grad method of the op
"""
raise KnownFailureTest("The desired behavior of the grad method in this case is currently under debate. In any case, the result should be to return NaN or 0, not to report a disconnected input.")
x = matrix()
cost = argmax(x, axis=0).sum()
value_error_raised = False
......@@ -2220,6 +2224,7 @@ class T_argmin_argmax(unittest.TestCase):
def test_grad_argmin(self):
data = rand(2, 3)
n = as_tensor_variable(data)
n.name = 'n'
# test grad of argmin
utt.verify_grad(lambda v: argmin(v, axis=-1), [data])
......@@ -2231,7 +2236,9 @@ class T_argmin_argmax(unittest.TestCase):
utt.verify_grad(lambda v: argmin(v.flatten()), [data])
try:
grad(argmin(n, axis=-1), n)
cost = argmin(n, axis=-1)
cost.name = None
g = grad(cost, n)
raise Exception('Expected an error')
except TypeError:
pass
......@@ -4375,6 +4382,7 @@ class test_grad(unittest.TestCase):
o = test_grad.O()
a1 = o.make_node()
g0,g1 = grad(a1.outputs[0], a1.inputs)
g0.name = None
self.assertTrue(o.gval0 is g0)
self.assertTrue(o.gval1 is g1)
......@@ -4435,10 +4443,8 @@ class test_grad(unittest.TestCase):
v = vector()
m = matrix()
# grad(v,...) and grad(m,...) should fail
self.assertRaises(TypeError, grad, v, s)
self.assertRaises(TypeError, grad, v, m)
self.assertRaises(TypeError, grad, m, s)
self.assertRaises(TypeError, grad, m, v)
self.assertRaises(TypeError, grad, v, v)
self.assertRaises(TypeError, grad, m, m)
class T_op_cache(unittest.TestCase):
def setUp(self):
......
......@@ -10,19 +10,22 @@ from theano.gradient import grad_sources_inputs
from theano import gradient
from theano.tensor.nnet.Conv3D import conv3D
from theano import config
import numpy as np
one = theano.tensor.as_tensor_variable(1.)
def _grad_sources_inputs(*args):
# warn_type was introduced after this code was written;
# it complains throughout for nothing.
return grad_sources_inputs(warn_type=False, *args)
class test_grad_sources_inputs(unittest.TestCase):
def test_retNone1(self):
"""Test that it is not ok to return None from op.grad()"""
class retNone(gof.op.Op):
def make_node(self):
inputs = [gof.generic()]
outputs = [gof.generic()]
inputs = [theano.tensor.vector()]
outputs = [theano.tensor.vector()]
return gof.Apply(self, inputs, outputs)
def grad(self, inp, grads):
x, = inp
......@@ -30,240 +33,118 @@ class test_grad_sources_inputs(unittest.TestCase):
pass
a = retNone().make_node()
try:
_grad_sources_inputs([(a.out, 1)], None)
except ValueError, e:
self.assertTrue(e[0] is gradient._msg_retType)
_grad_sources_inputs([(a.out, one)], None)
except TypeError, e:
return
self.fail()
def test_retNone1_b(self):
"""Test that it is ok to return [None] from op.grad()"""
class retNone(gof.op.Op):
def make_node(self, *inputs):
outputs = [gof.generic()]
return gof.Apply(self, inputs, outputs)
def grad(self, inp, grads):
return [None]
i = gof.generic()
a = retNone().make_node(i)
g = _grad_sources_inputs([(a.out, 1)], None)
self.assertTrue(not i in g)
def test_wrong_rval_len1(self):
"""Test that it is not ok to return the wrong number of gradients"""
"""Test that it is not ok to return the wrong number of gradient terms"""
class retNone(gof.op.Op):
def make_node(self, *inputs):
outputs = [gof.generic()]
outputs = [theano.tensor.vector()]
return gof.Apply(self, inputs, outputs)
def grad(self, inputs, grads):
return [None]
i = gof.generic()
j = gof.generic()
i = theano.tensor.vector()
j = theano.tensor.vector()
a1 = retNone().make_node(i)
g = _grad_sources_inputs([(a1.out, 1)], None)
g = _grad_sources_inputs([(a1.out, one)], None)
a2 = retNone().make_node(i,j)
try:
g = _grad_sources_inputs([(a2.out, 1)], None)
g = _grad_sources_inputs([(a2.out, one)], None)
except ValueError, e:
self.assertTrue(e[0] is gradient._msg_badlen)
return
self.fail()
def test_stop_on_all_none(self):
"""Test that op.grad() is not called when output grads are all None"""
class retNone(gof.op.Op):
def __init__(self, tst):
self.tst = tst
def make_node(self, *inputs):
outputs = [gof.generic()]
return gof.Apply(self, inputs, outputs)
def grad(self, inputs, grads):
self.tst.fail()
i = gof.generic()
a1 = retNone(self).make_node(i)
g = _grad_sources_inputs([(a1.out, None)], None)
def test_1in_1out(self):
"""Test grad is called correctly for a 1-to-1 op"""
gval = gof.generic()
gval = theano.tensor.matrix()
class O(gof.op.Op):
def make_node(self):
inputs = [gof.generic()]
outputs = [gof.generic()]
inputs = [theano.tensor.matrix()]
outputs = [theano.tensor.matrix()]
return gof.Apply(self, inputs, outputs)
def grad(self, inp, grads):
return gval,
a1 = O().make_node()
g = _grad_sources_inputs([(a1.outputs[0], 1)], None)
g = _grad_sources_inputs([(a1.outputs[0], one)], None)
self.assertTrue(g[a1.inputs[0]] is gval)
def test_1in_Nout(self):
"""Test grad is called correctly for a 1-to-many op"""
gval = gof.generic()
gval = theano.tensor.matrix()
class O(gof.op.Op):
def make_node(self):
inputs = [gof.generic()]
outputs = [gof.generic(),gof.generic()]
inputs = [theano.tensor.matrix()]
outputs = [theano.tensor.scalar(),theano.tensor.scalar()]
return gof.Apply(self, inputs, outputs)
def grad(self, inp, grads):
x, = inp
gz1, gz2 = grads
return gval,
a1 = O().make_node()
g = _grad_sources_inputs([(a1.outputs[0], 1)], None)
g = _grad_sources_inputs([(a1.outputs[0], one)], None)
self.assertTrue(g[a1.inputs[0]] is gval)
def test_Nin_1out(self):
"""Test grad is called correctly for a many-to-1 op"""
gval0 = gof.generic()
gval1 = gof.generic()
gval0 = theano.tensor.scalar()
gval1 = theano.tensor.scalar()
class O(gof.op.Op):
def make_node(self):
inputs = [gof.generic(),gof.generic()]
outputs = [gof.generic()]
inputs = [theano.tensor.scalar(), theano.tensor.scalar()]
outputs = [theano.tensor.matrix()]
return gof.Apply(self, inputs, outputs)
def grad(self, inp, grads):
x0, x1 = inp
gz, = grads
return (gval0, gval1)
a1 = O().make_node()
g = _grad_sources_inputs([(a1.outputs[0], 1)], None)
g = _grad_sources_inputs([(a1.outputs[0], one)], None)
self.assertTrue(g[a1.inputs[0]] is gval0)
self.assertTrue(g[a1.inputs[1]] is gval1)
def test_Nin_Nout(self):
"""Test grad is called correctly for a many-to-many op"""
gval0 = gof.generic()
gval1 = gof.generic()
gval0 = theano.tensor.matrix()
gval1 = theano.tensor.matrix()
class O(gof.op.Op):
def make_node(self):
inputs = [gof.generic(),gof.generic()]
outputs = [gof.generic(),gof.generic()]
inputs = [theano.tensor.matrix(),theano.tensor.matrix()]
outputs = [theano.tensor.matrix(),theano.tensor.matrix()]
return gof.Apply(self, inputs, outputs)
def grad(self, inp, grads):
return gval0, gval1
a1 = O().make_node()
g = _grad_sources_inputs([(a1.outputs[0], 1)], None)
g = _grad_sources_inputs([(a1.outputs[0], one)], None)
self.assertTrue(g[a1.inputs[0]] is gval0)
self.assertTrue(g[a1.inputs[1]] is gval1)
def test_some_None_ograds(self):
"""Test grad is called when some output gradients are None"""
class O(gof.op.Op):
def __init__(self, tst):
self.tst = tst
def make_node(self, *inputs):
outputs = [gof.generic(),gof.generic()]
outputs = [theano.tensor.matrix(),theano.tensor.matrix()]
return gof.Apply(self, inputs, outputs)
def grad(self, inputs, g_out):
return [1]
i = gof.generic()
return [one]
i = theano.tensor.matrix()
a1 = O(self).make_node(i)
g = grad_sources_inputs([(a1.outputs[0], 1)], None, warn_type=False)
self.assertTrue(g[i] is 1)
def test_some_None_igrads(self):
"""Test that traversal works properly when an op return some None"""
class O(gof.op.Op):
def __init__(self, tst, grad_ok):
self.tst = tst
self.grad_ok = grad_ok
def make_node(self, *inputs):
outputs = [gof.generic(),gof.generic()]
return gof.Apply(self, inputs, outputs)
def grad(self, inputs, g_out):
if not self.grad_ok:
self.tst.fail()
else:
return [1, None]
i = gof.generic()
j = gof.generic()
k = gof.generic()
a1 = O(self, True).make_node(i,j)
a2 = O(self, True).make_node(a1.outputs[1], k)
g = grad_sources_inputs([(a2.outputs[0], 1)], None, warn_type=False)
self.assertTrue(g[i] is 1 and j not in g and k not in g)
a1 = O(self, True).make_node(i,j)
a2 = O(self, True).make_node(k, a1.outputs[1])
g = _grad_sources_inputs([(a2.outputs[0], 1)], None)
self.assertTrue(g[k] is 1 and i not in g and j not in g)
def test_inputs(self):
"""Test that passing inputs shortens the traversal"""
class O(gof.op.Op):
def __init__(self, tst, grad_ok):
self.tst = tst
self.grad_ok = grad_ok
def make_node(self, *inputs):
outputs = [gof.generic(),gof.generic()]
return gof.Apply(self, inputs, outputs)
def grad(self, inputs, grads):
g0, g1 = grads
if not self.grad_ok:
self.tst.fail()
else:
if g1:
return [g0, g0+g1]
else:
return [g0, g0]
i = gof.generic()
j = gof.generic()
k = gof.generic()
a1 = O(self, True).make_node(i,j)
a2 = O(self, True).make_node(k,a1.outputs[1])
g = _grad_sources_inputs([(a2.outputs[0], 1), (a1.outputs[1],4),
(a1.outputs[0], 3), (a1.outputs[0], 3)], a1.outputs)
self.assertTrue(g[a2.inputs[0]] == 1)
self.assertTrue(g[a2.inputs[1]] == 5)
self.assertTrue(g[a1.outputs[0]] == 6)
self.assertTrue(g[a1.outputs[1]] == 5)
self.assertTrue(a1.inputs[0] not in g)
self.assertTrue(a1.inputs[1] not in g)
def test_multiple_sources(self):
"""Test that passing multiple sources works"""
class O(gof.op.Op):
def __init__(self, tst, grad_ok):
self.tst = tst
self.grad_ok = grad_ok
def make_node(self, *inputs):
outputs = [gof.generic(),gof.generic()]
return gof.Apply(self, inputs, outputs)
def grad(self, inputs, grads):
g0, g1 = grads
if not self.grad_ok:
self.tst.fail()
else:
if g1:
return [g0, g0+g1]
else:
return [g0, g0]
i = gof.generic()
j = gof.generic()
k = gof.generic()
a1 = O(self,True).make_node(i,j)
a2 = O(self,True).make_node(k,a1.outputs[1])
g = _grad_sources_inputs([(a2.outputs[0], 1), (a1.outputs[1],4),
(a1.outputs[0], 3), (a1.outputs[0], 3)], None)
self.assertTrue(g[a2.inputs[0]] == 1)
self.assertTrue(g[a2.inputs[1]] == 5)
self.assertTrue(g[a1.outputs[0]] == 6)
self.assertTrue(g[a1.outputs[1]] == 5)
self.assertTrue(g[a1.inputs[0]] == 6)
self.assertTrue(g[a1.inputs[1]] == 11)
g = grad_sources_inputs([(a1.outputs[0], one)], None, warn_type=False)
self.assertTrue(g[i] is one)
def test_unimplemented_grad_func():
#tests that function compilation catches unimplemented grads in the graph
# tests that function compilation catches unimplemented grads in the graph
a = theano.tensor.vector()
b = theano.gradient.grad_not_implemented(theano.tensor.add, 0, a)
try:
f = theano.function([a], b)
f = theano.function([a], b, on_unused_input='ignore')
assert 0
#Note: it's important that the NotImplementedGradOp is caught
#at COMPILATION time, not execution time.
#If the uncomputable variable is, for example, multiplied by 0,
#it could be optimized out of the final graph.
except NotImplementedError:
except TypeError:
pass
def test_undefined_grad_func():
......@@ -271,13 +152,9 @@ def test_undefined_grad_func():
a = theano.tensor.vector()
b = theano.gradient.grad_undefined(theano.tensor.add, 0, a)
try:
f = theano.function([a],b)
f = theano.function([a], b, on_unused_input='ignore')
assert 0
#Note: it's important that the GradUndefinedOp is cauhgt at
#COMPILATION time, not execution time.
#If the uncomputable variable is, for example, multiplied by0,
#it could be optimized out of the final graph
except theano.gradient.GradUndefinedError:
except TypeError:
pass
def test_unimplemented_grad_grad():
......@@ -296,7 +173,7 @@ def test_unimplemented_grad_grad():
try:
g = theano.gradient.grad(b,a)
assert False
except NotImplementedError:
except TypeError:
pass
def test_undefined_grad_grad():
......@@ -314,7 +191,7 @@ def test_undefined_grad_grad():
try:
g = theano.gradient.grad(Z.sum(),d)
assert False
except theano.gradient.GradUndefinedError:
except TypeError:
pass
def test_grad_name():
......@@ -325,5 +202,97 @@ def test_grad_name():
g = theano.tensor.grad(f,x)
assert g.name == '(df/dx)'
def test_grad_duplicate_input():
# test that the grad works when a variable
# appears in more than one place in a node's input list
def output(x):
return (x*x)
rng = np.random.RandomState([2012,8,28])
vx = rng.randn(2)
theano.tests.unittest_tools.verify_grad(output,[vx])
def test_grad_quadratic():
# test the gradient on a tiny graph
def cost(x,A):
return theano.tensor.dot(x,theano.tensor.dot(A,x))
rng = np.random.RandomState([2012,8,28])
vx = rng.randn(2)
vA = rng.randn(2,2)
theano.tests.unittest_tools.verify_grad(cost,[vx,vA])
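The quadratic test above relies on `verify_grad`, which compares the symbolic gradient against finite differences. For the cost `x . (A x)`, the same check can be done by hand in plain Python; the analytic gradient with respect to `x` is `(A + A^T) x` (all helper names below are illustrative):

```python
def quad_cost(x, A):
    """x . (A x) for a 2-vector x and a 2x2 matrix A, in plain Python."""
    Ax = [A[0][0] * x[0] + A[0][1] * x[1],
          A[1][0] * x[0] + A[1][1] * x[1]]
    return x[0] * Ax[0] + x[1] * Ax[1]

def quad_grad_x(x, A):
    """Analytic gradient w.r.t. x: (A + A^T) x."""
    return [2 * A[0][0] * x[0] + (A[0][1] + A[1][0]) * x[1],
            (A[1][0] + A[0][1]) * x[0] + 2 * A[1][1] * x[1]]

def finite_diff_grad_x(x, A, eps=1e-6):
    """Central finite differences, the same idea verify_grad uses."""
    g = []
    for i in range(2):
        xp = list(x); xp[i] += eps
        xm = list(x); xm[i] -= eps
        g.append((quad_cost(xp, A) - quad_cost(xm, A)) / (2 * eps))
    return g
```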
def test_grad_quadratic_vector():
# test the gradient on a small graph
def output(x,A):
return theano.tensor.dot(x*x,A)
rng = np.random.RandomState([2012,8,28])
vx = rng.randn(2)
vA = rng.randn(2,2)
theano.tests.unittest_tools.verify_grad(output,[vx,vA])
def test_grad_cubic():
# test the gradient on a bigger graph
def cost(x,A):
return theano.tensor.dot(x*x,theano.tensor.dot(A,x))
rng = np.random.RandomState([2012,8,28])
vx = rng.randn(2)
vA = rng.randn(2,2)
theano.tests.unittest_tools.verify_grad(cost,[vx,vA])
def test_grad_grad_quadratic():
# test the gradient on a graph constructed using the gradient
def output(x,A):
orig_cost = theano.tensor.dot(x,theano.tensor.dot(A,x))
return theano.gradient.grad(orig_cost, x)
rng = np.random.RandomState([2012,8,28])
vx = rng.randn(2)
vA = rng.randn(2,2)
theano.tests.unittest_tools.verify_grad(output,[vx,vA])
def test_grad_grad_cubic():
# test the gradient on a bigger graph constructed using the gradient
def output(x,A):
orig_cost = theano.tensor.dot(x*x,theano.tensor.dot(A,x))
return theano.gradient.grad(orig_cost, x)
rng = np.random.RandomState([2012,8,28])
vx = rng.randn(2)
vA = rng.randn(2,2)
theano.tests.unittest_tools.verify_grad(output,[vx,vA])
if __name__ == '__main__':
unittest.main()