Commit d95e876d authored by nouiz

Merge pull request #899 from goodfeli/rebase_fix_grad

Rebase fix grad
...@@ -98,6 +98,31 @@ following methods:
lifetime of self. Op instances should be immutable in this
sense.
.. function:: connection_pattern():
Optional (but in extremely rare cases needed to have it work with
{tensor,sparse}.grad).
Returns a list of bools the same length as the op's inputs list.
True signifies that the elements of an input have an effect on the
op's output.
False signifies that they do not; in other words, the op acts only
on the input's metadata, such as its shape.
If no connection_pattern is implemented, tensor.grad will assume
it is a list containing only True.
Failing to implement this function for an op that needs it can
result in tensor.grad erroneously reporting that a gradient is
undefined. Returning 0 for an input in the grad method is not
the same as specifying that the elements of that input are not
connected to the output: if the gradient with respect to the
op's output is NaN but the elements of the input are not connected
to it, then the NaN never enters into the expression for the
gradient.
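The contract above can be sketched in plain Python. The class and function names below are illustrative stand-ins, not the real Theano API:

```python
# Hypothetical stand-ins (not real Theano classes) illustrating the
# connection_pattern contract described above.
class ShapeOfLike(object):
    """An op whose output depends only on the input's shape metadata."""
    def connection_pattern(self):
        return [False]          # elements of the input have no effect

class SquareLike(object):
    """An elementwise op: the input's elements do affect the output."""
    def connection_pattern(self):
        return [True]

def effective_pattern(op, n_inputs):
    # When the method is missing, tensor.grad assumes every input
    # is connected (a list containing only True).
    if hasattr(op, 'connection_pattern'):
        return op.connection_pattern()
    return [True] * n_inputs

assert effective_pattern(ShapeOfLike(), 1) == [False]
assert effective_pattern(SquareLike(), 1) == [True]
assert effective_pattern(object(), 2) == [True, True]
```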
.. function:: grad(inputs, output_gradients)
Optional (but needed to have it work with {tensor,sparse}.grad()).
...@@ -106,31 +131,62 @@ following methods:
symbolically in this method. Both ``inputs`` and ``output_gradients``
are lists of symbolic Theano Variables and those must be operated on using
Theano's symbolic language. The grad method must return a list containing
one Variable for each input. Each returned Variable represents
the gradient with respect to that input computed based on the symbolic gradients with
respect to each output.
If the output is not differentiable with respect to an input,
then this method should be defined to return a variable of type
NullType for that input.
If an element of ``output_gradients`` is of type theano.gradient.DisconnectedType,
it means that the cost is not a function of this output. If any of the
op's inputs participate in the computation of only disconnected outputs,
then Op.grad should return DisconnectedType variables for those inputs.
If the grad method is not defined, then Theano assumes it has been
forgotten. Symbolic differentiation will fail on a graph that
includes this Op.
It must be understood that the Op's grad method is not meant to return the
gradient of the Op's output. theano.tensor.grad computes gradients; Op.grad
is a helper function that computes terms that appear in gradients.
If an Op has a single vector-valued output y and a single vector-valued input x,
then the grad method will be passed x and a second vector z. Define J to be
the Jacobian of y with respect to x. The Op's grad method should return
dot(J.T,z). When theano.tensor.grad calls the grad method, it will set z to
be the gradient of the cost C with respect to y. If this op is the only op
that acts on x, then dot(J.T,z) is the gradient of C with respect to x.
If there are other ops that act on x, theano.tensor.grad will have to add up
the terms of x's gradient contributed by those ops' grad methods.
In practice, an op's input and output are rarely implemented as single vectors.
Even if an op's output consists of a list containing a scalar, a sparse matrix,
and a 4D tensor, you can think of these objects as being formed by rearranging
a vector. Likewise for the input. In this view, the values computed by the grad
method still represent a Jacobian-vector product.
In practice, it is probably not a good idea to explicitly construct the Jacobian,
which might be very large and very sparse. However, the returned value should
be equal to the Jacobian-vector product.
So long as you implement this product correctly, you need not understand what
theano.tensor.grad is doing, but for the curious the mathematical justification
is as follows:
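A small numeric toy of the Jacobian-vector product contract, assuming an elementwise op y = x ** 2 (the function name is illustrative, not part of Theano):

```python
# For elementwise y = x ** 2 the Jacobian J is diagonal with entries
# 2 * x_i, so dot(J.T, z) reduces to 2 * x * z and can be computed
# without ever materializing J.
def square_op_grad(x, z):
    return [2.0 * xi * zi for xi, zi in zip(x, z)]

x = [1.0, 2.0, 3.0]
z = [0.5, 0.5, 0.5]   # gradient of the cost C with respect to y
assert square_op_grad(x, z) == [1.0, 2.0, 3.0]
```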
In essence, the grad method must simply implement through symbolic Variables
and operations the chain rule of differential calculus. The chain rule
is the mathematical procedure that allows one to calculate the total derivative
:math:`\frac{d C}{d x}` of the final scalar symbolic Variable C with respect to a
primitive symbolic Variable x found in the list ``inputs``.
The grad method does this using ``output_gradients``, which provides the total
derivative :math:`\frac{d C}{d f}` of C with respect to a symbolic Variable f
that is returned by the Op, as well as the knowledge of the total derivative
:math:`\frac{d f}{d x}` of the
latter with respect to the primitive Variable (this has to be computed).
In mathematics, the total derivative of a scalar variable (C) with respect to a vector of
scalar variables (x), i.e. the gradient, is customarily represented as the
row vector of the partial derivatives, whereas the total derivative of a vector of
scalar variables (f) with respect to another (x), is customarily represented by the matrix of
...
...@@ -150,24 +150,6 @@ def std_fgraph(input_specs, output_specs, accept_inplace = False):
std_fgraph.features = [gof.toolbox.PreserveNames]
class UncomputableFeature(gof.Feature):
"""A feature that ensures the graph never contains any
uncomputable nodes. This check must be made at compile time
rather than runtime in order to make sure that NaN nodes are
not optimized out. It must be done as a Feature so that
the fgraph will continually check that optimizations have
not introduced any uncomputable nodes."""
def on_attach(self, fgraph):
for node in fgraph.nodes:
self.on_import(fgraph, node)
def on_import(self, fgraph, node):
gof.op.raise_if_uncomputable(node)
std_fgraph.features.append(UncomputableFeature)
class AliasedMemoryError(Exception):
"""Memory is aliased that should not be"""
pass
...
...@@ -11,7 +11,7 @@ import toolbox
from python25 import all
from theano import config
import warnings
NullType = None
class InconsistencyError(Exception):
"""
...@@ -211,6 +211,9 @@ class FunctionGraph(utils.object2):
### import ###
def __import_r__(self, variables):
global NullType
if NullType is None:
from null_type import NullType
# Imports the owners of the variables
r_owner_done = set(self.nodes)
for node in [r.owner for r in variables if r.owner is not None]:
...@@ -219,6 +222,8 @@ class FunctionGraph(utils.object2):
self.__import__(node)
for r in variables:
if r.owner is None and not isinstance(r, graph.Constant) and r not in self.inputs:
if isinstance(r.type, NullType):
raise TypeError("Computation graph contains a NaN. " + r.type.why_null)
raise MissingInputError("Undeclared input", r)
if not getattr(r, 'fgraph', None) is self:
self.__setup_r__(r)
...
from theano.gof.type import Type
class NullType(Type):
"""
A type that allows no values. Used to represent expressions
that are undefined, either because they do not exist mathematically
or because the code to generate the expression has not been
implemented yet.
"""
def __init__(self, why_null='(no explanation given)'):
"""
why_null: A string explaining why this variable
can't take on any values
"""
self.why_null = why_null
def filter(self, data, strict=False, allow_downcast=None):
raise ValueError("No values may be assigned to a NullType")
def filter_variable(self, other):
raise ValueError("No values may be assigned to a NullType")
def may_share_memory(a, b):
return False
def values_eq(a, b, force_same_dtype=True):
raise ValueError("NullType has no values to compare")
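The contract above can be illustrated with a small standalone sketch. NullTypeSketch is a hypothetical stand-in that does not subclass Theano's Type:

```python
# Hypothetical standalone sketch of the NullType contract: it carries
# an explanation (why_null) and rejects every value assignment.
class NullTypeSketch(object):
    def __init__(self, why_null='(no explanation given)'):
        self.why_null = why_null
    def filter(self, data, strict=False, allow_downcast=None):
        raise ValueError("No values may be assigned to a NullType")

t = NullTypeSketch("grad of abs(x) is undefined at x == 0")
try:
    t.filter(0.0)
    raised = False
except ValueError:
    raised = True
assert raised
assert "undefined" in t.why_null
```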
...@@ -609,59 +609,6 @@ class Op(utils.object2, PureOp, CLinkerOp):
rval.lazy = False
return rval
class UncomputableOp(Op):
"""
An Op representing an expression that cannot be computed.
theano.function checks that the subgraph it implements
does not contain these ops, and that optimization does not
introduce any such ops.
theano.tensor.grad checks the graphs it returns to ensure
they do not contain these ops.
"""
def __init__(self, exc, msg=""):
"""
exc: the exception type to raise if a subgraph contains
this op.
msg: the message to include in the exception.
"""
self.exc = exc
self.msg = msg
def __eq__(self, other):
return type(self) == type(other)
def __hash__(self):
return hash((type(self)))
def __str__(self):
return "Uncomputable{%s,%s}"%(self.exc,self.msg)
def make_node(self,x):
if x is None:
x = graph.Constant(theano.gof.type.generic,None)
return graph.Apply(self, [x], [x.type()] )
def perform(self, node, inputs, out_storage):
"""This should never be called."""
raise AssertionError("An UncomputableOp should never be compiled, "
"and certainly not executed.")
# Note: essentially, this op should just be NaNs_like(inputs[0])
# but 0 * UncomputableOp(x) + y optimizes to just y
# so until we develop a way of symbolically representing a variable
# that is always NaN and implement the logic for 0 * NaN = NaN, etc.
# the only way we can guarantee correctness of a theano function
# is to guarantee that its initial subgraph contained no UncomputableOps
def raise_exc(self):
raise self.exc(self.msg)
def raise_if_uncomputable(node):
if node is not None:
if isinstance(node.op, UncomputableOp):
node.op.raise_exc()
def get_test_value(v):
"""
Extract test value from `v`. Raises AttributeError if there is none.
...
...@@ -20,9 +20,18 @@ from theano import gof
from theano.gof import Variable
from theano.gof.python25 import all
import theano.gof.utils
from theano.gof.null_type import NullType
from theano.printing import min_informative_str
# we can't do "import theano.tensor"
# tensor depends on theano.compile
# theano.compile depends on theano.gradient (this file)
# the reason theano.compile depends on theano.gradient
# is that theano.compile.builders contains the op from graph
# functionality and it uses theano.gradient to implement
# the new op's grad method
tensor = None
_msg_retType = 'op.grad(...) returned a non-list'
_msg_badlen = 'op.grad(...) returned wrong number of gradients'
def format_as(use_list, use_tuple, outputs):
...@@ -54,171 +63,7 @@ def format_as(use_list, use_tuple, outputs):
return outputs
def grad_sources_inputs(sources, graph_inputs, warn_type=True):
"""
A gradient source is a pair (``v``, ``g_v``), in which ``v`` is
a `Variable`, and ``g_v`` is a `Variable` that is a gradient wrt
``v``. More specifically, ``g_v`` is the gradient of an external
scalar cost, ``cost`` (that is not explicitly used), wrt ``v``.
This function traverses the graph backward from the ``r`` sources,
calling ``op.grad(...)`` for all ops with some non-None gradient
on an output, to compute gradients of ``cost`` wrt intermediate
variables and ``graph_inputs``.
The ``op.grad(...)`` functions are called like this:
.. code-block:: python
op.grad(op.inputs[:], [total_gradient(v) for v in op.outputs])
This call to ``op.grad`` should return a list or tuple: one symbolic
gradient per input. These gradients represent the gradients of
the same implicit ``cost`` mentioned above, wrt ``op.inputs``. Note
that this is **not** the same as the gradient of ``op.outputs`` wrt
``op.inputs``.
If ``op`` has a single input, then ``op.grad`` should return a list
or tuple of length 1.
For each input wrt to which ``op`` is not differentiable, it should
return ``None`` instead of a `Variable` instance.
If a source ``r`` receives a gradient from another source ``r2``,
then the effective gradient on ``r`` is the sum of both gradients.
:type sources: list of pairs of Variable: (v, gradient-on-v) to
initialize the total_gradient dictionary
:param sources: gradients to back-propagate using chain rule
:type graph_inputs: list of Variable
:param graph_inputs: variables considered to be constant
(do not backpropagate through them)
:type warn_type: bool
:param warn_type: True will trigger warnings via the logging module when
the gradient on an expression has a different type than the original
expression
:rtype: dictionary whose keys and values are of type Variable
:return: mapping from each Variable encountered in the backward
traversal to the gradient with respect to that Variable.
It is assumed that there is some objective J shared between all members of
sources, so that for each v, gradient-on-v is the gradient of J with
respect to v
"""
gmap = {}
for (r, g_r) in sources:
if not hasattr(r, 'type'):
raise TypeError('sources must be Variables', r)
if g_r is not None:
if r in gmap:
gmap[r] = gmap[r] + g_r
else:
gmap[r] = g_r
graph_outputs = gof.utils.uniq([r for r, g in sources])
if graph_inputs is None:
graph_inputs = gof.graph.inputs(graph_outputs)
for node in gof.graph.io_toposort(graph_inputs,
graph_outputs).__reversed__():
g_outputs = [gmap.get(o, None) for o in node.outputs]
#if all output gradients are None, continue
if all(map(lambda x: x is None, g_outputs)): continue
#Disable all grad operation on complex. verify_grad don't
#support them and we don't know we want to handle them.
for var in node.inputs + node.outputs:
if (hasattr(var.type, 'dtype') and "complex" in var.type.dtype):
raise Exception("We do not support grad/Rop/Lop/verify_grad"
" on complex.")
output_arg = g_outputs
input_arg = node.inputs
# Each Op's grad function requires inputs and output_grads
# If the Op destroys any input, but the grad expression uses it,
# then chances are the resulting graph will have a dependency
# cycle. We avoid this cycle by passing (symbolic) copies of
# each destroyed input.
try:
dinputs = [node.inputs[x[0]] for x in node.op.destroy_map.values()]
except AttributeError:
dinputs = []
new_input_arg = []
for input in input_arg:
if input in dinputs and hasattr(input, 'copy'):
new_input_arg.append(input.copy())
else:
new_input_arg.append(input)
input_arg = new_input_arg
#note that this function is not in a try-except block
# the rationale:
# If the op implements grad, then any exception should be passed to
# the caller
# If the op doesn't implement grad, this entire function should fail.
# Other possibilities:
# * return a partial back-prop
#
op_grad = node.op.grad(input_arg, output_arg)
if not isinstance(op_grad, (list, tuple)):
raise ValueError(_msg_retType, node.op)
g_inputs = op_grad
assert isinstance(g_inputs, (list, tuple))
if len(g_inputs) != len(node.inputs):
raise ValueError(_msg_badlen,
node.op,
len(g_inputs),
len(node.inputs))
for ii, (r, g_r) in enumerate(zip(node.inputs, g_inputs)):
if warn_type:
if g_r and (getattr(r, 'type', 0) != getattr(g_r, 'type', 1)):
r_type = getattr(r, 'type', None)
g_r_type = getattr(g_r, 'type', None)
_logger.warning('%s.grad returned a different type (%s) '
'for input %i of type (%s)',
node.op, g_r_type, ii, r_type)
if g_r is not None:
assert r is not None
if r in gmap:
gmap[r] = gmap[r] + g_r
else:
gmap[r] = g_r
return gmap
class GradNotImplementedOp(gof.op.UncomputableOp):
""" An UncomputableOp representing a gradient that hasn't been implemented yet.
"""
def __init__(self, op, x_pos, comment = ""):
"""
op: A theano op whose grad is not implemented for some input
x_pos: An int, giving the index in the op's input list of
a variable for which the gradient is not implemented
(if op has unimplemented gradients for several inputs,
it must still return a separate UnimplementedGradOp for
each)
comment: An optional comment explaining why the gradient isn't
implemented.
"""
assert isinstance(op, gof.Op)
assert isinstance(x_pos, int)
assert x_pos >= 0
super(GradNotImplementedOp,self).__init__(NotImplementedError,
"%s does not implement its gradient with respect to input %d. %s" \
% (str(type(op)), x_pos, comment))
def grad_not_implemented(op, x_pos, x, comment = ""):
"""
Return an un-computable symbolic variable of type `x.type`.
...@@ -232,40 +77,14 @@ def grad_not_implemented(op, x_pos, x, comment = ""):
gradient is not implemented.
"""
return (NullType(
(
"This variable is Null because the grad method for "
"input %s (%s) of the %s op is not implemented. %s"
) % (x_pos, x, op, comment)))()
class GradUndefinedError(Exception):
""" An exception raised upon attempts to use an undefined gradient.
"""
class GradUndefinedOp(gof.op.UncomputableOp):
""" An UncomputableOp representing a gradient that is mathematically
undefined.
"""
def __init__(self, op, x_pos, comment = ""):
"""
op: A theano op whose grad is mathematically undefined for
some input
x_pos: An int, giving the index in the op's input list of
a variable for which the gradient is undefined
(if op has undefined gradients for several inputs,
it must still return a separate GradUndefinedOp for
each)
comment: An optional comment explaining why the gradient isn't
defined.
"""
assert isinstance(op, gof.Op)
assert isinstance(x_pos, int)
assert x_pos >= 0
super(GradUndefinedOp, self).__init__(GradUndefinedError,
"The gradient of %s with respect to input %d is undefined. %s" \
% (str(type(op)), x_pos, comment))
def grad_undefined(op, x_pos, x, comment = ""):
"""
Return an un-computable symbolic variable of type `x.type`.
...@@ -279,9 +98,49 @@ def grad_undefined(op, x_pos, x, comment = ""):
gradient is not defined.
"""
return (NullType(
(
"This variable is Null because the grad method for "
"input %s (%s) of the %s op is mathematically undefined. %s"
) % (x_pos, x, op, comment)))()
class DisconnectedType(theano.gof.type.Type):
""" A type indicating that a variable is a result
of taking the gradient of c with respect to x
when c is not a function of x.
A symbolic placeholder for 0, but to convey
the extra information that this gradient is 0
because it is disconnected.
"""
def filter(self, data, strict=False, allow_downcast=None):
raise AssertionError(
(
"If you're assigning to a DisconnectedType you're"
" doing something wrong. It should only be used as"
" a symbolic placeholder."
))
def filter_variable(self, other):
raise AssertionError(
(
"If you're assigning to a DisconnectedType you're"
" doing something wrong. It should only be used as"
" a symbolic placeholder."
))
def may_share_memory(a, b):
return False
def values_eq(a, b, force_same_dtype=True):
raise AssertionError(
(
"If you're assigning to a DisconnectedType you're"
" doing something wrong. It should only be used as"
" a symbolic placeholder."
))
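A toy sketch of how a DisconnectedType result is meant to be consumed: the caller swaps the placeholder for zeros of the right shape. The names below are hypothetical stand-ins, with plain lists standing in for tensors:

```python
# Hypothetical stand-in for the DisconnectedType placeholder.
class Disconnected(object):
    pass

def finalize_grads(rval, wrt):
    # Replace each disconnected result with zeros shaped like the
    # corresponding wrt (stand-in for wrt.zeros_like()).
    out = []
    for g, w in zip(rval, wrt):
        if isinstance(g, Disconnected):
            out.append([0.0] * len(w))
        else:
            out.append(g)
    return out

grads = finalize_grads([Disconnected(), [1.0, 2.0]],
                       [[5.0, 6.0], [7.0, 8.0]])
assert grads == [[0.0, 0.0], [1.0, 2.0]]
```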
########################
...@@ -418,7 +277,7 @@ def Rop(f, wrt, eval_points):
def Lop(f, wrt, eval_points, consider_constant=None, warn_type=False,
disconnected_inputs='raise'):
"""
Computes the L operation on `f` wrt to `wrt` evaluated at points given
in `eval_points`. Mathematically this stands for the jacobian of `f` wrt
...@@ -453,10 +312,24 @@ def Lop(f, wrt, eval_points, consider_constant=None, warn_type=False,
if not isinstance(f, (list, tuple)):
f = [f]
# make copies of f and grads so we don't modify the client's copy
f = list(f)
grads = list(eval_points)
for elem in consider_constant:
assert elem not in f
f.append(elem)
grads.append(elem.zeros_like())
if not isinstance(wrt, (list, tuple)):
wrt = [wrt]
arg1 = zip(f, eval_points)
arg2 = list(wrt)
gmap = grad_sources_inputs(
arg1,
arg2,
warn_type=warn_type)
# Note : If p is not in gmap there can be several reasons, among which
...@@ -466,17 +339,16 @@ def Lop(f, wrt, eval_points, consider_constant=None, warn_type=False,
# such subtle cases can be fixed by a more careful implementation of the
# gradient, but for now Theano needs to throw an exception, and make the
# user aware that it does not know how to compute that gradient
if not isinstance(wrt, (list, tuple)):
wrt = [wrt]
ret = []
for p in wrt:
if p in gmap:
ret.append(gmap[p])
else:
message = (
"Lop method was asked to compute the gradient "
"with respect to a variable that is not part of "
"the computational graph of the cost, or is used "
"only by a non-differentiable operator: %s" % p)
if disconnected_inputs == 'ignore':
pass
elif disconnected_inputs == 'warn':
...@@ -484,9 +356,10 @@ def Lop(f, wrt, eval_points, consider_constant=None, warn_type=False,
elif disconnected_inputs == 'raise':
raise ValueError(message)
else:
raise ValueError(
"Invalid value for keyword "
"'disconnected_inputs', valid values are "
"'ignore', 'warn' and 'raise'.")
ret.append(p.zeros_like())
return format_as(using_list, using_tuple, ret)
...@@ -497,7 +370,7 @@ def Lop(f, wrt, eval_points, consider_constant=None, warn_type=False,
#########################
def grad(cost, wrt, g_cost=None, consider_constant=None, warn_type=False,
disconnected_inputs='raise', add_names=True):
"""
:type cost: Scalar (0-dimensional) Variable.
:type wrt: Variable or list of Variables.
...@@ -518,6 +391,11 @@ def grad(cost, wrt, g_cost=None, consider_constant=None, warn_type=False,
- 'warn': consider the gradient zero, and print a warning.
- 'raise': raise an exception.
:type add_names: bool
:param add_names: If True, variables generated by grad will be named
(d<cost.name>/d<wrt.name>) provided that both cost and wrt have
names
:rtype: Variable or list/tuple of Variables (depending upon `wrt`)
:return: symbolic expression of gradient of `cost` with respect to `wrt`.
...@@ -526,14 +404,23 @@ def grad(cost, wrt, g_cost=None, consider_constant=None, warn_type=False,
It returns an object of same type as `wrt`: a list/tuple
or Variable in all cases.
This function is a wrapper around the more general function
``theano.gradient.grad_sources_inputs``.
"""
global tensor
if tensor is None:
from theano import tensor
if isinstance(cost.type, NullType):
raise ValueError("Can't differentiate a NaN cost. "
"cost is NaN because " + cost.type.why_null)
if cost.ndim != 0:
raise TypeError("cost must be a scalar.")
if consider_constant is None:
consider_constant = []
else:
# error checking on consider_constant: verify that it is a collection
# of theano variables
# this is important, if someone accidentally passes a nested data
# structure with theano variables at the leaves, only the root will
...@@ -546,47 +433,34 @@ def grad(cost, wrt, g_cost=None, consider_constant=None, warn_type=False,
raise TypeError('Elements of consider_constant must be '
'variables, but got ' + str(type(elem)))
using_list = isinstance(wrt, list)
using_tuple = isinstance(wrt, tuple)
if not using_list and not using_tuple:
wrt = [wrt]
var_to_node_to_idx = _populate_var_to_node_to_idx([cost])
# build a dict mapping var to the gradient of cost with respect to var
grad_dict = {}
# by default, the gradient of the cost is 1
if g_cost is None:
g_cost = tensor.ones_like(cost)
grad_dict[cost] = g_cost
# the gradient of the constants is 0
for const in consider_constant:
grad_dict[const] = DisconnectedType()()
# variables that do not influence the cost have zero gradient.
# if wrt is such a variable, populate the grad_dict with this info
# so that wrt not being in var_to_node_to_idx won't cause an error below
# according to the flag, possibly raise an error if wrt is disconnected
for elem in wrt:
if elem not in var_to_node_to_idx and elem is not cost:
message = ("grad method was asked to compute the gradient "
"with respect to a variable that is not part of "
"the computational graph of the cost, or is used "
"only by a non-differentiable operator: %s" % elem)
if disconnected_inputs == 'ignore':
pass
elif disconnected_inputs == 'warn':
...@@ -597,20 +471,331 @@ def grad(cost, wrt, g_cost=None, consider_constant=None, warn_type=False,
elif disconnected_inputs == 'raise':
raise ValueError(message)
else:
raise ValueError("Invalid value for keyword "
"'disconnected_inputs', valid values are "
"'ignore', 'warn' and 'raise'.")
grad_dict[elem] = DisconnectedType()()
cost_name = None
if add_names:
cost_name = cost.name
rval = _populate_grad_dict(var_to_node_to_idx,
grad_dict, wrt, warn_type,
cost_name)
for i in xrange(len(rval)):
if isinstance(rval[i].type, DisconnectedType):
rval[i] = wrt[i].zeros_like()
if using_tuple:
rval = tuple(rval)
elif not using_list:
rval, = rval
return rval
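The tail of `grad` above replaces any `DisconnectedType` result with zeros before returning. A minimal stand-alone sketch of that post-processing, using a hypothetical `Disconnected` marker class and plain lists in place of tensors (not Theano's actual `DisconnectedType`):

```python
class Disconnected(object):
    """Stand-in marker for a gradient that is disconnected from the cost."""
    pass


def finalize(rval, wrt_values):
    # Callers expect a numeric gradient per `wrt` entry, so disconnected
    # markers are replaced by zeros of the matching shape.
    out = []
    for g, w in zip(rval, wrt_values):
        if isinstance(g, Disconnected):
            out.append([0.0] * len(w))  # zeros_like for a toy list "tensor"
        else:
            out.append(g)
    return out


print(finalize([Disconnected(), [1.0, 2.0]],
               [[5.0, 6.0], [7.0, 8.0]]))  # -> [[0.0, 0.0], [1.0, 2.0]]
```

This mirrors the contract that callers of `grad` never see a `DisconnectedType` value unless they opt in.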
def _populate_var_to_node_to_idx(outputs):
    """
    Common code shared between grad and grad_sources_inputs

    outputs: a list of nodes we want to take gradients of

    returns:
        var_to_node_to_idx: a dictionary mapping a variable to
            a second dictionary.
            the second dictionary maps apply nodes acting on
            this variable to the variable's index in the apply
            node's input list
    """

    # var_to_node_to_idx[var][node] = [i, j] means node has
    # var as input at positions i and j
    var_to_node_to_idx = {}

    # set of variables or nodes that have been added to their parents
    accounted_for = set([])

    def account_for(var):
        if var in accounted_for:
            return
        accounted_for.add(var)

        if var.owner is not None:
            node = var.owner
            if node not in accounted_for:
                accounted_for.add(node)

                for i, ipt in enumerate(node.inputs):
                    if ipt not in var_to_node_to_idx:
                        var_to_node_to_idx[ipt] = {}
                    node_to_idx = var_to_node_to_idx[ipt]
                    if node not in node_to_idx:
                        node_to_idx[node] = []
                    idx = node_to_idx[node]
                    assert i not in idx
                    idx.append(i)

                    account_for(ipt)

    for output in outputs:
        account_for(output)

    return var_to_node_to_idx
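As a sanity check of the traversal above, the same bookkeeping can be run on a toy graph in plain Python. The `Node` and `Var` classes below are hypothetical stand-ins for Theano's `Apply` and `Variable`, just enough to exercise the mapping:

```python
class Node(object):
    """Minimal stand-in for an apply node: a name and a list of inputs."""
    def __init__(self, name, inputs):
        self.name = name
        self.inputs = inputs


class Var(object):
    """Minimal stand-in for a variable; `owner` is the node that made it."""
    def __init__(self, name, owner=None):
        self.name = name
        self.owner = owner


def populate_var_to_node_to_idx(outputs):
    # var_to_node_to_idx[var][node] == [i, j] means `node` consumes
    # `var` at input positions i and j
    var_to_node_to_idx = {}
    accounted_for = set()

    def account_for(var):
        if var in accounted_for:
            return
        accounted_for.add(var)
        node = var.owner
        if node is not None and node not in accounted_for:
            accounted_for.add(node)
            for i, ipt in enumerate(node.inputs):
                var_to_node_to_idx.setdefault(ipt, {}) \
                                  .setdefault(node, []).append(i)
                account_for(ipt)

    for output in outputs:
        account_for(output)
    return var_to_node_to_idx


# x is used twice by the same node, so it is recorded at positions 0 and 1
x = Var('x')
square = Node('mul', [x, x])
y = Var('y', owner=square)
mapping = populate_var_to_node_to_idx([y])
print(mapping[x][square])  # -> [0, 1]
```

Recording one index per use is what later lets the backward pass collect one gradient term per use of a variable.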
def _populate_grad_dict(var_to_node_to_idx,
                        grad_dict, wrt, warn_type, cost_name=None):
    """
    Common code shared between grad_sources_inputs and grad

    var_to_node_to_idx: a dictionary mapping a variable to
        a second dictionary.
        the second dictionary maps apply nodes acting on
        this variable to the variable's index in the apply
        node's input list

    grad_dict: a dictionary mapping variables to their gradients
        should be populated by grad or grad_sources_inputs
            grad should set gradients to DisconnectedType()() for
            variables to be considered constant, set the
            gradient for the cost variable to g_cost, etc.
            both should set the gradient for disconnected
            inputs to a variable with type DisconnectedType()

    wrt: the minimal set of variables that must be included in grad_dict

    warn_type: if True, log a warning when a gradient term for a variable
        has a different type from that variable

    cost_name: The name of the cost being differentiated, optional.
        used to name the grad with respect to x as (d<cost_name>/dx)

    returns: a list of gradients corresponding to wrt
    """

    # build a dict mapping node to the terms node contributes to each of
    # its inputs' gradients
    term_dict = {}

    # populate term_dict[node] and return it
    def access_term_cache(node):
        if node not in term_dict:
            inputs = node.inputs

            # Each Op's grad function requires inputs and output_grads.
            # If the Op destroys any input, but the grad expression uses
            # it, then chances are the resulting graph will have a
            # dependency cycle. We avoid this cycle by passing (symbolic)
            # copies of each destroyed input.
            try:
                dinputs = [node.inputs[x[0]] for x in
                           node.op.destroy_map.values()]
            except AttributeError:
                dinputs = []

            def try_to_copy_if_needed(var):
                if var in dinputs and hasattr(var, 'copy'):
                    return var.copy()
                return var

            inputs = [try_to_copy_if_needed(ipt) for ipt in inputs]

            output_grads = [access_grad_cache(var) for var in node.outputs]

            if False in [isinstance(g.type, DisconnectedType)
                         for g in output_grads]:
                # Some outputs of this op are connected to the cost, so we
                # must call the op's grad method
                input_grads = node.op.grad(inputs, output_grads)

                if input_grads is None:
                    raise TypeError("%s.grad returned NoneType, "
                                    "expected iterable." % str(node.op))

                if len(input_grads) != len(inputs):
                    raise ValueError(("%s returned the wrong number of" +
                                      " gradient terms.") % str(node.op))
            else:
                # All outputs of this op are disconnected, so we can skip
                # calling the op's grad method and report that the inputs
                # are disconnected.
                # (The op's grad method could do this too, but this saves
                # the implementer the trouble of worrying about this case)
                input_grads = [DisconnectedType()() for ipt in inputs]

            # must convert to list in case the op returns a tuple;
            # we won't be able to post-process out the Nones if it does that
            term_dict[node] = list(input_grads)

            for i in xrange(len(term_dict[node])):
                if term_dict[node][i] is None:
                    # We don't know what None means. In the past it has
                    # been used to mean undefined, zero, or disconnected,
                    # so for now we assume it is zero. Assuming it is zero
                    # prevents us from disconnecting NaNs above.
                    # Eventually we should disallow this return type and
                    # force all ops to return the correct thing.
                    # raise AssertionError('%s returned None for'
                    #                      ' a gradient term, '
                    #                      'this is prohibited' % node.op)
                    term_dict[node][i] = node.inputs[i].zeros_like()

                if warn_type:
                    g_r_type = term_dict[node][i].type
                    r_type = inputs[i].type
                    if g_r_type != r_type:
                        _logger.warning(
                            '%s.grad returned a different type (%s) '
                            'for input %i of type (%s)',
                            node.op, g_r_type, i, r_type)

        return term_dict[node]

    # populate grad_dict[var] and return it
    def access_grad_cache(var):
        if var not in grad_dict:
            if var in var_to_node_to_idx:
                terms = []
                node_to_idx = var_to_node_to_idx[var]
                for node in node_to_idx:
                    for idx in node_to_idx[node]:

                        if hasattr(node.op, 'connection_pattern'):
                            pattern = node.op.connection_pattern()
                            if not pattern[idx]:
                                continue

                        term = access_term_cache(node)[idx]

                        if not isinstance(term, gof.Variable):
                            raise TypeError("%s.grad returned %s, expected"
                                            " Variable instance." % (
                                                str(node.op), type(term)))

                        if isinstance(term.type, NullType):
                            raise TypeError("tensor.grad "
                                            "encountered a NaN. " +
                                            term.type.why_null)

                        terms.append(term)

                # the next line is like sum(terms), but doesn't add an
                # extraneous TensorConstant(0)
                grad_dict[var] = reduce(lambda x, y: x + y, terms)
                if cost_name is not None and var.name is not None:
                    grad_dict[var].name = '(d%s/d%s)' % (cost_name, var.name)
            else:
                # this variable isn't connected to the cost in the
                # computational graph
                grad_dict[var] = DisconnectedType()()
        return grad_dict[var]

    rval = [access_grad_cache(elem) for elem in wrt]

    return rval
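The term-summing scheme above (each consuming node contributes one term per use of a variable, and the variable's gradient is the sum of those terms) is ordinary reverse-mode differentiation. A minimal scalar sketch on a hypothetical two-op graph, without any of Theano's Op machinery:

```python
# Reverse-mode on y = (x * x) + x, evaluated at x = 3.0.
# Each use of x contributes one term to dy/dx, and the terms are summed,
# mirroring access_grad_cache's reduce(lambda x, y: x + y, terms).
x = 3.0
u = x * x           # forward pass
y = u + x

g_y = 1.0           # gradient of the cost with respect to y
g_u = g_y           # add node: gradient passes through to both inputs

terms_x = [g_y]     # term from the direct use of x in y = u + x
terms_x.append(g_u * x)  # mul node: one term per use of x in u = x * x
terms_x.append(g_u * x)

g_x = sum(terms_x)  # 1 + 3 + 3 = 7, matching d/dx (x**2 + x) = 2x + 1
print(g_x)  # -> 7.0
```

The real implementation sums symbolic Variables instead of floats, which is why it uses `reduce` rather than the builtin `sum` (avoiding a spurious constant-zero start value in the graph).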
def grad_sources_inputs(sources, graph_inputs, warn_type=True):
    """
    Used to compute the gradient of a cost with respect to all the
    variables between graph_inputs and the cost, in the special
    case where you don't know the cost, only its gradient
    on a set of intermediate values.

    A gradient source is a pair (``v``, ``g_v``), in which ``v`` is
    a `Variable`, and ``g_v`` is a `Variable` that is a gradient wrt
    ``v``. More specifically, ``g_v`` is the gradient of an external
    scalar cost, ``cost`` (that is not explicitly used), wrt ``v``.

    This function traverses the graph backward from the ``r`` sources,
    calling ``op.grad(...)`` for all ops with some non-None gradient
    on an output, to compute gradients of ``cost`` wrt intermediate
    variables and ``graph_inputs``.

    The ``op.grad(...)`` functions are called like this:

    .. code-block:: python

        op.grad(op.inputs[:], [total_gradient(v) for v in op.outputs])

    This call to ``op.grad`` should return a list or tuple: one symbolic
    gradient per input. These gradients represent the gradients of
    the same implicit ``cost`` mentioned above, wrt ``op.inputs``. Note
    that this is **not** the same as the gradient of ``op.outputs`` wrt
    ``op.inputs``.

    If ``op`` has a single input, then ``op.grad`` should return a list
    or tuple of length 1.

    For each input wrt to which ``op`` is not differentiable, it should
    return ``None`` instead of a `Variable` instance.

    If a source ``r`` receives a gradient from another source ``r2``,
    then the effective gradient on ``r`` is the sum of both gradients.

    :type sources: list of pairs of Variable: (v, gradient-on-v) to
        initialize the total_gradient dictionary
    :param sources: gradients to back-propagate using chain rule

    :type graph_inputs: list of Variable
    :param graph_inputs: variables considered to be constant
        (do not backpropagate through them)

    :type warn_type: bool
    :param warn_type: True will trigger warnings via the logging module when
        the gradient on an expression has a different type than the original
        expression

    :rtype: dictionary whose keys and values are of type Variable
    :return: mapping from each Variable encountered in the backward
        traversal to the gradient with respect to that Variable.

    It is assumed that there is some objective J shared between all members
    of sources, so that for each v, gradient-on-v is the gradient of J with
    respect to v.
    """

    outputs, output_grads = zip(*sources)

    for output_grad in output_grads:
        if not hasattr(output_grad, 'type'):
            raise TypeError('output grads must be theano variables. '
                            'Ambiguous whether %s should be made into tensor'
                            ' or sparse theano variable'
                            % str(type(output_grad)))

    if graph_inputs is None:
        graph_inputs = gof.graph.inputs(outputs)
    wrt = graph_inputs

    var_to_node_to_idx = _populate_var_to_node_to_idx(outputs)

    # build a dict mapping var to the gradient of cost with respect to var
    grad_dict = {}

    # by default, the gradient of the cost is 1
    for output, output_grad in sources:
        grad_dict[output] = output_grad

    # variables that do not influence the cost have zero gradient.
    # if wrt is such a variable, populate the grad_dict with this info
    # so that wrt not being in var_to_node_to_idx won't cause an error below
    for elem in wrt:
        if elem not in var_to_node_to_idx and elem not in outputs:
            grad_dict[elem] = DisconnectedType()()

    _populate_grad_dict(var_to_node_to_idx,
                        grad_dict, wrt, warn_type)

    # post-process out the DisconnectedTypes
    for key in grad_dict:
        if isinstance(grad_dict[key].type, DisconnectedType):
            if hasattr(key, 'zeros_like'):
                grad_dict[key] = key.zeros_like()

    return grad_dict
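The calling convention of `grad_sources_inputs` (seed `grad_dict` with `(variable, gradient-on-variable)` pairs, then sweep backward accumulating chain-rule terms) can be sketched numerically. The variables, Jacobian values, and graph below are hypothetical toy data, not Theano objects:

```python
# Toy version of the seeding plus backward sweep in grad_sources_inputs:
# sources are (variable, gradient-on-variable) pairs for an implicit cost J.
# Here J depends on v1 and v2, with dJ/dv1 = 2.0 and dJ/dv2 = 0.5, and both
# v1 and v2 are produced from the same input r: v1 = 3*r, v2 = r.
sources = [('v1', 2.0), ('v2', 0.5)]
local_dv_dr = {'v1': 3.0, 'v2': 1.0}   # Jacobians of each source wrt r

grad_dict = dict(sources)              # seed the dictionary with the sources
# chain rule: dJ/dr is the sum of the contributions through v1 and v2
grad_dict['r'] = sum(grad_dict[v] * local_dv_dr[v] for v in local_dv_dr)
print(grad_dict['r'])  # -> 6.5
```

This also illustrates the docstring's note that a source reached from another source accumulates both gradients: every path's contribution is summed into one entry of the returned dictionary.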
class numeric_grad(object):
...@@ -902,7 +1087,7 @@ def verify_grad(fun, pt, n_tests=2, rng=None, eps=None,
                 as_tensor_variable(p).broadcastable)(name='input %i' % i)
                 for i, p in enumerate(pt)]

    # fun can be either a function or an actual Op instance
    o_output = fun(*tensor_pt)

    if isinstance(o_output, list):
...@@ -929,6 +1114,7 @@ def verify_grad(fun, pt, n_tests=2, rng=None, eps=None,
        return plain

    t_r = shared(random_projection())
    t_r.name = 'random_projection'

    # random projection of o onto t_r
    # This sum() is defined above, it's not the builtin sum.
...@@ -936,7 +1122,7 @@ def verify_grad(fun, pt, n_tests=2, rng=None, eps=None,
    cost_fn = function(tensor_pt, cost)

    # todo-- determine if this is actually needed
    g_cost = as_tensor_variable(1.0, name='g_cost')
    if cast_to_output_type:
        g_cost = cast(g_cost, o_output.dtype)
...@@ -958,10 +1144,11 @@ def verify_grad(fun, pt, n_tests=2, rng=None, eps=None,
            num_grad.max_err(analytic_grad, abs_tol, rel_tol)

        if max_abs_err > abs_tol and max_rel_err > rel_tol:
            raise verify_grad.E_grad(max_arg, max_err_pos,
                                     max_abs_err, max_rel_err,
                                     abs_tol, rel_tol)

        # get new random projection for next test
        if test_num < n_tests - 1:
            t_r.set_value(random_projection(), borrow=True)
...
...@@ -456,7 +456,7 @@ def test_elemwise_composite_support_code():
    P = T.exp(-(Y - U) ** 2)
    epsilon = numpy.asarray(0.001, dtype="float32")
    NLL = -T.mean(T.log(P + epsilon))  # SupportCodeError
    G = theano.gradient.grad(NLL, wrt=[W])

    backup = theano.config.warn.identify_1pexp_bug
    theano.config.warn.identify_1pexp_bug = False
...@@ -468,6 +468,7 @@ def test_elemwise_composite_support_code():
    topo = f_grad.maker.fgraph.toposort()
    assert sum([isinstance(node.op, T.Elemwise) for node in topo]) == 1
    # I suspect this was failing in the original branch too
    assert sum([isinstance(node.op, tcn.GpuElemwise) for node in topo]) == 1
...
...@@ -258,7 +258,7 @@ class T_Images2Neibs(unittest_tools.InferShapeTester):
        def fn(images):
            return images2neibs(images, (3, 3), mode='wrap_centered')
        self.assertRaises(TypeError, unittest_tools.verify_grad,
                          fn, [images_val], mode=self.mode)
...@@ -276,7 +276,7 @@ class T_Images2Neibs(unittest_tools.InferShapeTester):
        # are not the same.
        def fn(images):
            return images2neibs(images, (2, 2), (1, 1))
        self.assertRaises(TypeError,
                          unittest_tools.verify_grad, fn, [images_val],
                          mode=self.mode)
...
...@@ -488,6 +488,9 @@ class _scalar_py_operators:
    def __rmod__(self,other): return mod(other,self)
    def __rpow__(self,other): return pow(other,self)

    def zeros_like(self):
        return ScalarConstant(Scalar(str(self.type.dtype)), 0)


class ScalarVariable(_scalar_py_operators, Variable):
    pass
...
...@@ -29,6 +29,8 @@ from theano import gof
from theano.tensor import TensorType
from theano import tensor
from theano.tensor.opt import Shape_i
from theano.gradient import grad_undefined
from theano.gradient import DisconnectedType
#from theano.sandbox import cuda
from theano.compile.profiling import ScanProfileStats
...@@ -431,7 +433,7 @@ class Scan(PureOp):
                aux_txt += str(k) + ','
            aux_txt += '},%s,%s}'
        else:
            aux_txt += '{%s,%s}'
        aux_txt = aux_txt % (name, gpu_str, str(self.name))
        return aux_txt
...@@ -1161,6 +1163,17 @@ class Scan(PureOp):
    ### GRAD FUNCTION
    def grad(self, args, g_outs):
        # This discards information about whether incoming gradients are 0
        # or disconnected from the cost
        # TODO: upgrade scan op to report disconnection correctly
        def strip_disconnected(g):
            if isinstance(g.type, DisconnectedType):
                return None
            return g

        g_outs = [strip_disconnected(g) for g in g_outs]

        # 1. forward pass - get the outputs after applying scan
        scan_outputs = self(*args)
        # 2. make sure they are given as a list
...@@ -1512,7 +1525,7 @@ class Scan(PureOp):
        if type(outputs) not in (list, tuple):
            outputs = [outputs]
        # Re-order the gradients correctly
        gradients = [grad_undefined(self, 0, args[0], 'Number of steps')]

        offset = (self.n_mit_mot +
                  self.n_mit_sot +
...@@ -1522,8 +1535,16 @@ class Scan(PureOp):
        end = self.n_mit_mot + self.n_mit_sot + self.n_sit_sot
        gradients += [x[::-1] for x in outputs[:end]]

        start = len(gradients)
        gradients += [
            grad_undefined(self, x + start, args[x + start],
                           'Shared Variable with update')
            for x in xrange(self.n_shared_outs)]

        start = len(gradients)
        gradients += [
            grad_undefined(self, x + start, args[x + start],
                           'Dimension of memory buffer for output')
            for x in xrange(self.n_nit_sot)]

        begin = end
        end = begin + n_sitsot_outs
...@@ -1547,7 +1568,8 @@ class Scan(PureOp):
        rop_self_outputs = self_outputs
        if self.info['n_shared_outs'] > 0:
            rop_self_outputs = rop_self_outputs[:-self.info['n_shared_outs']]
        rop_outs = tensor.Rop(rop_self_outputs, rop_of_inputs,
                              inner_eval_points)
        if type(rop_outs) not in (list, tuple):
            rop_outs = [rop_outs]
        # Step 2. Figure out what corresponds to what in the scan
...@@ -1653,7 +1675,7 @@ class Scan(PureOp):
        scan_sit_sot = inputs[b:e] + clean_eval_points
        inner_sit_sot = self_inputs[ib:ie] + inner_eval_points[ib:ie]

        # Shared outs ...
        b = e
        e = e + self.n_shared_outs
        ib = ie
...@@ -1738,7 +1760,7 @@ class Scan(PureOp):
        b = e + self.n_nit_sot
        e = e + self.n_nit_sot * 2
        final_outs += outputs[b:e]
        final_outs += [None] * self.n_shared_outs

        return final_outs
...
...@@ -1816,10 +1816,12 @@ class T_Scan(unittest.TestCase):
    def test_scan_extra_inputs_hessian(self):
        x = theano.tensor.vector('x')
        A = theano.tensor.matrix('A')
        fc1 = theano.shared(0.5, name='fc1')
        fc2 = theano.shared(0.9, name='fc2')
        y = fc1 * theano.dot(x * x, theano.dot(A, x))
        y.name = 'y'
        gy = theano.tensor.grad(y, x)
        gy.name = 'gy'
        hy, updates = theano.scan(
            lambda i, gy, x: theano.tensor.grad(gy[i] * fc2, x),
            sequences=theano.tensor.arange(gy.shape[0]),
...@@ -1829,7 +1831,9 @@ class T_Scan(unittest.TestCase):
        vx = numpy.array([1., 1.], dtype=theano.config.floatX)
        vA = numpy.array([[1., 1.], [1., 0.]], dtype=theano.config.floatX)
        vR = numpy.array([[3.6, 1.8], [1.8, 0.9]], dtype=theano.config.floatX)
        out = f(vx, vA)
        assert numpy.allclose(out, vR)

    def test_cloning_no_replace_strict_copy_inputs(self):
        # This has nothing to do with scan, but it refers to the clone
...@@ -3479,14 +3483,15 @@ def test_compute_test_value():
    backup = theano.config.compute_test_value
    theano.config.compute_test_value = 'raise'
    try:
        x = tensor.vector('x')
        xv = numpy.ones(3, dtype=theano.config.floatX)
        x.tag.test_value = xv

        y = theano.shared(numpy.arange(3, dtype=theano.config.floatX),
                          name='y')
        z, _ = theano.scan(
            fn=lambda u, v: u + v,
            sequences=[x, y])
        assert not _
        z.name = 'z'
        # The gradient computation used to crash before 6af465e.
        g = tensor.grad(z.sum(), x)
        #f = theano.function([x], g)
...
...@@ -7,7 +7,6 @@ http://www-users.cs.umn.edu/~saad/software/SPARSKIT/paper.ps
# TODO
# Automatic methods for determining best sparse format?

import sys

import numpy
...@@ -16,14 +15,14 @@ import scipy.sparse
from theano import gof, tensor, compile, scalar, config
from theano.gof.python25 import all
from theano.gradient import DisconnectedType
from theano.sparse.utils import hash_from_sparse
import theano.tests.unittest_tools as utt

sparse_formats = ['csc', 'csr']

# TODO: move this decorator to the compile submodule
def register_specialize(lopt, *tags, **kwargs):
    compile.optdb['specialize'].register((kwargs and kwargs.pop('name')) or
                                         lopt.__name__, lopt, 'fast_run',
...@@ -256,7 +255,7 @@ def sp_zeros_like(x):
    :return: The same as `x` with zero entries
             for all element.
    """
    # TODO: don't restrict to CSM formats
    _, _, indptr, shape = csm_properties(x)
    return CSM(format=x.format)(numpy.array([], dtype=x.type.dtype),
                                numpy.array([]), tensor.zeros_like(indptr),
...@@ -291,7 +290,7 @@ class _sparse_py_operators:
    def __rmul__(left, right):
        return mul(left, right)

    # extra pseudo-operator symbols
    def __dot__(left, right):
        return structured_dot(left, right)
...@@ -299,12 +298,12 @@ class _sparse_py_operators:
    def __rdot__(right, left):
        return structured_dot(left, right)

    # N.B. THIS IS COMMENTED OUT ON PURPOSE!!!
    # Discussion with Fred & James (at least, and maybe others before)
    # we decided that casting from a sparse to dense should be explicit
    # because it's usually something you just want to be pretty careful
    # about, and not to do by accident.
    # def _as_TensorVariable(self):
    #     return dense_from_sparse(self)

    shape = property(lambda self: tensor.shape(dense_from_sparse(self)))
...@@ -441,7 +440,7 @@ class SparseType(gof.Type):
        if strict:
            raise TypeError("%s is not sparse, or not the right dtype (is %s, "
                            "expected %s)" % (value, value.dtype, self.dtype))
        # The input format could be converted here
        if allow_downcast:
            sp = self.format_cls[self.format](value, dtype=self.dtype)
        else:
...@@ -488,7 +487,7 @@ class SparseType(gof.Type):
        return "Sparse[%s, %s]" % (str(self.dtype), str(self.format))

    def values_eq_approx(self, a, b, eps=1e-6):
        # WARNING: equality comparison of sparse matrices is not fast or easy
        # we definitely do not want to be doing this un-necessarily during
        # a FAST_RUN computation..
        if not scipy.sparse.issparse(a) or not scipy.sparse.issparse(b):
...@@ -504,7 +503,7 @@ class SparseType(gof.Type):
        return max(diff.data) < eps

    def values_eq(self, a, b):
        # WARNING: equality comparison of sparse matrices is not fast or easy
        # we definitely do not want to be doing this un-necessarily during
        # a FAST_RUN computation..
        return scipy.sparse.issparse(a) \
...@@ -619,14 +618,25 @@ class CSMProperties(gof.Op):
        out[0][0] = csm.data[self.kmap]
        if str(csm.data.dtype) == 'int32':
            out[0][0] = theano._asarray(out[0][0], dtype='int32')
        # backport
        # out[0][0] = csm.data if self.kmap is None else csm.data[self.kmap]
        out[1][0] = theano._asarray(csm.indices, dtype='int32')
        out[2][0] = theano._asarray(csm.indptr, dtype='int32')
        out[3][0] = theano._asarray(csm.shape, dtype='int32')

    def grad(self, (csm,), g):
        # g[1:] is all integers, so their Jacobian in this op
        # is 0. We thus don't need to worry about what their values
        # are.

        # if g[0] is disconnected, then this op doesn't contribute
        # any gradient anywhere. but we know that at least one of
        # g[1:] is connected, or this grad method wouldn't have been
        # called, so we should report zeros
        if isinstance(g[0].type, DisconnectedType):
            return [csm.zeros_like()]

        data, indices, indptr, shape = csm_properties(csm)
        return [CSM(csm.format)(g[0], indices, indptr, shape)]

# don't make this a function or it breaks some optimizations below
...@@ -662,10 +672,10 @@ class CSM(gof.Op):
    :param data: One dimensional tensor representing
                 the data of the sparse matrix to construct.
    :param indices: One dimensional tensor of integers
                    representing the indices of the sparse
                    matrix to construct.
    :param indptr: One dimensional tensor of integers
                   representing the indice pointer for
                   the sparse matrix to construct.
    :param shape: One dimensional tensor of integers
...@@ -673,9 +683,9 @@ class CSM(gof.Op):
                  matrix to construct.

    :return: A sparse matrix having the properties
             specified by the inputs.

    :note: The grad method returns a dense vector, so it provides
           a regular grad.
    """
...@@ -774,10 +784,10 @@ class CSM(gof.Op):
    def grad(self, (x_data, x_indices, x_indptr, x_shape), (g_out,)):
        g_data, g_indices, g_indptr, g_shape = csm_properties(g_out)
        # unpack the data vector and wrap it as a 1d TensorType
        g_data = csm_grad(self.kmap)(x_data, x_indices, x_indptr, x_shape,
                                     g_data, g_indices, g_indptr, g_shape)
        return [g_data, DisconnectedType()(),
                DisconnectedType()(), DisconnectedType()()]

    def infer_shape(self, node, shapes):
        if self.kmap is None:
...@@ -1195,7 +1205,7 @@ class GetItemScalar(gof.op.Op):
        if isinstance(ind, slice):
            raise Exception("GetItemScalar called with a slice as index!")

        # in case of indexing using int instead of theano variable
        elif isinstance(ind, int):
            ind = theano.tensor.constant(ind)
            input_op += [ind]
...@@ -2026,7 +2036,7 @@ class MulSD(gof.op.Op):
    def make_node(self, x, y):
        x, y = as_sparse_variable(x), tensor.as_tensor_variable(y)

        # upcast the tensor. Is the cast of sparse done implemented?
        dtype = scalar.upcast(x.type.dtype, y.type.dtype)
        if y.type.dtype != dtype:
            y = tensor.cast(y, dtype)
...@@ -2049,7 +2059,7 @@ class MulSD(gof.op.Op):
        elif len(y.shape) == 2:
            # if we have enough memory to fit y, maybe we can fit x.asarray()
            # too?
            # TODO: change runtime from O(M*N) to O(nonzeros)
            M, N = x.shape
            assert x.shape == y.shape
...@@ -2810,7 +2820,7 @@ class StructuredDot(gof.Op):
            raise ValueError('shape mismatch in StructuredDot.perform',
                             (a.shape, b.shape))

        # variable = a.dot(b)  # deprecated
        variable = a * b
        if isinstance(node.outputs[0].type, SparseType):
            assert _is_sparse(variable)
...@@ -2843,8 +2853,8 @@ class StructuredDot(gof.Op):
            raise Exception("a.shape=%s, b.shape=%s, variable.shape=%s "
                            " ??? I have no idea why")
#The cast is needed as otherwise we hit the bug mentioned into # The cast is needed as otherwise we hit the bug mentioned into
#theano._asarray function documentation. # theano._asarray function documentation.
out[0] = theano._asarray(variable, str(variable.dtype)) out[0] = theano._asarray(variable, str(variable.dtype))
def grad(self, (a, b), (g_out,)): def grad(self, (a, b), (g_out,)):
...@@ -3229,7 +3239,7 @@ class SamplingDot(gof.op.Op): ...@@ -3229,7 +3239,7 @@ class SamplingDot(gof.op.Op):
if not _is_sparse_variable(p): if not _is_sparse_variable(p):
raise TypeError(p) raise TypeError(p)
#TODO: use it. # TODO: use it.
dtype_out = scalar.upcast(x.type.dtype, y.type.dtype, p.type.dtype) dtype_out = scalar.upcast(x.type.dtype, y.type.dtype, p.type.dtype)
return gof.Apply(self, [x, y, p], [p.type()]) return gof.Apply(self, [x, y, p], [p.type()])
......
...@@ -25,6 +25,7 @@ from theano.tensor.utils import hash_from_ndarray ...@@ -25,6 +25,7 @@ from theano.tensor.utils import hash_from_ndarray
from theano.scalar import ComplexError, IntegerDivisionError from theano.scalar import ComplexError, IntegerDivisionError
import theano.scalar.sharedvar import theano.scalar.sharedvar
from theano.gradient import grad_undefined from theano.gradient import grad_undefined
from theano.gradient import DisconnectedType
### set up the external interface ### set up the external interface
from elemwise import Elemwise, DimShuffle, CAReduce, Sum from elemwise import Elemwise, DimShuffle, CAReduce, Sum
...@@ -32,7 +33,7 @@ from elemwise import Elemwise, DimShuffle, CAReduce, Sum ...@@ -32,7 +33,7 @@ from elemwise import Elemwise, DimShuffle, CAReduce, Sum
import logging import logging
_logger = logging.getLogger("theano.tensor.basic") _logger = logging.getLogger("theano.tensor.basic")
#This is needed as we will hide it later # This is needed as we will hide it later
python_complex = complex python_complex = complex
python_any = any python_any = any
python_all = all python_all = all
...@@ -47,6 +48,7 @@ continuous_dtypes = map(str, scal.continuous_types) ...@@ -47,6 +48,7 @@ continuous_dtypes = map(str, scal.continuous_types)
discrete_dtypes = map(str, scal.discrete_types) discrete_dtypes = map(str, scal.discrete_types)
all_dtypes = map(str, scal.all_types) all_dtypes = map(str, scal.all_types)
class ShapeError(Exception): class ShapeError(Exception):
"""Raised when the shape cannot be computed.""" """Raised when the shape cannot be computed."""
pass pass
...@@ -108,7 +110,7 @@ if 0: ...@@ -108,7 +110,7 @@ if 0:
transfer the value on the gpu transfer the value on the gpu
""" """
if hasattr(x, '_as_CudaNdarrayVariable'): if hasattr(x, '_as_CudaNdarrayVariable'):
#TODO: pass name and ndim arguments # TODO: pass name and ndim arguments
return x._as_CudaNdarrayVariable() return x._as_CudaNdarrayVariable()
return as_tensor_variable(x, name, ndim) return as_tensor_variable(x, name, ndim)
...@@ -142,7 +144,7 @@ def as_tensor_variable(x, name=None, ndim=None): ...@@ -142,7 +144,7 @@ def as_tensor_variable(x, name=None, ndim=None):
return x._as_TensorVariable() # TODO: pass name and ndim arguments return x._as_TensorVariable() # TODO: pass name and ndim arguments
if isinstance(x, gof.Apply): if isinstance(x, gof.Apply):
#TODO: use Apply's default output mechanism # TODO: use Apply's default output mechanism
if len(x.outputs) != 1: if len(x.outputs) != 1:
raise ValueError( raise ValueError(
"It is ambiguous which output of a multi-output Op has" "It is ambiguous which output of a multi-output Op has"
...@@ -161,7 +163,7 @@ def as_tensor_variable(x, name=None, ndim=None): ...@@ -161,7 +163,7 @@ def as_tensor_variable(x, name=None, ndim=None):
return x return x
else: else:
if (x.type.ndim > ndim): if (x.type.ndim > ndim):
#TODO: strip off leading broadcastable dimensions # TODO: strip off leading broadcastable dimensions
raise ValueError( raise ValueError(
'TensorType could not be cast to have %i dimensions' % 'TensorType could not be cast to have %i dimensions' %
ndim, x.type) ndim, x.type)
...@@ -369,7 +371,7 @@ def constant_or_value(x, rtype, name=None, ndim=None, dtype=None): ...@@ -369,7 +371,7 @@ def constant_or_value(x, rtype, name=None, ndim=None, dtype=None):
if len(bcastable) < ndim: if len(bcastable) < ndim:
bcastable = [True] * (ndim - len(bcastable)) + bcastable bcastable = [True] * (ndim - len(bcastable)) + bcastable
elif len(bcastable) > ndim: elif len(bcastable) > ndim:
#TODO: strip off dimensions of size 1 # TODO: strip off dimensions of size 1
raise ValueError( raise ValueError(
'ndarray could not be cast to constant with %i dimensions' % 'ndarray could not be cast to constant with %i dimensions' %
ndim) ndim)
...@@ -394,6 +396,7 @@ def constant(x, name=None, ndim=None, dtype=None): ...@@ -394,6 +396,7 @@ def constant(x, name=None, ndim=None, dtype=None):
return constant_or_value(x, rtype=TensorConstant, name=name, ndim=ndim, return constant_or_value(x, rtype=TensorConstant, name=name, ndim=ndim,
dtype=dtype) dtype=dtype)
def _obj_is_wrappable_as_tensor(x): def _obj_is_wrappable_as_tensor(x):
try: try:
constant(x) constant(x)
...@@ -405,7 +408,7 @@ def _obj_is_wrappable_as_tensor(x): ...@@ -405,7 +408,7 @@ def _obj_is_wrappable_as_tensor(x):
def _wrap_tensor_into_member(x): def _wrap_tensor_into_member(x):
return compile.module.Member(constant(x)) return compile.module.Member(constant(x))
compile.module.register_wrapper(_obj_is_wrappable_as_tensor, compile.module.register_wrapper(_obj_is_wrappable_as_tensor,
_wrap_tensor_into_member, no_warn = True) _wrap_tensor_into_member, no_warn=True)
if int(config.tensor.cmp_sloppy) > 1: if int(config.tensor.cmp_sloppy) > 1:
...@@ -427,15 +430,15 @@ elif int(config.tensor.cmp_sloppy): ...@@ -427,15 +430,15 @@ elif int(config.tensor.cmp_sloppy):
float64_rtol = 1e-4 float64_rtol = 1e-4
float64_atol = 1e-3 float64_atol = 1e-3
else: else:
#If you change those value in test don't forget to put them back # If you change those value in test don't forget to put them back
#when the test end. Don't forget the case when the test fail. # when the test end. Don't forget the case when the test fail.
float32_atol = 1e-5 float32_atol = 1e-5
float32_rtol = 1e-5 float32_rtol = 1e-5
# defaults in numpy.allclose # defaults in numpy.allclose
float64_rtol = 1.0000000000000001e-05 float64_rtol = 1.0000000000000001e-05
float64_atol = 1e-8 float64_atol = 1e-8
#more strict. Atleast float32 precision. # more strict. Atleast float32 precision.
float64_rtol = 1.0000000000000001e-06 float64_rtol = 1.0000000000000001e-06
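The tolerance pairs above follow the `numpy.allclose` comparison rule, where two values match if `|a - b| <= atol + rtol * |b|`. A scalar sketch of that rule, with a hypothetical helper name:

```python
def close_scalar(a, b, rtol, atol):
    # numpy.allclose convention for a single pair of values
    return abs(a - b) <= atol + rtol * abs(b)

# the strict float64 settings from this block
rtol, atol = 1e-06, 1e-8
within = close_scalar(1.0, 1.0 + 5e-7, rtol, atol)   # inside rtol
outside = close_scalar(1.0, 1.0 + 5e-5, rtol, atol)  # well outside
```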
...@@ -494,9 +497,9 @@ def get_constant_value(v): ...@@ -494,9 +497,9 @@ def get_constant_value(v):
shape, val = v.owner.inputs shape, val = v.owner.inputs
# fill(a,b) fills the shape of 'a' filled with 'b' # fill(a,b) fills the shape of 'a' filled with 'b'
return get_constant_value(val) return get_constant_value(val)
#Don't act as the constant_folding optimization here as this # Don't act as the constant_folding optimization here as this
#fct is used too early in the optimization phase. This would # fct is used too early in the optimization phase. This would
#mess with the stabilization optimization. # mess with the stabilization optimization.
if isinstance(v.owner.op, Elemwise) and isinstance( if isinstance(v.owner.op, Elemwise) and isinstance(
v.owner.op.scalar_op, scal.Cast): v.owner.op.scalar_op, scal.Cast):
const = get_constant_value(v.owner.inputs[0]) const = get_constant_value(v.owner.inputs[0])
...@@ -529,7 +532,7 @@ def get_constant_value(v): ...@@ -529,7 +532,7 @@ def get_constant_value(v):
ret = v.owner.inputs[0].owner.inputs[ ret = v.owner.inputs[0].owner.inputs[
v.owner.op.idx_list[0] + 1] v.owner.op.idx_list[0] + 1]
ret = get_constant_value(ret) ret = get_constant_value(ret)
#join can cast implicitly its input in some case. # join can cast implicitly its input in some case.
return theano._asarray(ret, dtype=v.type.dtype) return theano._asarray(ret, dtype=v.type.dtype)
if (v.owner.inputs[0].owner and if (v.owner.inputs[0].owner and
isinstance(v.owner.inputs[0].owner.op, isinstance(v.owner.inputs[0].owner.op,
...@@ -542,7 +545,7 @@ def get_constant_value(v): ...@@ -542,7 +545,7 @@ def get_constant_value(v):
ret = v.owner.inputs[0].owner.inputs[v.owner.op.idx_list[0]] ret = v.owner.inputs[0].owner.inputs[v.owner.op.idx_list[0]]
ret = get_constant_value(ret) ret = get_constant_value(ret)
#MakeVector can cast implicitly its input in some case. # MakeVector can cast implicitly its input in some case.
return theano._asarray(ret, dtype=v.type.dtype) return theano._asarray(ret, dtype=v.type.dtype)
# This is needed when we take the grad as the Shape op # This is needed when we take the grad as the Shape op
...@@ -747,8 +750,8 @@ class TensorType(Type): ...@@ -747,8 +750,8 @@ class TensorType(Type):
This function is used internally as part of C code generation. This function is used internally as part of C code generation.
""" """
#TODO: add more type correspondances for e.g. int32, int64, float32, # TODO: add more type correspondances for e.g. int32, int64, float32,
#complex64, etc. # complex64, etc.
try: try:
return { return {
'float32': (float, 'npy_float32', 'NPY_FLOAT32'), 'float32': (float, 'npy_float32', 'NPY_FLOAT32'),
...@@ -786,7 +789,7 @@ class TensorType(Type): ...@@ -786,7 +789,7 @@ class TensorType(Type):
@staticmethod @staticmethod
def values_eq(a, b, force_same_dtype=True): def values_eq(a, b, force_same_dtype=True):
#TODO: check to see if the shapes must match # TODO: check to see if the shapes must match
# for now, we err on safe side... # for now, we err on safe side...
if a.shape != b.shape: if a.shape != b.shape:
return False return False
...@@ -863,14 +866,14 @@ class TensorType(Type): ...@@ -863,14 +866,14 @@ class TensorType(Type):
# Find places where both a and b have inf of the same sign. # Find places where both a and b have inf of the same sign.
both_inf = a_inf * numpy.isinf(b) both_inf = a_inf * numpy.isinf(b)
#cmp_elemwise is weird when we have inf and -inf. # cmp_elemwise is weird when we have inf and -inf.
#set it to False # set it to False
cmp_elemwise = numpy.where( cmp_elemwise = numpy.where(
both_inf & cmp_elemwise, both_inf & cmp_elemwise,
a == b, a == b,
cmp_elemwise) cmp_elemwise)
#check the sign of the inf # check the sign of the inf
both_inf = numpy.where(both_inf, (a == b), both_inf) both_inf = numpy.where(both_inf, (a == b), both_inf)
if allow_remove_inf: if allow_remove_inf:
...@@ -1244,21 +1247,21 @@ tensor4s, ftensor4s, dtensor4s, itensor4s, ltensor4s = _multi( ...@@ -1244,21 +1247,21 @@ tensor4s, ftensor4s, dtensor4s, itensor4s, ltensor4s = _multi(
class _tensor_py_operators: class _tensor_py_operators:
#UNARY # UNARY
def __abs__(self): def __abs__(self):
return abs_(self) return abs_(self)
def __neg__(self): def __neg__(self):
return neg(self) return neg(self)
#CASTS # CASTS
#### REMOVED THESE BECAUSE PYTHON appears to require __int__ to return #### REMOVED THESE BECAUSE PYTHON appears to require __int__ to return
#### an int. -JB 20081112 #### an int. -JB 20081112
#def __int__(self): return convert_to_int32(self) #def __int__(self): return convert_to_int32(self)
#def __float__(self): return convert_to_float64(self) #def __float__(self): return convert_to_float64(self)
#def __complex__(self): return convert_to_complex128(self) #def __complex__(self): return convert_to_complex128(self)
#COMPARISONS # COMPARISONS
_is_nonzero = True _is_nonzero = True
def __lt__(self, other): def __lt__(self, other):
...@@ -1294,7 +1297,7 @@ class _tensor_py_operators: ...@@ -1294,7 +1297,7 @@ class _tensor_py_operators:
else: else:
raise TypeError("Variable does not support boolean operations.") raise TypeError("Variable does not support boolean operations.")
#BITWISE # BITWISE
def __invert__(self): def __invert__(self):
return invert(self) return invert(self)
...@@ -1316,16 +1319,16 @@ class _tensor_py_operators: ...@@ -1316,16 +1319,16 @@ class _tensor_py_operators:
def __rxor__(self, other): def __rxor__(self, other):
return xor(other, self) return xor(other, self)
#def __iand__(self, other): # def __iand__(self, other):
# return _and_inplace(self, other) # return _and_inplace(self, other)
# #
#def __ior__(self, other): # def __ior__(self, other):
# return _or_inplace(self, other) # return _or_inplace(self, other)
# #
#def __ixor__(self, other): #def __ixor__(self, other):
# return _xor_inplace(self, other) # return _xor_inplace(self, other)
#ARITHMETIC - NORMAL # ARITHMETIC - NORMAL
def __add__(self, other): def __add__(self, other):
try: try:
return add(self, other) return add(self, other)
...@@ -1439,7 +1442,7 @@ class _tensor_py_operators: ...@@ -1439,7 +1442,7 @@ class _tensor_py_operators:
def __rpow__(self, other): def __rpow__(self, other):
return pow(other, self) return pow(other, self)
#TRANSPOSE # TRANSPOSE
T = property(lambda self: transpose(self)) T = property(lambda self: transpose(self))
def transpose(self, *axes): def transpose(self, *axes):
...@@ -1502,10 +1505,9 @@ class _tensor_py_operators: ...@@ -1502,10 +1505,9 @@ class _tensor_py_operators:
""" """
if ndim is not None: if ndim is not None:
if not isinstance(ndim,int): if not isinstance(ndim, int):
raise ValueError("Expected ndim to be an integer, is "\ raise ValueError("Expected ndim to be an integer, is "\
+str(type(ndim))) + str(type(ndim)))
return reshape(self, shape, ndim=ndim) return reshape(self, shape, ndim=ndim)
...@@ -1542,7 +1544,7 @@ class _tensor_py_operators: ...@@ -1542,7 +1544,7 @@ class _tensor_py_operators:
def astype(self, dtype): def astype(self, dtype):
return cast(self, dtype) return cast(self, dtype)
#SLICING # SLICING
# Do not define __getslice__ here: # Do not define __getslice__ here:
# When calling t[1:], for instance, the arguments passed to __getslice__ # When calling t[1:], for instance, the arguments passed to __getslice__
# are (1, sys.maxsize), which is a pain to deal with, and can even not be # are (1, sys.maxsize), which is a pain to deal with, and can even not be
...@@ -1602,7 +1604,7 @@ class _tensor_py_operators: ...@@ -1602,7 +1604,7 @@ class _tensor_py_operators:
return Subtensor(args)(self, *Subtensor.collapse(args, return Subtensor(args)(self, *Subtensor.collapse(args,
lambda entry: isinstance(entry, Variable))) lambda entry: isinstance(entry, Variable)))
#COPYING # COPYING
def copy(self): def copy(self):
return tensor_copy(self) return tensor_copy(self)
...@@ -1629,7 +1631,7 @@ class _tensor_py_operators: ...@@ -1629,7 +1631,7 @@ class _tensor_py_operators:
dtype = property(lambda self: self.type.dtype) dtype = property(lambda self: self.type.dtype)
""" The dtype of this tensor. """ """ The dtype of this tensor. """
#extra pseudo-operator symbols # extra pseudo-operator symbols
def __dot__(left, right): def __dot__(left, right):
return dot(left, right) return dot(left, right)
...@@ -1649,7 +1651,7 @@ class _tensor_py_operators: ...@@ -1649,7 +1651,7 @@ class _tensor_py_operators:
raise NotImplementedError() raise NotImplementedError()
if numpy.isinf(L): if numpy.isinf(L):
raise NotImplementedError() raise NotImplementedError()
#optimizations will/should catch cases like L=1, L=2 # optimizations will/should catch cases like L=1, L=2
return pow(pow(abs_(self), L).sum(axis=axis), 1.0 / L) return pow(pow(abs_(self), L).sum(axis=axis), 1.0 / L)
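The `norm` expression above computes the L-norm: sum the absolute values raised to the power `L`, then take the L-th root. A pure-Python sketch of the same formula:

```python
def lnorm(xs, L):
    # (sum_i |x_i| ** L) ** (1 / L); L = 2 gives the Euclidean norm
    return sum(abs(x) ** L for x in xs) ** (1.0 / L)

n2 = lnorm([3.0, 4.0], 2)   # Euclidean norm of (3, 4)
n1 = lnorm([3.0, -4.0], 1)  # L1 norm: 3 + 4
```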
def mean(self, axis=None, dtype=None, keepdims=False): def mean(self, axis=None, dtype=None, keepdims=False):
...@@ -1668,7 +1670,7 @@ class _tensor_py_operators: ...@@ -1668,7 +1670,7 @@ class _tensor_py_operators:
"""See `theano.tensor.max`""" """See `theano.tensor.max`"""
return max(self, axis, keepdims=keepdims) return max(self, axis, keepdims=keepdims)
#TO TRUMP NUMPY OPERATORS # TO TRUMP NUMPY OPERATORS
__array_priority__ = 1000 __array_priority__ = 1000
def get_constant_value(self): def get_constant_value(self):
...@@ -1697,7 +1699,7 @@ class TensorConstantSignature(tuple): ...@@ -1697,7 +1699,7 @@ class TensorConstantSignature(tuple):
except Exception: except Exception:
return False return False
#N.B. compare shape to ensure no broadcasting in == # N.B. compare shape to ensure no broadcasting in ==
if t0 != t1 or d0.shape != d1.shape: if t0 != t1 or d0.shape != d1.shape:
return False return False
...@@ -1802,7 +1804,6 @@ class TensorConstant(_tensor_py_operators, Constant): ...@@ -1802,7 +1804,6 @@ class TensorConstant(_tensor_py_operators, Constant):
TensorType.Constant = TensorConstant TensorType.Constant = TensorConstant
Tensor = TensorType Tensor = TensorType
...@@ -1816,6 +1817,7 @@ elemwise.TensorConstant = TensorConstant ...@@ -1816,6 +1817,7 @@ elemwise.TensorConstant = TensorConstant
# Utilities # Utilities
######################### #########################
def _redefine(real_symbol_value, module='tensor'): def _redefine(real_symbol_value, module='tensor'):
"""Replace the value associated with a function symbol. """Replace the value associated with a function symbol.
...@@ -1872,7 +1874,7 @@ def _scal_elemwise_with_nfunc(nfunc, nin, nout): ...@@ -1872,7 +1874,7 @@ def _scal_elemwise_with_nfunc(nfunc, nin, nout):
if getattr(symbol, '__doc__', False): if getattr(symbol, '__doc__', False):
rval.__doc__ = symbol.__doc__ + '\n' + rval.__doc__ rval.__doc__ = symbol.__doc__ + '\n' + rval.__doc__
#for the meaning of this see the ./epydoc script # for the meaning of this see the ./epydoc script
# it makes epydoc display rval as if it were a function, not an object # it makes epydoc display rval as if it were a function, not an object
rval.__epydoc_asRoutine = symbol rval.__epydoc_asRoutine = symbol
rval.__module__ = 'tensor' rval.__module__ = 'tensor'
...@@ -1965,7 +1967,7 @@ class ScalarFromTensor(Op): ...@@ -1965,7 +1967,7 @@ class ScalarFromTensor(Op):
scalar_from_tensor = ScalarFromTensor() scalar_from_tensor = ScalarFromTensor()
#to be removed as we get the epydoc routine-documenting thing going # to be removed as we get the epydoc routine-documenting thing going
#-JB 20080924 #-JB 20080924
def _conversion(real_value, name): def _conversion(real_value, name):
__oplist_tag(real_value, 'casting') __oplist_tag(real_value, 'casting')
...@@ -2061,6 +2063,7 @@ def cast(x, dtype): ...@@ -2061,6 +2063,7 @@ def cast(x, dtype):
# Unary Operations # Unary Operations
########################## ##########################
class Shape(Op): class Shape(Op):
""" """
L{Op} to return the shape of a matrix. L{Op} to return the shape of a matrix.
...@@ -2077,13 +2080,13 @@ class Shape(Op): ...@@ -2077,13 +2080,13 @@ class Shape(Op):
return self.__class__.__name__ return self.__class__.__name__
def make_node(self, x): def make_node(self, x):
#Must work for all type that have a shape attribute. # Must work for all type that have a shape attribute.
#This will fail at execution time. # This will fail at execution time.
x = as_tensor_variable(x) x = as_tensor_variable(x)
#Each type variable should implement their .shape attribute # Each type variable should implement their .shape attribute
#and have the fct infer_shape() implemented in the op that convert # and have the fct infer_shape() implemented in the op that convert
#the type to TensorVariable to have the optimization working # the type to TensorVariable to have the optimization working
#correctly. # correctly.
return Apply(self, [x], [lvector()]) return Apply(self, [x], [lvector()])
def perform(self, node, inp, out_): def perform(self, node, inp, out_):
...@@ -2094,8 +2097,21 @@ class Shape(Op): ...@@ -2094,8 +2097,21 @@ class Shape(Op):
def infer_shape(self, node, in_shapes): def infer_shape(self, node, in_shapes):
return [[len(in_shapes[0])]] return [[len(in_shapes[0])]]
def connection_pattern(self):
# the grad returns the gradient with respect to the
# elements of a tensor variable
# the elements of the tensor variable do not participate
# in the computation of the shape, so they are not really
# part of the graph
return [False]
def grad(self, inp, grads): def grad(self, inp, grads):
return [grad_undefined(self,0,inp[0])] # the grad returns the gradient with respect to the
# elements of a tensor variable
# the elements of the tensor variable do not participate
# in the computation of the shape, so they are not really
# part of the graph
return [None]
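`connection_pattern` returning `[False]` tells the gradient machinery that `Shape`'s input is connected only through metadata, so even a NaN output gradient must not leak into the input's gradient. A toy sketch of how a traversal might consult it; the function names are hypothetical, not Theano's internals:

```python
def shape_connection_pattern():
    # Shape's single input has no element-level influence on the output
    return [False]

def input_grads(pattern, output_grad, inputs):
    # Inputs whose pattern entry is False never receive a gradient,
    # even if output_grad is NaN -- the NaN never enters the expression.
    return [output_grad if connected else None
            for connected, _ in zip(pattern, inputs)]

g = input_grads(shape_connection_pattern(), float('nan'), ['x'])
```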
def R_op(self, inputs, eval_points): def R_op(self, inputs, eval_points):
return [None] return [None]
...@@ -2113,7 +2129,7 @@ def old_shape(a): ...@@ -2113,7 +2129,7 @@ def old_shape(a):
shape at graph-execution time. shape at graph-execution time.
""" """
va = as_tensor_variable(a) va = as_tensor_variable(a)
#print 'HERE', va, va.type # print 'HERE', va, va.type
if None in va.type.shape: if None in va.type.shape:
# Some shape components are unknown at this time # Some shape components are unknown at this time
return _shape(va) return _shape(va)
...@@ -2314,9 +2330,21 @@ class MaxAndArgmax(Op): ...@@ -2314,9 +2330,21 @@ class MaxAndArgmax(Op):
x, axis = inp x, axis = inp
g_max, g_max_idx = grads g_max, g_max_idx = grads
# Check to see if the gradient on max is None g_max_disconnected = isinstance(g_max.type, DisconnectedType)
if g_max is None: g_max_idx_disconnected = isinstance(g_max_idx.type, DisconnectedType)
return None, None
# if the op is totally disconnected, so are its inputs
if g_max_disconnected and g_max_idx_disconnected:
return [DisconnectedType()(), DisconnectedType()()]
axis_grad = grad_undefined(self, 1, axis,
"argmax is not defined for non-integer axes so"
" argmax(x, axis+eps) is undefined")
# if the max is disconnected but the argmax is not,
# the gradient on its inputs is zero
if g_max_disconnected:
return [x.zeros_like(), axis_grad]
xmax = max(x, axis) xmax = max(x, axis)
# Raise the g_max and xmax to the same number of dim as the input. # Raise the g_max and xmax to the same number of dim as the input.
...@@ -2336,7 +2364,7 @@ class MaxAndArgmax(Op): ...@@ -2336,7 +2364,7 @@ class MaxAndArgmax(Op):
# Set the grad to the correct position. # Set the grad to the correct position.
g_x = eq(xmax_pad, x) * g_max_pad g_x = eq(xmax_pad, x) * g_max_pad
return g_x, grad_undefined(self, 1, axis) return g_x, axis_grad
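The new branching in `MaxAndArgmax.grad` can be summarized in a small stand-alone sketch; the marker objects below are hypothetical stand-ins for `DisconnectedType()()` and `grad_undefined(...)`:

```python
DISCONNECTED = object()  # stand-in for DisconnectedType()()
UNDEFINED = object()     # stand-in for grad_undefined(...)

def max_argmax_input_grads(g_max, g_argmax, zero_grad=0.0):
    # both outputs unused downstream -> both inputs disconnected
    if g_max is DISCONNECTED and g_argmax is DISCONNECTED:
        return [DISCONNECTED, DISCONNECTED]
    # the integer axis argument never has a defined gradient
    if g_max is DISCONNECTED:
        # only argmax is used downstream: gradient on x is exactly zero
        return [zero_grad, UNDEFINED]
    # otherwise g_max flows to the max locations (elided in this sketch)
    return [g_max, UNDEFINED]

both = max_argmax_input_grads(DISCONNECTED, DISCONNECTED)
only_argmax = max_argmax_input_grads(DISCONNECTED, 1.0)
```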
def __str__(self): def __str__(self):
return self.__class__.__name__ return self.__class__.__name__
...@@ -2458,7 +2486,7 @@ def min(x, axis=None, keepdims=False): ...@@ -2458,7 +2486,7 @@ def min(x, axis=None, keepdims=False):
if str_x_type.startswith('float') or str_x_type in int_dtypes: if str_x_type.startswith('float') or str_x_type in int_dtypes:
return -max(-x, axis=axis, keepdims=keepdims) return -max(-x, axis=axis, keepdims=keepdims)
else: else:
#Be careful about unsigned integers, complex # Be careful about unsigned integers, complex
raise NotImplementedError() raise NotImplementedError()
...@@ -2479,7 +2507,7 @@ def argmin(x, axis=None, keepdims=False): ...@@ -2479,7 +2507,7 @@ def argmin(x, axis=None, keepdims=False):
if str_x_type.startswith('float') or str_x_type in int_dtypes: if str_x_type.startswith('float') or str_x_type in int_dtypes:
return argmax(-x, axis=axis, keepdims=keepdims) return argmax(-x, axis=axis, keepdims=keepdims)
else: else:
#Be careful about unsigned integers, complex # Be careful about unsigned integers, complex
raise NotImplementedError() raise NotImplementedError()
...@@ -2707,7 +2735,7 @@ def sqr(a): ...@@ -2707,7 +2735,7 @@ def sqr(a):
"""square of a""" """square of a"""
#alias to sqr, included to maintain similarity with numpy interface # alias to sqr, included to maintain similarity with numpy interface
square = sqr square = sqr
...@@ -2849,7 +2877,8 @@ def complex_from_polar(abs, angle): ...@@ -2849,7 +2877,8 @@ def complex_from_polar(abs, angle):
# Misc # Misc
########################## ##########################
#fill, _fill_inplace = _elemwise(scal.second, 'fill',
# fill, _fill_inplace = _elemwise(scal.second, 'fill',
#"""fill WRITEME (elemwise)""") #"""fill WRITEME (elemwise)""")
@_scal_elemwise @_scal_elemwise
def second(a, b): def second(a, b):
...@@ -2917,7 +2946,7 @@ class Eye(gof.Op): ...@@ -2917,7 +2946,7 @@ class Eye(gof.Op):
return [out_shape] return [out_shape]
def grad(self, inp, grads): def grad(self, inp, grads):
return [ grad_undefined(self,i,inp[i]) for i in xrange(3) ] return [grad_undefined(self, i, inp[i]) for i in xrange(3)]
def __eq__(self, other): def __eq__(self, other):
return type(self) == type(other) and self.dtype == other.dtype return type(self) == type(other) and self.dtype == other.dtype
...@@ -3092,7 +3121,7 @@ class Alloc(gof.Op): ...@@ -3092,7 +3121,7 @@ class Alloc(gof.Op):
out[0] = numpy.empty(sh, dtype=v.dtype) out[0] = numpy.empty(sh, dtype=v.dtype)
out[0][...] = v # broadcast v to fill us up out[0][...] = v # broadcast v to fill us up
else: else:
#reuse the allocated memory. # reuse the allocated memory.
out[0][...] = v # broadcast v to fill us up out[0][...] = v # broadcast v to fill us up
def c_code(self, node, name, inp, out, sub): def c_code(self, node, name, inp, out, sub):
...@@ -3280,12 +3309,12 @@ class Mean(elemwise.CAReduce): ...@@ -3280,12 +3309,12 @@ class Mean(elemwise.CAReduce):
if self.axis is not None: if self.axis is not None:
return super(Op, self).c_code(node, name, inames, onames, sub) return super(Op, self).c_code(node, name, inames, onames, sub)
ret = elemwise.CAReduce.c_code(self, node, name, inames, onames, sub) ret = elemwise.CAReduce.c_code(self, node, name, inames, onames, sub)
#TODO: c_code perform support only axis is None # TODO: c_code perform support only axis is None
return ret + """ return ret + """
*((double *)PyArray_DATA(%s)) /= PyArray_SIZE(%s); *((double *)PyArray_DATA(%s)) /= PyArray_SIZE(%s);
""" % (onames[0], inames[0]) """ % (onames[0], inames[0])
#TODO: implement the grad. When done and tested, you can make this the default # TODO: implement the grad. When done and tested, you can make this the default
# version. # version.
# def grad(self, (x,), (gout,)): # def grad(self, (x,), (gout,)):
# import pdb;pdb.set_trace() # import pdb;pdb.set_trace()
...@@ -3379,28 +3408,33 @@ def var(input, axis=None, keepdims=False): ...@@ -3379,28 +3408,33 @@ def var(input, axis=None, keepdims=False):
if isinstance(axis, int): if isinstance(axis, int):
axis = [axis] axis = [axis]
#compute the axis-wise mean # compute the axis-wise mean
mean_input = mean(input, axis, keepdims=True) mean_input = mean(input, axis, keepdims=True)
#center the input # center the input
centered_input = input - mean_input centered_input = input - mean_input
#return the mean sqr # return the mean sqr
return mean((centered_input ** 2), axis, keepdims=keepdims) return mean((centered_input ** 2), axis, keepdims=keepdims)
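`var` above is the mean of squared deviations from the axis-wise mean. A pure-Python sketch of the same three steps (mean, center, mean of squares):

```python
def var(xs):
    # variance = mean((x - mean(x)) ** 2)
    m = sum(xs) / len(xs)
    centered = [x - m for x in xs]
    return sum(c * c for c in centered) / len(centered)

v = var([1.0, 2.0, 3.0, 4.0])  # mean 2.5; squared deviations average to 1.25
```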
@constructor @constructor
def std(input, axis=None, keepdims=False): def std(input, axis=None, keepdims=False):
""" """
Computes the standard deviation along the given axis(es) of a tensor `input`. Computes the standard deviation along the given axis(es)
of a tensor `input`.
:param axis: Compute the standard deviation along this axis of the tensor. :param axis: Compute the standard deviation along this
axis of the tensor.
None means all axes (like numpy). None means all axes (like numpy).
:type axis: None or int or (list of int) (see `Sum`) :type axis: None or int or (list of int) (see `Sum`)
:param keepdims: If this is set to True, the axes which are reduced are :param keepdims: If this is set to True, the axes
left in the result as dimensions with size one. With this option, which are reduced are
the result will broadcast correctly against the original tensor. left in the result as dimensions with size one.
With this option,
the result will broadcast correctly against the
original tensor.
""" """
return sqrt(var(input=input, axis=axis, keepdims=keepdims)) return sqrt(var(input=input, axis=axis, keepdims=keepdims))
...@@ -3423,8 +3457,8 @@ if 0: ...@@ -3423,8 +3457,8 @@ if 0:
type = TensorType(dtype=input.type.dtype, type = TensorType(dtype=input.type.dtype,
broadcastable=broadcastable) broadcastable=broadcastable)
#backport # backport
#type = TensorType(dtype=input.type.dtype, # type = TensorType(dtype=input.type.dtype,
# broadcastable=[ # broadcastable=[
# False if i==axis else x # False if i==axis else x
# for i, x in enumerate(input.broadcastable)]) # for i, x in enumerate(input.broadcastable)])
...@@ -3859,7 +3893,7 @@ class Subtensor(Op): ...@@ -3859,7 +3893,7 @@ class Subtensor(Op):
exception.subtensor_invalid = True exception.subtensor_invalid = True
raise exception raise exception
#infer the broadcasting pattern # infer the broadcasting pattern
padded = (idx_list padded = (idx_list
+ [slice(None, None, None)] * (x.type.ndim - len(idx_list))) + [slice(None, None, None)] * (x.type.ndim - len(idx_list)))
broadcastable = [bc for p, bc in zip(padded, x.type.broadcastable) broadcastable = [bc for p, bc in zip(padded, x.type.broadcastable)
...@@ -3942,7 +3976,7 @@ class Subtensor(Op): ...@@ -3942,7 +3976,7 @@ class Subtensor(Op):
return type(self) == type(other) and self.idx_list == other.idx_list return type(self) == type(other) and self.idx_list == other.idx_list
def __hash__(self): def __hash__(self):
#TODO: optimize by cache this hash value # TODO: optimize by cache this hash value
msg = [] msg = []
for entry in self.idx_list: for entry in self.idx_list:
if isinstance(entry, slice): if isinstance(entry, slice):
...@@ -3951,8 +3985,8 @@ class Subtensor(Op): ...@@ -3951,8 +3985,8 @@ class Subtensor(Op):
msg += [entry] msg += [entry]
idx_list = tuple(msg) idx_list = tuple(msg)
#backport # backport
#idx_list = tuple((entry.start, entry.stop, entry.step) # idx_list = tuple((entry.start, entry.stop, entry.step)
# if isinstance(entry, slice) # if isinstance(entry, slice)
# else entry # else entry
# for entry in self.idx_list) # for entry in self.idx_list)
@@ -3989,7 +4023,7 @@ class Subtensor(Op):
        fail = sub['fail']
        init_cmds = []  # initialization for subtensor_spec
        is_slice = []
-        #TODO: change that, it might lead to unexpected results,
+        # TODO: change that, it might lead to unexpected results,
        #      see assembla-#767
        NONE_CODE = maxsize - 1
@@ -4040,7 +4074,7 @@ class Subtensor(Op):
        for entry in idx_list:
            init_entry(entry)
-        #make sure we used all inputs
+        # make sure we used all inputs
        assert input_pos() == len(inputs), input_pos()
        assert len(is_slice) <= node.inputs[0].ndim, node.inputs[0].ndim
@@ -4213,7 +4247,7 @@ class Subtensor(Op):
        }
        PyArray_UpdateFlags(xview, NPY_C_CONTIGUOUS|NPY_F_CONTIGUOUS);
        """ % locals()
-        #print rval
+        # print rval
        return rval
    @staticmethod
@@ -4398,7 +4432,7 @@ class IncSubtensor(Op):
                msg += [entry]
        idx_list = tuple(msg)
-        #backport
+        # backport
        #idx_list = tuple((entry.start, entry.stop, entry.step)
        #             if isinstance(entry, slice)
        #             else entry
@@ -4675,7 +4709,7 @@ class Split(Op):
    def perform(self, node, inputs, outputs):
        """WRITEME"""
        x, axis, splits = inputs
-        #in python 2.4, x.shape[numpy.asarray(1)] don't work.
+        # in python 2.4, x.shape[numpy.asarray(1)] don't work.
        if sys.version_info[0:2] == (2, 4) and axis.size == 1:
            axis = int(axis)
@@ -5376,7 +5410,6 @@ class Reshape(Op):
            raise ValueError('Cannot reshape input of shape %s to shape %s' %
                             (x.shape, shp))
    def grad(self, inp, grads):
        x, shp = inp
        g_out, = grads
@@ -5399,7 +5432,7 @@ class Reshape(Op):
        # The following expression leads to cycles in feature_shape,
        # because it tries to replace the Shape_i node by the switch
        # statement, which depends on Shape_i.
-        #return [tuple([switch(eq(node.inputs[1][i], -1),
+        # return [tuple([switch(eq(node.inputs[1][i], -1),
        #               theano.tensor.opt.Shape_i(i)(node.outputs[0]),
        #               node.inputs[1][i])
        #               for i in xrange(self.ndim)]
@@ -5462,7 +5495,8 @@ class Reshape(Op):
                %(shp)s->data + ii * %(shp)s->strides[0]))[0];
        }
        Py_XDECREF(%(z)s);
-        %(z)s = (PyArrayObject *) PyArray_Newshape(%(x)s, &newshape, PyArray_CORDER);
+        %(z)s = (PyArrayObject *) PyArray_Newshape(%(x)s, &newshape,
+                                                   PyArray_CORDER);
        if (!%(z)s)
        {
            PyErr_Format(PyExc_ValueError,
@@ -5557,7 +5591,7 @@ def flatten(x, outdim=1):
#     """
#     Calculates the gradient of the Tile Op.
#     """
-#     #this is so weird, I can't think of how to make this a general thing.
+#     # this is so weird, I can't think of how to make this a general thing.
#     def make_node(self, x, reps, g_out):
#         return gof.Apply(self, [x, reps, g_out], [x.type()])
#
@@ -5645,11 +5679,11 @@ def tile(x, reps, ndim=None):
    TODO: expand this.
    """
    try:
        assert python_all([int(i) == i for i in iter(reps)])
    except (TypeError, AssertionError):
        raise ValueError("reps argument to tile must be a constant (e.g. "
                         "tuple, list of integers)")
    if len(reps) != x.ndim:
        raise ValueError("len(reps) != x.ndim not currently supported")
    elif (ndim is not None) and ndim != x.ndim:
@@ -5663,7 +5697,7 @@ def tile(x, reps, ndim=None):
        ndim = len(reps)
    # backport
-    # ndim = len(reps) if ndim is None else ndim #not sure if len(shp) is going
+    # ndim = len(reps) if ndim is None else ndim # not sure if len(shp) is going
    #     to work.
    if ndim not in tile.op:
        tile.op[ndim] = Tile(ndim)
@@ -6146,7 +6180,7 @@ class AdvancedSubtensor(Op):
    def make_node(self, x, *inputs):
        x = as_tensor_variable(x)
-        #FIXME
+        # FIXME
        # Note (9 Jul 2012): what does this 'FIXME' mean? Possibly that the
        # current implementation must be generalized? Please specify.
        if x.ndim == 2 and len(inputs) == 2:
@@ -6209,7 +6243,7 @@ class AdvancedSubtensor(Op):
                    'are too big (>= 2^32 elements). It is possible that '
                    'out[0] (%s), with shape %s, is not correctly filled.'
                    % (out[0], out[0].shape))
-        #return
+        # return
        #raise NotImplementedError()
    def grad(self, inputs, grads):
@@ -6232,8 +6266,8 @@ class AdvancedIncSubtensor(Op):
    def __init__(self, inplace=False, set_instead_of_inc=False):
        self.inplace = inplace
        self.set_instead_of_inc = set_instead_of_inc
-        #The assert is needed as in the pass the first argument was
-        #something else that was not used.
+        # The assert is needed as in the pass the first argument was
+        # something else that was not used.
        assert isinstance(inplace, bool)
        if self.inplace:
            raise NotImplementedError('In place computation is not'
@@ -6325,6 +6359,7 @@ advanced_inc_subtensor = AdvancedIncSubtensor()
#
# TODO: Dotinv should go here, Eigs, Svd, etc.
class Dot(Op):
    """Compute matrix-matrix, matrix-vector products and vector inner-products.
@@ -6351,7 +6386,7 @@ class Dot(Op):
        numpy_semantics = 0
        if numpy_semantics:
-            #numpy defines dot for tensor pairs with any rank
+            # numpy defines dot for tensor pairs with any rank
            if len(inputs) != 2:
                raise TypeError(
                    "Wrong number of inputs for %s (got %i, expected 2)" %
@@ -6712,7 +6747,7 @@ def tensordot(x, y=None, axes=2):
    return tensordot.op[axes](x, y)
-#TODO: tensordot should be function as described in rst docs.
+# TODO: tensordot should be function as described in rst docs.
def outer(x, y):
...
@@ -98,26 +98,26 @@ class Conv3D(theano.Op):
        if 'name' in dir(dCdH) and dCdH.name is not None:
            dCdH_name = dCdH.name
        else:
-            dCdH_name = 'anon'
+            dCdH_name = 'anon_dCdH'
        if 'name' in dir(V) and V.name is not None:
            V_name = V.name
        else:
-            V_name = 'anon'
+            V_name = 'anon_V'
        if 'name' in dir(W) and W.name is not None:
            W_name = W.name
        else:
-            W_name = 'anon'
+            W_name = 'anon_W'
        if 'name' in dir(b) and b.name is not None:
            b_name = b.name
        else:
-            b_name = 'anon'
+            b_name = 'anon_b'
-        dCdV.name = 'Conv3D_dCdV.dCdH='+dCdH_name+',V='+V_name
-        dCdW.name = 'Conv3D_dCdW.dCdH='+dCdH_name+',V='+V_name+',W='+W_name
-        dCdb.name = 'Conv3D_dCdb.dCdH='+dCdH_name+',V='+V_name+',W='+W_name+',b='+b_name
+        dCdV.name = 'Conv3D_dCdV(dCdH='+dCdH_name+',V='+V_name+')'
+        dCdW.name = 'Conv3D_dCdW(dCdH='+dCdH_name+',V='+V_name+',W='+W_name+')'
+        dCdb.name = 'Conv3D_dCdb(dCdH='+dCdH_name+',V='+V_name+',W='+W_name+',b='+b_name+')'
...
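The repeated if/else blocks in this hunk all implement one pattern: use a variable's `.name` when it is set, otherwise fall back to a distinct per-argument anonymous label (the change in the commit is precisely to make those fallbacks distinct, `anon_dCdH` vs a shared `anon`). A condensed sketch of that pattern; the helper `name_or_anon` and the `Var` stub are illustrative, not part of the commit:

```python
def name_or_anon(var, anon_label):
    """Return var.name when set, else a distinct anonymous label."""
    name = getattr(var, 'name', None)
    return name if name is not None else 'anon_' + anon_label


class Var(object):
    """Minimal stand-in for a Theano variable with an optional name."""
    def __init__(self, name=None):
        self.name = name


dCdH_name = name_or_anon(Var('my_grad'), 'dCdH')
V_name = name_or_anon(Var(), 'V')
print(dCdH_name)  # my_grad
print(V_name)     # anon_V
# Debug names for derived variables are then built from the pieces:
print('Conv3D_dCdV(dCdH=%s,V=%s)' % (dCdH_name, V_name))
```

A helper like this would collapse the four near-identical if/else blocks into four one-liners.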
@@ -56,22 +56,22 @@ class ConvTransp3D(theano.Op):
        if 'name' in dir(dCdR) and dCdR.name is not None:
            dCdR_name = dCdR.name
        else:
-            dCdR_name = 'anon'
+            dCdR_name = 'anon_dCdR'
        if 'name' in dir(H) and H.name is not None:
            H_name = H.name
        else:
-            H_name = 'anon'
+            H_name = 'anon_H'
        if 'name' in dir(W) and W.name is not None:
            W_name = W.name
        else:
-            W_name = 'anon'
+            W_name = 'anon_W'
        if 'name' in dir(b) and b.name is not None:
            b_name = b.name
        else:
-            b_name = 'anon'
+            b_name = 'anon_b'
        dCdW.name = 'ConvTransp3D_dCdW.H='+H_name+',dCdR='+dCdR_name+',W='+W_name
...
@@ -780,9 +780,19 @@ class ConvOp(OpenMPOp):
        # build a "node", that should be equivalent to the one given by
        # self.make_node, but using conv3D instead of self.
+        shuffled_inputs = inputs.dimshuffle(0, 2, 3, 'x', 1)
+        if inputs.name is not None:
+            shuffled_inputs.name = 'shuffle_for_conv3D(%s)' % inputs.name
+        flipped_kerns = kerns[:, :, ::-1, ::-1]
+        if kerns.name is not None:
+            flipped_kerns.name = 'flipped(%s)' % kerns.name
+        shuffled_kerns = flipped_kerns.dimshuffle(0, 2, 3, 'x', 1)
+        if flipped_kerns.name is not None:
+            shuffled_kerns.name = 'shuffled_for_conv3D(%s)' % flipped_kerns.name
        tmp_node = theano.tensor.nnet.conv3D(
-            V=inputs.dimshuffle(0, 2, 3, 'x', 1),
-            W=kerns[:, :, ::-1, ::-1].dimshuffle(0, 2, 3, 'x', 1),
+            V=shuffled_inputs,
+            W=shuffled_kerns,
            b=theano.tensor.alloc(numpy.asarray(0, dtype=kerns.dtype),
                                  kerns.shape[0]),
            d=(self.dx, self.dy, 1))
...
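The `kerns[:, :, ::-1, ::-1]` expression in this hunk reverses only the two trailing spatial axes of the 4-D kernel tensor before the dimshuffle, which is what turns correlation-style indexing into true convolution. The flip itself can be checked with plain NumPy (this toy array is illustrative, not data from the commit):

```python
import numpy as np

# (nkern, stack, row, col) layout, matching ConvOp's 4-D kernels
kerns = np.arange(2 * 3 * 2 * 2).reshape(2, 3, 2, 2)

# Reverse the row and column axes only; leading axes are untouched.
flipped = kerns[:, :, ::-1, ::-1]

# Each 2x2 spatial patch is rotated 180 degrees.
print(kerns[0, 0].tolist())    # [[0, 1], [2, 3]]
print(flipped[0, 0].tolist())  # [[3, 2], [1, 0]]
```

Because basic slicing with negative strides returns a view, the flip itself costs no copy; the copy, if any, happens later in the dimshuffle.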
@@ -14,6 +14,7 @@ from theano.compile import optdb
from theano.gof import Apply
from theano.tensor.nnet.sigm import sigmoid, softplus
+from theano.gradient import DisconnectedType
############
@@ -76,6 +77,10 @@ class SoftmaxWithBias(gof.Op):
    def grad(self, inp, grads):
        x, b = inp
        g_sm, = grads
+        if isinstance(g_sm.type, DisconnectedType):
+            return [DisconnectedType()(), DisconnectedType()()]
        sm = softmax_with_bias(x, b)
        dx = softmax_grad(g_sm, sm)
        db = tensor.sum(dx, axis=0)
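The new guard in `SoftmaxWithBias.grad` short-circuits: when the incoming output gradient is disconnected, it returns a `DisconnectedType` instance per input instead of building the softmax gradient graph. A minimal pure-Python sketch of that dispatch; the `DisconnectedType` and `Grad` stubs below stand in for the real Theano types and are not part of the commit:

```python
class DisconnectedType(object):
    """Stub for theano.gradient.DisconnectedType; calling an instance
    yields a marker value standing in for a disconnected variable."""
    def __call__(self):
        return 'disconnected'


class Grad(object):
    """Stub gradient variable carrying only a .type attribute."""
    def __init__(self, type_):
        self.type = type_


def softmax_with_bias_grad(inp, grads):
    x, b = inp
    g_sm, = grads
    if isinstance(g_sm.type, DisconnectedType):
        # No gradient flows through the softmax output: both inputs
        # are reported as disconnected, one marker per input.
        return [DisconnectedType()(), DisconnectedType()()]
    return ['dx(%s)' % x, 'db(%s)' % b]  # placeholder for the symbolic grads


print(softmax_with_bias_grad(('x', 'b'), [Grad(DisconnectedType())]))
print(softmax_with_bias_grad(('x', 'b'), [Grad(object())]))
```

The point of the early return is that a disconnected output gradient must not be fed into `softmax_grad`, which would otherwise build a bogus gradient expression.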
@@ -710,21 +715,40 @@ class CrossentropySoftmaxArgmax1HotWithBias(gof.Op):
    def grad(self, inp, grads):
        x, b, y_idx = inp
        g_nll, g_sm, g_am = grads
-        if g_am is not None:
-            raise NotImplementedError()
-        elif g_sm is not None:
-            # There is a gradient w.r.t. the softmax's output itself.
-            if g_nll is not None or g_am is not None:
-                raise NotImplementedError()
-            return softmax_with_bias.grad((x, b, ), (g_sm, )) + (None, )
-        else:
-            # There is a gradient w.r.t. the NLL.
-            assert g_nll is not None
-            nll, sm = crossentropy_softmax_1hot_with_bias(x, b, y_idx)
-            #dx = CrossentropySoftmax1HotWithBiasDx()(g_nll, sm, y_idx)
-            dx = crossentropy_softmax_1hot_with_bias_dx(g_nll, sm, y_idx)
-            db = tensor.sum(dx, axis=[0])
-            return dx, db, None
+        dx_terms = []
+        db_terms = []
+        d_idx_terms = []
+        if not isinstance(g_nll.type, DisconnectedType):
+            nll, sm = crossentropy_softmax_1hot_with_bias(x, b, y_idx)
+            dx = crossentropy_softmax_1hot_with_bias_dx(g_nll, sm, y_idx)
+            db = tensor.sum(dx, axis=[0])
+            dx_terms.append(dx)
+            db_terms.append(db)
+        if not isinstance(g_sm.type, DisconnectedType):
+            dx, db = softmax_with_bias.grad((x, b), (g_sm, ))
+            dx_terms.append(dx)
+            db_terms.append(db)
+        if not isinstance(g_am.type, DisconnectedType):
+            dx_terms.append(x.zeros_like())
+            db_terms.append(b.zeros_like())
+            d_idx_terms.append(y_idx.zeros_like())
+        def fancy_sum(terms):
+            if len(terms) == 0:
+                return DisconnectedType()()
+            rval = terms[0]
+            for term in terms[1:]:
+                rval = rval + term
+            return rval
+        return [fancy_sum(terms) for terms in
+                [dx_terms, db_terms, d_idx_terms]]
    def c_headers(self):
        return ['<iostream>', '<cmath>']
...
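The `fancy_sum` helper in the hunk above reduces each per-input list of gradient contributions: an empty list means no connected output contributed, so the input is reported as disconnected; otherwise the terms are summed. The reduction is ordinary Python and can be sketched with numbers in place of symbolic terms; `DISCONNECTED` here is an illustrative stand-in for `DisconnectedType()()`:

```python
DISCONNECTED = object()  # stand-in marker for DisconnectedType()()


def fancy_sum(terms):
    """Sum gradient contributions; no contributions means disconnected."""
    if len(terms) == 0:
        return DISCONNECTED
    rval = terms[0]
    for term in terms[1:]:
        rval = rval + term
    return rval


print(fancy_sum([]) is DISCONNECTED)  # True: input disconnected
print(fancy_sum([1.5]))               # 1.5
print(fancy_sum([1.0, 2.0, 3.0]))     # 6.0
```

With symbolic Theano variables the same `+` builds an add node per extra term, so a one-element list returns the term itself with no extra graph nodes.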
@@ -18,7 +18,9 @@ class TestConv2D(utt.InferShapeTester):
    def setUp(self):
        super(TestConv2D, self).setUp()
        self.input = T.dtensor4('input')
+        self.input.name = 'default_V'
        self.filters = T.dtensor4('filters')
+        self.filters.name = 'default_filters'
    def validate(self, image_shape, filter_shape,
                 border_mode='valid', subsample=(1, 1),
@@ -34,7 +36,7 @@ class TestConv2D(utt.InferShapeTester):
        N_filter_shape = [T.get_constant_value(T.
            as_tensor_variable(x)) for x in filter_shape]
-        if not input:
+        if input is None:
            input = self.input
        if not filters:
            filters = self.filters
@@ -44,11 +46,16 @@ class TestConv2D(utt.InferShapeTester):
        # we create a symbolic function so that verify_grad can work
        def sym_conv2d(input, filters):
            # define theano graph and function
-            return conv.conv2d(input, filters, image_shape, filter_shape,
-                               border_mode, subsample, unroll_batch=unroll_batch,
-                               unroll_kern=unroll_kern, unroll_patch=unroll_patch)
+            input.name = 'input'
+            filters.name = 'filters'
+            rval = conv.conv2d(input, filters, image_shape, filter_shape,
                               border_mode, subsample, unroll_batch=unroll_batch,
                               unroll_kern=unroll_kern, unroll_patch=unroll_patch)
+            rval.name = 'conv_output'
+            return rval
        output = sym_conv2d(input, filters)
+        output.name = 'conv2d(%s,%s)' % (input.name, filters.name)
        theano_conv = theano.function([input, filters], output)
        # initialize input and compute result
...
@@ -121,33 +121,49 @@ class TestConv3D(utt.InferShapeTester):
        mode.check_py_code = False
        self.W = shared(N.ndarray(shape=(1, 1, 1, 1, 1), dtype=floatX))
+        self.W.name = 'W'
        self.b = shared(N.zeros(1, dtype=floatX))
+        self.b.name = 'b'
        self.rb = shared(N.zeros(1, dtype=floatX))
+        self.rb.name = 'rb'
        self.V = shared(N.ndarray(shape=(1, 1, 1, 1, 1), dtype=floatX))
+        self.V.name = 'V'
        self.d = shared(N.ndarray(shape=(3, ), dtype=int))
+        self.d.name = 'd'
        self.H = conv3D(self.V, self.W, self.b, self.d)
+        self.H.name = 'H'
        self.H_func = function([], self.H, mode=mode)
        self.H_shape_func = function([], self.H.shape, mode=mode)
        self.RShape = T.vector(dtype='int64')
+        self.RShape.name = 'RShape'
        self.otherH = T.TensorType(floatX,
            (False, False, False, False, False))(name='otherH')
        self.transp = convTransp3D(self.W, self.rb, self.d,
                                   self.otherH, self.RShape)
+        self.transp.name = 'transp'
        self.transp_func = function([self.otherH, self.RShape],
                                    self.transp, mode=mode)
        self.R = convTransp3D(self.W, self.rb, self.d, self.H, self.RShape)
+        self.R.name = 'R'
        self.R_func = function([self.RShape], self.R, mode=mode)
        self.R_shape_func = function([self.RShape], self.R.shape)
-        self.reconsObj = T.sum(T.sqr(self.V - self.R))
+        diff = self.V - self.R
+        diff.name = 'diff'
+        sqr = T.sqr(diff)
+        sqr.name = 'sqr'
+        self.reconsObj = T.sum(sqr)
+        self.reconsObj.name = 'reconsObj'
        self.reconsObjFunc = function([self.RShape], self.reconsObj, mode=mode)
+        W_grad = T.grad(self.reconsObj, self.W)
        self.gradientsFunc = function([self.RShape],
-                                      [T.grad(self.reconsObj, self.W), T.grad(self.reconsObj,
+                                      [W_grad, T.grad(self.reconsObj,
                                      self.H), T.grad(self.reconsObj, self.V),
                                      T.grad(self.reconsObj, self.b)], mode=mode)
...
@@ -2832,16 +2832,16 @@ class Canonizer(gof.LocalOptimizer):
        # this canonized graph... if so, we do nothing and wait for
        # them to be transformed.
        def _bypass_dimshuffle(n):
-            if isinstance(n.op, DimShuffle) and len(n.outputs[0].clients) <= 1:
-                return _bypass_dimshuffle(n.outputs[0].clients.__iter__(
-                    ).next()[0])
+            if (isinstance(getattr(n, 'op', None), DimShuffle) and
+                    len(n.outputs[0].clients) <= 1):
+                return _bypass_dimshuffle(n.outputs[0].clients[0][0])
            else:
                return n
        for c, c_idx in out.clients:
            if c == 'output':
                continue
-            if _bypass_dimshuffle(c).op in [self.main, self.inverse,
-                                            self.reciprocal]:
+            if getattr(_bypass_dimshuffle(c), 'op', '') in [
+                    self.main, self.inverse, self.reciprocal]:
                return False
        # Here we make the canonical version of the graph around this node
...
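`_bypass_dimshuffle` walks forward through chains of single-client `DimShuffle` nodes before inspecting the op, and the rewrite guards with `getattr` because an entry in a client list may be the string `'output'` rather than an `Apply` node. A self-contained sketch of the traversal; the `Node` and `DimShuffle` stubs are illustrative, and the sketch requires exactly one client before recursing so that indexing an empty client list cannot raise:

```python
class DimShuffle(object):
    """Stand-in for theano.tensor.elemwise.DimShuffle."""
    pass


class Node(object):
    """Minimal stand-in for an Apply node with a single output."""
    def __init__(self, op, clients=()):
        self.op = op
        self.outputs = [self]         # the node doubles as its own output
        self.clients = list(clients)  # list of (client_node, input_index)


def bypass_dimshuffle(n):
    """Follow single-client DimShuffle nodes to the real consumer.

    getattr() tolerates objects without an .op attribute, mirroring
    the guarded version in the hunk above."""
    if (isinstance(getattr(n, 'op', None), DimShuffle)
            and len(n.outputs[0].clients) == 1):
        return bypass_dimshuffle(n.outputs[0].clients[0][0])
    return n


consumer = Node(op='add')
middle = Node(op=DimShuffle(), clients=[(consumer, 0)])
print(bypass_dimshuffle(middle) is consumer)    # True: DimShuffle skipped
print(bypass_dimshuffle(consumer) is consumer)  # True: returned unchanged
```

A `DimShuffle` with several clients is deliberately not bypassed, since rewriting past it could affect the other consumers.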
@@ -2023,6 +2023,10 @@ class T_max_and_argmax(unittest.TestCase):
        because there is no differentiable path from cost to the input and
        not because of an error of the grad method of the op
        """
+        raise KnownFailureTest("The desired behavior of the grad method in "
+                               "this case is currently under debate. In any case, "
+                               "the result should be to return NaN or 0, not to "
+                               "report a disconnected input.")
        x = matrix()
        cost = argmax(x, axis=0).sum()
        value_error_raised = False
@@ -2220,6 +2224,7 @@ class T_argmin_argmax(unittest.TestCase):
    def test_grad_argmin(self):
        data = rand(2, 3)
        n = as_tensor_variable(data)
+        n.name = 'n'
        #test grad of argmin
        utt.verify_grad(lambda v: argmin(v, axis=-1), [data])
@@ -2231,7 +2236,9 @@ class T_argmin_argmax(unittest.TestCase):
        utt.verify_grad(lambda v: argmin(v.flatten()), [data])
        try:
-            grad(argmin(n, axis=-1), n)
+            cost = argmin(n, axis=-1)
+            cost.name = None
+            g = grad(cost, n)
            raise Exception('Expected an error')
        except TypeError:
            pass
@@ -4375,6 +4382,7 @@ class test_grad(unittest.TestCase):
        o = test_grad.O()
        a1 = o.make_node()
        g0,g1 = grad(a1.outputs[0], a1.inputs)
+        g0.name = None
        self.assertTrue(o.gval0 is g0)
        self.assertTrue(o.gval1 is g1)
@@ -4435,10 +4443,8 @@ class test_grad(unittest.TestCase):
        v = vector()
        m = matrix()
        # grad(v,...) and grad(m,...) should fail
-        self.assertRaises(TypeError, grad, v, s)
-        self.assertRaises(TypeError, grad, v, m)
-        self.assertRaises(TypeError, grad, m, s)
-        self.assertRaises(TypeError, grad, m, v)
+        self.assertRaises(TypeError, grad, v, v)
+        self.assertRaises(TypeError, grad, m, m)
class T_op_cache(unittest.TestCase):
    def setUp(self):
...
@@ -10,19 +10,22 @@ from theano.gradient import grad_sources_inputs
from theano import gradient
from theano.tensor.nnet.Conv3D import conv3D
from theano import config
+import numpy as np
+one = theano.tensor.as_tensor_variable(1.)
def _grad_sources_inputs(*args):
    # warn_type was introduced after this code, it complains throughout for nothing.
    return grad_sources_inputs(warn_type=False, *args)
class test_grad_sources_inputs(unittest.TestCase):
    def test_retNone1(self):
        """Test that it is not ok to return None from op.grad()"""
        class retNone(gof.op.Op):
            def make_node(self):
-                inputs = [gof.generic()]
-                outputs = [gof.generic()]
+                inputs = [theano.tensor.vector()]
+                outputs = [theano.tensor.vector()]
                return gof.Apply(self, inputs, outputs)
            def grad(self, inp, grads):
                x, = inp
@@ -30,240 +33,118 @@ class test_grad_sources_inputs(unittest.TestCase):
                pass
        a = retNone().make_node()
        try:
-            _grad_sources_inputs([(a.out, 1)], None)
-        except ValueError, e:
-            self.assertTrue(e[0] is gradient._msg_retType)
+            _grad_sources_inputs([(a.out, one)], None)
+        except TypeError, e:
            return
        self.fail()
-    def test_retNone1_b(self):
-        """Test that it is ok to return [None] from op.grad()"""
-        class retNone(gof.op.Op):
-            def make_node(self, *inputs):
-                outputs = [gof.generic()]
-                return gof.Apply(self, inputs, outputs)
-            def grad(self, inp, grads):
-                return [None]
-        i = gof.generic()
-        a = retNone().make_node(i)
-        g = _grad_sources_inputs([(a.out, 1)], None)
-        self.assertTrue(not i in g)
    def test_wrong_rval_len1(self):
-        """Test that it is not ok to return the wrong number of gradients"""
+        """Test that it is not ok to return the wrong number of gradient terms"""
        class retNone(gof.op.Op):
            def make_node(self, *inputs):
-                outputs = [gof.generic()]
+                outputs = [theano.tensor.vector()]
                return gof.Apply(self, inputs, outputs)
            def grad(self, inputs, grads):
                return [None]
-        i = gof.generic()
-        j = gof.generic()
+        i = theano.tensor.vector()
+        j = theano.tensor.vector()
        a1 = retNone().make_node(i)
-        g = _grad_sources_inputs([(a1.out, 1)], None)
+        g = _grad_sources_inputs([(a1.out, one)], None)
        a2 = retNone().make_node(i,j)
        try:
-            g = _grad_sources_inputs([(a2.out, 1)], None)
+            g = _grad_sources_inputs([(a2.out, one)], None)
        except ValueError, e:
-            self.assertTrue(e[0] is gradient._msg_badlen)
            return
        self.fail()
-    def test_stop_on_all_none(self):
-        """Test that op.grad() is not called when output grads are all None"""
-        class retNone(gof.op.Op):
-            def __init__(self, tst):
-                self.tst = tst
-            def make_node(self, *inputs):
-                outputs = [gof.generic()]
-                return gof.Apply(self, inputs, outputs)
-            def grad(self, inputs, grads):
-                self.tst.fail()
-        i = gof.generic()
-        a1 = retNone(self).make_node(i)
-        g = _grad_sources_inputs([(a1.out, None)], None)
    def test_1in_1out(self):
        """Test grad is called correctly for a 1-to-1 op"""
-        gval = gof.generic()
+        gval = theano.tensor.matrix()
        class O(gof.op.Op):
            def make_node(self):
-                inputs = [gof.generic()]
-                outputs = [gof.generic()]
+                inputs = [theano.tensor.matrix()]
+                outputs = [theano.tensor.matrix()]
                return gof.Apply(self, inputs, outputs)
            def grad(self, inp, grads):
                return gval,
        a1 = O().make_node()
-        g = _grad_sources_inputs([(a1.outputs[0], 1)], None)
+        g = _grad_sources_inputs([(a1.outputs[0], one)], None)
        self.assertTrue(g[a1.inputs[0]] is gval)
    def test_1in_Nout(self):
        """Test grad is called correctly for a 1-to-many op"""
-        gval = gof.generic()
+        gval = theano.tensor.matrix()
        class O(gof.op.Op):
            def make_node(self):
-                inputs = [gof.generic()]
-                outputs = [gof.generic(),gof.generic()]
+                inputs = [theano.tensor.matrix()]
+                outputs = [theano.tensor.scalar(),theano.tensor.scalar()]
                return gof.Apply(self, inputs, outputs)
            def grad(self, inp, grads):
                x, = inp
                gz1, gz2 = grads
                return gval,
        a1 = O().make_node()
-        g = _grad_sources_inputs([(a1.outputs[0], 1)], None)
+        g = _grad_sources_inputs([(a1.outputs[0], one)], None)
        self.assertTrue(g[a1.inputs[0]] is gval)
    def test_Nin_1out(self):
        """Test grad is called correctly for a many-to-1 op"""
-        gval0 = gof.generic()
-        gval1 = gof.generic()
+        gval0 = theano.tensor.scalar()
+        gval1 = theano.tensor.scalar()
        class O(gof.op.Op):
            def make_node(self):
-                inputs = [gof.generic(),gof.generic()]
-                outputs = [gof.generic()]
+                inputs = [theano.tensor.scalar(), theano.tensor.scalar()]
+                outputs = [theano.tensor.matrix()]
                return gof.Apply(self, inputs, outputs)
            def grad(self, inp, grads):
                x0, x1 = inp
                gz, = grads
                return (gval0, gval1)
        a1 = O().make_node()
-        g = _grad_sources_inputs([(a1.outputs[0], 1)], None)
+        g = _grad_sources_inputs([(a1.outputs[0], one)], None)
        self.assertTrue(g[a1.inputs[0]] is gval0)
        self.assertTrue(g[a1.inputs[1]] is gval1)
    def test_Nin_Nout(self):
        """Test grad is called correctly for a many-to-many op"""
-        gval0 = gof.generic()
-        gval1 = gof.generic()
+        gval0 = theano.tensor.matrix()
+        gval1 = theano.tensor.matrix()
        class O(gof.op.Op):
            def make_node(self):
-                inputs = [gof.generic(),gof.generic()]
-                outputs = [gof.generic(),gof.generic()]
+                inputs = [theano.tensor.matrix(),theano.tensor.matrix()]
+                outputs = [theano.tensor.matrix(),theano.tensor.matrix()]
                return gof.Apply(self, inputs, outputs)
            def grad(self, inp, grads):
                return gval0, gval1
        a1 = O().make_node()
-        g = _grad_sources_inputs([(a1.outputs[0], 1)], None)
+        g = _grad_sources_inputs([(a1.outputs[0], one)], None)
        self.assertTrue(g[a1.inputs[0]] is gval0)
        self.assertTrue(g[a1.inputs[1]] is gval1)
    def test_some_None_ograds(self):
        """Test grad is called when some output gradients are None"""
        class O(gof.op.Op):
            def __init__(self, tst):
                self.tst = tst
            def make_node(self, *inputs):
-                outputs = [gof.generic(),gof.generic()]
+                outputs = [theano.tensor.matrix(),theano.tensor.matrix()]
                return gof.Apply(self, inputs, outputs)
            def grad(self, inputs, g_out):
-                return [1]
-        i = gof.generic()
+                return [one]
+        i = theano.tensor.matrix()
        a1 = O(self).make_node(i)
-        g = grad_sources_inputs([(a1.outputs[0], 1)], None, warn_type=False)
-        self.assertTrue(g[i] is 1)
+        g = grad_sources_inputs([(a1.outputs[0], one)], None, warn_type=False)
+        self.assertTrue(g[i] is one)
    def test_some_None_igrads(self):
        """Test that traversal works properly when an op returns some None"""
        class O(gof.op.Op):
            def __init__(self, tst, grad_ok):
                self.tst = tst
                self.grad_ok = grad_ok

            def make_node(self, *inputs):
                outputs = [gof.generic(), gof.generic()]
                return gof.Apply(self, inputs, outputs)

            def grad(self, inputs, g_out):
                if not self.grad_ok:
                    self.tst.fail()
                else:
                    return [1, None]
        i = gof.generic()
        j = gof.generic()
        k = gof.generic()
        a1 = O(self, True).make_node(i, j)
        a2 = O(self, True).make_node(a1.outputs[1], k)
        g = grad_sources_inputs([(a2.outputs[0], 1)], None, warn_type=False)
        self.assertTrue(g[i] is 1 and j not in g and k not in g)
        a1 = O(self, True).make_node(i, j)
        a2 = O(self, True).make_node(k, a1.outputs[1])
        g = _grad_sources_inputs([(a2.outputs[0], 1)], None)
        self.assertTrue(g[k] is 1 and i not in g and j not in g)

    def test_inputs(self):
        """Test that passing inputs shortens the traversal"""
        class O(gof.op.Op):
            def __init__(self, tst, grad_ok):
                self.tst = tst
                self.grad_ok = grad_ok

            def make_node(self, *inputs):
                outputs = [gof.generic(), gof.generic()]
                return gof.Apply(self, inputs, outputs)

            def grad(self, inputs, grads):
                g0, g1 = grads
                if not self.grad_ok:
                    self.tst.fail()
                else:
                    if g1:
                        return [g0, g0 + g1]
                    else:
                        return [g0, g0]
        i = gof.generic()
        j = gof.generic()
        k = gof.generic()
        a1 = O(self, True).make_node(i, j)
        a2 = O(self, True).make_node(k, a1.outputs[1])
        g = _grad_sources_inputs([(a2.outputs[0], 1), (a1.outputs[1], 4),
                                  (a1.outputs[0], 3), (a1.outputs[0], 3)],
                                 a1.outputs)
        self.assertTrue(g[a2.inputs[0]] == 1)
        self.assertTrue(g[a2.inputs[1]] == 5)
        self.assertTrue(g[a1.outputs[0]] == 6)
        self.assertTrue(g[a1.outputs[1]] == 5)
        self.assertTrue(a1.inputs[0] not in g)
        self.assertTrue(a1.inputs[1] not in g)

    def test_multiple_sources(self):
        """Test that passing multiple sources works"""
        class O(gof.op.Op):
            def __init__(self, tst, grad_ok):
                self.tst = tst
                self.grad_ok = grad_ok

            def make_node(self, *inputs):
                outputs = [gof.generic(), gof.generic()]
                return gof.Apply(self, inputs, outputs)

            def grad(self, inputs, grads):
                g0, g1 = grads
                if not self.grad_ok:
                    self.tst.fail()
                else:
                    if g1:
                        return [g0, g0 + g1]
                    else:
                        return [g0, g0]
        i = gof.generic()
        j = gof.generic()
        k = gof.generic()
        a1 = O(self, True).make_node(i, j)
        a2 = O(self, True).make_node(k, a1.outputs[1])
        g = _grad_sources_inputs([(a2.outputs[0], 1), (a1.outputs[1], 4),
                                  (a1.outputs[0], 3), (a1.outputs[0], 3)], None)
        self.assertTrue(g[a2.inputs[0]] == 1)
        self.assertTrue(g[a2.inputs[1]] == 5)
        self.assertTrue(g[a1.outputs[0]] == 6)
        self.assertTrue(g[a1.outputs[1]] == 5)
        self.assertTrue(g[a1.inputs[0]] == 6)
        self.assertTrue(g[a1.inputs[1]] == 11)
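The expected numbers in test_multiple_sources can be reproduced with a plain-Python sketch of the accumulate-then-backprop traversal. String keys stand in for the symbolic variables (i, j, k and the outputs of a1, a2); this is an illustrative assumption, not Theano's actual implementation.

```python
# Mimic the gradient accumulation of test_multiple_sources with plain numbers.
g = {}

def accum(var, val):
    # gradients that reach the same variable are summed
    g[var] = g.get(var, 0) + val

# seed with the four (variable, gradient) sources; the duplicate
# (o10, 3) entries add up to 6
for var, val in [('o20', 1), ('o11', 4), ('o10', 3), ('o10', 3)]:
    accum(var, val)

# backprop through a2 = O(k, o11): O.grad returns [g0, g0 + g1],
# and with no gradient on o21 it degenerates to [g0, g0]
g0 = g.get('o20', 0)
accum('k', g0)
accum('o11', g0)

# backprop through a1 = O(i, j)
g0, g1 = g['o10'], g['o11']
accum('i', g0)
accum('j', g0 + g1)

assert g['o10'] == 6 and g['o11'] == 5
assert g['i'] == 6 and g['j'] == 11
```

These are exactly the values the test asserts: the source on a1.outputs[1] (4) plus the contribution flowing back through a2 (1) gives 5, and a1's grad rule turns (6, 5) into input gradients 6 and 11.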
def test_unimplemented_grad_func():
    # Tests that function compilation catches unimplemented grads
    # in the graph.
    a = theano.tensor.vector()
    b = theano.gradient.grad_not_implemented(theano.tensor.add, 0, a)
    try:
        f = theano.function([a], b, on_unused_input='ignore')
        assert 0
        # Note: it's important that the NotImplementedGradOp is caught
        # at COMPILATION time, not execution time.
        # If the uncomputable variable is, for example, multiplied by 0,
        # it could be optimized out of the final graph.
    except TypeError:
        pass
def test_undefined_grad_func():
...@@ -271,13 +152,9 @@ def test_undefined_grad_func():
    a = theano.tensor.vector()
    b = theano.gradient.grad_undefined(theano.tensor.add, 0, a)
    try:
        f = theano.function([a], b, on_unused_input='ignore')
        assert 0
        # Note: it's important that the GradUndefinedOp is caught at
        # COMPILATION time, not execution time.
        # If the uncomputable variable is, for example, multiplied by 0,
        # it could be optimized out of the final graph.
    except TypeError:
        pass
def test_unimplemented_grad_grad():
...@@ -296,7 +173,7 @@ def test_unimplemented_grad_grad():
    try:
        g = theano.gradient.grad(b, a)
        assert False
    except TypeError:
        pass
def test_undefined_grad_grad():
...@@ -314,7 +191,7 @@ def test_undefined_grad_grad():
    try:
        g = theano.gradient.grad(Z.sum(), d)
        assert False
    except TypeError:
        pass
def test_grad_name():
...@@ -325,5 +202,97 @@ def test_grad_name():
    g = theano.tensor.grad(f, x)
    assert g.name == '(df/dx)'
def test_grad_duplicate_input():
    # Test that the grad works when a variable
    # appears in more than one place in a node's input list.
    def output(x):
        return x * x
    rng = np.random.RandomState([2012, 8, 28])
    vx = rng.randn(2)
    theano.tests.unittest_tools.verify_grad(output, [vx])


def test_grad_quadratic():
    # Test the gradient on a tiny graph.
    def cost(x, A):
        return theano.tensor.dot(x, theano.tensor.dot(A, x))
    rng = np.random.RandomState([2012, 8, 28])
    vx = rng.randn(2)
    vA = rng.randn(2, 2)
    theano.tests.unittest_tools.verify_grad(cost, [vx, vA])


def test_grad_quadratic_vector():
    # Test the gradient on a small graph.
    def output(x, A):
        return theano.tensor.dot(x * x, A)
    rng = np.random.RandomState([2012, 8, 28])
    vx = rng.randn(2)
    vA = rng.randn(2, 2)
    theano.tests.unittest_tools.verify_grad(output, [vx, vA])


def test_grad_cubic():
    # Test the gradient on a bigger graph.
    def cost(x, A):
        return theano.tensor.dot(x * x, theano.tensor.dot(A, x))
    rng = np.random.RandomState([2012, 8, 28])
    vx = rng.randn(2)
    vA = rng.randn(2, 2)
    theano.tests.unittest_tools.verify_grad(cost, [vx, vA])


def test_grad_grad_quadratic():
    # Test the gradient on a graph constructed using the gradient.
    def output(x, A):
        orig_cost = theano.tensor.dot(x, theano.tensor.dot(A, x))
        return theano.gradient.grad(orig_cost, x)
    rng = np.random.RandomState([2012, 8, 28])
    vx = rng.randn(2)
    vA = rng.randn(2, 2)
    theano.tests.unittest_tools.verify_grad(output, [vx, vA])


def test_grad_grad_cubic():
    # Test the gradient on a bigger graph constructed using the gradient.
    def output(x, A):
        orig_cost = theano.tensor.dot(x * x, theano.tensor.dot(A, x))
        return theano.gradient.grad(orig_cost, x)
    rng = np.random.RandomState([2012, 8, 28])
    vx = rng.randn(2)
    vA = rng.randn(2, 2)
    theano.tests.unittest_tools.verify_grad(output, [vx, vA])
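The verify_grad calls above check a symbolic gradient against finite differences at random points. The same idea can be sketched in plain NumPy for the quadratic cost used in test_grad_quadratic; numeric_grad here is a hypothetical helper written for illustration, not part of Theano's API.

```python
import numpy as np

def cost(x, A):
    # quadratic form: f(x) = x . (A x)
    return x @ (A @ x)

def numeric_grad(f, x, eps=1e-6):
    # central finite differences, one coordinate at a time
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

rng = np.random.RandomState([2012, 8, 28])
x = rng.randn(2)
A = rng.randn(2, 2)

analytic = (A + A.T) @ x                       # d/dx of x.(Ax)
numeric = numeric_grad(lambda v: cost(v, A), x)
assert np.allclose(analytic, numeric, atol=1e-5)
```

verify_grad automates this comparison for arbitrary Theano graphs, which is why the tests above only need to supply a function and sample input values.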
if __name__ == '__main__':
    unittest.main()