Commit c0c25559 authored by lamblin

Merge pull request #910 from goodfeli/int_grad

Consistent & correct handling of integers and gradients

- Documentation and implementation of a consistent way of handling gradients and integers
- Type checks that ensure the gradient is always floating point and not an integer
- Type checks that ensure the gradient of an integer is always undefined or 0
- An upgraded version of connection_pattern that provides theano with enough information to answer questions like "is variable x a function of variable y?" accurately
...@@ -98,34 +98,56 @@ following methods:
lifetime of self. Op instances should be immutable in this
sense.
.. function:: connection_pattern(node):

Optional method; sometimes needed for gradient.grad to
work correctly.

Returns a list of lists of bools.

Op.connection_pattern[input_idx][output_idx] is true if the
elements of inputs[input_idx] have an effect on the elements of
outputs[output_idx].

The ``node`` parameter is needed to determine the number of
inputs. Some ops such as Subtensor take a variable number of
inputs.

If no connection_pattern is specified, gradient.grad will
assume that all inputs have some elements connected to some
elements of all outputs.
This method conveys two pieces of information that are otherwise
not part of the theano graph:
1) Which of the op's inputs are truly ancestors of each of the
op's outputs. Suppose an op has two inputs, x and y, and
outputs f(x) and g(y). y is not really an ancestor of f, but
it appears to be so in the theano graph.
2) Whether the actual elements of each input/output are relevant
to a computation.
For example, the shape op does not read its input's elements,
only its shape metadata. d shape(x) / dx should thus raise
a disconnected input exception (if these exceptions are
enabled).
As another example, the elements of the Alloc op's outputs
are not affected by the shape arguments to the Alloc op.
Failing to implement this function for an op that needs it can
result in two types of incorrect behavior:

1) gradient.grad erroneously raising a TypeError reporting that
a gradient is undefined.
2) gradient.grad failing to raise a ValueError reporting that
an input is disconnected.
Even if connection_pattern is not implemented correctly,
if gradient.grad returns an expression, that expression will
be numerically correct.
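To make the contract concrete, here is a plain-Python sketch (not Theano internals; the op and all names are hypothetical) of how a connection pattern answers "do the elements of input i affect output j?" for an Alloc-like op whose shape arguments never touch the output's elements:

```python
# Hypothetical connection pattern for an op like Alloc(value, shape0, shape1):
# the single output's elements depend only on `value`; the shape arguments
# affect only metadata, so they are "disconnected" from the elements.
connection_pattern = [
    [True],   # input 0 (value)  -> output 0: elements are connected
    [False],  # input 1 (shape0) -> output 0: shape-only, disconnected
    [False],  # input 2 (shape1) -> output 0: shape-only, disconnected
]

def is_connected(pattern, input_idx, output_idx):
    """True if elements of inputs[input_idx] affect outputs[output_idx]."""
    return pattern[input_idx][output_idx]
```

Reading it this way, gradient.grad can decide that d output / d shape0 should be a disconnected gradient rather than an undefined one.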
.. function:: grad(inputs, output_gradients)

Optional (but needed to have it work with gradient.grad()).

If the Op being defined is differentiable, its gradient may be specified
symbolically in this method. Both ``inputs`` and ``output_gradients``

...@@ -217,6 +239,70 @@ following methods:

Both the partial differentiation and the multiplication have to be performed by
:func:`grad`.
Theano currently imposes the following constraints on the values returned by the grad method:
1) They must be Variable instances.
2) When they are types that have dtypes, they must never have an integer dtype.
Integers are a tricky subject. Integers are the main reason for having DisconnectedType,
NullType or zero gradient. When you have an integer as an argument to your grad method,
recall the definition of a derivative to help you decide what value to return:
:math:`\frac{d f}{d x} = \lim_{\epsilon \rightarrow 0} (f(x+\epsilon)-f(x))/\epsilon`.
Suppose your function f has an integer-valued output. For most functions you're likely
to implement in theano, this means your gradient should be zero, because f(x+epsilon)
= f(x) for almost all x. (The only other option is that the gradient could be undefined,
if your function is discontinuous everywhere, like the rational indicator function.)
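A quick numeric check of this rule (plain Python, the function is illustrative): an integer-valued function such as floor is a step function, so the finite-difference quotient from the definition above is 0 at almost every x:

```python
import math

def f(x):
    # An integer-valued function of a real input: a step function.
    return float(math.floor(x))

eps = 1e-6
x = 2.3  # a typical (non-integer) point
finite_diff = (f(x + eps) - f(x)) / eps
# f(x + eps) == f(x) for almost all x, so the quotient is exactly 0
```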
Suppose your function f has an integer-valued input. This is a little trickier, because
you need to think about what you mean mathematically when you make a variable integer-valued
in theano. Most of the time in machine learning we mean "f is a function of a real-valued
x, but we are only going to pass in integer-values of x". In this case, f(x+epsilon) exists,
so the gradient through f should be the same whether x is an integer or a floating point
variable. Sometimes what we mean is "f is a function of an integer-valued x, and f is only
defined where x is an integer." Since f(x+epsilon) doesn't exist, the gradient is undefined.
Finally, many times in theano, integer valued inputs don't actually affect the elements of
the output, only its shape.
If your function f has both an integer-valued input and an
integer-valued output, then both rules have to be combined:
- If f is defined at (x+epsilon), then the input gradient is
defined. Since f(x+epsilon) would be equal to f(x) almost
everywhere, the gradient should be 0 (first rule).
- If f is only defined where x is an integer, then the gradient
is undefined, regardless of what the gradient with respect to the
output is.
Examples:
1) f(x,y) = dot product between x and y. x and y are integers.
Since the output is also an integer, f is a step function.
Its gradient is zero almost everywhere, so Op.grad should return
zeros in the shape of x and y.
2) f(x,y) = dot product between x and y. x is floating point and y is an integer.
In this case the output is floating point. It doesn't matter that y is an integer.
We consider f to still be defined at f(x,y+epsilon). The gradient is exactly the
same as if y were floating point.
3) f(x,y) = argmax of x along axis y.
The gradient with respect to y is undefined, because f(x,y) is not defined for
floating point y. How could you take an argmax along a fractional axis?
The gradient with respect to x is 0, because f(x+epsilon, y) = f(x) almost
everywhere.
4) f(x,y) = a vector with y elements, each of which taking on the value x
The grad method should return DisconnectedType()() for y, because the elements of
f don't depend on y. Only the shape of f depends on y. You probably also want to
implement a connection_pattern method to encode this.
5) f(x) = int(x) converts float x into an int. g(y) = float(y) converts an integer y into a float.
If the final cost C = 0.5 * g(y) = 0.5 g(f(x)), then the
gradient with respect to y will be 0.5, even if y is an
integer. However, the gradient with respect to x will be 0,
because the output of f is integer-valued.
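The examples above can be summarized as a small dispatch sketch (plain Python; the strings stand in for the symbolic values a real grad method would return, and the undefined case of example 3 is omitted):

```python
def grad_kind(output_dtype, shape_only=False):
    """What should Op.grad return for one input? Illustrative only."""
    if shape_only:
        # example 4: the input affects only the output's shape
        return 'DisconnectedType()()'
    if output_dtype.startswith(('int', 'uint')):
        # examples 1 and 5: integer-valued output => step function => zeros
        return 'zeros_like(input) in floatX'
    # example 2: float-valued output => ordinary gradient, even for int inputs
    return 'usual symbolic gradient'
```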
.. function:: infer_shape(node, shapes)

Optional.
......
...@@ -29,3 +29,9 @@ class NullType(Type):
def values_eq(a, b, force_same_dtype=True):
raise ValueError("NullType has no values to compare")
def __eq__(self, other):
return type(self) == type(other)
def __hash__(self):
return hash(type(self))
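The reason for pairing __eq__ with __hash__ here: objects that compare equal must hash equal, or dict and set lookups break. A minimal stand-alone illustration of the same identity-by-class pattern (the class name is hypothetical):

```python
class TypeA(object):
    # Same pattern as the NullType methods above: identity by class.
    def __eq__(self, other):
        return type(self) == type(other)

    def __hash__(self):
        return hash(type(self))

a, b = TypeA(), TypeA()
# Two instances compare equal and land in the same dict/set bucket.
```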
(Diff collapsed.)
...@@ -4,8 +4,8 @@ linkers). It resembles the if clause of any programming language, that
has a `then` and `else` branch, and executes either one or the other
according to the condition provided.

This op differs from the already existing `switch` op, which evaluates both
branches of the clause and afterwards picks (according to the condition)
which value to report. Note also that `switch` is an elemwise operation (so
it picks each entry of a matrix according to the condition) while `ifelse`
is a global operation with a scalar condition.
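The distinction can be sketched in plain Python (illustrative helpers, not Theano's API): `switch` picks per element with an elementwise condition, while `ifelse` picks one whole branch with a scalar condition:

```python
def switch_like(cond_vec, a, b):
    # Elementwise: one condition per entry; both branches must exist.
    return [x if c else y for c, x, y in zip(cond_vec, a, b)]

def ifelse_like(cond_scalar, a, b):
    # Global: a single scalar condition selects an entire branch.
    return a if cond_scalar else b
```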
...@@ -60,7 +60,7 @@ class IfElse(PureOp):
:note:
Linkers other than CVM and VM are INCOMPATIBLE with this Op, and
will ignore its lazy characteristic, computing both the True and
False branch before picking one.
"""
...@@ -212,7 +212,14 @@ class IfElse(PureOp):
for t in ts])
if_false = ([ins[0]] + [theano.tensor.zeros_like(f)
for f in fs] + grads)

condition = ins[0]
# condition does affect the elements of the output so it is connected.
# For the sake of making the gradient convenient we assume that
# condition + epsilon always triggers the same branch as condition
condition_grad = condition.zeros_like().astype(theano.config.floatX)

return ([condition_grad] +
if_true_op.make_node(*if_true).outputs +
if_false_op.make_node(*if_false).outputs)
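The assumption in the comment above — that condition + epsilon triggers the same branch as condition — makes branch selection locally constant, so a zero gradient for the condition is consistent with a finite-difference check (plain-Python sketch, names illustrative):

```python
def lazy_if(cond, t, f):
    # A scalar condition selects one whole branch, as in IfElse.
    return t if cond > 0 else f

eps = 1e-6
cond = 0.5  # away from the decision boundary at 0
d_out = (lazy_if(cond + eps, 3.0, 7.0) - lazy_if(cond, 3.0, 7.0)) / eps
# Same branch on both evaluations => zero derivative w.r.t. the condition.
```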
......
# Skip test if cuda_ndarray is not available.
from nose.plugins.skip import SkipTest
import numpy
import theano
import theano.sandbox.cuda as cuda_ndarray
if cuda_ndarray.cuda_available == False:
......
...@@ -2,10 +2,10 @@
TODO: implement Images2Neibs.{perform,infer_shape}() methods
"""
import theano
from theano import Op, Apply
import theano.tensor as T
from theano.gradient import grad_not_implemented
from theano.gradient import grad_undefined

class Images2Neibs(Op):
...@@ -59,7 +59,8 @@ class Images2Neibs(Op):
for j in xrange(list 2 dim)
for k in <image column coordinates>
for l in <image row coordinates>
output[idx,:]
= flattened version of ten4[i,j,l:l+r,k:k+c]
idx += 1
(note: the op isn't necessarily implemented internally with these
for loops, they're just the easiest way to describe the output pattern)
...@@ -90,8 +91,11 @@ class Images2Neibs(Op):
(hasattr(neib_shape, "equals") and
neib_shape.equals(neib_step))):
return [neibs2images(gz, neib_shape, x.shape, mode=self.mode),
grad_undefined(self, 1, neib_shape),
grad_undefined(self, 2, neib_step)]
return [grad_not_implemented(self, 0, x),
grad_undefined(self, 1, neib_shape),
grad_undefined(self, 2, neib_step)]
def c_code_cache_version(self):
return (5,)
...@@ -307,5 +311,3 @@ def neibs2images(neibs, neib_shape, original_shape, mode='valid'):
raise NotImplementedError("neibs2images do not support mode=%s" % mode)
return output_4d
(Diff collapsed.)
...@@ -260,12 +260,16 @@ class Scan(PureOp):
zip(self.inner_seqs(self.inputs),
self.outer_seqs(inputs))):
if inner_seq.type.dtype != outer_seq[idx].type.dtype:
assert isinstance(idx, int)
raise ValueError(err_msg1 % ('sequence',
str(outer_seq),
idx,
outer_seq.type.dtype,
outer_seq.ndim,
str(inner_seq),
inner_seq.type.dtype,
inner_seq.ndim))
argoffset += len(self.outer_seqs(inputs))
# Check that these 3 things have the same dtype for mit_mot:
# - initial state of the output
...@@ -1260,7 +1264,7 @@ class Scan(PureOp):
# the gradients with respect to all outputs)
def compute_gradient(y, g_y):
gmp = gradient.grad_sources_inputs(
[(y, g_y)], diff_inputs)
return [gmp.get(p, None) for p in diff_inputs]

# 6. clean the outputs (i.e. remove update rules)
...@@ -1301,7 +1305,13 @@ class Scan(PureOp):
# 7.3. compute gradients of the inputs given one output
for dx, out in enumerate(clean_outputs):
if g_outs[dx] != None:
inner_g_out = safe_new(g_outs[dx][0])
else:
# We do not have a gradient on this output so we need a
# placeholder, which for now has the same dtype as the
# output
inner_g_out = safe_new(out)
###
#### I need to clip the gradient HERE !!
......
...@@ -18,6 +18,7 @@ from theano.gof.python25 import all
from theano.gradient import DisconnectedType
from theano.sparse.utils import hash_from_sparse
import theano.tests.unittest_tools as utt
from theano.gradient import grad_not_implemented

sparse_formats = ['csc', 'csr']
...@@ -255,11 +256,13 @@ def sp_zeros_like(x):
:return: The same as `x` with zero entries
for all elements.
"""
# TODO: don't restrict to CSM formats
_, _, indptr, shape = csm_properties(x)
return CSM(format=x.format)(data=numpy.array([], dtype=x.type.dtype),
indices=numpy.array([]),
indptr=tensor.zeros_like(indptr),
shape=shape)
class _sparse_py_operators:
...@@ -670,7 +673,7 @@ class CSM(gof.Op):
the sparse matrix. Fancy indexing with numpy.ndarray
should be used for this purpose.

:param data: One dimensional tensor representing
the data of the sparse to construct.
:param indices: One dimensional tensor of integers
representing the indices of the sparse
...@@ -678,7 +681,7 @@ class CSM(gof.Op):
:param indptr: One dimensional tensor of integers
representing the index pointer for
the sparse matrix to construct.
:param shape: One dimensional tensor of integers
representing the shape of the sparse
matrix to construct.
...@@ -782,6 +785,9 @@ class CSM(gof.Op):
indptr.copy()), shape.copy(),
copy=False)
def connection_pattern(self, node):
return [[True], [False], [False], [False]]
def grad(self, (x_data, x_indices, x_indptr, x_shape), (g_out,)):
g_data, g_indices, g_indptr, g_shape = csm_properties(g_out)
# unpack the data vector and wrap it as a 1d TensorType
...@@ -984,7 +990,19 @@ class DenseFromSparse(gof.op.Op):
def grad(self, (x, ), (gz, )):
if self.sparse_grad:
left = sp_ones_like(x)
right = gz

# Do upcasting if necessary to avoid an unimplemented case
# of mul
if right.dtype == 'float64' and left.dtype == 'float32':
left = left.astype('float64')
if right.dtype == 'float32' and left.dtype == 'float64':
right = right.astype('float64')

return [left * right]
else:
return [SparseFromDense(x.type.format)(gz)]
...@@ -1993,7 +2011,9 @@ class MulSS(gof.op.Op):
def make_node(self, x, y):
x, y = as_sparse_variable(x), as_sparse_variable(y)
if x.type != y.type:
raise NotImplementedError(
"MulSS not supported for differing types. "
"Got %s and %s." % (str(x.type), str(y.type)))
return gof.Apply(self, [x, y], [x.type()])

def perform(self, node, (x, y), (out, )):
...@@ -2042,7 +2062,9 @@ class MulSD(gof.op.Op):
y = tensor.cast(y, dtype)
if x.type.dtype != y.type.dtype:
raise NotImplementedError(
"MulSD not implemented for different input dtypes. "
"Got %s and %s." % (x.type.dtype, y.type.dtype))
# The magic number two here arises because L{scipy.sparse}
# objects must be matrices (have dimension 2)
# Broadcasting of the sparse matrix is not supported.
...@@ -2128,7 +2150,9 @@ class MulSV(gof.op.Op):
assert y.type.ndim == 1
if x.type.dtype != y.type.dtype:
raise NotImplementedError(
"MulSV not implemented for differing dtypes. "
"Got %s and %s." % (str(x.type.dtype), str(y.type.dtype)))
return gof.Apply(self,
[x, y],
[SparseType(dtype=x.type.dtype,
...@@ -2142,6 +2166,15 @@ class MulSV(gof.op.Op):
def grad(self, (x, y), (gz,)):
assert _is_sparse_variable(x) and _is_dense_variable(y)
assert _is_sparse_variable(gz)

# mul_s_v is not implemented if the types vary
if gz.dtype == 'float64' and y.dtype == 'float32':
y = y.astype('float64')
if gz.dtype == 'float32' and y.dtype == 'float64':
gz = gz.astype('float64')

return mul_s_v(gz, y), sp_sum(x * gz, axis=0, sparse_grad=True)

def infer_shape(self, node, ins_shapes):
...@@ -2176,8 +2209,18 @@ def mul(x, y):
assert x_is_sparse_variable or y_is_sparse_variable
if x_is_sparse_variable and y_is_sparse_variable:
# mul_s_s is not implemented if the types differ
if y.dtype == 'float64' and x.dtype == 'float32':
x = x.astype('float64')
return mul_s_s(x, y)
elif x_is_sparse_variable and not y_is_sparse_variable:
# mul is unimplemented if the dtypes differ
if y.dtype == 'float64' and x.dtype == 'float32':
x = x.astype('float64')
return mul_s_d(x, y)
elif y_is_sparse_variable and not x_is_sparse_variable:
return mul_s_d(y, x)
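The dtype guards in the sparse multiplication paths above all follow one promotion rule; as a stand-alone sketch (illustrative helper, not part of Theano):

```python
def promote_pair(x_dtype, y_dtype):
    """Mirror the upcasting guards: when one operand is float32 and the
    other float64, cast the float32 side up so the op sees equal dtypes."""
    if {x_dtype, y_dtype} == {'float32', 'float64'}:
        return 'float64', 'float64'
    return x_dtype, y_dtype
```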
...@@ -3260,7 +3303,7 @@ class SamplingDot(gof.op.Op):
rval = [
dot(p * gz, y),
dot((p * gz).T, x),
grad_not_implemented(self, 2, p)
]
return rval
......
(Diff collapsed.)
...@@ -14,6 +14,7 @@ from theano.scalar import Scalar
from theano.printing import min_informative_str, pprint
from theano.gof.python25 import all, any
from theano.tensor.utils import hash_from_dict
from theano.gradient import DisconnectedType

config = theano.config
...@@ -277,7 +278,8 @@ class DimShuffle(Op):
#get the copy / view of the input depending on whether we're doing
# things inplace or not.
if self.inplace:
get_base = [
'{ PyArrayObject * %(basename)s = %(input)s', 'Py_INCREF((PyObject*)%(basename)s)']
else:
get_base = [('{ PyArrayObject * %(basename)s = (PyArrayObject*)PyArray_FromAny((PyObject*)%(input)s, NULL,'
'0, 0, NPY_ALIGNED|NPY_ENSURECOPY, NULL)')]
...@@ -285,7 +287,8 @@ class DimShuffle(Op):
shape_statements = ['npy_intp dimensions[%i]' % nd_out]
for i, o in enumerate(self.new_order):
if o != 'x':
shape_statements += [('dimensions[' + str(
i) + '] = %(basename)s->dimensions[' + str(o) + ']')]
else:
shape_statements += [('dimensions[' + str(i) + '] = 1')]
...@@ -294,7 +297,8 @@ class DimShuffle(Op):
#set the strides of the non-broadcasted dimensions
for i, o in enumerate(self.new_order):
if o != 'x':
strides_statements += [('strides[' + str(i)
+ '] = %(basename)s->strides[' + str(o) + ']')]
else:
strides_statements += [('strides[' + str(i) + '] = 0')]
...@@ -310,7 +314,8 @@ class DimShuffle(Op):
'-1] = %(basename)s->descr->elsize'
)
for i in xrange(nd_out - 2, -1, -1):
strides_statements.append(
"if (strides[%(i)s] == 0) strides[%(i)s] = strides[%(i)s+1] * dimensions[%(i)s+1]" % dict(i=str(i)))
#
# PyObject* PyArray_New(PyTypeObject* subtype, int nd, npy_intp* dims, int type_num,
...@@ -605,7 +610,8 @@ class Elemwise(Op):
# the right thing to do .. have to talk to Ian and James
# about it
if bgrads[jdx] is None or \
isinstance(bgrads[jdx].type, DisconnectedType):
pass
elif eval_point is not None:
if rop_out is None:
...@@ -617,6 +623,13 @@ class Elemwise(Op):
return rval
def connection_pattern(self, node):
if hasattr(self.scalar_op, 'connection_pattern'):
return self.scalar_op.connection_pattern(node)
return [[True for output in node.outputs] for ipt in node.inputs]
def grad(self, inputs, ograds):
#compute grad with respect to broadcasted input

...@@ -676,10 +689,16 @@ class Elemwise(Op):
theano.config.compute_test_value = prev_setting
if not isinstance(scalar_igrads, (list, tuple)):
raise TypeError('%s.grad returned %s instead of list or tuple' %
(str(self.scalar_op), str(type(scalar_igrads))))
nd = len(inputs[0].type.broadcastable)  # this is the same for everyone

def transform(r):
# From a graph of ScalarOps, make a graph of Broadcast ops.
if isinstance(r.type, DisconnectedType):
return r
if r in scalar_inputs:
return inputs[scalar_inputs.index(r)]
if r in scalar_ograds:
...@@ -803,7 +822,7 @@ class Elemwise(Op):
errormsg = ('While computing ' + str(node.outputs) +
': Failed calling ufunc for op ' +
str(self.scalar_op) +
' for params of shape ' +
str([arg.shape for arg in ufunc_args]))
if config.exception_verbosity == 'high':
...@@ -1324,7 +1343,8 @@ class CAReduce(Op):
alloc += """
for(int i=0;i<%(iname)s->nd;i++){
if(PyArray_DIMS(%(iname)s)[i]==0 && tosum[i]){
PyErr_Format(PyExc_ValueError,
"Input of CAReduce{%(scal_name)s} has zero-size on axis %%d",i);
%(fail)s;
}
}
...@@ -1585,6 +1605,12 @@ class Sum(CAReduceDtype):
def grad(self, inp, grads):
x, = inp

out = self(*inp)
if out.dtype.find('int') != -1:
return [x.zeros_like().astype(theano.config.floatX)]

gz, = grads
gz = as_tensor_variable(gz)
axis = self.axis
...@@ -1601,7 +1627,7 @@ class Sum(CAReduceDtype):
new_dims.append(i)
i += 1
ds_op = DimShuffle(gz.type.broadcastable, new_dims)
gx = Elemwise(scalar.second)(x, ds_op(gz))
return [gx]
def R_op(self, inputs, eval_points):

...@@ -1646,7 +1672,7 @@ class Prod(CAReduceDtype):
def grad(self, inp, grads):
'''
The grad of this Op would be very easy, if it were not for the case
where zeros are present in a given "group" (ie. elements reduced
together to form the product).
...@@ -1692,8 +1718,11 @@ class Prod(CAReduceDtype):
'''
prod_in, = inp
gz, = grads

out = self(*inp)
if out.dtype[0:3] in ('int', 'uin'):
return [prod_in.zeros_like().astype(theano.config.floatX)]

# Prepare the broadcasting that is used everywhere to broadcast
# over the original groups (ie. broadcast over the elements of a given
......
...@@ -5,6 +5,7 @@ import theano
import basic
from theano import gof, scalar
import basic as tensor
from theano.gradient import DisconnectedType

class DiffOp(theano.Op):
...@@ -148,7 +149,13 @@ class BinCountOp(theano.Op):
z[0] = np.bincount(x, weights=weights, minlength=self.minlength)

def grad(self, inputs, outputs_gradients):
output = self(*inputs)

if output.dtype.find('int') != -1:
return [inp.zeros_like().astype(theano.config.floatX)
for inp in inputs]

raise NotImplementedError()

def infer_shape(self, node, ins_shapes):
x = node.inputs[0]
...@@ -252,6 +259,10 @@ class RepeatOp(theano.Op):
z = output_storage[0]
z[0] = np.repeat(x, repeats=repeats, axis=self.axis)

def connection_pattern(self, node):
return [[True], [False]]

def grad(self, (x, repeats), (gz, )):
if repeats.ndim == 0:
if self.axis is None:
...@@ -265,7 +276,8 @@ class RepeatOp(theano.Op): ...@@ -265,7 +276,8 @@ class RepeatOp(theano.Op):
shape = [x.shape[k] for k in range(x.ndim)] shape = [x.shape[k] for k in range(x.ndim)]
shape.insert(axis, repeats) shape.insert(axis, repeats)
return [gz.reshape(shape, x.ndim + 1).sum(axis=axis), None] return [gz.reshape(shape, x.ndim + 1).sum(axis=axis),
DisconnectedType()()]
        elif repeats.ndim == 1:
            # For this implementation, we would need to specify the length
            # of repeats in order to split gz in the right way to sum

@@ -387,7 +399,6 @@ def bartlett(M):
    return bartlett_(M)

class FillDiagonal(gof.Op):
    # See function fill_diagonal for docstring
    def __eq__(self, other):
...
@@ -2,6 +2,8 @@ import theano
from theano.tensor import basic as T
from theano.misc import strutil
import numpy as N
from theano.gradient import grad_undefined
from theano.gradient import DisconnectedType
#TODO: speed up by reordering loops. Should pass through the videos once, incrementing all weight gradients, rather

@@ -9,7 +11,7 @@ import numpy as N
class ConvGrad3D(theano.Op):
    """ Gradient of Conv3D with respect to W """

    def __eq__(self, other):
        return type(self) == type(other)

    def __hash__(self):

@@ -27,20 +29,26 @@ class ConvGrad3D(theano.Op):
        return theano.Apply(self, inputs=[V_, d_, WShape_, dCdH_], outputs = [ T.TensorType(V_.dtype, (False,False,False,False,False))() ] )

    def infer_shape(self, node, input_shapes):
        V, d, W_shape, dCdH = node.inputs
        return [ ( W_shape[0], W_shape[1], W_shape[2], W_shape[3], W_shape[4] ) ]
    def connection_pattern(self, node):
        return [[True], [True], [False], [True]]

    def grad(self, inputs, output_gradients):
        C, d, WShape, B = inputs
        dLdA, = output_gradients

        z = T.zeros_like(C[0, 0, 0, 0, :])
        dLdC = convTransp3D(dLdA, z, d, B, C.shape[1:4])
        # d actually does affect the outputs, so it's not disconnected
        dLdd = grad_undefined(self, 1, d)
        # The shape of the weights doesn't affect the output elements
        dLdWShape = DisconnectedType()()
        dLdB = conv3D(C, dLdA, T.zeros_like(B[0, 0, 0, 0, :]), d)

        return [dLdC, dLdd, dLdWShape, dLdB]
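The grad above distinguishes the two non-gradients this PR introduces: `grad_undefined` for an input that does affect the output but not differentiably (the integer stride `d`), and `DisconnectedType()()` for an input that only supplies metadata such as a shape (`WShape`). A plain-Python sketch of the distinction (the `Disconnected`/`Undefined` class names are hypothetical stand-ins, not Theano API):

```python
class Disconnected(object):
    """Stand-in for DisconnectedType()(): the output elements do not
    depend on this input at all (it only carries metadata like a shape)."""

class Undefined(object):
    """Stand-in for grad_undefined(op, idx, var): the output does depend
    on this input, but not differentiably (e.g. an integer stride)."""
    def __init__(self, reason):
        self.reason = reason

def check_grads(connected_flags, grads):
    # A Disconnected gradient is only legal where connection_pattern
    # says the input is not connected to any output.
    for connected, g in zip(connected_flags, grads):
        if isinstance(g, Disconnected):
            assert not connected, "connected input cannot be Disconnected"

# Mirrors ConvGrad3D: inputs (V, d, WShape, dCdH); d is connected but
# non-differentiable, WShape is pure shape metadata.
connected = [True, True, False, True]
grads = ["dLdC", Undefined("integer stride"), Disconnected(), "dLdB"]
check_grads(connected, grads)
```

This is the consistency that the upgraded `connection_pattern` lets `gradient.grad` verify: disconnected gradients must line up with `False` entries.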
    def perform(self, node, inputs, output_storage):
        V, d, WShape, dCdH = inputs

@@ -64,17 +72,15 @@ class ConvGrad3D(theano.Op):
        #print 'computing output of shape '+str(WShape)

        for k in xrange(0, WShape[1]):
            for l in xrange(0, WShape[2]):
                for m in xrange(0, WShape[3]):
                    for i in xrange(0, batchSize):
                        for p in xrange(0, outputHeight):
                            for q in xrange(0, outputWidth):
                                for r in xrange(0, outputDur):
                                    for j in xrange(0, WShape[0]):
                                        for z in xrange(0, WShape[4]):
                                            dCdW[j,k,l,m,z] += dCdH[i,p,q,r,j] * V[i,dr*p+k,dc*q+l,dt*r+m,z]

        output_storage[0][0] = dCdW

@@ -89,7 +95,7 @@ class ConvGrad3D(theano.Op):
        dCdW = outputs[0]

        codeSource = """
            ///////////// < code generated by ConvGradW3D >

            //printf("\t\t\t\tConvGradW3D c code\\n");

@@ -269,7 +275,7 @@ class ConvGrad3D(theano.Op):
            ///////////// < /code generated by ConvGradW3D >
            """

        return strutil.renderString(codeSource, locals())

convGrad3D = ConvGrad3D()
...
@@ -2,10 +2,13 @@ import numpy as N
from theano.tensor import basic as T
from theano.misc import strutil
import theano
from theano.gradient import grad_undefined
from theano.gradient import DisconnectedType

class ConvTransp3D(theano.Op):
    """ "Transpose" of Conv3D (Conv3D implements multiplication by an implicitly defined matrix W. This implements multiplication by its transpose) """

    def __eq__(self, other):
        return type(self) == type(other)

    def __hash__(self):

@@ -14,7 +17,7 @@ class ConvTransp3D(theano.Op):
    def c_code_cache_version(self):
        return (3,)

    def make_node(self, W, b, d, H, RShape=None):
        """
        :param W: Weights, filter
        :param b: bias, shape == (W.shape[0],)

@@ -28,7 +31,7 @@ class ConvTransp3D(theano.Op):
        if RShape:
            RShape_ = T.as_tensor_variable(RShape)
        else:
            RShape_ = T.as_tensor_variable([-1, -1, -1])

        return theano.Apply(self, inputs=[W_,b_,d_,H_, RShape_], outputs = [ T.TensorType(H_.dtype, (False,False,False,False,False))() ] )

@@ -36,22 +39,25 @@ class ConvTransp3D(theano.Op):
        flags = ['-Werror']
        return flags

    def infer_shape(self, node, input_shapes):
        W, b, d, H, RShape = node.inputs
        W_shape, b_shape, d_shape, H_shape, RShape_shape = input_shapes
        return [(H_shape[0], RShape[0], RShape[1], RShape[2], W_shape[4])]
    def connection_pattern(self, node):
        return [[True], [True], [True], [True], [False]]

    def grad(self, inputs, output_gradients):
        W, b, d, H, RShape = inputs
        dCdR, = output_gradients
        dCdH = conv3D(dCdR, W, T.zeros_like(H[0, 0, 0, 0, :]), d)
        WShape = W.shape
        dCdW = convGrad3D(dCdR, d, WShape, H)
        dCdb = T.sum(dCdR, axis=(0, 1, 2, 3))
        # not differentiable, since d affects the output elements
        dCdd = grad_undefined(self, 2, d)
        # disconnected, since RShape just determines the output shape
        dCdRShape = DisconnectedType()()

        if 'name' in dir(dCdR) and dCdR.name is not None:
            dCdR_name = dCdR.name

@@ -76,15 +82,14 @@ class ConvTransp3D(theano.Op):
            dCdW.name = 'ConvTransp3D_dCdW.H='+H_name+',dCdR='+dCdR_name+',W='+W_name
            dCdb.name = 'ConvTransp3D_dCdb.H='+H_name+',dCdR='+dCdR_name+',W='+W_name+',b='+b_name
            dCdH.name = 'ConvTransp3D_dCdH.H=' + H_name + ',dCdR=' + dCdR_name

        return [dCdW, dCdb, dCdd, dCdH, dCdRShape]
    def perform(self, node, inputs, output_storage):
        W, b, d, H, RShape = inputs
        # print "\t\t\t\tConvTransp3D python code"
        output_storage[0][0] = computeR(W, b, d, H, RShape)

    def c_code(self, node, nodename, inputs, outputs, sub):
        W, b, d, H, RShape = inputs

@@ -321,33 +326,35 @@ class ConvTransp3D(theano.Op):
        ///////////// < /code generated by ConvTransp3D >
        """

        return strutil.renderString(codeSource, locals())

convTransp3D = ConvTransp3D()

#If the input size wasn't a multiple of D we may need to cause some automatic padding to get the right size of reconstruction

def computeR(W, b, d, H, Rshape=None):
    assert len(W.shape) == 5
    assert len(H.shape) == 5
    assert len(b.shape) == 1
    assert len(d) == 3

    outputChannels, filterHeight, filterWidth, filterDur, \
        inputChannels = W.shape
    batchSize, outputHeight, outputWidth, outputDur, \
        outputChannelsAgain = H.shape
    assert outputChannelsAgain == outputChannels
    assert b.shape[0] == inputChannels

    dr, dc, dt = d
    assert dr > 0
    assert dc > 0
    assert dt > 0

    videoHeight = (outputHeight - 1) * dr + filterHeight
    videoWidth = (outputWidth - 1) * dc + filterWidth
    videoDur = (outputDur - 1) * dt + filterDur

    if Rshape is not None and Rshape[0] != -1:
        if Rshape[0] < videoHeight:
@@ -364,24 +371,27 @@ def computeR(W,b,d,H,Rshape = None):
    #print "video size: "+str((videoHeight, videoWidth, videoDur))

    R = N.zeros((batchSize, videoHeight,
                 videoWidth, videoDur, inputChannels), dtype=H.dtype)

    #R[i,j,r,c,t] = b_j + sum_{rc,rk | d \circ rc + rk = r} sum_{cc,ck | ...} sum_{tc,tk | ...} sum_k W[k, j, rk, ck, tk] * H[i,k,rc,cc,tc]
    for i in xrange(0, batchSize):
        #print '\texample '+str(i+1)+'/'+str(batchSize)
        for j in xrange(0, inputChannels):
            #print '\t\tfeature map '+str(j+1)+'/'+str(inputChannels)
            for r in xrange(0, videoHeight):
                #print '\t\t\trow '+str(r+1)+'/'+str(videoHeight)
                for c in xrange(0, videoWidth):
                    for t in xrange(0, videoDur):
                        R[i, r, c, t, j] = b[j]

                        ftc = max([0, int(N.ceil(
                            float(t - filterDur + 1) / float(dt)))])
                        fcc = max([0, int(N.ceil(
                            float(c - filterWidth + 1) / float(dc)))])

                        rc = max([0, int(N.ceil(
                            float(r - filterHeight + 1) / float(dr)))])
                        while rc < outputHeight:
                            rk = r - rc * dr
                            if rk < 0:

@@ -399,20 +409,21 @@ def computeR(W,b,d,H,Rshape = None):
                                    if tk < 0:
                                        break

                                    R[
                                        i,r,c,t,j] += N.dot(W[:,rk,ck,tk,j], H[i,rc,cc,tc,:] )

                                    tc += 1
                                ""  # close loop over tc
                                cc += 1
                            ""  # close loop over cc
                            rc += 1
                        ""  # close loop over rc
                    ""  # close loop over t
                ""  # close loop over c
            ""  # close loop over r
        ""  # close loop over j
    ""  # close loop over i

    return R
...
@@ -15,6 +15,7 @@ from theano.gof import Apply
from theano.tensor.nnet.sigm import sigmoid, softplus
from theano.gradient import DisconnectedType
from theano.gradient import grad_not_implemented

############

@@ -79,7 +80,7 @@ class SoftmaxWithBias(gof.Op):
        g_sm, = grads

        if isinstance(g_sm.type, DisconnectedType):
            return [DisconnectedType()(), DisconnectedType()()]
        sm = softmax_with_bias(x, b)
        dx = softmax_grad(g_sm, sm)

@@ -560,8 +561,8 @@ if 0:
            axis = ds_input.owner.op.axis
            sum_input = ds_input.owner.inputs[0]

            if ((ds_order != (0, 'x')) or
                (axis != (1,)) or
                (sum_input is not prod_term)):
                rest.append(add_in)

            #print 'ds_order =', ds_order

@@ -712,16 +713,20 @@ class CrossentropySoftmaxArgmax1HotWithBias(gof.Op):
        am_shp = idx_shp
        return [nll_shp, sm_shp, am_shp]

    def connection_pattern(self, node):
        return [[True, True, True],    # x
                [True, True, True],    # b
                [False, False, True]]  # y_idx
    def grad(self, inp, grads):
        x, b, y_idx = inp
        g_nll, g_sm, g_am = grads

        dx_terms = []
        db_terms = []
        d_idx_terms = []

        if not isinstance(g_nll.type, DisconnectedType):
            nll, sm = crossentropy_softmax_1hot_with_bias(x, b, y_idx)
            dx = crossentropy_softmax_1hot_with_bias_dx(g_nll, sm, y_idx)

@@ -739,7 +744,7 @@ class CrossentropySoftmaxArgmax1HotWithBias(gof.Op):
            db_terms.append(b.zeros_like())
            d_idx_terms.append(y_idx.zeros_like())

        def fancy_sum(terms):
            if len(terms) == 0:
                return DisconnectedType()()

            rval = terms[0]

@@ -747,8 +752,8 @@ class CrossentropySoftmaxArgmax1HotWithBias(gof.Op):
                rval = rval + term
            return rval

        return [fancy_sum(terms) for terms in
                [dx_terms, db_terms, d_idx_terms]]
    def c_headers(self):
        return ['<iostream>', '<cmath>']

@@ -897,7 +902,7 @@ class CrossentropySoftmax1HotWithBiasDx (gof.Op):
                sm, tensor.fill(dy, -1), y_idx_range, y_idx),
                axis=1)
        g_sm = dy.dimshuffle(0, 'x') * g_dx
        g_y_idx = grad_not_implemented(self, 2, y_idx)
        return [g_dy, g_sm, g_y_idx]

    def c_code_cache_version(self):

@@ -1136,7 +1141,7 @@ class CrossentropyCategorical1Hot(gof.Op):
        coding, one_of_n = inp
        g_y, = grads
        return [crossentropy_categorical_1hot_grad(g_y, coding, one_of_n),
                grad_not_implemented(self, 1, one_of_n)]
crossentropy_categorical_1hot = CrossentropyCategorical1Hot()

@@ -1325,7 +1330,6 @@ def local_advanced_indexing_crossentropy_onehot(node):
    except Exception:
        pass

    if sm is not None and sm.owner and sm.owner.op in (softmax,
                                                       softmax_with_bias):
        sm_w_bias = local_softmax_with_bias.transform(sm.owner)

@@ -1481,7 +1485,8 @@ def local_advanced_indexing_crossentropy_onehot_grad(node):
    if adv_subtensor is not None:
        try:
            maybe_sm, maybe_rows, \
                maybe_labels = adv_subtensor.owner.inputs
        except Exception:
            return

@@ -1691,7 +1696,6 @@ class Prepend_scalar_constant_to_each_row(gof.Op):
        shp = (in_shapes[0][0], in_shapes[0][1] + 1)
        return [shp]

    def grad(self, inp, grads):
        mat, = inp
        goutput, = grads

@@ -1758,18 +1762,19 @@ prepend_1_to_each_row = Prepend_scalar_constant_to_each_row(1.)
#numerically stabilize log softmax (X)
# as X-X.max(axis=1).dimshuffle(0,'x') - log(exp(X-X.max(axis=1).dimshuffle(0,'x')).sum(axis=1)).dimshuffle(0,'x')

def make_out_pattern(X):
    stabilized_X = X - X.max(axis=1).dimshuffle(0, 'x')
    out_var = stabilized_X - tensor.log(tensor.exp(stabilized_X).sum(
        axis=1)).dimshuffle(0, 'x')

    #tell DEBUG_MODE that it's OK if the original graph produced NaN and the optimized graph does not
    out_var.values_eq_approx = out_var.type.values_eq_approx_remove_nan
    return out_var

local_log_softmax = gof.PatternSub(in_pattern=(tensor.log, (softmax, 'x')),
                                   out_pattern=(make_out_pattern, 'x'),
                                   allow_multiple_clients=True)

#don't do register_stabilize, this is to make local_log_softmax run
#only after another more specific optimization that stabilizes cross entropy
#opt.register_stabilize(local_log_softmax, name = 'local_log_softmax')
opt.register_specialize(local_log_softmax, name='local_log_softmax')
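The rewrite installed by `local_log_softmax` replaces `log(softmax(X))` with the max-shifted form, which cannot overflow in `exp`. A NumPy sketch showing that the two forms agree where the naive one is finite, and that the stabilized one also survives large inputs:

```python
import numpy as np

def log_softmax_naive(X):
    e = np.exp(X)
    return np.log(e / e.sum(axis=1, keepdims=True))

def log_softmax_stable(X):
    # X - max - log(sum(exp(X - max))): the pattern local_log_softmax installs.
    Z = X - X.max(axis=1, keepdims=True)
    return Z - np.log(np.exp(Z).sum(axis=1, keepdims=True))

X = np.array([[1.0, 2.0, 3.0]])
big = np.array([[1000.0, 1001.0]])   # exp() overflows in the naive form here
```

This is why `values_eq_approx_remove_nan` is attached to the output: DEBUG_MODE should accept that the unoptimized graph produces NaN where the optimized one does not.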
@@ -30,13 +30,20 @@ class ScalarSigmoid(scalar.UnaryScalarOp):
        if x > 30.0:
            return 1.0
        return 1.0 / (1.0 + numpy.exp(-x))

    def impl(self, x):
        return ScalarSigmoid.st_impl(x)

    def grad(self, inp, grads):
        x, = inp
        gz, = grads
        y = scalar_sigmoid(x)
        rval = gz * y * (1.0 - y)

        assert rval.type.dtype.find('float') != -1

        return [rval]
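The gradient expression relies on the identity sigma'(x) = sigma(x)(1 - sigma(x)); the new assertion only adds the PR-wide check that the result is floating point. The identity itself can be confirmed by central finite differences:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-4.0, 4.0, 9)
analytic = sigmoid(x) * (1.0 - sigmoid(x))

eps = 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
```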
    def c_code(self, node, name, inp, out, sub):
        x, = inp
        z, = out

@@ -50,6 +57,7 @@ class ScalarSigmoid(scalar.UnaryScalarOp):
            return """%(z)s = %(x)s < -709.0 ? 0.0 : %(x)s > 19.0 ? 1.0 : 1.0 /(1.0+exp(-%(x)s));""" % locals()
        else:
            raise NotImplementedError('only floatingpoint is implemented')

    def c_code_cache_version(self):
        v = super(ScalarSigmoid, self).c_code_cache_version()
        if v:

@@ -61,7 +69,7 @@ sigmoid = elemwise.Elemwise(scalar_sigmoid, name='sigmoid')
sigmoid_inplace = elemwise.Elemwise(
    ScalarSigmoid(scalar.transfer_type(0)),
    inplace_pattern={0: 0},
    name='sigmoid_inplace',
)
@@ -76,12 +84,15 @@ class ScalarSoftplus(scalar.UnaryScalarOp):
        if x > 30.0:
            return x
        return numpy.log1p(numpy.exp(x))

    def impl(self, x):
        return ScalarSoftplus.static_impl(x)

    def grad(self, inp, grads):
        x, = inp
        gz, = grads
        return [gz * scalar_sigmoid(x)]
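The grad here uses softplus'(x) = sigmoid(x), which follows from d/dx log(1 + e^x) = e^x / (1 + e^x). A quick numerical confirmation:

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5.0, 5.0, 11)
eps = 1e-6
numeric = (softplus(x + eps) - softplus(x - eps)) / (2 * eps)
```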
    def c_code(self, node, name, inp, out, sub):
        x, = inp
        z, = out

@@ -95,27 +106,29 @@ class ScalarSoftplus(scalar.UnaryScalarOp):
            return """%(z)s = %(x)s < -745.0 ? 0.0 : %(x)s > 16.0 ? %(x)s : log1p(exp(%(x)s));""" % locals()
        else:
            raise NotImplementedError('only floatingpoint is implemented')

    def c_code_cache_version(self):
        v = super(ScalarSoftplus, self).c_code_cache_version()
        if v:
            return (2,) + v
        else:
            return v

scalar_softplus = ScalarSoftplus(scalar.upgrade_to_float, name='scalar_softplus')
softplus = elemwise.Elemwise(scalar_softplus, name='softplus')
pprint.assign(softplus, printing.FunctionPrinter('softplus'))

def _skip_mul_1(r):
    if r.owner and r.owner.op == tensor.mul:
        not_is_1 = [i for i in r.owner.inputs if not _is_1(i)]
        if len(not_is_1) == 1:
            return not_is_1[0]
logsigm_to_softplus = gof.PatternSub(
    (tensor.log, (sigmoid, 'x')),
    (tensor.neg, (softplus, (tensor.neg, 'x'))),
    allow_multiple_clients=True,
    skip_identities_fn=_skip_mul_1)

@@ -131,21 +144,22 @@ def _is_1(expr):
log1msigm_to_softplus = gof.PatternSub(
    (tensor.log,
     (tensor.sub,
      dict(pattern='y', constraint=_is_1),
      (sigmoid, 'x'))),
    (tensor.neg, (softplus, 'x')),
    allow_multiple_clients=True,
    skip_identities_fn=_skip_mul_1)

log1pexp_to_softplus = gof.PatternSub(
    (tensor.log1p,
     (tensor.exp, 'x')),
    (softplus, 'x'),
    allow_multiple_clients=True)

opt.register_stabilize(logsigm_to_softplus, name='logsigm_to_softplus')
opt.register_stabilize(log1msigm_to_softplus, name='log1msigm_to_softplus')
opt.register_stabilize(log1pexp_to_softplus, name='log1pexp_to_softplus')
def is_1pexp(t):
    """

@@ -239,7 +253,7 @@ def partition_num_or_denom(r, f):
        else:
            neg_t, f_t = f_t
        f_terms.append(f_t)
        neg ^= neg_t  # bit flip if neg_t is true

    return f_terms, rest, neg

@@ -291,7 +305,8 @@ def local_exp_over_1_plus_exp(node):
        #find all the exp() terms in the numerator
        num, denom = node.inputs
        num_exp_x, num_rest, num_neg = partition_num_or_denom(num, is_exp)
        denom_1pexp, denom_rest, \
            denom_neg = partition_num_or_denom(denom, is_1pexp)

        sigmoids = []
        for t in denom_1pexp:

@@ -303,7 +318,7 @@ def local_exp_over_1_plus_exp(node):
                # case: 1/(1+exp(x))
                sigmoids.append(sigmoid(-t))

        if not sigmoids:  # we didn't find any. abort
            return

        # put the new numerator together
        new_num = sigmoids + [tensor.exp(t) for t in num_exp_x] + num_rest

@@ -322,6 +337,7 @@ def local_exp_over_1_plus_exp(node):
        else:
            return [new_num / tensor.mul(*denom_rest)]
def parse_mul_tree(root):
    """
    Parse a tree of multiplications starting at the given root.

@@ -504,7 +520,7 @@ def perform_sigm_times_exp(tree, exp_x=None, exp_minus_x=None, sigm_x=None,
        sigm_minus_x = []
    if full_tree is None:
        full_tree = tree
    if False:  # Debug code.
        print '<perform_sigm_times_exp>'
        print ' full_tree = %s' % full_tree
        print ' tree = %s' % tree

@@ -613,10 +629,13 @@ def local_inv_1_plus_exp(node):
            if nonconsts[0].owner and nonconsts[0].owner.op == tensor.exp:
                if scalars and numpy.allclose(numpy.sum(scalars), 1):
                    return opt._fill_chain(
                        sigmoid(
                            tensor.neg(nonconsts[0].owner.inputs[0])),
                        scalar_inputs)
# Registration is below, and conditional.

@gof.local_optimizer([tensor.sub])
def local_1msigmoid(node):
    """

@@ -625,7 +644,7 @@ def local_1msigmoid(node):
    if node.op == tensor.sub:
        sub_l, sub_r = node.inputs
        if len(sub_r.clients) > 1:
            return  # graph is using both sigm and 1-sigm
        if sub_r.owner and sub_r.owner.op == sigmoid:
            try:
                val_l = opt.get_constant_value(sub_l)

@@ -678,13 +697,14 @@ if 0:
                assert t0.owner.op == div
                t0top, t0bot = t0.owner.inputs
                t1top, t1bot = t1.owner.inputs
                rval.append(div(mul(*(
                    t0top + t1top)), mul(*(t0bot + t1bot))))

                if len(rval) > 100:
                    # This loop can be exponentially long.
                    # aborting
                    return []
    elif len(node.outputs) > 1:
        return []
    else:
        return [node.outputs[0]]
...@@ -542,15 +542,12 @@ class MakeVector(T.Op): ...@@ -542,15 +542,12 @@ class MakeVector(T.Op):
     def grad(self, inputs, output_gradients):
         # If the output is of an integer dtype, no gradient shall pass
         if 'int' in self.dtype:
-            return [None] * len(inputs)
+            return [ipt.zeros_like().astype(theano.config.floatX)
+                    for ipt in inputs]
         grads = []
         for i, inp in enumerate(inputs):
-            if 'int' in inp.dtype:
-                # No gradient wrt integer inputs
-                grads.append(None)
-            else:
-                grads.append(output_gradients[0][i])
+            grads.append(output_gradients[0][i])
         return grads

     def R_op(self, inputs, eval_points):
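The hunk above implements the convention this pull request introduces: the gradient with respect to integer-typed values is a zero tensor in the configured floating-point dtype, never `None` and never an integer. A plain-Python sketch of that convention (`FLOATX` and `make_vector_like_grad` are illustrative names, not Theano API; the real code operates on symbolic variables):

```python
import numpy as np

FLOATX = "float32"  # stands in for theano.config.floatX

def make_vector_like_grad(inputs, output_gradient, out_dtype):
    # New convention: an integer-dtype output passes back zero
    # gradients in floatX instead of None.
    if "int" in out_dtype:
        return [np.zeros_like(ipt, dtype=FLOATX) for ipt in inputs]
    # Otherwise each input's gradient is the matching element of
    # the single output's gradient.
    return [output_gradient[i] for i in range(len(inputs))]

g = make_vector_like_grad([np.int64(1), np.int64(2)],
                          np.ones(2), "int64")
# The gradient is floating point and zero, not None:
assert all(str(gi.dtype) == "float32" and gi == 0.0 for gi in g)
```

Returning zeros instead of `None` lets downstream type checks assert that every gradient is floating point, which is the invariant the rest of this commit enforces.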
@@ -1914,6 +1911,8 @@ def local_subtensor_of_alloc(node):
     nw_val = val[tuple(val_slices)]
     nw_dims += dims[len(slices):]
+    if nw_val.ndim > len(nw_dims):
+        return False
     rval = T.alloc(nw_val, *nw_dims)
     if type(rval) not in (list, tuple):
         rval = [rval]
...
@@ -136,7 +136,7 @@ class RandomStreams(Component, raw_random.RandomStreamsBase):
""" """
def __init__(self, seed=None, no_warn = False): def __init__(self, seed=None, no_warn=False):
""":type seed: None or int """:type seed: None or int
:param seed: a default seed to initialize the RandomState :param seed: a default seed to initialize the RandomState
...@@ -146,7 +146,7 @@ class RandomStreams(Component, raw_random.RandomStreamsBase): ...@@ -146,7 +146,7 @@ class RandomStreams(Component, raw_random.RandomStreamsBase):
""" """
if not no_warn: if not no_warn:
deprecation_warning() deprecation_warning()
super(RandomStreams, self).__init__(no_warn = True) super(RandomStreams, self).__init__(no_warn=True)
self.random_state_variables = [] self.random_state_variables = []
self.default_instance_seed = seed self.default_instance_seed = seed
@@ -164,7 +164,6 @@ class RandomStreams(Component, raw_random.RandomStreamsBase):
     def build(self, mode, memo):
         """override `Component.build` """
         if self not in memo:
-            print 'creating RandomStreamsInstance'
             memo[self] = RandomStreamsInstance(self, memo,
                                                self.default_instance_seed)
         return memo[self]
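`build` memoizes through the shared `memo` dict, so repeated builds of the same component return one shared instance; the removed `print` was only a debugging trace. A minimal sketch of the pattern, with `MemoizedComponent` as an illustrative stand-in (it is not the real `Component` class, and `object()` stands in for `RandomStreamsInstance`):

```python
class MemoizedComponent(object):
    # Illustrative stand-in for the Component/RandomStreams pattern:
    # build() consults a shared memo dict so each component is
    # instantiated at most once per build pass.
    def build(self, mode, memo):
        if self not in memo:
            memo[self] = object()  # stands in for RandomStreamsInstance
        return memo[self]

c = MemoizedComponent()
memo = {}
first = c.build("FAST_RUN", memo)
second = c.build("FAST_RUN", memo)
assert first is second
```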
...
This source diff could not be displayed because it is too large. You can view the blob instead.
Diff collapsed.