Commit c0c25559 authored by lamblin

Merge pull request #910 from goodfeli/int_grad

Consistent & correct handling of integers and gradients
- Documentation and implementation of a consistent way of handling gradients and integers
- Type checks that ensure the gradient is always floating point and not an integer
- Type checks that ensure the gradient of an integer is always undefined or 0
- An upgraded version of connection_pattern that provides theano with enough information to accurately answer questions like "is variable x a function of variable y?"
......@@ -98,34 +98,56 @@ following methods:
lifetime of self. Op instances should be immutable in this
sense.
.. function:: connection_pattern():
.. function:: connection_pattern( node ):
Optional (but in extremely rare cases needed to have it work with
{tensor,sparse}.grad).
Optional method; sometimes needed for gradient.grad to
work correctly.
Returns a list of bools the same length as the op's inputs list.
Returns a list of list of bools.
True signifies that the elements of an input have an effect on its
output.
Op.connection_pattern[input_idx][output_idx] is true if the
elements of inputs[input_idx] have an effect on the elements of
outputs[output_idx].
False signifies that they do not--in other words, the op acts only
on the input's metadata, such as its shape.
The ``node`` parameter is needed to determine the number of
inputs. Some ops such as Subtensor take a variable number of
inputs.
If no connection_pattern is implemented, tensor.grad will assume
it is a list containing only True.
If no connection_pattern is specified, gradient.grad will
assume that all inputs have some elements connected to some
elements of all outputs.
This method conveys two pieces of information that are otherwise
not part of the theano graph:
1) Which of the op's inputs are truly ancestors of each of the
op's outputs. Suppose an op has two inputs, x and y, and
outputs f(x) and g(y). y is not really an ancestor of f, but
it appears to be so in the theano graph.
2) Whether the actual elements of each input/output are relevant
to a computation.
For example, the shape op does not read its input's elements,
only its shape metadata. d shape(x) / dx should thus raise
a disconnected input exception (if these exceptions are
enabled).
As another example, the elements of the Alloc op's outputs
are not affected by the shape arguments to the Alloc op.
Failing to implement this function for an op that needs it can
result in tensor.grad erroneously reporting that a gradient is
undefined. Returning 0 for this input in the grad method is not
the same as specifying that the elements of this input are not
connected to the output. If the gradient with respect to the
op's output is NaN but the elements of the input are not connected
to it, then the NaN never enters into the expression for the
gradient.
result in two types of incorrect behavior:
1) gradient.grad erroneously raising a TypeError reporting that
a gradient is undefined.
2) gradient.grad failing to raise a ValueError reporting that
an input is disconnected.
Even if connection_pattern is not implemented correctly,
if gradient.grad returns an expression, that expression will
be numerically correct.
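The contract above can be sketched in plain Python. This is a hypothetical, simplified illustration (the class and helper names are invented; Theano's real gradient.grad machinery is far more involved) of how a connection_pattern entry maps inputs to outputs for an Alloc-like op:

```python
# Hypothetical sketch of the connection_pattern contract described above.
# An Alloc-like op takes inputs [value, shape] and fills an array of the
# given shape with `value`: the output's *elements* depend only on `value`.
class AllocLikeOp(object):
    def connection_pattern(self, node):
        # pattern[input_idx][output_idx]
        return [[True],   # elements of `value` affect the output elements
                [False]]  # `shape` affects only the output's shape metadata

def is_connected(op, node, input_idx, output_idx):
    # Default when connection_pattern is not implemented: assume every
    # input is connected to every output, as gradient.grad does.
    if hasattr(op, 'connection_pattern'):
        return op.connection_pattern(node)[input_idx][output_idx]
    return True

op = AllocLikeOp()
assert is_connected(op, None, 0, 0)      # value -> output: connected
assert not is_connected(op, None, 1, 0)  # shape -> output: disconnected
```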
.. function:: grad(inputs, output_gradients)
Optional (but needed to have it work with {tensor,sparse}.grad()).
Optional (but needed to have it work with gradient.grad()).
If the Op being defined is differentiable, its gradient may be specified
symbolically in this method. Both ``inputs`` and ``output_gradients``
......@@ -217,6 +239,70 @@ following methods:
Both the partial differentiation and the multiplication have to be performed by
:func:`grad`.
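As a minimal numeric illustration (plain Python, not Theano's symbolic API; the function names are invented), a grad method for an op computing f(x) = x**2 must itself apply the chain rule, multiplying the incoming output gradient by the partial derivative:

```python
# Hypothetical sketch of the grad contract: grad receives the op's inputs
# and the gradients of the cost with respect to the op's outputs, and
# returns the gradients of the cost with respect to the inputs.
def square_grad(inputs, output_gradients):
    (x,), (gz,) = inputs, output_gradients
    # d(x**2)/dx = 2*x; the multiplication by gz (chain rule) happens here.
    return [2.0 * x * gz]

(g,) = square_grad((3.0,), (1.0,))
assert g == 6.0
```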
Theano currently imposes the following constraints on the values returned by the grad method:
1) They must be Variable instances.
2) When they are types that have dtypes, they must never have an integer dtype.
Integers are a tricky subject. Integers are the main reason for having DisconnectedType,
NullType or zero gradient. When you have an integer as an argument to your grad method,
recall the definition of a derivative to help you decide what value to return:
:math:`\frac{d f}{d x} = \lim_{\epsilon \rightarrow 0} (f(x+\epsilon)-f(x))/\epsilon`.
Suppose your function f has an integer-valued output. For most functions you're likely
to implement in theano, this means your gradient should be zero, because f(x+epsilon)
= f(x) for almost all x. (The only other option is that the gradient could be undefined,
if your function is discontinuous everywhere, like the rational indicator function.)
Suppose your function f has an integer-valued input. This is a little trickier, because
you need to think about what you mean mathematically when you make a variable integer-valued
in theano. Most of the time in machine learning we mean "f is a function of a real-valued
x, but we are only going to pass in integer-values of x". In this case, f(x+epsilon) exists,
so the gradient through f should be the same whether x is an integer or a floating point
variable. Sometimes what we mean is "f is a function of an integer-valued x, and f is only
defined where x is an integer." Since f(x+epsilon) doesn't exist, the gradient is undefined.
Finally, many times in theano, integer valued inputs don't actually affect the elements of
the output, only its shape.
If your function f has both an integer-valued input and an
integer-valued output, then both rules have to be combined:
- If f is defined at (x+epsilon), then the input gradient is
defined. Since f(x+epsilon) would be equal to f(x) almost
everywhere, the gradient should be 0 (first rule).
- If f is only defined where x is an integer, then the gradient
is undefined, regardless of what the gradient with respect to the
output is.
Examples:
1) f(x,y) = dot product between x and y. x and y are integers.
Since the output is also an integer, f is a step function.
Its gradient is zero almost everywhere, so Op.grad should return
zeros in the shape of x and y.
2) f(x,y) = dot product between x and y. x is floating point and y is an integer.
In this case the output is floating point. It doesn't matter that y is an integer.
We consider f to still be defined at f(x,y+epsilon). The gradient is exactly the
same as if y were floating point.
3) f(x,y) = argmax of x along axis y.
The gradient with respect to y is undefined, because f(x,y) is not defined for
floating point y. How could you take an argmax along a fractional axis?
The gradient with respect to x is 0, because f(x+epsilon, y) = f(x) almost
everywhere.
4) f(x,y) = a vector with y elements, each of which takes on the value x
The grad method should return DisconnectedType()() for y, because the elements of
f don't depend on y. Only the shape of f depends on y. You probably also want to
implement a connection_pattern method to encode this.
5) f(x) = int(x) converts float x into an int. g(y) = float(y) converts an integer y into a float.
If the final cost C = 0.5 * g(y) = 0.5 * g(f(x)), then the
gradient with respect to y will be 0.5, even if y is an
integer. However, the gradient with respect to x will be 0,
because the output of f is integer-valued.
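The integer rules above can be checked numerically with plain NumPy (an illustration of the limit definition, not Theano code):

```python
import numpy as np

def numeric_grad(f, x, eps=1e-6):
    # Finite-difference approximation of df/dx from the definition above.
    return (f(x + eps) - f(x)) / eps

# Integer-valued *output* (example 5's f = int(x), here np.floor): the
# function is a step function, so its gradient is zero almost everywhere.
rng = np.random.RandomState(0)
x = rng.uniform(0.1, 0.9, size=100)  # stay away from the jump points
assert np.all(numeric_grad(np.floor, x) == 0.0)

# Integer-valued *input* to a real-valued function (example 5's g):
# d(0.5 * y)/dy is 0.5 even if we only ever evaluate at integer y.
assert np.isclose(numeric_grad(lambda v: 0.5 * v, 3.0), 0.5)
```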
.. function:: infer_shape(node, shapes)
Optional.
......
......@@ -29,3 +29,9 @@ class NullType(Type):
def values_eq(a, b, force_same_dtype=True):
raise ValueError("NullType has no values to compare")
def __eq__(self, other):
return type(self) == type(other)
def __hash__(self):
return hash(type(self))
Diff collapsed.
......@@ -4,8 +4,8 @@ linkers). It resembles the if clause of any programming language, that
has a `then` and `else` branch, and executes either one or the other
according to the condition provided.
This op contrast the already existent `switch` op, that will evaluate both
branches of the clause and afterwards pick (according to the condition)
This op differs from the already existent `switch` op, that evaluates both
branches of the clause and afterwards picks (according to the condition)
which value to report. Note also that `switch` is an elemwise operation (so
it picks each entry of a matrix according to the condition) while `ifelse`
is a global operation with a scalar condition.
......@@ -60,7 +60,7 @@ class IfElse(PureOp):
:note:
Other Linkers than CVM and VM are INCOMPATIBLE with this Op, and
will ingnore its lazy characteristic, computing both the True and
will ignore its lazy characteristic, computing both the True and
False branch before picking one.
"""
......@@ -212,7 +212,14 @@ class IfElse(PureOp):
for t in ts])
if_false = ([ins[0]] + [theano.tensor.zeros_like(f)
for f in fs] + grads)
return ([None] +
condition = ins[0]
# condition does affect the elements of the output so it is connected.
# For the sake of making the gradient convenient we assume that
# condition + epsilon always triggers the same branch as condition
condition_grad = condition.zeros_like().astype(theano.config.floatX)
return ([condition_grad] +
if_true_op.make_node(*if_true).outputs +
if_false_op.make_node(*if_false).outputs)
......
# Skip test if cuda_ndarray is not available.
from nose.plugins.skip import SkipTest
import numpy
import theano
import theano.sandbox.cuda as cuda_ndarray
if cuda_ndarray.cuda_available == False:
......
......@@ -2,10 +2,10 @@
TODO: implement Images2Neibs.{perform,infer_shape}() methods
"""
import theano
from theano import Op, Apply
import theano.tensor as T
from theano.gradient import grad_not_implemented
from theano.gradient import grad_undefined
class Images2Neibs(Op):
......@@ -59,7 +59,8 @@ class Images2Neibs(Op):
for j in xrange(list 2 dim)
for k in <image column coordinates>
for l in <image row coordinates>
output[idx,:] = flattened version of ten4[i,j,l:l+r,k:k+c]
output[idx,:]
= flattened version of ten4[i,j,l:l+r,k:k+c]
idx += 1
(note: the op isn't necessarily implemented internally with these
for loops, they're just the easiest way to describe the output pattern)
......@@ -90,8 +91,11 @@ class Images2Neibs(Op):
(hasattr(neib_shape, "equals") and
neib_shape.equals(neib_step))):
return [neibs2images(gz, neib_shape, x.shape, mode=self.mode),
None, None]
return [grad_not_implemented(self, 0, x), None, None]
grad_undefined(self, 1, neib_shape),
grad_undefined(self, 2, neib_step)]
return [grad_not_implemented(self, 0, x),
grad_undefined(self, 1, neib_shape),
grad_undefined(self, 2, neib_step)]
def c_code_cache_version(self):
return (5,)
......@@ -307,5 +311,3 @@ def neibs2images(neibs, neib_shape, original_shape, mode='valid'):
raise NotImplementedError("neibs2images does not support mode=%s" % mode)
return output_4d
Diff collapsed.
......@@ -260,12 +260,16 @@ class Scan(PureOp):
zip(self.inner_seqs(self.inputs),
self.outer_seqs(inputs))):
if inner_seq.type.dtype != outer_seq[idx].type.dtype:
assert isinstance(idx, int)
raise ValueError(err_msg1 % ('sequence',
str(outer_seq),
idx,
outer_seq.type.dtype,
outer_seq.ndim,
str(inner_seq),
inner_seq.type.dtype))
inner_seq.type.dtype,
inner_seq.ndim))
argoffset += len(self.outer_seqs(inputs))
# Check that these 3 things have the same dtype for mit_mot:
# - initial state of the output
......@@ -1260,7 +1264,7 @@ class Scan(PureOp):
# the gradients with respect to all outputs)
def compute_gradient(y, g_y):
gmp = gradient.grad_sources_inputs(
[(y, g_y)], diff_inputs, False)
[(y, g_y)], diff_inputs)
return [gmp.get(p, None) for p in diff_inputs]
# 6. clean the outputs (i.e. remove update rules)
......@@ -1301,7 +1305,13 @@ class Scan(PureOp):
# 7.3. compute gradients of the inputs given one output
for dx, out in enumerate(clean_outputs):
inner_g_out = safe_new(out)
if g_outs[dx] != None:
inner_g_out = safe_new(g_outs[dx][0])
else:
# We do not have a gradient on this output so we need a
# placeholder, which for now has the same dtype as the
# output
inner_g_out = safe_new(out)
###
#### I need to clip the gradient HERE !!
......
......@@ -18,6 +18,7 @@ from theano.gof.python25 import all
from theano.gradient import DisconnectedType
from theano.sparse.utils import hash_from_sparse
import theano.tests.unittest_tools as utt
from theano.gradient import grad_not_implemented
sparse_formats = ['csc', 'csr']
......@@ -255,11 +256,13 @@ def sp_zeros_like(x):
:return: The same as `x` with zero entries
for all elements.
"""
# TODO: don't restrict to CSM formats
_, _, indptr, shape = csm_properties(x)
return CSM(format=x.format)(numpy.array([], dtype=x.type.dtype),
numpy.array([]), tensor.zeros_like(indptr),
shape)
return CSM(format=x.format)(data=numpy.array([], dtype=x.type.dtype),
indices=numpy.array([]),
indptr=tensor.zeros_like(indptr),
shape=shape)
class _sparse_py_operators:
......@@ -670,7 +673,7 @@ class CSM(gof.Op):
the sparse matrix. Fancy indexing with numpy.ndarray
should be used for this purpose.
:param data: One dimensionnal tensor representing
:param data: One dimensional tensor representing
the data of the sparse to construct.
:param indices: One dimensional tensor of integers
representing the indices of the sparse
......@@ -678,7 +681,7 @@ class CSM(gof.Op):
:param indptr: One dimensional tensor of integers
representing the index pointer for
the sparse matrix to construct.
:param shape: One dimensionnal tensor of integers
:param shape: One dimensional tensor of integers
representing the shape of the sparse
matrix to construct.
......@@ -782,6 +785,9 @@ class CSM(gof.Op):
indptr.copy()), shape.copy(),
copy=False)
def connection_pattern(self, node):
return [[True], [False], [False], [False]]
def grad(self, (x_data, x_indices, x_indptr, x_shape), (g_out,)):
g_data, g_indices, g_indptr, g_shape = csm_properties(g_out)
# unpack the data vector and wrap it as a 1d TensorType
......@@ -984,7 +990,19 @@ class DenseFromSparse(gof.op.Op):
def grad(self, (x, ), (gz, )):
if self.sparse_grad:
return [sp_ones_like(x) * gz]
left = sp_ones_like(x)
right = gz
# Do upcasting if necessary to avoid an unimplemented case
# of mul
if right.dtype == 'float64' and left.dtype == 'float32':
left = left.astype('float64')
if right.dtype == 'float32' and left.dtype == 'float64':
right = right.astype('float64')
return [left * right]
else:
return [SparseFromDense(x.type.format)(gz)]
......@@ -1993,7 +2011,9 @@ class MulSS(gof.op.Op):
def make_node(self, x, y):
x, y = as_sparse_variable(x), as_sparse_variable(y)
if x.type != y.type:
raise NotImplementedError()
raise NotImplementedError(
"MulSS not supported for differing types. "
"Got %s and %s." % (str(x.type), str(y.type)))
return gof.Apply(self, [x, y], [x.type()])
def perform(self, node, (x, y), (out, )):
......@@ -2042,7 +2062,9 @@ class MulSD(gof.op.Op):
y = tensor.cast(y, dtype)
if x.type.dtype != y.type.dtype:
raise NotImplementedError()
raise NotImplementedError(
"MulSD not implemented for different input dtypes. "
"Got %s and %s." % (x.type.dtype, y.type.dtype))
# The magic number two here arises because L{scipy.sparse}
# objects must be matrices (have dimension 2)
# Broadcasting of the sparse matrix is not supported.
......@@ -2128,7 +2150,9 @@ class MulSV(gof.op.Op):
assert y.type.ndim == 1
if x.type.dtype != y.type.dtype:
raise NotImplementedError()
raise NotImplementedError(
"MulSV not implemented for differing dtypes."
"Got %s and %s." % (str(x.type.dtype), str(y.type.dtype)))
return gof.Apply(self,
[x, y],
[SparseType(dtype=x.type.dtype,
......@@ -2142,6 +2166,15 @@ class MulSV(gof.op.Op):
def grad(self, (x, y), (gz,)):
assert _is_sparse_variable(x) and _is_dense_variable(y)
assert _is_sparse_variable(gz)
# mul_s_v is not implemented if the types vary
if gz.dtype == 'float64' and y.dtype == 'float32':
y = y.astype('float64')
if gz.dtype == 'float32' and y.dtype == 'float64':
gz = gz.astype('float64')
return mul_s_v(gz, y), sp_sum(x * gz, axis=0, sparse_grad=True)
def infer_shape(self, node, ins_shapes):
......@@ -2176,8 +2209,18 @@ def mul(x, y):
assert x_is_sparse_variable or y_is_sparse_variable
if x_is_sparse_variable and y_is_sparse_variable:
# mul_s_s is not implemented if the types differ
if y.dtype == 'float64' and x.dtype == 'float32':
x = x.astype('float64')
return mul_s_s(x, y)
elif x_is_sparse_variable and not y_is_sparse_variable:
# mul is unimplemented if the dtypes differ
if y.dtype == 'float64' and x.dtype == 'float32':
x = x.astype('float64')
return mul_s_d(x, y)
elif y_is_sparse_variable and not x_is_sparse_variable:
return mul_s_d(y, x)
......@@ -3260,7 +3303,7 @@ class SamplingDot(gof.op.Op):
rval = [
dot(p * gz, y),
dot((p * gz).T, x),
None
grad_not_implemented(self, 2, p)
]
return rval
......
Diff collapsed.
......@@ -14,6 +14,7 @@ from theano.scalar import Scalar
from theano.printing import min_informative_str, pprint
from theano.gof.python25 import all, any
from theano.tensor.utils import hash_from_dict
from theano.gradient import DisconnectedType
config = theano.config
......@@ -277,7 +278,8 @@ class DimShuffle(Op):
#get the copy / view of the input depending on whether we're doing
# things inplace or not.
if self.inplace:
get_base = ['{ PyArrayObject * %(basename)s = %(input)s', 'Py_INCREF((PyObject*)%(basename)s)']
get_base = [
'{ PyArrayObject * %(basename)s = %(input)s', 'Py_INCREF((PyObject*)%(basename)s)']
else:
get_base = [('{ PyArrayObject * %(basename)s = (PyArrayObject*)PyArray_FromAny((PyObject*)%(input)s, NULL,'
'0, 0, NPY_ALIGNED|NPY_ENSURECOPY, NULL)')]
......@@ -285,7 +287,8 @@ class DimShuffle(Op):
shape_statements = ['npy_intp dimensions[%i]' % nd_out]
for i, o in enumerate(self.new_order):
if o != 'x':
shape_statements += [('dimensions[' + str(i) + '] = %(basename)s->dimensions[' + str(o) + ']')]
shape_statements += [('dimensions[' + str(
i) + '] = %(basename)s->dimensions[' + str(o) + ']')]
else:
shape_statements += [('dimensions[' + str(i) + '] = 1')]
......@@ -294,7 +297,8 @@ class DimShuffle(Op):
#set the strides of the non-broadcasted dimensions
for i, o in enumerate(self.new_order):
if o != 'x':
strides_statements += [('strides[' + str(i) + '] = %(basename)s->strides[' + str(o) + ']')]
strides_statements += [('strides[' + str(i)
+ '] = %(basename)s->strides[' + str(o) + ']')]
else:
strides_statements += [('strides[' + str(i) + '] = 0')]
......@@ -310,7 +314,8 @@ class DimShuffle(Op):
'-1] = %(basename)s->descr->elsize'
)
for i in xrange(nd_out - 2, -1, -1):
strides_statements.append("if (strides[%(i)s] == 0) strides[%(i)s] = strides[%(i)s+1] * dimensions[%(i)s+1]" % dict(i=str(i)))
strides_statements.append(
"if (strides[%(i)s] == 0) strides[%(i)s] = strides[%(i)s+1] * dimensions[%(i)s+1]" % dict(i=str(i)))
#
# PyObject* PyArray_New(PyTypeObject* subtype, int nd, npy_intp* dims, int type_num,
......@@ -605,7 +610,8 @@ class Elemwise(Op):
# the right thing to do .. have to talk to Ian and James
# about it
if bgrads[jdx] is None:
if bgrads[jdx] is None or \
isinstance(bgrads[jdx].type, DisconnectedType):
pass
elif eval_point is not None:
if rop_out is None:
......@@ -617,6 +623,13 @@ class Elemwise(Op):
return rval
def connection_pattern(self, node):
if hasattr(self.scalar_op, 'connection_pattern'):
return self.scalar_op.connection_pattern(node)
return [[True for output in node.outputs] for ipt in node.inputs]
def grad(self, inputs, ograds):
#compute grad with respect to broadcasted input
......@@ -676,10 +689,16 @@ class Elemwise(Op):
theano.config.compute_test_value = prev_setting
if not isinstance(scalar_igrads, (list, tuple)):
raise TypeError('%s.grad returned %s instead of list or tuple' %
(str(self.scalar_op), str(type(scalar_igrads))))
nd = len(inputs[0].type.broadcastable) # this is the same for everyone
def transform(r):
# From a graph of ScalarOps, make a graph of Broadcast ops.
if isinstance(r.type, DisconnectedType):
return r
if r in scalar_inputs:
return inputs[scalar_inputs.index(r)]
if r in scalar_ograds:
......@@ -803,7 +822,7 @@ class Elemwise(Op):
errormsg = ('While computing ' + str(node.outputs) +
': Failed calling ufunc for op ' +
str(self.scalar_op) +
'for params of shape ' +
' for params of shape ' +
str([arg.shape for arg in ufunc_args]))
if config.exception_verbosity == 'high':
......@@ -1324,7 +1343,8 @@ class CAReduce(Op):
alloc += """
for(int i=0;i<%(iname)s->nd;i++){
if(PyArray_DIMS(%(iname)s)[i]==0 && tosum[i]){
PyErr_Format(PyExc_ValueError, "Input of CAReduce{%(scal_name)s} has zero-size on axis %%d",i);
PyErr_Format(PyExc_ValueError,
"Input of CAReduce{%(scal_name)s} has zero-size on axis %%d",i);
%(fail)s;
}
}
......@@ -1585,6 +1605,12 @@ class Sum(CAReduceDtype):
def grad(self, inp, grads):
x, = inp
out = self(*inp)
if out.dtype.find('int') != -1:
return [x.zeros_like().astype(theano.config.floatX)]
gz, = grads
gz = as_tensor_variable(gz)
axis = self.axis
......@@ -1601,7 +1627,7 @@ class Sum(CAReduceDtype):
new_dims.append(i)
i += 1
ds_op = DimShuffle(gz.type.broadcastable, new_dims)
gx = Elemwise(scalar.second)(x, ds_op(gz).astype(x.dtype))
gx = Elemwise(scalar.second)(x, ds_op(gz))
return [gx]
def R_op(self, inputs, eval_points):
......@@ -1646,7 +1672,7 @@ class Prod(CAReduceDtype):
def grad(self, inp, grads):
'''
The grad of this Op could be very easy, it is was not for the case
The grad of this Op could be very easy, if it were not for the case
where zeros are present in a given "group" (ie. elements reduced
together to form the product).
......@@ -1692,8 +1718,11 @@ class Prod(CAReduceDtype):
'''
prod_in, = inp
gz, = grads
if prod_in.dtype[0:3] in ('int', 'uin'):
return [None]
out = self(*inp)
if out.dtype[0:3] in ('int', 'uin'):
return [prod_in.zeros_like().astype(theano.config.floatX)]
# Prepare the broadcasting that is used everywhere to broadcast
# over the original groups (ie. broadcast over the elements of a given
......
......@@ -5,6 +5,7 @@ import theano
import basic
from theano import gof, scalar
import basic as tensor
from theano.gradient import DisconnectedType
class DiffOp(theano.Op):
......@@ -148,7 +149,13 @@ class BinCountOp(theano.Op):
z[0] = np.bincount(x, weights=weights, minlength=self.minlength)
def grad(self, inputs, outputs_gradients):
return [None for i in inputs]
output = self(*inputs)
if output.dtype.find('int') != -1:
return [inp.zeros_like().astype(theano.config.floatX)
for inp in inputs]
raise NotImplementedError()
def infer_shape(self, node, ins_shapes):
x = node.inputs[0]
......@@ -252,6 +259,10 @@ class RepeatOp(theano.Op):
z = output_storage[0]
z[0] = np.repeat(x, repeats=repeats, axis=self.axis)
def connection_pattern(self, node):
return [[True], [False]]
def grad(self, (x, repeats), (gz, )):
if repeats.ndim == 0:
if self.axis is None:
......@@ -265,7 +276,8 @@ class RepeatOp(theano.Op):
shape = [x.shape[k] for k in range(x.ndim)]
shape.insert(axis, repeats)
return [gz.reshape(shape, x.ndim + 1).sum(axis=axis), None]
return [gz.reshape(shape, x.ndim + 1).sum(axis=axis),
DisconnectedType()()]
elif repeats.ndim == 1:
# For this implementation, we would need to specify the length
# of repeats in order to split gz in the right way to sum
......@@ -387,7 +399,6 @@ def bartlett(M):
return bartlett_(M)
class FillDiagonal(gof.Op):
# See function fill_diagonal for docstring
def __eq__(self, other):
......
......@@ -2,6 +2,8 @@ import theano
from theano.tensor import basic as T
from theano.misc import strutil
import numpy as N
from theano.gradient import grad_undefined
from theano.gradient import DisconnectedType
#TODO: speed up by reordering loops. Should pass through the videos once, incrementing all weight gradients, rather
......@@ -9,7 +11,7 @@ import numpy as N
class ConvGrad3D(theano.Op):
""" Gradient of Conv3D with respect to W """
def __eq__(self,other):
def __eq__(self, other):
return type(self) == type(other)
def __hash__(self):
......@@ -27,20 +29,26 @@ class ConvGrad3D(theano.Op):
return theano.Apply(self, inputs=[V_, d_, WShape_, dCdH_], outputs = [ T.TensorType(V_.dtype, (False,False,False,False,False))() ] )
def infer_shape(self, node, input_shapes):
V,d,W_shape, dCdH = node.inputs
V, d, W_shape, dCdH = node.inputs
return [ ( W_shape[0], W_shape[1], W_shape[2], W_shape[3], W_shape[4] ) ]
def grad(self,inputs, output_gradients):
C,d, WShape, B = inputs
dLdA ,= output_gradients
def connection_pattern(self, node):
z = T.zeros_like(C[0,0,0,0,:])
dLdC = convTransp3D( dLdA, z, d, B, C.shape[1:4])
dLdd = None #not differentiable, since d is not continuous
dLdWShape = None #not differentiable, since d is not continuous
dLdB = conv3D( C, dLdA, T.zeros_like(B[0,0,0,0,:]), d)
return [[True], [True], [False], [True]]
return [ dLdC, dLdd, dLdWShape, dLdB ]
def grad(self, inputs, output_gradients):
C, d, WShape, B = inputs
dLdA, = output_gradients
z = T.zeros_like(C[0, 0, 0, 0, :])
dLdC = convTransp3D(dLdA, z, d, B, C.shape[1:4])
# d actually does affect the outputs, so it's not disconnected
dLdd = grad_undefined(self, 1, d)
# The shape of the weights doesn't affect the output elements
dLdWShape = DisconnectedType()()
dLdB = conv3D(C, dLdA, T.zeros_like(B[0, 0, 0, 0, :]), d)
return [dLdC, dLdd, dLdWShape, dLdB]
def perform(self, node, inputs, output_storage):
V, d, WShape, dCdH = inputs
......@@ -64,17 +72,15 @@ class ConvGrad3D(theano.Op):
#print 'computing output of shape '+str(WShape)
for k in xrange(0,WShape[1]):
for l in xrange(0,WShape[2]):
for m in xrange(0,WShape[3]):
for i in xrange(0,batchSize):
for p in xrange(0,outputHeight):
for q in xrange(0,outputWidth):
for r in xrange(0,outputDur):
for j in xrange(0,WShape[0]):
for z in xrange(0,WShape[4]):
for k in xrange(0, WShape[1]):
for l in xrange(0, WShape[2]):
for m in xrange(0, WShape[3]):
for i in xrange(0, batchSize):
for p in xrange(0, outputHeight):
for q in xrange(0, outputWidth):
for r in xrange(0, outputDur):
for j in xrange(0, WShape[0]):
for z in xrange(0, WShape[4]):
dCdW[j,k,l,m,z] += dCdH[i,p,q,r,j] * V[i,dr*p+k,dc*q+l,dt*r+m,z]
output_storage[0][0] = dCdW
......@@ -89,7 +95,7 @@ class ConvGrad3D(theano.Op):
dCdW = outputs[0]
codeSource = """
codeSource = """
///////////// < code generated by ConvGradW3D >
//printf("\t\t\t\tConvGradW3D c code\\n");
......@@ -269,7 +275,7 @@ class ConvGrad3D(theano.Op):
///////////// < /code generated by ConvGradW3D >
"""
return strutil.renderString(codeSource,locals())
return strutil.renderString(codeSource, locals())
convGrad3D = ConvGrad3D()
......
......@@ -2,10 +2,13 @@ import numpy as N
from theano.tensor import basic as T
from theano.misc import strutil
import theano
from theano.gradient import grad_undefined
from theano.gradient import DisconnectedType
class ConvTransp3D(theano.Op):
""" "Transpose" of Conv3D (Conv3D implements multiplication by an implicitly defined matrix W. This implements multiplication by its transpose) """
def __eq__(self,other):
def __eq__(self, other):
return type(self) == type(other)
def __hash__(self):
......@@ -14,7 +17,7 @@ class ConvTransp3D(theano.Op):
def c_code_cache_version(self):
return (3,)
def make_node(self, W, b, d, H, RShape = None):
def make_node(self, W, b, d, H, RShape=None):
"""
:param W: Weights, filter
:param b: bias, shape == (W.shape[0],)
......@@ -28,7 +31,7 @@ class ConvTransp3D(theano.Op):
if RShape:
RShape_ = T.as_tensor_variable(RShape)
else:
RShape_ = T.as_tensor_variable([-1,-1,-1])
RShape_ = T.as_tensor_variable([-1, -1, -1])
return theano.Apply(self, inputs=[W_,b_,d_,H_, RShape_], outputs = [ T.TensorType(H_.dtype, (False,False,False,False,False))() ] )
......@@ -36,22 +39,25 @@ class ConvTransp3D(theano.Op):
flags = ['-Werror']
return flags
def infer_shape(self, node, input_shapes):
W,b,d,H,RShape = node.inputs
W, b, d, H, RShape = node.inputs
W_shape, b_shape, d_shape, H_shape, RShape_shape = input_shapes
return [(H_shape[0], RShape[0], RShape[1], RShape[2], W_shape[4])]
def grad(self,inputs, output_gradients):
W,b,d,H, RShape = inputs
dCdR ,= output_gradients
dCdH = conv3D( dCdR, W, T.zeros_like(H[0,0,0,0,:]), d)
WShape = W.shape
dCdW = convGrad3D(dCdR,d,WShape,H)
dCdb = T.sum(dCdR,axis=(0,1,2,3))
dCdd = None #not differentiable, since d is not continuous
dCdRShape = None #not differentiable, since RShape is not continuous
def connection_pattern(self, node):
return [[True], [True], [True], [True], [False]]
def grad(self, inputs, output_gradients):
W, b, d, H, RShape = inputs
dCdR, = output_gradients
dCdH = conv3D(dCdR, W, T.zeros_like(H[0, 0, 0, 0, :]), d)
WShape = W.shape
dCdW = convGrad3D(dCdR, d, WShape, H)
dCdb = T.sum(dCdR, axis=(0, 1, 2, 3))
# not differentiable, since d affects the output elements
dCdd = grad_undefined(self, 2, d)
# disconnected, since RShape just determines the output shape
dCdRShape = DisconnectedType()()
if 'name' in dir(dCdR) and dCdR.name is not None:
dCdR_name = dCdR.name
......@@ -76,15 +82,14 @@ class ConvTransp3D(theano.Op):
dCdW.name = 'ConvTransp3D_dCdW.H='+H_name+',dCdR='+dCdR_name+',W='+W_name
dCdb.name = 'ConvTransp3D_dCdb.H='+H_name+',dCdR='+dCdR_name+',W='+W_name+',b='+b_name
dCdH.name = 'ConvTransp3D_dCdH.H='+H_name+',dCdR='+dCdR_name
return [ dCdW, dCdb, dCdd, dCdH, dCdRShape ]
dCdH.name = 'ConvTransp3D_dCdH.H=' + H_name + ',dCdR=' + dCdR_name
return [dCdW, dCdb, dCdd, dCdH, dCdRShape]
def perform(self, node, inputs, output_storage):
W, b, d, H, RShape = inputs
# print "\t\t\t\tConvTransp3D python code"
output_storage[0][0] = computeR(W,b,d,H,RShape)
output_storage[0][0] = computeR(W, b, d, H, RShape)
def c_code(self, node, nodename, inputs, outputs, sub):
W, b, d, H, RShape = inputs
......@@ -321,33 +326,35 @@ class ConvTransp3D(theano.Op):
///////////// < /code generated by ConvTransp3D >
"""
return strutil.renderString(codeSource,locals())
return strutil.renderString(codeSource, locals())
convTransp3D = ConvTransp3D()
#If the input size wasn't a multiple of D we may need to cause some automatic padding to get the right size of reconstruction
def computeR(W,b,d,H,Rshape = None):
def computeR(W, b, d, H, Rshape=None):
assert len(W.shape) == 5
assert len(H.shape) == 5
assert len(b.shape) == 1
assert len(d) == 3
outputChannels, filterHeight, filterWidth, filterDur, inputChannels = W.shape
batchSize, outputHeight, outputWidth, outputDur, outputChannelsAgain = H.shape
outputChannels, filterHeight, filterWidth, filterDur, \
inputChannels = W.shape
batchSize, outputHeight, outputWidth, outputDur, \
outputChannelsAgain = H.shape
assert outputChannelsAgain == outputChannels
assert b.shape[0] == inputChannels
dr,dc,dt = d
dr, dc, dt = d
assert dr > 0
assert dc > 0
assert dt > 0
videoHeight = (outputHeight-1) * dr + filterHeight
videoWidth = (outputWidth-1) * dc + filterWidth
videoDur = (outputDur-1) * dt + filterDur
videoHeight = (outputHeight - 1) * dr + filterHeight
videoWidth = (outputWidth - 1) * dc + filterWidth
videoDur = (outputDur - 1) * dt + filterDur
if Rshape is not None and Rshape[0] != -1:
if Rshape[0] < videoHeight:
......@@ -364,24 +371,27 @@ def computeR(W,b,d,H,Rshape = None):
#print "video size: "+str((videoHeight, videoWidth, videoDur))
R = N.zeros( (batchSize, videoHeight,
videoWidth, videoDur, inputChannels ) , dtype=H.dtype)
R = N.zeros((batchSize, videoHeight,
videoWidth, videoDur, inputChannels), dtype=H.dtype)
#R[i,j,r,c,t] = b_j + sum_{rc,rk | d \circ rc + rk = r} sum_{cc,ck | ...} sum_{tc,tk | ...} sum_k W[k, j, rk, ck, tk] * H[i,k,rc,cc,tc]
for i in xrange(0,batchSize):
for i in xrange(0, batchSize):
#print '\texample '+str(i+1)+'/'+str(batchSize)
for j in xrange(0,inputChannels):
for j in xrange(0, inputChannels):
#print '\t\tfeature map '+str(j+1)+'/'+str(inputChannels)
for r in xrange(0,videoHeight):
for r in xrange(0, videoHeight):
#print '\t\t\trow '+str(r+1)+'/'+str(videoHeight)
for c in xrange(0,videoWidth):
for t in xrange(0,videoDur):
R[i,r,c,t,j] = b[j]
for c in xrange(0, videoWidth):
for t in xrange(0, videoDur):
R[i, r, c, t, j] = b[j]
ftc = max([0, int(N.ceil(float(t-filterDur +1 )/float(dt))) ])
fcc = max([0, int(N.ceil(float(c-filterWidth +1)/float(dc))) ])
ftc = max([0, int(N.ceil(
float(t - filterDur + 1) / float(dt)))])
fcc = max([0, int(N.ceil(
float(c - filterWidth + 1) / float(dc)))])
rc = max([0, int(N.ceil(float(r-filterHeight+1)/float(dr))) ])
rc = max([0, int(N.ceil(
float(r - filterHeight + 1) / float(dr)))])
while rc < outputHeight:
rk = r - rc * dr
if rk < 0:
......@@ -399,20 +409,21 @@ def computeR(W,b,d,H,Rshape = None):
if tk < 0:
break
-R[i,r,c,t,j] += N.dot(W[:,rk,ck,tk,j], H[i,rc,cc,tc,:] )
+R[
+i,r,c,t,j] += N.dot(W[:,rk,ck,tk,j], H[i,rc,cc,tc,:] )
tc += 1
"" #close loop over tc
"" # close loop over tc
cc += 1
"" #close loop over cc
"" # close loop over cc
rc += 1
"" #close loop over rc
"" #close loop over t
"" #close loop over c
"" #close loop over r
"" #close loop over j
"" #close loop over i
"" # close loop over rc
"" # close loop over t
"" # close loop over c
"" # close loop over r
"" # close loop over j
"" # close loop over i
return R
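The nested loops above implement the reconstruction formula given in the comment (a transposed 3-D convolution). As a minimal sketch of the same index relation `r = rc * dr + rk` in plain NumPy, with hypothetical toy sizes, consider the 1-D analogue:

```python
import numpy as np

# 1-D analogue of the reconstruction above (hypothetical toy sizes):
# R[r] = sum over (rc, rk) with r = rc * dr + rk of W[rk] * H[rc]
def reconstruct_1d(W, H, dr):
    filterLen = len(W)
    outLen = len(H)
    # same size formula as videoHeight = (outputHeight - 1) * dr + filterHeight
    R = np.zeros((outLen - 1) * dr + filterLen)
    for rc in range(outLen):
        for rk in range(filterLen):
            R[rc * dr + rk] += W[rk] * H[rc]
    return R

R = reconstruct_1d(np.array([1.0, 2.0]), np.array([1.0, 1.0]), dr=2)
# R has length (2 - 1) * 2 + 2 = 4
```

The real op does the same accumulation over rows, columns, time, and channels at once.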
@@ -15,6 +15,7 @@ from theano.gof import Apply
from theano.tensor.nnet.sigm import sigmoid, softplus
from theano.gradient import DisconnectedType
+from theano.gradient import grad_not_implemented
############
@@ -79,7 +80,7 @@ class SoftmaxWithBias(gof.Op):
g_sm, = grads
if isinstance(g_sm.type, DisconnectedType):
-return [ DisconnectedType()(), DisconnectedType()() ]
+return [DisconnectedType()(), DisconnectedType()()]
sm = softmax_with_bias(x, b)
dx = softmax_grad(g_sm, sm)
@@ -560,8 +561,8 @@ if 0:
axis = ds_input.owner.op.axis
sum_input = ds_input.owner.inputs[0]
-if ((ds_order!=(0,'x')) or
-(axis!=(1,)) or
+if ((ds_order != (0, 'x')) or
+(axis != (1,)) or
(sum_input is not prod_term)):
rest.append(add_in)
#print 'ds_order =', ds_order
@@ -712,16 +713,20 @@ class CrossentropySoftmaxArgmax1HotWithBias(gof.Op):
am_shp = idx_shp
return [nll_shp, sm_shp, am_shp]
+def connection_pattern(self, node):
+return [[True, True, True], # x
+[True, True, True], # b
+[False, False, True]] # y_idx
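For reference, `connection_pattern` is indexed as `pattern[input_idx][output_idx]`. A plain-Python sketch of querying the table above (the `is_connected` helper is hypothetical; `gradient.grad` performs this lookup internally):

```python
# The table above as nested lists: pattern[input_idx][output_idx] is
# True when elements of that input affect elements of that output.
pattern = [[True, True, True],    # x
           [True, True, True],    # b
           [False, False, True]]  # y_idx

def is_connected(pattern, input_idx, output_idx):
    # hypothetical helper mirroring the lookup gradient.grad performs
    return pattern[input_idx][output_idx]

# per the table, y_idx (input 2) is connected only to the third output
```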
def grad(self, inp, grads):
x, b, y_idx = inp
g_nll, g_sm, g_am = grads
dx_terms = []
db_terms = []
d_idx_terms = []
if not isinstance(g_nll.type, DisconnectedType):
nll, sm = crossentropy_softmax_1hot_with_bias(x, b, y_idx)
dx = crossentropy_softmax_1hot_with_bias_dx(g_nll, sm, y_idx)
@@ -739,7 +744,7 @@ class CrossentropySoftmaxArgmax1HotWithBias(gof.Op):
db_terms.append(b.zeros_like())
d_idx_terms.append(y_idx.zeros_like())
-def fancy_sum( terms ):
+def fancy_sum(terms):
if len(terms) == 0:
return DisconnectedType()()
rval = terms[0]
@@ -747,8 +752,8 @@ class CrossentropySoftmaxArgmax1HotWithBias(gof.Op):
rval = rval + term
return rval
-return [ fancy_sum(terms) for terms in
-[dx_terms, db_terms, d_idx_terms ] ]
+return [fancy_sum(terms) for terms in
+[dx_terms, db_terms, d_idx_terms]]
def c_headers(self):
return ['<iostream>', '<cmath>']
@@ -897,7 +902,7 @@ class CrossentropySoftmax1HotWithBiasDx (gof.Op):
sm, tensor.fill(dy, -1), y_idx_range, y_idx),
axis=1)
g_sm = dy.dimshuffle(0, 'x') * g_dx
-g_y_idx = None
+g_y_idx = grad_not_implemented(self, 2, y_idx)
return [g_dy, g_sm, g_y_idx]
def c_code_cache_version(self):
@@ -1136,7 +1141,7 @@ class CrossentropyCategorical1Hot(gof.Op):
coding, one_of_n = inp
g_y, = grads
return [crossentropy_categorical_1hot_grad(g_y, coding, one_of_n),
-None]
+grad_not_implemented(self, 1, one_of_n)]
crossentropy_categorical_1hot = CrossentropyCategorical1Hot()
@@ -1325,7 +1330,6 @@ def local_advanced_indexing_crossentropy_onehot(node):
except Exception:
pass
if sm is not None and sm.owner and sm.owner.op in (softmax,
softmax_with_bias):
sm_w_bias = local_softmax_with_bias.transform(sm.owner)
@@ -1481,7 +1485,8 @@ def local_advanced_indexing_crossentropy_onehot_grad(node):
if adv_subtensor is not None:
try:
-maybe_sm, maybe_rows, maybe_labels = adv_subtensor.owner.inputs
+maybe_sm, maybe_rows, \
+maybe_labels = adv_subtensor.owner.inputs
except Exception:
return
@@ -1691,7 +1696,6 @@ class Prepend_scalar_constant_to_each_row(gof.Op):
shp = (in_shapes[0][0], in_shapes[0][1] + 1)
return [shp]
def grad(self, inp, grads):
mat, = inp
goutput, = grads
@@ -1758,18 +1762,19 @@ prepend_1_to_each_row = Prepend_scalar_constant_to_each_row(1.)
#numerically stabilize log softmax (X)
# as X-X.max(axis=1).dimshuffle(0,'x') - log(exp(X-X.max(axis=1).dimshuffle(0,'x')).sum(axis=1)).dimshuffle(0,'x')
def make_out_pattern(X):
-stabilized_X = X - X.max(axis=1).dimshuffle(0,'x')
-out_var = stabilized_X - tensor.log(tensor.exp(stabilized_X).sum(axis=1)).dimshuffle(0,'x')
+stabilized_X = X - X.max(axis=1).dimshuffle(0, 'x')
+out_var = stabilized_X - tensor.log(tensor.exp(stabilized_X).sum(
+axis=1)).dimshuffle(0, 'x')
#tell DEBUG_MODE that it's OK if the original graph produced NaN and the optimized graph does not
out_var.values_eq_approx = out_var.type.values_eq_approx_remove_nan
return out_var
-local_log_softmax = gof.PatternSub( in_pattern = (tensor.log, (softmax, 'x')),
-out_pattern = (make_out_pattern, 'x'),
+local_log_softmax = gof.PatternSub(in_pattern=(tensor.log, (softmax, 'x')),
+out_pattern=(make_out_pattern, 'x'),
allow_multiple_clients=True)
#don't do register_stabilize, this is to make local_log_softmax run
#only after another more specific optimization that stabilizes cross entropy
#opt.register_stabilize(local_log_softmax, name = 'local_log_softmax')
-opt.register_specialize(local_log_softmax, name = 'local_log_softmax')
+opt.register_specialize(local_log_softmax, name='local_log_softmax')
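The rewrite registered here relies on the identity log(softmax(X)) = (X - m) - log(sum(exp(X - m))) with m the row-wise max. A plain-NumPy sketch (not Theano) of why the stabilized form is preferred:

```python
import numpy as np

# log(softmax(X)) rewritten as (X - m) - log(sum(exp(X - m))), m = row max
X = np.array([[1.0, 2.0, 3.0],
              [0.0, 0.0, 1000.0]])  # second row overflows the naive form

m = X.max(axis=1, keepdims=True)
stab = X - m
log_sm = stab - np.log(np.exp(stab).sum(axis=1, keepdims=True))

with np.errstate(over='ignore', divide='ignore', invalid='ignore'):
    naive = np.log(np.exp(X) / np.exp(X).sum(axis=1, keepdims=True))

# log_sm stays finite everywhere; naive blows up on the large row
```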
@@ -30,13 +30,20 @@ class ScalarSigmoid(scalar.UnaryScalarOp):
if x > 30.0:
return 1.0
return 1.0 / (1.0 + numpy.exp(-x))
def impl(self, x):
return ScalarSigmoid.st_impl(x)
def grad(self, inp, grads):
x, = inp
gz, = grads
y = scalar_sigmoid(x)
-return [gz * y * (1.0 - y)]
+rval = gz * y * (1.0 - y)
+assert rval.type.dtype.find('float') != -1
+return [rval]
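A quick finite-difference spot-check (plain NumPy, toy value) of the rule the grad method implements, d sigmoid(x)/dx = y * (1 - y):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x, eps = 0.3, 1e-6
# central difference vs. the analytic rule used above
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
y = sigmoid(x)
analytic = y * (1.0 - y)
# the two agree closely
```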
def c_code(self, node, name, inp, out, sub):
x, = inp
z, = out
@@ -50,6 +57,7 @@ class ScalarSigmoid(scalar.UnaryScalarOp):
return """%(z)s = %(x)s < -709.0 ? 0.0 : %(x)s > 19.0 ? 1.0 : 1.0 /(1.0+exp(-%(x)s));""" % locals()
else:
raise NotImplementedError('only floatingpoint is implemented')
def c_code_cache_version(self):
v = super(ScalarSigmoid, self).c_code_cache_version()
if v:
@@ -61,7 +69,7 @@ sigmoid = elemwise.Elemwise(scalar_sigmoid, name='sigmoid')
sigmoid_inplace = elemwise.Elemwise(
ScalarSigmoid(scalar.transfer_type(0)),
-inplace_pattern={0:0},
+inplace_pattern={0: 0},
name='sigmoid_inplace',
)
@@ -76,12 +84,15 @@ class ScalarSoftplus(scalar.UnaryScalarOp):
if x > 30.0:
return x
return numpy.log1p(numpy.exp(x))
def impl(self, x):
return ScalarSoftplus.static_impl(x)
def grad(self, inp, grads):
x, = inp
gz, = grads
return [gz * scalar_sigmoid(x)]
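The same finite-difference check applies to the softplus rule above, d/dx log(1 + exp(x)) = sigmoid(x), sketched here in plain NumPy with a toy value:

```python
import numpy as np

x, eps = -0.7, 1e-6
# central difference of softplus vs. the sigmoid the grad method returns
numeric = (np.log1p(np.exp(x + eps)) - np.log1p(np.exp(x - eps))) / (2 * eps)
analytic = 1.0 / (1.0 + np.exp(-x))
```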
def c_code(self, node, name, inp, out, sub):
x, = inp
z, = out
@@ -95,27 +106,29 @@ class ScalarSoftplus(scalar.UnaryScalarOp):
return """%(z)s = %(x)s < -745.0 ? 0.0 : %(x)s > 16.0 ? %(x)s : log1p(exp(%(x)s));""" % locals()
else:
raise NotImplementedError('only floatingpoint is implemented')
def c_code_cache_version(self):
v = super(ScalarSoftplus, self).c_code_cache_version()
if v:
return (2,) + v
else:
return v
-scalar_softplus = ScalarSoftplus(scalar.upgrade_to_float, name= 'scalar_softplus')
+scalar_softplus = ScalarSoftplus(scalar.upgrade_to_float, name='scalar_softplus')
softplus = elemwise.Elemwise(scalar_softplus, name='softplus')
pprint.assign(softplus, printing.FunctionPrinter('softplus'))
def _skip_mul_1(r):
if r.owner and r.owner.op == tensor.mul:
-not_is_1 = [i for i in r.owner.inputs if not _is_1(i) ]
-if len(not_is_1)==1:
+not_is_1 = [i for i in r.owner.inputs if not _is_1(i)]
+if len(not_is_1) == 1:
return not_is_1[0]
logsigm_to_softplus = gof.PatternSub(
(tensor.log, (sigmoid, 'x')),
(tensor.neg, (softplus, (tensor.neg, 'x'))),
-allow_multiple_clients = True,
+allow_multiple_clients=True,
skip_identities_fn=_skip_mul_1)
@@ -131,21 +144,22 @@ def _is_1(expr):
log1msigm_to_softplus = gof.PatternSub(
(tensor.log,
(tensor.sub,
-dict(pattern='y', constraint = _is_1),
+dict(pattern='y', constraint=_is_1),
(sigmoid, 'x'))),
(tensor.neg, (softplus, 'x')),
-allow_multiple_clients = True,
+allow_multiple_clients=True,
skip_identities_fn=_skip_mul_1)
log1pexp_to_softplus = gof.PatternSub(
(tensor.log1p,
(tensor.exp, 'x')),
(softplus, 'x'),
-allow_multiple_clients = True)
+allow_multiple_clients=True)
-opt.register_stabilize(logsigm_to_softplus, name = 'logsigm_to_softplus')
-opt.register_stabilize(log1msigm_to_softplus, name = 'log1msigm_to_softplus')
-opt.register_stabilize(log1pexp_to_softplus, name = 'log1pexp_to_softplus')
+opt.register_stabilize(logsigm_to_softplus, name='logsigm_to_softplus')
+opt.register_stabilize(log1msigm_to_softplus, name='log1msigm_to_softplus')
+opt.register_stabilize(log1pexp_to_softplus, name='log1pexp_to_softplus')
def is_1pexp(t):
"""
@@ -239,7 +253,7 @@ def partition_num_or_denom(r, f):
else:
neg_t, f_t = f_t
f_terms.append(f_t)
-neg ^= neg_t #bit flip if neg_t is true
+neg ^= neg_t  # bit flip if neg_t is true
return f_terms, rest, neg
@@ -291,7 +305,8 @@ def local_exp_over_1_plus_exp(node):
#find all the exp() terms in the numerator
num, denom = node.inputs
num_exp_x, num_rest, num_neg = partition_num_or_denom(num, is_exp)
-denom_1pexp, denom_rest, denom_neg = partition_num_or_denom(denom, is_1pexp)
+denom_1pexp, denom_rest, \
+denom_neg = partition_num_or_denom(denom, is_1pexp)
sigmoids = []
for t in denom_1pexp:
@@ -303,7 +318,7 @@ def local_exp_over_1_plus_exp(node):
# case: 1/(1+exp(x))
sigmoids.append(sigmoid(-t))
-if not sigmoids: # we didn't find any. abort
+if not sigmoids:  # we didn't find any. abort
return
# put the new numerator together
new_num = sigmoids + [tensor.exp(t) for t in num_exp_x] + num_rest
@@ -322,6 +337,7 @@ def local_exp_over_1_plus_exp(node):
else:
return [new_num / tensor.mul(*denom_rest)]
def parse_mul_tree(root):
"""
Parse a tree of multiplications starting at the given root.
@@ -504,7 +520,7 @@ def perform_sigm_times_exp(tree, exp_x=None, exp_minus_x=None, sigm_x=None,
sigm_minus_x = []
if full_tree is None:
full_tree = tree
-if False: # Debug code.
+if False:  # Debug code.
print '<perform_sigm_times_exp>'
print ' full_tree = %s' % full_tree
print ' tree = %s' % tree
@@ -613,10 +629,13 @@ def local_inv_1_plus_exp(node):
if nonconsts[0].owner and nonconsts[0].owner.op == tensor.exp:
if scalars and numpy.allclose(numpy.sum(scalars), 1):
return opt._fill_chain(
-sigmoid(tensor.neg(nonconsts[0].owner.inputs[0])),
+sigmoid(
+tensor.neg(nonconsts[0].owner.inputs[0])),
scalar_inputs)
# Registration is below, and conditional.
@gof.local_optimizer([tensor.sub])
def local_1msigmoid(node):
"""
@@ -625,7 +644,7 @@ def local_1msigmoid(node):
if node.op == tensor.sub:
sub_l, sub_r = node.inputs
if len(sub_r.clients) > 1:
-return # graph is using both sigm and 1-sigm
+return  # graph is using both sigm and 1-sigm
if sub_r.owner and sub_r.owner.op == sigmoid:
try:
val_l = opt.get_constant_value(sub_l)
@@ -678,13 +697,14 @@ if 0:
assert t0.owner.op == div
t0top, t0bot = t0.owner.inputs
t1top, t1bot = t1.owner.inputs
-rval.append(div(mul(*(t0top+t1top)), mul(*(t0bot+t1bot))))
+rval.append(div(mul(*(
+t0top + t1top)), mul(*(t0bot + t1bot))))
if len(rval) > 100:
# This loop can be exponentially long.
# aborting
return []
-elif len(node.outputs)>1:
+elif len(node.outputs) > 1:
return []
else:
return [node.outputs[0]]
@@ -542,15 +542,12 @@ class MakeVector(T.Op):
def grad(self, inputs, output_gradients):
# If the output is of an integer dtype, no gradient shall pass
if 'int' in self.dtype:
-return [None] * len(inputs)
+return [ipt.zeros_like().astype(theano.config.floatX)
+for ipt in inputs]
grads = []
for i, inp in enumerate(inputs):
-if 'int' in inp.dtype:
-# No gradient wrt integer inputs
-grads.append(None)
-else:
-grads.append(output_gradients[0][i])
+grads.append(output_gradients[0][i])
return grads
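This grad method reflects the commit's integer-gradient policy: an integer-dtype output propagates a zero floating-point gradient rather than None. A plain-Python sketch of that dispatch, with dtypes as strings and a hypothetical helper name:

```python
# Hypothetical sketch of the policy above: integer outputs get zero
# float gradients, never None; float outputs pass gradients through.
def makevector_grads(input_dtypes, out_dtype, upstream_grads):
    if 'int' in out_dtype:
        # no meaningful gradient; return float zeros rather than None
        return [0.0 for _ in input_dtypes]
    return list(upstream_grads)
```

This keeps gradient.grad's type checks happy: the gradient is always floating point, and the gradient through an integer is zero.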
def R_op(self, inputs, eval_points):
@@ -1914,6 +1911,8 @@ def local_subtensor_of_alloc(node):
nw_val = val[tuple(val_slices)]
nw_dims += dims[len(slices):]
+if nw_val.ndim > len(nw_dims):
+return False
rval = T.alloc(nw_val, *nw_dims)
if type(rval) not in (list, tuple):
rval = [rval]
@@ -136,7 +136,7 @@ class RandomStreams(Component, raw_random.RandomStreamsBase):
"""
-def __init__(self, seed=None, no_warn = False):
+def __init__(self, seed=None, no_warn=False):
""":type seed: None or int
:param seed: a default seed to initialize the RandomState
@@ -146,7 +146,7 @@ class RandomStreams(Component, raw_random.RandomStreamsBase):
"""
if not no_warn:
deprecation_warning()
-super(RandomStreams, self).__init__(no_warn = True)
+super(RandomStreams, self).__init__(no_warn=True)
self.random_state_variables = []
self.default_instance_seed = seed
@@ -164,7 +164,6 @@ class RandomStreams(Component, raw_random.RandomStreamsBase):
def build(self, mode, memo):
"""override `Component.build` """
if self not in memo:
-print 'creating RandomStreamsInstance'
memo[self] = RandomStreamsInstance(self, memo,
self.default_instance_seed)
return memo[self]
This source diff could not be displayed because it is too large. You can view the blob instead.
Diff is collapsed.