Commit f09168ed authored by Olivier Delalleau

Merged

......@@ -3,37 +3,72 @@ Modifications in the trunk since the last release
Partial list of what is in the trunk since the last release
--------------------------------------------------
Deprecation:
* tag.shape attribute deprecated (#633)
* FAST_RUN_NOGC mode deprecated
* CudaNdarray_new_null is deprecated in favour of CudaNdarray_New
Bugs fixed:
* Bugfix in CudaNdarray.__iadd__: when the operation is not implemented, the error is now returned.
* Typo fixed in tensor/opt.py
* THEANO_FLAGS='optimizer=None' now works as expected
* Fixed memory leak in error handling on GPU-to-host copy
* Fix relating specifically to Python 2.7 on Mac OS X
* infer_shape can now handle Python longs
* Fixed behaviour of pydotprint's max_label_size option
Crash fixed:
* Work around a bug in gcc 4.3.0 that makes the compilation of 2d convolution
  crash.
Optimization:
* Optimize 4 patterns of subtensor followed by subtensor.
* Gemm inplace optimization on the GPU re-enabled
GPU:
* Move to the GPU fused elemwise ops that have dtypes other than float32 in
  them (except float64) if the inputs and outputs are float32.
  * This allows moving elemwise comparisons to the GPU if we cast the result
    to float32 afterwards.
* Implemented CudaNdarray.ndim to have the same interface as numpy.ndarray.
* Fixed slowdown caused by multiple chained views on CudaNdarray objects
* CudaNdarray_alloc_contiguous changed so as to never try to free
memory on a view: new "base" property
* Safer decref behaviour in CudaNdarray in case of failed allocations
* New GPU implementation of tensor.basic.outer
New features:
* ProfileMode
* profile the scan overhead
* simple hook system to add profiler
* reordered the output from more general to more specific
* var[vector of indices] now works (grad works recursively, the direct grad
  works inplace, GPU works)
  * limitation: works only on the outermost dimension.
* test_value implementation to allow quick debugging at graph creation time
* cuda.root inferred if nvcc is on the path, otherwise defaults to
/usr/local/cuda
* Better graph printing for graphs involving a scan subgraph
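The fancy-indexing feature listed above follows NumPy-style integer-array indexing on the outermost dimension. A minimal pure-Python sketch of those semantics (the helper name `take_outer` is illustrative, not Theano's API):

```python
# Sketch of the "var[vector of indices]" semantics listed above, using
# plain Python lists; Theano's version operates on symbolic tensors and,
# as noted, only on the outermost dimension.
def take_outer(var, idx):
    """Select entries of `var` along the outermost dimension at `idx`."""
    return [var[i] for i in idx]

rows = [[1, 2], [3, 4], [5, 6]]
assert take_outer(rows, [2, 0]) == [[5, 6], [1, 2]]
```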
Documentation:
* Better commenting of cuda_ndarray.cu
* Fixes in the scan documentation: add missing declarations/print statements
* Better error message on failed __getitem__
* Updated documentation on profile mode
Unit tests:
* Stricter float comparison by default
* Reuse the subtensor tests of tensor for GPU tensors (more GPU tests)
* Tests that check for aliased function inputs and assure appropriate copying
(#374)
* Better test of copies in CudaNdarray
* New tests relating to the new base pointer requirements
Other:
* ?? a bug ?? Correctly put the broadcast flag to True in the output var of
  a Reshape op when we receive an int 1 in the new shape.
* pydotprint: high contrast mode is now the default
* More compact printing (ignore leading "Composite" in op names)
Theano 0.3.1 (2011-02-21)
----------------------------
......
.. _developer:
==============================================
Theano Design and Implementation Documentation
==============================================
.. toctree::
......
......@@ -7,7 +7,7 @@ Tensor
This file describes the design of theano.tensor.
Elemwise grad and R_op
======================
Here's another straightforward example, though a bit more elaborate
than adding two numbers together. Let's say that you want to compute
......
......@@ -557,7 +557,7 @@ used within a MinGW Shell (not available if you only installed Python(x,y)).
You do not need to do the following now, because it is not usually needed, but if
later on, when running Theano, you see an error message that looks like:
*error: 'assert' was not declared in this scope*
then you will have to add another section:
.. code-block:: cfg
......
......@@ -728,11 +728,11 @@ row of a matrix x:
Index-assignment is *not* supported. If you want to do something like ``a[5]
= b`` or ``a[5]+=b``, see :func:`set_subtensor` and :func:`inc_subtensor` below.
.. autofunction:: theano.tensor.basic.set_subtensor
.. autofunction:: theano.tensor.basic.inc_subtensor
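The functional-update semantics behind ``set_subtensor`` and ``inc_subtensor`` can be sketched in plain Python: rather than mutating ``a`` in place, a new value with the slot replaced (or incremented) is returned. The helper names below are illustrative only; the real functions build symbolic graph nodes.

```python
# Functional updates: the original sequence is left untouched and a new
# one is returned (helper names are illustrative, not Theano's API).
def set_subtensor_like(a, index, b):
    out = list(a)
    out[index] = b
    return out

def inc_subtensor_like(a, index, b):
    out = list(a)
    out[index] += b
    return out

a = [0, 0, 0, 0, 0, 0]
assert set_subtensor_like(a, 5, 7) == [0, 0, 0, 0, 0, 7]
assert inc_subtensor_like(a, 5, 3) == [0, 0, 0, 0, 0, 3]
assert a == [0, 0, 0, 0, 0, 0]  # `a` itself is unchanged
```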
.. _tensor_operator_support:
......
......@@ -112,7 +112,7 @@ Misc
----
The sparse equivalent of dmatrix is csc_matrix and csr_matrix.
:api:`Dot` vs. :api:`StructuredDot`
----------------------------------------
Often when you use a sparse matrix it is because there is a meaning to the
......
......@@ -1207,54 +1207,21 @@ class OpWiseCLinker(link.LocalLinker):
else:
post_thunk_old_storage = None
compute_map = {}
for k in storage_map:
compute_map[k] = [k.owner is None]
thunks = []
for node in order:
# Make sure we use the C version of the code whenever
# possible
node._op_use_c_code = True
thunks += [node.op.make_thunk(node,
storage_map,
compute_map,
no_recycling)]
for node_idx, node in enumerate(order):
node_input_storage = [storage_map[r] for r in node.inputs]
node_output_storage = [storage_map[r] for r in node.outputs]
debug('Compiling node %i of graph' % node_idx)
thunk = None
# If the op doesn't override the c_code function, we don't try
# to generate a cthunk; otherwise we won't find it in the compilation cache
# and will try to compile it, which takes the lock even when we don't need it.
if node.op.c_code.im_func is not op.Op.c_code.im_func:
try:
e = Env(*graph.clone(node.inputs, node.outputs))
if self.allow_gc:
# if we allow garbage collection of intermediate nodes
# we must forbid this C implementation from caching its own
# reference to its output
node_no_recycling = e.outputs
else:
node_no_recycling = [r for r, r2 in zip(e.outputs, node.outputs) if r2 in no_recycling]
cl = CLinker().accept(e, node_no_recycling)
debug('Trying CLinker.make_thunk')
thunk, node_input_filters, node_output_filters = cl.make_thunk(
input_storage = node_input_storage,
output_storage = node_output_storage,
keep_lock=getattr(get_lock,"n_lock",0) != orig_n_lock)
assert callable(thunk)
thunk.inputs = node_input_storage
thunk.outputs = node_output_storage
thunks.append(thunk)
do_python_thunk = False
except (NotImplementedError, utils.MethodNotDefined):
thunk = None
if thunk is None:
if self.fallback_on_perform:
debug('Falling back on perform')
p = node.op.perform
# default arguments are stored in the closure of `thunk`
def thunk(p=p, i=node_input_storage, o=node_output_storage,n=node):
return p(n, [x[0] for x in i], o)
#thunk = lambda p = p, i = node_input_storage, o = node_output_storage, n = node: p(n, [x[0] for x in i], o)
thunk.inputs = node_input_storage
thunk.outputs = node_output_storage
thunk.perform = p
thunks.append(thunk)
else:
raise NotImplementedError("We were not able to use c_code or perform code for this node", node)
if self.allow_gc:
post_thunk_old_storage.append([storage_map[input]
......
......@@ -6,7 +6,6 @@ from type import Type
import sys, traceback
from copy import copy
from theano.gof.python25 import all
import numpy
__excepthook = sys.excepthook
def thunk_hook(type, value, trace):
......@@ -329,7 +328,7 @@ class LocalLinker(Linker):
# 3. output storage
# 4. thunks: list of nodes' functions in the order they will be run by the function in (1)
# 5. order: list of nodes, in the order they will be run by the function in (1)
raise MethodNotDefined("make_all", type(self), self.__class__.__name__)
raise utils.MethodNotDefined("make_all", type(self), self.__class__.__name__)
def gc_helper(node_list):
"""
......@@ -391,10 +390,23 @@ class PerformLinker(LocalLinker):
order = list(env.toposort())
no_recycling = self.no_recycling
thunks = []
input_storage, output_storage, storage_map = map_storage(env, order, input_storage, output_storage)
compute_map = {}
for k in storage_map:
compute_map[k] = [k.owner is None]
thunks = []
for node in order:
# Make sure we don't use the C version of the code, but rather only
# the python version
node._op_use_c_code = False
thunks += [node.op.make_thunk(node,
storage_map,
compute_map,
no_recycling)]
computed, last_user = gc_helper(order)
if self.allow_gc:
post_thunk_old_storage = []
......@@ -402,18 +414,6 @@ class PerformLinker(LocalLinker):
post_thunk_old_storage = None
for node in order:
node_input_storage = tuple(storage_map[input] for input in node.inputs)
node_output_storage = tuple(storage_map[output] for output in node.outputs)
p = node.op.perform
# The thunk is meant to be called without arguments; the values are
# bound as default arguments of the lambda so they are saved with it.
# Using the closure in a simple way didn't work.
thunk = lambda p = p, i = node_input_storage, o = node_output_storage, n = node: p(n, [x[0] for x in i], o)
thunk.inputs = node_input_storage
thunk.outputs = node_output_storage
thunk.perform = p
thunks.append(thunk)
if self.allow_gc:
post_thunk_old_storage.append([storage_map[input]
for input in node.inputs
......
......@@ -2,8 +2,6 @@
The `Op` class is the base interface for all operations
compatible with `gof`'s :doc:`graph` routines.
"""
__docformat__ = "restructuredtext en"
......@@ -12,6 +10,11 @@ from theano import config
import graph
import numpy
import utils
import logging
from theano import config
from env import Env
import graph
import cc
class CLinkerObject(object):
......@@ -331,21 +334,22 @@ class PureOp(object):
# build test input-values
input_vals = []
for ins in node.inputs:
for i, ins in enumerate(node.inputs):
if isinstance(ins, graph.Constant):
input_vals.append(ins.value)
elif isinstance(ins,SharedVariable):
input_vals.append(ins.get_value(borrow=True))
input_vals.append(ins.get_value(borrow=True, return_internal_type=True))
elif isinstance(ins,graph.Variable) and hasattr(ins.tag, 'test_value'):
# ensure that the test value is correct
input_vals.append(ins.type.filter(ins.tag.test_value))
else:
# no test-value was specified, act accordingly
if config.compute_test_value == 'warn':
raise Warning('Cannot compute test value: input %s of Op %s missing default value')
# TODO: use warnings.warn, http://docs.python.org/library/warnings.html#warnings.warn
print >>sys.stderr, ('Warning, Cannot compute test value: input %i (%s) of Op %s missing default value' % (i, ins, node))
run_perform = False
elif config.compute_test_value == 'err':
raise ValueError('Cannot compute test value: input %s of Op %s missing default value')
raise ValueError('Cannot compute test value: input %i (%s) of Op %s missing default value' % (i, ins, node))
else:
# silently skip test
run_perform = False
......@@ -355,12 +359,23 @@ class PureOp(object):
# compute output value once with test inputs to validate graph
output_storage = [[None]] * len(node.outputs)
node.op.perform(node, input_vals, output_storage)
# add 'test_value' to output tags, so that downstream ops can use these
# numerical values as inputs to their perform method.
for (outval, node_output) in zip(output_storage, node.outputs):
node_output.tag.test_value = outval[0]
try:
node.op.perform(node, input_vals, output_storage)
# add 'test_value' to output tags, so that downstream ops can use these
# numerical values as inputs to their perform method.
for (outval, node_output) in zip(output_storage, node.outputs):
node_output.tag.test_value = outval[0]
except utils.MethodNotDefined, e:
# This case happens when the perform method is not defined
# for a certain Op.
#TODO: use the c_thunk?
if config.compute_test_value == 'warn':
# TODO: use warnings.warn
print >>sys.stderr, 'Warning, in compute_test_value:', type(e)
print >>sys.stderr, e
elif config.compute_test_value == 'err':
raise
if self.default_output is not None:
return node.outputs[self.default_output]
......@@ -404,4 +419,77 @@ class PureOp(object):
class Op(utils.object2, PureOp, CLinkerOp):
"""Convenience class to bundle `PureOp` and `CLinkerOp`"""
pass
def __new__(cls, *args, **kwargs):
# this function exists to silently and transparently ensure that all
# existing Ops get a _op_use_c_code attribute
obj = object.__new__(cls, *args, **kwargs)
if not hasattr(obj, '_op_use_c_code'):
obj._op_use_c_code = True
return obj
def __init__(self, use_c_code=True):
self._op_use_c_code = use_c_code
def make_thunk(self, node, storage_map, compute_map, no_recycling):
"""
:param node: something previously returned by self.make_node
:param storage_map: dict variable -> one-element-list where a computed
value for this variable may be found.
:param compute_map: dict variable -> one-element-list where a boolean
value will be found. The boolean indicates whether the
variable's storage_map container contains a valid value (True)
or if it has not been computed yet (False).
:param no_recycling: list of variables for which it is forbidden to
reuse memory allocated by a previous call.
"""
logger = logging.getLogger('theano.Op')
node_input_storage = [storage_map[r] for r in node.inputs]
node_output_storage = [storage_map[r] for r in node.outputs]
node_input_compute = [compute_map[r] for r in node.inputs]
node_output_compute = [compute_map[r] for r in node.outputs]
#logger.debug('Compiling node %i of graph' % node_idx)
if self._op_use_c_code:
try:
e = Env(*graph.clone(node.inputs, node.outputs))
e_no_recycling = [new_o
for (new_o, old_o) in zip(e.outputs, node.outputs)
if old_o in no_recycling]
cl = cc.CLinker().accept(e,
no_recycling=e_no_recycling)
logger.debug('Trying CLinker.make_thunk')
fill_storage, node_input_filters, node_output_filters = cl.make_thunk(
input_storage = node_input_storage,
output_storage = node_output_storage)
def rval():
fill_storage()
for o in node.outputs:
compute_map[o][0] = True
rval.cthunk = fill_storage.cthunk
rval.inputs = node_input_storage
rval.outputs = node_output_storage
rval.lazy = False
return rval
except (NotImplementedError, utils.MethodNotDefined):
logger.debug('Falling back on perform')
# condition: either there was no c_code, or it failed
p = node.op.perform
# default arguments are stored in the closure of `rval`
def rval(p=p, i=node_input_storage, o=node_output_storage, n=node):
r = p(n, [x[0] for x in i], o)
for o in node.outputs:
compute_map[o][0] = True
return r
rval.inputs = node_input_storage
rval.outputs = node_output_storage
rval.perform = p
rval.lazy = False
return rval
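The storage_map/compute_map contract used by `make_thunk` above can be sketched in isolation: each variable maps to a one-element list acting as a mutable cell, and a thunk fills the storage cell and flips the computed flag. This is a simplified stand-in; the real maps are keyed by Variable objects, not strings.

```python
# One-element lists act as mutable cells shared between thunks.
storage_map = {'x': [None]}   # cell[0] holds the computed value
compute_map = {'x': [False]}  # cell[0] is True once the value is valid

def thunk():
    storage_map['x'][0] = 3.14  # fill the storage cell
    compute_map['x'][0] = True  # mark the variable as computed

thunk()
assert compute_map['x'][0] is True
assert storage_map['x'][0] == 3.14
```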
from ops import (cholesky, matrix_inverse, solve,
diag, extract_diag, alloc_diag,
det, PSD_hint,
trace, spectral_radius_bound)
This diff is collapsed.
import numpy
from theano import tensor, function
from .ops import *
if 0:
def test_cholesky():
#TODO: test upper and lower triangular
#todo: unittest randomseed
rng = numpy.random.RandomState(1234)
r = rng.randn(5,5)
pd = numpy.dot(r,r.T)
x = tensor.matrix()
chol = Cholesky()(x)
f = function([x], tensor.dot(chol, chol.T)) # an optimization could remove this
ch_f = function([x], chol)
# quick check that chol is upper-triangular
ch = ch_f(pd)
print ch
assert ch[0,4] != 0
assert ch[4,0] == 0
assert numpy.allclose(numpy.dot(ch.T,ch),pd)
assert not numpy.allclose(numpy.dot(ch,ch.T),pd)
def test_inverse_correctness():
#todo: unittest randomseed
rng = numpy.random.RandomState(12345)
r = rng.randn(4,4)
x = tensor.matrix()
xi = matrix_inverse(x)
ri = function([x], xi)(r)
assert ri.shape == r.shape
assert ri.dtype == r.dtype
rir = numpy.dot(ri,r)
rri = numpy.dot(r,ri)
assert numpy.allclose(numpy.identity(4), rir), rir
assert numpy.allclose(numpy.identity(4), rri), rri
def test_inverse_grad():
rng = numpy.random.RandomState(1234)
r = rng.randn(4,4)
tensor.verify_grad(matrix_inverse, [r], rng=numpy.random)
def test_det_grad():
rng = numpy.random.RandomState(1234)
r = rng.randn(5,5)
tensor.verify_grad(det, [r], rng=numpy.random)
......@@ -4792,7 +4792,7 @@ outer = Outer()
#########################
def grad(cost, wrt, g_cost=None, consider_constant=[], warn_type=False,
assume_continuously_differentiable = False):
disconnected_inputs='raise'):
"""
:type cost: Scalar (0-dimensional) `Variable`
:type wrt: `Variable` or list of `Variable`s.
......@@ -4804,13 +4804,13 @@ def grad(cost, wrt, g_cost=None, consider_constant=[], warn_type=False,
:param warn_type: a value of True will cause warnings to be logged for any Op that emits a
gradient that does not match its input type.
:param assume_continuously_differentiable : flag that says if grad is strict about what it returns.
If set to false it will raise an exception for any argument in
``wrt`` for which there is no gradient either because some op does
not know how to compute the gradient with respect to that argument
or the argument is not part of the computational graph. If the flag
is set to true, the ``grad`` method returns zeros like the argument
( i.e. it makes the assumption that the gradient should be 0).
:type disconnected_inputs: string
:param disconnected_inputs: Defines the behaviour if some of the variables
in ``wrt`` are not part of the computational graph computing ``cost``
(or if all links are non-differentiable). The possible values are:
- 'ignore': considers that the gradient on these parameters is zero.
- 'warn': consider the gradient zero, and print a warning.
- 'raise': raise an exception.
:rtype: `Variable` or list of `Variable`s (depending upon `wrt`)
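The three ``disconnected_inputs`` modes documented above can be sketched in isolation. This is a simplified stand-in: `gmap` represents the computed gradient map, and `0` stands in for the `zeros_like(p)` gradient returned for disconnected variables.

```python
import warnings

# Simplified dispatch for the 'disconnected_inputs' modes described
# above (function and message are illustrative, not Theano's code).
def resolve_grad(p, gmap, disconnected_inputs='raise'):
    if p in gmap:
        return gmap[p]
    message = ("grad was asked for a variable that is not part of the "
               "computational graph of the cost: %s" % p)
    if disconnected_inputs == 'ignore':
        pass
    elif disconnected_inputs == 'warn':
        warnings.warn(message, stacklevel=1)
    elif disconnected_inputs == 'raise':
        raise ValueError(message)
    else:
        raise ValueError("Invalid value for keyword 'disconnected_inputs', "
                         "valid values are 'ignore', 'warn' and 'raise'.")
    return 0  # placeholder for zeros_like(p)

assert resolve_grad('x', {'x': 42}) == 42
assert resolve_grad('y', {}, 'ignore') == 0
```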
......@@ -4853,13 +4853,24 @@ def grad(cost, wrt, g_cost=None, consider_constant=[], warn_type=False,
wrt = [wrt]
ret = []
for p in wrt:
if p not in gmap and not assume_continuously_differentiable:
raise ValueError(("grad method was asked to compute the gradient "
"with respect to a variable that is not part of "
"the computational graph of the cost, or is used "
"by a non-differentiable operator"), p)
if p in gmap:
ret.append(gmap[p])
else:
ret.append(gmap.get(p, zeros_like(p)))
message = ("grad method was asked to compute the gradient "
"with respect to a variable that is not part of "
"the computational graph of the cost, or is used "
"only by a non-differentiable operator: %s" % p)
if disconnected_inputs == 'ignore':
pass
elif disconnected_inputs == 'warn':
warnings.warn(message, stacklevel=1)
elif disconnected_inputs == 'raise':
raise ValueError(message)
else:
raise ValueError("Invalid value for keyword "
"'disconnected_inputs', valid values are "
"'ignore', 'warn' and 'raise'.")
ret.append(zeros_like(p))
if len(ret) == 1:
return ret[0]
......@@ -5134,7 +5145,7 @@ def verify_grad(fun, pt, n_tests=2, rng=None, eps=None, abs_tol=None, rel_tol=No
g_cost = cast(g_cost, o_output.dtype)
symbolic_grad = grad(cost, tensor_pt, g_cost,
assume_continuously_differentiable = True)
disconnected_inputs='ignore')
#if o_output.dtype in ['float32','float64']:
# assert all([x.dtype == o_output.dtype for x in symbolic_grad]),("Expected grad of type %s, got %s "%( symbolic_grad.dtype, o_output.dtyp))
......
......@@ -3346,7 +3346,7 @@ class test_grad(unittest.TestCase):
o = test_grad.O()
a1 = o.make_node()
g = grad(a1.outputs[0], a1.outputs[1],
assume_continuously_differentiable = True)
disconnected_inputs='ignore')
self.assertTrue(g.owner.op == fill)
self.assertTrue(g.owner.inputs[1].data == 0)
self.assertRaises(ValueError, grad, a1.outputs[0], 'wtf')
......@@ -3356,7 +3356,7 @@ class test_grad(unittest.TestCase):
o = test_grad.O()
a1 = o.make_node()
g0,g1,g2 = grad(a1.outputs[0], a1.inputs + [scalar('z')],
assume_continuously_differentiable = True)
disconnected_inputs='ignore')
self.assertTrue(o.gval0 is g0)
self.assertTrue(o.gval1 is g1)
self.assertTrue(g2.owner.op == fill)
......@@ -3366,7 +3366,7 @@ class test_grad(unittest.TestCase):
"""Ensure that a zero gradient has the proper shape."""
x = dmatrix()
f = theano.function([x], grad(dscalar(), x,
assume_continuously_differentiable= True))
disconnected_inputs='ignore'))
a = numpy.ones((3, 7))
self.assertTrue((f(a) == 0).all()) # Zero gradient.
self.assertTrue(a.shape == f(a).shape) # With proper shape.
......
......@@ -2674,9 +2674,9 @@ def test_make_vector():
s = mv.sum()
gb = T.grad(s, b, assume_continuously_differentiable=True)
gi = T.grad(s, i, assume_continuously_differentiable=True)
gd = T.grad(s, d, assume_continuously_differentiable=True)
gb = T.grad(s, b, disconnected_inputs='ignore')
gi = T.grad(s, i, disconnected_inputs='ignore')
gd = T.grad(s, d, disconnected_inputs='ignore')
#print 'gb =', gb
#print 'gi =', gi
#print 'gd =', gd
......