Commit f09168ed authored by Olivier Delalleau

Merged

@@ -3,37 +3,72 @@ Modifications in the trunk since the last release
Partial list of what is in the trunk since the last release
-----------------------------------------------------------
Deprecation:
* tag.shape attribute deprecated (#633)
* FAST_RUN_NOGC mode deprecated
* CudaNdarray_new_null is deprecated in favour of CudaNdarray_New
Bugs fixed:
* Bugfix in CudaNdarray.__iadd__: when the operation is not implemented, the error is now returned.
* Typo fixed in tensor/opt.py
* THEANO_FLAGS='optimizer=None' now works as expected
* Fixed memory leak in error handling on GPU-to-host copy
* Fix relating specifically to Python 2.7 on Mac OS X
* infer_shape can now handle Python longs
* Fixed behaviour of pydotprint's max_label_size option
Crashes fixed:
* Work around a bug in gcc 4.3.0 that made the compilation of the 2d convolution crash.
Optimization:
* Optimize 4 patterns of subtensor followed by subtensor.
* Gemm inplace optimization on the GPU re-enabled
GPU:
* Move fused elemwise ops that contain dtypes other than float32 (except float64) to the GPU, as long as the inputs and outputs are float32.
  * This allows moving elemwise comparisons to the GPU when their result is cast back to float32 (see the sketch after this list).
* Implemented CudaNdarray.ndim to have the same interface as ndarray.
* Fixed slowdown caused by multiple chained views on CudaNdarray objects
* CudaNdarray_alloc_contiguous changed so as to never try to free
memory on a view: new "base" property
* Safer decref behaviour in CudaNdarray in case of failed allocations
* New GPU implementation of tensor.basic.outer
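To illustrate the comparison item above, here is a minimal sketch of the casting pattern; it assumes a working GPU setup with float32 inputs (e.g. floatX=float32, device=gpu) and only shows the user-side idiom, not this commit's implementation. Whether the expression actually lands on the GPU depends on the flags and the optimizer:

.. code-block:: python

    import theano
    import theano.tensor as T

    x = T.fmatrix('x')
    y = T.fmatrix('y')
    # The comparison itself is not a float32 elemwise, but casting its result
    # back to float32 lets the elemwise fusion move the whole expression to the GPU.
    mask = T.cast(T.gt(x, y), 'float32')
    f = theano.function([x, y], mask * x)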
New features:
* ProfileMode
  * profile the scan overhead
  * simple hook system to add profilers
  * reordered the output from more general to more specific
* var[vector of indices] now works (the grad works recursively, the direct grad works inplace, and it works on the GPU); see the sketch after this list.
  * Limitation: it works only on the outermost dimension.
* test_value implementation to allow quick debugging at graph creation time
* cuda.root inferred if nvcc is on the path, otherwise defaults to
/usr/local/cuda
* Better graph printing for graphs involving a scan subgraph
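A minimal sketch of the vector-indexing feature mentioned in the list above (assuming a standard Theano install; only the outermost dimension may be indexed this way):

.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T

    m = T.dmatrix('m')
    idx = T.lvector('idx')          # vector of integer indices
    rows = m[idx]                   # selects a subset of rows, possibly repeated
    f = theano.function([m, idx], rows)
    f(numpy.arange(12.).reshape(3, 4), [2, 0, 2])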
Documentation:
* Better commenting of cuda_ndarray.cu
* Fixes in the scan documentation: add missing declarations/print statements
* Better error message on failed __getitem__
* Updated documentation on profile mode
Unit tests:
* Stricter float comparison by default
* Reuse the subtensor tests of tensor for the GPU tensor (more GPU tests)
* Tests that check for aliased function inputs and assure appropriate copying
(#374)
* Better test of copies in CudaNdarray
* New tests relating to the new base pointer requirements
Other:
* (Possibly a bugfix) Correctly set the broadcast flag to True in the output variable of a Reshape op when an int 1 is given in the new shape.
* pydotprint: high contrast mode is now the default
* More compact printing (ignore leading "Composite" in op names)
Theano 0.3.1 (2011-02-21)
-------------------------
......
.. _developer:

==============================================
Theano Design and Implementation Documentation
==============================================

.. toctree::
......
@@ -7,7 +7,7 @@ Tensor
This file describes the design of theano.tensor.

Elemwise grad and R_op
======================

Here's another straightforward example, though a bit more elaborate
than adding two numbers together. Let's say that you want to compute
......
@@ -557,7 +557,7 @@ used within a MinGW Shell (not available if you only installed Python(x,y)).
You do not need to do the following now, because it is not usually needed, but if
later on, when running Theano, you see an error message that looks like:
*error: 'assert' was not declared in this scope*
then you will have to add another section:

.. code-block:: cfg
......
@@ -728,11 +728,11 @@ row of a matrix x:
Index-assignment is *not* supported. If you want to do something like ``a[5]
= b`` or ``a[5]+=b``, see :func:`set_subtensor` and :func:`inc_subtensor` below.

.. autofunction:: theano.tensor.basic.set_subtensor
.. autofunction:: theano.tensor.basic.inc_subtensor
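For illustration, a minimal sketch of how these two functions are typically used (assuming the usual ``theano.tensor`` import; the values are arbitrary):

.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T

    a = T.dvector('a')
    b = T.dscalar('b')
    a_set = T.set_subtensor(a[5], b)   # symbolic copy of a with a[5] replaced by b
    a_inc = T.inc_subtensor(a[5], b)   # symbolic copy of a with b added to a[5]
    f = theano.function([a, b], [a_set, a_inc])
    f(numpy.arange(10.), 100.0)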
.. _tensor_operator_support:
......
@@ -112,7 +112,7 @@ Misc
----
The sparse equivalent of dmatrix is csc_matrix and csr_matrix.

:api:`Dot` vs. :api:`StructuredDot`
----------------------------------------
Often when you use a sparse matrix it is because there is a meaning to the
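As a purely illustrative sketch of the constructors named above (the exact ``theano.sparse`` call signatures are an assumption here, not something this commit documents):

.. code-block:: python

    import theano
    import theano.sparse as sparse
    import theano.tensor as T

    x = sparse.csc_matrix('x')           # sparse analogue of dmatrix
    d = T.dmatrix('d')
    y = sparse.structured_dot(x, d)      # dot product that exploits the sparsity of x
    f = theano.function([x, d], y)       # f expects a scipy.sparse CSC matrix for x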
......
@@ -1207,54 +1207,21 @@ class OpWiseCLinker(link.LocalLinker):
         else:
             post_thunk_old_storage = None
+        compute_map = {}
+        for k in storage_map:
+            compute_map[k] = [k.owner is None]
         thunks = []
+        for node in order:
+            # Maker sure we use the C version of the code whenever
+            # possible
+            node._op_use_c_code = True
+            thunks += [node.op.make_thunk(node,
+                                          storage_map,
+                                          compute_map,
+                                          no_recycling)]
         for node_idx, node in enumerate(order):
-            node_input_storage = [storage_map[r] for r in node.inputs]
-            node_output_storage = [storage_map[r] for r in node.outputs]
-            debug('Compiling node %i of graph' % node_idx)
-            thunk = None
-            # If the op don't override the c_code function, we don't try
-            # to generate a cthunk! Otherwise we won't find it in the compilation cache
-            # and try to compile it. This will get the lock even if we don't need it!
-            if node.op.c_code.im_func is not op.Op.c_code.im_func:
-                try:
-                    e = Env(*graph.clone(node.inputs, node.outputs))
-                    if self.allow_gc:
-                        # if we allow garbage collection of intermediate nodes
-                        # we must forbid this C implementatio from cacheing its own
-                        # reference to its output
-                        node_no_recycling = e.outputs
-                    else:
-                        node_no_recycling = [r for r, r2 in zip(e.outputs, node.outputs) if r2 in no_recycling]
-                    cl = CLinker().accept(e, node_no_recycling)
-                    debug('Trying CLinker.make_thunk')
-                    thunk, node_input_filters, node_output_filters = cl.make_thunk(
-                        input_storage = node_input_storage,
-                        output_storage = node_output_storage,
-                        keep_lock=getattr(get_lock,"n_lock",0) != orig_n_lock)
-                    assert callable(thunk)
-                    thunk.inputs = node_input_storage
-                    thunk.outputs = node_output_storage
-                    thunks.append(thunk)
-                    do_python_thunk = False
-                except (NotImplementedError, utils.MethodNotDefined):
-                    thunk = None
-            if thunk is None:
-                if self.fallback_on_perform:
-                    debug('Falling back on perform')
-                    p = node.op.perform
-                    # default arguments are stored in the closure of `thunk`
-                    def thunk(p=p, i=node_input_storage, o=node_output_storage,n=node):
-                        return p(n, [x[0] for x in i], o)
-                    #thunk = lambda p = p, i = node_input_storage, o = node_output_storage, n = node: p(n, [x[0] for x in i], o)
-                    thunk.inputs = node_input_storage
-                    thunk.outputs = node_output_storage
-                    thunk.perform = p
-                    thunks.append(thunk)
-                else:
-                    raise NotImplementedError("We where not able to use c_code and perform code for this node", node)
             if self.allow_gc:
                 post_thunk_old_storage.append([storage_map[input]
......
@@ -6,7 +3,6 @@ from type import Type
 import sys, traceback
 from copy import copy
 from theano.gof.python25 import all
-import numpy
 __excepthook = sys.excepthook
 def thunk_hook(type, value, trace):
@@ -329,7 +328,7 @@ class LocalLinker(Linker):
         # 3. output storage
         # 4. thunks: list of nodes' functions in the order they will be run by the function in (1)
         # 5. order: list of nodes, in the order they will be run by the function in (1)
-        raise MethodNotDefined("make_all", type(self), self.__class__.__name__)
+        raise utils.MethodNotDefined("make_all", type(self), self.__class__.__name__)
 def gc_helper(node_list):
     """
@@ -391,10 +390,23 @@ class PerformLinker(LocalLinker):
         order = list(env.toposort())
         no_recycling = self.no_recycling
-        thunks = []
         input_storage, output_storage, storage_map = map_storage(env, order, input_storage, output_storage)
+        compute_map = {}
+        for k in storage_map:
+            compute_map[k] = [k.owner is None]
+        thunks = []
+        for node in order:
+            # Maker sure we don't use C version of the code, but rather only
+            # the python version
+            node._op_use_c_code = False
+            thunks += [node.op.make_thunk(node,
+                                          storage_map,
+                                          compute_map,
+                                          no_recycling)]
         computed, last_user = gc_helper(order)
         if self.allow_gc:
             post_thunk_old_storage = []
@@ -402,18 +414,6 @@ class PerformLinker(LocalLinker):
             post_thunk_old_storage = None
         for node in order:
-            node_input_storage = tuple(storage_map[input] for input in node.inputs)
-            node_output_storage = tuple(storage_map[output] for output in node.outputs)
-            p = node.op.perform
-            # Thunk is meant to be called without arguments.
-            # The arguments are given in the lambda expression so that they are saved in the lambda expression.
-            # Using the closure in a simple way didn't work.
-            thunk = lambda p = p, i = node_input_storage, o = node_output_storage, n = node: p(n, [x[0] for x in i], o)
-            thunk.inputs = node_input_storage
-            thunk.outputs = node_output_storage
-            thunk.perform = p
-            thunks.append(thunk)
             if self.allow_gc:
                 post_thunk_old_storage.append([storage_map[input]
                                                for input in node.inputs
......
@@ -2,8 +2,6 @@
 The `Op` class is the base interface for all operations
 compatible with `gof`'s :doc:`graph` routines.
 """
 __docformat__ = "restructuredtext en"
@@ -12,6 +10,11 @@ from theano import config
 import graph
 import numpy
 import utils
+import logging
+from theano import config
+from env import Env
+import graph
+import cc
 class CLinkerObject(object):
@@ -331,21 +334,22 @@ class PureOp(object):
         # build test input-values
         input_vals = []
-        for ins in node.inputs:
+        for i, ins in enumerate(node.inputs):
             if isinstance(ins, graph.Constant):
                 input_vals.append(ins.value)
             elif isinstance(ins,SharedVariable):
-                input_vals.append(ins.get_value(borrow=True))
+                input_vals.append(ins.get_value(borrow=True, return_internal_type=True))
             elif isinstance(ins,graph.Variable) and hasattr(ins.tag, 'test_value'):
                 # ensure that the test value is correct
                 input_vals.append(ins.type.filter(ins.tag.test_value))
             else:
                 # no test-value was specified, act accordingly
                 if config.compute_test_value == 'warn':
-                    raise Warning('Cannot compute test value: input %s of Op %s missing default value')
+                    # TODO: use warnings.warn, http://docs.python.org/library/warnings.html#warnings.warn
+                    print >>sys.stderr, ('Warning, Cannot compute test value: input %i (%s) of Op %s missing default value' % (i, ins, node))
                     run_perform = False
                 elif config.compute_test_value == 'err':
-                    raise ValueError('Cannot compute test value: input %s of Op %s missing default value')
+                    raise ValueError('Cannot compute test value: input %i (%s) of Op %s missing default value' % (i, ins, node))
                 else:
                     # silently skip test
                     run_perform = False
@@ -355,12 +359,23 @@ class PureOp(object):
         # compute output value once with test inputs to validate graph
         output_storage = [[None]] * len(node.outputs)
-        node.op.perform(node, input_vals, output_storage)
-        # add 'test_value' to output tags, so that downstream ops can use these
-        # numerical values as inputs to their perform method.
-        for (outval, node_output) in zip(output_storage, node.outputs):
-            node_output.tag.test_value = outval[0]
+        try:
+            node.op.perform(node, input_vals, output_storage)
+            # add 'test_value' to output tags, so that downstream ops can use these
+            # numerical values as inputs to their perform method.
+            for (outval, node_output) in zip(output_storage, node.outputs):
+                node_output.tag.test_value = outval[0]
+        except utils.MethodNotDefined, e:
+            # This case happens when the perform method is not defined
+            # for a certain Op.
+            #TODO: use the c_thunk?
+            if config.compute_test_value == 'warn':
+                # TODO: use warnings.warn
+                print >>sys.stderr, 'Warning, in compute_test_value:', type(e)
+                print >>sys.stderr, e
+            elif config.compute_test_value == 'err':
+                raise
         if self.default_output is not None:
             return node.outputs[self.default_output]
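For context, the compute_test_value machinery above is driven from user code roughly as follows; a minimal sketch, assuming the ``theano.config.compute_test_value`` flag introduced with this feature:

.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T

    theano.config.compute_test_value = 'warn'   # 'err' raises instead of warning

    x = T.dmatrix('x')
    x.tag.test_value = numpy.zeros((3, 4))      # test value attached to the input

    # Each op's perform() runs on the test values as the graph is built,
    # so shape and type mistakes surface immediately at graph-creation time.
    z = T.dot(x, x.T)
    print z.tag.test_value.shape                # -> (3, 3)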
@@ -404,4 +419,77 @@ class PureOp(object):
 class Op(utils.object2, PureOp, CLinkerOp):
     """Convenience class to bundle `PureOp` and `CLinkerOp`"""
-    pass
+    def __new__(cls, *args, **kwargs):
+        # this function exists to silently and transparently ensure that all
+        # existing Ops get a _op_use_c_code attribute
+        obj = object.__new__(cls, *args, **kwargs)
+        if not hasattr(obj, '_op_use_c_code'):
+            obj._op_use_c_code = True
+        return obj
+
+    def __init__(self, use_c_code=True):
+        self._op_use_c_code = use_c_code
+
+    def make_thunk(self, node, storage_map, compute_map, no_recycling):
+        """
+        :param node: something previously returned by self.make_node
+
+        :param storage_map: dict variable -> one-element-list where a computed
+                value for this variable may be found.
+
+        :param compute_map: dict variable -> one-element-list where a boolean
+                value will be found. The boolean indicates whether the
+                variable's storage_map container contains a valid value (True)
+                or if it has not been computed yet (False).
+
+        :param no_recycling: list of variables for which it is forbidden to
+                reuse memory allocated by a previous call.
+        """
+        logger = logging.getLogger('theano.Op')
+
+        node_input_storage = [storage_map[r] for r in node.inputs]
+        node_output_storage = [storage_map[r] for r in node.outputs]
+        node_input_compute = [compute_map[r] for r in node.inputs]
+        node_output_compute = [compute_map[r] for r in node.outputs]
+        #logger.debug('Compiling node %i of graph' % node_idx)
+
+        if self._op_use_c_code:
+            try:
+                e = Env(*graph.clone(node.inputs, node.outputs))
+                e_no_recycling = [new_o
+                                  for (new_o, old_o) in zip(e.outputs, node.outputs)
+                                  if old_o in no_recycling]
+                cl = cc.CLinker().accept(e,
+                                         no_recycling=e_no_recycling)
+
+                logger.debug('Trying CLinker.make_thunk')
+                fill_storage, node_input_filters, node_output_filters = cl.make_thunk(
+                    input_storage = node_input_storage,
+                    output_storage = node_output_storage)
+
+                def rval():
+                    fill_storage()
+                    for o in node.outputs:
+                        compute_map[o][0] = True
+                rval.cthunk = fill_storage.cthunk
+                rval.inputs = node_input_storage
+                rval.outputs = node_output_storage
+                rval.lazy = False
+                return rval
+            except (NotImplementedError, utils.MethodNotDefined):
+                logger.debug('Falling back on perform')
+
+        # condition: either there was no c_code, or it failed
+        p = node.op.perform
+        # default arguments are stored in the closure of `rval`
+        def rval(p=p, i=node_input_storage, o=node_output_storage, n=node):
+            r = p(n, [x[0] for x in i], o)
+            for o in node.outputs:
+                compute_map[o][0] = True
+            return r
+        rval.inputs = node_input_storage
+        rval.outputs = node_output_storage
+        rval.perform = p
+        rval.lazy = False
+        return rval
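To make the thunk contract above concrete, here is a hypothetical, minimal sketch of how a linker could drive the new make_thunk API, mirroring what PerformLinker does in this commit; the graph and values are illustrative only, and the C path silently falls back on perform if no compiler is available:

.. code-block:: python

    import numpy
    import theano.tensor as T

    x = T.dscalar('x')
    y = T.dscalar('y')
    z = x + y
    node = z.owner                          # the Apply node of the addition

    # one-element lists act as storage cells, as described in the docstring
    storage_map = dict((v, [None]) for v in node.inputs + node.outputs)
    compute_map = dict((v, [v.owner is None]) for v in storage_map)

    thunk = node.op.make_thunk(node, storage_map, compute_map, no_recycling=[])

    storage_map[x][0] = numpy.asarray(1.5)  # fill the input cells
    storage_map[y][0] = numpy.asarray(2.5)
    thunk()                                 # fills storage_map[z], flips compute_map[z][0]
    assert storage_map[z][0] == 4.0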
from ops import (cholesky, matrix_inverse, solve,
                 diag, extract_diag, alloc_diag,
                 det, PSD_hint,
                 trace, spectral_radius_bound)
(Diff collapsed; contents not shown.)
import numpy
from theano import tensor, function
from .ops import *

if 0:
    def test_cholesky():
        #TODO: test upper and lower triangular
        #todo: unittest randomseed
        rng = numpy.random.RandomState(1234)
        r = rng.randn(5,5)
        pd = numpy.dot(r,r.T)
        x = tensor.matrix()
        chol = Cholesky()(x)
        f = function([x], tensor.dot(chol, chol.T)) # an optimization could remove this
        ch_f = function([x], chol)

        # quick check that chol is upper-triangular
        ch = ch_f(pd)
        print ch
        assert ch[0,4] != 0
        assert ch[4,0] == 0
        assert numpy.allclose(numpy.dot(ch.T,ch),pd)
        assert not numpy.allclose(numpy.dot(ch,ch.T),pd)

def test_inverse_correctness():
    #todo: unittest randomseed
    rng = numpy.random.RandomState(12345)
    r = rng.randn(4,4)
    x = tensor.matrix()
    xi = matrix_inverse(x)
    ri = function([x], xi)(r)
    assert ri.shape == r.shape
    assert ri.dtype == r.dtype
    rir = numpy.dot(ri,r)
    rri = numpy.dot(r,ri)
    assert numpy.allclose(numpy.identity(4), rir), rir
    assert numpy.allclose(numpy.identity(4), rri), rri

def test_inverse_grad():
    rng = numpy.random.RandomState(1234)
    r = rng.randn(4,4)
    tensor.verify_grad(matrix_inverse, [r], rng=numpy.random)

def test_det_grad():
    rng = numpy.random.RandomState(1234)
    r = rng.randn(5,5)
    tensor.verify_grad(det, [r], rng=numpy.random)
@@ -4792,7 +4792,7 @@ outer = Outer()
 #########################
 def grad(cost, wrt, g_cost=None, consider_constant=[], warn_type=False,
-         assume_continuously_differentiable = False):
+         disconnected_inputs='raise'):
     """
     :type cost: Scalar (0-dimensional) `Variable`
     :type wrt: `Variable` or list of `Variable`s.
@@ -4804,13 +4804,13 @@ def grad(cost, wrt, g_cost=None, consider_constant=[], warn_type=False,
     :param warn_type: a value of True will cause warnings to be logged for any Op that emits a
         gradient that does not match its input type.
-    :param assume_continuously_differentiable : flag that says if grad is strict about what it returns.
-        If set to false it will raise an exception for any argument in
-        ``wrt`` for which there is no gradient either because some op does
-        not know how to compute the gradient with respect to that argument
-        or the argument is not part of the computational graph. If the flag
-        is set to true, the ``grad`` method returns zeros like the argument
-        ( i.e. it makes the assumption that the gradient should be 0).
+    :type disconnected_inputs: string
+    :param disconnected_inputs: Defines the behaviour if some of the variables
+        in ``wrt`` are not part of the computational graph computing ``cost``
+        (or if all links are non-differentiable). The possible values are:
+        - 'ignore': considers that the gradient on these parameters is zero.
+        - 'warn': consider the gradient zero, and print a warning.
+        - 'raise': raise an exception.
     :rtype: `Variable` or list of `Variable`s (depending upon `wrt`)
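A minimal sketch of the new keyword in action; ``x`` is deliberately left out of the graph of ``cost`` (the variable names are illustrative):

.. code-block:: python

    import theano.tensor as T

    x = T.dscalar('x')
    y = T.dscalar('y')
    cost = y ** 2                    # x plays no role in cost

    gx = T.grad(cost, x, disconnected_inputs='ignore')  # zeros_like(x), silently
    gx = T.grad(cost, x, disconnected_inputs='warn')    # same value, plus a warning
    # T.grad(cost, x)  # the default, 'raise', would raise a ValueError here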
@@ -4853,13 +4853,24 @@ def grad(cost, wrt, g_cost=None, consider_constant=[], warn_type=False,
         wrt = [wrt]
     ret = []
     for p in wrt:
-        if p not in gmap and not assume_continuously_differentiable:
-            raise ValueError(("grad method was asked to compute the gradient "
-                "with respect to a variable that is not part of "
-                "the computational graph of the cost, or is used "
-                "by a non-differentiable operator"), p)
+        if p in gmap:
+            ret.append(gmap[p])
         else:
-            ret.append(gmap.get(p, zeros_like(p)))
+            message = ("grad method was asked to compute the gradient "
+                    "with respect to a variable that is not part of "
+                    "the computational graph of the cost, or is used "
+                    "only by a non-differentiable operator: %s" % p)
+            if disconnected_inputs == 'ignore':
+                pass
+            elif disconnected_inputs == 'warn':
+                warnings.warn(message, stacklevel=1)
+            elif disconnected_inputs == 'raise':
+                raise ValueError(message)
+            else:
+                raise ValueError("Invalid value for keyword "
+                        "'disconnected_inputs', valid values are "
+                        "'ignore', 'warn' and 'raise'.")
+            ret.append(zeros_like(p))
     if len(ret) == 1:
         return ret[0]
@@ -5134,7 +5145,7 @@ def verify_grad(fun, pt, n_tests=2, rng=None, eps=None, abs_tol=None, rel_tol=No
         g_cost = cast(g_cost, o_output.dtype)
     symbolic_grad = grad(cost, tensor_pt, g_cost,
-                         assume_continuously_differentiable = True)
+                         disconnected_inputs='ignore')
     #if o_output.dtype in ['float32','float64']:
     #    assert all([x.dtype == o_output.dtype for x in symbolic_grad]),("Expected grad of type %s, got %s "%( symbolic_grad.dtype, o_output.dtyp))
......
@@ -3346,7 +3346,7 @@ class test_grad(unittest.TestCase):
         o = test_grad.O()
         a1 = o.make_node()
         g = grad(a1.outputs[0], a1.outputs[1],
-                 assume_continuously_differentiable = True)
+                 disconnected_inputs='ignore')
         self.assertTrue(g.owner.op == fill)
         self.assertTrue(g.owner.inputs[1].data == 0)
         self.assertRaises(ValueError, grad, a1.outputs[0], 'wtf')
@@ -3356,7 +3356,7 @@ class test_grad(unittest.TestCase):
         o = test_grad.O()
         a1 = o.make_node()
         g0,g1,g2 = grad(a1.outputs[0], a1.inputs + [scalar('z')],
-                        assume_continuously_differentiable = True)
+                        disconnected_inputs='ignore')
         self.assertTrue(o.gval0 is g0)
         self.assertTrue(o.gval1 is g1)
         self.assertTrue(g2.owner.op == fill)
@@ -3366,7 +3366,7 @@ class test_grad(unittest.TestCase):
         """Ensure that a zero gradient has the proper shape."""
         x = dmatrix()
         f = theano.function([x], grad(dscalar(), x,
-                                      assume_continuously_differentiable= True))
+                                      disconnected_inputs='ignore'))
         a = numpy.ones((3, 7))
         self.assertTrue((f(a) == 0).all())  # Zero gradient.
         self.assertTrue(a.shape == f(a).shape)  # With proper shape.
......
@@ -2674,9 +2674,9 @@ def test_make_vector():
     s = mv.sum()
-    gb = T.grad(s, b, assume_continuously_differentiable=True)
-    gi = T.grad(s, i, assume_continuously_differentiable=True)
-    gd = T.grad(s, d, assume_continuously_differentiable=True)
+    gb = T.grad(s, b, disconnected_inputs='ignore')
+    gi = T.grad(s, i, disconnected_inputs='ignore')
+    gd = T.grad(s, d, disconnected_inputs='ignore')
     #print 'gb =', gb
     #print 'gi =', gi
     #print 'gd =', gd
......