Commit ed61933a authored by abalkin

Merge remote-tracking branch 'upstream/master' into issue-1080

.. _NEWS:
Updates in the Trunk since the last release:
=============
Release Notes
=============
Theano 0.6rc2 (November 21st, 2012)
===================================
Highlights:
* Fix a few regressions introduced in 0.6rc1.
* A few new features.
* Speed ups.
* Scan fixes.
* Crash fixes.
* A few small interface changes.
Committers for this rc2 only:
Razvan Pascanu
Pascal Lamblin
Frederic Bastien
Ian Goodfellow
Jeremiah Lowin
Caglar Gulcehre
Jey Kottalam
Matthew Rocklin
abalkin
Regressions in 0.6rc1 fixed:
* Fix the scan gradient dtype issue. In 0.6rc1, some upcasts were inserted. (Razvan P.)
* grad() now behaves as it did before 0.6rc1 for floats, i.e. the grad dtype will be the same as the input dtypes inside the graph. If you ask for the gradient directly, it will return the computed dtype. (Pascal L.)
Wrong results fixed:
* Fixed Scan returning wrong results in some cases. (Razvan P., reported by Jeremiah L.)
This happened if you had a state with only negative taps whose output is a function of some sequence.
If you had multiple states, there was no problem.
* Fixed bug in Scan with multiple outputs,
where one output would sometimes overwrite another one. (Razvan P.)
* Clip.grad treated the gradient with respect to the clipping boundary as always 0. (Ian G.)
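The Clip.grad fix can be illustrated with a plain finite-difference check (a hypothetical pure-Python sketch, not Theano code): when the input is clipped at a bound, that bound does receive a nonzero gradient.

```python
# Hypothetical finite-difference sketch of the Clip.grad fix:
# when x is clipped at the upper bound, the gradient with respect
# to that bound is 1, not 0 as the old implementation assumed.
def clip(x, lo, hi):
    return max(lo, min(x, hi))

eps = 1e-6
x, lo, hi = 5.0, 0.0, 2.0            # x is clipped to hi
d_hi = (clip(x, lo, hi + eps) - clip(x, lo, hi)) / eps
d_lo = (clip(x, lo + eps, hi) - clip(x, lo, hi)) / eps
print(round(d_hi, 3), round(d_lo, 3))  # 1.0 0.0
```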
Interface changes:
* We no longer support unaligned ndarrays in Python code. (Frederic B.)
We did not support them in C code, and supporting them in Python code made
the detection harder.
* We now officially support only scipy 0.7.2 and numpy 1.5.0. (Frederic B.)
We weren't and aren't testing with older versions.
* The theano.sparse.SparseType is available even when scipy is not (Frederic B.)
* Fixes issue where members of consider_constant grad parameter
were treated differently from Constant variables. (Ian G.)
* Removed the g_cost parameter of theano.grad(). (Ian G.)
Use the new, more powerful known_grads parameter instead.
NumPy interface support:
* theano.tensor.where is an alias for theano.tensor.switch to support NumPy semantics. (Ian G.)
* TensorVariable objects now have dot, argmin, argmax, clip, conj, repeat, trace, std, round,
ravel and argsort methods and the real and imag properties, like numpy.ndarray objects.
The functionality was already available in Theano. (abalkin)
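The NumPy semantics that the where/switch alias follows, choosing elementwise between two branches based on a condition, can be sketched in plain Python; the helper below is an illustrative stand-in, not the Theano API:

```python
# Hypothetical elementwise sketch of numpy.where / theano.tensor.switch
# semantics: pick from `ift` where the condition holds, else from `iff`.
def where(cond, ift, iff):
    return [a if c else b for c, a, b in zip(cond, ift, iff)]

result = where([True, False, True], [1, 2, 3], [10, 20, 30])
print(result)  # [1, 20, 3]
```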
Speed ups:
* A C version of the SoftMax op (Razvan P.)
There was already C code for softmax with bias.
* Faster GpuIncSubtensor (Ian G.)
* Faster copy on the GPU for 4d tensor. (Ian G.)
* The fix of flatten's infer_shape re-enables an optimization (Pascal L.)
The bug was introduced in 0.6rc1.
* Enable inc_subtensor on the GPU when updating it with a float64 dtype. (Ian G.)
It was causing an optimization warning.
* Make DeepCopy reuse preallocated memory. (Frederic B.)
* Move the convolution to the GPU when the image shape and the logical image shape differ. (Frederic Bastien)
* C code for the View Op (Razvan P., Pascal L.)
New Features:
* Added a monitoring mode "MonitorMode" as a debugging tool. (Olivier D.)
* Allow integer axes when keepdims==True (Jeremiah Lowin)
* Add erfinv and erfcinv op. (Jey Kottalam)
* Added tensor.batched_dot(). (Caglar Gulcehre)
It uses scan behind the scenes, but makes doing this easier.
* theano.get_constant_value(x) (Frederic B.)
This tries to obtain x as a constant int.
It does some constant folding to try to convert x into an int.
Used by some optimizations.
* Add theano.tensor.io.{MPIRecv,MPIRecvWait,MPISend,MPISendWait} (Matthew Rocklin)
Theano does not use them automatically. It is up to you to use them and split your computation.
* Added theano.sandbox.linalg.eig (abalkin)
* Started some support for Python 3 (abalkin)
setup.py supports Python 3 now.
It calls 2to3 during the setup.
Python 3 is not fully supported, as we didn't update the C code.
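For reference, batched_dot computes one matrix product per leading-axis slice; a minimal pure-Python sketch of that semantics (the function name mirrors Theano's, but this implementation is illustrative only):

```python
# Illustrative sketch of tensor.batched_dot semantics: for 3-d inputs
# A of shape (b, m, k) and B of shape (b, k, n), compute one (m, n)
# matrix product per slice along the leading (batch) axis.
def batched_dot(A, B):
    out = []
    for a, b in zip(A, B):        # loop over the batch axis
        rows, cols, inner = len(a), len(b[0]), len(b)
        out.append([[sum(a[i][t] * b[t][j] for t in range(inner))
                     for j in range(cols)] for i in range(rows)])
    return out

A = [[[1, 2]], [[3, 4]]]        # shape (2, 1, 2)
B = [[[1], [1]], [[1], [1]]]    # shape (2, 2, 1)
print(batched_dot(A, B))  # [[[3]], [[7]]]
```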
Crash Fixes:
* Fix a crash related to scan.grad due to the new mechanism. (Ian G.)
* Fix an optimization warning. Now it gets optimized. (Frederic B.)
* Fix crash introduced in 0.6rc1 in theano.grad (Ian G.)
* Fix crash introduced in 0.6rc1 in the grad of scan (Razvan P.)
* Fix crash introduced in 0.6rc1 in the grad of clip (Ian G.)
Also implement the gradient on the min/max bounds.
* Fix crash in the grad of tensor.switch for int (Ian G.)
* Fix crash when mixing shared variables on the GPU and sparse dot. (Pascal L.)
* Fix crash where sparse.dot would sometimes return a dtype number
that is equivalent but not the one expected. (Pascal L., reported by Rami Al-Rfou)
* Better error messages (Ian G.)
* Move all sparse random functions back to the sandbox, as they don't have a state inside Theano. (Pascal L.)
They were moved outside the sandbox in 0.6rc1.
* LoadFromDisk is now allowed to support only some memmap modes. (Pascal L.)
Otherwise, this was causing errors, segmentation faults or wrong results.
* Fix import problem on PiCloud (Jeremiah Lowin)
You need to use the c|py linker with the default
environment. Otherwise, you need to create your own environment.
* Fix a crash during optimization when we take a subtensor of a constant with a non-constant index. (Ian G.)
* Better handling and error messages for gradients on integers. (Ian G.)
* Fixes a crash where Scan assumed all TypeErrors raised by the grad function were due to undefined gradients (Ian G.)
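The LoadFromDisk memmap restriction above concerns the modes numpy exposes for mapping arrays from disk. A small sketch, assuming numpy is available (the temp-file path is made up for illustration); 'c' (copy-on-write) is the mode that leaves the file on disk untouched even if the in-memory view is modified:

```python
import os
import tempfile

import numpy as np

# Save a small array, then load it back memory-mapped in
# copy-on-write mode rather than reading it fully into memory.
path = os.path.join(tempfile.mkdtemp(), "arr.npy")
np.save(path, np.arange(6).reshape(2, 3))

loaded = np.load(path, mmap_mode="c")  # memory-mapped, copy-on-write
print(int(loaded.sum()))  # 15
```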
https://github.com/Theano/Theano/wiki/Devnews
Other:
* Doc typo fixes, Doc updates, Better error messages: Olivier D., David W.F., Frederic B., James B., Matthew Rocklin, Ian G., abalkin.
=============
Release Notes
......
......
@@ -72,7 +183,7 @@ Deprecation:
This was a predecessor of SharedVariable with a less pythonic philosophy.
Interface changes:
-* Now the base version requirements are numpy >= 1.5.0 and the optional scipy >= 0.8.
+* Now the base version requirements are numpy >= 1.5.0 and the optional scipy >= 0.7.2.
* In Theano 0.5, we removed the deprecated sharedvar.value property.
Now we raise an error if you access it. (Frederic B.)
* theano.function does not accept duplicate inputs, so function([x, x], ...)
......
......
@@ -53,7 +53,7 @@ copyright = '2008--2012, LISA lab'
# The short X.Y version.
version = '0.6'
# The full version, including alpha/beta/rc tags.
-release = '0.6rc1'
+release = '0.6rc2'
# There are two options for replacing |today|: either, you set today to some
# non-false value, then it is used:
......
......
@@ -55,7 +55,7 @@ PLATFORMS = ["Windows", "Linux", "Solaris", "Mac OS-X", "Unix"]
MAJOR = 0
MINOR = 6
MICRO = 0
-SUFFIX = "rc1"  # Should be blank except for rc's, betas, etc.
+SUFFIX = "rc2"  # Should be blank except for rc's, betas, etc.
ISRELEASED = False
VERSION = '%d.%d.%d%s' % (MAJOR, MINOR, MICRO, SUFFIX)
......
......
@@ -365,7 +365,7 @@ def grad(cost, wrt, consider_constant=None,
(or if all links are non-differentiable). The possible values are:
- 'ignore': considers that the gradient on these parameters is zero.
- 'warn': consider the gradient zero, and print a warning.
-- 'raise': raise an exception.
+- 'raise': raise DisconnectedInputError.
:type add_names: bool
:param add_names: If True, variables generated by grad will be named
......
@@ -468,7 +468,7 @@ def grad(cost, wrt, consider_constant=None,
'Ambiguous whether %s should be made into tensor'
' or sparse theano variable' % str(type(g_var)))
-if g_var.type not in [NullType, DisconnectedType] and 'float' \
+if not isinstance(g_var.type, (NullType, DisconnectedType)) and 'float' \
not in str(g_var.type.dtype):
raise TypeError("Gradients must always be NullType, "
"DisconnectedType, or continuous, but grad was "
......
@@ -482,28 +482,31 @@ def grad(cost, wrt, consider_constant=None,
grad_dict[var] = g_var
# variables that do not influence the cost have zero gradient.
# if wrt is such a variable, populate the grad_dict with this info
# so that wrt not being in var_to_node_to_idx won't cause an error below
# according to the flag, possibly raise an error if wrt is disconnected
for elem in wrt:
if elem not in var_to_node_to_idx and elem is not cost \
and elem not in grad_dict:
def handle_disconnected(var):
message = ("grad method was asked to compute the gradient "
"with respect to a variable that is not part of "
"the computational graph of the cost, or is used "
-"only by a non-differentiable operator: %s" % elem)
+"only by a non-differentiable operator: %s" % var)
if disconnected_inputs == 'ignore':
pass
elif disconnected_inputs == 'warn':
warnings.warn(message, stacklevel=2)
elif disconnected_inputs == 'raise':
-raise ValueError(message)
+raise DisconnectedInputError(message)
else:
raise ValueError("Invalid value for keyword "
"'disconnected_inputs', valid values are "
"'ignore', 'warn' and 'raise'.")
# variables that do not influence the cost have zero gradient.
# if wrt is such a variable, populate the grad_dict with this info
# so that wrt not being in var_to_node_to_idx won't cause an error below
# according to the flag, possibly raise an error if wrt is disconnected
for elem in wrt:
if elem not in var_to_node_to_idx and elem is not cost \
and elem not in grad_dict:
handle_disconnected(elem)
grad_dict[elem] = DisconnectedType()()
cost_name = None
......
@@ -523,6 +526,7 @@ def grad(cost, wrt, consider_constant=None,
for i in xrange(len(rval)):
if isinstance(rval[i].type, DisconnectedType):
handle_disconnected(rval[i])
if return_disconnected == 'zero':
rval[i] = _float_zeros_like(wrt[i])
elif return_disconnected == 'None':
......
@@ -719,7 +723,13 @@ class NullTypeGradError(TypeError):
"""
Raised when grad encounters a NullType.
"""
pass
class DisconnectedInputError(ValueError):
"""
Raised when grad is asked to compute the gradient
with respect to a disconnected input and
disconnected_inputs='raise'.
"""
def _populate_grad_dict(var_to_node_to_idx,
grad_dict, wrt, cost_name=None):
......
@@ -776,8 +786,42 @@ def _populate_grad_dict(var_to_node_to_idx,
input_to_outputs in connection_pattern
]
if True in inputs_connected:
# At least one input of this op is connected to the cost so we must
#List of bools indicating if each output is an integer dtype
output_is_int = [hasattr(output.type, 'dtype') and
output.type.dtype in theano.tensor.discrete_dtypes
for output in node.outputs]
#List of bools indicating if each output is NullType
ograd_is_nan = [isinstance(output.type, NullType)
for output in output_grads]
# List of bools indicating if each input only has NullType outputs
only_connected_to_nan = [(True not in
[in_to_out and out_to_cost and not out_nan
for in_to_out, out_to_cost, out_nan in
zip(in_to_outs, outputs_connected, ograd_is_nan)])
for in_to_outs in connection_pattern]
if True not in inputs_connected:
# All outputs of this op are disconnected so we can skip
# Calling the op's grad method and report that the inputs
# are disconnected
# (The op's grad method could do this too, but this saves the
# implementer the trouble of worrying about this case)
input_grads = [DisconnectedType()() for ipt in inputs]
elif False not in only_connected_to_nan:
# All inputs are only connected to nan gradients, so we don't
# need to bother calling the grad method. We know the gradient
# with respect to all connected inputs is nan.
input_grads = []
for connected in inputs_connected:
if connected:
input_grads.append(NullType()())
else:
input_grads.append(DisconnectedType()())
else:
# At least one input of this op is connected to the cost so and
# not all output gradients are undefined so we must
# call the op's grad method
# Each Op's grad function requires inputs and output_grads
......
@@ -848,13 +892,6 @@ def _populate_grad_dict(var_to_node_to_idx,
if len(input_grads) != len(inputs):
raise ValueError(("%s returned the wrong number of" +\
" gradient terms.") % str(node.op))
else:
# All outputs of this op are disconnected so we can skip
# Calling the op's grad method and report that the inputs
# are disconnected
# (The op's grad method could do this too, but this saves the
# implementer the trouble of worrying about this case)
input_grads = [DisconnectedType()() for ipt in inputs]
# must convert to list in case the op returns a tuple
# we won't be able to post-process out the Nones if it does that
......
@@ -862,18 +899,15 @@ def _populate_grad_dict(var_to_node_to_idx,
# Do type checking on the result
#List of bools indicating if each output is an integer dtype
output_is_int = [hasattr(output.type, 'dtype') and
output.type.dtype in theano.tensor.discrete_dtypes
for output in node.outputs]
#List of bools indicating if each input only has integer outputs
# List of bools indicating if each input only has integer outputs
only_connected_to_int = [(True not in
[in_to_out and out_to_cost and not out_int
for in_to_out, out_to_cost, out_int in
zip(in_to_outs, outputs_connected, output_is_int)])
for in_to_outs in connection_pattern]
for i, term in enumerate(input_grads):
# Disallow Nones
......
@@ -898,6 +932,10 @@ def _populate_grad_dict(var_to_node_to_idx,
' returned an integer-valued variable.'
' (Input index %d, dtype %s)' % (i,
term.type.dtype))
if only_connected_to_nan[i]:
assert isinstance(term.type, NullType)
if only_connected_to_int[i]:
# This term has only integer outputs and we know
# it's not undefined or disconnected
......
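The only_connected_to_nan list comprehension in the gradient.py diff above is dense; run in isolation with toy values (a hypothetical connection pattern, not taken from a real graph), it reads:

```python
# Standalone sketch of the only_connected_to_nan computation.
# connection_pattern[i][o] is True if input i feeds output o;
# an input counts as "only connected to nan" if every path to the
# cost goes through an output whose gradient is undefined (NullType).
connection_pattern = [[True, False],   # input 0 feeds output 0 only
                      [True, True]]    # input 1 feeds both outputs
outputs_connected = [True, True]       # both outputs reach the cost
ograd_is_nan = [True, False]           # output 0's gradient is undefined

only_connected_to_nan = [
    (True not in
     [in_to_out and out_to_cost and not out_nan
      for in_to_out, out_to_cost, out_nan in
      zip(in_to_outs, outputs_connected, ograd_is_nan)])
    for in_to_outs in connection_pattern]

print(only_connected_to_nan)  # [True, False]
```

Input 0 reaches the cost only through the output with an undefined gradient, so its own gradient is known to be NullType without calling the op's grad method; input 1 also has a well-defined path, so the grad method must be called for it.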
......
@@ -722,20 +722,19 @@ class Elemwise(Op):
def _bgrad(self, inputs, ograds):
# returns grad, with respect to broadcasted versions of inputs
# Gradients (especially on the final costs) don't have to be symbolic
# e.g., ograds will be [ 1. ] if your objective is c and the output
# of the current apply node is c
ograds = map(as_tensor_variable, ograds)
prev_setting = theano.config.compute_test_value
try:
theano.config.compute_test_value = 'off'
scalar_inputs = [Scalar(dtype=t.type.dtype)() for t in inputs]
scalar_ograds = [Scalar(dtype=ograd.type.dtype)()
for ograd in ograds]
def as_scalar(t):
if isinstance(t.type, (NullType, DisconnectedType)):
return t
return Scalar(t.type.dtype)()
scalar_inputs = map(as_scalar, inputs)
scalar_ograds = map(as_scalar, ograds)
scalar_igrads = self.scalar_op.grad(scalar_inputs, scalar_ograds)
for igrad in scalar_igrads:
assert igrad is not None
......
......
@@ -517,5 +517,42 @@ def test_known_grads_integers():
assert np.allclose(g_actual, gv)
def test_undefined_cost_grad():
# Tests that if we say the cost is not differentiable via the
# known_grads mechanism, it is treated as such by the rest of the
# system.
# This is so that Ops that are built around minigraphs like OpFromGraph
# and scan can implement Op.grad by passing ograds to known_grads
x = theano.tensor.iscalar()
y = theano.tensor.iscalar()
cost = x + y
assert cost.dtype in theano.tensor.discrete_dtypes
try:
grads = theano.tensor.grad(cost, [x, y], known_grads = {cost: NullType()() })
except theano.gradient.NullTypeGradError:
return
raise AssertionError("An undefined gradient has been ignored.")
def test_disconnected_cost_grad():
# Tests that if we say the cost is disconnected via the
# known_grads mechanism, it is treated as such by the rest of the
# system.
# This is so that Ops that are built around minigraphs like OpFromGraph
# and scan can implement Op.grad by passing ograds to known_grads
x = theano.tensor.iscalar()
y = theano.tensor.iscalar()
cost = x + y
assert cost.dtype in theano.tensor.discrete_dtypes
try:
grads = theano.tensor.grad(cost, [x, y], known_grads = {cost: gradient.DisconnectedType()() },
disconnected_inputs='raise')
except theano.gradient.DisconnectedInputError:
return
raise AssertionError("A disconnected gradient has been ignored.")
if __name__ == '__main__':
unittest.main()