Commit ea3d6101 authored by Rami Al-Rfou

Merge branch 'master' of https://github.com/Theano/Theano into grad_advinc_subtensor

.. _NEWS:
Updates in the Trunk since the last release:
=============
Release Notes
=============
Theano 0.6rc2 (November 21st, 2012)
===================================
Highlights:
* Fixed a few regressions introduced in 0.6rc1.
* A few new features.
* Speed ups.
* Scan fixes.
* Crash fixes.
* A few small interface changes.
Committers for this rc2 only:
Razvan Pascanu
Pascal Lamblin
Frederic Bastien
Ian Goodfellow
Jeremiah Lowin
Caglar Gulcehre
Jey Kottalam
Matthew Rocklin
abalkin
Regressions in 0.6rc1 fixed:
* Fix the scan gradient dtype issue. In 0.6rc1, some upcasts were inserted. (Razvan P.)
* grad() now behaves as before 0.6rc1 for floats, i.e. the gradient dtype inside the graph will be the same as the input dtype. If you ask for the gradient directly, it will return the computed dtype. (Pascal L.)
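The dtype behavior at stake can be illustrated with plain NumPy (an illustration only; the regression concerned dtypes inside Theano's scan gradient graphs, not NumPy arrays):

```python
import numpy

# Mixing float32 with float64 upcasts the result to float64.  Upcasts of
# this kind were inserted into scan gradients in 0.6rc1; the fix keeps the
# gradient dtype inside the graph equal to the input dtype.
upcast_dtype = (numpy.float32(1.0) * numpy.float64(1.0)).dtype
same_dtype = (numpy.float32(1.0) * numpy.float32(2.0)).dtype
```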
Wrong result fixes:
* Scan in some cases did not return the correct results. (Razvan P., reported by Jeremiah L.)
This happened when a state had only negative taps and the output of the state was a function of some sequence.
If you had multiple states, there was no problem.
* Fixed a bug in Scan with multiple outputs,
where one output would sometimes overwrite another one. (Razvan P.)
* Clip.grad treated the gradient with respect to the clipping boundary as always 0. (Ian G.)
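As a rough sketch of the recurrence pattern involved (pure Python invented for this example, not Theano's scan implementation), a single state with a -1 tap whose output depends on a sequence looks like this:

```python
# state[t] = fn(sequence[t], state[t-1]) -- one state, one negative (-1) tap.
# This only illustrates the semantics of the case that was fixed.
def scan_neg_tap(fn, sequence, init_state):
    state = init_state
    outputs = []
    for s in sequence:
        state = fn(s, state)
        outputs.append(state)
    return outputs

# Example: a cumulative sum expressed as such a recurrence.
result = scan_neg_tap(lambda s, prev: prev + s, [1, 2, 3, 4], 0)
```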
Interface changes:
* We no longer support unaligned ndarrays in Python code. (Frederic B.)
We did not support them in C code, and supporting them in Python code
made the detection harder.
* We now officially support only scipy >= 0.7.2 and numpy >= 1.5.0. (Frederic B.)
We weren't and aren't testing with older versions.
* theano.sparse.SparseType is available even when scipy is not. (Frederic B.)
* Fixed an issue where members of the consider_constant grad parameter
were treated differently from Constant variables. (Ian G.)
* Removed the g_cost parameter of theano.grad(). (Ian G.)
Use the new, more powerful known_grads parameter instead.
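The idea behind known_grads is to seed backpropagation at intermediate variables instead of at a cost. Numerically it is just the chain rule; a scalar sketch (plain Python invented for this example, not the Theano API):

```python
# Suppose y = x**2, and we are told dcost/dy = 3.0 at x = 2.0 without
# knowing the cost itself (this is what known_grads={y: g_y} expresses).
x = 2.0
g_y = 3.0          # known gradient on the intermediate variable y
dy_dx = 2.0 * x    # derivative of y = x**2 at x
g_x = g_y * dy_dx  # chain rule: dcost/dx = dcost/dy * dy/dx
```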
NumPy interface support:
* theano.tensor.where is an alias for theano.tensor.switch, to support NumPy semantics. (Ian G.)
* TensorVariable objects now have dot, argmin, argmax, clip, conj, repeat, trace, std, round,
ravel and argsort methods and the real and imag properties, like numpy.ndarray objects.
The functionality was already available in Theano. (abalkin)
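For reference, the NumPy semantics that the where/switch alias follows (shown here with NumPy itself, not Theano):

```python
import numpy

cond = numpy.array([True, False, True])
a = numpy.array([1, 2, 3])
b = numpy.array([10, 20, 30])

# Pick elements from a where cond is True, else from b;
# theano.tensor.where mirrors numpy.where in this way.
out = numpy.where(cond, a, b)
```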
Speed ups:
* A C version of the SoftMax op. (Razvan P.)
There was already C code for the softmax-with-bias case.
* Faster GpuIncSubtensor. (Ian G.)
* Faster copy on the GPU for 4d tensors. (Ian G.)
* The fix of flatten's infer_shape re-enables an optimization. (Pascal L.)
The bug was introduced in 0.6rc1.
* Enable inc_subtensor on the GPU when updating it with a float64 dtype. (Ian G.)
It was causing an optimization warning.
* Make DeepCopy reuse preallocated memory. (Frederic B.)
* Move the convolution to the GPU when the image shape and logical image shape differ. (Frederic Bastien)
* C code for the View op. (Razvan P., Pascal L.)
New features:
* Added a monitoring mode "MonitorMode" as a debugging tool. (Olivier D.)
* Allow integer axes when keepdims==True. (Jeremiah Lowin)
* Added the erfinv and erfcinv ops. (Jey Kottalam)
* Added tensor.batched_dot(). (Caglar Gulcehre)
It uses scan behind the scenes, but makes doing this easier.
* theano.get_constant_value(x). (Frederic B.)
This tries to obtain the value of x as a constant int.
It does some constant folding to try to convert x into an int.
Used by some optimizations.
* Added theano.tensor.io.{MPIRecv,MPIRecvWait,MPISend,MPISendWait}. (Matthew Rocklin)
Theano does not use them automatically. It is up to you to use them and split your computation.
* Added theano.sandbox.linalg.eig. (abalkin)
* Started some support for Python 3. (abalkin)
setup.py supports Python 3 now.
It calls 2to3 during the setup.
Python 3 is not fully supported, as we did not update the C code.
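A toy sketch of the kind of constant folding mentioned for get_constant_value (illustrative only; the real function walks Theano graphs, and this toy expression tree and helper are invented for the example):

```python
# A toy expression is either an int or an (op, left, right) tuple.
def fold_constant(expr):
    """Try to reduce expr to a constant int; raise TypeError if impossible."""
    if isinstance(expr, int):
        return expr
    op, left, right = expr
    lv = fold_constant(left)
    rv = fold_constant(right)
    if op == '+':
        return lv + rv
    if op == '*':
        return lv * rv
    raise TypeError('cannot fold %r' % (op,))

value = fold_constant(('+', 2, ('*', 3, 4)))
```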
Crash fixes:
* Fix a crash related to scan.grad due to the new mechanism. (Ian G.)
* Fix an optimization warning. Now it gets optimized. (Frederic B.)
* Fix a crash introduced in 0.6rc1 in theano.grad. (Ian G.)
* Fix a crash introduced in 0.6rc1 in the grad of scan. (Razvan P.)
* Fix a crash introduced in 0.6rc1 in the grad of clip. (Ian G.)
Also implement the gradient on the min/max bounds.
* Fix a crash in the grad of tensor.switch for int. (Ian G.)
* Fix a crash when mixing shared variables on the GPU and sparse dot. (Pascal L.)
* Fix a crash where sparse.dot would sometimes return a dtype number
that is equivalent but not the one expected. (Pascal L., reported by Rami Al-Rfou)
* Better error messages. (Ian G.)
* Move all sparse random functions back to the sandbox, as they don't have a state inside Theano. (Pascal L.)
They were moved out of the sandbox in 0.6rc1.
* LoadFromDisk now only supports some memmap modes. (Pascal L.)
Otherwise, it was causing errors, segmentation faults or wrong results.
* Fix an import problem on PiCloud. (Jeremiah Lowin)
You need to use the c|py linker with the default
environment. Otherwise, you need to create your own environment.
* Fix a crash during optimization when we take a subtensor of a constant with a non-constant index. (Ian G.)
* Better handling and error messages for gradients on integers. (Ian G.)
* Fixed a crash where Scan assumed all TypeErrors raised by the grad function were due to undefined gradients. (Ian G.)
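The memmap modes in question are NumPy's; a small illustration of a read-only map (the temporary path is created here just for the example, and which subset of modes LoadFromDisk accepts is a Theano implementation detail):

```python
import os
import tempfile

import numpy

# Save a small array to disk, then map it back read-only ('r' mode).
path = os.path.join(tempfile.mkdtemp(), 'data.npy')
numpy.save(path, numpy.arange(5, dtype='float64'))
mapped = numpy.load(path, mmap_mode='r')  # a read-only numpy.memmap
values = mapped.tolist()
```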
https://github.com/Theano/Theano/wiki/Devnews
Other:
* Doc typo fixes, Doc updates, Better error messages: Olivier D., David W.F., Frederic B., James B., Matthew Rocklin, Ian G., abalkin.
=============
Release Notes
......
......@@ -72,7 +183,7 @@ Deprecation:
This was a predecessor of SharedVariable with a less pythonic philosophy.
Interface changes:
* Now the base version requirements are numpy >= 1.5.0 and the optional scipy >= 0.7.2.
* In Theano 0.5, we removed the deprecated sharedvar.value property.
Now we raise an error if you access it. (Frederic B.)
* theano.function does not accept duplicate inputs, so function([x, x], ...)
......
......@@ -53,7 +53,7 @@ copyright = '2008--2012, LISA lab'
# The short X.Y version.
version = '0.6'
# The full version, including alpha/beta/rc tags.
release = '0.6rc2'
# There are two options for replacing |today|: either, you set today to some
# non-false value, then it is used:
......
......@@ -249,6 +249,8 @@ following methods:
1) They must be Variable instances.
2) When they are types that have dtypes, they must never have an integer dtype.
The output gradients passed *to* Op.grad will also obey these constraints.
Integers are a tricky subject. Integers are the main reason for having DisconnectedType,
NullType or zero gradient. When you have an integer as an argument to your grad method,
recall the definition of a derivative to help you decide what value to return:
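As a numerical illustration of that definition (NumPy code invented for this example), an integer-valued function such as floor has derivative 0 away from its jump points, which is one way to see why integer outputs lead to zero or disconnected gradients:

```python
import numpy

def numeric_derivative(f, x, eps=1e-4):
    # Centered finite difference approximating the limit definition.
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

# floor() is integer-valued: between jumps its derivative is exactly 0.
d = numeric_derivative(numpy.floor, 0.5)
```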
......
......@@ -57,8 +57,8 @@ Theano also provides :func:`theano.printing.pydotprint` that creates a png image
The parameter in T.dscalar('x') in the first line is the name of this variable
in the graph. This name is used when printing the graph to make it more readable.
If no name is provided the variable x is printed as its type as returned by
x.type(). In this example - <TensorType(float64, scalar)>.
The name parameter can be any string. There are no naming restrictions:
in particular, you can have many variables with the same name.
......@@ -86,7 +86,7 @@ The line ``|x [@C`` means the variable named ``x`` with debugprint identifier
your graph, their different debugprint identifier will be your clue.
The line ``|TensorConstant{2.0} [@B]`` means that there is a constant 2.0
with this debugprint identifier.
The line ``Elemwise{mul,no_inplace} [@A] ''`` is indented less than
the other ones, because it means there is a variable computed by multiplying
......@@ -121,7 +121,7 @@ Elemwise{mul} [@A] ''
|Elemwise{mul} [@B] ''
|Elemwise{pow} [@C] ''
If the depth parameter is provided, it limits the number of levels that are
shown.
......
......@@ -14,7 +14,7 @@ own Theano code, and even (it happens) in Theano's internals, in
Isolating the Problem/Testing Theano Compiler
---------------------------------------------
You can run your Theano function in a :ref:`DebugMode<using_debugmode>`.
This tests the Theano optimizations and helps to find where NaN, inf and other problems come from.
......@@ -56,12 +56,12 @@ following example.
# compile and call the actual function
f = theano.function([x], h2)
f(numpy.random.rand(5, 10))
Running the above code generates the following error message:
.. code-block:: bash
Definition in:
File "/u/desjagui/workspace/PYTHON/theano/gof/opt.py", line 1102, in apply
lopt_change = self.process_node(fgraph, node, lopt)
File "/u/desjagui/workspace/PYTHON/theano/gof/opt.py", line 882, in process_node
......@@ -83,8 +83,8 @@ Running the above code generates the following error message:
thunk()
File "/u/desjagui/workspace/PYTHON/Theano/theano/gof/cc.py", line 1111, in execute
raise exc_type, exc_value, exc_trace
ValueError: ('Shape mismatch: x has 10 cols but y has 20 rows',
_dot22(x, <TensorType(float64, matrix)>), [_dot22.0],
_dot22(x, InplaceDimShuffle{1,0}.0), 'Sequence id of Apply node=4')
Needless to say, the above is not very informative and does not provide much in
......@@ -114,7 +114,7 @@ following error message, which properly identifies *line 23* as the culprit.
Traceback (most recent call last):
File "test2.py", line 23, in <module>
h1 = T.dot(x,func_of_W1)
File "/u/desjagui/workspace/PYTHON/Theano/theano/gof/op.py", line 360, in __call__
node.op.perform(node, input_vals, output_storage)
File "/u/desjagui/workspace/PYTHON/Theano/theano/tensor/basic.py", line 4458, in perform
......@@ -167,8 +167,8 @@ Theano provides a 'Print' op to do this.
Since Theano runs your program in a topological order, you won't have precise
control over the order in which multiple ``Print()`` ops are evaluted. For a more
precise inspection of what's being computed where, when, and how, see the discussion
:ref:`faq_monitormode`.
.. warning::
......@@ -196,7 +196,7 @@ You can read about them in :ref:`libdoc_printing`.
"The Function I Compiled is Too Slow, what's up?"
-------------------------------------------------
First, make sure you're running in ``FAST_RUN`` mode. Even though
``FAST_RUN`` is the default mode, insist by passing ``mode='FAST_RUN'``
to ``theano.function`` (or ``theano.make``) or by setting :attr:`config.mode`
to ``FAST_RUN``.
......@@ -206,7 +206,7 @@ Second, try the Theano :ref:`using_profilemode`. This will tell you which
Tips:
* Use the flags ``floatX=float32`` to require type *float32* instead of *float64*;
Use the Theano constructors matrix(),vector(),... instead of dmatrix(), dvector(),...
since they respectively involve the default types *float32* and *float64*.
* Check in the ``profile`` mode that there is no ``Dot`` op in the post-compilation
......@@ -216,48 +216,79 @@ Tips:
of type *float64*.
.. _faq_monitormode:
"How do I Step through a Compiled Function?"
--------------------------------------------
This is not exactly a FAQ, but the doc is here for now...
You can use ``MonitorMode`` to inspect the inputs and outputs of each
node being executed when the function is called. The code snippet below
shows how to print all inputs and outputs:

.. code-block:: python

    import theano

    def inspect_inputs(i, node, fn):
        print i, node, [input[0] for input in fn.inputs],

    def inspect_outputs(i, node, fn):
        print [output[0] for output in fn.outputs]

    x = theano.tensor.dscalar('x')
    f = theano.function([x], [5 * x],
                        mode=theano.compile.MonitorMode(
                            pre_func=inspect_inputs,
                            post_func=inspect_outputs))
    f(3)
    # The code will print the following:
    # 0 Elemwise{mul,no_inplace}(TensorConstant{5.0}, x) [array(5.0), array(3.0)] [array(15.0)]

When using these ``inspect_inputs`` and ``inspect_outputs`` functions
with ``MonitorMode``, you should see [potentially a lot of] printed output.
Every ``Apply`` node will be printed out, along with its position in the
graph, the arguments to the functions ``perform`` or ``c_code`` and the
output it computed.
Admittedly, this may be a huge amount of output to read through if you
are using big tensors... but you can choose to add logic that would, for
instance, print something out only if a certain kind of op were used, at
a certain program position, or only if a particular value showed up in
one of the inputs or outputs.
Use your imagination :)
A typical example is to detect when NaN values are added into computations, which
can be achieved as follows:
.. code-block:: python
    import numpy
    import theano

    def detect_nan(i, node, fn):
        for output in fn.outputs:
            if numpy.isnan(output[0]).any():
                print '*** NaN detected ***'
                theano.printing.debugprint(node)
                print 'Inputs : %s' % [input[0] for input in fn.inputs]
                print 'Outputs: %s' % [output[0] for output in fn.outputs]
                break

    x = theano.tensor.dscalar('x')
    f = theano.function([x], [theano.tensor.log(x) * x],
                        mode=theano.compile.MonitorMode(
                            post_func=detect_nan))
    f(0)  # log(0) * 0 = -inf * 0 = NaN

    # The code above will print:
    # *** NaN detected ***
    # Elemwise{Composite{[mul(log(i0), i0)]}} [@A] ''
    # |x [@B]
    # Inputs : [array(0.0)]
    # Outputs: [array(nan)]
.. TODO: documentation for link.WrapLinkerMany
How to Use pdb
--------------
......
......@@ -153,6 +153,13 @@ short name Full constructor
``ProfileMode`` ``compile.profilemode.ProfileMode()`` C implementations where available, all available graph transformations, print profile information.
================= =============================================================== ===============================================================================
.. Note::
For debugging purposes, there also exists a ``MonitorMode`` (which has no
short name). It can be used to step through the execution of a function:
see :ref:`the debugging FAQ<faq_monitormode>` for details.
Linkers
=======
......
......@@ -14,13 +14,13 @@ try:
except ImportError:
from distutils.core import setup
try:
    from distutils.command.build_py import build_py_2to3 \
        as build_py
    from distutils.command.build_scripts import build_scripts_2to3 \
        as build_scripts
except ImportError:
    from distutils.command.build_py import build_py
    from distutils.command.build_scripts import build_scripts
CLASSIFIERS = """\
......@@ -55,7 +55,7 @@ PLATFORMS = ["Windows", "Linux", "Solaris", "Mac OS-X", "Unix"]
MAJOR = 0
MINOR = 6
MICRO = 0
SUFFIX = "rc2"  # Should be blank except for rc's, betas, etc.
ISRELEASED = False
VERSION = '%d.%d.%d%s' % (MAJOR, MINOR, MICRO, SUFFIX)
......
......@@ -21,6 +21,8 @@ from module import *
import debugmode # register DEBUG_MODE
from debugmode import DebugMode
from monitormode import MonitorMode
from profilemode import ProfileMode
from theano.compile.sharedvalue import shared, shared_constructor, SharedVariable
......
......@@ -55,9 +55,12 @@ class OpFromGraph(gof.Op):
if grad_depth > 0:
output_grads = [t() for t in self.output_types]
# OpFromGraph doesn't implement a connection_pattern, so for now we regard
# all inputs and outputs as connected. This will compute the right numerical
# value for the gradients but could fail to raise the disconnected inputs error
# in some cases.
gs = G.grad(cost=None, known_grads=dict(zip(self.outputs, output_grads)),
wrt=self.inputs, disconnected_inputs='ignore')
self.grad_ops = []
for g in gs:
if g is None:
......
# Note: this code was initially copied from the 'pyutools' package by its
# original author, and re-licensed under Theano's license.
import theano
from theano.compile.mode import Mode
class MonitorMode(Mode):
    """
    `MonitorMode` is a debug mode to easily step through function execution.

    Its default behavior is to behave like the 'FAST_RUN' mode. By providing
    either a `pre_func` (called before a node is executed) or a `post_func`
    (called after a node is executed) monitoring function, the user can inspect
    node behavior.

    A typical use case is to detect the introduction of NaN values in a graph.
    For an example of such a use case, see doc/tutorial/debug_faq.txt.
    """

    def __init__(self, pre_func=None, post_func=None, optimizer='fast_run'):
        """
        Constructor.

        :param pre_func: A function to call before executing a thunk, with
            arguments:
            - the thunk index
            - the Apply node
            - the thunk to be called
        :param post_func: A function to call after executing a thunk, with the
            same three arguments as `pre_func`.
        :param optimizer: The optimizer to use. One may use for instance
            'fast_compile' to skip optimizations.
        """
        self.pre_func = pre_func
        self.post_func = post_func
        wrap_linker = theano.gof.WrapLinkerMany([theano.gof.OpWiseCLinker()],
                                                [self.eval])
        super(MonitorMode, self).__init__(wrap_linker, optimizer=optimizer)

    def eval(self, i, node, fn):
        """
        The method that calls the thunk `fn`.
        """
        if self.pre_func is not None:
            self.pre_func(i, node, fn)
        fn()
        if self.post_func is not None:
            self.post_func(i, node, fn)
import numpy
import theano
def test_detect_nan():
    """
    Test the code snippet example that detects NaN values.
    """
    nan_detected = [False]

    def detect_nan(i, node, fn):
        for output in fn.outputs:
            if numpy.isnan(output[0]).any():
                print '*** NaN detected ***'
                theano.printing.debugprint(node)
                print 'Inputs : %s' % [input[0] for input in fn.inputs]
                print 'Outputs: %s' % [output[0] for output in fn.outputs]
                nan_detected[0] = True
                break

    x = theano.tensor.dscalar('x')
    f = theano.function([x], [theano.tensor.log(x) * x],
                        mode=theano.compile.MonitorMode(
                            post_func=detect_nan))
    f(0)  # log(0) * 0 = -inf * 0 = NaN
    assert nan_detected[0]
......@@ -13,9 +13,11 @@ import warnings
_logger = logging.getLogger('theano.gradient')
import numpy # for numeric_grad
np = numpy
import theano
from itertools import izip
from theano import gof
from theano.gof import Variable
from theano.gof.python25 import all
......@@ -317,9 +319,6 @@ def Lop(f, wrt, eval_points, consider_constant=None,
coordinates of the tensor element in the last
If `f` is a list/tuple, then return a list/tuple with the results.
"""
if type(eval_points) not in (list, tuple):
eval_points = [eval_points]
......@@ -333,50 +332,15 @@ def Lop(f, wrt, eval_points, consider_constant=None,
f = list(f)
grads = list(eval_points)
if not isinstance(wrt, (list, tuple)):
wrt = [wrt]
assert len(f) == len(grads)
known = dict(izip(f, grads))
ret = grad(cost=None, known_grads=known,
consider_constant=consider_constant, wrt=wrt,
disconnected_inputs=disconnected_inputs)
return format_as(using_list, using_tuple, ret)
......@@ -385,14 +349,13 @@ def Lop(f, wrt, eval_points, consider_constant=None,
# Gradient
#########################
def grad(cost, wrt, consider_constant=None,
         disconnected_inputs='raise', add_names=True,
         known_grads=None, return_disconnected='zero'):
"""
:type cost: Scalar (0-dimensional) Variable.
May optionally be None if known_grads is provided.
:type wrt: Variable or list of Variables.
:param consider_constant: a list of expressions not to backpropagate
through
......@@ -402,13 +365,27 @@ def grad(cost, wrt, g_cost=None, consider_constant=None,
(or if all links are non-differentiable). The possible values are:
- 'ignore': considers that the gradient on these parameters is zero.
- 'warn': consider the gradient zero, and print a warning.
- 'raise': raise DisconnectedInputError.
:type add_names: bool
:param add_names: If True, variables generated by grad will be named
(d<cost.name>/d<wrt.name>) provided that both cost and wrt have
names
:type known_grads: dict
:param known_grads: If not None, a dictionary mapping variables to their
gradients. This is useful in the case where you know the
gradient on some variables but do not know the original
cost.
:type return_disconnected: string
:param return_disconnected:
'zero' : If wrt[i] is disconnected, return value i will be
wrt[i].zeros_like()
'None' : If wrt[i] is disconnected, return value i will be
None
'Disconnected' : returns variables of type DisconnectedType
:rtype: Variable or list/tuple of Variables (depending upon `wrt`)
:return: symbolic expression of gradient of `cost` with respect to `wrt`.
......@@ -422,29 +399,17 @@ def grad(cost, wrt, g_cost=None, consider_constant=None,
if tensor is None:
from theano import tensor
if cost is None:
assert known_grads is not None
if cost is not None and isinstance(cost.type, NullType):
raise ValueError("Can't differentiate a NaN cost."
"cost is NaN because " + \
cost.type.why_null)
if cost is not None and cost.ndim != 0:
raise TypeError("cost must be a scalar.")
if consider_constant is None:
consider_constant = []
else:
# error checking on consider_constant: verify that it is a collection
# of theano variables
# this is important, if someone accidentally passes a nested data
# structure with theano variables at the leaves, only the root will
# be properly considered constant
if not hasattr(consider_constant, '__iter__'):
raise TypeError('consider_constant must be an iterable collection,'
' got ' + str(type(consider_constant)))
for elem in consider_constant:
if not isinstance(elem, gof.Variable):
raise TypeError('Elements of consider_constant must be '
'variables, but got ' + str(type(elem)))
if isinstance(wrt, set):
raise TypeError("wrt must not be a set. sets have no defined "
......@@ -461,83 +426,99 @@ def grad(cost, wrt, g_cost=None, consider_constant=None,
raise TypeError("Expected Variable, got " + str(elem) +
" of type "+str(type(elem)))
outputs = []
if cost is not None:
outputs.append(cost)
if known_grads is not None:
outputs.extend(known_grads.keys())
var_to_node_to_idx = _populate_var_to_node_to_idx(
outputs, wrt, consider_constant)
# build a dict mapping var to the gradient of cost with respect to var
grad_dict = {}
if known_grads is None:
known_grads = {}
# The gradient of the cost is 1 unless specified otherwise by known_grads.
if cost is not None:
if cost in known_grads:
g_cost = known_grads[cost]
else:
g_cost = _float_ones_like(cost)
# g_cost may be Disconnected or NullType. A creative use of the function,
# sure, but nonetheless one we can and should support. So before we try
# to cast it make sure it even has a dtype
if hasattr(g_cost.type, 'dtype') and cost.type.dtype not in tensor.discrete_dtypes:
# Here we enforce the constraint that floating point variables have
# the same dtype as their gradient.
g_cost = g_cost.astype(cost.type.dtype)
# DO NOT enforce g_cost to be 0 if cost is an integer.
# This is to be enforced by the Op.grad method for the Op that outputs cost.
assert getattr(g_cost.type, 'dtype', None) not in tensor.discrete_dtypes
grad_dict[cost] = g_cost
for var in known_grads:
g_var = known_grads[var]
if not hasattr(g_var, 'type'):
raise TypeError('output grads must be theano variables. '
'Ambiguous whether %s should be made into tensor'
' or sparse theano variable' % str(type(g_var)))
grad_dict[cost] = g_cost
if not isinstance(g_var.type, (NullType, DisconnectedType)) and 'float' \
not in str(g_var.type.dtype):
raise TypeError("Gradients must always be NullType, "
"DisconnectedType, or continuous, but grad was "
"given a known_grad of type "+str(g_var.type))
# variables in consider_constant are marked disconnected: no gradient
# flows through them
for const in consider_constant:
grad_dict[const] = DisconnectedType()()
# DO NOT check that these gradients are equal to 0 if var is int
# The gradient is allowed to be non-zero on var in that case
# Ops outputting var should not backpropagate its gradient further
# but that is enforced elsewhere (grep for only_connected_to_int)
# variables that do not influence the cost have zero gradient.
# if wrt is such a variable, populate the grad_dict with this info
# so that wrt not being in var_to_node_to_idx won't cause an error below
# according to the flag, possibly raise an error if wrt is disconnected
for elem in wrt:
if elem not in var_to_node_to_idx and elem is not cost:
grad_dict[var] = g_var
def handle_disconnected(var):
message = ("grad method was asked to compute the gradient "
"with respect to a variable that is not part of "
"the computational graph of the cost, or is used "
"only by a non-differentiable operator: %s" % elem)
"only by a non-differentiable operator: %s" % var)
if disconnected_inputs == 'ignore':
pass
elif disconnected_inputs == 'warn':
warnings.warn(message, stacklevel=2)
elif disconnected_inputs == 'raise':
raise ValueError(message)
raise DisconnectedInputError(message)
else:
raise ValueError("Invalid value for keyword "
"'disconnected_inputs', valid values are "
"'ignore', 'warn' and 'raise'.")
# variables that do not influence the cost have zero gradient.
# if wrt is such a variable, populate the grad_dict with this info
# so that wrt not being in var_to_node_to_idx won't cause an error below
# according to the flag, possibly raise an error if wrt is disconnected
for elem in wrt:
if elem not in var_to_node_to_idx and elem is not cost \
and elem not in grad_dict:
handle_disconnected(elem)
grad_dict[elem] = DisconnectedType()()
cost_name = None
if add_names:
if add_names and cost is not None:
cost_name = cost.name
# Make sure we didn't initialize the grad_dict with any ints
# for non-int outputs
# The gradient may NEVER be an int, even if the variable is an int.
# Read the Op contract and talk to Ian Goodfellow before changing this!
for var in grad_dict:
g = grad_dict[var]
if (hasattr(g.type, 'dtype') and
getattr(var.type, 'dtype', '') in tensor.float_dtypes):
if hasattr(g.type, 'dtype'):
assert g.type.dtype in tensor.float_dtypes
rval = _populate_grad_dict(var_to_node_to_idx,
@@ -545,7 +526,13 @@ def grad(cost, wrt, g_cost=None, consider_constant=None,
for i in xrange(len(rval)):
if isinstance(rval[i].type, DisconnectedType):
rval[i] = _float_zeros_like(wrt[i])
handle_disconnected(rval[i])
if return_disconnected == 'zero':
rval[i] = _float_zeros_like(wrt[i])
elif return_disconnected == 'None':
rval[i] = None
else:
assert return_disconnected == 'Disconnected'
if using_tuple:
rval = tuple(rval)
@@ -592,15 +579,18 @@ def _node_to_pattern(node):
return connection_pattern
def _populate_var_to_node_to_idx(outputs, wrt):
def _populate_var_to_node_to_idx(outputs, wrt, consider_constant):
"""
Common code shared between grad and grad_sources_inputs
Helper function for grad function.
outputs: a list of variables we want to take gradients of
wrt: a list of variables we want to take the gradient with
respect to.
consider_constant: a list of variables not to backpropagate
through.
returns:
var_to_app_to_idx:
@@ -622,8 +612,30 @@ def _populate_var_to_node_to_idx(outputs, wrt):
This set is exactly the set of variables that connect
the variables in wrt to the cost being differentiated.
(A variable in consider_constant is not a function of
anything)
"""
# Validate and format consider_constant
if consider_constant is None:
consider_constant = []
else:
# error checking on consider_constant: verify that it is a collection
# of theano variables
# this is important, if someone accidentally passes a nested data
# structure with theano variables at the leaves, only the root will
# be properly considered constant
try:
iter(consider_constant)
except TypeError:
raise TypeError('consider_constant must be an iterable collection,'
' got ' + str(type(consider_constant)))
for elem in consider_constant:
if not isinstance(elem, gof.Variable):
raise TypeError('Elements of consider_constant must be '
'variables, but got ' + str(type(elem)))
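The validation above, as a self-contained sketch (hypothetical names; `is_variable` stands in for the `isinstance(elem, gof.Variable)` check): accept `None` or any iterable of variables, reject everything else loudly.

```python
def check_consider_constant(consider_constant, is_variable=lambda v: True):
    # None means "no constants"; normalize to an empty list.
    if consider_constant is None:
        return []
    # Reject non-iterables early, so a single variable passed bare
    # (instead of wrapped in a list) fails with a clear message.
    try:
        iter(consider_constant)
    except TypeError:
        raise TypeError('consider_constant must be an iterable collection,'
                        ' got ' + str(type(consider_constant)))
    for elem in consider_constant:
        if not is_variable(elem):
            raise TypeError('Elements of consider_constant must be '
                            'variables, but got ' + str(type(elem)))
    return list(consider_constant)
```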
# var_to_app_to_idx[var][node] = [i,j] means node has
# var as input at positions i and j
var_to_app_to_idx = {}
@@ -638,9 +650,17 @@ def _populate_var_to_node_to_idx(outputs, wrt):
accounted_for = set([])
def account_for(var):
# Don't visit the same variable twice
if var in accounted_for:
return
accounted_for.add(var)
# Constants are not a function of anything
if var in consider_constant:
return
# Recursively add the variables that this variable is
# a function of.
if var.owner is not None:
app = var.owner
@@ -699,11 +719,22 @@ def _populate_var_to_node_to_idx(outputs, wrt):
return var_to_app_to_idx
class NullTypeGradError(TypeError):
"""
Raised when grad encounters a NullType.
"""
class DisconnectedInputError(ValueError):
"""
Raised when grad is asked to compute the gradient
with respect to a disconnected input and
disconnected_inputs='raise'.
"""
def _populate_grad_dict(var_to_node_to_idx,
grad_dict, wrt, cost_name=None):
"""
Common code shared between grad_sources_inputs and grad
Helper function for grad function.
var_to_node_to_idx: a dictionary mapping a variable to
a second dictionary.
@@ -711,14 +742,12 @@ def _populate_grad_dict(var_to_node_to_idx,
this variable to the variable's index in the apply
node's input list
grad_dict: a dictionary mapping variables to their gradients
should be populated by grad or grad_sources_inputs
grad should set gradients to DisconnectedType()() for
variables to be considered constant, set the
gradient for the cost variable to g_cost, etc.
both should set the gradient for disconnected
grad_dict: A dictionary mapping variables to their gradients.
Should be populated by the grad function, which should:
- Set the gradient with respect to the cost to 1
- Load all gradients from known_grads, possibly overriding
the cost
- Set the gradient for disconnected
inputs to a variable with type DisconnectedType()
wrt: the minimal set of variables that must be included in grad_dict
@@ -757,8 +786,42 @@ def _populate_grad_dict(var_to_node_to_idx,
input_to_outputs in connection_pattern
]
if True in inputs_connected:
# At least one input of this op is connected to the cost so we must
# List of bools indicating if each output has an integer dtype
output_is_int = [hasattr(output.type, 'dtype') and
output.type.dtype in theano.tensor.discrete_dtypes
for output in node.outputs]
# List of bools indicating if each output gradient is NullType
ograd_is_nan = [isinstance(output.type, NullType)
for output in output_grads]
# List of bools indicating if each input is connected to the cost only
# through outputs whose gradient is NullType
only_connected_to_nan = [(True not in
[in_to_out and out_to_cost and not out_nan
for in_to_out, out_to_cost, out_nan in
zip(in_to_outs, outputs_connected, ograd_is_nan)])
for in_to_outs in connection_pattern]
if True not in inputs_connected:
# All outputs of this op are disconnected so we can skip
# Calling the op's grad method and report that the inputs
# are disconnected
# (The op's grad method could do this too, but this saves the
# implementer the trouble of worrying about this case)
input_grads = [DisconnectedType()() for ipt in inputs]
elif False not in only_connected_to_nan:
# All inputs are only connected to nan gradients, so we don't
# need to bother calling the grad method. We know the gradient
# with respect to all connected inputs is nan.
input_grads = []
for connected in inputs_connected:
if connected:
input_grads.append(NullType()())
else:
input_grads.append(DisconnectedType()())
else:
# At least one input of this op is connected to the cost, and
# not all output gradients are undefined, so we must
# call the op's grad method
# Each Op's grad function requires inputs and output_grads
@@ -779,38 +842,46 @@ def _populate_grad_dict(var_to_node_to_idx,
inputs = [try_to_copy_if_needed(ipt) for ipt in inputs]
# Build a list of output gradients with the same dtype as
# the corresponding output variable.
# If an output is of a float dtype, we want to cast the
# output gradient into the same dtype, to avoid having a
# gradient graph with double precision (taking more memory,
# and more computation).
# If an output is of an integer dtype, then we ensure the
# output gradient is zero, and that zero can be represented
# in the same int dtype.
# If an output gradient is a NullType or DisconnectedType,
# then it will not have a dtype, and it will not be changed.
# If an output is of an integer dtype, then we just leave it
# alone.
# DO NOT force integer variables to have zero grad. This causes
# bugs where we fail to detect disconnected or undefined gradients.
# DO NOT force integer variables to have integer dtype. This is
# a violation of the op contract.
new_output_grads = []
for o, og in zip(node.outputs, output_grads):
o_dt = getattr(o.type, 'dtype', None)
og_dt = getattr(og.type, 'dtype', None)
if og_dt and o_dt in theano.tensor.discrete_dtypes:
new_output_grads.append(o.zeros_like())
elif o_dt and og_dt and o_dt != og_dt:
if o_dt not in theano.tensor.discrete_dtypes and og_dt and o_dt != og_dt:
new_output_grads.append(og.astype(o_dt))
else:
new_output_grads.append(og)
# Make sure that, if new_output_grads[i] has a dtype:
# - it is the same dtype as outputs[i]
# - if the dtype is an int, then new_output_grads[i] is 0.
# Make sure that, if new_output_grads[i] has a floating point dtype,
# it is the same dtype as outputs[i]
for o, ng in zip(node.outputs, new_output_grads):
o_dt = getattr(o.type, 'dtype', None)
ng_dt = getattr(ng.type, 'dtype', None)
if ng_dt:
if ng_dt is not None and o_dt not in theano.tensor.discrete_dtypes:
assert ng_dt == o_dt
if ng_dt in theano.tensor.discrete_dtypes:
assert theano.get_constant_value(ng) == 0
# Someone who had obviously not read the Op contract tried
# to modify this part of the function.
# If you ever think it is a good idea to make an integer
# valued gradient, please
# 1) Read the Op contract again
# 2) Talk to Ian Goodfellow
# (Both of these sources will tell you not to do it)
for ng in new_output_grads:
assert getattr(ng.type, 'dtype', None) not in theano.tensor.discrete_dtypes
input_grads = node.op.grad(inputs, new_output_grads)
@@ -821,13 +892,6 @@ def _populate_grad_dict(var_to_node_to_idx,
if len(input_grads) != len(inputs):
raise ValueError(("%s returned the wrong number of" +\
" gradient terms.") % str(node.op))
else:
# All outputs of this op are disconnected so we can skip
# Calling the op's grad method and report that the inputs
# are disconnected
# (The op's grad method could do this too, but this saves the
# implementer the trouble of worrying about this case)
input_grads = [DisconnectedType()() for ipt in inputs]
# must convert to list in case the op returns a tuple
# we won't be able to post-process out the Nones if it does that
@@ -835,18 +899,15 @@ def _populate_grad_dict(var_to_node_to_idx,
# Do type checking on the result
#List of bools indicating if each output is an integer dtype
output_is_int = [hasattr(output.type, 'dtype') and
output.type.dtype in theano.tensor.discrete_dtypes
for output in node.outputs]
#List of bools indicating if each input only has integer outputs
# List of bools indicating if each input only has integer outputs
only_connected_to_int = [(True not in
[in_to_out and out_to_cost and not out_int
for in_to_out, out_to_cost, out_int in
zip(in_to_outs, outputs_connected, output_is_int)])
for in_to_outs in connection_pattern]
for i, term in enumerate(input_grads):
# Disallow Nones
@@ -863,6 +924,7 @@ def _populate_grad_dict(var_to_node_to_idx,
'the grad_undefined or grad_unimplemented helper '
'functions.') % node.op)
if not isinstance(term.type,
(NullType, DisconnectedType)):
if term.type.dtype not in theano.tensor.float_dtypes:
@@ -870,19 +932,18 @@ def _populate_grad_dict(var_to_node_to_idx,
' returned an integer-valued variable.'
' (Input index %d, dtype %s)' % (i,
term.type.dtype))
if only_connected_to_nan[i]:
assert isinstance(term.type, NullType)
if only_connected_to_int[i]:
# This term has only integer outputs and we know
# it's not undefined or disconnected
# The only other valid thing it can be is 0
no_constant_value = True
try:
constant_value = theano.get_constant_value(term)
no_constant_value = False
except TypeError:
pass
if no_constant_value:
is_zero = _is_zero(term)
assert is_zero in ['yes', 'no', 'maybe']
if is_zero == 'maybe':
msg = "%s.grad returned %s of type %s for input"
msg += " %d. This input's only connections to "
msg += "the cost through this op are via "
@@ -896,8 +957,7 @@ def _populate_grad_dict(var_to_node_to_idx,
msg = msg % (str(node.op), str(term),
str(type(term)), i)
raise ValueError(msg)
if constant_value != 0:
if is_zero == 'no':
msg = "%s.grad returned %s of type %s for input"
msg += " %d. Since this input is only connected "
msg += "to integer-valued outputs, it should "
@@ -905,7 +965,7 @@ def _populate_grad_dict(var_to_node_to_idx,
msg += "%s."
msg = msg % (str(node.op), str(term), str(type(term)),
i, str(constant_value))
i, str(theano.get_constant_value(term)))
raise ValueError(msg)
@@ -961,7 +1021,7 @@ def _populate_grad_dict(var_to_node_to_idx,
type(term)))
if isinstance(term.type, NullType):
raise TypeError("tensor.grad "
raise NullTypeGradError("tensor.grad "
"encountered a NaN. " +\
term.type.why_null)
@@ -997,113 +1057,6 @@ def _populate_grad_dict(var_to_node_to_idx,
return rval
def grad_sources_inputs(sources, graph_inputs):
"""
Used to compute the gradient of a cost with respect to all the
variables between graph_input and cost, but in the special
case where you don't know the cost, you only know its gradient
on a set of intermediate values.
A gradient source is a pair (``v``, ``g_v``), in which ``v`` is
a `Variable`, and ``g_v`` is a `Variable` that is a gradient wrt
``v``. More specifically, ``g_v`` is the gradient of an external
scalar cost, ``cost`` (that is not explicitly used), wrt ``v``.
This function traverses the graph backward from the ``r`` sources,
calling ``op.grad(...)`` for all ops with some non-None gradient
on an output, to compute gradients of ``cost`` wrt intermediate
variables and ``graph_inputs``.
The ``op.grad(...)`` functions are called like this:
.. code-block:: python
op.grad(op.inputs[:], [total_gradient(v) for v in op.outputs])
This call to ``op.grad`` should return a list or tuple: one symbolic
gradient per input. These gradients represent the gradients of
the same implicit ``cost`` mentioned above, wrt ``op.inputs``. Note
that this is **not** the same as the gradient of ``op.outputs`` wrt
``op.inputs``.
If ``op`` has a single input, then ``op.grad`` should return a list
or tuple of length 1.
For each input wrt to which ``op`` is not differentiable, it should
return ``None`` instead of a `Variable` instance.
If a source ``r`` receives a gradient from another source ``r2``,
then the effective gradient on ``r`` is the sum of both gradients.
:type sources: list of pairs of Variable: (v, gradient-on-v) to
initialize the total_gradient dictionary
:param sources: gradients to back-propagate using chain rule
:type graph_inputs: list of Variable
:param graph_inputs: variables considered to be constant
(do not backpropagate through them)
:rtype: dictionary whose keys and values are of type Variable
:return: mapping from each Variable encountered in the backward
traversal to the gradient with respect to that Variable.
It is assumed that there is some objective J shared between all members of
sources, so that for each v, gradient-on-v is the gradient of J with
respect to v
"""
outputs, output_grads = zip(*sources)
for output_grad in output_grads:
if not hasattr(output_grad, 'type'):
raise TypeError('output grads must be theano variables.'
'Ambiguous whether %s should be made into tensor'
' or sparse theano variable' % str(type(output_grad)))
if graph_inputs is None:
graph_inputs = gof.graph.inputs(outputs)
wrt = graph_inputs
var_to_node_to_idx = _populate_var_to_node_to_idx(outputs, wrt)
# build a dict mapping var to the gradient of cost with respect to var
grad_dict = {}
for output, output_grad in sources:
# The gradient of the cost should always be 0 if the cost is of
# discrete (integer) dtype.
if getattr(output.type, 'dtype', '') not in theano.tensor.float_dtypes:
output_grad = output.zeros_like()
else:
# Cast the provided gradient so that it has the same dtype
# as the cost.
output_grad = output_grad.astype(output.type.dtype)
grad_dict[output] = output_grad
# variables that do not influence the cost have zero gradient.
# if wrt is such a variable, populate the grad_dict with this info
# so that wrt not being in var_to_node_to_idx won't cause an error below
# according to the flag, possibly raise an error if wrt is disconnected
for elem in wrt:
if elem not in var_to_node_to_idx and elem not in outputs:
grad_dict[elem] = DisconnectedType()()
_populate_grad_dict(var_to_node_to_idx,
grad_dict, wrt)
# post-process out the DisconnectedTypes
for key in grad_dict:
if isinstance(grad_dict[key].type, DisconnectedType):
if hasattr(key, 'zeros_like'):
grad_dict[key] = _float_zeros_like(key)
return grad_dict
def _float_zeros_like(x):
""" Like zeros_like, but forces the object to have a
a floating point dtype """
@@ -1634,3 +1587,32 @@ def hessian(cost, wrt, consider_constant=None,
"script that generated the error)")
hessians.append(hess)
return format_as(using_list, using_tuple, hessians)
def _is_zero(x):
"""
Returns 'yes', 'no', or 'maybe' indicating whether x
is always 0.
'maybe' means that x is an expression that is complicated enough
that we can't tell that it simplifies to 0.
"""
if not hasattr(x, 'type'):
return 'yes' if np.all(x == 0.) else 'no'
if isinstance(x.type, NullType):
return 'no'
if isinstance(x.type, DisconnectedType):
return 'yes'
no_constant_value = True
try:
constant_value = theano.get_constant_value(x)
no_constant_value = False
except TypeError:
pass
if no_constant_value:
return 'maybe'
if constant_value != 0.:
return 'no'
return 'yes'
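A plain-Python illustration of the three-valued contract above (`is_zero_sketch` is a hypothetical stand-in, not the Theano implementation; `None` plays the role of an expression whose constant value `theano.get_constant_value` cannot recover):

```python
def is_zero_sketch(x):
    # None stands in for "not a compile-time constant": answer 'maybe'.
    if x is None:
        return 'maybe'
    # Raw numbers and small sequences can be checked directly.
    values = x if isinstance(x, (list, tuple)) else [x]
    return 'yes' if all(v == 0 for v in values) else 'no'
```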
@@ -40,7 +40,7 @@ def debugprint(obj, depth=-1, print_type=False,
:type depth: integer
:param depth: print graph to this depth (-1 for unlimited)
:type print_type: boolean
:param print_type: wether to print the type of printed objects
:param print_type: whether to print the type of printed objects
:type file: None, 'str', or file-like object
:param file: print to this file ('str' means to return a string)
:type ids: str
@@ -531,11 +531,11 @@ def pydotprint(fct, outfile=None,
label each edge between an input and the Apply node with the
input's index.
green boxes are inputs variables to the graph
blue boxes are outputs variables of the graph
grey boxes are variables that are not outputs and are not used
Green boxes are inputs variables to the graph,
blue boxes are outputs variables of the graph,
grey boxes are variables that are not outputs and are not used,
red ellipses are transfers from/to the gpu (ops with names GpuFromHost,
HostFromGpu)
HostFromGpu).
"""
if colorCodes is None:
@@ -221,7 +221,8 @@ class Scan(PureOp):
'following error has been encountered: The '
'%s %s (argument number %d) has dtype '
'%s and %d dimension(s). The corresponding slice %s '
'however has dtype %s and %d dimension(s). This '
'however has dtype %s and %d dimension(s) (it should '
'have the same dtype and one fewer dimensions). This '
'should never happen, please '
'report to theano-dev mailing list'
)
@@ -1261,11 +1262,9 @@ class Scan(PureOp):
if x in diff_inputs]
for x in consider_inps:
try:
_gmp = gradient.grad_sources_inputs(
[(y, g_y)],
[x])
gmp[x] = _gmp[x]
except TypeError:
gmp[x] = gradient.grad(cost=None,
known_grads={y: g_y}, wrt=x)
except gradient.NullTypeGradError:
# It means the gradient is undefined (which implies
# it is connected)
gmp[x] = x
@@ -1374,11 +1373,21 @@ class Scan(PureOp):
self.inner_nitsot_outs(self_outputs))
def compute_gradient(y, g_y):
gmp = gradient.grad_sources_inputs(
[(y, g_y)],
[x for x in theano.gof.graph.inputs([y])
if x in diff_inputs])
return [gmp.get(p, None) for p in diff_inputs]
if 'int' in str(g_y.dtype):
raise TypeError("Gradients may never be integers but g_y "
"has type "+str(g_y.type))
wrt = [x for x in theano.gof.graph.inputs([y])
if x in diff_inputs]
grads = gradient.grad(
cost=None,
known_grads={y: g_y},
wrt=wrt, consider_constant=wrt,
disconnected_inputs='ignore',
return_disconnected='None')
gmp = dict(zip(wrt, grads))
rval = [gmp.get(p, None) for p in diff_inputs]
return rval
dC_dinps_t = [None for inp in diff_inputs]
disconnected_dC_dinps_t = [True for inp in diff_inputs]
dC_dXts = []
@@ -464,13 +464,27 @@ def _allclose(a, b, rtol=None, atol=None):
return numpy.allclose(a, b, atol=atol_, rtol=rtol_)
class NotConstantError(TypeError):
"""
Raised by get_constant_value if called on something that is
not constant.
For now it is a TypeError, to maintain the old interface
that get_constant_value should raise a TypeError in this
situation. However, this is unsafe because get_constant_value
could inadvertently raise a TypeError if it has a bug.
So we should eventually make NotConstantError derive
from Exception directly, and modify all code that uses
get_constant_value to catch this more specific exception.
"""
pass
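A minimal sketch of the migration pattern described in the docstring above (hypothetical names, plain Python): deriving the new exception from TypeError keeps old `except TypeError` handlers working while letting new code catch the specific class.

```python
class NotConstantSketch(TypeError):
    """Stand-in for NotConstantError while it still subclasses TypeError."""

def get_constant_sketch(v):
    # Only plain numbers count as constants in this toy version.
    if not isinstance(v, (int, float)):
        raise NotConstantSketch('not a constant', v)
    return v

# Old-style caller: still works, because the new class IS a TypeError.
try:
    get_constant_sketch('x')
except TypeError:
    pass

# New-style caller: catches the more specific exception.
try:
    get_constant_sketch('x')
except NotConstantSketch:
    pass
```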
def get_constant_value(v):
"""return the constant scalar(0-D) value underlying variable `v`
If v is the output of dimshuffles, fills, allocs, rebroadcasts, cast
this function digs through them.
If `v` is not some view of constant data, then raise a TypeError.
If `v` is not some view of constant data, then raise a NotConstantError.
:note: There may be another function similar to this one in the
code, but I'm not sure where it is.
@@ -490,7 +504,7 @@ def get_constant_value(v):
numpy.complex(data) # works for all numeric scalars
return data
except Exception:
raise TypeError(
raise NotConstantError(
'v.data is non-numeric, non-scalar, or has more than one'
' unique value', v)
if v.owner:
@@ -518,9 +532,17 @@ def get_constant_value(v):
v.owner.op.perform(v.owner, [const], ret)
return ret[0][0]
if isinstance(v.owner.op, Subtensor) and v.ndim == 0:
if isinstance(v.owner.inputs[0], TensorConstant):
return v.owner.inputs[0].data.__getitem__(
# This condition depends on Subtensor always embedding constant
# indices in the Op rather than making them inputs to the Apply node
if isinstance(v.owner.inputs[0], TensorConstant) and \
len(v.owner.inputs) == 1:
try:
return v.owner.inputs[0].data.__getitem__(
tuple(v.owner.op.idx_list))
except IndexError:
raise IndexError(str(tuple(v.owner.op.idx_list))+" is not a valid index into " + \
str(v.owner.inputs[0].data))
# The index list 'idx_list' should have a length equal to the
# number of dimensions of the input.
@@ -1614,6 +1636,9 @@ class _tensor_py_operators:
def flatten(self, ndim=1):
return flatten(self, ndim)
def ravel(self):
return flatten(self)
# CASTING
def astype(self, dtype):
return cast(self, dtype)
@@ -1712,6 +1737,8 @@ class _tensor_py_operators:
def __rdot__(right, left):
return dot(left, right)
dot = __dot__
def sum(self, axis=None, dtype=None, keepdims=False):
"""See `theano.tensor.sum`"""
return sum(self, axis=axis, dtype=dtype, keepdims=keepdims)
@@ -1736,6 +1763,10 @@ class _tensor_py_operators:
"""See `theano.tensor.var`"""
return var(self, axis, keepdims=keepdims)
def std(self, axis=None, keepdims=False):
"""See `theano.tensor.std`"""
return std(self, axis, keepdims=keepdims)
def min(self, axis=None, keepdims=False):
"""See `theano.tensor.min`"""
return min(self, axis, keepdims=keepdims)
@@ -1744,6 +1775,40 @@ class _tensor_py_operators:
"""See `theano.tensor.max`"""
return max(self, axis, keepdims=keepdims)
def argmin(self, axis=None, keepdims=False):
"""See `theano.tensor.argmin`"""
return argmin(self, axis, keepdims=keepdims)
def argmax(self, axis=None, keepdims=False):
"""See `theano.tensor.argmax`"""
return argmax(self, axis, keepdims=keepdims)
def argsort(self, axis=-1, kind='quicksort', order=None):
"""See `theano.tensor.sort.argsort`"""
from theano.tensor.sort import argsort
return argsort(self, axis, kind, order)
def clip(self, a_min, a_max):
"Clip (limit) the values in an array."
return clip(self, a_min, a_max)
def conj(self):
"""See `theano.tensor.conj`"""
return conj(self)
def repeat(self, repeats, axis=None):
"""See `theano.tensor.repeat`"""
from theano.tensor.extra_ops import repeat
return repeat(self, repeats, axis)
def round(self, mode="half_away_from_zero"):
"""See `theano.tensor.round`"""
return round(self, mode)
def trace(self):
from theano.sandbox.linalg import trace
return trace(self)
# TO TRUMP NUMPY OPERATORS
__array_priority__ = 1000
@@ -2949,12 +3014,12 @@ def psi(a):
@_scal_elemwise_with_nfunc('real', 1, -1)
def real(z):
"""Return real component of complex-valued tensor `z`"""
_tensor_py_operators.real = property(real)
@_scal_elemwise_with_nfunc('imag', 1, -1)
def imag(z):
"""Return imaginary component of complex-valued tensor `z`"""
_tensor_py_operators.imag = property(imag)
@_scal_elemwise_with_nfunc('angle', 1, -1)
def angle(z):
@@ -3782,7 +3847,7 @@ class AdvancedIndexingError(TypeError):
class Subtensor(Op):
"""Return a subtensor view
The inputs array is the tensor x, followed by scalar integer variables.
The inputs array is the tensor x, followed by scalar integer types.
TODO: WRITEME: how are the scalar integer variables formatted?
This class uses a relatively complex internal representation of the inputs
@@ -3791,7 +3856,7 @@ class Subtensor(Op):
idx_list: instance variable TODO: WRITEME: is this a list or a tuple?
(old docstring gives two conflicting
descriptions)
elements are either integers, theano scalars, or slices.
elements are either integers, theano scalar types, or slices.
one element per "explicitly named dimension"
TODO: WRITEME: what is an "explicitly named dimension" ?
@@ -3800,7 +3865,11 @@ class Subtensor(Op):
if slice:
start/stop/step members of each slice are integer indices
into the inputs array or None
integer indices be actual integers or theano scalars
integer indices may be actual integers or theano scalar types
Note that the idx_list defines the Op, so two Subtensor instances are
considered to be different Ops if they have different idx_list fields.
This means that the entries in it are theano Types, not theano Variables.
@todo: add support for advanced tensor indexing (in Subtensor_dx too).
@@ -3818,6 +3887,17 @@ class Subtensor(Op):
@staticmethod
def collapse(idxs, cond):
"""
idxs: a list of indices or slices.
cond: a callable that returns a bool
returns: idxs, with the slices flattened out into a list.
if cond is true for an entry, does not flatten it.
"""
ret = []
def helper(entry):
@@ -3830,10 +3910,20 @@ class Subtensor(Op):
for idx in idxs:
helper(idx)
return ret
@staticmethod
def convert(entry, slice_ok=True):
"""
The "idx_list" field is unique to each Subtensor instance.
It is not unique to each Apply node, so it should not refer to
specific Variables. This method changes references to Variables
into references to Types.
TODO: WRITEME: This method also accepts "entry" already being a Type;
when would that happen?
"""
invalid_scal_types = [scal.float64, scal.float32]
scal_types = [scal.int64, scal.int32, scal.int16, scal.int8]
tensor_types = [lscalar, iscalar, wscalar, bscalar]
@@ -722,20 +722,19 @@ class Elemwise(Op):
def _bgrad(self, inputs, ograds):
# returns grad, with respect to broadcasted versions of inputs
# Gradients (especially on the final costs) don't have to be symbolic
# e.g., ograds will be [ 1. ] if your objective is c and the output
# of the current apply node is c
ograds = map(as_tensor_variable, ograds)
prev_setting = theano.config.compute_test_value
try:
theano.config.compute_test_value = 'off'
scalar_inputs = [Scalar(dtype=t.type.dtype)() for t in inputs]
scalar_ograds = [Scalar(dtype=ograd.type.dtype)()
for ograd in ograds]
def as_scalar(t):
if isinstance(t.type, (NullType, DisconnectedType)):
return t
return Scalar(t.type.dtype)()
scalar_inputs = map(as_scalar, inputs)
scalar_ograds = map(as_scalar, ograds)
scalar_igrads = self.scalar_op.grad(scalar_inputs, scalar_ograds)
for igrad in scalar_igrads:
assert igrad is not None
@@ -801,10 +801,9 @@ class ConvOp(OpenMPOp):
# mimic what happens inside theano.grad: get the input gradient
# of the final cost wrt all variables involved.
tmp_gmap = theano.gradient.grad_sources_inputs(
[(node, gz)], [inputs, kerns])
return theano.gradient.grad(cost=None,
known_grads={node: gz}, wrt=[inputs, kerns])
return [tmp_gmap[inputs], tmp_gmap[kerns]]
if self.dx not in (1, 2) or self.dy not in (1, 2):
raise NotImplementedError(
......
@@ -1046,7 +1046,7 @@ class T_CrossentropyCategorical1Hot(utt.InferShapeTester):
# Verify the gradient when providing output gradient
h = theano.function([x, y, a],
T.grad(expr, x, g_cost=a * x.sum()), mode=mode)
T.grad(expr, x, known_grads={expr:a * x.sum()}), mode=mode)
try:
assert 8 <= len(h.maker.fgraph.toposort()) <= 17
validate_grad_graph(h)
@@ -14,7 +14,7 @@ builtin_min = __builtin__.min
from nose.plugins.skip import SkipTest
import numpy
from numpy.testing import dec
from numpy.testing import dec, assert_array_equal, assert_allclose
from numpy.testing.noseclasses import KnownFailureTest
import theano
@@ -7001,6 +7001,85 @@ class TestInferShape(utt.InferShapeTester):
[tile(adtens4, aivec_val, ndim)],
[adtens4_val], Tile)
class TestTensorInstanceMethods(unittest.TestCase):
def setUp(self):
self.vars = matrices('X', 'Y')
self.vals = [rand(2, 2), rand(2, 2)]
def test_argmin(self):
X, _ = self.vars
x, _ = self.vals
assert_array_equal(X.argmin().eval({X: x}), x.argmin())
def test_argmax(self):
X, _ = self.vars
x, _ = self.vals
assert_array_equal(X.argmax().eval({X: x}), x.argmax())
def test_argsort(self):
X, _ = self.vars
x, _ = self.vals
assert_array_equal(X.argsort().eval({X: x}), x.argsort())
assert_array_equal(X.argsort(1).eval({X: x}), x.argsort(1))
def test_clip(self):
X, Y = self.vars
x, y = self.vals
Z = X.clip(0.5 - Y, 0.5 + Y)
z = x.clip(0.5 - y, 0.5 + y)
assert_array_equal(Z.eval({X: x, Y: y}), z)
def test_dot(self):
X, Y = self.vars
x, y = self.vals
assert_array_equal(x.dot(y), X.dot(Y).eval({X: x, Y: y}))
Z = X.dot(Y)
z = x.dot(y)
assert_array_equal(x.dot(z), X.dot(Z).eval({X: x, Z: z}))
def test_real_imag(self):
X, Y = self.vars
x, y = self.vals
Z = X + Y * 1j
z = x + y * 1j
assert_array_equal(Z.real.eval({Z: z}), x)
assert_array_equal(Z.imag.eval({Z: z}), y)
def test_conj(self):
X, Y = self.vars
x, y = self.vals
Z = X + Y * 1j
z = x + y * 1j
assert_array_equal(Z.conj().eval({Z: z}), z.conj())
def test_round(self):
X, _ = self.vars
x, _ = self.vals
assert_array_equal(X.round().eval({X: x}), x.round())
def test_std(self):
X, _ = self.vars
x, _ = self.vals
# std() is implemented as theano tree and does not pass its
# args directly to numpy. This sometimes results in small
# difference, so we use allclose test.
assert_allclose(X.std().eval({X: x}), x.std())
def test_repeat(self):
X, _ = self.vars
x, _ = self.vals
assert_array_equal(X.repeat(2).eval({X: x}), x.repeat(2))
def test_trace(self):
X, _ = self.vars
x, _ = self.vals
assert_array_equal(X.trace().eval({X: x}), x.trace())
def test_ravel(self):
X, _ = self.vars
x, _ = self.vals
assert_array_equal(X.ravel().eval({X: x}), x.ravel())
if __name__ == '__main__':
......
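The comment in `test_std` above explains why it uses `assert_allclose` rather than `assert_array_equal`: `X.std()` is built as a graph of simpler Theano ops instead of one direct NumPy call, so the result can differ from `numpy.std` in the last bits. The same effect is easy to reproduce in pure NumPy by composing std from its definition (purely illustrative):

```python
import numpy as np

x = np.random.RandomState(0).rand(2, 2)
# std composed from its definition, as a graph of simpler ops would be
composed = np.sqrt(np.mean((x - x.mean()) ** 2))
# bitwise equality may fail; equality up to rounding holds
assert np.allclose(composed, x.std())
```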
@@ -6,7 +6,6 @@ import unittest
 import theano
 from theano import gof
-from theano.gradient import grad_sources_inputs
 from theano import gradient
 from theano.tensor.nnet.Conv3D import conv3D
 from theano import config
@@ -16,6 +15,16 @@ from theano.gof.null_type import NullType
 one = theano.tensor.as_tensor_variable(1.)
+
+def grad_sources_inputs(sources, inputs):
+    """
+    This implements the old grad_sources_inputs function in terms of
+    the new interface so the tests don't need to be rewritten.
+    """
+    if inputs is None:
+        inputs = theano.gof.graph.inputs([source[0] for source in sources])
+    return dict(zip(inputs, theano.gradient.grad(cost=None,
+        known_grads=dict(sources), wrt=inputs, consider_constant=inputs)))
+
 class testgrad_sources_inputs(unittest.TestCase):
     def test_retNone1(self):
@@ -369,35 +378,6 @@ class test_grad(unittest.TestCase):
         # If we made it to here without an exception, then the
         # connection_pattern functionality worked correctly
-    def test_sum_disconnected(self):
-        # Tests that we can add DisconnectedType to other terms correctly
-        x = theano.tensor.scalar()
-        y = x * 2.
-        z = x + 1.
-        cost = y + z
-        theano.tensor.grad(cost, x, consider_constant=[y, z])
-        # In an earlier version of theano, the above line would have failed
-        # while trying to add two DisconnectedTypes
-
-    def test_output_grad_on_int(self):
-        # If the g_cost argument is specified when x has a discrete dtype,
-        # g_cost should be equivalent to 0.
-        x = theano.tensor.iscalar('x')
-        y = x * 2
-        # Should work:
-        c0 = theano.tensor.constant(0)
-        theano.grad(y, x, g_cost=c0)
-        theano.grad(y, x, g_cost=y.zeros_like())
-        theano.grad(y, x, g_cost=y.zeros_like().astype('float64'))
-        # Should raise ValueError
-        c1 = theano.tensor.constant(1)
-        self.assertRaises(ValueError, theano.grad, y, x, g_cost=c1)
-        s0 = theano.shared(np.zeros((), dtype='int8'))
-        self.assertRaises(ValueError, theano.grad, y, x, g_cost=s0)
-
     def test_downcast_dtype(self):
         # Test that the gradient of a cost wrt a float32 variable does not
         # get upcasted to float64.
@@ -418,6 +398,161 @@ class test_grad(unittest.TestCase):
         # be downcasted to float32, so dc_dx should also be float32
         assert dc_dx.dtype == 'float32'
+
+    def test_grad_constant(self):
+        # Test that the gradient handles Constants and consider_constant
+        # variables consistently.
+        x = theano.tensor.scalar()
+        y = theano.tensor.scalar()
+        z_x = x + y
+        z_one = one + y
+        g_x = theano.tensor.grad(z_x, x, consider_constant=[x])
+        g_one = theano.tensor.grad(z_one, one)
+        f = theano.function([x, y], [g_x, g_one])
+        g_x, g_one = f(1, .5)
+        if not np.allclose(g_x, g_one):
+            raise AssertionError("Gradient using consider_constant is " +
+                                 str(g_x) +
+                                 " but gradient with respect to the same"
+                                 " Constant is " + str(g_one))
+
+
+def test_known_grads():
+    # Tests that the grad method with no known_grads
+    # matches what happens if you put its own known_grads
+    # in for each variable
+    full_range = theano.tensor.arange(10)
+    x = theano.tensor.scalar('x')
+    t = theano.tensor.iscalar('t')
+    ft = full_range[t]
+    ft.name = 'ft'
+    coeffs = theano.tensor.vector('c')
+    ct = coeffs[t]
+    ct.name = 'ct'
+    p = x ** ft
+    p.name = 'p'
+    y = ct * p
+    y.name = 'y'
+    cost = theano.tensor.sqr(y)
+    cost.name = 'cost'
+
+    layers = [[cost],
+              [y],
+              [ct, p],
+              [ct, x, ft],
+              [coeffs, t, full_range, x]]
+
+    inputs = [coeffs, t, x]
+    rng = np.random.RandomState([2012, 11, 15])
+    values = [rng.randn(10), rng.randint(10), rng.randn()]
+    values = [np.cast[ipt.dtype](value)
+              for ipt, value in zip(inputs, values)]
+
+    true_grads = theano.tensor.grad(cost, inputs,
+                                    disconnected_inputs='ignore')
+    true_grads = theano.function(inputs, true_grads)
+    true_grads = true_grads(*values)
+
+    for layer in layers:
+        print 'Testing by separately computing', layer
+        first = theano.tensor.grad(cost, layer, disconnected_inputs='ignore')
+        known = dict(zip(layer, first))
+        full = theano.tensor.grad(cost=None, known_grads=known,
+                                  wrt=inputs, disconnected_inputs='ignore')
+        full = theano.function(inputs, full)
+        full = full(*values)
+        assert len(true_grads) == len(full)
+        for a, b, var in zip(true_grads, full, inputs):
+            if not np.allclose(a, b):
+                print 'Failure'
+                print a
+                print b
+                print var
+                print layer
+                for v in known:
+                    print v, ':', theano.function(inputs, known[v])(*values)
+                assert False
+
+
+def test_dxdx():
+    # Tests that the gradient of a scalar with respect to itself is 1.
+    # We use an integer in this case because people keep changing this
+    # gradient to be 0 on integers, but according to our interpretation
+    # of the gradient as defined in the Op contract, it should be 1.
+    # If you feel the need to change this unit test you are probably
+    # modifying the Op contract and should definitely get the approval
+    # of multiple people on theano-dev.
+    x = theano.tensor.iscalar()
+    g = theano.tensor.grad(x, x)
+    g = g.eval({x: 12})
+    assert np.allclose(g, 1.)
+
+
+def test_known_grads_integers():
+    # Tests that known_grads works on integers
+    x = theano.tensor.iscalar()
+    g_expected = theano.tensor.scalar()
+    g_grad = theano.gradient.grad(cost=None,
+                                  known_grads={x: g_expected},
+                                  wrt=x)
+    f = theano.function([g_expected], g_grad)
+    gv = np.cast[theano.config.floatX](.6)
+    g_actual = f(gv)
+    assert np.allclose(g_actual, gv)
+
+
+def test_undefined_cost_grad():
+    # Tests that if we say the cost is not differentiable via the
+    # known_grads mechanism, it is treated as such by the rest of the
+    # system.
+    # This is so that Ops that are built around minigraphs like
+    # OpFromGraph and scan can implement Op.grad by passing ograds
+    # to known_grads.
+    x = theano.tensor.iscalar()
+    y = theano.tensor.iscalar()
+    cost = x + y
+    assert cost.dtype in theano.tensor.discrete_dtypes
+    try:
+        theano.tensor.grad(cost, [x, y],
+                           known_grads={cost: NullType()()})
+    except theano.gradient.NullTypeGradError:
+        return
+    raise AssertionError("An undefined gradient has been ignored.")
+
+
+def test_disconnected_cost_grad():
+    # Tests that if we say the cost is disconnected via the
+    # known_grads mechanism, it is treated as such by the rest of the
+    # system.
+    # This is so that Ops that are built around minigraphs like
+    # OpFromGraph and scan can implement Op.grad by passing ograds
+    # to known_grads.
+    x = theano.tensor.iscalar()
+    y = theano.tensor.iscalar()
+    cost = x + y
+    assert cost.dtype in theano.tensor.discrete_dtypes
+    try:
+        theano.tensor.grad(cost, [x, y],
+                           known_grads={cost: gradient.DisconnectedType()()},
+                           disconnected_inputs='raise')
+    except theano.gradient.DisconnectedInputError:
+        return
+    raise AssertionError("A disconnected gradient has been ignored.")
+
 if __name__ == '__main__':
     unittest.main()
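`test_known_grads` above checks that stopping the gradient at an intermediate "layer" and restarting from those known gradients reproduces the direct gradient. The same two-step chain rule for the scalar case `cost = (c * x**f) ** 2` can be written out in plain NumPy (the helper names are hypothetical, for illustration only):

```python
import numpy as np

def direct_grads(c, x, f):
    # d cost / d c and d cost / d x computed in one pass.
    y = c * x ** f
    return 2.0 * y * x ** f, 2.0 * y * c * f * x ** (f - 1)

def layered_grads(c, x, f):
    # Stop at the intermediate y, then push the "known" gradient
    # g_y = d cost / d y down to the inputs, as known_grads does.
    y = c * x ** f
    g_y = 2.0 * y
    return g_y * x ** f, g_y * c * f * x ** (f - 1)

assert np.allclose(direct_grads(0.5, 1.5, 3), layered_grads(0.5, 1.5, 3))
```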
@@ -341,15 +341,9 @@ class test_RopLop(RopLop_checker):
         rop_out2 = tensor.Rop((m, v, m + v), [m, v], [m_, v_])
         assert isinstance(rop_out2, tuple)
         assert len(rop_out2) == 3
-        lop_out1 = tensor.Lop([m, v, m + v], (m, v), [m_, v_])
-        assert isinstance(lop_out1, tuple)
-        assert len(lop_out1) == 2
-        lop_out2 = tensor.Lop((m, v, m + v), [m, v], [m_, v_])
-        assert isinstance(lop_out2, list)
-        assert len(lop_out2) == 2
         all_outs = []
-        for o in rop_out1, rop_out2, lop_out1, lop_out2:
+        for o in rop_out1, rop_out2:
            all_outs.extend(o)
         f = theano.function([m, v, m_, v_], all_outs)
         f(mval, vval, m_val, v_val)
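`tensor.Rop`, exercised by the assertions above, computes the Jacobian-times-vector (directional) derivative of an expression. A finite-difference sketch of the same quantity in NumPy (`numeric_rop` is an illustrative approximation, not the Theano operator):

```python
import numpy as np

def numeric_rop(f, x, v, eps=1e-6):
    # Directional derivative of f at x along v, i.e. J(x).dot(v),
    # approximated by a forward finite difference.
    return (f(x + eps * v) - f(x)) / eps

f = lambda x: x ** 2  # elementwise square, Jacobian = diag(2 * x)
out = numeric_rop(f, np.array([1.0, 2.0]), np.array([1.0, 0.0]))
# out is close to [2., 0.]
```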