Commit e8f198a4 authored by Razvan Pascanu

merge

@@ -10,15 +10,16 @@ Guide
=====

The scan function provides the basic functionality needed to do loops
in Theano. Scan comes with many whistles and bells, which we will introduce
by way of examples.

Simple loop with accumulation: Computing :math:`A^k`
-----------------------------------------------------

Assume that, given *k*, you want to get ``A**k`` using a loop.
More precisely, if *A* is a tensor, you want to compute
``A**k`` elemwise. The python/numpy code might look like:

.. code-block:: python

@@ -26,42 +27,176 @@ More precisely, if *A* is a tensor you want to compute
      for i in xrange(k):
          result = result * A

There are three things here that we need to handle: the initial value
assigned to ``result``, the accumulation of results in ``result``, and
the unchanging variable ``A``. Unchanging variables are passed to scan as
``non_sequences``, initialization occurs in ``outputs_info``, and the
accumulation happens automatically.

The equivalent Theano code would be:

.. code-block:: python

      # Symbolic description of the result
      result, updates = theano.scan(fn=lambda prior_result, A: prior_result * A,
                                    outputs_info=T.ones_like(A),
                                    non_sequences=A,
                                    n_steps=k)

      # We only care about A**k, but scan has provided us with A**1 through A**k.
      # Discard the values that we don't care about. Scan is smart enough to
      # notice this and not waste memory saving them.
      final_result = result[-1]

      # compiled function that returns A**k
      power = theano.function(inputs=[A, k], outputs=final_result, updates=updates)

Let us go through the example line by line. What we did is first to
construct a function (using a lambda expression) that, given ``prior_result`` and
``A``, returns ``prior_result * A``. The order of parameters is fixed by scan:
the output of the prior call to ``fn`` (or the initial value, at the first step)
is the first parameter, followed by all non-sequences.

Next we initialize the output as a tensor with the same shape and dtype as ``A``,
filled with ones. We give ``A`` to scan as a non-sequence parameter and
specify the number of steps ``k`` to iterate over our lambda expression.

Scan returns a tuple containing our result (``result``) and a
dictionary of updates (empty in this case). Note that the result
is not a matrix, but a 3D tensor containing the value of ``A**k`` for
each step. We want the last value (after ``k`` steps), so we compile
a function to return just that. Note that there is an optimization that,
at compile time, will detect that you are using just the last value of the
result and ensure that scan does not store all the intermediate values
that are used. So do not worry if ``A`` and ``k`` are large.
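For readers without Theano installed, the loop that scan unrolls here can be sketched in plain numpy. The names ``power_elemwise`` and ``intermediates`` are ours, chosen for illustration; this mirrors what scan computes, not how Theano implements it:

```python
import numpy as np

def power_elemwise(A, k):
    # Mimic scan: start from ones_like(A) and multiply by A, k times,
    # keeping every intermediate value, as scan does by default.
    result = np.ones_like(A)
    intermediates = []
    for _ in range(k):
        result = result * A
        intermediates.append(result)
    # scan returns all k steps; like result[-1], keep only the last (A**k)
    return intermediates[-1]

A = np.array([2.0, 3.0])
print(power_elemwise(A, 3))  # elementwise A**3
```

The list of intermediates is exactly the 3D tensor the text describes, with one slice per step.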
Iterating over the first dimension of a tensor: Calculating a polynomial
------------------------------------------------------------------------
In addition to looping a fixed number of times, scan can iterate over
the leading dimension of tensors (similar to Python's ``for x in a_list``).
The tensor(s) to be looped over should be provided to scan using the
``sequences`` keyword argument.
Here's an example that builds a symbolic calculation of a polynomial
from a list of its coefficients:
.. code-block:: python

    coefficients = theano.tensor.vector("coefficients")
    x = T.scalar("x")

    max_coefficients_supported = 10000

    # Generate the components of the polynomial
    components, updates = theano.scan(fn=lambda coefficient, power, free_variable: coefficient * (free_variable ** power),
                                      outputs_info=None,
                                      sequences=[coefficients, theano.tensor.arange(max_coefficients_supported)],
                                      non_sequences=x)
    # Sum them up
    polynomial = components.sum()

    # Compile a function
    calculate_polynomial = theano.function(inputs=[coefficients, x], outputs=polynomial)

    # Test
    test_coefficients = numpy.asarray([1, 0, 2], dtype=numpy.float32)
    test_value = 3
    print calculate_polynomial(test_coefficients, test_value)
    print 1.0 * (3 ** 0) + 0.0 * (3 ** 1) + 2.0 * (3 ** 2)
There are a few things to note here.
First, we calculate the polynomial by first generating each of the coefficients, and
then summing them at the end. (We could also have accumulated them along the way, and then
taken the last one, which would have been more memory-efficient, but this is an example.)
Second, since there is no accumulation of results, we can set ``outputs_info`` to ``None``. This indicates
to scan that it doesn't need to pass the prior result to ``fn``.
The general order of function parameters to ``fn`` is::
sequences (if any), prior result(s) (if needed), non-sequences (if any)
Third, there's a handy trick used to simulate python's ``enumerate``: simply include
``theano.tensor.arange`` in the sequences.
Fourth, given multiple sequences of uneven lengths, scan will truncate to the shortest of them.
This makes it safe to pass a very long arange, which we need to do for generality, since
arange must have its length specified at creation time.
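The per-step computation scan performs here is easy to mirror in plain numpy. This sketch (the function name is ours, for illustration) makes the ``enumerate``-via-``arange`` trick explicit: each coefficient is paired with its power, just as scan zips the two sequences:

```python
import numpy as np

def polynomial_components(coefficients, x):
    # One "step" per (coefficient, power) pair, like zipping the
    # coefficients sequence with an arange of powers.
    powers = np.arange(len(coefficients))
    return coefficients * (x ** powers)

coeffs = np.asarray([1.0, 0.0, 2.0])
# Summing the components evaluates the polynomial, as in the text:
print(polynomial_components(coeffs, 3.0).sum())  # 1*1 + 0*3 + 2*9 = 19.0
```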
Simple accumulation into a scalar, ditching lambda
--------------------------------------------------
This should be fairly self-explanatory.
.. code-block:: python

    up_to = T.iscalar("up_to")

    # define a named function, rather than using lambda
    def accumulate_by_adding(arange_val, sum_to_date):
        return sum_to_date + arange_val

    scan_result, scan_updates = theano.scan(fn=accumulate_by_adding,
                                            outputs_info=T.as_tensor_variable(0),
                                            sequences=T.arange(up_to))
    triangular_sequence = theano.function(inputs=[up_to], outputs=scan_result)

    # test
    some_num = 15
    print triangular_sequence(some_num)
    print [n * (n + 1) // 2 for n in xrange(some_num)]
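As a cross-check of what the accumulation produces, here is a plain-numpy equivalent of ``triangular_sequence`` (our naming; numpy's ``cumsum`` plays the role of scan's carried ``sum_to_date``):

```python
import numpy as np

def triangular_numbers(up_to):
    # scan keeps the running sum at every step; cumsum does the same,
    # returning the whole sequence rather than just the final value.
    return np.cumsum(np.arange(up_to))

print(list(triangular_numbers(5)))  # [0, 1, 3, 6, 10]
```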
Another simple example
----------------------
Unlike some of the prior examples, this one is hard to reproduce except by using scan.
This takes a sequence of array indices, and values to place there,
and a "model" output array (whose shape and dtype will be mimicked),
and produces a sequence of arrays with the shape and dtype of the model,
with all values set to zero except at the provided array indices.
.. code-block:: python

    location = T.imatrix("location")
    values = T.vector("values")
    output_model = T.matrix("output_model")

    def set_value_at_position(a_location, a_value, output_model):
        zeros = T.zeros_like(output_model)
        zeros_subtensor = zeros[a_location[0], a_location[1]]
        return T.set_subtensor(zeros_subtensor, a_value)

    result, updates = theano.scan(fn=set_value_at_position,
                                  outputs_info=None,
                                  sequences=[location, values],
                                  non_sequences=output_model)
    assign_values_at_positions = theano.function(inputs=[location, values, output_model], outputs=result)

    # test
    test_locations = numpy.asarray([[1, 1], [2, 3]], dtype=numpy.int32)
    test_values = numpy.asarray([42, 50], dtype=numpy.float32)
    test_output_model = numpy.zeros((5, 5), dtype=numpy.float32)
    print assign_values_at_positions(test_locations, test_values, test_output_model)
This demonstrates that you can introduce new Theano variables into a scan function.
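A plain-numpy rendering of a single step of ``set_value_at_position`` (illustrative only; the real version builds a symbolic graph) shows what each array in the output sequence looks like:

```python
import numpy as np

def set_value_at_position(a_location, a_value, output_model):
    # A fresh zero array per step, mimicking the model's shape and dtype,
    # with a single entry written at the given (row, col) location.
    out = np.zeros_like(output_model)
    out[a_location[0], a_location[1]] = a_value
    return out

model = np.zeros((5, 5), dtype=np.float32)
step = set_value_at_position(np.array([1, 1]), 42.0, model)
print(step[1, 1])  # only entry (1, 1) is non-zero
```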
Multiple outputs, several taps values - Recurrent Neural Network with Scan
--------------------------------------------------------------------------

The examples above showed simple uses of scan. However, scan also supports
referring not only to the prior result and the current sequence value, but
also to results from more than one step back.

This is needed, for example, to implement an RNN using scan. Assume
that our RNN is defined as follows:

.. math::
......
@@ -69,3 +69,27 @@ else:
    partial = functools.partial
    defaultdict = collections.defaultdict

__all__ = ['all', 'any']
if sys.version_info[:2] < (2, 6):
    # Borrowed from Python docs
    def combinations(iterable, r):
        # combinations('ABCD', 2) --> AB AC AD BC BD CD
        # combinations(range(4), 3) --> 012 013 023 123
        pool = tuple(iterable)
        n = len(pool)
        if r > n:
            return
        indices = range(r)
        yield tuple(pool[i] for i in indices)
        while True:
            for i in reversed(range(r)):
                if indices[i] != i + n - r:
                    break
            else:
                return
            indices[i] += 1
            for j in range(i+1, r):
                indices[j] = indices[j-1] + 1
            yield tuple(pool[i] for i in indices)
else:
    from itertools import combinations
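A quick sanity check of the backported ``combinations`` above, matching the examples in its docstring (plain Python, no Theano needed; on 2.6+ this is just ``itertools.combinations``):

```python
from itertools import combinations

# combinations('ABCD', 2) yields pairs in lexicographic order
pairs = [''.join(c) for c in combinations('ABCD', 2)]
print(pairs)  # ['AB', 'AC', 'AD', 'BC', 'BD', 'CD']
```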
@@ -2982,18 +2982,20 @@ CudaNdarray_dimshuffle(CudaNdarray * self, unsigned int len, const int * pattern
        }
        else if (dims_taken[pattern[i]])
        {
            PyErr_Format(PyExc_ValueError,
                         "CudaNdarray_dimshuffle: invalid pattern; dimension %d appears more than once in the list of output dimensions",
                         pattern[i]);
            free(newdims);
            return -1;
        }
        else if (pattern[i] >= self->nd)
        {
            PyErr_Format(PyExc_ValueError,
                         "CudaNdarray_dimshuffle: invalid pattern; you asked for dimension %d of a %d-dimensional CudaNdarray",
                         pattern[i], self->nd);
            free(newdims);
            return -1;
        }
        else
        {
            newdims[i] = CudaNdarray_HOST_DIMS(self)[pattern[i]];
            newstrides[i] = CudaNdarray_HOST_STRIDES(self)[pattern[i]];
            dims_taken[pattern[i]] = 1;
......
@@ -1103,19 +1103,9 @@ class Clip(ScalarOp):
            return gx, None, None
        else:
            return None, None, None

# Don't allow complex even if numpy does, as there is no
# mathematical reason for this function on complex numbers.
clip = Clip(upcast_out_no_complex, name='clip')

class Second(BinaryScalarOp):
    def impl(self, x, y):
......
@@ -1095,20 +1095,36 @@ def local_useless_subtensor(node):
    """
    Remove Subtensor if it takes the full input
    """
    if isinstance(node.op, T.Subtensor):
        shape_of = node.env.shape_feature.shape_of
        node_input_idx = 1
        for pos, idx in enumerate(node.op.idx_list):
            length_pos_data = sys.maxint
            length_pos_shape_i = None
            try:
                length_pos = shape_of[node.inputs[0]][pos]
                if isinstance(length_pos, theano.tensor.basic.TensorConstant):
                    length_pos_data = length_pos.data
                else:
                    length_pos_shape_i = node.inputs[node_input_idx].owner.inputs[0]
            except Exception, e:
                length_pos = None
            if (isinstance(idx, slice) and
                idx.start in [0, None] and
                idx.step in [1, None] and
                (idx.stop in [sys.maxint, None, length_pos_data] or
                 (isinstance(idx.stop, int) and idx.stop >= length_pos_data) or
                 (isinstance(idx.stop, theano.scalar.Scalar) and
                  length_pos == length_pos_shape_i)
                 )):
                pass
            else:
                return False
            if isinstance(idx, slice):
                node_input_idx += sum([isinstance(idx.start, theano.scalar.Scalar),
                                       isinstance(idx.stop, theano.scalar.Scalar),
                                       isinstance(idx.step, theano.scalar.Scalar)])
        return [node.inputs[0]]
......
@@ -15,7 +15,7 @@ from theano.tensor import inplace
from copy import copy
from theano import compile, config
from theano import gof
from theano.gof.python25 import any, all, combinations
from theano.compile.mode import get_default_mode
from theano import function
@@ -218,6 +218,16 @@ def randint_ranged(min, max, shape):
def randc128_ranged(min, max, shape):
    return numpy.asarray(numpy.random.rand(*shape) * (max - min) + min, dtype='complex128')
def rand_of_dtype(shape, dtype):
    if 'int' in dtype:
        return randint(*shape).astype(dtype)
    elif 'float' in dtype:
        return rand(*shape).astype(dtype)
    elif 'complex' in dtype:
        return randcomplex(*shape).astype(dtype)
    else:
        raise TypeError()
def makeBroadcastTester(op, expected, checks={}, **kwargs):
    name = str(op) + "Tester"
    if kwargs.has_key('inplace'):
@@ -789,73 +799,15 @@ DotTester = makeTester(name = 'DotTester',
                       bad_runtime=dict(bad1=(rand(5, 7), rand(5, 7)),
                                        bad2=(rand(5, 7), rand(8, 3))))
def _numpy_second(x, y):
    return numpy.broadcast_arrays(x, y)[1]
ALL_DTYPES = ('int8', 'int16', 'int32', 'int64',
              'float32', 'float64', 'complex64', 'complex128')
REAL_DTYPES = ALL_DTYPES[:-2]
COMPLEX_DTYPES = ALL_DTYPES[-2:]
def multi_dtype_checks(shape1, shape2, dtypes=ALL_DTYPES, nameprefix=''):
    for dtype1, dtype2 in combinations(dtypes, 2):
        name1 = '%s_%s_%s' % (nameprefix, dtype1, dtype2)
        name2 = '%s_%s_%s' % (nameprefix, dtype2, dtype1)
@@ -864,7 +816,7 @@ def multi_dtype_tests(shape1, shape2, dtypes=ALL_DTYPES, nameprefix=''):
        yield (name1, (obj1, obj2))
        yield (name2, (obj2, obj1))
def multi_dtype_cast_checks(shape, dtypes=ALL_DTYPES, nameprefix=''):
    for dtype1, dtype2 in combinations(dtypes, 2):
        name1 = '%s_%s_%s' % (nameprefix, dtype1, dtype2)
        name2 = '%s_%s_%s' % (nameprefix, dtype2, dtype1)
@@ -878,18 +830,17 @@ SecondBroadcastTester = makeTester(
    op=second,
    expected=_numpy_second,
    good=dict(itertools.chain(
        multi_dtype_checks((4, 5), (5,)),
        multi_dtype_checks((2, 3, 2), (3, 2)),
        multi_dtype_checks((2, 3, 2), (2,)),
    )),
    # I can't think of any way to make this fail at
    # build time
    # Just some simple smoke tests
    bad_runtime=dict(
        fail1=(rand(5, 4), rand(5)),
        fail2=(rand(3, 2, 3), rand(6, 9)),
        fail3=(randint(6, 2, 9), rand(3, 2)),
    )
)
@@ -898,30 +849,84 @@ SecondSameRankTester = makeTester(
    op=second,
    expected=_numpy_second,
    good=dict(itertools.chain(
        multi_dtype_checks((4, 5), (4, 5)),
        multi_dtype_checks((1, 2), (3, 2)),
        multi_dtype_checks((3, 2), (1, 2)),
    )),
    # These sizes are not broadcastable to one another
    # and SHOULD raise an error, but currently don't.
    bad_runtime=dict(itertools.chain(
        multi_dtype_checks((4, 5), (5, 4)),
        multi_dtype_checks((1, 5), (5, 4)),
    ))
)
class CastTester(unittest.TestCase):
    def test_good_between_real_types(self):
        good = itertools.chain(
            multi_dtype_cast_checks((2,), dtypes=REAL_DTYPES),
            # Casts from foo to foo
            [('%s_%s' % (dtype, dtype),
              (rand_of_dtype((2,), dtype), dtype))
             for dtype in ALL_DTYPES])
        for testname, (obj, dtype) in good:
            inp = tensor.vector(dtype=obj.dtype)
            out = tensor.cast(inp, dtype=dtype)
            f = function([inp], out)
            assert f(obj).dtype == numpy.dtype(dtype)
    def test_cast_from_real_to_complex(self):
        for real_dtype in REAL_DTYPES:
            for complex_dtype in COMPLEX_DTYPES:
                inp = tensor.vector(dtype=real_dtype)
                out = tensor.cast(inp, dtype=complex_dtype)
                f = function([inp], out)
                obj = rand_of_dtype((2,), real_dtype)
                assert f(obj).dtype == numpy.dtype(complex_dtype)

    def test_cast_from_complex_to_real_raises_error(self):
        for real_dtype in REAL_DTYPES:
            for complex_dtype in COMPLEX_DTYPES:
                inp = tensor.vector(dtype=complex_dtype)
                # assertRaises needs a callable; do not call cast eagerly
                self.assertRaises(TypeError, tensor.cast, inp, real_dtype)
ClipTester = makeTester(name='ClipTester',
                        op=clip,
                        expected=lambda x, y, z: numpy.clip(x, y, z),
                        good=dict(correct1=((5 * rand(5, 5)).astype('float32'),
                                            numpy.array(-1, dtype='float32'),
                                            numpy.array(1, dtype='float32')),
                                  correct2=((5 * rand(5, 5)).astype('float64'),
                                            numpy.array(-1, dtype='float64'),
                                            numpy.array(1, dtype='float64')),
                                  correct3=(randint(5, 5).astype('int8'),
                                            numpy.array(-1, dtype='int8'),
                                            numpy.array(1, dtype='int8')),
                                  correct4=(randint(5, 5).astype('int16'),
                                            numpy.array(-1, dtype='int16'),
                                            numpy.array(1, dtype='int16')),
                                  correct5=(randint(5, 5).astype('int32'),
                                            numpy.array(-1, dtype='int32'),
                                            numpy.array(1, dtype='int32')),
                                  correct6=(randint(5, 5).astype('int64'),
                                            numpy.array(-1, dtype='int64'),
                                            numpy.array(1, dtype='int64')),
                                  # min > max: messed-up behaviour, but
                                  # should be the same as NumPy's
                                  correct7=((5 * rand(5, 5)).astype('float64'),
                                            numpy.array(1, dtype='float64'),
                                            numpy.array(-1, dtype='float64')))
                        )
# I can't think of any way to make this fail at runtime
class T_Clip(unittest.TestCase):
    def test_complex_value(self):
        for dtype in ['complex64', 'complex128']:
            a = tensor.vector(dtype=dtype)
            b = tensor.scalar()
            c = tensor.scalar()
            self.assertRaises(TypeError, clip, a, b, c)
#TODO: consider moving this function / functionality to gradient.py #TODO: consider moving this function / functionality to gradient.py
# rationale: it's tricky, and necessary everytime you want to verify # rationale: it's tricky, and necessary everytime you want to verify
@@ -1824,7 +1829,7 @@ class T_subtensor(unittest.TestCase):
            mode_opt = 'FAST_RUN'
        mode_opt = compile.mode.get_mode(mode_opt)
        data = self.shared(numpy.array(numpy.arange(5), dtype=self.dtype))
        for start in [None] + [-8, -5, -1, 0, 1, 5, 8]:
            outs = []
            shapes = []
@@ -1847,7 +1852,7 @@ class T_subtensor(unittest.TestCase):
        if mode_opt == 'FAST_COMPILE':
            mode_opt = 'FAST_RUN'
        mode_opt = compile.mode.get_mode(mode_opt)
        v_data = numpy.array(numpy.arange(5), dtype=self.dtype)
        t_data = self.shared(v_data)
        start = theano.tensor.iscalar('b')
        stop = theano.tensor.iscalar('e')
@@ -2275,12 +2280,13 @@ class T_Join_and_Split(unittest.TestCase):
        f = function([a, b], c)
        rng = numpy.random.RandomState(seed=utt.fetch_seed())
        a_val = rng.rand(1, 4, 1).astype(config.floatX)
        b_val = rng.rand(1, 3, 1).astype(config.floatX)
        f(a_val, b_val)
        utt.verify_grad((lambda a, b: join(1, a, b)), [a_val, b_val], rng=rng)
        # Should raise an error if dimension 0 does not match
        bad_a_val = rng.rand(2, 4, 1).astype(config.floatX)
        self.assertRaises(ValueError, f, bad_a_val, b_val)
    def test_broadcastable_flag_assignment_mixed_thisaxes(self):
        """
@@ -2295,12 +2301,13 @@ class T_Join_and_Split(unittest.TestCase):
        f = function([a, b], c)
        rng = numpy.random.RandomState(seed=utt.fetch_seed())
        a_val = rng.rand(2, 4, 1).astype(config.floatX)
        b_val = rng.rand(1, 4, 1).astype(config.floatX)
        f(a_val, b_val)
        utt.verify_grad((lambda a, b: join(0, a, b)), [a_val, b_val], rng=rng)
        # Should raise an error if b_val.shape[0] is not 1
        bad_b_val = rng.rand(3, 4, 1).astype(config.floatX)
        self.assertRaises(TypeError, f, a_val, bad_b_val)
    def test_broadcastable_flags_all_broadcastable_on_joinaxis(self):
        """
@@ -2315,13 +2322,15 @@ class T_Join_and_Split(unittest.TestCase):
        f = function([a, b], c)
        rng = numpy.random.RandomState(seed=utt.fetch_seed())
        a_val = rng.rand(1, 4, 1).astype(config.floatX)
        b_val = rng.rand(1, 4, 1).astype(config.floatX)
        f(a_val, b_val)
        utt.verify_grad((lambda a, b: join(0, a, b)), [a_val, b_val], rng=rng)
        # Should raise an error if length of dimension 0 is not 1
        bad_a_val = rng.rand(2, 4, 1).astype(config.floatX)
        bad_b_val = rng.rand(3, 4, 1).astype(config.floatX)
        self.assertRaises(TypeError, f, bad_a_val, b_val)
        self.assertRaises(TypeError, f, a_val, bad_b_val)
    def test_broadcastable_single_input_broadcastable_dimension(self):
        """
@@ -2336,11 +2345,12 @@ class T_Join_and_Split(unittest.TestCase):
        f = function([a], b)
        rng = numpy.random.RandomState(seed=utt.fetch_seed())
        a_val = rng.rand(1, 4, 1).astype(config.floatX)
        f(a_val)
        utt.verify_grad((lambda a: join(0, a)), [a_val], rng=rng)
        # Should raise an error if length of dimension 0 is not 1
        bad_a_val = rng.rand(2, 4, 1).astype(config.floatX)
        self.assertRaises(TypeError, f, bad_a_val)
    def test_broadcastable_flags_many_dims_and_inputs(self):
        """
@@ -2364,26 +2374,32 @@ class T_Join_and_Split(unittest.TestCase):
        g = function([a, b, c, d, e], f)
        rng = numpy.random.RandomState(seed=utt.fetch_seed())
        a_val = rng.rand(1, 1, 1, 1, 2, 1).astype(config.floatX)
        b_val = rng.rand(1, 1, 1, 1, 2, 1).astype(config.floatX)
        c_val = rng.rand(1, 1, 1, 1, 2, 1).astype(config.floatX)
        d_val = rng.rand(1, 1, 1, 1, 2, 1).astype(config.floatX)
        e_val = rng.rand(1, 1, 1, 1, 2, 1).astype(config.floatX)
        g(a_val, b_val, c_val, d_val, e_val)
        utt.verify_grad((lambda a, b, c, d, e: join(0, a, b, c, d, e)),
                        [a_val, b_val, c_val, d_val, e_val], rng=rng)
        # Should raise an error if length of dimension 0 is not 1
        bad_val = rng.rand(2, 1, 1, 1, 2, 1).astype(config.floatX)
        self.assertRaises(TypeError, g, bad_val, b_val, c_val, d_val, e_val)
        self.assertRaises(TypeError, g, a_val, bad_val, c_val, d_val, e_val)
        self.assertRaises(TypeError, g, a_val, b_val, bad_val, d_val, e_val)
        self.assertRaises(TypeError, g, a_val, b_val, c_val, bad_val, e_val)
        self.assertRaises(TypeError, g, a_val, b_val, c_val, d_val, bad_val)
        # Should raise an error if any dimension other than 4 has length != 1
        bad_a_val = rng.rand(1, 2, 1, 1, 2, 1).astype(config.floatX)
        bad_b_val = rng.rand(1, 1, 1, 1, 2, 2).astype(config.floatX)
        bad_c_val = rng.rand(1, 1, 2, 1, 2, 1).astype(config.floatX)
        bad_d_val = rng.rand(1, 2, 1, 1, 2, 1).astype(config.floatX)
        bad_e_val = rng.rand(1, 1, 1, 2, 2, 1).astype(config.floatX)
        self.assertRaises(ValueError, g, bad_a_val, b_val, c_val, d_val, e_val)
        self.assertRaises(ValueError, g, a_val, bad_b_val, c_val, d_val, e_val)
        self.assertRaises(ValueError, g, a_val, b_val, bad_c_val, d_val, e_val)
        self.assertRaises(ValueError, g, a_val, b_val, c_val, bad_d_val, e_val)
        self.assertRaises(ValueError, g, a_val, b_val, c_val, d_val, bad_e_val)
class test_comparison(unittest.TestCase):
......
@@ -1147,12 +1147,76 @@ def test_log_add():

def test_local_useless_subtensor():
    x = TT.matrix('x')

    # Test default
    for dims in [(slice(0, None),),
                 (slice(0, None), slice(0, None)),
                 ]:
        f = function([x], TT.exp(x).__getitem__(dims), mode=mode_opt)
        #theano.printing.debugprint(f)
        prog = f.maker.env.toposort()
        assert prog[0].op == TT.exp
        assert len(prog) == 1
        f([[0, 1, 2], [3, 4, 5]])  # let debugmode test something

    x_c = specify_shape(x, (2, 3))
    # Test constant
    for dims, res in [((slice(0, 2),), True),
                      ((slice(0, 2), slice(0, None)), True),
                      ((slice(0, 2), slice(0, 3)), True),
                      ((slice(0, None), slice(0, 3)), True),
                      ((slice(0, 3), slice(0, 13)), True),
                      ((slice(0, 3), slice(0, 2)), False),
                      ((slice(0, 1), slice(0, None)), False),
                      ((slice(0, 1), 1), False),
                      ]:
        f = function([x], TT.exp(x_c).__getitem__(dims), mode=mode_opt)
        #theano.printing.debugprint(f)
        prog = f.maker.env.toposort()
        if res:
            assert isinstance(prog[0].op, theano.tensor.basic.SpecifyShape), dims
            assert prog[1].op == TT.exp, dims
            assert len(prog) == 2, dims
        else:
            assert any([isinstance(node.op, Subtensor) for node in prog])
        f([[0, 1, 2], [3, 4, 5]])  # let debugmode test something

    # Test Variable
    for idx, (dims, res) in enumerate([
            ((slice(0, x.shape[0]),), True),
            ((slice(0, x.shape[1]),), False),
            ((slice(0, x.shape[0]), slice(0, x.shape[1])), True),
            ((slice(0, x.shape[0]), slice(0, x.shape[0])), False),
            ((slice(0, x.shape[1]), slice(0, x.shape[0])), False),
            ((slice(0, x.shape[1]), slice(0, x.shape[1])), False),
            ((slice(0, x.shape[1]), 2), False),
            ((slice(0, x.shape[1]), slice(x.shape[0] - x.shape[0], x.shape[1])), False),
            ]):
        f = function([x], TT.exp(x).__getitem__(dims), mode=mode_opt)
        #theano.printing.debugprint(f)
        prog = f.maker.env.toposort()
        if res:
            assert prog[0].op == TT.exp, dims
            assert len(prog) == 1, dims
        else:
            assert any([isinstance(node.op, Subtensor) for node in prog])
        f([[0, 1, 2], [3, 4, 5]])  # let debugmode test something

    # Test mix of Variable and Constant
    # Currently not supported
    for idx, (dims, res) in enumerate([
            ((slice(0, x.shape[0]), slice(0, 3)), False),
            ((slice(0, 3), slice(0, x.shape[1])), False),
            ]):
        f = function([x], TT.exp(x_c).__getitem__(dims), mode=mode_opt)
        #theano.printing.debugprint(f)
        prog = f.maker.env.toposort()
        if res:
            assert prog[0].op == TT.exp, dims
            assert len(prog) == 1, dims
        else:
            assert any([isinstance(node.op, Subtensor) for node in prog])
        f([[0, 1, 2], [3, 4, 5]])  # let debugmode test something
class test_local_subtensor_lift(unittest.TestCase):
@@ -2320,7 +2384,7 @@ class T_local_erfc(unittest.TestCase):

class test_local_remove_switch_const_cond(unittest.TestCase):
    def setUp(self):
        self.mode = mode_opt.excluding('constant_folding')
    def test_const0(self):
......