Commit ad471767 authored by Pascal Lamblin

merge

@@ -87,11 +87,6 @@ Glossary of terminology
Part of a function :term:`Mode` -- an object responsible for 'running'
the compiled function. Among other things, the linker determines whether computations are carried out with C or Python code.
Merge
A simple optimization in which redundant :term:`Apply` nodes are
combined. For example, in ``function([x,y], [(x+y)*2, (x+y)*3])`` the merge
optimization will ensure that ``x`` and ``y`` are only added once.
Mode
An object providing an :term:`optimizer` and a :term:`linker` that is
passed to :term:`theano.function`. It parametrizes how an expression
...
@@ -33,7 +33,8 @@ Roughly in order of what you'll want to check out:
* :ref:`introduction` -- What is Theano?
* :ref:`tutorial` -- Learn the basics.
* :ref:`libdoc` -- Theano's functionality, module by module.
* :ref:`optimizations` -- Guide to Theano's graph optimizations.
* :ref:`extending` -- Learn to add a Type, Op, or graph optimization.
* :ref:`internal` -- How to maintain Theano, LISA-specific tips, and more...
* `API <api/>`_ -- The automatically-generated API
@@ -60,6 +61,7 @@ Community
install
tutorial/index
library/index
optimizations
extending/index
glossary
links
...
@@ -35,7 +35,7 @@ limited to:
* using inplace operations wherever it does not interfere with aliasing
* loop fusion for elementwise sub-expressions
* improvements to numerical stability (e.g. :math:`\log(1+\exp(x))` and :math:`\log(\sum_i \exp(x[i]))`)
* for a complete list, see :ref:`optimizations`
Theano was written at the LISA_ lab to support rapid development of
efficient machine learning algorithms. Theano is
...
@@ -5,7 +5,8 @@
Library Documentation
=====================
This documentation covers Theano module by module. It is suited to finding the
Types and Ops that you can use to build and compile expression graphs.
.. toctree::
:maxdepth: 1
...
@@ -18,6 +18,7 @@ sanity, they are grouped into the following sections:
:maxdepth: 1
basic
raw_random
shared_randomstreams
nnet
signal
...
.. _libdoc_tensor_raw_random:
=============================================
:mod:`raw_random` -- Low-level random numbers
=============================================
.. module:: raw_random
:platform: Unix, Windows
:synopsis: symbolic random variables
.. moduleauthor:: LISA
The `raw_random` module provides the random-number drawing functionality that underlies
the friendlier :class:`RandomStreams` interface.
Reference
=========
.. class:: RandomStateType(gof.Type)
A `Type` for variables that will take ``numpy.random.RandomState`` values.
.. function:: random_state_type(name=None)
Return a new Variable whose ``.type`` is ``random_state_variable``.
.. class:: RandomFunction(gof.Op)
Op that draws random numbers from a numpy.RandomState object. This Op is
parametrized to draw numbers from many possible distributions.
.. function:: uniform(random_state, size=(), low=0.0, high=1.0)
Sample from a uniform distribution between low and high.
If the size argument is ambiguous on the number of
dimensions, the first argument may be a plain integer
to supplement the missing information.
:returns: :class:`RandomVariable`, NewRandomState
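For example, a minimal sketch of drawing from this low-level interface (assuming only what is documented above, including the return order)::

    import numpy
    import theano
    from theano.tensor import raw_random

    rng = raw_random.random_state_type()       # symbolic RandomState input
    u, new_rng = raw_random.uniform(rng, (2, 2), low=0.0, high=1.0)
    f = theano.function([rng], u)
    sample = f(numpy.random.RandomState(42))   # a (2, 2) array of uniform draws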
.. function:: binomial(random_state, size=(), n=1, p=0.5)
Sample ``n`` times with probability of success ``p`` for each trial, and
return the number of successes.
If the size argument is ambiguous on the number of
dimensions, the first argument may be a plain integer
to supplement the missing information.
:returns: :class:`RandomVariable`, NewRandomState
.. function:: normal(random_state, size=(), avg=0.0, std=1.0)
Sample from a normal distribution centered on ``avg`` with
the specified standard deviation (``std``).
If the size argument is ambiguous on the number of
dimensions, the first argument may be a plain integer
to supplement the missing information.
:returns: :class:`RandomVariable`, NewRandomState
.. function:: random_integers(random_state, size=(), low=0, high=1)
Sample a random integer between low and high, both inclusive.
If the size argument is ambiguous on the number of
dimensions, the first argument may be a plain integer
to supplement the missing information.
:returns: :class:`RandomVariable`, NewRandomState
.. function:: permutation(random_state, size=(), n=1)
Returns permutations of the integers between 0 and n-1, as many times
as required by size. For instance, if size=(p,q), p*q permutations
will be generated, and the output shape will be (p,q,n), because each
permutation is of size n.
If the size argument is ambiguous on the number of dimensions, the first
argument may be a plain integer i, which should correspond to len(size).
Note that the output will then be of dimension i+1.
:returns: :class:`RandomVariable`, NewRandomState
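A short sketch of the shape rule above (again assuming the documented return order)::

    import numpy
    import theano
    from theano.tensor import raw_random

    rng = raw_random.random_state_type()
    perm, new_rng = raw_random.permutation(rng, size=(2, 3), n=5)
    f = theano.function([rng], perm)
    out = f(numpy.random.RandomState(0))   # shape (2, 3, 5): each length-5
                                           # row is a permutation of 0..4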
.. function:: multinomial(random_state, size=(), pvals=[0.5, 0.5])
Sample from a multinomial distribution defined by probabilities ``pvals``,
as many times as required by size. For instance, if size=(p,q), p*q
samples will be drawn, and the output shape will be (p,q,len(pvals)).
If the size argument is ambiguous on the number of dimensions, the first
argument may be a plain integer i, which should correspond to len(size).
Note that the output will then be of dimension i+1.
:returns: :class:`RandomVariable`, NewRandomState
.. class:: RandomStreamsBase(object)
.. method:: binomial(self, size=(), n=1, prob=0.5, ndim=None)
Sample ``n`` times with probability of success ``prob`` for each trial, and return the number of
successes.
If the size argument is ambiguous on the number of dimensions, the first argument may be a
plain integer to supplement the missing information.
.. method:: uniform(self, size=(), low=0.0, high=1.0, ndim=None)
Sample a tensor of the given size whose elements are drawn from a uniform distribution between ``low`` and ``high``.
If the size argument is ambiguous on the number of
dimensions, the first argument may be a plain integer
to supplement the missing information.
.. method:: normal(self, size=(), avg=0.0, std=1.0, ndim=None)
Usage: normal(random_state, size, avg=0.0, std=1.0)
Sample from a normal distribution centered on ``avg`` with
the specified standard deviation (``std``).
If the size argument is ambiguous on the number of
dimensions, the first argument may be a plain integer
to supplement the missing information.
.. method:: random_integers(self, size=(), low=0, high=1, ndim=None)
Usage: random_integers(random_state, size, low=0, high=1)
Sample a random integer between low and high, both inclusive.
If the size argument is ambiguous on the number of
dimensions, the first argument may be a plain integer
to supplement the missing information.
.. method:: permutation(self, size=(), n=1, ndim=None)
Returns permutations of the integers between 0 and n-1, as many times
as required by size. For instance, if size=(p,q), p*q permutations
will be generated, and the output shape will be (p,q,n), because each
permutation is of size n.
Theano tries to infer the number of dimensions from the length of the size argument, but you
may always specify it with the `ndim` parameter.
.. note::
Note that the output will then be of dimension ndim+1.
.. method:: multinomial(self, size=(), n=1, pvals=[0.5, 0.5], ndim=None)
Sample ``n`` times from a multinomial distribution defined by probabilities ``pvals``,
as many times as required by size. For instance, if size=(p,q), p*q
samples will be drawn, and the output shape will be (p,q,len(pvals)).
Theano tries to infer the number of dimensions from the length of the size argument, but you
may always specify it with the `ndim` parameter.
.. note::
Note that the output will then be of dimension ndim+1.
.. method:: shuffle_row_elements(self, input)
Return a variable with every row (rightmost index) shuffled.
This uses a permutation random variable internally, available via the ``.permutation``
attribute of the return value.
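As an illustration, a hedged sketch using the shared-variable subclass documented under :class:`RandomStreams` (the `seed` argument is assumed)::

    import theano
    import theano.tensor as T
    from theano.tensor.shared_randomstreams import RandomStreams

    srng = RandomStreams(seed=123)
    x = T.dmatrix('x')
    shuffled = srng.shuffle_row_elements(x)   # same shape as x; each row
    f = theano.function([x], shuffled)        # is permuted independently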
@@ -101,10 +101,11 @@ For example:
Reference
=========
.. class:: RandomStreams(raw_random.RandomStreamsBase)
This is a symbolic stand-in for ``numpy.random.RandomState``.
Random variables of various distributions are instantiated by calls to the
parent class :class:`raw_random.RandomStreamsBase`.
.. method:: updates()
@@ -118,34 +119,22 @@ Reference
`meta_seed` will be used to seed a temporary random number generator,
that will in turn generate seeds for each of the random variables that
has been created by this object (via `gen`).
:returns: None
.. method:: gen(op, *args, **kwargs)
Return the random variable from `op(*args, **kwargs)`, but
also install special attributes (``.rng`` and ``update``, see
:class:`RandomVariable`) into it.
This function also adds the returned variable to an internal list so
that it can be seeded later by a call to `seed`.
.. method:: uniform, normal, binomial, multinomial, random_integers, ...
See :class:`raw_random.RandomStreamsBase`.
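For instance, a minimal sketch of the inherited interface (constructor argument assumed)::

    from theano import function
    from theano.tensor.shared_randomstreams import RandomStreams

    srng = RandomStreams(seed=234)
    u = srng.uniform(size=(2, 2))   # symbolic (2, 2) uniform draws
    f = function([], u)
    a, b = f(), f()                 # different draws: the generator is
                                    # updated after each call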
.. class:: RandomVariable(object)
@@ -163,114 +152,3 @@ Reference
Including this pair in the ``updates`` list passed to `function` will cause the
function to update the random number generator feeding this variable.
.. _libdoc_tensor_raw_random:
=============================================
:mod:`raw_random` -- Low-level random numbers
=============================================
.. module:: raw_random
:platform: Unix, Windows
:synopsis: symbolic random variables
.. moduleauthor:: LISA
Raw random provides the random-number drawing functionality that underlies
the :class:`RandomStreams` interface.
Reference
=========
.. class:: RandomStateType(gof.Type)
A `Type` for variables that will take ``numpy.random.RandomState`` values.
.. class:: RandomFunction(gof.Op)
Op that draws random numbers from a numpy.RandomState object. This Op is
parametrized to draw numbers from many distributions.
.. function:: random_function(fn, dtype, *rfargs, **rfkwargs)
Returns a wrapper around RandomFunction which automatically infers the number
of dimensions of the output from the given shape. If the shape cannot be inferred,
the user can give an integer as first argument, which will be interpreted as the
number of dimensions.
If the distribution is not scalar (e.g., a multinomial), the output will have
more dimensions than what the shape argument suggests. The "ndim_added" keyword
argument allows specifying how many dimensions to add (for a multinomial, 1).
The number of dimensions for the following shape arguments can be inferred:
* shape(x)
* make_lvector(x, y, z, ...)
* ndarrays, constants
.. function:: uniform(random_state, size, low=0.0, high=1.0)
Sample from a uniform distribution between low and high.
If the size argument is ambiguous on the number of
dimensions, the first argument may be a plain integer
to supplement the missing information.
:returns: :class:`RandomVariable`, NewRandomState
.. function:: binomial(random_state, size, n=1, p=0.5)
Sample n times with probability of success prob for each trial,
return the number of successes.
If the size argument is ambiguous on the number of
dimensions, the first argument may be a plain integer
to supplement the missing information.
:returns: :class:`RandomVariable`, NewRandomState
.. function:: normal(random_state, size, avg=0.0, std=1.0)
Sample from a normal distribution centered on avg with
the specified standard deviation (std)
If the size argument is ambiguous on the number of
dimensions, the first argument may be a plain integer
to supplement the missing information.
:returns: :class:`RandomVariable`, NewRandomState
.. function:: random_integers(random_state, size, low=0, high=1)
Sample a random integer between low and high, both inclusive.
If the size argument is ambiguous on the number of
dimensions, the first argument may be a plain integer
to supplement the missing information.
:returns: :class:`RandomVariable`, NewRandomState
.. function:: permutation(random_state, size, n=1)
Returns permutations of the integers between 0 and n-1, as many times
as required by size. For instance, if size=(p,q), p*q permutations
will be generated, and the output shape will be (p,q,n), because each
permutation is of size n.
If the size argument is ambiguous on the number of dimensions, the first
argument may be a plain integer i, which should correspond to len(size).
Note that the output will then be of dimension i+1.
:returns: :class:`RandomVariable`, NewRandomState
.. function:: multinomial(random_state, size, p_vals=[0.5, 0.5])
Sample from a multinomial distribution defined by probabilities pvals,
as many times as required by size. For instance, if size=(p,q), p*q
samples will be drawn, and the output shape will be (p,q,len(pvals)).
If the size argument is ambiguous on the number of dimensions, the first
argument may be a plain integer i, which should correspond to len(size).
Note that the output will then be of dimension i+1.
:returns: :class:`RandomVariable`, NewRandomState
.. _optimizations:
==============
Optimizations
==============
Theano applies many kinds of graph optimizations, with different objectives:
* simplifying and standardizing the form of the expression graph
(e.g. :term:`merge`, :term:`add canonicalization<add canonicalization>`),
* reducing the maximum memory footprint (e.g. :term:`inplace_elemwise`),
* increasing execution speed (e.g. :term:`constant folding`).
The optimizations are listed in roughly chronological order. The table below
gives a quick summary of the optimizations included in the default modes.
The descriptions are brief and point to further reading.
If you would like to add an additional optimization, refer to
:ref:`optimization` in the guide to extending Theano.
.. #COMMENT
Since the print_summary method has been added to several OpDBs and
optimizers, it is possible to compute an accurate and up-to-date
optimization list by typing
python -c 'import theano; theano.compile.FAST_RUN.optimizer.print_summary()'
python -c 'import theano; theano.compile.FAST_COMPILE.optimizer.print_summary()'
etc.
========================================================= ========= ============
Optimization FAST_RUN FAST_COMPILE
========================================================= ========= ============
:term:`merge` x x
:term:`constant folding<constant folding>` x
:term:`shape promotion<shape promotion>` x
:term:`fill promotion <fill promotion>` x
:term:`fill cut<fill cut>` x
:term:`inc_subtensor srlz.<inc_subtensor serialization>` x
:term:`reshape_chain` x
:term:`const. elimination<constant elimination>` x
:term:`add canonical. <add canonicalization>` x
:term:`mul canonical. <mul canonicalization>` x
:term:`dot22` x
:term:`sparse_dot` x
:term:`sum_scalar_mul` x
:term:`neg_neg` x
:term:`neg_div_neg` x
:term:`add specialize <add specialization>` x
:term:`mul specialize <mul specialization>` x
:term:`pow specialize <pow specialization>` x
:term:`inplace_setsubtensor` x
:term:`gemm` x
:term:`inplace_elemwise` x
:term:`inplace_random` x
:term:`elemwise fusion`
:term:`GPU transfer`
========================================================= ========= ============
.. glossary::
merge
A simple optimization in which redundant :term:`Apply` nodes are
combined. For example, in ``function([x,y], [(x+y)*2, (x+y)*3])`` the merge
optimization will ensure that ``x`` and ``y`` are only added once.
This optimization is very useful because it frees users to write
highly redundant mathematical code. Theano will make sure to compute
just what is necessary.
See :class:`MergeOptimizer`.
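A sketch of the example above::

    import theano
    import theano.tensor as T

    x, y = T.dscalar('x'), T.dscalar('y')
    f = theano.function([x, y], [(x + y) * 2, (x + y) * 3])
    # After the merge optimization, the compiled graph contains a single
    # add node whose result feeds both multiplications.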
constant folding
When all the inputs to an expression are constant, then the expression
can be pre-computed at compile-time.
See ***TODO***
shape promotion
See ***TODO***
fill promotion
See ***TODO***
fill cut
See ***TODO***
inc_subtensor serialization
***TODO***
reshape_chain
This optimizes graphs like ``reshape(reshape(x, shape1), shape2)`` -> ``reshape(x, shape2)``
See also ***TODO***
constant elimination
***TODO***
add canonicalization
***TODO***
mul canonicalization
***TODO***
dot22
This simple optimization replaces dot(matrix, matrix) with a special
`dot22` op that only works for matrix multiplication. This op is
implemented with a call to GEMM, and sometimes replaced entirely by
the :term:`gemm` optimization.
See also ***TODO***.
sparse_dot
***TODO***
sum_scalar_mul
This optimizes graphs like ``sum(scalar * tensor)`` -> ``scalar * sum(tensor)``
See ***TODO***
neg_neg
Composition of two negatives can be cancelled out.
See ***TODO***
neg_div_neg
Matching negatives in the numerator and denominator can both be removed.
See ***TODO***
add specialization
This optimization simplifies expressions involving the addition of
zero.
See ***TODO***
mul specialization
Several special cases of mul() exist, and this optimization tries to
recognize them. Some examples include:
* ``mul(x,x)`` -> ``x**2``
* ``mul(x,0)`` -> ``zeros_like(x)``
* ``mul(x, -1)`` -> ``neg(x)``
See ***TODO***
pow specialization
Several special cases of pow() exist, and this optimization tries to
recognize them. Some examples include:
* ``pow(x,2)`` -> ``x**2``
* ``pow(x,0)`` -> ``ones_like(x)``
* ``pow(x, -0.5)`` -> ``inv(sqrt(x))``
See also ***TODO***
inplace_setsubtensor
In order to be a pure Op, setsubtensor must copy its entire input, and
modify just the subtensor in question (possibly a single element). It
is much more efficient to modify that element inplace.
See ***TODO***
gemm
Numerical libraries such as MKL and ATLAS implement the BLAS-level-3
interface, and provide a function `GEMM` that implements
:math:`Z \leftarrow \alpha A \cdot B + \beta Z`, for matrices `A`, `B`
and `Z`, and scalars :math:`\alpha, \beta`.
This optimization tries to rearrange a variety of linear algebra
expressions into one or more instances of this motif, and replace them
each with a single `Gemm` Op.
See ***TODO***
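A hedged sketch of an expression matching this motif::

    import theano
    import theano.tensor as T

    A, B, Z = T.dmatrix('A'), T.dmatrix('B'), T.dmatrix('Z')
    expr = 0.5 * T.dot(A, B) + 0.25 * Z
    f = theano.function([A, B, Z], expr)
    # In FAST_RUN mode the dot, the scalings, and the addition can be
    # collapsed into a single Gemm op.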
inplace_elemwise
When one of the inputs to an elementwise expression has the same type
and shape as the output, and is no longer needed for computation after
the elemwise expression is evaluated, then we can reuse the storage of
the input to store the output.
See ***TODO***
inplace_random
Typically when a graph uses random numbers, the RandomState is stored
in a shared variable, used once per call, and updated after each function
call. In this common case, it makes sense to update the random number generator in-place.
See ***TODO***
elemwise fusion
See ***TODO***
GPU transfer
The current strategy for choosing which expressions to evaluate on the
CPU and which to evaluate on the GPU is a greedy one. There are a
number of Ops ***TODO*** with GPU implementations and whenever we find
a graph copying data from GPU to CPU in order to evaluate an
expression that could have been evaluated on the GPU, we substitute
the GPU version of that Op for the CPU version. Likewise if we are
copying the output of an Op with a GPU implementation to the GPU,
then we substitute the GPU version for the CPU version. In this way, if all goes well,
this procedure will result in a graph with the following form:
1. copy non-shared inputs to GPU
2. carry out most/all computations on the GPU
3. copy output back to CPU
When using a GPU, :func:`shared()` will default to GPU storage for
'float32' ndarray arguments, and these shared variables act as seeds
for the greedy algorithm.
See ***TODO***
@@ -10,4 +10,5 @@ Proposals for new/revised features
pfunc
noupdates
opt_patterns2
======================
Optimization Patterns
======================
.. note::
Proposed 2010 01 20
Motivation
==========
Theano optimizations are organized at a high level,
but canonicalization and specialization (C&S) are a mess. It is difficult to know how a graph will
be optimized, or to know in which order optimizations will be performed.
C&S is also slow because of the guess-and-check nature of node optimization within equilibrium
optimizers (VERIFY THIS BY PROFILING).
C&S functions are also very difficult and tedious to write because of
symmetries in the graph, and because of the lack of standard Op names
(e.g. ``T.add``, ``T.and_``, and ``T._shape``). Gemm and the advanced_indexing -> xent
optimization are particularly tricky examples.
Defining a sort of regexp-like approach for describing graph substitutions would ideally be
less error-prone, less tedious, more efficient to evaluate, easier to document, and all-round
better.
Proposal
========
In a nutshell: revisit the PatternSub and make it more powerful.
Olivier B. (original author of PatternSub) mentioned that one of the problems was the annoyance
of working through DimShuffle.
Olivier B. also suggests writing scalar-related patterns in terms of scalars, and then inferring Tensor-related patterns.
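For reference, a sketch of what a pattern looks like in today's PatternSub style (constructor arguments as used in the tensor optimizations; the example substitutes neg(neg(x)) -> x)::

    from theano import gof
    import theano.tensor as T

    # in_pattern and out_pattern are nested tuples of ops, with string
    # placeholders standing for matched subgraphs
    neg_neg = gof.PatternSub((T.neg, (T.neg, 'x')), 'x')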
@@ -73,6 +73,8 @@ class Optimizer(object):
"""
pass
def print_summary(self, stream=sys.stdout, level=0):
print >> stream, "%s%s id=%i" %(' '*level, self.__class__.__name__, id(self))
class FromFunctionOptimizer(Optimizer):
"""WRITEME"""
@@ -81,6 +83,11 @@ class FromFunctionOptimizer(Optimizer):
def add_requirements(self, env):
env.extend(toolbox.ReplaceValidate())
def print_summary(self, stream=sys.stdout, level=0):
print >> stream, "%s%s id=%i" %(' '*level,
str(self.apply),
id(self))
def optimizer(f):
"""decorator for FromFunctionOptimizer"""
return FromFunctionOptimizer(f)
@@ -137,6 +144,12 @@ class SeqOptimizer(Optimizer, list):
def __repr__(self):
return list.__repr__(self)
def print_summary(self, stream=sys.stdout, level=0):
print >> stream, "%s%s (%i)" %(' '*level, self.__class__.__name__, id(self))
for opt in self:
opt.print_summary(stream, level=level+2)
class _metadict:
@@ -354,6 +367,8 @@ class LocalOptimizer(object):
This is the place to do it."""
env.extend(toolbox.ReplaceValidate())
def print_summary(self, stream=sys.stdout, level=0):
print >> stream, "%s%s id=%i" %(' '*level, self.__class__.__name__, id(self))
class FromFunctionLocalOptimizer(LocalOptimizer):
"""WRITEME"""
@@ -364,6 +379,10 @@ class FromFunctionLocalOptimizer(LocalOptimizer):
return self._tracks
def __str__(self):
return getattr(self, 'name', '<FromFunctionLocalOptimizer instance>')
def print_summary(self, stream=sys.stdout, level=0):
print >> stream, "%s%s id=%i" %(' '*level,
str(self.transform),
id(self))
def local_optimizer(*tracks):
def decorator(f):
@@ -388,6 +407,11 @@ class LocalOptGroup(LocalOptimizer):
if repl:
return repl
def print_summary(self, stream=sys.stdout, level=0):
print >> stream, "%s%s id=%i" %(' '*level, self.__class__.__name__, id(self))
for lopt in self.opts:
lopt.print_summary(stream, level=level+2)
class _LocalOpKeyOptGroup(LocalOptGroup):
"""WRITEME"""
@@ -466,6 +490,12 @@ class OpRemove(LocalOptimizer):
def __str__(self):
return "%s(x) -> x" % (self.op)
def print_summary(self, stream=sys.stdout, level=0):
print >> stream, "%s%s(%s) id=%i" %(' '*level,
self.__class__.__name__,
str(self.op),
id(self))
class PatternSub(LocalOptimizer):
"""WRITEME
@@ -618,6 +648,12 @@ class PatternSub(LocalOptimizer):
def __repr__(self):
return str(self)
def print_summary(self, stream=sys.stdout, level=0):
print >> stream, "%s%s(%s, %s) id=%i" %(' '*level,
self.__class__.__name__,
str(self.in_pattern),
str(self.out_pattern),
id(self))
##################
@@ -772,6 +808,11 @@ class NavigatorOptimizer(Optimizer):
if self.local_opt:
self.local_opt.add_requirements(env)
def print_summary(self, stream=sys.stdout, level=0):
print >> stream, "%s%s (%i)" %(' '*level, self.__class__.__name__, id(self))
self.local_opt.print_summary(stream, level=level+2)
class TopoOptimizer(NavigatorOptimizer):
"""WRITEME"""
@@ -807,6 +848,7 @@ class TopoOptimizer(NavigatorOptimizer):
self.detach_updater(env, u)
class OpKeyOptimizer(NavigatorOptimizer):
"""WRITEME"""
@@ -919,6 +961,10 @@ class EquilibriumOptimizer(NavigatorOptimizer):
if max_use_abort:
print >> sys.stderr, "WARNING: EquilibriumOptimizer max'ed out"
def print_summary(self, stream=sys.stdout, level=0):
print >> stream, "%s%s id=%i" %(' '*level, self.__class__.__name__, id(self))
for lopt in self.local_optimizers:
lopt.print_summary(stream, level=level+2)
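# Usage sketch for the print_summary methods added above (assuming the
# default modes, as the optimization documentation suggests):
#
#   import theano
#   theano.compile.FAST_RUN.optimizer.print_summary()
#   theano.compile.FAST_COMPILE.optimizer.print_summary()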
#################
...
@@ -95,6 +95,11 @@ class DB(object):
for variable in variables:
return variable
def print_summary(self, stream=sys.stdout):
print >> stream, "%s (id %i)"%(self.__class__.__name__, id(self))
print >> stream, " names", self._names
print >> stream, " db", self.__db__
class Query(object):
...
@@ -329,7 +329,7 @@ class ConvOp(Op):
rstride = int(N.ceil(kshp_logical[0] / float(kshp[0])))
cstride = int(N.ceil(kshp_logical[1] / float(kshp[1])))
buf = N.zeros((nkern,stacklen)+ self.kshp_logical, dtype=filtersflipped.dtype)
if self.kshp_logical_top_aligned:
roffset=coffset=0
else:
roffset=(kshp_logical[0] - (kshp[0]*rstride) - 1+rstride) % rstride
@@ -367,6 +367,9 @@ class ConvOp(Op):
if self.imshp != self.imshp_logical or self.kshp != self.kshp_logical:
raise NotImplementedError('todo')
if self.dx!=1 or self.dy!=1:
raise Exception("ERROR: We disable ConvOp.grad for now when dx!=1 or dy!=1, as we think there is a high probability of a bug in it. We need to raise the error tolerance on the gradient check to .1!")
all_shape = self.imshp is not None and self.kshp is not None and self.nkern is not None and self.bsize is not None
if not all_shape and (self.dx!=1 or self.dy!=1):
...
@@ -346,7 +346,7 @@ def cmp_run_conv_nnet2_classif(seed, isize, ksize, bsize,
n_iter=10,
gpu_only=False,
cpu_only=False,
float_atol=1e-06,
check_isfinite=True,
pickle=False,
verbose=0,
...
@@ -498,7 +498,7 @@ class TestConvOp(unittest.TestCase):
imshps = [(2,3,4)]
modes = ['valid', 'full']
unroll = [(0,0,True),(1,1,False),(2,3,False),(1,1,False),(0,0,False)]#(batch,kern,patch)
ssizes = [(1,1)]#,(2,2)]#grad for ss!=(1,1) is currently disabled!
for typ in types:
imgs = T.TensorType(typ, (False, False, False, False),'imgs')
@@ -550,8 +550,8 @@ class TestConvOp(unittest.TestCase):
print mode, imshp, kshp, un_b, un_k, ss
#TODO the tolerance needed to pass is very high for float32(0.17). Is this acceptable? Expected?
tol = None
if typ=="float32" and (ss[0]!=1 or ss[1]!=1):
tol = 0.1
utt.verify_grad(test_i, [imgvals],
cast_to_output_type=True,
tol=tol)
...
@@ -210,7 +210,10 @@ class Scalar(Type):
template <typename T>
theano_complex%(nbits)s(const T& y) { *this = y; }
template <typename TR, typename TI>
theano_complex%(nbits)s(const TR& r, const TI& i) { this->real=r; this->imag=i; }
};
"""
operator_eq = """
template <> %(mytype)s & %(mytype)s::operator=<npy_int8>(const npy_int8 & y)
@@ -237,7 +240,37 @@ class Scalar(Type):
template <> %(mytype)s & %(mytype)s::operator=<theano_complex64>(const theano_complex64 & y)
{ this->real=y.real; this->imag=y.imag; return *this; }
template <typename T>
const %(mytype)s
operator+(const %(mytype)s &x, const T& y)
{ return %(mytype)s(x.real+y, x.imag); }
template <typename T>
const %(mytype)s
operator+(const T& y, const %(mytype)s &x)
{ return %(mytype)s(x.real+y, x.imag); }
template <typename T>
const %(mytype)s
operator-(const %(mytype)s &x, const T& y)
{ return %(mytype)s(x.real-y, x.imag); }
template <typename T>
const %(mytype)s
operator-(const T& x, const %(mytype)s &y)
{ return %(mytype)s(x-y.real, -y.imag); }
template <typename T>
const %(mytype)s
operator*(const %(mytype)s &x, const T& y)
{ return %(mytype)s(x.real*y, x.imag*y); }
template <typename T>
const %(mytype)s
operator*(const T& x, const %(mytype)s &y)
{ return %(mytype)s(x*y.real, x*y.imag); }
""" """
# todo: use C templating # todo: use C templating
return template % dict(nbits = 64, half_nbits = 32) \ return template % dict(nbits = 64, half_nbits = 32) \
+ template % dict(nbits = 128, half_nbits = 64) \ + template % dict(nbits = 128, half_nbits = 64) \
...@@ -245,8 +278,8 @@ class Scalar(Type): ...@@ -245,8 +278,8 @@ class Scalar(Type):
+ operator_eq % dict(mytype='theano_complex64') + operator_eq % dict(mytype='theano_complex64')
def c_code_cache_version(self): def c_code_cache_version(self):
#return ()
# no need to put lib.amdlibm here as c_compile_args() are put in the key. # no need to put lib.amdlibm here as c_compile_args() are put in the key.
return (6,) # added implemeentations of operators that work with scalar arguments
return (5,) #added constructors to theano_complex class return (5,) #added constructors to theano_complex class
return (4,) #explicit T given in specialization of operator= lines. This makes it compile with open64 return (4,) #explicit T given in specialization of operator= lines. This makes it compile with open64
@@ -381,13 +414,27 @@ def float_out(*types):
return float64,
def upgrade_to_float(*types):
"""
Upgrade any int types to float32 or float64 to avoid losing any precision.
"""
conv = {int8: float32,
int16: float32,
int32: float64,
int64: float64}
return Scalar(Scalar.upcast(*[conv.get(type, type) for type in types])),
def same_out_nocomplex(type):
if type in complex_types:
raise TypeError('complex argument not supported')
return type,
def int_out_nocomplex(*types):
for type in types:
if type in complex_types:
raise TypeError('complex argument not supported')
return int64,
def float_out_nocomplex(*types):
for type in types:
if type in complex_types:
raise TypeError('complex argument not supported')
return float64,
class ScalarOp(Op):
@@ -997,7 +1044,6 @@ class Abs(UnaryScalarOp):
return "%(z)s = fabs(%(x)s);" % locals()
if type in complex_types:
return "%(z)s = sqrt(%(x)s.real*%(x)s.real + %(x)s.imag*%(x)s.imag);" % locals()
#complex, other?
raise NotImplementedError('type not supported', type)
abs_ = Abs(same_out)
@@ -1010,8 +1056,19 @@ class Sgn(UnaryScalarOp):
def c_code(self, node, name, (x, ), (z, ), sub):
#casting is done by compiler
#TODO: use copysign
type = node.inputs[0].type
if type in float_types:
return "%(z)s = (%(x)s >= 0) ? (%(x)s == 0) ? 0.0 : 1.0 : -1.0;" % locals() return "%(z)s = (%(x)s >= 0) ? (%(x)s == 0) ? 0.0 : 1.0 : -1.0;" % locals()
sgn = Sgn(same_out, name = 'sgn') if type in int_types:
return "%(z)s = (%(x)s >= 0) ? (%(x)s == 0) ? 0 : 1 : -1;" % locals()
raise TypeError() #complex has no sgn
def c_code_cache_version(self):
s = super(Sgn, self).c_code_cache_version()
if s:
return (3,) + s
else: #if parent is unversioned, we are too
return s
sgn = Sgn(same_out_nocomplex, name = 'sgn')
class Ceil(UnaryScalarOp):
def impl(self, x):
@@ -1020,7 +1077,7 @@ class Ceil(UnaryScalarOp):
return None,
def c_code(self, node, name, (x,), (z,), sub):
return "%(z)s = ceil(%(x)s);" % locals()
ceil = Ceil(same_out_nocomplex, name = 'ceil')
class Floor(UnaryScalarOp):
def impl(self, x):
@@ -1029,14 +1086,14 @@ class Floor(UnaryScalarOp):
return None,
def c_code(self, node, name, (x,), (z,), sub):
return "%(z)s = floor(%(x)s);" % locals()
floor = Floor(same_out_nocomplex, name = 'floor')
class IRound(UnaryScalarOp):
def impl(self, x):
return numpy.asarray(numpy.round(x), dtype = 'int64')
def c_code(self, node, name, (x, ), (z, ), sub):
return "%(z)s = round(%(x)s);" % locals()
iround = IRound(int_out_nocomplex)
class Neg(UnaryScalarOp):
def impl(self, x):
@@ -1080,6 +1137,8 @@ class Log(UnaryScalarOp):
#todo: the version using log2 seems to be very slightly faster
# on some machines for some reason, check if it's worth switching
#return "%(z)s = log2(%(x)s) * 0.69314718055994529;" % locals()
if node.inputs[0].type in complex_types:
raise NotImplementedError('type not supported', type)
return "%(z)s = log(%(x)s);" % locals() return "%(z)s = log(%(x)s);" % locals()
log = Log(upgrade_to_float, name = 'log') log = Log(upgrade_to_float, name = 'log')
...@@ -1096,6 +1155,8 @@ class Log2(UnaryScalarOp): ...@@ -1096,6 +1155,8 @@ class Log2(UnaryScalarOp):
#backport #backport
#return gz / (x * math.log(2.0)) if x.type in grad_types else None, #return gz / (x * math.log(2.0)) if x.type in grad_types else None,
def c_code(self, node, name, (x, ), (z, ), sub): def c_code(self, node, name, (x, ), (z, ), sub):
if node.inputs[0].type in complex_types:
raise NotImplementedError('type not supported', type)
return "%(z)s = log2(%(x)s);" % locals() return "%(z)s = log2(%(x)s);" % locals()
log2 = Log2(upgrade_to_float, name = 'log2') log2 = Log2(upgrade_to_float, name = 'log2')
...@@ -1105,28 +1166,43 @@ class Log10(UnaryScalarOp): ...@@ -1105,28 +1166,43 @@ class Log10(UnaryScalarOp):
return numpy.log10(x) return numpy.log10(x)
def grad(self, (x, ), (gz, )): def grad(self, (x, ), (gz, )):
if x.type in grad_types: if x.type in grad_types:
return gz / (x * math.log(10.0)), return gz / (x * numpy.log(10.0)),
else: else:
return None return None
#backport #backport
#return gz / (x * math.log(10.0)) if x.type in grad_types else None, #return gz / (x * numpy.log(10.0)) if x.type in grad_types else None,
def c_code(self, node, name, (x, ), (z, ), sub): def c_code(self, node, name, (x, ), (z, ), sub):
if node.inputs[0].type in complex_types:
raise NotImplementedError('type not supported', type)
return "%(z)s = log10(%(x)s);" % locals() return "%(z)s = log10(%(x)s);" % locals()
log10 = Log10(upgrade_to_float, name = 'log10') log10 = Log10(upgrade_to_float, name = 'log10')
class Log1p(UnaryScalarOp):
""" log(1+x) """
def impl(self, x):
return numpy.log1p(x)
def grad(self, (x,), (gz,)):
return [gz / (1+x)]
def c_code(self, node, name, (x, ), (z, ), sub):
if node.inputs[0].type in complex_types:
raise NotImplementedError('type not supported', type)
return "%(z)s = log1p(%(x)s);" % locals()
log1p = Log1p(upgrade_to_float, name = 'log1p')
class Exp(UnaryScalarOp):
def impl(self, x):
return numpy.exp(x)
def grad(self, (x, ), (gz, )):
if x.type in grad_types:
return gz * exp(x),
else:
return None,
#backport
#return gz * exp(x) if x.type in grad_types else None,
def c_code(self, node, name, (x, ), (z, ), sub):
if node.inputs[0].type in complex_types:
raise NotImplementedError('type not supported', type)
return "%(z)s = exp(%(x)s);" % locals() return "%(z)s = exp(%(x)s);" % locals()
exp = Exp(upgrade_to_float, name = 'exp') exp = Exp(upgrade_to_float, name = 'exp')
...@@ -1147,7 +1223,7 @@ sqr = Sqr(same_out, name = 'sqr') ...@@ -1147,7 +1223,7 @@ sqr = Sqr(same_out, name = 'sqr')
class Sqrt(UnaryScalarOp): class Sqrt(UnaryScalarOp):
def impl(self, x): def impl(self, x):
return math.sqrt(x) return numpy.sqrt(x)
def grad(self, (x, ), (gz, )): def grad(self, (x, ), (gz, )):
if x.type in grad_types: if x.type in grad_types:
return (gz * 0.5) / sqrt(x), return (gz * 0.5) / sqrt(x),
...@@ -1156,12 +1232,14 @@ class Sqrt(UnaryScalarOp): ...@@ -1156,12 +1232,14 @@ class Sqrt(UnaryScalarOp):
#backport #backport
#return (gz * 0.5) / sqrt(x) if x.type in grad_types else None, #return (gz * 0.5) / sqrt(x) if x.type in grad_types else None,
def c_code(self, node, name, (x, ), (z, ), sub): def c_code(self, node, name, (x, ), (z, ), sub):
if node.inputs[0].type in complex_types:
raise NotImplementedError('type not supported', type)
return "%(z)s = sqrt(%(x)s);" % locals() return "%(z)s = sqrt(%(x)s);" % locals()
sqrt = Sqrt(upgrade_to_float, name = 'sqrt') sqrt = Sqrt(upgrade_to_float, name = 'sqrt')
class Cos(UnaryScalarOp): class Cos(UnaryScalarOp):
def impl(self, x): def impl(self, x):
return math.cos(x) return numpy.cos(x)
def grad(self, (x, ), (gz, )): def grad(self, (x, ), (gz, )):
if x.type in grad_types: if x.type in grad_types:
return -gz * sin(x), return -gz * sin(x),
...@@ -1170,12 +1248,14 @@ class Cos(UnaryScalarOp): ...@@ -1170,12 +1248,14 @@ class Cos(UnaryScalarOp):
#backport #backport
# return -gz * sin(x) if x.type in grad_types else None, # return -gz * sin(x) if x.type in grad_types else None,
def c_code(self, node, name, (x, ), (z, ), sub): def c_code(self, node, name, (x, ), (z, ), sub):
if node.inputs[0].type in complex_types:
raise NotImplementedError('type not supported', type)
return "%(z)s = cos(%(x)s);" % locals() return "%(z)s = cos(%(x)s);" % locals()
cos = Cos(upgrade_to_float, name = 'cos') cos = Cos(upgrade_to_float, name = 'cos')
class Sin(UnaryScalarOp): class Sin(UnaryScalarOp):
def impl(self, x): def impl(self, x):
return math.sin(x) return numpy.sin(x)
def grad(self, (x, ), (gz, )): def grad(self, (x, ), (gz, )):
if x.type in grad_types: if x.type in grad_types:
return gz * cos(x), return gz * cos(x),
...@@ -1184,12 +1264,14 @@ class Sin(UnaryScalarOp): ...@@ -1184,12 +1264,14 @@ class Sin(UnaryScalarOp):
#backport #backport
# return gz * cos(x) if x.type in grad_types else None, # return gz * cos(x) if x.type in grad_types else None,
def c_code(self, node, name, (x, ), (z, ), sub): def c_code(self, node, name, (x, ), (z, ), sub):
if node.inputs[0].type in complex_types:
raise NotImplementedError('type not supported', type)
return "%(z)s = sin(%(x)s);" % locals() return "%(z)s = sin(%(x)s);" % locals()
sin = Sin(upgrade_to_float, name = 'sin') sin = Sin(upgrade_to_float, name = 'sin')
class Tan(UnaryScalarOp): class Tan(UnaryScalarOp):
def impl(self, x): def impl(self, x):
return math.tan(x) return numpy.tan(x)
def grad(self, (x, ), (gz, )): def grad(self, (x, ), (gz, )):
if x.type in grad_types: if x.type in grad_types:
return gz / sqr(cos(x)), return gz / sqr(cos(x)),
...@@ -1198,6 +1280,8 @@ class Tan(UnaryScalarOp): ...@@ -1198,6 +1280,8 @@ class Tan(UnaryScalarOp):
#backport #backport
#return gz / sqr(cos(x)) if x.type in grad_types else None, #return gz / sqr(cos(x)) if x.type in grad_types else None,
def c_code(self, node, name, (x, ), (z, ), sub): def c_code(self, node, name, (x, ), (z, ), sub):
if node.inputs[0].type in complex_types:
raise NotImplementedError('type not supported', type)
return "%(z)s = tan(%(x)s);" % locals() return "%(z)s = tan(%(x)s);" % locals()
tan = Tan(upgrade_to_float, name = 'tan') tan = Tan(upgrade_to_float, name = 'tan')
...@@ -1206,7 +1290,7 @@ class Cosh(UnaryScalarOp): ...@@ -1206,7 +1290,7 @@ class Cosh(UnaryScalarOp):
cosh(x) = (exp(x) + exp(-x)) / 2 cosh(x) = (exp(x) + exp(-x)) / 2
""" """
def impl(self, x): def impl(self, x):
return math.cosh(x) return numpy.cosh(x)
def grad(self, (x, ), (gz, )): def grad(self, (x, ), (gz, )):
if x.type in grad_types: if x.type in grad_types:
return gz * sinh(x), return gz * sinh(x),
...@@ -1215,6 +1299,8 @@ class Cosh(UnaryScalarOp): ...@@ -1215,6 +1299,8 @@ class Cosh(UnaryScalarOp):
#backport #backport
#return gz * sinh(x) if x.type in grad_types else None, #return gz * sinh(x) if x.type in grad_types else None,
def c_code(self, node, name, (x, ), (z, ), sub): def c_code(self, node, name, (x, ), (z, ), sub):
if node.inputs[0].type in complex_types:
raise NotImplementedError('type not supported', type)
return "%(z)s = cosh(%(x)s);" % locals() return "%(z)s = cosh(%(x)s);" % locals()
cosh = Cosh(upgrade_to_float, name = 'cosh') cosh = Cosh(upgrade_to_float, name = 'cosh')
...@@ -1223,7 +1309,7 @@ class Sinh(UnaryScalarOp): ...@@ -1223,7 +1309,7 @@ class Sinh(UnaryScalarOp):
sinh(x) = (exp(x) - exp(-x)) / 2 sinh(x) = (exp(x) - exp(-x)) / 2
""" """
def impl(self, x): def impl(self, x):
return math.sinh(x) return numpy.sinh(x)
def grad(self, (x, ), (gz, )): def grad(self, (x, ), (gz, )):
if x.type in grad_types: if x.type in grad_types:
return gz * cosh(x), return gz * cosh(x),
...@@ -1232,6 +1318,8 @@ class Sinh(UnaryScalarOp): ...@@ -1232,6 +1318,8 @@ class Sinh(UnaryScalarOp):
#backport #backport
#return gz * cosh(x) if x.type in grad_types else None, #return gz * cosh(x) if x.type in grad_types else None,
def c_code(self, node, name, (x, ), (z, ), sub): def c_code(self, node, name, (x, ), (z, ), sub):
if node.inputs[0].type in complex_types:
raise NotImplementedError('type not supported', type)
return "%(z)s = sinh(%(x)s);" % locals() return "%(z)s = sinh(%(x)s);" % locals()
sinh = Sinh(upgrade_to_float, name = 'sinh') sinh = Sinh(upgrade_to_float, name = 'sinh')
...@@ -1241,7 +1329,7 @@ class Tanh(UnaryScalarOp): ...@@ -1241,7 +1329,7 @@ class Tanh(UnaryScalarOp):
= (exp(2*x) - 1) / (exp(2*x) + 1) = (exp(2*x) - 1) / (exp(2*x) + 1)
""" """
def impl(self, x): def impl(self, x):
return math.tanh(x) return numpy.tanh(x)
def grad(self, (x, ), (gz, )): def grad(self, (x, ), (gz, )):
if x.type in grad_types: if x.type in grad_types:
return gz * (1 - sqr(tanh(x))), return gz * (1 - sqr(tanh(x))),
...@@ -1250,6 +1338,8 @@ class Tanh(UnaryScalarOp): ...@@ -1250,6 +1338,8 @@ class Tanh(UnaryScalarOp):
#backport #backport
#return gz * (1 - sqr(tanh(x))) if x.type in grad_types else None, #return gz * (1 - sqr(tanh(x))) if x.type in grad_types else None,
def c_code(self, node, name, (x, ), (z, ), sub): def c_code(self, node, name, (x, ), (z, ), sub):
if node.inputs[0].type in complex_types:
raise NotImplementedError('type not supported', type)
return "%(z)s = tanh(%(x)s);" % locals() return "%(z)s = tanh(%(x)s);" % locals()
tanh = Tanh(upgrade_to_float, name = 'tanh') tanh = Tanh(upgrade_to_float, name = 'tanh')
......
@@ -1437,6 +1437,10 @@ def log2(a):
def log10(a):
"""base 10 logarithm of a"""
@_scal_elemwise
def log1p(a):
"""log(1+a)"""
@_scal_elemwise
def sgn(a):
"""sign of a"""
@@ -3466,7 +3470,10 @@ class numeric_grad:
raise ValueError('argument element %i has wrong shape %s' %(i,str((a.shape,
b.shape))))
errs.append(numpy.max(numeric_grad.abs_rel_err(a,b)))
if numpy.all(numpy.isfinite(errs)):
return numpy.max(errs), numpy.argmax(errs)
else:
return float('inf'), 0
def verify_grad(op, pt, n_tests=2, rng=None, eps=None, tol=None, mode=None, cast_to_output_type=False):
...
@@ -100,6 +100,10 @@ def inv_inplace(a):
def log_inplace(a):
"""base e logarithm of a (inplace on a)"""
@_scal_inplace
def log1p_inplace(a):
"""log(1+a)"""
@_scal_inplace
def log2_inplace(a):
"""base 2 logarithm of a (inplace on a)"""
...
@@ -43,7 +43,11 @@ class ScalarSigmoid(scalar.UnaryScalarOp):
else:
raise NotImplementedError('only floatingpoint is implemented')
def c_code_cache_version(self):
v = super(ScalarSigmoid, self).c_code_cache_version()
if v:
return (2,) + v
else:
return v
scalar_sigmoid = ScalarSigmoid(scalar.upgrade_to_float, name='scalar_sigmoid')
sigmoid = elemwise.Elemwise(scalar_sigmoid, name='sigmoid')
@@ -74,7 +78,11 @@ class ScalarSoftplus(scalar.UnaryScalarOp):
else:
raise NotImplementedError('only floatingpoint is implemented')
def c_code_cache_version(self):
v = super(ScalarSoftplus, self).c_code_cache_version()
if v:
return (2,) + v
else:
return v
scalar_softplus = ScalarSoftplus(scalar.upgrade_to_float, name='scalar_softplus')
softplus = elemwise.Elemwise(scalar_softplus, name='softplus')
...
@@ -44,23 +44,32 @@ def _fill_chain(new_out, orig_inputs):
new_out = T.fill(i, new_out)
return [new_out]
def get_constant_value(v, fill=False):
"""return the constant value underlying variable `v`
If v is the output of dimshuffles or fills, this function digs through them.
If `v` is not some view of constant data, then raise a TypeError.
If `fill` is True, then it returns (v, [...]) where the second term is a list of the
variables that were used in the fill expressions.
:note: There may be another function similar to this one in the code, but I'm not sure where it
is.
"""
if not isinstance(v, gof.Variable):
return v # why would this happen?
if isinstance(v, gof.Constant):
if fill:
return v.data, []
return v.data return v.data
if v.owner and isinstance(v.owner.op, T.DimShuffle):
return get_constant_value(v.owner.inputs[0], fill=fill)
if fill:
if v.owner and v.owner.op == T.fill:
shape, val = v.owner.inputs
# fill(a,b) returns a tensor of the shape of 'a', filled with 'b'
rval, rshapes = get_constant_value(val, fill=fill)
return rval, rshapes + [shape]
raise TypeError(v)
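# A hedged usage sketch of the fill=True behaviour (variable names are
# hypothetical):
#
#   x = T.dmatrix('x')
#   v = T.fill(x, 1.0)   # a tensor shaped like x, filled with 1.0
#   val, shapes = get_constant_value(v, fill=True)
#   # val == 1.0 and shapes == [x]: the shape-providing fill inputs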
@gof.optimizer
@@ -1122,6 +1131,30 @@ register_specialize(local_add_specialize)
mul_canonizer = in2out(gof.LocalOptGroup(local_mul_canonizer, local_fill_cut, local_fill_sink))
@register_specialize
@gof.local_optimizer([T.log])
def local_log1p(node):
# log(1+x) -> log1p(x)
if node.op == T.log:
log_arg, = node.inputs
if log_arg.owner and log_arg.owner.op == T.add:
add_inputs = log_arg.owner.inputs
consts = [0]
fills = []
nonconsts = []
for add_in in add_inputs:
try:
v, f = get_constant_value(add_in, fill=True)
consts.append(v)
fills.extend(f)
except TypeError:
nonconsts.append(add_in)
if nonconsts:
if numpy.allclose(numpy.sum(consts), 1):
if len(nonconsts)==1:
return _fill_chain(T.log1p(nonconsts[0]), fills)
else:
return _fill_chain(T.log1p(T.add(*nonconsts)), fills)
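# Sketch of the rewrite this optimizer performs (hypothetical snippet):
#
#   x = T.dscalar('x')
#   f = theano.function([x], T.log(1 + T.exp(x)))
#   # with local_log1p registered, the compiled graph computes
#   # T.log1p(T.exp(x)) instead of T.log(1 + T.exp(x))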
def add_calculate(num, denum, aslist = False, out_type=None):
...
@@ -6,7 +6,7 @@ import numpy
from theano.compile import module, In, Component
from theano.gof import Container
from theano.tensor import raw_random
class RandomStreamsInstance(object):
"""RandomStreamsInstance"""
@@ -86,7 +86,7 @@ class RandomStreamsInstance(object):
return
raise KeyError(item)
class RandomStreams(Component, raw_random.RandomStreamsBase):
"""Module component with similar interface to numpy.random (numpy.random.RandomState)"""
random_state_variables = []
@@ -147,52 +147,3 @@ class RandomStreams(Component):
self.random_state_variables.append((random_state_variable, new_r))
return out
-    def binomial(self, *args, **kwargs):
-        """Return a symbolic binomial sample
-
-        This is a shortcut for a call to `self.gen`
-        """
-        return self.gen(raw_random.binomial, *args, **kwargs)
-
-    def uniform(self, *args, **kwargs):
-        """Return a symbolic uniform sample
-
-        This is a shortcut for a call to `self.gen`
-        """
-        return self.gen(raw_random.uniform, *args, **kwargs)
-
-    def normal(self, *args, **kwargs):
-        """Return a symbolic normal sample
-
-        This is a shortcut for a call to `self.gen`
-        """
-        return self.gen(raw_random.normal, *args, **kwargs)
-
-    def random_integers(self, *args, **kwargs):
-        """Return a symbolic random integer sample
-
-        This is a shortcut for a call to `self.gen`
-        """
-        return self.gen(raw_random.random_integers, *args, **kwargs)
-
-    def permutation(self, *args, **kwargs):
-        """Return a symbolic permutation of integers
-
-        This is a shortcut for a call to `self.gen`
-        """
-        return self.gen(raw_random.permutation, *args, **kwargs)
-
-    def multinomial(self, *args, **kwargs):
-        """Return a symbolic multinomial sample
-
-        This is a shortcut for a call to `self.gen`
-        """
-        return self.gen(raw_random.multinomial, *args, **kwargs)
-
-    def shuffle_row_elements(self, input):
-        """Return a variable with every row (rightmost index) shuffled"""
-        perm = self.permutation(input.ndim-1, input.shape[:-1], input.shape[-1])
-        shuffled = permute_row_elements(input, perm)
-        return shuffled
@@ -50,7 +50,7 @@ class RandomFunction(gof.Op):
    """
-   def __init__(self, fn, outtype, *args, **kwargs):
+   def __init__(self, fn, outtype, inplace=False, ndim_added=0):
        """
        :param fn: a member function of numpy.RandomState
        Technically, any function with a signature like the ones in numpy.random.RandomState
@@ -72,19 +72,18 @@ class RandomFunction(gof.Op):
        addition to the shape's dimensions (used in multinomial and
        permutation).
        """
-       self.__setstate__([fn, outtype, args, kwargs])
+       self.__setstate__([fn, outtype, inplace, ndim_added])

    def __eq__(self, other):
        return type(self) == type(other) \
            and self.fn == other.fn \
            and self.outtype == other.outtype \
-           and self.args == other.args \
            and self.inplace == other.inplace \
            and self.ndim_added == other.ndim_added

    def __hash__(self):
        return hash(type(self)) ^ hash(self.fn) \
-           ^ hash(self.outtype) ^ hash(self.args) \
+           ^ hash(self.outtype) \
            ^ hash(self.inplace) ^ hash(self.ndim_added)

    def __getstate__(self):
@@ -92,7 +91,7 @@ class RandomFunction(gof.Op):
    def __setstate__(self, state):
        self.state = state
-       fn, outtype, args, kwargs = state
+       fn, outtype, inplace, ndim_added = state
        if isinstance(fn, str):
            self.fn = getattr(numpy.random.RandomState, fn)
        else:
@@ -100,11 +99,10 @@ class RandomFunction(gof.Op):
        #backport
        #self.fn = getattr(numpy.random.RandomState, fn) if isinstance(fn, str) else fn
        self.outtype = outtype
-       self.args = tuple(tensor.as_tensor_variable(arg) for arg in args)
-       self.inplace = kwargs.pop('inplace', False)
+       self.inplace = inplace
        if self.inplace:
            self.destroy_map = {0: [0]}
-       self.ndim_added = kwargs.pop('ndim_added', 0)
+       self.ndim_added = ndim_added

    def make_node(self, r, shape, *args):
        """
@@ -147,29 +145,9 @@ class RandomFunction(gof.Op):
        # convert args to TensorType instances
        # and append enough None's to match the length of self.args
        args = map(tensor.as_tensor_variable, args)
-       if len(args) > len(self.args):
-           raise TypeError('Too many args for this kind of random generator')
-       args += (None,) * (len(self.args) - len(args))
-       assert len(args) == len(self.args)
-
-       # build the inputs to this Apply by overlaying args on self.args
-       inputs = []
-       for arg, default in zip(args, self.args):
-           # The NAACL test is failing because of this assert.
-           # I am commenting out the requirement that the dtypes match because it doesn't seem
-           # to me to be necessary (although I agree it is typically true).
-           # -JB 20090819
-           #assert arg is None or default.type.dtype == arg.type.dtype
-           if arg is None:
-               input = default
-           else:
-               input = arg
-           #backport
-           #input = default if arg is None else arg
-           inputs.append(input)
        return gof.Apply(self,
-                        [r, shape] + inputs,
+                        [r, shape] + args,
                         [r.type(), self.outtype()])

    def perform(self, node, inputs, (rout, out)):
@@ -198,102 +176,79 @@ class RandomFunction(gof.Op):
    def grad(self, inputs, outputs):
        return [None for i in inputs]
def _infer_ndim(ndim, shape):
    """Return an (int, variable) pair, where the variable is an integer
    or uint vector and the int is its length.
    """
    if isinstance(shape, (tuple, list)):
        v_shape = tensor.TensorConstant(type=tensor.lvector, data=numpy.asarray(shape, dtype='int64'))
    else:
        v_shape = tensor.as_tensor_variable(shape)

    if not (v_shape.dtype.startswith('int') or v_shape.dtype.startswith('uint')):
        raise TypeError('shape must be an integer vector or list')

    if ndim is None:
        # infer ndim from the length of the shape vector
        ndim = tensor.get_vector_length(v_shape)

    return ndim, v_shape
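# A quick sketch of the helper's contract (illustration only; the demo
# function name is mine, not part of the patch): a concrete tuple lets
# ndim be inferred, while a symbolic shape vector requires an explicit ndim.
def _infer_ndim_demo():
    ndim, v_shape = _infer_ndim(None, (3, 4))
    assert ndim == 2                  # inferred from the tuple's length
    ndim, v_shape = _infer_ndim(2, tensor.lvector())
    assert ndim == 2                  # symbolic shape: caller must supply ndim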
"""List of functions to be listed as op constructors in the oplist (`gen_oplist`, doc/oplist.txt)."""
def constructor(f):
"""Add `f` to :doc:`oplist`.
Make `f` appear as a constructor in the oplist (`gen_oplist`, doc/oplist.txt). def uniform(random_state, size=(), low=0.0, high=1.0, ndim=None):
""" """
__oplist_constructor_list.append(f) Sample from a uniform distribution between low and high.
return f
def __oplist_tag(thing, tag):
tags = getattr(thing, '__oplist_tags', [])
tags.append(tag)
thing.__oplist_tags = tags
def random_function(fn, dtype, *rfargs, **rfkwargs): If the size argument is ambiguous on the number of
dimensions, the first argument may be a plain integer
to supplement the missing information.
""" """
Returns a wrapper around RandomFunction which automatically infers the number ndim, size = _infer_ndim(ndim, size)
of dimensions of the output from the given shape. If the shape cannot be inferred, op = RandomFunction('uniform',
the user can give an integer as first argument, which will be interpreted as the tensor.TensorType(dtype = 'float64', broadcastable = (False,)*ndim) )
number of dimensions. return op(random_state, size, low, high)
If the distribution is not scalar (e.g., a multinomial), the output will have def binomial(random_state, size=(), n=1, prob=0.5, ndim=None):
more dimensions than what the shape argument suggests. The "ndim_added" keyword """
arguments allows to specify how many dimensions to add (for a multinomial, 1). Sample n times with probability of success prob for each trial, return the number of
successes.
The number of dimensions for the following shape arguments can be inferred: If the size argument is ambiguous on the number of dimensions, the first argument may be a
- shape(x) plain integer to supplement the missing information.
- make_lvector(x, y, z, ...)
- constants
""" """
@constructor ndim, size = _infer_ndim(ndim, size)
def f(r, ndim, *args, **kwargs): op = RandomFunction('binomial',
if isinstance(ndim, int): tensor.TensorType(dtype = 'int64', broadcastable = (False,)*ndim) )
shape, args = args[0], args[1:] return op(random_state, size, n, prob)
else:
shape = ndim def normal(random_state, size=(), avg=0.0, std=1.0, ndim=None):
if shape == () or shape == []: """
shape = tensor.TensorConstant(type = tensor.lvector, data = shape) Usage: normal(random_state, size,
else: Sample from a normal distribution centered on avg with
shape = tensor.as_tensor_variable(shape) the specified standard deviation (std)
ndim = tensor.get_vector_length(shape)
if ndim is None: If the size argument is ambiguous on the number of
raise ValueError('Cannot infer the number of dimensions from the shape argument.') dimensions, the first argument may be a plain integer
# note: rf could be cached for future use to supplement the missing information.
ndim_added = rfkwargs.get('ndim_added', 0) """
ndim += ndim_added ndim, size = _infer_ndim(ndim, size)
rf = RandomFunction(fn, tensor.TensorType(dtype = dtype, broadcastable = (False,)*ndim), *rfargs, **rfkwargs) op = RandomFunction('normal',
return rf(r, shape, *args, **kwargs) tensor.TensorType(dtype = 'float64', broadcastable = (False,)*ndim) )
return f return op(random_state, size, avg, std)
def random_integers(random_state, size=(), low=0, high=1, ndim=None):
# we need to provide defaults for all the functions in order to infer the argument types... """
Usage: random_integers(random_state, size, low=0, high=1)
uniform = random_function('uniform', 'float64', 0.0, 1.0) Sample a random integer between low and high, both inclusive.
uniform.__doc__ = """
Usage: uniform(random_state, size, low=0.0, high=1.0) If the size argument is ambiguous on the number of
Sample from a uniform distribution between low and high. dimensions, the first argument may be a plain integer
to supplement the missing information.
If the size argument is ambiguous on the number of """
dimensions, the first argument may be a plain integer ndim, size = _infer_ndim(ndim, size)
to supplement the missing information. op = RandomFunction('random_integers',
""" tensor.TensorType(dtype = 'int64', broadcastable = (False,)*ndim) )
return op(random_state, size, low, high)
binomial = random_function('binomial', 'int64', 1, 0.5)
binomial.__doc__ = """
Usage: binomial(random_state, size, n=1, prob=0.5)
Sample n times with probability of success prob for each trial,
return the number of successes.
If the size argument is ambiguous on the number of
dimensions, the first argument may be a plain integer
to supplement the missing information.
"""
normal = random_function('normal', 'float64', 0.0, 1.0)
normal.__doc__ = """
Usage: normal(random_state, size, avg=0.0, std=1.0)
Sample from a normal distribution centered on avg with
the specified standard deviation (std)
If the size argument is ambiguous on the number of
dimensions, the first argument may be a plain integer
to supplement the missing information.
"""
random_integers = random_function('random_integers', 'int64', 0, 1)
random_integers.__doc__ = """
Usage: random_integers(random_state, size, low=0, high=1)
Sample a random integer between low and high, both inclusive.
If the size argument is ambiguous on the number of
dimensions, the first argument may be a plain integer
to supplement the missing information.
"""
def permutation_helper(random_state, n, shape):
    """Helper function to generate permutations from integers.
@@ -318,43 +273,144 @@ def permutation_helper(random_state, n, shape):
    out = numpy.zeros(out_shape, int)
    for i in numpy.ndindex(*shape):
        out[i] = random_state.permutation(n)
+   print 'RETURNING', out.shape
    return out
-permutation = random_function(permutation_helper, 'int64', 1, ndim_added=1)
-permutation.__doc__ = """
-Usage: permutation(random_state, size, n)
-Returns permutations of the integers between 0 and n-1, as many times
-as required by size. For instance, if size=(p,q), p*q permutations
-will be generated, and the output shape will be (p,q,n), because each
-permutation is of size n.
-
-If the size argument is ambiguous on the number of dimensions, the first
-argument may be a plain integer i, which should correspond to len(size).
-Note that the output will then be of dimension i+1.
-"""
-
-multinomial = random_function('multinomial', 'float64', 1, [0.5, 0.5], ndim_added=1)
-multinomial.__doc__ = """
-Usage: multinomial(random_state, size, pvals)
-Sample from a multinomial distribution defined by probabilities pvals,
-as many times as required by size. For instance, if size=(p,q), p*q
-samples will be drawn, and the output shape will be (p,q,len(pvals)).
-
-If the size argument is ambiguous on the number of dimensions, the first
-argument may be a plain integer i, which should correspond to len(size).
-Note that the output will then be of dimension i+1.
-"""

def permutation(random_state, size=(), n=1, ndim=None):
    """
    Returns permutations of the integers between 0 and n-1, as many times
    as required by size. For instance, if size=(p,q), p*q permutations
    will be generated, and the output shape will be (p,q,n), because each
    permutation is of size n.

    Theano tries to infer the number of dimensions from the length of the size argument, but you
    may always specify it with the `ndim` parameter.

    .. note::
        The output will then be of dimension ndim+1.
    """
    ndim, size = _infer_ndim(ndim, size)
    print "NDIM", ndim, size
    op = RandomFunction(permutation_helper,
            tensor.TensorType(dtype='int64', broadcastable=(False,)*(ndim+1)),
            ndim_added=1)
    return op(random_state, size, n)

def multinomial(random_state, size=(), n=1, pvals=[0.5, 0.5], ndim=None):
    """
    Sample n times from a multinomial distribution defined by probabilities pvals,
    as many times as required by size. For instance, if size=(p,q), p*q
    samples will be drawn, and the output shape will be (p,q,len(pvals)).

    Theano tries to infer the number of dimensions from the length of the size argument, but you
    may always specify it with the `ndim` parameter.

    .. note::
        The output will then be of dimension ndim+1.
    """
    ndim, size = _infer_ndim(ndim, size)
    op = RandomFunction('multinomial',
            tensor.TensorType(dtype='int64', broadcastable=(False,)*(ndim+1)),
            ndim_added=1)
    return op(random_state, size, n, pvals)
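# A sketch of the shape contract implied by ndim_added=1, checked with plain
# numpy (illustration only; the demo function name is mine, not from the patch):
def _ndim_added_demo():
    rng = numpy.random.RandomState(42)
    shape, n = (2, 3), 5
    out = numpy.zeros(shape + (n,), int)
    for i in numpy.ndindex(*shape):
        out[i] = rng.permutation(n)   # one permutation of range(n) per index
    assert out.shape == (2, 3, 5)     # the size dims plus one trailing dim of length n
    counts = rng.multinomial(1, [0.1] * 10, size=(4, 4))
    assert counts.shape == (4, 4, 10) # same rule for multinomial draws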
@gof.local_optimizer([None])
def random_make_inplace(node):
    op = node.op
    if isinstance(op, RandomFunction) and not op.inplace:
-       opkwargs = dict(inplace=True, ndim_added=op.ndim_added)
-       return RandomFunction(op.fn, op.outtype, *op.args, **opkwargs).make_node(*node.inputs).outputs
+       new_op = RandomFunction(op.fn, op.outtype, inplace=True, ndim_added=op.ndim_added)
+       return new_op.make_node(*node.inputs).outputs
    return False

optdb.register('random_make_inplace', opt.in2out(random_make_inplace, ignore_newtrees=True), 99, 'fast_run', 'inplace')
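# Usage sketch for the functional constructors (my own example, not part of
# the patch; it follows the compile.function pattern of the tests further down):
def _uniform_usage_demo():
    from theano import compile
    rng_R = random_state_type()
    post_rng, out = uniform(rng_R, (4,), low=-2.0, high=2.0)  # ndim inferred from (4,)
    f = compile.function(
        [compile.In(rng_R, value=numpy.random.RandomState(55),
                    update=post_rng, mutable=True)],
        [out])
    print f()  # four samples in [-2, 2); the stored RandomState advances per call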
class RandomStreamsBase(object):

    def binomial(self, size=(), n=1, prob=0.5, ndim=None):
        """
        Sample n times with probability of success prob for each trial, return the number of
        successes.

        If the size argument is ambiguous on the number of dimensions, the first argument may be a
        plain integer to supplement the missing information.
        """
        return self.gen(binomial, size, n, prob, ndim=ndim)

    def uniform(self, size=(), low=0.0, high=1.0, ndim=None):
        """
        Sample a tensor of the given size, whose elements are drawn from a
        uniform distribution between low and high.

        If the size argument is ambiguous on the number of
        dimensions, the first argument may be a plain integer
        to supplement the missing information.
        """
        return self.gen(uniform, size, low, high, ndim=ndim)

    def normal(self, size=(), avg=0.0, std=1.0, ndim=None):
        """
        Sample from a normal distribution centered on avg with
        the specified standard deviation (std).

        If the size argument is ambiguous on the number of
        dimensions, the first argument may be a plain integer
        to supplement the missing information.
        """
        return self.gen(normal, size, avg, std, ndim=ndim)

    def random_integers(self, size=(), low=0, high=1, ndim=None):
        """
        Sample a random integer between low and high, both inclusive.

        If the size argument is ambiguous on the number of
        dimensions, the first argument may be a plain integer
        to supplement the missing information.
        """
        return self.gen(random_integers, size, low, high, ndim=ndim)

    def permutation(self, size=(), n=1, ndim=None):
        """
        Returns permutations of the integers between 0 and n-1, as many times
        as required by size. For instance, if size=(p,q), p*q permutations
        will be generated, and the output shape will be (p,q,n), because each
        permutation is of size n.

        Theano tries to infer the number of dimensions from the length of the size argument, but you
        may always specify it with the `ndim` parameter.

        .. note::
            The output will then be of dimension ndim+1.
        """
        return self.gen(permutation, size, n, ndim=ndim)

    def multinomial(self, size=(), n=1, pvals=[0.5, 0.5], ndim=None):
        """
        Sample n times from a multinomial distribution defined by probabilities pvals,
        as many times as required by size. For instance, if size=(p,q), p*q
        samples will be drawn, and the output shape will be (p,q,len(pvals)).

        Theano tries to infer the number of dimensions from the length of the size argument, but you
        may always specify it with the `ndim` parameter.

        .. note::
            The output will then be of dimension ndim+1.
        """
        return self.gen(multinomial, size, n, pvals, ndim=ndim)

    def shuffle_row_elements(self, input):
        """Return a variable with every row (rightmost index) shuffled.

        This uses a permutation random variable internally, available via
        the ``.permutation`` attribute of the return value.
        """
        perm = self.permutation(size=input.shape[:-1], n=input.shape[-1], ndim=input.ndim-1)
        shuffled = tensor.permute_row_elements(input, perm)
        shuffled.permutation = perm
        return shuffled
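# Per-row shuffle semantics in plain numpy, for contrast (illustration only;
# the demo function name is mine; numpy.random.shuffle applied to a matrix
# would instead reorder whole rows):
def _shuffle_row_elements_demo():
    rng = numpy.random.RandomState(0)
    m = numpy.arange(12).reshape(3, 4)
    shuffled = m.copy()
    for row in shuffled:
        rng.shuffle(row)   # permute elements within each row independently
    print shuffled         # every row is a permutation of the original row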
@@ -22,7 +22,7 @@ def randomstate_constructor(value, name=None, strict=False):
            name=name,
            strict=strict)

-class RandomStreams(object):
+class RandomStreams(raw_random.RandomStreamsBase):
    """Module component with similar interface to numpy.random (numpy.random.RandomState)"""

    state_updates = []
@@ -100,7 +100,6 @@ class RandomStreams(object):
        """
        item.value = val

    def gen(self, op, *args, **kwargs):
        """Create a new random stream in this container.
@@ -123,64 +122,3 @@ class RandomStreams(object):
        self.state_updates.append(out.update)
        return out
-    def binomial(self, *args, **kwargs):
-        """Return a symbolic binomial sample
-
-        *args and **kwargs will be passed to numpy.random.RandomState.binomial
-
-        This is a shortcut for a call to `self.gen`
-        """
-        return self.gen(raw_random.binomial, *args, **kwargs)
-
-    def uniform(self, *args, **kwargs):
-        """Return a symbolic uniform sample
-
-        *args and **kwargs will be passed to numpy.random.RandomState.uniform
-
-        This is a shortcut for a call to `self.gen`
-        """
-        return self.gen(raw_random.uniform, *args, **kwargs)
-
-    def normal(self, *args, **kwargs):
-        """Return a symbolic normal sample
-
-        *args and **kwargs will be passed to numpy.random.RandomState.normal
-
-        This is a shortcut for a call to `self.gen`
-        """
-        return self.gen(raw_random.normal, *args, **kwargs)
-
-    def random_integers(self, *args, **kwargs):
-        """Return a symbolic random integer sample
-
-        *args and **kwargs will be passed to numpy.random.RandomState.random_integers
-
-        This is a shortcut for a call to `self.gen`
-        """
-        return self.gen(raw_random.random_integers, *args, **kwargs)
-
-    def permutation(self, *args, **kwargs):
-        """Return a symbolic permutation of integers
-
-        *args and **kwargs will be passed to numpy.random.RandomState.permutation
-
-        This is a shortcut for a call to `self.gen`
-        """
-        return self.gen(raw_random.permutation, *args, **kwargs)
-
-    def multinomial(self, *args, **kwargs):
-        """Return a symbolic multinomial sample
-
-        This is a shortcut for a call to `self.gen`
-
-        *args and **kwargs will be passed to numpy.random.RandomState.multinomial
-        """
-        return self.gen(raw_random.multinomial, *args, **kwargs)
-
-    def shuffle_row_elements(self, input):
-        """Return a variable with every row (rightmost index) shuffled"""
-        perm = self.permutation(input.ndim-1, input.shape[:-1], input.shape[-1])
-        shuffled = permute_row_elements(input, perm)
-        return shuffled
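# Orientation sketch (mine, not from the patch): with the RandomStreamsBase
# mixin, the shared-variable RandomStreams keeps its one-liner interface.
# This assumes the class is importable as
# theano.tensor.shared_randomstreams.RandomStreams, as the tests below suggest.
def _shared_streams_demo():
    import theano
    from theano.tensor.shared_randomstreams import RandomStreams
    srng = RandomStreams(seed=234)
    u = srng.uniform((2, 2))          # inherited from raw_random.RandomStreamsBase
    f = theano.function([], u, updates=srng.updates())
    print f()   # a fresh 2x2 draw
    print f()   # a different draw: the stored RandomState advanced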
@@ -444,6 +444,17 @@ Log10InplaceTester = makeBroadcastTester(op = inplace.log10_inplace,
                                         grad = _grad_broadcast_unary_positive,
                                         inplace = True)
Log1pTester = makeBroadcastTester(op = log1p,
expected = numpy.log1p,
good = _good_broadcast_unary_positive,
grad = _grad_broadcast_unary_positive)
Log1pInplaceTester = makeBroadcastTester(op = inplace.log1p_inplace,
expected = numpy.log1p,
good = _good_broadcast_unary_positive,
grad = _grad_broadcast_unary_positive,
inplace = True)
SqrtTester = makeBroadcastTester(op = sqrt,
                                 expected = numpy.sqrt,
                                 good = _good_broadcast_unary_positive,
@@ -1088,9 +1099,7 @@ class test_bitwise(unittest.TestCase):
        self.failUnless(numpy.all(v == (~l)), (l, r, v))

class T_add(unittest.TestCase):
    def setUp(self):
        utt.seed_rng()
@@ -1117,8 +1126,11 @@ class T_add(unittest.TestCase):
    def test_grad_col(self):
        utt.verify_grad(add, [numpy.random.rand(3, 5), numpy.random.rand(3, 1)])

+class T_ceil(unittest.TestCase):
+    def test_complex(self):
+        self.assertRaises(TypeError, ceil, zvector())
+
class T_exp(unittest.TestCase):
    def test_grad_0(self):
        utt.verify_grad(exp, [
            numpy.asarray([[ 1.5089518 , 1.48439076, -4.7820262 ],
@@ -1128,6 +1140,19 @@ class T_exp(unittest.TestCase):
            numpy.asarray([[ 1.5089518 , 1.48439076, -4.7820262 ],
                [ 2.04832468, 0.50791564, -1.58892269]])])
    def test_int(self):
        x = ivector()
        f = function([x], exp(x))
        exp_3 = f([3])
        assert exp_3.dtype == 'float64'

    def test_complex(self):
        x = zvector()
        assert exp(x).dtype == 'complex128'
        f = function([x], exp(x))
        exp_3 = f([3+2j])
        assert numpy.allclose(exp_3, numpy.exp(3+2j))
class T_divimpl(unittest.TestCase):
    def test_impls(self):
        i = iscalar()
...
@@ -7,7 +7,7 @@ import theano
from theano import gof
from theano.tensor.opt import *
from theano import tensor
-from theano.tensor import TensorType
+from theano.tensor import TensorType, inplace
from theano.gof import Env
from theano.tensor.elemwise import DimShuffle
from theano import pprint, shared
@@ -78,55 +78,8 @@ def test_add_canonizer_problem0():
    r = segment_labels * 5
    f = function([label], r)
# class _test_inplace_opt(unittest.TestCase):
# def test_straightforward(self):
# x, y, z = inputs()
# e = x + y + z
# g = Env([x, y], [e])
# self.failUnless(str(g) == "[Broadcast{Add}(Broadcast{Add}(x, y), z)]")
# inplace_optimizer.optimize(g)
# self.failUnless(str(g) == "[Broadcast{Add}{0: 0}(Broadcast{Add}{0: 0}(x, y), z)]")
# def test_multiple_uses(self):
# x, y, z = inputs()
# e0 = x + y
# e1 = x * y
# g = Env([x, y], [e0, e1])
# self.failUnless(str(g) == "[Broadcast{Add}(x, y), Broadcast{Mul}(x, y)]")
# inplace_optimizer.optimize(g)
# self.failUnless(str(g) == "[Broadcast{Add}{0: 0}(x, y), Broadcast{Mul}(x, y)]" \
# or str(g) == "[Broadcast{Add}(x, y), Broadcast{Mul}{0: 0}(x, y)]")
# def test_user_inplace(self):
# x, y, z = inputs()
# e0 = x + y
# e1 = tensor._mul_inplace(x, y)
# g = Env([x, y], [e0, e1])
# self.failUnless(str(g) == "[Broadcast{Add}(x, y), Broadcast{Mul}{0: 0}(x, y)]")
# inplace_optimizer.optimize(g)
# self.failUnless(str(g) == "[Broadcast{Add}(x, y), Broadcast{Mul}{0: 0}(x, y)]")
# def test_inplace_on_second_argument(self):
# x, y, z = inputs()
# e0 = x + y
# e1 = tensor._mul_inplace(x, z)
# g = Env([x, y], [e0, e1])
# self.failUnless(str(g) == "[Broadcast{Add}(x, y), Broadcast{Mul}{0: 0}(x, z)]")
# inplace_optimizer.optimize(g)
# self.failUnless(str(g) == "[Broadcast{Add}{0: 1}(x, y), Broadcast{Mul}{0: 0}(x, z)]")
from theano.tensor import *
#from sandbox import pprint
class test_greedy_distribute(unittest.TestCase):
    def test_main(self):
        a, b, c, d, x, y, z = matrices('abcdxyz')
@@ -597,191 +550,6 @@ def test_local_shape_lift_dot():
    print pprint(g.outputs[0]), args_to_result[(x,y)]
    assert pprint(g.outputs[0]) == args_to_result[(x,y)]
# def test_plusmin(self):
# x, y, z = inputs()
# a, b, c, d = more_inputs()
# # e = x - x
# # e = (2.0 + x) - (2.0 + y)
# # e = (2.0 + x) - (4.0 + y)
# # e = x - (y - z)
# # e = (x + y) - x
# # e = (x - y) + (y - z) + (z - x)
# # e = (a - b) + (b - c) + (c - d)
# # e = x + -y
# # e = a - b - b + a + b + c + b - c
# # e = x + log(y) - x + y
# e = 2.0 + x + 4.0
# g = Env([x, y, z, a, b, c, d], [e])
# print g
# gof.ConstantFinder().optimize(g)
# addfn = lambda *inputs: sum(inputs)
# subfn = lambda x, y: x - y
# negfn = lambda x: -x
# Canonizer(Add, Sub, Neg, addfn, subfn, negfn).optimize(g)
# print g
# def test_both(self):
# x, y, z = inputs()
# a, b, c, d = more_inputs()
# e0 = (x * y / x)
# e = e0 + e0 - e0
# g = Env([x, y, z, a, b, c, d], [e])
# print g
# gof.ConstantFinder().optimize(g)
# mulfn = lambda *inputs: reduce(lambda x, y: x * y, (1,) + inputs)
# divfn = lambda x, y: x / y
# invfn = lambda x: 1 / x
# Canonizer(Mul, Div, Inv, mulfn, divfn, invfn).optimize(g)
# addfn = lambda *inputs: reduce(lambda x, y: x + y, (0,) + inputs)
# subfn = lambda x, y: x - y
# negfn = lambda x: -x
# Canonizer(Add, Sub, Neg, addfn, subfn, negfn).optimize(g)
# print g
# def test_group_powers(self):
# x, y, z, a, b, c, d = floats('xyzabcd')
###################
# c1, c2 = constant(1.), constant(2.)
# #e = pow(x, c1) * pow(x, y) / pow(x, 7.0) # <-- fucked
# #f = -- moving from div(mul.out, pow.out) to pow(x, sub.out)
# e = div(mul(pow(x, 2.0), pow(x, y)), pow(x, 7.0))
# g = Env([x, y, z, a, b, c, d], [e])
# print g
# print g.inputs, g.outputs, g.orphans
# f = sub(add(2.0, y), add(7.0))
# g.replace(e, pow(x, f))
# print g
# print g.inputs, g.outputs, g.orphans
# g.replace(f, sub(add(2.0, y), add(7.0))) # -- moving from sub(add.out, add.out) to sub(add.out, add.out)
# print g
# print g.inputs, g.outputs, g.orphans
###################
# # e = x * exp(y) * exp(z)
# # e = x * pow(x, y) * pow(x, z)
# # e = pow(x, y) / pow(x, z)
# e = pow(x, 2.0) * pow(x, y) / pow(x, 7.0) # <-- fucked
# # e = pow(x - x, y)
# # e = pow(x, 2.0 + y - 7.0)
# # e = pow(x, 2.0) * pow(x, y) / pow(x, 7.0) / pow(x, z)
# # e = pow(x, 2.0 + y - 7.0 - z)
# # e = x ** y / x ** y
# # e = x ** y / x ** (y - 1.0)
# # e = exp(x) * a * exp(y) / exp(z)
# g = Env([x, y, z, a, b, c, d], [e])
# g.extend(gof.PrintListener(g))
# print g, g.orphans
# mulfn = lambda *inputs: reduce(lambda x, y: x * y, (1,) + inputs)
# divfn = lambda x, y: x / y
# invfn = lambda x: 1 / x
# Canonizer(mul, div, inv, mulfn, divfn, invfn, group_powers).optimize(g)
# print g, g.orphans
# addfn = lambda *inputs: reduce(lambda x, y: x + y, (0,) + inputs)
# subfn = lambda x, y: x - y
# negfn = lambda x: -x
# Canonizer(add, sub, neg, addfn, subfn, negfn).optimize(g)
# print g, g.orphans
# pow2one_float.optimize(g)
# pow2x_float.optimize(g)
# print g, g.orphans
# class _test_cliques(unittest.TestCase):
# def test_straightforward(self):
# x, y, z = inputs()
# m = y * z
# d = tensor.dot(x, m)
# d.name = 'd'
# e = x + y + d
# g = Env([x, y, z], [e])
# cliques = find_cliques(g)
# self.failUnless(len(cliques) == 2)
# (i1, o1), (i2, o2) = cliques
# self.failUnless(str(Env(i1, o1)) == "[Broadcast{Add}(Broadcast{Add}(x, y), d)]")
# self.failUnless(str(Env(i2, o2)) == "[Broadcast{Mul}(y, z)]")
# # print g
# # for i, o in find_cliques(g):
# # print "-->", Env(i, [o])
# def test_broadcasting(self):
# x, y, z = inputs([0]*1, [0]*2, [0]*3)
# e = x + y + z
# g = Env([x, y, z], [e])
# lift_dimshuffle.optimize(g)
# self.failUnless(len(find_cliques(g, through_broadcast = True)) == 1)
# self.failUnless(len(find_cliques(g, through_broadcast = False)) == 2)
# # print g
# # for i, o in find_cliques(g, True):
# # print "-->", Env(i, [o])
# # class _test_clique_opt(unittest.TestCase):
# # def test_straightforward(self):
# # x, y, z = inputs()
# # e = x ** 2.0 #x * x
# # g = Env([x], [e])
# # gof.ConstantFinder().optimize(g)
# # opt = CliqueOptimizer(through_broadcast = False,
# # scalar_optimizer = scalar_opt.opt2,
# # make_composite = False)
# # print g
# # opt.optimize(g)
# # print g
# # def test_inplace(self):
# # x, y, z = inputs()
# # #e = tensor._add_inplace(x, y + z)
# # e = x + tensor._add_inplace(y, z)
# # g = Env([x, y, z], [e])
# # opt = CliqueOptimizer(through_broadcast = False,
# # scalar_optimizer = None,
# # make_composite = True)
# # print g
# # opt.optimize(g)
# # print g
# # # print g.outputs[0].owner.c_code(['x', 'y', 'z'], ['e'], dict(fail = "FAIL;", id = 0))
# # print gof.OpWiseCLinker(g).make_function()(numpy.ones((5, 5)), numpy.ones((5, 5)), numpy.ones((5, 5)))
# # def test_straightforward(self):
# # x, y, z = inputs()
# # e = x + y + z
# # g = Env([x, y, z], [e])
# # opt = CliqueOptimizer(through_broadcast = False,
# # scalar_optimizer = None,
# # make_composite = True)
# # print g
# # opt.optimize(g)
# # print g
# # # print g.outputs[0].owner.c_code(['x', 'y', 'z'], ['e'], dict(fail = "FAIL;", id = 0))
# # print gof.OpWiseCLinker(g).make_function()(numpy.ones((5, 5)), numpy.ones((5, 5)), numpy.ones((5, 5)))
# # def test_straightforward2(self):
# # x, y, z = inputs()
# # m = y * z
# # d = tensor.dot(x, m)
# # d.name = 'd'
# # e = x + y + d
# # g = Env([x, y, z], [e])
# # opt = CliqueOptimizer(through_broadcast = False,
# # scalar_optimizer = None,
# # make_composite = True)
# # print g
# # opt.optimize(g)
# # print g
# # # print g.outputs[0].owner.c_code(['x', 'y', 'z'], ['e'], dict(fail = "FAIL;", id = 0))
# # print gof.OpWiseCLinker(g).make_function()(numpy.ones((5, 5)), numpy.ones((5, 5)), numpy.ones((5, 5)))
def test_const_type_in_mul_canonizer():
    input = dmatrix()
    w = dmatrix()
@@ -1136,7 +904,38 @@ class test_fusion(unittest.TestCase):
    # cases[id]=None #to remove g, that link to out that link to the ndarray!
    #g.owner.inputs[0] is out... make owner a weakref?
def test_log1p():
    # check some basic cases
    x = dvector()
    f = function([x], T.log(1+(x)), mode='FAST_RUN')
    assert [node.op for node in f.maker.env.toposort()] == [T.log1p]

    f = function([x], T.log(1+(-x)), mode='FAST_RUN')
    assert [node.op for node in f.maker.env.toposort()] == [T.neg, inplace.log1p_inplace]

    f = function([x], -T.log(1+(-x)), mode='FAST_RUN')
    assert [node.op for node in f.maker.env.toposort()] == [T.neg, inplace.log1p_inplace, inplace.neg_inplace]

    # check trickier cases (and use different dtype)
    y = fmatrix()
    f = function([x,y], T.log(fill(y,1)+(x)), mode='FAST_RUN')
    assert [node.op for node in f.maker.env.toposort()] == [T.DimShuffle([False], ['x', 0], True), T.log1p, T.fill]
    f = function([x,y], T.log(0+(x) + fill(y,1.0)), mode='FAST_RUN')
    assert [node.op for node in f.maker.env.toposort()] == [T.DimShuffle([False], ['x', 0], True), T.log1p, T.fill]
    f = function([x,y], T.log(2+(x) - fill(y,1.0)), mode='FAST_RUN')
    assert [node.op for node in f.maker.env.toposort()] == [T.DimShuffle([False], ['x', 0], True), T.log1p, T.fill]

    f([1e-7, 10], [[0, 0], [0, 0]]) #debugmode will verify values

    # should work for complex
    z = zmatrix()
    f = function([z], T.log(1+(z)), mode='FAST_RUN')
    assert [node.op for node in f.maker.env.toposort()] == [T.log1p]

    # should work for int
    z = imatrix()
    f = function([z], T.log(1+(z)), mode='FAST_RUN')
    assert [node.op for node in f.maker.env.toposort()] == [T.log1p]
if __name__ == '__main__':
    # unittest.main()
...
@@ -109,12 +109,18 @@ class T_RandomStreams(unittest.TestCase):
        out = m.random.uniform((2,2))
        m.fn = Method([], out)
        made = m.make()

+       #as a distraction, install various seeds
        made.random.initialize(seed=789)
        made.random.seed(888)

-       rng = numpy.random.RandomState(823874)
-       made.random[out.rng] = numpy.random.RandomState(823874)
+       # then replace the rng of the stream we care about via setitem
+       realseed = 823874
+       rng = numpy.random.RandomState(realseed)
+       made.random[out.rng] = numpy.random.RandomState(realseed)
+
+       print made.fn()
+       print rng.uniform(size=(2,2))

        fn_val0 = made.fn()
        fn_val1 = made.fn()
@@ -153,7 +159,7 @@ class T_RandomStreams(unittest.TestCase):
        # ndim specified, consistent with shape, OK
        m2 = Module()
        m2.random = RandomStreams(234)
-       m2.fn = Method([], m2.random.uniform(2, (2,2)))
+       m2.fn = Method([], m2.random.uniform((2,2), ndim=2))
        made2 = m2.make()
        made2.random.initialize()
@@ -164,7 +170,7 @@ class T_RandomStreams(unittest.TestCase):
        # ndim specified, inconsistent with shape, should raise ValueError
        m3 = Module()
        m3.random = RandomStreams(234)
-       m3.fn = Method([], m3.random.uniform(1, (2,2)))
+       m3.fn = Method([], m3.random.uniform((2,2), ndim=1))
        made3 = m3.make()
        made3.random.initialize()
        self.assertRaises(ValueError, made3.fn)
...
@@ -5,6 +5,7 @@ import numpy as N
from theano.tests import unittest_tools
from theano.tensor.raw_random import *
+from theano.tensor import raw_random
from theano import tensor
@@ -12,7 +13,7 @@ from theano import compile, gof

class T_random_function(unittest.TestCase):
    def test_basic_usage(self):
-       rf = RandomFunction(numpy.random.RandomState.uniform, tensor.dvector, -2.0, 2.0)
+       rf = RandomFunction(numpy.random.RandomState.uniform, tensor.dvector)
        assert not rf.inplace
        assert getattr(rf, 'destroy_map', {}) == {}
@@ -32,23 +33,21 @@ class T_random_function(unittest.TestCase):
        assert numpy.all(f_0 == f_1)

    def test_inplace_norun(self):
-       rf = RandomFunction(numpy.random.RandomState.uniform, tensor.dvector, -2.0, 2.0,
-               inplace=True)
+       rf = RandomFunction(numpy.random.RandomState.uniform, tensor.dvector, inplace=True)
        assert rf.inplace
        assert getattr(rf, 'destroy_map', {}) != {}

    def test_args(self):
        """Test that arguments to RandomFunction are honored"""
-       rf2 = RandomFunction(numpy.random.RandomState.uniform, tensor.dvector, -2.0, 2.0)
-       rf4 = RandomFunction(numpy.random.RandomState.uniform, tensor.dvector, -4.0, 4.0,
-               inplace=True)
+       rf2 = RandomFunction(numpy.random.RandomState.uniform, tensor.dvector)
+       rf4 = RandomFunction(numpy.random.RandomState.uniform, tensor.dvector, inplace=True)
        rng_R = random_state_type()

        # use make_node to override some of the self.args
-       post_r2, out2 = rf2(rng_R, (4,))
-       post_r2_4, out2_4 = rf2(rng_R, (4,), -4.0)
+       post_r2, out2 = rf2(rng_R, (4,), -2, 2)
+       post_r2_4, out2_4 = rf2(rng_R, (4,), -4.0, 2)
        post_r2_4_4, out2_4_4 = rf2(rng_R, (4,), -4.0, 4.0)
-       post_r4, out4 = rf4(rng_R, (4,))
+       post_r4, out4 = rf4(rng_R, (4,), -4, 4)

        f = compile.function(
            [compile.In(rng_R, value=numpy.random.RandomState(55), update=post_r4, mutable=True)],
@@ -65,7 +64,7 @@ class T_random_function(unittest.TestCase):
    def test_inplace_optimization(self):
        """Test that FAST_RUN includes the random_make_inplace optimization"""
        #inplace = False
-       rf2 = RandomFunction(numpy.random.RandomState.uniform, tensor.dvector, -2.0, 2.0)
+       rf2 = RandomFunction(numpy.random.RandomState.uniform, tensor.dvector)
        rng_R = random_state_type()

        # use make_node to override some of the self.args
@@ -92,19 +91,18 @@ class T_random_function(unittest.TestCase):
    def test_random_function_ndim(self):
        """Test that random_function helper function accepts ndim as first argument"""
-       rf2 = random_function(numpy.random.RandomState.uniform, 'float64', -2.0, 2.0)
        rng_R = random_state_type()

        # ndim is an optional argument indicating the length of the 'shape'
        # ndim not specified, OK
-       post_out4, out4 = rf2(rng_R, (4,))
+       post_out4, out4 = uniform(rng_R, (4,))

        # ndim specified, consistent with shape, OK
-       post_out1_4, out1_4 = rf2(rng_R, 1, (4,))
-       post_out2_4_4, out2_4_4 = rf2(rng_R, 2, (4, 4))
+       post_out1_4, out1_4 = uniform(rng_R, (4,), ndim=1)
+       post_out2_4_4, out2_4_4 = uniform(rng_R, (4, 4), ndim=2)

        # ndim specified, but not compatible with shape
-       post_out2_4, out2_4 = rf2(rng_R, 2, (4,))
+       post_out2_4, out2_4 = uniform(rng_R, (4,), ndim=2)

        f_ok = compile.function(
            [compile.In(rng_R, value=numpy.random.RandomState(55), update=post_out2_4_4, mutable=True)],
@@ -132,18 +130,31 @@ class T_random_function(unittest.TestCase):
        # Specifying a different ndim_added will change the Op's output ndim,
        # so numpy.uniform will produce a result of incorrect shape,
        # and a ValueError should be raised.

+       def ndim_added_deco(ndim_added):
+           def randomfunction(random_state, size=(), low=0.0, high=0.0, ndim=None):
+               ndim, size = raw_random._infer_ndim(ndim, size)
+               op = RandomFunction('uniform',
+                       tensor.TensorType(dtype='float64', broadcastable=
+                           (False,)*(ndim+ndim_added)),
+                       ndim_added=ndim_added)
+               return op(random_state, size, low, high)
+           return randomfunction

-       uni_1 = random_function(numpy.random.RandomState.uniform, 'float64', -2.0, 2.0, ndim_added=1)
-       uni_0 = random_function(numpy.random.RandomState.uniform, 'float64', -2.0, 2.0, ndim_added=0)
-       uni_m1 = random_function(numpy.random.RandomState.uniform, 'float64', -2.0, 2.0, ndim_added=-1)
+       uni_1 = ndim_added_deco(1)
+       uni_0 = ndim_added_deco(0)
+       uni_m1 = ndim_added_deco(-1)
+       #uni_1 = random_function(numpy.random.RandomState.uniform, 'float64', -2.0, 2.0, ndim_added=1)
+       #uni_0 = random_function(numpy.random.RandomState.uniform, 'float64', -2.0, 2.0, ndim_added=0)
+       #uni_m1 = random_function(numpy.random.RandomState.uniform, 'float64', -2.0, 2.0, ndim_added=-1)

        rng_R = random_state_type()

-       p_uni11, uni11 = uni_1(rng_R, 1, (4,))
-       p_uni12, uni12 = uni_1(rng_R, 2, (3,4))
-       p_uni01, uni01 = uni_0(rng_R, 1, (4,))
-       p_uni02, uni02 = uni_0(rng_R, 2, (3,4))
-       p_unim11, unim11 = uni_m1(rng_R, 1, (4,))
-       p_unim12, unim12 = uni_m1(rng_R, 2, (3,4))
+       p_uni11, uni11 = uni_1(rng_R, size=(4,))
+       p_uni12, uni12 = uni_1(rng_R, size=(3,4))
+       p_uni01, uni01 = uni_0(rng_R, size=(4,))
+       p_uni02, uni02 = uni_0(rng_R, size=(3,4))
+       p_unim11, unim11 = uni_m1(rng_R, size=(4,))
+       p_unim12, unim12 = uni_m1(rng_R, size=(3,4))

        self.assertEqual(uni11.ndim, 2)
        self.assertEqual(uni12.ndim, 3)
@@ -320,7 +331,8 @@ class T_random_function(unittest.TestCase):
    def test_permutation(self):
        """Test that raw_random.permutation generates the same results as numpy."""
        rng_R = random_state_type()
-       post_r, out = permutation(rng_R, (9,), 6)
+       post_r, out = permutation(rng_R, size=(9,), n=6)
+       print 'OUT NDIM', out.ndim
        f = compile.function(
            [compile.In(rng_R, value=numpy.random.RandomState(55), update=post_r, mutable=True)],
            [out], accept_inplace=True)
@@ -365,6 +377,24 @@ class T_random_function(unittest.TestCase):
        self.assertTrue(val0.shape == (7,3,5))
        self.assertTrue(val1.shape == (7,3,5))
    def test_symbolic_shape(self):
        rng_R = random_state_type()
        shape = tensor.lvector()
        post_r, out = uniform(rng_R, shape, ndim=2)
        f = compile.function([rng_R, shape], out)
        rng_state0 = numpy.random.RandomState(55)

        assert f(rng_state0, [2,3]).shape == (2,3)
        assert f(rng_state0, [4,8]).shape == (4,8)

        self.assertRaises(ValueError, f, rng_state0, [4])
        self.assertRaises(ValueError, f, rng_state0, [4,3,4,5])
if __name__ == '__main__':
    from theano.tests import main
    main("test_raw_random")
...
@@ -11,8 +11,9 @@ from theano import function
from theano import tensor
from theano import compile, gof
+from theano.tests import unittest_tools

-class T_RandomStreams(unittest.TestCase):
+class T_SharedRandomStreams(unittest.TestCase):
    def test_tutorial(self):
        srng = RandomStreams(seed=234)
@@ -109,6 +110,96 @@ class T_RandomStreams(unittest.TestCase):
        assert numpy.all(fn_val0 == numpy_val0)
        assert numpy.all(fn_val1 == numpy_val1)
    def test_permutation(self):
        """Test that RandomStreams.permutation generates the same results as numpy"""
        # Check over two calls to see if the random state is correctly updated.
        random = RandomStreams(234)
        fn = function([], random.permutation((20,), 10), updates=random.updates())

        fn_val0 = fn()
        fn_val1 = fn()

        rng_seed = numpy.random.RandomState(234).randint(2**30)
        rng = numpy.random.RandomState(int(rng_seed))  # int() is for 32bit

        # rng.permutation outputs one vector at a time, so we iterate.
        numpy_val0 = numpy.asarray([rng.permutation(10) for i in range(20)])
        numpy_val1 = numpy.asarray([rng.permutation(10) for i in range(20)])

        assert numpy.all(fn_val0 == numpy_val0)
        assert numpy.all(fn_val1 == numpy_val1)

    def test_multinomial(self):
        """Test that RandomStreams.multinomial generates the same results as numpy"""
        # Check over two calls to see if the random state is correctly updated.
        random = RandomStreams(234)
        fn = function([], random.multinomial((4,4), 1, [0.1]*10), updates=random.updates())

        fn_val0 = fn()
        fn_val1 = fn()

        rng_seed = numpy.random.RandomState(234).randint(2**30)
        rng = numpy.random.RandomState(int(rng_seed))  # int() is for 32bit
        numpy_val0 = rng.multinomial(1, [0.1]*10, size=(4,4))
        numpy_val1 = rng.multinomial(1, [0.1]*10, size=(4,4))

        assert numpy.all(fn_val0 == numpy_val0)
        assert numpy.all(fn_val1 == numpy_val1)

    def test_shuffle_row_elements(self):
        """Test that RandomStreams.shuffle_row_elements generates the right results"""
        # Check over two calls to see if the random state is correctly updated.

        # On matrices, for each row, the elements of that row should be shuffled.
        # Note that this differs from numpy.random.shuffle, where all the elements
        # of the matrix are shuffled.
        random = RandomStreams(234)
        m_input = tensor.dmatrix()
        f = function([m_input], random.shuffle_row_elements(m_input), updates=random.updates())

        val_rng = numpy.random.RandomState(unittest_tools.fetch_seed())
        in_mval = val_rng.uniform(-2, 2, size=(20,5))
        fn_mval0 = f(in_mval)
        fn_mval1 = f(in_mval)
        print in_mval[0]
        print fn_mval0[0]
        print fn_mval1[0]
        assert not numpy.all(in_mval == fn_mval0)
        assert not numpy.all(in_mval == fn_mval1)
        assert not numpy.all(fn_mval0 == fn_mval1)

        rng_seed = numpy.random.RandomState(234).randint(2**30)
        rng = numpy.random.RandomState(int(rng_seed))
        numpy_mval0 = in_mval.copy()
        numpy_mval1 = in_mval.copy()
        for row in numpy_mval0:
            rng.shuffle(row)
        for row in numpy_mval1:
            rng.shuffle(row)

        assert numpy.all(numpy_mval0 == fn_mval0)
        assert numpy.all(numpy_mval1 == fn_mval1)

        # On vectors, the behaviour is the same as numpy.random.shuffle,
        # except that it does not work in place, but returns a shuffled vector.
        random1 = RandomStreams(234)
        v_input = tensor.dvector()
        f1 = function([v_input], random1.shuffle_row_elements(v_input))

        in_vval = val_rng.uniform(-3, 3, size=(12,))
        fn_vval = f1(in_vval)
        numpy_vval = in_vval.copy()
        vrng = numpy.random.RandomState(int(rng_seed))
        vrng.shuffle(numpy_vval)
        print in_vval
        print fn_vval
        print numpy_vval
        assert numpy.all(numpy_vval == fn_vval)

        # Trying to shuffle a vector with the function that should shuffle
        # matrices, or vice versa, raises a TypeError
        self.assertRaises(TypeError, f1, in_mval)
        self.assertRaises(TypeError, f, in_vval)
if __name__ == '__main__':
    from theano.tests import main
...