merge

2a37e894 · James Bergstra · 337609b2 · f18853a1 · 2a37e894 · 2a37e894
--- a/NEWS.txt
+++ b/NEWS.txt
 Modification in the trunk since the last release
 ------------------------------------------------
+THIS IS A PARTIAL LIST. WE SHOULD CHECK ALL COMMITS SINCE THE LAST RELEASE AND ADD WHAT IS MISSING.

- * Sparse type is now supported by the shape op and the ShapeFeature optimizer work correctly with them.
+
+bugfix:
+ * The random number generator in theano/sandbox/rng_mrg.py did not always return the same sequence of numbers on the cpu and gpu.
+    * In that case there was garbage in the returned values, but that garbage looked random. So if your usage did not depend too much
+      on the type of randomness, you may be ok.
+ * Memory leak on the gpu when doing x+=y where x and y are cuda_ndarray.
+    * The leak was introduced on 3 December 2010.
+    * This code path was used when inc_subtensor is called and moved to the GPU (as a GpuIncSubtensor op).
+ * In python mode (not the default mode), when the input of an elemwise operation was an empty ndarray, we were not returning an empty ndarray.
+ * Fix some segfaults at exit with gpu code.
+ * Add a feature to avoid an exception that made theano crash when taking the gradient on DimShuffle in some particular cases.
+ * Fix compilation crash for gpuElemwise with tensors with a very high number of dimensions.
+ * Disabled the c code generator that made gcc crash on complex types.
+
+optimization:
 * Fuse GpuElemwise more often (in the case where there are so many inputs that fusing them all would exceed the 256-byte limit on parameters to a gpu function).
 * Speed up gemv by working around scipy gemv slowness when the matrix is in C order (the default).
+ * Remove joins of only 1 element.
+ * Fix ticket #596: a cpu join of only 1 element was not moved to the gpu.
+ * During optimization, consider one more case in get_constant_value.
+ * New SpecifyShape op that allows passing more shape info in the graph.
+
+gpu:
+ * cuda_shared.value = X now works inplace!
+     * cuda_shared_var.set_value(new_ndarray) will overwrite the old value inplace in the most common case.
+ * Allow creating a CudaNdarraySharedVariable from a CudaNdarray.
+
+other:
+ * compiledir now includes the python version, to make it easier for people with many python versions.
+ * tensor.prod now implements the gradient.
+ * DebugMode now warns if an op declared itself as returning a view of the input but did not do so.
+    * This can block other ops from being inplace on the same inputs, which could lower the reuse of memory.
+ * Added theano.tensor.std as a shortcut to sqrt(var(input=input, axis=axis)).
+ * Sparse.structured_dot now works when both matrices are sparse.
+ * The Sparse type is now supported by the shape op, and the ShapeFeature optimizer works correctly with it.
+ * New init_gpu_device theano flag.
+
+doc:
+ * Documented the lib.amdlibm config variable.
+ * A new page (written for 0.3, but an error hid it on the web page) on the memory aliasing contract of Theano.
+
+TODO before new version:
+ * shared.value is deprecated, use shared.get_value or shared.set_value!
+ * Add doc in the installation instructions about the new init_gpu_device flag.

 Theano 0.3 (2010-11-23)
 -----------------------

--- a/theano/compile/debugmode.py
+++ b/theano/compile/debugmode.py
@@ -532,20 +532,16 @@ def _check_inputs(node, storage_map, r_vals, dr_vals, active_nodes, clobber_dr_v
        for oo,ii in vmap.iteritems():
            out_var = storage_map[node.outputs[oo]][0]
            in_var = storage_map[node.inputs[ii[0]]][0]
-            # We don't try to optimize simple scalar, as this is not worth our time
-            # This happen at least in Subtensor when the output is a scalar
-            # But this depend on the version of numpy!
-            if getattr(out_var,'size',2)==1:
+            # We don't try to optimize simple scalars and empty ndarrays,
+            # as this is not worth our time. This happens at least in
+            # Subtensor when the output is a scalar, but this depends on
+            # the version of numpy!
+            if getattr(out_var,'size',2)<=1:
                continue
            if isinstance(node.op, theano.compile.mode.OutputGuard):
                # This class is not in the final graph.
                continue
            if not _may_share_memory(out_var, in_var):
-                #when a subtensor return a tensor of ndim==0, numpy seam to return a copy.
-                #when have an empty ndarray(happen with output guard) it is not the same. why?
-
-                if hasattr(out_var,'ndim') and (out_var.ndim>0 and out_var.size>0):
-                    continue
                opt_warning("input idx %d marked as viewed but new memory allocated by node '%s'"%(ii[0],str(node)))

    for r_idx, r in enumerate(node.inputs):

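The guard changed in the hunk above can be read as follows: an object without a `size` attribute falls back to the default of 2 and is still checked, a scalar (size 1) is skipped as before, and after this change an empty array (size 0) is skipped too. A minimal sketch of that behaviour, assuming a hypothetical stand-in class `FakeArray` in place of real ndarrays and hypothetical helper names `skips_view_check_old` and `skips_view_check_new` (none of these exist in Theano):

```python
class FakeArray(object):
    """Hypothetical stand-in for an ndarray; only `size` matters here."""
    def __init__(self, size):
        self.size = size


def skips_view_check_old(out_var):
    # Guard before the change: only size-1 outputs are skipped.
    return getattr(out_var, 'size', 2) == 1


def skips_view_check_new(out_var):
    # Guard after the change: empty (size 0) outputs are skipped as well.
    return getattr(out_var, 'size', 2) <= 1


empty = FakeArray(0)
scalar = FakeArray(1)
vector = FakeArray(3)
no_size = object()  # no `size` attribute, so the default of 2 applies
```

With the old guard, `empty` would still reach the `_may_share_memory` check; with the new one it is skipped up front, which is presumably why the special case inside the `if not _may_share_memory(...)` branch could be deleted.
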
--- a/theano/sandbox/test_rng_mrg.py
+++ b/theano/sandbox/test_rng_mrg.py
@@ -496,7 +496,7 @@ def test_normal0():
            basictest(f, steps, const_size, target_avg=-5.0, target_std=2.0, prefix='gpu mrg ', allow_01=True, inputs=input, mean_rtol=rtol)
            # Need to allow some rounding error, as there are float
            # computations that are done on the gpu vs the cpu
-            numpy.allclose(out, gpu_out, rtol=5e-6, atol=1e-6)
+            assert numpy.allclose(out, gpu_out, rtol=5e-6, atol=1e-6)


        print ''
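
The one-line change in test_normal0 matters because a bare comparison call computes a boolean and discards it, so the test could never fail on that line. A minimal sketch of the difference, assuming a hypothetical pure-Python helper `allclose` that mimics the tolerance rule documented for numpy.allclose (|a - b| <= atol + rtol * |b|) on flat sequences rather than calling numpy itself; the sample values are made up for illustration:

```python
def allclose(a, b, rtol=1e-5, atol=1e-8):
    # Hypothetical stand-in for numpy.allclose, restricted to flat sequences.
    return len(a) == len(b) and all(
        abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b))


cpu_out = [0.5, 1.25, -3.0]
gpu_out = [0.5, 1.25, -2.0]  # deliberately different last value


def check_without_assert():
    # The result is computed and thrown away; the mismatch goes unnoticed.
    allclose(cpu_out, gpu_out, rtol=5e-6, atol=1e-6)
    return "passed"


def check_with_assert():
    # The mismatch now raises AssertionError and fails the test.
    assert allclose(cpu_out, gpu_out, rtol=5e-6, atol=1e-6)
    return "passed"
```

Calling `check_without_assert()` returns normally even though the two outputs differ, while `check_with_assert()` raises on the same data.
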