Commit 2a37e894, authored by James Bergstra

merge

Modifications in the trunk since the last release
-------------------------------------------------
THIS IS A PARTIAL LIST. WE SHOULD CHECK ALL COMMITS SINCE THE LAST RELEASE AND ADD WHAT IS MISSING.
* The Sparse type is now supported by the shape op, and the ShapeFeature optimizer works correctly with it.
bugfix:
* The random number generator in theano/sandbox/rng_mrg.py did not always return the same sequence of numbers on the cpu and gpu.
* In that case there was garbage in the returned values, but that garbage looked random. So if your usage did not depend too much
on the type of randomness, you may be OK.
* Memory leak on the gpu when doing x+=y where x and y are cuda_ndarray.
* The leak was introduced on 3 December 2010.
* This was being used when inc_subtensor is called and moved to the GPU (as a GpuIncSubtensor op).
* In Python mode (not the default mode), when the input of an elemwise operation was an empty ndarray, we were not returning an empty ndarray.
* Fix some segfaults at exit with gpu code.
* Add a feature to avoid an exception that made Theano crash when taking the gradient of DimShuffle in some particular cases.
* Fix compilation crash for GpuElemwise with tensors that have a very high number of dimensions.
* Disabled the C code generator that made gcc crash on complex types.
optimization:
* Fuse GpuElemwise more often (in the case where there are so many inputs that fusing them all would exceed the 256-byte limit on parameters to a gpu function).
* Speed up gemv by working around scipy gemv slowness when the matrix is in C order (the default).
* Remove join of only 1 element.
* Fix ticket #596: cpu join of only 1 element that was not moved to the gpu.
* During optimization, consider one more case in get_constant_value.
* New SpecifyShape op that allows passing more shape info in the graph.
gpu:
* cuda_shared.value = X now works inplace!
* cuda_shared_var.set_value(new_ndarray) will overwrite the old value inplace in the most common case.
* Allow creating a CudaNdarraySharedVariable from a CudaNdarray.
other:
* compiledir now includes the Python version to make it easier for people with many Python versions.
* tensor.prod now implements the gradient.
* DebugMode now warns if an op declared itself as returning a view of the input but did not do so.
* This can block other ops from being inplace on the same inputs, which could lower the reuse of memory.
* Added theano.tensor.std as a shortcut to sqrt(var(input=input, axis=axis)).
* Sparse.structured_dot now works when both matrices are sparse.
* The Sparse type is now supported by the shape op, and the ShapeFeature optimizer works correctly with it.
* New init_gpu_device Theano flag.
doc:
* Documented lib.amdlibm config variable.
* A new page (written for 0.3, but an error hid it on the web page) on the memory aliasing contract of Theano.
TODO before new version:
* shared.value is deprecated, use shared.get_value or shared.set_value!
* Add doc in the installation instructions about the new init_gpu_device flag.
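
The theano.tensor.std entry in the list above describes std as a shortcut to sqrt(var(input=input, axis=axis)). As a minimal sketch of that identity, the following uses NumPy as a stand-in rather than Theano itself; the sample values and the names data, std_direct and std_via_var are illustrative assumptions, not part of the commit.

```python
import numpy as np

# Illustrative data only; any real-valued array would do.
data = np.array([[1.0, 2.0, 4.0],
                 [3.0, 5.0, 9.0]])
axis = 0

# Standard deviation computed directly...
std_direct = np.std(data, axis=axis)

# ...and as the square root of the variance along the same axis,
# which is the relationship the changelog entry describes.
std_via_var = np.sqrt(np.var(data, axis=axis))
```

Both arrays should agree up to floating-point rounding, one value per column of data.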
Theano 0.3 (2010-11-23)
-----------------------
@@ -532,20 +532,16 @@ def _check_inputs(node, storage_map, r_vals, dr_vals, active_nodes, clobber_dr_v
         for oo,ii in vmap.iteritems():
             out_var = storage_map[node.outputs[oo]][0]
             in_var = storage_map[node.inputs[ii[0]]][0]
-            # We don't try to optimize simple scalar, as this is not worth our time
-            # This happen at least in Subtensor when the output is a scalar
-            # But this depend on the version of numpy!
-            if getattr(out_var,'size',2)==1:
+            # We don't try to optimize simple scalar and empty ndarray,
+            # as this is not worth our time. This happen at least in
+            # Subtensor when the output is a scalar But this depend on
+            # the version of numpy!
+            if getattr(out_var,'size',2)<=1:
                 continue
             if isinstance(node.op, theano.compile.mode.OutputGuard):
                 # This class is not in the final graph.
                 continue
             if not _may_share_memory(out_var, in_var):
-                #when a subtensor return a tensor of ndim==0, numpy seam to return a copy.
-                #when have an empty ndarray(happen with output guard) it is not the same. why?
-                if hasattr(out_var,'ndim') and (out_var.ndim>0 and out_var.size>0):
-                    continue
                 opt_warning("input idx %d marked as viewed but new memory allocated by node '%s'"%(ii[0],str(node)))
     for r_idx, r in enumerate(node.inputs):
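
The hunk above widens the early exit from size==1 to size<=1, so that empty arrays skip the view check along with scalars. The following is a minimal sketch of that logic in plain NumPy, not the DebugMode implementation: should_check_view and view_contract_broken are hypothetical helpers, and numpy.may_share_memory stands in for Theano's internal _may_share_memory.

```python
import numpy as np

def should_check_view(out_var):
    # Hypothetical helper mirroring `getattr(out_var, 'size', 2) <= 1`:
    # scalars and empty arrays are skipped, objects without a `size`
    # attribute default to 2 and are therefore still checked.
    return getattr(out_var, "size", 2) > 1

def view_contract_broken(out_var, in_var):
    # True when an output declared as a view of the input is checked
    # and turns out to live in separately allocated memory.
    return should_check_view(out_var) and not np.may_share_memory(out_var, in_var)

base = np.arange(6.0)
view = base[1:4]          # real view: shares memory with base
copy = base[1:4].copy()   # new allocation: does not share memory
empty = base[0:0]         # size 0: skipped by the `<= 1` test
scalar = np.array(3.0)    # size 1: skipped as before
```

Here view_contract_broken(copy, base) would trigger the warning path, while view, empty and scalar would not.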
@@ -496,7 +496,7 @@ def test_normal0():
     basictest(f, steps, const_size, target_avg=-5.0, target_std=2.0, prefix='gpu mrg ', allow_01=True, inputs=input, mean_rtol=rtol)
     # Need to allow some rounding error as their is float
     # computation that are done on the gpu vs cpu
-    numpy.allclose(out, gpu_out, rtol=5e-6, atol=1e-6)
+    assert numpy.allclose(out, gpu_out, rtol=5e-6, atol=1e-6)
     print ''
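
The one-word change in this hunk matters because numpy.allclose only returns a boolean; without assert, a mismatch between the cpu and gpu results is silently discarded. A minimal sketch of the difference follows, using made-up arrays in place of the test's out and gpu_out (cpu_out, gpu_out, check and mismatch_detected are illustrative names, not from the test suite).

```python
import numpy as np

# Illustrative stand-ins for the cpu and gpu results; the second
# element differs on purpose.
cpu_out = np.array([0.10, 0.20, 0.30])
gpu_out = np.array([0.10, 0.25, 0.30])

# Bare call, as before the fix: the False result is thrown away,
# so nothing fails here.
np.allclose(cpu_out, gpu_out, rtol=5e-6, atol=1e-6)

def check(a, b):
    # Asserted call, as after the fix: a mismatch raises AssertionError.
    assert np.allclose(a, b, rtol=5e-6, atol=1e-6)

try:
    check(cpu_out, gpu_out)
    mismatch_detected = False
except AssertionError:
    mismatch_detected = True
```

The tolerances are the ones used in the test; differences within rounding error, such as an offset of 1e-8, still pass.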