* https://github.com/Theano/Theano/pull/1591 # need info
Theano Development version
==========================
NEWS.txt:
We recommend that everybody update to this version.
Highlights:
* Python 3.3 compatibility with buildbot.
* Python 3.3 compatibility, with a buildbot test for it.
* Full advanced indexing support.
* Better Windows 64 bit support.
* New profiler.
* Better error messages that help debugging.
* Better support for newer NumPy versions (removes useless warnings/crashes).
* Faster optimization/compilation for big graphs.
* Move the Conv3d2d implementation into Theano.
* Better SymPy/Theano bridge: Make a Theano op from a SymPy expression and use the SymPy C code generator.
* Bug fixes.
Committers for this rc4 only:
Frederic Bastien
Pascal Lamblin
Arnaud Bergeron
abalkin
Olivier Delalleau
John Salvatier
Razvan Pascanu
Jeremiah Lowin
Ludwig Schmidt-Hackenberg
Vivek Kulkarni
Matthew Rocklin
Gabe Schwartz
James Bergstra
Sigurd Spieckermann
Bogdan Budescu
Mehdi Mirza
Nicolas Bouchard
Ethan Buchman
Guillaume Desjardins
Ian Goodfellow
Jason Yosinski
Sina Honari
Ben McCann
David Warde-Farley
Ilya Dyachenko
Jan Schlüter
Micky Latowicki
Yaroslav Halchenko
Alexander Belopolsky
Hannes Schulz
Huy Nguyen
Robert Kern
Sebastian Berg
Vincent Dumoulin
Wei Li
XterNalz
Installation:
* Canopy support (direct link to MKL):
...
...
@@ -25,22 +79,33 @@ Installation:
* Anaconda instructions (Pascal L., Frederic B.)
* Doc Ubuntu 13.04 (Frederic B.)
Committers for this rc3 only:
* Better support for newer NumPy versions (removes useless warnings/crashes) (Frederic B., Huy Nguyen)
Bug fixes:
* Fix wrong result of GpuDownsampleFactorMaxGrad on Mac OS X. (Pascal L.)
* Auto-detect and work around a bug in BLAS on Mac OS X. (Pascal L.)
* Work around a bug in Mac OS X. If 2 compiled modules had the same name, the OS or Python did not always load the right one, even when we used the right handle to it. (Pascal L.)
* Scan: if a scan node was cloned (by theano.clone) with different inputs, and if both the initial and the cloned nodes are used in the function being compiled, the value of the outputs of one would be replaced with the outputs of the other one. (Pascal L.)
* Sparse: Disable the optimization that introduces the CSMGradC op, as it doesn't work correctly with unsorted indices. (Frederic B.)
* Mac: Fix wrong result of GpuDownsampleFactorMaxGrad on Mac OS X. (Pascal L.)
* Mac: Auto-detect and work around a bug in BLAS on Mac OS X. (Pascal L.)
* Mac: Work around a bug in Mac OS X. If 2 compiled modules had the same name, the OS or Python did not always load the right one, even when we used the right handle to it. (Pascal L.)
Use this hash in the Python module, and in %(nodename)s, so that different helper functions in the support code for different Ops will always have different names.
* Fix infinite loop related to Scan on the GPU. (Pascal L.)
* Fix ConstructSparseFromList.infer_shape (Pascal L., reported by Rami Al-Rfou')
* (introduced in the development version after 0.6rc3 release) (Frederic B.)
Reduction that upcasts the input on no axis (ex: call theano.sum() on a scalar when the original dtype isn't float64 or [u]int64). It produced bad results as we don't upcast the inputs in the code, we just copy them.
Reduction that upcasts the input on no axis (ex: call theano.sum() on a scalar when the original dtype isn't float64 or
[u]int64). It produced bad results as we did not upcast the inputs in the code, we just copied them.
* Fix some cases of theano.clone() when we get a replacement of x that is a function of x. (Razvan P., reported by Akio Takano)
* Fix grad of Alloc when we unbroadcast the value and it isn't a scalar. (Frederic B., reported by Ian G.)
* In some cases (I think most cases), an exception was raised in the theano.tensor.grad() method.
But in theory, there could be bad shapes produced in the unbroadcasted dimensions.
New Features:
* Python 3.3 compatible (abalkin, Gabe Schwartz, Frederic B.)
* Compilation works on ARM processors (Raspberry Pi, Vincent Dumoulin)
* Add numpy.random.choice wrapper to our random number generator (Sigurd Spieckermann)
* Better SymPy/Theano bridge: Make a Theano op from a SymPy expression and use the SymPy C code generator (Matthew Rocklin)
* Move the Conv3d2d implementation into Theano (James Bergstra, Frederic B., Pascal L.)
* First version of the new GPU back-end available (Arnaud Bergeron, Frederic B.)
* Not all Ops have been converted to this new back-end.
To use it, set the Theano flag device=cudaN or device=openclN, where N is an integer.
* Python 3.3 compatible (abalkin, Gabe Schwartz, Frederic B., Pascal L.)
* A new profiler (Frederic B.)
The new profiler can now profile memory with the Theano flag profile_memory=True.
ProfileMode can no longer profile memory and prints a message about it.
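As a minimal sketch of how the flags named above can be passed: Theano reads a comma-separated list of key=value pairs from the THEANO_FLAGS environment variable, which has to be set before theano is imported. The helper name format_theano_flags is hypothetical, the device number 0 is only an example, and profile=True is added on the assumption that the profiler itself must be enabled for profile_memory to have any effect.

```python
import os


def format_theano_flags(**flags):
    """Join keyword arguments into a comma-separated key=value string."""
    # Sorting only makes the resulting string deterministic.
    return ",".join("%s=%s" % (name, value) for name, value in sorted(flags.items()))


# Example values only: pick the device that exists on your machine.
flags = format_theano_flags(device="cuda0", profile=True, profile_memory=True)

# The variable must be set before the first "import theano".
os.environ["THEANO_FLAGS"] = flags
print(flags)
# import theano  # would pick up the flags set above
```

The same string can also be given on the command line by setting THEANO_FLAGS in the shell before starting Python.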
...
...
@@ -88,6 +153,10 @@ New Features:
* Make GpuCrossentropySoftmaxArgmax1HotWithBias and GpuCrossentropySoftmax1HotWithBiasDx work for bigger inputs (Frederic B., reported by Ryan Price)
* Finish and move out of sandbox theano.sparse.basic.true_dot (Nicolas Bouchard, Frederic B.)
And document all sparse dot variants.
* Implement the mode ignore_borders for GpuImages2Neibs (Frederic B.)
* Make many reduction algorithms accept a scalar numpy.ndarray as axis (Jeremiah Lowin)
* Allow numpy.asarray(cuda_ndarray, dtype=...) (Frederic B.)
* theano-cache cleanup now removes cached modules from old versions of the code. (Frederic B.)
Interface Deprecation (a warning is printed):
...
...
@@ -97,20 +166,38 @@ Interface Deprecation (a warning is printed):
Deprecate the old interface for this. (Frederic B.)
Interface Changes:
* subtensor and take are no longer in tensor.basic. They were available from tensor.* and are still available from there. (Frederic B., Matthew Rocklin)
* This lowers the basic.py size to 191k, which keeps it under 200k for GitHub search.
* Add -m32 or -m64 in the module cache key and add the python bitwidth in the compiledir path. (Pascal L.)
* The size parameter of mrg.normal is now mandatory. It was crashing with the default value of None. (Olivier D.)
* Remove the deprecated passing of multiple modes to theano function. (Frederic B.)
* Change the FunctionGraph Features interface: the {on_prune(),on_import()} callbacks now take a reason. (Frederic B.)
* FunctionGraph now clones the input graph by default. (Frederic B.)
* A parameter allows skipping this clone.
* This was needed to speed up compilation.
New Interface (reuses existing functionality):
* Add hostname as a var in compiledir_format (Frederic B.)
* Add a new Theano flag: compute_test_value_opt. It takes the same values as compute_test_value. It enables compute_test_value during Theano optimization. It is only useful to debug Theano optimizations. Also make small changes to some optimizations so that they work correctly in that setup. (Frederic B.)
* Add the value pdb to the Theano flags compute_test_value and compute_test_value_opt. (Frederic B.)
* Add the Theano flag: optimizer_verbose. Default False. When True, we print all the optimizations being applied. (Frederic B.)
* Add Op.c_init_code() to allow running code when the C module is imported (Pascal L.)
* Allow theano.tensor.ones(3): support a scalar and not just a list of scalars, as numpy.ones does (Jeremiah Lowin)
* Make the memory profiler print the FLOPS used for the ops that know how to compute it. (Frederic B.)
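As a rough sketch of what a hostname variable in compiledir_format means: the format is a Python %-style template with named fields, filled in from a dictionary of values describing the machine. The function name expand_compiledir_format, the template string, and every key other than hostname are assumptions made for illustration; Theano's real dictionary may use other keys and compute the values differently.

```python
import platform
import socket
import struct


def expand_compiledir_format(fmt):
    """Fill a %(name)s-style template with values describing this machine."""
    values = {
        "platform": platform.platform(),
        "python_version": platform.python_version(),
        # Pointer size in bits: 32 or 64 depending on the Python build.
        "python_bitwidth": struct.calcsize("P") * 8,
        "hostname": socket.gethostname(),
    }
    return fmt % values


# Hypothetical template that makes the directory name unique per host.
name = expand_compiledir_format(
    "compiledir_%(platform)s-%(python_version)s-%(python_bitwidth)s-%(hostname)s"
)
print(name)
```

Including the hostname is mainly useful when several machines share one home directory, so that each one gets its own compilation directory.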
New debug features:
Speed-ups:
* Optimizer speed up. (Frederic B.)
* Fix warning/non-detection on newer llvm versions on Mac. (Pascal L., reported by Jeremiah Lowin and Chris Fonnesbeck)
* Allow pickling of more Ops to allow reusing the compiled code (Pascal L., Frederic B.)
* Optimize more cases of dot22 and scalar when we can't make a gemm (Pascal L., Frederic B.)
* Speed up GpuJoin with C code (Ludwig Schmidt-Hackenberg, Frederic B.)
* Faster GpuAdvancedIncSubtensor1 on Fermi GPUs (and up) for matrices. (Vivek Kulkarni)
* Faster GpuAdvancedIncSubtensor1 in some cases on all GPUs (Vivek Kulkarni)
* Implemented c_code for AdvancedSubtensor1 (abalkin)
* Add the equivalent of -march=native to g++ command line. (Frederic B., Pascal L.)
* Speed up compilation with Scan (Jan Schlüter)
* Merge more Scan nodes together (Pascal L., Yao Li).
* Add MakeVector.c_code (Fred)
* Add Shape.c_code (Fred)
* Optimize Elemwise when all the inputs are fortran (Frederic B.)
...
...
@@ -122,8 +209,22 @@ Speed-ups:
* A fix that removes a local_setsubtensor_of_allocs optimization warning and enables it in that case. (Frederic B., reported by John Salvatier)
* Make inv_as_solve optimization work (Matthew Rocklin)
Crash fixes:
Crash/no return fixes:
* Fix shape crash inserted by Scan optimization. The gradient of some recursive scans made the PushOutSeqScan optimization introduce a crash during the execution of a Theano function. (Frédéric B., reported by Hugo Larochelle)
* Fix command not returning with recent mingw64 on Windows (Pascal L., reported by many people)
* Fix infinite loop related to Scan on the GPU. (Pascal L.)
* Fix infinite loop when the compiledir is full. (Frederic B.)
* Fix a shape cycle crash in the optimizer (Pascal L., Frédéric B., reported by Cho KyungHyun)
* MRG normal now accepts generating a scalar. (Pascal L.)
* Fix some GPU compilation issues on Mac (John Yani, Frédéric B.)
* Fix crash when building symbolic random variables with a mix of symbolic and numeric scalar in the "size" parameter. (Pascal L., Reported by Wu Zhen Zhou)
* Make some Op.grad() implementations not return None (Pascal L.)
* Crash fix in the grad of elemwise related to a DisconnectedType (Pascal L., reported by Thomas Wiecki)
* Fix local_gpu_multinomial optimization handling of broadcast information. (Frederic B., reported by Caglar)
* Fix crash with change introduced in NumPy 1.7.1 (Pascal L., reported by Thomas Wiecki)
* Compilation failure with complex (Pascal L., reported by autumncat)
* Gpu reduction on all dimensions of a 4d tensor. (Frederic B., reported by Arjun Jain)
* Fix crash for a combination of grad of dot and dimshuffle when only one of the inputs for a corresponding dimension was known to be broadcastable. (Frederic B., reported by Micky Latowicki)
* AdvancedSubtensor1: allow broadcasted index vector. (Frederic B., reported by Jeremiah Lowin)
* Fix compute_test_value for ifelse (Olivier D., reported by Bitton Tenessi)
* Fix import error with some versions of NumPy (Olivier D.)
...
...
@@ -131,14 +232,15 @@ Crash fixes:
* Fix compute_test_value for a non_sequence when calling the gradient of Scan (Pascal L., reported by Bitton Tenessi).
* Crash fix in Scan following interface change in 0.6rc2 (Razvan P.)
* Crash fix on Scan (Razvan P.)
* Crash fix on Scan (Pascal L., reported by Sina Honari and Sigurd)
* Fix crash in Scan gradient related to compute_test_value (Frederic B., reported by Bitton Tenessi)
* Fix a scan optimization warning/error depending on Theano flags (Frederic B.)
* Fixed crash for unimplemented elemwise gradient (Olivier D., reported by Michael McNeil Forbes)
* Fix crash in the elemwise Python code for some big shapes with powers of 2. (Sina Honari, Pascal L.)
* Fix compile and import errors on Windows (bbudescu)
* Fix compile and import errors on Windows including for the GPU. (Bogdan Budescu)
* DebugMode prints more info when there is an error. (Frederic B.)
* Better profiling of test time with `theano-nose --time-profile`. (Frederic B.)
* Detection of infinite loop with global optimizer. (Pascal L.)
* DebugMode.check_preallocated_output now also works on Theano function outputs. (Pascal L.)
* DebugMode will now complain when the strides of CudaNdarray dimensions of size 1 are not 0. (Frederic B.)
Speed-ups:
* c_code for SpecifyShape op. (Frederic B.)
* cross-entropy optimization now works when specify_shape is used. (Pascal L.)
* The Scan optimizations ScanSaveMem and PushOutDot1 are applied more frequently. (Razvan P., reported by Abalkin)
A skipped optimization warning was printed.
* dot(vector, vector) is now faster with some BLAS implementations. (Eric Hunsberger)
OpenBLAS and possibly others didn't call {s,d}dot internally when we called {s,d}gemv.
MKL was doing this.
* Compilation speed up: Take the compiledir lock only for ops that generate c_code. (Frederic B.)
* More scan optimizations (Razvan P.)
* Optimizations to make RNNs fast in Theano.
* Optimize some cases of dot by moving them outside of Scan.
* Move some sequences outside of scan too.
* Merge more scan inputs, mostly a byproduct of other Scan optimizations.
* c_code for theano.sparse.AddSD. (Rami Al-Rfou', Vivek Kulkarni)
Crash Fixes:
* Fix crash about dimshuffle. (abalkin)
* Fix crash at compilation. (Olivier D.)
* Fix openmp detection. (Pascal L.)
Resulted in a crash with EPD on Windows.
* Fix for new BLAS interface in SciPy. (Olivier D.)
Fix crash with some development version of SciPy.
* GpuSum works with bigger shapes when summing over the first dimension of a 3d tensor. (Frederic B., reported by Chris Currivan)
* Windows compilation crash fix. (Frederic B.)
* Make CrossentropySoftmax1HotWithBiasDx and CrossentropySoftmaxArgmax1HotWithBias support uint* dtype. (Frederic B., reported by Mark Fenner)
* Fix GpuSoftmax and GpuSoftmaxWithBias crash on GTX285. (Frederic B.)
* Fix crash due to a race condition when importing theano. (Ian G.)
* Fix crash from path problem with `theano-nose --batch`. (Abalkin)
* Fix crash with tensor.roll(Var, iscalar). (Frederic B., reported by Jeremiah Lowin)
* Fix compilation crash with llvm on Mac. (Abalkin)
* Fix the grad of Scan, which wrongly reported that there is no connection between cost and parameters. (Razvan P.)
* The infer shape mechanism now enforces that broadcasted dimensions have a shape known to be equivalent to one during compilation.
Sometimes, we were not able to know this before run time, which resulted in a crash. (Frederic B.)
* Fix compilation problems on GPU on Windows. (Frederic B.)
* Fix copy on the GPU with big shape for 4d tensor (Pascal L.)
* GpuSubtensor didn't set the stride to 0 for dimensions of 1. This could cause a later check to fail, resulting in a crash. (Frederic B., reported by vmichals)
Theoretical bugfix (bug that won't happen with current Theano code, but if you messed with the internals, could have affected you):
* GpuContiguous, GpuAlloc, GpuDownSampleGrad, Conv2d now check the strides of preallocated outputs before using them. (Pascal L.)
* GpuDownSample, GpuDownSampleGrad didn't work correctly with negative strides in their output due to a problem with nvcc (Pascal L., reported by abalkin?)
Others:
* Fix race condition when determining if g++ is available. (Abalkin)
* Documentation improvements. (Many people including David W-F, abalkin, Amir Elaguizy, Olivier D., Frederic B.)
* The current GPU back-end has a new function CudaNdarray_prep_output(CudaNdarray ** arr, int nd, const int * dims) (Ian G.)
=============
Release Notes
=============
Theano 0.6rc2 (November 21st, 2012)
===================================
Highlights:
* Fix for a few regressions introduced in 0.6rc1.
* A few new features.
* Speed-ups.
* Scan fixes.
* Crash fixes.
* A few small interface changes.
Committers for this rc2 only:
Razvan Pascanu
Pascal Lamblin
Frederic Bastien
Ian Goodfellow
Jeremiah Lowin
Caglar Gulcehre
Jey Kottalam
Matthew Rocklin
abalkin
Regressions in 0.6rc1 fixed:
* Fixed the scan gradient dtype issue. In 0.6rc1, some upcasts were inserted. (Razvan P.)
* Now grad() will do as before 0.6rc1 for float, i.e. the grad dtype will be the same as the inputs inside the graph. If you ask for the direct grad, it will return the computed dtype. (Pascal L.)
Wrong results fixes:
* Scan in some cases didn't return the correct results. (Razvan P., reported by Jeremiah L.)
This happened if you had a state with only negative taps and the output of the state was a function of some sequence.
If you had multiple states, there was no problem.
* Fixed bug in Scan with multiple outputs,
where one output would sometimes overwrite another one. (Razvan P.)
* Clip.grad treated the gradient with respect to the clipping boundary as always 0. (Ian G.)
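To illustrate why the gradient with respect to a clipping boundary is not always 0, here is a scalar sketch in plain Python, not Theano's implementation. The names clip and clip_grad are illustrative, lo <= hi is assumed, and the choice to route the gradient to x when x sits exactly on a boundary is a convention picked for this sketch.

```python
def clip(x, lo, hi):
    """Clamp x into the interval [lo, hi] (assumes lo <= hi)."""
    return min(max(x, lo), hi)


def clip_grad(x, lo, hi, g_out=1.0):
    """Return the gradients (d/dx, d/dlo, d/dhi) of clip, scaled by g_out."""
    if x < lo:
        # The output equals lo, so only the lower boundary receives gradient.
        return (0.0, g_out, 0.0)
    if x > hi:
        # The output equals hi, so only the upper boundary receives gradient.
        return (0.0, 0.0, g_out)
    # The output equals x, so only x receives gradient.
    return (g_out, 0.0, 0.0)


print(clip_grad(-2.0, -1.0, 1.0))  # (0.0, 1.0, 0.0)
print(clip_grad(3.0, -1.0, 1.0))   # (0.0, 0.0, 1.0)
print(clip_grad(0.5, -1.0, 1.0))   # (1.0, 0.0, 0.0)
```

Whenever the input is actually clipped, the output moves with the boundary, so a gradient of 0 for the boundary is only correct for inputs strictly inside the interval.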
Interface changes:
* We no longer support unaligned ndarrays in Python code. (Frederic B.)
We did not support it in C code and supporting it in Python code made
the detection harder.
* Now we only officially support SciPy 0.7.2 and NumPy 1.5.0 (Frederic B.)
We weren't and aren't testing with older versions.
* The theano.sparse.SparseType is available even when SciPy is not (Frederic B.)
* Fixed issue where members of consider_constant grad parameter
were treated differently from Constant variables. (Ian G.)
* Removed the parameter g_cost from theano.grad(). (Ian G.)
Use the new more powerful parameter known_grads instead.
NumPy interface support:
* theano.tensor.where is an alias for theano.tensor.switch to support NumPy semantics. (Ian G.)
* TensorVariable objects now have dot, argmin, argmax, clip, conj, repeat, trace, std, round,
ravel and argsort methods and the real and imag properties, like numpy.ndarray objects.
The functionality was already available in Theano. (abalkin)
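The three-argument, element-wise selection that where and switch share can be sketched in plain Python as follows. Lists stand in for tensors here, broadcasting is left out, and this is only an illustration of the semantics, not Theano code.

```python
def switch(cond, if_true, if_false):
    """Pick if_true[i] where cond[i] is truthy, else if_false[i]."""
    return [t if c else f for c, t, f in zip(cond, if_true, if_false)]


# An alias gives the same function under the NumPy-style name.
where = switch

print(where([True, False, True], [1, 2, 3], [10, 20, 30]))  # [1, 20, 3]
```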
Speed-ups:
* A C version of the SoftMax op (Razvan P.)
There was already C code for the softmax with bias.
* Faster GpuIncSubtensor (Ian G.)
* Faster copy on the GPU for 4d tensor. (Ian G.)
* The fix of flatten infer_shape re-enables an optimization (Pascal L.)
* The bug was introduced in 0.6rc1.
* Enable inc_subtensor on the GPU when updating it with a float64 dtype. (Ian G.)
It was causing an optimization warning.
* Make DeepCopy reuse preallocated memory. (Frederic B.)
* Move the convolution to the GPU when the image shape and logical image shape differ. (Frederic Bastien)
* C code for the View Op (Razvan P., Pascal L.)
New Features:
* Added a monitoring mode "MonitorMode" as a debugging tool. (Olivier D.)
* Allow integer axes when keepdims==True (Jeremiah Lowin)
* Added erfinv and erfcinv op. (Jey Kottalam)
* Added tensor.batched_dot(). (Caglar Gulcehre)
It uses scan behind the scenes, but makes doing this easier.
* theano.get_constant_value(x) (Frederic B.)
This tries to have x as a constant int.
This does some constant folding to try to convert x into an int.
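For the tensor.batched_dot() entry above, the computation it makes easier can be sketched in plain Python: one independent dot product per position along the first (batch) axis. Nested lists stand in for 3d tensors, the helper names are illustrative, and the real function builds a symbolic graph rather than computing values eagerly.

```python
def matrix_dot(a, b):
    """Matrix product of two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def batched_dot(xs, ys):
    """Apply matrix_dot pairwise along the first (batch) axis."""
    return [matrix_dot(x, y) for x, y in zip(xs, ys)]


# Two batches: each x is a 1x2 matrix and each y is a 2x1 matrix.
xs = [[[1, 2]], [[3, 4]]]
ys = [[[1], [1]], [[2], [0]]]
print(batched_dot(xs, ys))  # [[[3]], [[6]]]
```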