Commit b6db35ce authored by Olivier Delalleau

Copied NEWS.txt to doc subfolder

Parent d7caa062
.. _NEWS: .. _NEWS:
Updates in the Trunk since the last release:
https://github.com/Theano/Theano/wiki/Devnews
============= =============
Release Notes Release Notes
============= =============
Theano 0.5 (23 February 2012) Theano 0.6rc1 (October 1st, 2012)
============================= =================================
Highlight: Highlights:
* Moved to github: http://github.com/Theano/Theano/ * Bug fixes, crash fixes, CPU and GPU speed up.
* Old trac ticket moved to assembla ticket: http://www.assembla.com/spaces/theano/tickets * theano_var.eval({other_var: val[,...]} to simplify the usage of Theano (Ian G.)
* Theano vision: http://deeplearning.net/software/theano/introduction.html#theano-vision (Many people) * New default linker `cvm`. This is the execution engine that tells what op to run in which order.
* Theano with GPU works in some cases on Windows now. Still experimental. (Sebastian Urban) It is now implemented in C and enables lazy evaluation of ifelse op.
* Faster dot() call: New/Better direct call to cpu and gpu ger, gemv, gemm * Faster theano.function compilation. (Pascal L., Ian G.)
and dot(vector, vector). (James, Frederic, Pascal) * Big sparse submodule update and documentation of it. (Nicolas Bouchard)
* C implementation of Alloc. (James, Pascal) * Use GPU asynchronous functionality (Frederic B.)
* theano.grad() now also work with sparse variable. (Arnaud) * Better Windows support.
* Macro to implement the Jacobian/Hessian with theano.tensor.{jacobian,hessian} (Razvan)
* See the Interface changes. Known bug:
* A few crash cases that will be fixed by the final release.
Interface Behavior Changes: Bug fixes:
* The current default value of the parameter axis of * Outputs of Scan nodes could contain corrupted values: some parts of the
theano.{max,min,argmax,argmin,max_and_argmax} is now the same as output would be repeated a second time, instead of the correct values.
numpy: None. i.e. operate on all dimensions of the tensor. It happened randomly, and quite infrequently, but the bug has been present
(Frederic Bastien, Olivier Delalleau) (was deprecated and generated (both in Python and Cython) since April 2011. (Pascal L.)
a warning since Theano 0.3 released Nov. 23rd, 2010) * In Sparse sandbox, fix the grad of theano.sparse.sandbox.sp.row_scale.
* The current output dtype of sum with input dtype [u]int* is now always [u]int64. It did not return the right number of elements. (Frederic B.)
You can specify the output dtype with a new dtype parameter to sum. * set_subtensor(x[int vector], new_value) when moved to the GPU
The output dtype is the one using for the summation. was transformed into inc_subtensor on the GPU. Now we have a correct
There is no warning in previous Theano version about this. (but slow) GPU implementation.
The consequence is that the sum is done in a dtype with more precision than before. Note 1: set_subtensor(x[slice[,...]], new_value) was working correctly
So the sum could be slower, but will be more resistent to overflow. in all cases as well as all inc_subtensor.
This new behavior is the same as numpy. (Olivier, Pascal) Note 2: If your code was affected by the incorrect behavior, we now print
* When using a GPU, detect faulty nvidia drivers. This was detected a warning by default (Frederic B.)
when running Theano tests. Now this is always tested. Faulty * Fixed an issue whereby config values were used as default arguments,
drivers results in wrong results for reduce operations. (Frederic B.) with those defaults then stuck at old values if the config variables were
changed during program execution. (David W-F)
* Fixed many subtle bugs involving mutable default arguments which may have
Interface Features Removed (most were deprecated): led to unexpected behaviour, such as objects sharing instance variables
* The string modes FAST_RUN_NOGC and STABILIZE are not accepted. They they were not supposed to share. (David W-F)
were accepted only by theano.function(). * Correctly record the GPU device number used when we let the driver select it.
Use Mode(linker='c|py_nogc') or Mode(optimizer='stabilize') instead. (Frederic B.)
* tensor.grad(cost, wrt) now always returns an object of the "same type" as wrt * CAReduce with NaN in inputs did not return the good output. (Pascal L.)
(list/tuple/TensorVariable). (Ian Goodfellow, Olivier) * This is used in tensor.{all,any,max,mean,prod,sum} and in the grad of PermuteRowElements.
* A few tag.shape and Join.vec_length left have been removed. (Frederic) * The grad of TensorDot, was returning the wrong shape for some combination of axes.
* The .value attribute of shared variables is removed, use shared.set_value() We now raise NotImplementedError in those cases. (Frederic B.)
or shared.get_value() instead. (Frederic) * conv2d with subsample >2 returned wrong values. (Pascal L.)
* Theano config option "home" is not used anymore as it was redundant with "base_compiledir". * Fixed when mode==valid, disabled when mode==full
If you use it, Theano will now raise an error. (Olivier D.) * theano.sparse.CSMGrad op (generated by the grad of CSM) didn't
* scan interface changes: (Razvan Pascanu) handle unsorted input correctly and gradient that is sparser
* The use of `return_steps` for specifying how many entries of the output than the input. In that case, a bad result was returned. But this could
to return has been removed. Instead, apply a subtensor to the output happen only when a sparse input of a Theano function was not
returned by scan to select a certain slice. sorted. This happens for example with sparse advanced indexing from
* The inner function (that scan receives) should return its outputs and scipy. The conclusion is most of time Nan in the graph.
updates following this order: [outputs], [updates], [condition]. (Yann Dauphin)
One can skip any of the three if not used, but the order has to stay unchanged. * theano.sparse._dot(CSC matrix, dense) optimized version UsmmCSCDense didn't handle
correctly not contiguous inputs/outputs. (Pascal L.)
Interface bug fix: * Fix a corner case CVM updates case. (Pascal L.)
* Rop in some case should have returned a list of one Theano variable, This happened if the update to a shared variable is itself after optimization.
but returned the variable itself. (Razvan) The CVM was not used by default.
* Fix the view_map of sparse.Transpose and sparse.sandbow.sp.RowScale. (Frederic B.)
New deprecation (will be removed in Theano 0.6, warning generated if you use them): This probably didn't cause problem as there is only the UsmmCscDense op
* tensor.shared() renamed to tensor._shared(). You probably want to (used call to Usmm with CSC matrix) that could interfere with them.
call theano.shared() instead! (Olivier D.)
Deprecation:
* Deprecated the Module class (Ian G.)
Bug fixes (incorrect results): This was a predecessor of SharedVariable with a less pythonic philosophy.
* On CPU, if the convolution had received explicit shape information,
they where not checked at runtime. This caused wrong result if the Interface changes:
input shape was not the one expected. (Frederic, reported by Sander * Now the base version requirements are numpy >= 1.5.0 and the optional scipy >= 0.8.
Dieleman) * In Theano 0.5, we removed the deprecated sharedvar.value property.
* Theoretical bug: in some case we could have GPUSum return bad value. Now we raise an error if you access it. (Frederic B.)
We were not able to reproduce this problem * theano.function does not accept duplicate inputs, so function([x, x], ...)
does not work anymore. (Pascal L.)
* patterns affected ({0,1}*nb dim, 0 no reduction on this dim, 1 reduction on this dim): * theano.function now raises an error if some of the provided inputs are
01, 011, 0111, 010, 10, 001, 0011, 0101 (Frederic) not part of the computational graph needed to compute the output, for
instance, function([x, y], [y]). You can use the kwarg
* div by zero in verify_grad. This hid a bug in the grad of Images2Neibs. (James) ``on_unused_input={'raise', 'warn', 'ignore'}`` to control this.
* theano.sandbox.neighbors.Images2Neibs grad was returning a wrong value. (Pascal L.)
The grad is now disabled and returns an error. (Frederic) * New Theano flag "on_unused_input" that defines the default value of the
* An expression of the form "1 / (exp(x) +- constant)" was systematically matched to "1 / (exp(x) + 1)" previous point. (Frederic B.)
and turned into a sigmoid regardless of the value of the constant. A warning will be issued if your * tensor.alloc() now raises an error during graph build time
code was affected by this bug. (Olivier, reported by Sander Dieleman) when we try to create less dimensions than the number of dimensions
* When indexing into a subtensor of negative stride (for instance, x[a:b:-1][c]), the provided value have. In the past, the error was at run time.
an optimization replacing it with a direct indexing (x[d]) used an incorrect formula, (Frederic B.)
leading to incorrect results. (Pascal, reported by Razvan) * Remove theano.Value and related stuff (Ian G.)
* The tile() function is now stricter in what it accepts to allow for better This was a test of what ended up as SharedVariable.
error-checking/avoiding nonsensical situations. The gradient has been * Renamed Env to FunctionGraph, and object attribute "env" to "fgraph" (Ian G.)
disabled for the time being as it only implemented (incorrectly) one special Deprecation warning printed when you try to access the "env" attribute.
case. The `reps` argument must be a constant (not a tensor variable), and * Renamed the FunctionGraph.nodes attribute to FunctionNodes.apply_nodes (Ian G.)
must have the same length as the number of dimensions in the `x` argument; * Warn when we don't handle correctly the parameter in Theano flags `nvcc.flags`
this is now checked. (David) (Frederic B.)
* Do not reorder the user flags passed to the compiler. They get set after other flags. (Frederic B.)
* Make setuptools optional (Ilan Schnell)
Scan fixes: * We warn when a user tries to use an old GPU with which TheNo is untested.
* computing grad of a function of grad of scan (reported by Justin Bayer, fix by Razvan) This could cause crash and will also be very slow. (Frederic B.)
before : most of the time crash, but could be wrong value with bad number of dimensions (so a visible bug) * Make theano.grad able to differentiate between not implemented, undefined and disconnected grad.
now : do the right thing. Op.grad function should return theano.gradient.{grad_not_implemented,grad_undefined} or
* gradient with respect to outputs using multiple taps (reported by Timothy, fix by Razvan) something of DisconectedType (Ian G.)
* Make theano.grad expect to always receive a float or undefined
* before : it used to return wrong values gradient and enforce that op with integer output values always
* now : do the right thing. return 0. (Ian G.)
* Note: The reported case of this bug was happening in conjunction with the
save optimization of scan that give run time errors. So if you didn't
manually disable the same memory optimization (number in the list4), New memory output contract (was mentioned in the release notes of Theano 0.5):
you are fine if you didn't manually request multiple taps. * Now the output memory received can be preallocated by other stuff.
* Rop of gradient of scan (reported by Timothy and Justin Bayer, fix by Razvan) In the past it was always the previous output an Apply node allocated.
before : compilation error when computing R-op So this means that the shape and strides can be different from previous calls
now : do the right thing. and there can be links to this memory at other places.
* save memory optimization of scan (reported by Timothy and Nicolas BL, fix by Razvan) This means it could receive preallocated output that is not c_contiguous.
before : for certain corner cases used to result in a runtime shape error But we don't do that now. (Pascal L.)
now : do the right thing. * New Theano flags to test this DebugMode.check_preallocated_output (Pascal L.)
* Scan grad when the input of scan has sequences of different lengths. (Razvan, reported by Michael Forbes) * Updated a few ops to respect this contract (Pascal L.)
* Scan.infer_shape now works correctly when working with a condition for the number of loops.
In the past, it returned n_steps as the length, which is not always true. (Razvan)
* Scan.infer_shape crash fix. (Razvan) New Features:
* GPU scan now works (does not crash) when there is a mixture of float32 and other dtypes.
New features: * theano_var.eval({other_var:val[,...]} to simplify the usage of Theano (Ian G.)
* AdvancedIncSubtensor grad defined and tested (Justin Bayer) * debugprint new param ids=["CHAR", "id", "int", ""]
* Adding 1D advanced indexing support to inc_subtensor and set_subtensor (James Bergstra) This makes the identifier printed to be a unique char, the Python id, a
* tensor.{zeros,ones}_like now support the dtype param as numpy (Frederic) unique int, or not have it printed. We changed the default to be "CHAR"
* Added configuration flag "exception_verbosity" to control the verbosity of exceptions (Ian) as this is more readable. (Frederic B.)
* theano-cache list: list the content of the theano cache (Frederic) * debugprint new param stop_on_name=[False, True]. If True, we don't print
* theano-cache unlock: remove the Theano lock (Olivier) anything below an intermediate variable that has a name. Defaults to False.
* tensor.ceil_int_div to compute ceil(a / float(b)) (Frederic) (Frederic B.)
* MaxAndArgMax.grad now works with any axis (The op supports only 1 axis) (Frederic) * debugprint does not print anymore the "|" symbol in a column after the last input. (Frederic B.)
* used by tensor.{max,min,max_and_argmax} * If you use Enthought Python Distribution (EPD) now we use its blas
* tensor.{all,any} (Razvan) implementation by default. (Frederic B., Graham Taylor, Simon McGregor)
* tensor.roll as numpy: (Matthew Rocklin, David Warde-Farley) * MRG random now raises an error with a clear message when the passed shape
* Theano with GPU works in some cases on Windows now. Still experimental. (Sebastian Urban) contains dimensions with bad value like 0. (Frederic B. reported by Ian G.)
* IfElse now allows to have a list/tuple as the result of the if/else branches. * "CudaNdarray[*] = ndarray" works in more cases (Frederic B.)
* They must have the same length and corresponding type (Razvan) * "CudaNdarray[*] += ndarray" works in more cases (Frederic B.)
* Argmax output dtype is now int64 instead of int32. (Olivier) * We add dimensions to CudaNdarray to automatically broadcast more frequently.
* Added the element-wise operation arccos. (Ian) (Frederic B.)
* Added sparse dot with dense grad output. (Yann Dauphin) * New theano flag cmodule.warn_no_version. Default False. If True,
* Optimized to Usmm and UsmmCscDense in some case (Yann) will print a warning when compiling one or more Op with C code that
* Note: theano.dot and theano.sparse.structured_dot() always had a gradient with the same sparsity pattern as the inputs. can't be cached because there is no c_code_cache_version() function
The new theano.sparse.dot() has a dense gradient for all inputs. associated to at least one of those Ops. (Frederic B.)
* GpuAdvancedSubtensor1 supports broadcasted dimensions. (Frederic) * CPU alloc now always generate C code (Pascal L.)
* TensorVariable.zeros_like() and SparseVariable.zeros_like() * New Theano flag cmodule.warn_no_version=False. When True, warn when an op
* theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.device_properties() (Frederic) with C code is not versioned (which forces to recompile it everytimes).
* theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info() return free and total gpu memory (Frederic) (Frederic B.)
* Theano flags compiledir_format. Keep the same default as before: compiledir_%(platform)s-%(processor)s-%(python_version)s. (Josh Bleecher Snyder) * C code reuses preallocated outputs (only done by Scan) (Pascal L.)
* We also support the "theano_version" substitution. * Garbage collection of intermediate results during Theano function calls
* IntDiv c code (faster and allow this elemwise to be fused with other elemwise) (Pascal) for Ops with C code (Pascal L.)
* Internal filter_variable mechanism in Type. (Pascal, Ian) * Theano flag compiledir_format now supports the parameter "numpy_version" and "g++". (Frederic B.)
* Ifelse works on sparse. * Theano GPU variables, shared variables and constants now support <, <=,
* It makes use of gpu shared variable more transparent with theano.function updates and givens parameter. > and >= similar to those not on the GPU.
* Added a_tensor.transpose(axes) axes is optional (James) * AdvancedIncSubtensor now supports the set_instead_of_inc parameter. (Eric L.)
* theano.tensor.transpose(a_tensor, kwargs) We where ignoring kwargs, now it is used as the axes. * Added Advanced Indexing support to inc_subtensor and set_subtensor. (Eric L.)
* a_CudaNdarray_object[*] = int, now works (Frederic) * theano.tensor.{any,all,std,var,mean,prod,sum,argmin,argmax,min,max,max_and_argman}
* tensor_variable.size (as numpy) computes the product of the shape elements. (Olivier) have a new parameter keepdims (Eric L.)
* sparse_variable.size (as scipy) computes the number of stored values. (Olivier) This allows to broadcast it correctly against the input data to normalize it.
* sparse_variable[N, N] now works (Li Yao, Frederic) * The Updates objects now check that the keys are SharedVariable when we pass them
* sparse_variable[M:N, O:P] now works (Li Yao, Frederic, Pascal) in the __init__ function. (Pascal L.)
M, N, O, and P can be Python int or scalar tensor variables, None, or * Set a Theano Variable name on transposed op when the input has one (Frederic B).
omitted (sparse_variable[:, :M] or sparse_variable[:M, N:] work). * The cvm linker now supports garbage collection (enabled by default). (James B. Arnaud B., Pascal L.)
* tensor.tensordot can now be moved to GPU (Sander Dieleman, * The cvm linker is now the default linker.
Pascal, based on code from Tijmen Tieleman's gnumpy, This makes the "loop" around the execution of apply node in C. So this lowers the overhead.
http://www.cs.toronto.edu/~tijmen/gnumpy.html) * theano_variable[numpy.newaxis] is now supported (James B.)
* Many infer_shape implemented on sparse matrices op. (David W.F.) * Enable ifelse on the GPU. (Frederic B.)
* Added theano.sparse.verify_grad_sparse to easily allow testing grad of * Correctly support numpy.memmap everywhere (Pascal L.)
sparse op. It supports testing the full and structured gradients. We add partial support for them before. Just use the normal tensor operation
* The keys in our cache now store the hash of constants and not the constant values on them and it should work.
themselves. This is significantly more efficient for big constant arrays. (Frederic B.) But be careful not to exhaust your computer memory! (we always generate normal ndarray)
* 'theano-cache list' lists key files bigger than 1M (Frederic B.) * Add an optimization that stabilizes log(softmax(x)). (Ian G.)
* 'theano-cache list' prints an histogram of the number of keys per compiled module (Frederic B.) * Re-enable the Images2Neibs grad. It was not broken, the problem was how we tested it. (Frederic B.)
* 'theano-cache list' prints the number of compiled modules per op class (Frederic B.) * If `theano_fn.trust_input` is set to False, do not check if the inputs are good
* The Theano flag "nvcc.fastmath" is now also used for the cuda_ndarray.cu file. when calling the theano function. (Frederic B.)
* Add the header_dirs to the hard part of the compilation key. This is * Add theano.tensor.blas,gem{m,v} as shortcut.
currently used only by cuda, but if we use library that are only headers, * theano.grad(..., add_names=True). False for the old
this can be useful. (Frederic B.) behavior. Otherwise it tries to name the grad variables. (Ian G.)
* The Theano flag "nvcc.flags" is now included in the hard part of the key. * theano-nose (Pascal L.)
This mean that now we recompile all modules for each value of "nvcc.flags". A wrapper around nosetests that adds needed extensions.
A change in "nvcc.flags" used to be ignored for module that were already * --profile-time option, to print time spent in each test (Eric L.)
compiled. (Frederic B.) * --batch option, to allow to run tests in batch to lower memory requirement.
* Alloc, GpuAlloc are not always pre-computed (constant_folding optimization) * m = mean(log(1 - sigm(x)))
at compile time if all their inputs are constant. x - scalar * theano.grad(m, x)
(Frederic B., Pascal L., reported by Sander Dieleman) There is a stabilization optimization for this.
* New Op tensor.sort(), wrapping numpy.sort (Hani Almousli) Now it is applied more frequently. (Pascal L.)
New optimizations: New Op/functions:
* AdvancedSubtensor1 reuses preallocated memory if available (scan, c|py_nogc linker) (Frederic) * Added element-wise operation theano.tensor.{GammaLn,Psi} (John Salvatier, Nicolas Bouchard)
* dot22, dot22scalar work with complex. (Frederic) * Added element-wise operation theano.tensor.{arcsin,arctan,arccosh,arcsinh,arctanh,exp2,arctan2} (Nicolas Bouchard)
* Generate Gemv/Gemm more often. (James) * Added element-wise operation theano.tensor.{gamma,conj,complex_from_polar,expm1,deg2rad,rad2deg,trunc,gamma} (Nicolas Bouchard)
* Remove scan when all computations can be moved outside the loop. (Razvan) * Added theano.tensor.argsort that wraps numpy.argsort (Hani Almousli).
* scan optimization done earlier. This allows other optimizations to be applied. (Frederic, Guillaume, Razvan) * Added theano.tensor.diff that wraps numpy.diff (Nicolas B.)
* exp(x) * sigmoid(-x) is now correctly optimized to the more stable form sigmoid(x). (Olivier) * Added theano.tensor.bincount that wraps numpy.bincount (Nicolas B., Pascal L, Frederic B.)
* Added Subtensor(Rebroadcast(x)) => Rebroadcast(Subtensor(x)) optimization. (Guillaume) * Added theano.tensor.squeeze (Nicolas B.)
* Made the optimization process faster. (James) This removes broadcasted dimensions from the variable.
* Allow fusion of elemwise when the scalar op needs support code. (James) Theano-esque version of numpy.squeeze.
* Better opt that lifts transpose around dot. (James) * Added theano.tensor.repeat that wraps numpy.repeat (Nicolas B. + PL)
* Added theano.tensor.bartlett that wraps numpy.bartlett (Eric L.)
* Added theano.tensor.fill_diagonal that wraps numpy.fill_diagonal (Eric L., Frederic B.)
Crashes fixed: * Added tensor.square that is an alias for tensor.sqr as NumPy (Ian G.)
* T.mean crash at graph building time. (Ian) * Added theano.tensor.load(path, dtype, broadcastable, mmap_mode=None) op
* "Interactive debugger" crash fix. (Ian, Frederic) that allows to load a .npy file in a theano graph (Matthew Rocklin)
* Do not call gemm with strides 0, some blas refuse it. (Pascal Lamblin) * theano.sandbox.linalg.kron.py:Kron op. (Eric L.)
* Optimization crash with gemm and complex. (Frederic) Kronecker product
* GPU crash with elemwise. (Frederic, some reported by Chris Currivan)
* Compilation crash with amdlibm and the GPU. (Frederic) Speed up:
* IfElse crash. (Frederic) * CPU convolutions are now parallelized (Frederic B.)
* Execution crash fix in AdvancedSubtensor1 on 32 bit computers. (Pascal) By default use all cores/hyper-threads.
* GPU compilation crash on MacOS X. (Olivier) To control it, use the `OMP_NUM_THREADS=N` environment variable where N is the number of
* Support for OSX Enthought Python Distribution 7.x. (Graham Taylor, Olivier) parallel threads to use. By default it is equal to the number of CPU cores/hyper
* When the subtensor inputs had 0 dimensions and the outputs 0 dimensions. (Frederic) threads that you have.
* Crash when the step to subtensor was not 1 in conjunction with some optimization. (Frederic, reported by Olivier Chapelle) There is a new Theano flag `openmp` to allow/disallow openmp op.
* Runtime crash related to an optimization with subtensor of alloc (reported by Razvan, fixed by Frederic) If your BLAS library is parallelized, this flag won't affect it, but the
* Fix dot22scalar cast of integer scalars (Justin Bayer, Frederic, Olivier) env variable will.
* Fix runtime crash in gemm, dot22. FB * Remove a corner case causing duplicated dot22/gemm in the graph. (Frederic B., Ian G.)
* Fix on 32bits computer: make sure all shape are int64.(Olivier) * Enable fusion of elemwise that have the same clients multiple times. (Frederic B.)
* Fix to deque on python 2.4 (Olivier) * New optimization: Remove reduction over broadcastable dimensions (James B., Frederic B.)
* Fix crash when not using c code (or using DebugMode) (not used by * Faster theano.function compilation. (Pascal L., Ian G.)
default) with numpy 1.6*. Numpy has a bug in the reduction code that * Remove GPU transfer around specify_shape op. (Frederic B.)
made it crash. (Pascal) * Implemented/tested MANY op.infer_shape method (Eric Larsen)
* Crashes of blas functions (Gemv on CPU; Ger, Gemv and Gemm on GPU) This allows Theano to make better shape inferance.
when matrices had non-unit stride in both dimensions (CPU and GPU), * Implement Solve.infer_shape (Matthew Rocklin)
or when matrices had negative strides (GPU only). In those cases, * Scan memory optimizations now work more frequently. (Razvan P.)
we are now making copies. (Pascal) There was a warning printed by the subtensor optimization in those cases.
* More cases supported in AdvancedIncSubtensor1. (Olivier D.) * Faster rng_mrg Python code. (mostly used for tests) (Frederic B.)
* Fix crash when a broadcasted constant was used as input of an
elemwise Op and needed to be upcasted to match the op's output. Speed up GPU:
(Reported by John Salvatier, fixed by Pascal L.) * Convolution on the GPU now checks the generation of the card to make
* Fixed a memory leak with shared variable (we kept a pointer to the original value) (Ian G.) it faster in some cases (especially medium/big ouput image) (Frederic B.)
* We had hardcoded 512 as the maximum number of threads per block. Newer cards
support up to 1024 threads per block.
Known bugs: * Faster GpuAdvancedSubtensor1, GpuSubtensor, GpuAlloc (Frederic B.)
* CAReduce with nan in inputs don't return the good output (`Ticket <https://www.assembla.com/spaces/theano/tickets/763>`_). * We now pass the GPU architecture to nvcc when compiling (Frederic B.)
* This is used in tensor.{max,mean,prod,sum} and in the grad of PermuteRowElements. * Now we use the GPU function async feature by default. (Frederic B.)
Set the environment variable `CUDA_LAUNCH_BLOCKING` to `1` to disable this
for profiling or debugging.
Sandbox: * Faster creation of CudaNdarray objects (Frederic B.)
* cvm interface more consistent with current linker. (James) * Now some Max reductions are implemented on the GPU. (Ian G.)
* Now all tests pass with the linker=cvm flags.
* vm linker has a callback parameter. (James) Sparse Sandbox graduate (moved from theano.sparse.sandbox.sp):
* review/finish/doc: diag/extract_diag. (Arnaud Bergeron, Frederic, Olivier) * sparse.remove0 (Frederic B., Nicolas B.)
* review/finish/doc: AllocDiag/diag. (Arnaud, Frederic, Guillaume) * sparse.sp_sum(a, axis=None) (Nicolas B.)
* review/finish/doc: MatrixInverse, matrix_inverse. (Razvan) * bugfix: the not structured grad was returning a structured grad.
* review/finish/doc: matrix_dot. (Razvan) * sparse.{col_scale,row_scale,ensure_sorted_indices,clean} (Nicolas B.)
* review/finish/doc: det (determinent) op. (Philippe Hamel) * sparse.{diag,square_diagonal} (Nicolas B.)
* review/finish/doc: Cholesky determinent op. (David)
* review/finish/doc: ensure_sorted_indices. (Li Yao) Sparse:
* review/finish/doc: spectral_radius_boud. (Xavier Glorot) * Support for uint* dtype.
* review/finish/doc: sparse sum. (Valentin Bisson) * Implement theano.sparse.mul(sparse1, sparse2) when both inputs don't
* review/finish/doc: Remove0 (Valentin) have the same sparsity pattern. (Frederic B.)
* review/finish/doc: SquareDiagonal (Eric) * New Ops: sparse.{expm1,deg2rad,rad2deg,trunc} (Nicolas B.)
* New Ops: sparse.{sqrt,sqr,log1p,floor,ceil,sgn,round_half_to_even} (Nicolas B.)
* New Ops: sparse.{arctanh,tanh,arcsinh,sinh,arctan,arcsin,tan,sin} (Nicolas B.)
Sandbox New features (not enabled by default): * New functions: structured_{add,exp,log,pow,minimum,maximum,sigmoid} (Yann D., Nicolas B.)
* CURAND_RandomStreams for uniform and normal (not picklable, GPU only) (James) * Optimized op: StructuredAddSV, StrucutedAddSVCSR (inserted automatically)
* New sandbox.linalg.ops.pinv(pseudo-inverse) op (Razvan) * New Op: sparse.mul_s_v multiplication of sparse matrix by broadcasted vector (Yann D.)
* New Op: sparse.Cast() (Yann D., Nicolas B.)
* Add sparse_variable.astype() and theano.sparse.cast() and
theano.sparse.{b,w,i,l,f,d,c,z}cast() as their tensor equivalent (Nicolas B.)
* Op class: SamplingDot (Yann D., Nicolas B.)
* Optimized version: SamplingDotCsr, StructuredDotCSC
* Optimizations to insert the optimized version: local_sampling_dot_csr, local_structured_add_s_v
* New Ops: sparse.{Multinomial,Poisson,Binomial} (Yann D., NB)
* Implement the CSMProperties grad method (Yann Dauphin)
* Move optimizations to theano/sparse/opt.py (Nicolas B.)
New flags:
* `profile=True` flag now prints the sum of all printed profiles. (Frederic B.)
* It works with the linkers vm/cvm (default).
* Also print compile time, optimizer time and linker time.
* Also print a summary by op class.
* new flag "profile_optimizer" (Frederic B.)
when profile=True, will also print the time spent in each optimizer.
Useful to find optimization bottleneck.
* new flag "cmodule.remove_gxx_opt" (Frederic B.)
If True, will remove -O* parameter passed to g++.
This is useful to debug in gdb module compiled by Theano.
The parameter -g is passed by default to g++.
* new flag cmodule.compilation_warning
if True, will print compilation warning.
* new flag `allow_gc` (Frederic B.)
When False, do not garbage collect intermediate results when they are not needed.
This uses more memory, but allocates memory less frequently so faster.
* new flag `vm.lazy` (Frederic B.)
Useful only for the vm linkers. When lazy is None,
auto detect if lazy evaluation is needed and use the apropriate
version. If lazy is True/False, force the version used between
Loop/LoopGC and Stack.
* new flag `cxx`. This is the C++ compiler to use. If empty do not compile C code. (Frederic B.)
* New flag `print_active_device` that defaults to True. (Matthew R.)
Documentation: Documentation:
* Many updates. (Many people) * Added in the tutorial documentation on how to extend Theano.
* Updates to install doc on MacOS. (Olivier) This explains how to make a Theano Op from a Python function.
* Updates to install doc on Windows. (David, Olivier) http://deeplearning.net/software/theano/tutorial/extending_theano.html
* Doc on the Rop function (Ian) (Frederic B.)
* Added how to use scan to loop with a condition as the number of iteration. (Razvan) * New installation instructions for Windows using EPD (Pascal L.)
* Added how to wrap in Theano an existing python function (in numpy, scipy, ...). (Frederic) * New installation on Windows by using a Linux VM from ContinuumIO (Frederic B.)
* Refactored GPU installation of Theano. (Olivier) * Revisions of Theano tutorial and addition of exercices to it. (Eric L.)
* New tutorial on Sparse variable. (Nicolas B., Sebastien Lemieux, Frederic Bastien
http://www.deeplearning.net/software/theano/tutorial/sparse.html
* Installation documentation for CentOS6 (Frederic B.)
* Installation documentation for Ubuntu (with GPU) (Frederic B., Matthias Zoehrer)
* Doc typo fixes, Doc updates, Better error messages: Olivier D., David W.F., Frederic B., James B., Matthew Rocklin, Ian G.
* Python Memory Management tutorial (Steven Pigeon, Olivier D.)
Proposal:
* Math framework for complex gradients (Pascal L.)
Internal changes:
* Define new exceptions MissingInputError and UnusedInputError, and use them
in theano.function, instead of TypeError and ValueError. (Pascal L.)
* Better handling of bitwidth and max values of integers and pointers
across platforms (Pascal L.)
* Made a few Ops with C code versioned to reduce compilation time.
(Frederic B, Pascal L.)
* Better deletion of files in the compiledir (Frederic B.)
* Safer import on sort op (Nicolas Pinto)
* hash_from_dict for elemwise op (Frederic B.)
* Renamed BadCLinkerOutput into BadThunkOutput. (PL)
* tensor.utils.shape_of_variables (Matthew R.)
* Add the numpy abi version and g++/nvcc version in the key of compiled code. (Frederic B.)
* env.replace_all_validate_remove (Frederic B.)
  This allows a global optimizer to ensure that it removed some nodes from the graph.
This is a generic way to catch errors that would otherwise duplicate
computation.
* It was used for GEMM and Scan optimization (Frederic B., Razvan P.)
* Fix how exceptions are raised in GPU code (James B.)
* Made code respect pep8: OD, Fred, Pascal L., Nicolas Bouchard, Eric Larsen and others.
* TensorType and CudaNdarrayType now have a value_zeros method that calls CudaNdarray.zeros or
  numpy.zeros with the right dtype. (Pascal L., Olivier D.)
  This allows the same code to work with both types.
* Renamed FunctionGraph.extend function to FunctionGraph.attach_feature. (Ian G.)
* New exception MissingGXX when we try to compile but there is no cxx compiler. (Frederic B.)
* New fct theano.gof.utils.give_variables_names(...) that gives unique names to variables. (Matthew R.)
* Use the new NumPy C-API most of the time, for compatibility with later NumPy releases. (Frederic B.)
* New theano.gof.sched.sort_apply_nodes() that will allow other execution ordering. (Matthew R.)
* New attribute sort_schedule_fn, a way to specify a scheduler to use. (Matthew R.)
Crash Fix:
* Fix an import name conflict (usaar33, Frederic B.)
* This makes Theano work with PiCloud.
* Do not try to use the BLAS library when blas.ldflags is manually set to an
empty string (Frederic B., Pascal L.)
* When importing theano on a computer without GPU with the Theano
flags 'device' or 'init_gpu_device' set to gpu* (Frederic B., reported by Luo Heng)
* Optimization printed a useless error when scipy was not available. (Frederic B.)
* GPU conv crash/slowdown on newer hardware (James B.)
* Better error handling in GPU conv (Frederic B.)
* GPU optimization that moves element-wise Ops to the GPU: a crash happened with
  a particular execution order of this optimization and the
  element-wise fusion optimization, when upcasting some inputs to
  float32 (to compute them on the GPU).
  (Frederic B., reported by Sander Dieleman)
* GpuReshape in a particular case when the input is not contiguous
(Frederic B., reported by Sander Dieleman)
* GpuSoftmaxWithBias with shape (0, N) with N > 1.
(Frederic B., reported by Razvan P.)
* Fix crash under 64-bit Windows, when taking subtensors of the form a[n:]
(Pascal L., reported by Simon McGregor)
* Fixed issue with the MaxAndArgmax Op not properly preserving broadcastable
dimensions, which could typically result in optimization crashes (Olivier D.)
* Fixed crash when concatenating some arrays with specific broadcasting
patterns (Olivier D.)
* Work around a known issue with nvcc 4.1 on MacOS X. (Graham Taylor)
* In advanced indexing, if some inputs are constant, no need to call constant(...)
on their value any more. (Pascal L., reported by John Salvatier)
* Fix crash on GPU when GpuSubtensor didn't set the right stride
  when the result tensor had a dimension of size 1. (Pascal L.,
  reported by Graham T.)
* Fix scan crash that made it not run on the GPU in one case. (Guillaume D.)
* Don't crash when computing the grad of a random state a second time. (Razvan P.)
* GpuDownsampleFactorMax and its grad with input dimensions 0 and 1 bigger than 65535.
  (Frederic B., reported by Gabe Schwartz)
* Potential crash due to parallel compilation when importing theano.sandbox.cuda
(Olivier D.)
* Crash fix on python 2.4 with slicing. (Pascal L.)
* grad of argmin and argmax (Razvan P.)
* Don't compute the Rop for shared variables with updates (mostly random ones).
  We don't use them and they caused crashes. (Razvan P.)
* MaxAndArgmax.grad() when one of the gradients it receives is None. (Razvan P., reported by Mark Fenner)
* Fix crash of GpuSum when some dimension's shape was 0. (Frederic B.)
Tests:
* Use less memory (Olivier D.) (fix crash on 32-bit computers)
* Fix test with Theano flag "blas.ldflags=". (Frederic B., Pascal L.)
* Fix crash with advanced subtensor and numpy constant.
* Fix crashes of random tests due to random values. (Pascal L.)
* Always introduce an Alloc node when calling alloc, and let the optimizer remove it if needed.
  This allows DebugMode to catch some shape errors. (Pascal L.)
* DebugMode now checks the view_map for all types of Theano variables.
  Previously it checked only variables of tensor type. (Frederic B.)
Others:
 * Remove python warning for some python versions. (Gabe Schwartz)
 * Remove useless fill op in fast_compile mode to make the graph more readable. (Frederic B.)
 * Remove GpuOuter as it is a subset of the new GpuGer. (Frederic B.)
 * Now we use http://travis-ci.org/ to run all CPU tests (without SciPy)
   with the default mode on all Pull Requests.
   This should make the trunk more stable. (Frederic B.)
 * Our nightly buildbot now checks on python 2.4. (Frederic B.)
   This should make the trunk work on it more frequently.

Other thanks:
 * blaxill reported an error introduced into the trunk.

New stuff that will probably be reworked/removed before the release:
 * Better PyCUDA sharing of the GPU context. (Fixes a crash at exit.) (Frederic B.)
   TODO: there is still a crash at exit!

Theano 0.5 (23 February 2012)
=============================

Others:
 * Better error messages in many places. (Many people)
 * PEP8 fixes. (Many people)
 * Add a warning about a numpy bug when using advanced indexing on a
   tensor with more than 2**32 elements (the resulting array is not
   correctly filled and ends with zeros). (Pascal, reported by David WF)
 * Added Scalar.ndim=0 and ScalarSharedVariable.ndim=0 (simplify code). (Razvan)
 * New min_informative_str() function to print graphs. (Ian)
 * Fix catching of exceptions. (Sometimes we used to catch interrupts.) (Frederic, David, Ian, Olivier)
 * Better support for utf strings. (David)
 * Fix pydotprint with a function compiled with a ProfileMode. (Frederic)
   * It was broken by a change to the profiler.
 * Warning when people have old cache entries. (Olivier)
 * More tests for join on the GPU and CPU. (Frederic)
 * Do not request to load the GPU module by default in the scan module. (Razvan)
 * Fixed some import problems. (Frederic and others)
* Filtering update. (James)
* On Windows, the default compiledir changed to be local to the
  computer/user and is no longer transferred with roaming profiles. (Sebastian
  Urban)
* New theano flag "on_shape_error". Defaults to "warn" (same as previous behavior):
  it prints a warning when an error occurs while inferring the shape of some apply node.
  The other accepted value is "raise", to raise an error when this happens. (Frederic)
* The buildbot now raises optimization/shape errors instead of just printing a warning. (Frederic)
* Better PyCUDA tests. (Frederic)
* check_blas.py now accepts the shape and the number of iterations as parameters. (Frederic)
* Fix an optimization warning when the ShapeOpt optimization is disabled (it is enabled by default). (Frederic)
* More internal verification on what each op.infer_shape returns. (Frederic, James)
* Argmax dtype changed to int64. (Olivier)
* Improved docstring and basic tests for the Tile Op (David).
Reviewers (alphabetical order):
* David, Frederic, Ian, James, Olivier, Razvan