update NEWS.txt and HISTORY.txt for the 0.6rc1 release.

538aedb2 · Frederic · 147a7155
--- a/HISTORY.txt
+++ b/HISTORY.txt
@@ -4,6 +4,297 @@
 =================
 Old Release Notes
 =================
+Theano 0.5 (23 February 2012)
+=============================
+
+Highlight:
+ * Moved to github: http://github.com/Theano/Theano/
+ * Old trac tickets moved to assembla tickets: http://www.assembla.com/spaces/theano/tickets
+ * Theano vision: http://deeplearning.net/software/theano/introduction.html#theano-vision (Many people)
+ * Theano with GPU works in some cases on Windows now. Still experimental. (Sebastian Urban)
+ * Faster dot() call: New/Better direct call to cpu and gpu ger, gemv, gemm
+   and dot(vector, vector). (James, Frédéric, Pascal)
+ * C implementation of Alloc. (James, Pascal)
+ * theano.grad() now also works with sparse variables. (Arnaud)
+ * Macro to implement the Jacobian/Hessian with theano.tensor.{jacobian,hessian} (Razvan)
+ * See the Interface changes.
+
+
+Interface Behavior Changes:
+ * The current default value of the parameter axis of
+   theano.{max,min,argmax,argmin,max_and_argmax} is now the same as
+   numpy: None. i.e. operate on all dimensions of the tensor.
+   (Frédéric Bastien, Olivier Delalleau) (was deprecated and generated
+   a warning since Theano 0.3 released Nov. 23rd, 2010)
+ * The current output dtype of sum with input dtype [u]int* is now always [u]int64.
+   You can specify the output dtype with a new dtype parameter to sum.
+   The output dtype is the one used for the summation.
+   There was no warning in previous Theano versions about this.
+   The consequence is that the sum is done in a dtype with more precision than before.
+   So the sum could be slower, but will be more resistant to overflow.
+   This new behavior is the same as numpy. (Olivier, Pascal)
+ * When using a GPU, detect faulty nvidia drivers. This was detected
+   when running Theano tests. Now this is always tested. Faulty
+   drivers cause wrong results for reduce operations. (Frederic B.)
+
+
+Interface Features Removed (most were deprecated):
+ * The string modes FAST_RUN_NOGC and STABILIZE are not accepted. They
+   were accepted only by theano.function().
+   Use Mode(linker='c|py_nogc') or Mode(optimizer='stabilize') instead.
+ * tensor.grad(cost, wrt) now always returns an object of the "same type" as wrt
+   (list/tuple/TensorVariable). (Ian Goodfellow, Olivier)
+ * A few tag.shape and Join.vec_length left have been removed. (Frederic)
+ * The .value attribute of shared variables is removed, use shared.set_value()
+   or shared.get_value() instead. (Frederic)
+ * Theano config option "home" is not used anymore as it was redundant with "base_compiledir".
+   If you use it, Theano will now raise an error. (Olivier D.)
+ * scan interface changes: (Razvan Pascanu)
+    * The use of `return_steps` for specifying how many entries of the output
+      to return has been removed. Instead, apply a subtensor to the output
+      returned by scan to select a certain slice.
+    * The inner function (that scan receives) should return its outputs and
+      updates following this order:
+        [outputs], [updates], [condition].
+      One can skip any of the three if not used, but the order has to stay unchanged.
+
+Interface bug fix:
+ * Rop in some cases should have returned a list of one Theano variable,
+   but returned the variable itself. (Razvan)
+
+New deprecation (will be removed in Theano 0.6, warning generated if you use them):
+ * tensor.shared() renamed to tensor._shared(). You probably want to
+   call theano.shared() instead! (Olivier D.)
+
+
+Bug fixes (incorrect results):
+ * On CPU, if the convolution had received explicit shape information,
+   it was not checked at runtime.  This caused wrong results if the
+   input shape was not the one expected. (Frederic, reported by Sander
+   Dieleman)
+ * Theoretical bug: in some cases we could have GPUSum return a bad value.
+   We were not able to reproduce this problem.
+     * patterns affected ({0,1}*nb dim, 0 no reduction on this dim, 1 reduction on this dim):
+       01, 011, 0111, 010, 10, 001, 0011, 0101 (Frederic)
+ * div by zero in verify_grad. This hid a bug in the grad of Images2Neibs. (James)
+ * theano.sandbox.neighbors.Images2Neibs grad was returning a wrong value.
+   The grad is now disabled and returns an error. (Frederic)
+ * An expression of the form "1 / (exp(x) +- constant)" was systematically matched to "1 / (exp(x) + 1)"
+   and turned into a sigmoid regardless of the value of the constant. A warning will be issued if your
+   code was affected by this bug. (Olivier, reported by Sander Dieleman)
+ * When indexing into a subtensor of negative stride (for instance, x[a:b:-1][c]),
+   an optimization replacing it with a direct indexing (x[d]) used an incorrect formula,
+   leading to incorrect results. (Pascal, reported by Razvan)
+ * The tile() function  is now stricter in what it accepts to allow for better
+   error-checking/avoiding nonsensical situations. The gradient has been
+   disabled for the time being as it only implemented (incorrectly) one special
+   case. The `reps` argument must be a constant (not a tensor variable), and
+   must have the same length as the number of dimensions in the `x` argument;
+   this is now checked. (David)
+
+
+Scan fixes:
+ * computing grad of a function of grad of scan (reported by Justin Bayer, fix by Razvan)
+   before : most of the time crash, but could be wrong value with bad number of dimensions (so a visible bug)
+   now : do the right thing.
+ * gradient with respect to outputs using multiple taps (reported by Timothy, fix by Razvan)
+   before : it used to return wrong values
+   now : do the right thing.
+   Note: The reported case of this bug was happening in conjunction with the
+         save memory optimization of scan that gives run time errors. So if you didn't
+         manually disable the save memory optimization (number in the list4),
+         you are fine if you didn't manually request multiple taps.
+ * Rop of gradient of scan (reported by Timothy and Justin Bayer, fix by Razvan)
+   before : compilation error when computing R-op
+   now : do the right thing.
+ * save memory optimization of scan (reported by Timothy and Nicolas BL, fix by Razvan)
+   before : for certain corner cases used to result in a runtime shape error
+   now : do the right thing.
+ * Scan grad when the input of scan has sequences of different lengths. (Razvan, reported by Michael Forbes)
+ * Scan.infer_shape now works correctly when working with a condition for the number of loops.
+   In the past, it returned n_steps as the length, which is not always true. (Razvan)
+ * Scan.infer_shape crash fix. (Razvan)
+
+New features:
+ * AdvancedIncSubtensor grad defined and tested (Justin Bayer)
+ * Adding 1D advanced indexing support to inc_subtensor and set_subtensor (James Bergstra)
+ * tensor.{zeros,ones}_like now support the dtype param as numpy (Frederic)
+ * Added configuration flag "exception_verbosity" to control the verbosity of exceptions (Ian)
+ * theano-cache list: list the content of the theano cache (Frederic)
+ * theano-cache unlock: remove the Theano lock (Olivier)
+ * tensor.ceil_int_div to compute ceil(a / float(b)) (Frederic)
+ * MaxAndArgMax.grad now works with any axis (The op supports only 1 axis) (Frederic)
+     * used by tensor.{max,min,max_and_argmax}
+ * tensor.{all,any} (Razvan)
+ * tensor.roll as numpy: (Matthew Rocklin, David Warde-Farley)
+ * Theano with GPU works in some cases on Windows now. Still experimental. (Sebastian Urban)
+ * IfElse now allows to have a list/tuple as the result of the if/else branches.
+     * They must have the same length and corresponding type (Razvan)
+ * Argmax output dtype is now int64 instead of int32. (Olivier)
+ * Added the element-wise operation arccos. (Ian)
+ * Added sparse dot with dense grad output. (Yann Dauphin)
+     * Optimized to Usmm and UsmmCscDense in some cases (Yann)
+     * Note: theano.dot and theano.sparse.structured_dot() always had a gradient with the same sparsity pattern as the inputs.
+       The new theano.sparse.dot() has a dense gradient for all inputs.
+ * GpuAdvancedSubtensor1 supports broadcasted dimensions. (Frederic)
+ * TensorVariable.zeros_like() and SparseVariable.zeros_like()
+ * theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.device_properties() (Frederic)
+ * theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info() return free and total gpu memory (Frederic)
+ * Theano flags compiledir_format. Keep the same default as before: compiledir_%(platform)s-%(processor)s-%(python_version)s. (Josh Bleecher Snyder)
+     * We also support the "theano_version" substitution.
+ * IntDiv C code (faster and allow this elemwise to be fused with other elemwise) (Pascal)
+ * Internal filter_variable mechanism in Type. (Pascal, Ian)
+    * Ifelse works on sparse.
+    * It makes use of gpu shared variable more transparent with theano.function updates and givens parameter.
+ * Added a_tensor.transpose(axes) axes is optional (James)
+    * theano.tensor.transpose(a_tensor, kwargs) We were ignoring kwargs, now it is used as the axes.
+ * a_CudaNdarray_object[*] = int, now works (Frederic)
+ * tensor_variable.size (as numpy) computes the product of the shape elements. (Olivier)
+ * sparse_variable.size (as scipy) computes the number of stored values. (Olivier)
+ * sparse_variable[N, N] now works (Li Yao, Frederic)
+ * sparse_variable[M:N, O:P] now works (Li Yao, Frederic, Pascal)
+   M, N, O, and P can be Python int or scalar tensor variables, None, or
+   omitted (sparse_variable[:, :M] or sparse_variable[:M, N:] work).
+ * tensor.tensordot can now be moved to GPU (Sander Dieleman,
+   Pascal, based on code from Tijmen Tieleman's gnumpy,
+   http://www.cs.toronto.edu/~tijmen/gnumpy.html)
+ * Many infer_shape implemented on sparse matrices op. (David W.F.)
+ * Added theano.sparse.verify_grad_sparse to easily allow testing grad of
+   sparse op. It supports testing the full and structured gradients.
+ * The keys in our cache now store the hash of constants and not the constant values
+   themselves. This is significantly more efficient for big constant arrays. (Frederic B.)
+ * 'theano-cache list' lists key files bigger than 1M (Frederic B.)
+ * 'theano-cache list' prints a histogram of the number of keys per compiled module (Frederic B.)
+ * 'theano-cache list' prints the number of compiled modules per op class (Frederic B.)
+ * The Theano flag "nvcc.fastmath" is now also used for the cuda_ndarray.cu file.
+ * Add the header_dirs to the hard part of the compilation key. This is
+   currently used only by cuda, but if we use libraries that are only headers,
+   this can be useful. (Frederic B.)
+ * The Theano flag "nvcc.flags" is now included in the hard part of the key.
+   This means that now we recompile all modules for each value of "nvcc.flags".
+   A change in "nvcc.flags" used to be ignored for modules that were already
+   compiled. (Frederic B.)
+ * Alloc, GpuAlloc are not always pre-computed (constant_folding optimization)
+   at compile time if all their inputs are constant.
+   (Frederic B., Pascal L., reported by Sander Dieleman)
+ * New Op tensor.sort(), wrapping numpy.sort (Hani Almousli)
+
+
+New optimizations:
+ * AdvancedSubtensor1 reuses preallocated memory if available (scan, c|py_nogc linker) (Frederic)
+ * dot22, dot22scalar work with complex. (Frederic)
+ * Generate Gemv/Gemm more often. (James)
+ * Remove scan when all computations can be moved outside the loop. (Razvan)
+ * scan optimization done earlier. This allows other optimizations to be applied. (Frederic, Guillaume, Razvan)
+ * exp(x) * sigmoid(-x) is now correctly optimized to the more stable form sigmoid(x). (Olivier)
+ * Added Subtensor(Rebroadcast(x)) => Rebroadcast(Subtensor(x)) optimization. (Guillaume)
+ * Made the optimization process faster. (James)
+ * Allow fusion of elemwise when the scalar op needs support code. (James)
+ * Better opt that lifts transpose around dot. (James)
+
+
+Crashes fixed:
+ * T.mean crash at graph building time. (Ian)
+ * "Interactive debugger" crash fix. (Ian, Frederic)
+ * Do not call gemm with strides 0, some blas refuse it. (Pascal Lamblin)
+ * Optimization crash with gemm and complex. (Frederic)
+ * GPU crash with elemwise. (Frederic, some reported by Chris Currivan)
+ * Compilation crash with amdlibm and the GPU. (Frederic)
+ * IfElse crash. (Frederic)
+ * Execution crash fix in AdvancedSubtensor1 on 32 bit computers. (Pascal)
+ * GPU compilation crash on MacOS X. (Olivier)
+ * Support for OSX Enthought Python Distribution 7.x. (Graham Taylor, Olivier)
+ * When the subtensor inputs had 0 dimensions and the outputs 0 dimensions. (Frederic)
+ * Crash when the step to subtensor was not 1 in conjunction with some optimization. (Frederic, reported by Olivier Chapelle)
+ * Runtime crash related to an optimization with subtensor of alloc (reported by Razvan, fixed by Frederic)
+ * Fix dot22scalar cast of integer scalars (Justin Bayer, Frédéric, Olivier)
+ * Fix runtime crash in gemm, dot22. (Frederic B.)
+ * Fix on 32-bit computers: make sure all shapes are int64. (Olivier)
+ * Fix to deque on python 2.4 (Olivier)
+ * Fix crash when not using C code (or using DebugMode) (not used by
+   default) with numpy 1.6*. Numpy has a bug in the reduction code that
+   made it crash. (Pascal)
+ * Crashes of blas functions (Gemv on CPU; Ger, Gemv and Gemm on GPU)
+   when matrices had non-unit stride in both dimensions (CPU and GPU),
+   or when matrices had negative strides (GPU only). In those cases,
+   we are now making copies. (Pascal)
+ * More cases supported in AdvancedIncSubtensor1. (Olivier D.)
+ * Fix crash when a broadcasted constant was used as input of an
+   elemwise Op and needed to be upcasted to match the op's output.
+   (Reported by John Salvatier, fixed by Pascal L.)
+ * Fixed a memory leak with shared variable (we kept a pointer to the original value) (Ian G.)
+
+
+Known bugs:
+ * CAReduce with NaN in inputs doesn't return the correct output (`Ticket <https://www.assembla.com/spaces/theano/tickets/763>`_).
+     * This is used in tensor.{max,mean,prod,sum} and in the grad of PermuteRowElements.
+
+
+Sandbox:
+ * cvm interface more consistent with current linker. (James)
+   * Now all tests pass with the linker=cvm flags.
+ * vm linker has a callback parameter. (James)
+ * review/finish/doc: diag/extract_diag. (Arnaud Bergeron, Frederic, Olivier)
+ * review/finish/doc: AllocDiag/diag. (Arnaud, Frederic, Guillaume)
+ * review/finish/doc: MatrixInverse, matrix_inverse. (Razvan)
+ * review/finish/doc: matrix_dot. (Razvan)
+ * review/finish/doc: det (determinant) op. (Philippe Hamel)
+ * review/finish/doc: Cholesky determinant op. (David)
+ * review/finish/doc: ensure_sorted_indices. (Li Yao)
+ * review/finish/doc: spectral_radius_bound. (Xavier Glorot)
+ * review/finish/doc: sparse sum. (Valentin Bisson)
+ * review/finish/doc: Remove0 (Valentin)
+ * review/finish/doc: SquareDiagonal (Eric)
+
+
+Sandbox New features (not enabled by default):
+ * CURAND_RandomStreams for uniform and normal (not picklable, GPU only) (James)
+ * New sandbox.linalg.ops.pinv(pseudo-inverse) op (Razvan)
+
+
+Documentation:
+ * Many updates. (Many people)
+ * Updates to install doc on MacOS. (Olivier)
+ * Updates to install doc on Windows. (David, Olivier)
+ * Doc on the Rop function (Ian)
+ * Added how to use scan to loop with a condition as the number of iterations. (Razvan)
+ * Added how to wrap in Theano an existing python function (in numpy, scipy, ...). (Frederic)
+ * Refactored GPU installation of Theano. (Olivier)
+
+
+Others:
+ * Better error messages in many places. (Many people)
+ * PEP8 fixes. (Many people)
+ * Add a warning about numpy bug when using advanced indexing on a
+   tensor with more than 2**32 elements (the resulting array is not
+   correctly filled and ends with zeros). (Pascal, reported by David WF)
+ * Added Scalar.ndim=0 and ScalarSharedVariable.ndim=0 (simplify code) (Razvan)
+ * New min_informative_str() function to print graph. (Ian)
+ * Fix catching of exception. (Sometimes we used to catch interrupts) (Frederic, David, Ian, Olivier)
+ * Better support for utf string. (David)
+ * Fix pydotprint with a function compiled with a ProfileMode (Frederic)
+     * Was broken with change to the profiler.
+ * Warning when people have old cache entries. (Olivier)
+ * More tests for join on the GPU and CPU. (Frederic)
+ * Do not request to load the GPU module by default in scan module. (Razvan)
+ * Fixed some import problems. (Frederic and others)
+ * Filtering update. (James)
+ * On Windows, the default compiledir changed to be local to the
+   computer/user and not transferred with roaming profile. (Sebastian
+   Urban)
+ * New theano flag "on_shape_error". Defaults to "warn" (same as previous behavior):
+   it prints a warning when an error occurs when inferring the shape of some apply node.
+   The other accepted value is "raise" to raise an error when this happens. (Frederic)
+ * The buildbot now raises optimization/shape errors instead of just printing a warning. (Frederic)
+ * better pycuda tests (Frederic)
+ * check_blas.py now accepts the shape and the number of iterations as parameters (Frederic)
+ * Fix opt warning when the opt ShapeOpt is disabled (enabled by default) (Frederic)
+ * More internal verification on what each op.infer_shape returns. (Frederic, James)
+ * Argmax dtype to int64 (Olivier)
+ * Improved docstring and basic tests for the Tile Op (David).
+
+Reviewers (alphabetical order):
+ * David, Frederic, Ian, James, Olivier, Razvan


 Theano 0.4.1 (12 August 2011)

--- a/NEWS.txt
+++ b/NEWS.txt
@@ -8,294 +8,411 @@ https://github.com/Theano/Theano/wiki/Devnews
 Release Notes
 =============

-Theano 0.5 (23 February 2012)
-=============================
-
-Highlight:
- * Moved to github: http://github.com/Theano/Theano/
- * Old trac ticket moved to assembla ticket: http://www.assembla.com/spaces/theano/tickets
- * Theano vision: http://deeplearning.net/software/theano/introduction.html#theano-vision (Many people)
- * Theano with GPU works in some cases on Windows now. Still experimental. (Sebastian Urban)
- * Faster dot() call: New/Better direct call to cpu and gpu ger, gemv, gemm
-   and dot(vector, vector). (James, Frédéric, Pascal)
- * C implementation of Alloc. (James, Pascal)
- * theano.grad() now also work with sparse variable. (Arnaud)
- * Macro to implement the Jacobian/Hessian with theano.tensor.{jacobian,hessian} (Razvan)
- * See the Interface changes.
+Theano 0.6rc1 (1 October 2012)
+==============================
+Updates in the Trunk since the last release up to
+git log -p rel-0.5... |grep Merge|less
+done up to and including merge of PR 976


-Interface Behavior Changes:
- * The current default value of the parameter axis of
-   theano.{max,min,argmax,argmin,max_and_argmax} is now the same as
-   numpy: None. i.e. operate on all dimensions of the tensor.
-   (Frédéric Bastien, Olivier Delalleau) (was deprecated and generated
-   a warning since Theano 0.3 released Nov. 23rd, 2010)
- * The current output dtype of sum with input dtype [u]int* is now always [u]int64.
-   You can specify the output dtype with a new dtype parameter to sum.
-   The output dtype is the one using for the summation.
-   There is no warning in previous Theano version about this.
-   The consequence is that the sum is done in a dtype with more precision than before.
-   So the sum could be slower, but will be more resistent to overflow.
-   This new behavior is the same as numpy. (Olivier, Pascal)
- * When using a GPU, detect faulty nvidia drivers. This was detected
-   when running Theano tests. Now this is always tested. Faulty
-   drivers results in wrong results for reduce operations. (Frederic B.)
+Highlight:
+ * Bug fixes, crash fixes, CPU and GPU speed-ups.
+ * theano_var.eval({other_var: val[, ...]}) to simplify the usage of Theano (Ian G.)
+ * New default linker `cvm`. This is the execution engine that tells which ops to run in which order.
+   It is now implemented in C and enables lazy evaluation of the ifelse op.
+ * Faster theano.function compilation. (Pascal L., Ian G.)
+ * Big sparse submodule update and documentation of it. (Nicolas Bouchard)
+ * Use GPU asynchronous functionality (Frederic B.)
+ * Better Windows support.

+Known bug
+ * A few crash cases that will be fixed by the final release.

-Interface Features Removed (most were deprecated):
- * The string modes FAST_RUN_NOGC and STABILIZE are not accepted. They
-   were accepted only by theano.function().
-   Use Mode(linker='c|py_nogc') or Mode(optimizer='stabilize') instead.
- * tensor.grad(cost, wrt) now always returns an object of the "same type" as wrt
-   (list/tuple/TensorVariable). (Ian Goodfellow, Olivier)
- * A few tag.shape and Join.vec_length left have been removed. (Frederic)
- * The .value attribute of shared variables is removed, use shared.set_value()
-   or shared.get_value() instead. (Frederic)
- * Theano config option "home" is not used anymore as it was redundant with "base_compiledir".
-   If you use it, Theano will now raise an error. (Olivier D.)
- * scan interface changes: (Razvan Pascanu)
-    * The use of `return_steps` for specifying how many entries of the output
-      to return has been removed. Instead, apply a subtensor to the output
-      returned by scan to select a certain slice.
-    * The inner function (that scan receives) should return its outputs and
-      updates following this order:
-        [outputs], [updates], [condition].
-      One can skip any of the three if not used, but the order has to stay unchanged.
+Bug fixes
+ * Outputs of Scan nodes could contain corrupted values: some parts of the
+   output would be repeated a second time, instead of the correct values.
+   It happened randomly, and quite infrequently, but the bug has been present
+   (both in Python and Cython) since April 2011. (Pascal L.)
+ * In Sparse sandbox, fix the grad of theano.sparse.sandbox.sp.row_scale.
+   It did not return the right number of elements. (Frederic B.)
+ * set_subtensor(x[int vector], new_value) when moved to the GPU
+   was transformed into inc_subtensor on the GPU. Now we have a correct
+   (but slow) GPU implementation.
+   Note 1: set_subtensor(x[slice[,...]], new_value) was working correctly
+   in all cases as well as inc_subtensor(*, *).
+   Note 2: If your code was affected by the incorrect behavior, we now print
+   a warning by default (Frederic B.)
+ * Fixed an issue whereby config values were used as default arguments,
+   with those defaults then stuck at old values if the config variables were
+   changed during program execution. (David W-F)
+ * Fixed many subtle bugs involving mutable default arguments which may have
+   led to unexpected behaviour, such as objects sharing instance variables
+   they were not supposed to share. (David W-F)
+ * Correctly record the GPU device number used when we let the driver select it.
+   (Frederic B.)
+ * CAReduce with NaN in inputs did not return the correct output. (Pascal L.)
+     * This is used in tensor.{all,any,max,mean,prod,sum} and in the grad of PermuteRowElements.
+ * The grad of TensorDot was returning the wrong shape for some combinations of axes.
+   We now raise NotImplementedError in those cases. (Frederic B.)
+ * conv2d with subsample >2 returned wrong values. (Pascal L.)
+     * Fixed when mode==valid, disabled when mode==full
+ * theano.sparse.CSMGrad op (generated by the grad of CSM) didn't
+   correctly handle unsorted inputs and gradients that are sparser
+   than the input. In that case, a bad result was returned. But this can
+   happen only when a sparse input of a Theano function was not
+   sorted. This happens for example with sparse advanced indexing from
+   scipy. The consequence is, most of the time, NaN in the graph.
+   (Yann Dauphin)
+ * theano.sparse._dot(CSC matrix, dense) optimized version UsmmCscDense didn't
+   correctly handle non-contiguous inputs/outputs. (Pascal L.)
+ * Fix a corner case in CVM updates. (Pascal L.)
+   This happened if the update to a shared variable was the variable itself after optimization.
+   The CVM was not used by default.
+ * Fix the view_map of sparse.Transpose and sparse.sandbox.sp.RowScale. (Frederic B.)
+   This probably didn't cause problems as there is only the UsmmCscDense op
+   (used by calls to Usmm with a CSC matrix) that could interfere with them.

-Interface bug fix:
- * Rop in some case should have returned a list of one Theano variable,
-   but returned the variable itself. (Razvan)
+Deprecation
+ * Deprecated the Module class (Ian G.)
+   This was a predecessor of SharedVariable with a less pythonic philosophy.

-New deprecation (will be removed in Theano 0.6, warning generated if you use them):
- * tensor.shared() renamed to tensor._shared(). You probably want to
-   call theano.shared() instead! (Olivier D.)
+Interface changes
+ * Now the base version requirements are numpy >= 1.5.0 and the optional scipy >= 0.8.
+ * In Theano 0.5, we removed the deprecated sharedvar.value property.
+   Now we raise an error if you access it. (Frederic B.)
+ * theano.function does not accept duplicate inputs, so function([x, x], ...)
+   does not work anymore. (Pascal L.)
+ * theano.function now raises an error if some of the provided inputs are
+   not part of the computational graph needed to compute the output, for
+   instance, function([x, y], [y]). You can use the kwarg
+   ``on_unused_input={'raise', 'warn', 'ignore'}`` to control this.
+   (Pascal L.)
+ * New Theano flag "on_unused_input" that defines the default value of the
+   previous point. (Frederic B.)
+ * tensor.alloc() now raises an error during graph build time
+   when we try to create fewer dimensions than the number of dimensions
+   the provided value has. In the past, the error was at run time.
+   (Frederic B.)
+ * Remove theano.Value and related stuff (Ian G.)
+   This was a test of what ended up as SharedVariable.
+ * Renamed Env to FunctionGraph, and object attribute "env" to "fgraph" (Ian G.)
+   Deprecation warning printed when you try to access the "env" attribute.
+ * Renamed the FunctionGraph.nodes attribute to FunctionGraph.apply_nodes (Ian G.)
+ * Warn when we don't correctly handle a parameter in the Theano flag `nvcc.flags`
+   (Frederic B.)
+ * Do not reorder the user flags passed to the compiler. They get set after other flags. (Frederic B.)
+ * Make setuptools optional (Ilan Schnell+)/Remove Dependency.
+ * We warn when a user tries to use an old GPU that we don't test with.
+   This could cause crashes and will also be very slow. (Frederic B.)
+ * Make theano.grad able to differentiate between not implemented, undefined and disconnected grad.
+   Op.grad function should return theano.gradient.{grad_not_implemented,grad_undefined} or
+   something of DisconnectedType (Ian G.)
+ * Make theano.grad expect to always receive a float or undefined
+   gradient and enforce that ops with integer output values always
+   return 0. (Ian G.)


-Bug fixes (incorrect results):
- * On CPU, if the convolution had received explicit shape information,
-   they where not checked at runtime.  This caused wrong result if the
-   input shape was not the one expected. (Frederic, reported by Sander
-   Dieleman)
- * Theoretical bug: in some case we could have GPUSum return bad value.
-   We were not able to reproduce this problem
-     * patterns affected ({0,1}*nb dim, 0 no reduction on this dim, 1 reduction on this dim):
-       01, 011, 0111, 010, 10, 001, 0011, 0101 (Frederic)
- * div by zero in verify_grad. This hid a bug in the grad of Images2Neibs. (James)
- * theano.sandbox.neighbors.Images2Neibs grad was returning a wrong value.
-   The grad is now disabled and returns an error. (Frederic)
- * An expression of the form "1 / (exp(x) +- constant)" was systematically matched to "1 / (exp(x) + 1)"
-   and turned into a sigmoid regardless of the value of the constant. A warning will be issued if your
-   code was affected by this bug. (Olivier, reported by Sander Dieleman)
- * When indexing into a subtensor of negative stride (for instance, x[a:b:-1][c]),
-   an optimization replacing it with a direct indexing (x[d]) used an incorrect formula,
-   leading to incorrect results. (Pascal, reported by Razvan)
- * The tile() function  is now stricter in what it accepts to allow for better
-   error-checking/avoiding nonsensical situations. The gradient has been
-   disabled for the time being as it only implemented (incorrectly) one special
-   case. The `reps` argument must be a constant (not a tensor variable), and
-   must have the same length as the number of dimensions in the `x` argument;
-   this is now checked. (David)
+New memory output contract (was announced in the release notes of Theano 0.5):
+ * Now the output memory received can be preallocated by other stuff.
+   In the past it was always the previous output an Apply node allocated.
+   So this means that the shape and strides can be different from the previous call
+   and there can be links to this memory at other places.
+   This means it could receive preallocated output that is not c_contiguous.
+   But we don't do that now. (Pascal L.)
+ * New Theano flag to test this: DebugMode.check_preallocated_output (Pascal L.)
+ * Updated a few ops to respect this contract (Pascal L.)


-Scan fixes:
- * computing grad of a function of grad of scan (reported by Justin Bayer, fix by Razvan)
-   before : most of the time crash, but could be wrong value with bad number of dimensions (so a visible bug)
-   now : do the right thing.
- * gradient with respect to outputs using multiple taps (reported by Timothy, fix by Razvan)
-   before : it used to return wrong values
-   now : do the right thing.
-   Note: The reported case of this bug was happening in conjunction with the
-         save optimization of scan that give run time errors. So if you didn't
-         manually disable the same memory optimization (number in the list4),
-         you are fine if you didn't manually request multiple taps.
- * Rop of gradient of scan (reported by Timothy and Justin Bayer, fix by Razvan)
-   before : compilation error when computing R-op
-   now : do the right thing.
- * save memory optimization of scan (reported by Timothy and Nicolas BL, fix by Razvan)
-   before : for certain corner cases used to result in a runtime shape error
-   now : do the right thing.
- * Scan grad when the input of scan has sequences of different lengths. (Razvan, reported by Michael Forbes)
- * Scan.infer_shape now works correctly when working with a condition for the number of loops.
-   In the past, it returned n_steps as the length, which is not always true. (Razvan)
- * Scan.infer_shape crash fix. (Razvan)
+New Features
+ * GPU scan now works (doesn't crash) when there is a mixture of float32 and other dtypes.
+ * theano_var.eval({other_var: val[, ...]}) to simplify the usage of Theano (Ian G.)
+ * debugprint new param ids=["CHAR", "id", "int", ""]
+   This makes the identifier printed to be the python id, a unique char, a
+   unique int, or not have it printed. We changed the default to be "CHAR"
+   as this is more readable. (Frederic B.)
+ * debugprint new param stop_on_name=[False, True]. If True, we don't print
+   anything below an intermediate variable that has a name. Defaults to False.
+   (Frederic B.)
+ * debugprint no longer prints the "|" symbol in a column after the last input. (Frederic B.)
+ * If you use Enthought Python Distribution (EPD) now we use its blas
+   implementation by default. (Frederic B., Graham Taylor, Simon McGregor)
+ * MRG random now raises an error with a clear message when the passed shape
+   contains dimensions with bad values like 0. (Frederic B. reported by Ian G.)
+ * "CudaNdarray[*] = ndarray" works in more cases (Frederic B.)
+ * "CudaNdarray[*] += ndarray" works in more cases (Frederic B.)
+ * We add dimensions to CudaNdarray to automatically broadcast more frequently.
+   (Frederic B.)
+ * New theano flag cmodule.warn_no_version. Default False. If True,
+   will print a warning when compiling one or more Op with C code that
+   can't be cached because there is no c_code_cache_version() function
+   associated to at least one of those Ops.  (Frederic B.)
+ * CPU alloc now always generates C code (Pascal L.)
+ * New Theano flag cmodule.warn_no_version=False. When True, warn when an op
+   with C code is not versioned (which forces it to be recompiled every time).
+   (Frederic B.)
+ * C code reuses preallocated outputs (only done by Scan) (Pascal L.)
+ * Garbage collection of intermediate results during Theano function calls
+   for Ops with C code (Pascal L.)
+ * The Theano flag compiledir_format now supports the parameters "numpy_version" and "g++". (Frederic B.)
+ * Theano GPU variables, shared variables and constants now support <, <=,
+   > and >=, like those not on the GPU.
+ * AdvancedIncSubtensor now supports the set_instead_of_inc parameter. (Eric L.)
+ * Added Advanced Indexing support to inc_subtensor and set_subtensor. (Eric L.)
+ * theano.tensor.{any,all,std,var,mean,prod,sum,argmin,argmax,min,max,max_and_argmax}
+   have a new parameter keepdims (Eric L.)
+   This allows the result to broadcast correctly against the input data to normalize it.
+ * The Updates object now checks that the keys are SharedVariable when we pass them
+   in the __init__ function. (Pascal L.)
+ * Set a Theano Variable name on transposed op when the input has one (Frederic B).
+ * The cvm linker now supports garbage collection (enabled by default). (James B., Arnaud B., Pascal L.)
+ * The cvm linker is now the default linker.
+   This runs the "loop" around the execution of apply nodes in C, which lowers the overhead.
+ * theano_variable[numpy.newaxis] is now supported (James B.)
+ * Enable ifelse on the GPU. (Frederic B.)
+ * Correctly support numpy.memmap everywhere (Pascal L.)
+   We had partial support for them before. Just use the normal tensor operations
+   on them and it should work.
+   But take care not to exhaust your computer's memory! (we always generate normal ndarrays)
+ * Add an optimization that stabilizes log(softmax(x)). (Ian G.)
+ * Re-enable the Images2Neibs grad. It was not broken, the problem was how we tested it. (Frederic B.)
+ * If `theano_fn.trust_input` is set to True, do not check if the inputs are valid
+   when calling the theano function. (Frederic B.)
+ * Add theano.tensor.blas.gem{m,v} as shortcut.
+ * theano.grad(..., add_names=True). False for the old
+   behavior. Otherwise it tries to name the grad variables. (Ian G.)
+ * theano-nose (Pascal L.)
+   A wrapper around nosetests that adds the needed extensions.
+   * --profile-time option, prints the time spent in each test (Eric L.)
+   * --batch option, allows running tests in batches to lower memory requirements.
+ * m = mean(log(1 - sigm(x)))
+   x - scalar * theano.grad(m, x)
+   There is a stabilization optimization for this.
+   Now it is applied more frequently. (Pascal L.)

-New features:
- * AdvancedIncSubtensor grad defined and tested (Justin Bayer)
- * Adding 1D advanced indexing support to inc_subtensor and set_subtensor (James Bergstra)
- * tensor.{zeros,ones}_like now support the dtype param as numpy (Frederic)
- * Added configuration flag "exception_verbosity" to control the verbosity of exceptions (Ian)
- * theano-cache list: list the content of the theano cache (Frederic)
- * theano-cache unlock: remove the Theano lock (Olivier)
- * tensor.ceil_int_div to compute ceil(a / float(b)) (Frederic)
- * MaxAndArgMax.grad now works with any axis (The op supports only 1 axis) (Frederic)
-     * used by tensor.{max,min,max_and_argmax}
- * tensor.{all,any} (Razvan)
- * tensor.roll as numpy: (Matthew Rocklin, David Warde-Farley)
- * Theano with GPU works in some cases on Windows now. Still experimental. (Sebastian Urban)
- * IfElse now allows to have a list/tuple as the result of the if/else branches.
-     * They must have the same length and corresponding type (Razvan)
- * Argmax output dtype is now int64 instead of int32. (Olivier)
- * Added the element-wise operation arccos. (Ian)
- * Added sparse dot with dense grad output. (Yann Dauphin)
-     * Optimized to Usmm and UsmmCscDense in some case (Yann)
-     * Note: theano.dot and theano.sparse.structured_dot() always had a gradient with the same sparsity pattern as the inputs.
-       The new theano.sparse.dot() has a dense gradient for all inputs.
- * GpuAdvancedSubtensor1 supports broadcasted dimensions. (Frederic)
- * TensorVariable.zeros_like() and SparseVariable.zeros_like()
- * theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.device_properties() (Frederic)
- * theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info() return free and total gpu memory (Frederic)
- * Theano flags compiledir_format. Keep the same default as before: compiledir_%(platform)s-%(processor)s-%(python_version)s. (Josh Bleecher Snyder)
-     * We also support the "theano_version" substitution.
- * IntDiv C code (faster and allow this elemwise to be fused with other elemwise) (Pascal)
- * Internal filter_variable mechanism in Type. (Pascal, Ian)
-    * Ifelse works on sparse.
-    * It makes use of gpu shared variable more transparent with theano.function updates and givens parameter.
- * Added a_tensor.transpose(axes) axes is optional (James)
-    * theano.tensor.transpose(a_tensor, kwargs) We where ignoring kwargs, now it is used as the axes.
- * a_CudaNdarray_object[*] = int, now works (Frederic)
- * tensor_variable.size (as numpy) computes the product of the shape elements. (Olivier)
- * sparse_variable.size (as scipy) computes the number of stored values. (Olivier)
- * sparse_variable[N, N] now works (Li Yao, Frederic)
- * sparse_variable[M:N, O:P] now works (Li Yao, Frederic, Pascal)
-   M, N, O, and P can be Python int or scalar tensor variables, None, or
-   omitted (sparse_variable[:, :M] or sparse_variable[:M, N:] work).
- * tensor.tensordot can now be moved to GPU (Sander Dieleman,
-   Pascal, based on code from Tijmen Tieleman's gnumpy,
-   http://www.cs.toronto.edu/~tijmen/gnumpy.html)
- * Many infer_shape implemented on sparse matrices op. (David W.F.)
- * Added theano.sparse.verify_grad_sparse to easily allow testing grad of
-   sparse op. It supports testing the full and structured gradients.
- * The keys in our cache now store the hash of constants and not the constant values
-   themselves. This is significantly more efficient for big constant arrays. (Frederic B.)
- * 'theano-cache list' lists key files bigger than 1M (Frederic B.)
- * 'theano-cache list' prints an histogram of the number of keys per compiled module (Frederic B.)
- * 'theano-cache list' prints the number of compiled modules per op class (Frederic B.)
- * The Theano flag "nvcc.fastmath" is now also used for the cuda_ndarray.cu file.
- * Add the header_dirs to the hard part of the compilation key. This is
-   currently used only by cuda, but if we use library that are only headers,
-   this can be useful. (Frederic B.)
- * The Theano flag "nvcc.flags" is now included in the hard part of the key.
-   This mean that now we recompile all modules for each value of "nvcc.flags".
-   A change in "nvcc.flags" used to be ignored for module that were already
-   compiled. (Frederic B.)
- * Alloc, GpuAlloc are not always pre-computed (constant_folding optimization)
-   at compile time if all their inputs are constant.
-   (Frederic B., Pascal L., reported by Sander Dieleman)
- * New Op tensor.sort(), wrapping numpy.sort (Hani Almousli)

+New Op/function
+ * Added element-wise operation theano.tensor.{GammaLn,Psi} (John Salvatier, Nicolas Bouchard)
+ * Added element-wise operation theano.tensor.{arcsin,arctan,arccosh,arcsinh,arctanh,exp2,arctan2} (Nicolas Bouchard)
+ * Added element-wise operation theano.tensor.{gamma,conj,complex_from_polar,expm1,deg2rad,rad2deg,trunc} (Nicolas Bouchard)
+ * Added theano.tensor.argsort that wraps numpy.argsort (Hani Almousli).
+ * Added theano.tensor.diff that wraps numpy.diff (Nicolas B.)
+ * Added theano.tensor.bincount that wraps numpy.bincount (Nicolas B., Pascal L, Frederic B.)
+ * Added theano.tensor.squeeze (Nicolas B.)
+   This removes broadcasted dimensions from the variable.
+   Theano-esque version of numpy.squeeze.
+ * Added theano.tensor.repeat that wraps numpy.repeat (Nicolas B. + PL)
+ * Added theano.tensor.bartlett that wraps numpy.bartlett (Eric L.)
+ * Added theano.tensor.fill_diagonal that wraps numpy.fill_diagonal (Eric L., Frederic B.)
+ * Added tensor.square that is an alias for tensor.sqr as in NumPy (Ian G.)
+ * Added theano.tensor.load(path, dtype, broadcastable, mmap_mode=None) op
+   that allows loading a .npy file in a Theano graph (Matthew Rocklin)
+ * theano.sandbox.linalg.kron.py:Kron op. (Eric L.)
+   Kronecker product

-New optimizations:
- * AdvancedSubtensor1 reuses preallocated memory if available (scan, c|py_nogc linker) (Frederic)
- * dot22, dot22scalar work with complex. (Frederic)
- * Generate Gemv/Gemm more often. (James)
- * Remove scan when all computations can be moved outside the loop. (Razvan)
- * scan optimization done earlier. This allows other optimizations to be applied. (Frederic, Guillaume, Razvan)
- * exp(x) * sigmoid(-x) is now correctly optimized to the more stable form sigmoid(x). (Olivier)
- * Added Subtensor(Rebroadcast(x)) => Rebroadcast(Subtensor(x)) optimization. (Guillaume)
- * Made the optimization process faster. (James)
- * Allow fusion of elemwise when the scalar op needs support code. (James)
- * Better opt that lifts transpose around dot. (James)
+Speed up
+ * CPU convolutions are now parallelized (Frederic B.)
+   By default, use all cores/hyper-threads.
+   To control it, use the `OMP_NUM_THREADS=N` environment variable where N is the number of
+   parallel threads to use. By default it is equal to the number of CPU cores/hyper
+   threads that you have.
+   There is a new Theano flag `openmp` to allow/disallow openmp ops.
+   If your BLAS library is parallelized, this flag won't affect it, but the
+   env variable will.
+ * Remove a corner case where we duplicated dot22/gemm in the graph. (Frederic B., Ian G.)
+ * Enable fusion of elemwise that have the same clients multiple times. (Frederic B.)
+ * New optimization: Remove reduction over broadcastable dimensions (James B., Frederic B.)
+ * Faster theano.function compilation. (Pascal L., Ian G.)
+ * Remove GPU transfer around specify_shape op. (Frederic B.)
+ * Implemented/tested MANY op.infer_shape methods (Eric Larsen)
+   This allows Theano to make better shape inference.
+ * Implement Solve.infer_shape (Matthew Rocklin)
+ * Scan memory optimization now works more frequently. (Razvan P.)
+   There was a warning printed by the subtensor optimization in those cases.
+ * Faster rng_mrg python code. (mostly used for tests) (Frederic B.)

+Speed up GPU
+ * Convolution on the GPU now checks the generation of the card to make
+   it faster in some cases (especially medium/big output image) (Frederic B.)
+   * We had hardcoded 512 as the maximum number of threads per block. Newer cards
+     support up to 1024 threads per block.
+ * Faster GpuAdvancedSubtensor1, GpuSubtensor, GpuAlloc (Frederic B.)
+ * We now pass the GPU architecture to nvcc when compiling (Frederic B.)
+ * Now we use the GPU function async feature by default. (Frederic B.)
+   Set the environment variable `CUDA_LAUNCH_BLOCKING` to `1` to disable this
+   for profiling or debugging.
+ * Faster creation of CudaNdarray objects (Frederic B.)
+ * Now some Max reductions are implemented on the GPU. (Ian G.)

-Crashes fixed:
- * T.mean crash at graph building time. (Ian)
- * "Interactive debugger" crash fix. (Ian, Frederic)
- * Do not call gemm with strides 0, some blas refuse it. (Pascal Lamblin)
- * Optimization crash with gemm and complex. (Frederic)
- * GPU crash with elemwise. (Frederic, some reported by Chris Currivan)
- * Compilation crash with amdlibm and the GPU. (Frederic)
- * IfElse crash. (Frederic)
- * Execution crash fix in AdvancedSubtensor1 on 32 bit computers. (Pascal)
- * GPU compilation crash on MacOS X. (Olivier)
- * Support for OSX Enthought Python Distribution 7.x. (Graham Taylor, Olivier)
- * When the subtensor inputs had 0 dimensions and the outputs 0 dimensions. (Frederic)
- * Crash when the step to subtensor was not 1 in conjunction with some optimization. (Frederic, reported by Olivier Chapelle)
- * Runtime crash related to an optimization with subtensor of alloc (reported by Razvan, fixed by Frederic)
- * Fix dot22scalar cast of integer scalars (Justin Bayer, Frédéric, Olivier)
- * Fix runtime crash in gemm, dot22. FB
- * Fix on 32bits computer: make sure all shape are int64.(Olivier)
- * Fix to deque on python 2.4 (Olivier)
- * Fix crash when not using C code (or using DebugMode) (not used by
-   default) with numpy 1.6*. Numpy has a bug in the reduction code that
-   made it crash. (Pascal)
- * Crashes of blas functions (Gemv on CPU; Ger, Gemv and Gemm on GPU)
-   when matrices had non-unit stride in both dimensions (CPU and GPU),
-   or when matrices had negative strides (GPU only). In those cases,
-   we are now making copies. (Pascal)
- * More cases supported in AdvancedIncSubtensor1. (Olivier D.)
- * Fix crash when a broadcasted constant was used as input of an
-   elemwise Op and needed to be upcasted to match the op's output.
-   (Reported by John Salvatier, fixed by Pascal L.)
- * Fixed a memory leak with shared variable (we kept a pointer to the original value) (Ian G.)
+Sparse Sandbox graduate (moved from theano.sparse.sandbox.sp)
+ * sparse.remove0 (Frederic B., Nicolas B.)
+ * sparse.sp_sum(a, axis=None) (Nicolas B.)
+   * bugfix: the non-structured grad was returning a structured grad.
+ * sparse.{col_scale,row_scale,ensure_sorted_indices,clean} (Nicolas B.)
+ * sparse.{diag,square_diagonal} (Nicolas B.)

+Sparse
+ * Support for uint* dtype.
+ * Implement theano.sparse.mul(sparse1, sparse2) when both inputs don't
+   have the same sparsity pattern. (Frederic B.)
+ * New Ops: sparse.{expm1,deg2rad,rad2deg,trunc} (Nicolas B.)
+ * New Ops: sparse.{sqrt,sqr,log1p,floor,ceil,sgn,round_half_to_even} (Nicolas B.)
+ * New Ops: sparse.{arctanh,tanh,arcsinh,sinh,arctan,arcsin,tan,sin} (Nicolas B.)
+ * New functions: structured_{add,exp,log,pow,minimum,maximum,sigmoid} (Yann D., Nicolas B.)
+     * Optimized ops: StructuredAddSV, StructuredAddSVCSR (inserted automatically)
+ * New Op: sparse.mul_s_v multiplication of sparse matrix by broadcasted vector (Yann D.)
+ * New Op: sparse.Cast() (Yann D., Nicolas B.)
+   * Add sparse_variable.astype() and theano.sparse.cast() and
+     theano.sparse.{b,w,i,l,f,d,c,z}cast() like their tensor equivalents (Nicolas B.)
+ * Op class: SamplingDot (Yann D., Nicolas B.)
+   * Optimized version: SamplingDotCsr, StructuredDotCSC
+   * Optimizations to insert the optimized versions: local_sampling_dot_csr, local_structured_add_s_v
+ * New Ops: sparse.{Multinomial,Poisson,Binomial} (Yann D., NB)
+ * Implement the CSMProperties grad method (Yann Dauphin)
+ * Move optimizations to theano/sparse/opt.py (Nicolas B.)

-Known bugs:
- * CAReduce with nan in inputs don't return the good output (`Ticket <https://www.assembla.com/spaces/theano/tickets/763>`_).
-     * This is used in tensor.{max,mean,prod,sum} and in the grad of PermuteRowElements.
+New flags
+ * `profile=True` flag now prints the sum of all printed profiles. (Frederic B.)
+   * It works with the linkers vm/cvm (default).
+   * Also prints compile time, optimizer time and linker time.
+   * Also prints a summary by op class.
+ * new flag "profile_optimizer" (Frederic B.)
+   When profile=True, will also print the time spent in each optimizer.
+   Useful to find optimization bottlenecks.
+ * new flag "cmodule.remove_gxx_opt" (Frederic B.)
+   If True, will remove the -O* parameters passed to g++.
+   This is useful to debug modules compiled by Theano in gdb.
+   The parameter -g is passed by default to g++.
+ * new flag cmodule.compilation_warning
+   If True, will print compilation warnings.
+ * new flag `allow_gc` (Frederic B.)
+   When False, do not garbage collect intermediate results when they are not needed.
+   This uses more memory, but allocates memory less frequently, so it is faster.
+ * new flag `vm.lazy` (Frederic B.)
+   Useful only for the vm linkers. When lazy is None,
+   auto detect if lazy evaluation is needed and use the appropriate
+   version. If lazy is True/False, force the version used between
+   Loop/LoopGC and Stack.
+ * new flag `cxx`. This is the C++ compiler to use. If empty, do not compile C code. (Frederic B.)
+ * New flag `print_active_device` that defaults to True. (Matthew R.)

+Documentation
+ * Added in the tutorial documentation on how to extend Theano.
+   This explains how to make a Theano Op from a Python function.
+   http://deeplearning.net/software/theano/tutorial/extending_theano.html
+   (Frederic B.)
+ * New installation instructions for Windows using EPD (Pascal L.)
+ * New installation on Windows by using a Linux VM from ContinuumIO (Frederic B.)
+ * Revisions of Theano tutorial and addition of exercises to it. (Eric L.)
+ * New tutorial on Sparse variable. (Nicolas B., Sébastien Lemieux, Frederic Bastien)
+   http://www.deeplearning.net/software/theano/tutorial/sparse.html
+ * Installation documentation for CentOS6 (Frederic B.)
+ * Installation documentation for Ubuntu (with GPU) (Frederic B., Matthias Zoehrer)
+ * Doc typo fixes, doc updates, better error messages: Olivier D., David W.F., Frederic B., James B., Matthew Rocklin, Ian G.
+ * Python Memory Management (Steven Pigeon, Olivier D.)

-Sandbox:
- * cvm interface more consistent with current linker. (James)
-   * Now all tests pass with the linker=cvm flags.
- * vm linker has a callback parameter. (James)
- * review/finish/doc: diag/extract_diag. (Arnaud Bergeron, Frederic, Olivier)
- * review/finish/doc: AllocDiag/diag. (Arnaud, Frederic, Guillaume)
- * review/finish/doc: MatrixInverse, matrix_inverse. (Razvan)
- * review/finish/doc: matrix_dot. (Razvan)
- * review/finish/doc: det (determinent) op. (Philippe Hamel)
- * review/finish/doc: Cholesky determinent op. (David)
- * review/finish/doc: ensure_sorted_indices. (Li Yao)
- * review/finish/doc: spectral_radius_boud. (Xavier Glorot)
- * review/finish/doc: sparse sum. (Valentin Bisson)
- * review/finish/doc: Remove0 (Valentin)
- * review/finish/doc: SquareDiagonal (Eric)
+Proposal
+ * Math framework for complex gradients (Pascal L.)


-Sandbox New features (not enabled by default):
- * CURAND_RandomStreams for uniform and normal (not picklable, GPU only) (James)
- * New sandbox.linalg.ops.pinv(pseudo-inverse) op (Razvan)
+Internal changes
+ * Define new exceptions MissingInputError and UnusedInputError, and use them
+   in theano.function, instead of TypeError and ValueError. (Pascal L.)
+ * Better handling of bitwidth and max values of integers and pointers
+   across platforms (Pascal L.)
+ * Made a few Ops with C code versioned to reduce compilation time.
+   (Frederic B, Pascal L.)
+ * Better deletion of files in the compiledir (Frederic B.)
+ * Safer import on sort op (Nicolas Pinto)
+ * hash_from_dict for elemwise op (Frederic B.)
+ * Renamed BadCLinkerOutput into BadThunkOutput. (PL)
+ * tensor.utils.shape_of_variables (Matthew R.)
+ * Add the numpy abi version and g++/nvcc version in the key of compiled code. (Frederic B.)
+ * env.replace_all_validate_remove (Frederic B.)
+   This allows global optimizers to ensure they removed some nodes from the graph.
+   This is a generic way to catch errors that would otherwise duplicate
+   computation.
+   * It was used for GEMM and Scan optimization (Frederic B., Razvan P.)
+ * Fix how exceptions are raised in GPU code (James B.)
+ * Made code respect pep8: OD, Fred, Pascal L., Nicolas Bouchard, Eric Larsen and others.
+ * TensorType and CudaNdarrayType now have a value_zeros method that calls CudaNdarray.zeros or
+   numpy.zeros with the right dtype. (Pascal L., Olivier D.)
+   This allows the same code to work with both types.
+ * Renamed FunctionGraph.extend function to FunctionGraph.attach_feature. (Ian G.)
+ * New exception MissingGXX when we try to compile but there is no cxx compiler. (Frederic B.)
+ * New function theano.gof.utils.give_variables_names(...) that gives unique names to variables. (Matthew R.)
+ * Use the new NumPy C-API most of the time, for later NumPy releases. (Frederic B.)
+ * New theano.gof.sched.sort_apply_nodes() that will allow other execution ordering. (Matthew R.)
+ * New attribute sort_schedule_fn, a way to specify a scheduler to use. (Matthew R.)

+Crash Fix
+ * Fix import name conflict (usaar33, Frederic B.)
+    * This makes Theano work with PiCloud.
+ * Do not try to use the BLAS library when blas.ldflags is manually set to an
+   empty string (Frederic B., Pascal L.)
+ * When importing theano on a computer without GPU with the Theano
+   flags 'device' or 'init_gpu_device' set to gpu* (Frederic B., reported by Luo Heng)
+ * Optimization printed a useless error when scipy was not available. (Frederic B.)
+ * GPU conv crash/slowdown on newer hardware (James B.)
+ * Better error handling in GPU conv (Frederic B.)
+ * GPU optimization that moves element-wise Ops to the GPU. Crash happened in
+   a particular execution order of this optimization and the
+   element-wise fusion optimization when upcasting some inputs to
+   float32 (to compute them on the GPU).
+   (Frederic B., reported by Sander Dieleman)
+ * GpuReshape in some particular cases when the input is not contiguous
+   (Frederic B., reported by Sander Dieleman)
+ * GpuSoftmaxWithBias with shape (0, N) with N > 1.
+   (Frederic B., reported by Razvan P.)
+ * Fix crash under 64-bit Windows, when taking subtensors of the form a[n:]
+   (Pascal L., reported by Simon McGregor)
+ * Fixed issue with the MaxAndArgmax Op not properly preserving broadcastable
+   dimensions, which could typically result in optimization crashes (Olivier D.)
+ * Fixed crash when concatenating some arrays with specific broadcasting
+   patterns (Olivier D.)
+ * Work around a known issue with nvcc 4.1 on MacOS X. (Graham Taylor)
+ * In advanced indexing, if some inputs are constant, no need to call constant(...)
+   on their value any more. (Pascal L., reported by John Salvatier)
+ * Fix crash on GPU when the GpuSubtensor didn't put the right stride
+   when the result tensor had a dimension of size 1. (Pascal L.,
+   reported by Graham T.)
+ * Fix scan crash that made it not run on the GPU in one case. (Guillaume D.)
+ * If you grad against a random state, don't crash (Razvan P.)
+ * GpuDownsampleFactorMax and its grad with inputs dimensions 0 and 1 bigger than 65535.
+   (Frederic B., reported by Gabe Schwartz)
+ * Potential crash due to parallel compilation when importing theano.sandbox.cuda
+   (Olivier D.)
+ * Crash fix on python 2.4 with slicing. (Pascal L.)
+ * grad of argmin and argmax (Razvan P.)
+ * Don't compute the Rop for shared variables with updates (mostly random).
+   We don't use them and they caused crashes. (Razvan P.)
+ * MaxArgmax.grad() when one of the gradients it receives is None. (Razvan P., reported by Mark Fenner)
+ * Fix crash of GpuSum when some dimensions shape was 0. (Frederic B.)

-Documentation:
- * Many updates. (Many people)
- * Updates to install doc on MacOS. (Olivier)
- * Updates to install doc on Windows. (David, Olivier)
- * Doc on the Rop function (Ian)
- * Added how to use scan to loop with a condition as the number of iteration. (Razvan)
- * Added how to wrap in Theano an existing python function (in numpy, scipy, ...). (Frederic)
- * Refactored GPU installation of Theano. (Olivier)
+Tests
+ * Use less memory (Olivier D.) (fix crash on 32-bit computers)
+ * Fix test with Theano flag "blas.ldflags=". (Frederic B., Pascal L.)
+ * Fix crash with advanced subtensor and numpy constant.
+ * Fix random tests crash due to random value. (Pascal L.)
+ * Always introduce Alloc nodes when calling alloc and let the optimizer remove them if needed.
+   This allows DebugMode to catch some shape errors. (Pascal L.)
+ * DebugMode now checks the view_map for all types of Theano variables.
+   It used to check only variables of tensor type. (Frederic B.)

+Others
+ * Remove Python warnings for some Python versions. (Gabe Schwartz)
+ * Remove useless fill op in fast_compile mode to make the graph more readable. (Frederic B.)
+ * Remove GpuOuter as it is a subset of the new GpuGer (Frederic B.)
+ * Now we use http://travis-ci.org/ to run all CPU tests (without SciPy)
+   with the default mode on all Pull Requests.
+   This should make the trunk more stable. (Frederic B.)
+ * Our nightly buildbot now checks on Python 2.4 (Frederic B.)
+   This should make the trunk work on it more frequently.

-Others:
- * Better error messages in many places. (Many people)
- * PEP8 fixes. (Many people)
- * Add a warning about numpy bug when using advanced indexing on a
-   tensor with more than 2**32 elements (the resulting array is not
-   correctly filled and ends with zeros). (Pascal, reported by David WF)
- * Added Scalar.ndim=0 and ScalarSharedVariable.ndim=0 (simplify code) (Razvan)
- * New min_informative_str() function to print graph. (Ian)
- * Fix catching of exception. (Sometimes we used to catch interrupts) (Frederic, David, Ian, Olivier)
- * Better support for utf string. (David)
- * Fix pydotprint with a function compiled with a ProfileMode (Frederic)
-     * Was broken with change to the profiler.
- * Warning when people have old cache entries. (Olivier)
- * More tests for join on the GPU and CPU. (Frederic)
- * Do not request to load the GPU module by default in scan module. (Razvan)
- * Fixed some import problems. (Frederic and others)
- * Filtering update. (James)
- * On Windows, the default compiledir changed to be local to the
-   computer/user and not transferred with roaming profile. (Sebastian
-   Urban)
- * New theano flag "on_shape_error". Defaults to "warn" (same as previous behavior):
-   it prints a warning when an error occurs when inferring the shape of some apply node.
-   The other accepted value is "raise" to raise an error when this happens. (Frederic)
- * The buidbot now raises optimization/shape errors instead of just printing a warning. (Frederic)
- * better pycuda tests (Frederic)
- * check_blas.py now accept the shape and the number of iteration as parameter (Frederic)
- * Fix opt warning when the opt ShapeOpt is disabled (enabled by default) (Frederic)
- * More internal verification on what each op.infer_shape return. (Frederic, James)
- * Argmax dtype to int64 (Olivier)
- * Improved docstring and basic tests for the Tile Op (David).
+Other thanks:
+ * blaxill reported an error introduced into the trunk.

-Reviewers (alphabetical order):
- * David, Frederic, Ian, James, Olivier, Razvan
+New stuff that will probably be reworked/removed before the release
+ * new flag "time_seq_optimizer" (Frederic B.)
+ * new flag "time_eq_optimizer" (Frederic B.)
+ * Better PyCUDA sharing of the GPU context. (fix crash at exit) (Frederic B.)
+   TODO: there is still a crash at exit!