testgroup / pytensor · Commits

Commit 547e54c8
Authored Oct 01, 2012 by lamblin
Merge pull request #979 from nouiz/0.6rc1

0.6rc1

Parents: b131e669, 4f5f7d6f
Showing 7 changed files with 683 additions and 282 deletions
HISTORY.txt              +291   -0
NEWS.txt                 +376   -265
doc/NEWS.txt             +3     -3
doc/conf.py              +2     -2
doc/introduction.txt     +8     -9
doc/library/config.txt   +1     -1
setup.py                 +2     -2
HISTORY.txt
View file @ 547e54c8

@@ -4,6 +4,297 @@
=================
Old Release Notes
=================
Theano 0.5 (23 February 2012)
=============================
Highlight:
* Moved to github: http://github.com/Theano/Theano/
* Old trac ticket moved to assembla ticket: http://www.assembla.com/spaces/theano/tickets
* Theano vision: http://deeplearning.net/software/theano/introduction.html#theano-vision (Many people)
* Theano with GPU works in some cases on Windows now. Still experimental. (Sebastian Urban)
* Faster dot() call: New/Better direct call to cpu and gpu ger, gemv, gemm
and dot(vector, vector). (James, Frédéric, Pascal)
* C implementation of Alloc. (James, Pascal)
* theano.grad() now also works with sparse variables. (Arnaud)
* Macro to implement the Jacobian/Hessian with theano.tensor.{jacobian,hessian} (Razvan)
* See the Interface changes.
Interface Behavior Changes:
* The current default value of the parameter axis of
theano.{max,min,argmax,argmin,max_and_argmax} is now the same as
numpy: None. i.e. operate on all dimensions of the tensor.
(Frédéric Bastien, Olivier Delalleau) (was deprecated and generated
a warning since Theano 0.3 released Nov. 23rd, 2010)
* The current output dtype of sum with input dtype [u]int* is now always [u]int64.
You can specify the output dtype with a new dtype parameter to sum.
The output dtype is the one used for the summation.
There was no warning about this in previous Theano versions.
The consequence is that the sum is done in a dtype with more precision than before,
so the sum could be slower, but will be more resistant to overflow.
This new behavior is the same as numpy. (Olivier, Pascal)
* When using a GPU, detect faulty nvidia drivers. This was detected
when running the Theano tests. Now this is always tested. Faulty
drivers result in wrong results for reduce operations. (Frederic B.)
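For illustration, a minimal sketch of the two new defaults above (the example code is ours; names are illustrative)::

    import theano.tensor as T

    x = T.imatrix('x')             # int32 input
    m = T.max(x)                   # axis defaults to None: max over all dimensions
    s = T.sum(x)                   # now accumulates in int64 by default
    s32 = T.sum(x, dtype='int32')  # an explicit dtype restores the old behavior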
Interface Features Removed (most were deprecated):
* The string modes FAST_RUN_NOGC and STABILIZE are not accepted. They
were accepted only by theano.function().
Use Mode(linker='c|py_nogc') or Mode(optimizer='stabilize') instead.
* tensor.grad(cost, wrt) now always returns an object of the "same type" as wrt
(list/tuple/TensorVariable). (Ian Goodfellow, Olivier)
* The few remaining tag.shape and Join.vec_length uses have been removed. (Frederic)
* The .value attribute of shared variables is removed, use shared.set_value()
or shared.get_value() instead. (Frederic)
* Theano config option "home" is not used anymore as it was redundant with "base_compiledir".
If you use it, Theano will now raise an error. (Olivier D.)
* scan interface changes: (Razvan Pascanu)
* The use of `return_steps` for specifying how many entries of the output
to return has been removed. Instead, apply a subtensor to the output
returned by scan to select a certain slice.
* The inner function (that scan receives) should return its outputs and
updates following this order:
[outputs], [updates], [condition].
One can skip any of the three if not used, but the order has to stay unchanged.
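For illustration, a minimal sketch of the inner-function contract described above (the step function and names are ours)::

    import theano
    import theano.tensor as T

    x = T.vector('x')

    def step(x_t, acc_tm1):
        # return outputs first; updates and condition, if any, must follow in that order
        return acc_tm1 + x_t

    outputs, updates = theano.scan(fn=step,
                                   sequences=x,
                                   outputs_info=T.zeros(()))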
Interface bug fix:
* Rop in some cases should have returned a list of one Theano variable,
but returned the variable itself. (Razvan)
New deprecation (will be removed in Theano 0.6, warning generated if you use them):
* tensor.shared() renamed to tensor._shared(). You probably want to
call theano.shared() instead! (Olivier D.)
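For illustration, a minimal sketch of the replacement API (the array values are ours)::

    import numpy
    import theano

    w = theano.shared(numpy.zeros(3))  # call theano.shared(), not tensor.shared()
    w.set_value(numpy.ones(3))         # the removed .value attribute is replaced
    v = w.get_value()                  # by explicit set_value()/get_value()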
Bug fixes (incorrect results):
* On CPU, if the convolution had received explicit shape information,
it was not checked at runtime. This caused wrong results if the
input shape was not the one expected. (Frederic, reported by Sander
Dieleman)
* Theoretical bug: in some cases GPUSum could return a bad value.
We were not able to reproduce this problem
* patterns affected ({0,1}*nb dim, 0 no reduction on this dim, 1 reduction on this dim):
01, 011, 0111, 010, 10, 001, 0011, 0101 (Frederic)
* div by zero in verify_grad. This hid a bug in the grad of Images2Neibs. (James)
* theano.sandbox.neighbors.Images2Neibs grad was returning a wrong value.
The grad is now disabled and returns an error. (Frederic)
* An expression of the form "1 / (exp(x) +- constant)" was systematically matched to "1 / (exp(x) + 1)"
and turned into a sigmoid regardless of the value of the constant. A warning will be issued if your
code was affected by this bug. (Olivier, reported by Sander Dieleman)
* When indexing into a subtensor of negative stride (for instance, x[a:b:-1][c]),
an optimization replacing it with a direct indexing (x[d]) used an incorrect formula,
leading to incorrect results. (Pascal, reported by Razvan)
* The tile() function is now stricter in what it accepts to allow for better
error-checking/avoiding nonsensical situations. The gradient has been
disabled for the time being as it only implemented (incorrectly) one special
case. The `reps` argument must be a constant (not a tensor variable), and
must have the same length as the number of dimensions in the `x` argument;
this is now checked. (David)
Scan fixes:
* computing the grad of a function of the grad of scan (reported by Justin Bayer, fix by Razvan)
before: it crashed most of the time, but could return a wrong value with a bad number of dimensions (so a visible bug)
now: does the right thing.
* gradient with respect to outputs using multiple taps (reported by Timothy, fix by Razvan)
before: it used to return wrong values
now: does the right thing.
Note: The reported case of this bug happened in conjunction with the
save-memory optimization of scan, which gave run-time errors. So if you
didn't manually disable that memory optimization,
you are fine as long as you didn't manually request multiple taps.
* Rop of the gradient of scan (reported by Timothy and Justin Bayer, fix by Razvan)
before: compilation error when computing the R-op
now: does the right thing.
* save-memory optimization of scan (reported by Timothy and Nicolas BL, fix by Razvan)
before: certain corner cases used to result in a runtime shape error
now: does the right thing.
* Scan grad when the input of scan has sequences of different lengths. (Razvan, reported by Michael Forbes)
* Scan.infer_shape now works correctly when working with a condition for the number of loops.
In the past, it returned n_steps as the length, which is not always true. (Razvan)
* Scan.infer_shape crash fix. (Razvan)
New features:
* AdvancedIncSubtensor grad defined and tested (Justin Bayer)
* Adding 1D advanced indexing support to inc_subtensor and set_subtensor (James Bergstra)
* tensor.{zeros,ones}_like now support the dtype param as numpy (Frederic)
* Added configuration flag "exception_verbosity" to control the verbosity of exceptions (Ian)
* theano-cache list: list the content of the theano cache (Frederic)
* theano-cache unlock: remove the Theano lock (Olivier)
* tensor.ceil_int_div to compute ceil(a / float(b)) (Frederic)
* MaxAndArgMax.grad now works with any axis (The op supports only 1 axis) (Frederic)
* used by tensor.{max,min,max_and_argmax}
* tensor.{all,any} (Razvan)
* tensor.roll as numpy: (Matthew Rocklin, David Warde-Farley)
* Theano with GPU works in some cases on Windows now. Still experimental. (Sebastian Urban)
* IfElse now allows a list/tuple as the result of the if/else branches.
* They must have the same length and corresponding types. (Razvan)
* Argmax output dtype is now int64 instead of int32. (Olivier)
* Added the element-wise operation arccos. (Ian)
* Added sparse dot with dense grad output. (Yann Dauphin)
* Optimization to use Usmm and UsmmCscDense in some cases. (Yann)
* Note: theano.dot and theano.sparse.structured_dot() always had a gradient with the same sparsity pattern as the inputs.
The new theano.sparse.dot() has a dense gradient for all inputs.
* GpuAdvancedSubtensor1 supports broadcasted dimensions. (Frederic)
* TensorVariable.zeros_like() and SparseVariable.zeros_like()
* theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.device_properties() (Frederic)
* theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info() return free and total gpu memory (Frederic)
* Theano flags compiledir_format. Keep the same default as before: compiledir_%(platform)s-%(processor)s-%(python_version)s. (Josh Bleecher Snyder)
* We also support the "theano_version" substitution.
* IntDiv C code (faster and allow this elemwise to be fused with other elemwise) (Pascal)
* Internal filter_variable mechanism in Type. (Pascal, Ian)
* Ifelse works on sparse.
* It makes the use of gpu shared variables more transparent with theano.function's updates and givens parameters.
* Added a_tensor.transpose(axes) axes is optional (James)
* theano.tensor.transpose(a_tensor, kwargs): we were ignoring kwargs; now they are used as the axes.
* a_CudaNdarray_object[*] = int, now works (Frederic)
* tensor_variable.size (as numpy) computes the product of the shape elements. (Olivier)
* sparse_variable.size (as scipy) computes the number of stored values. (Olivier)
* sparse_variable[N, N] now works (Li Yao, Frederic)
* sparse_variable[M:N, O:P] now works (Li Yao, Frederic, Pascal)
M, N, O, and P can be Python int or scalar tensor variables, None, or
omitted (sparse_variable[:, :M] or sparse_variable[:M, N:] work).
* tensor.tensordot can now be moved to GPU (Sander Dieleman,
Pascal, based on code from Tijmen Tieleman's gnumpy,
http://www.cs.toronto.edu/~tijmen/gnumpy.html)
* Many infer_shape implemented on sparse matrices op. (David W.F.)
* Added theano.sparse.verify_grad_sparse to easily allow testing grad of
sparse op. It supports testing the full and structured gradients.
* The keys in our cache now store the hash of constants and not the constant values
themselves. This is significantly more efficient for big constant arrays. (Frederic B.)
* 'theano-cache list' lists key files bigger than 1M (Frederic B.)
* 'theano-cache list' prints a histogram of the number of keys per compiled module (Frederic B.)
* 'theano-cache list' prints the number of compiled modules per op class (Frederic B.)
* The Theano flag "nvcc.fastmath" is now also used for the cuda_ndarray.cu file.
* Add the header_dirs to the hard part of the compilation key. This is
currently used only by cuda, but if we use libraries that are header-only,
this can be useful. (Frederic B.)
* The Theano flag "nvcc.flags" is now included in the hard part of the key.
This means that we now recompile all modules for each value of "nvcc.flags".
A change in "nvcc.flags" used to be ignored for modules that were already
compiled. (Frederic B.)
* Alloc, GpuAlloc are not always pre-computed (constant_folding optimization)
at compile time if all their inputs are constant.
(Frederic B., Pascal L., reported by Sander Dieleman)
* New Op tensor.sort(), wrapping numpy.sort (Hani Almousli)
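For illustration, a sketch of a few of the additions above (the names are ours)::

    import theano.tensor as T

    x = T.matrix('x')
    y = T.roll(x, shift=1, axis=0)  # tensor.roll, as in numpy
    n = x.size                      # product of the shape elements, as in numpy
    s = T.sort(x, axis=-1)          # tensor.sort, wrapping numpy.sort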
New optimizations:
* AdvancedSubtensor1 reuses preallocated memory if available (scan, c|py_nogc linker) (Frederic)
* dot22, dot22scalar work with complex. (Frederic)
* Generate Gemv/Gemm more often. (James)
* Remove scan when all computations can be moved outside the loop. (Razvan)
* scan optimization done earlier. This allows other optimizations to be applied. (Frederic, Guillaume, Razvan)
* exp(x) * sigmoid(-x) is now correctly optimized to the more stable form sigmoid(x). (Olivier)
* Added Subtensor(Rebroadcast(x)) => Rebroadcast(Subtensor(x)) optimization. (Guillaume)
* Made the optimization process faster. (James)
* Allow fusion of elemwise when the scalar op needs support code. (James)
* Better opt that lifts transpose around dot. (James)
Crashes fixed:
* T.mean crash at graph building time. (Ian)
* "Interactive debugger" crash fix. (Ian, Frederic)
* Do not call gemm with strides of 0; some BLAS implementations refuse it. (Pascal Lamblin)
* Optimization crash with gemm and complex. (Frederic)
* GPU crash with elemwise. (Frederic, some reported by Chris Currivan)
* Compilation crash with amdlibm and the GPU. (Frederic)
* IfElse crash. (Frederic)
* Execution crash fix in AdvancedSubtensor1 on 32 bit computers. (Pascal)
* GPU compilation crash on MacOS X. (Olivier)
* Support for OSX Enthought Python Distribution 7.x. (Graham Taylor, Olivier)
* When the subtensor inputs had 0 dimensions and the outputs 0 dimensions. (Frederic)
* Crash when the step to subtensor was not 1 in conjunction with some optimization. (Frederic, reported by Olivier Chapelle)
* Runtime crash related to an optimization with subtensor of alloc (reported by Razvan, fixed by Frederic)
* Fix dot22scalar cast of integer scalars (Justin Bayer, Frédéric, Olivier)
* Fix runtime crash in gemm, dot22. FB
* Fix on 32-bit computers: make sure all shapes are int64. (Olivier)
* Fix to deque on python 2.4 (Olivier)
* Fix crash when not using C code (or using DebugMode) (not used by
default) with numpy 1.6*. Numpy has a bug in the reduction code that
made it crash. (Pascal)
* Crashes of blas functions (Gemv on CPU; Ger, Gemv and Gemm on GPU)
when matrices had non-unit stride in both dimensions (CPU and GPU),
or when matrices had negative strides (GPU only). In those cases,
we are now making copies. (Pascal)
* More cases supported in AdvancedIncSubtensor1. (Olivier D.)
* Fix crash when a broadcasted constant was used as input of an
elemwise Op and needed to be upcasted to match the op's output.
(Reported by John Salvatier, fixed by Pascal L.)
* Fixed a memory leak with shared variable (we kept a pointer to the original value) (Ian G.)
Known bugs:
* CAReduce with nan in inputs doesn't return the right output (`Ticket <https://www.assembla.com/spaces/theano/tickets/763>`_).
* This is used in tensor.{max,mean,prod,sum} and in the grad of PermuteRowElements.
Sandbox:
* cvm interface more consistent with current linker. (James)
* Now all tests pass with the linker=cvm flag.
* vm linker has a callback parameter. (James)
* review/finish/doc: diag/extract_diag. (Arnaud Bergeron, Frederic, Olivier)
* review/finish/doc: AllocDiag/diag. (Arnaud, Frederic, Guillaume)
* review/finish/doc: MatrixInverse, matrix_inverse. (Razvan)
* review/finish/doc: matrix_dot. (Razvan)
* review/finish/doc: det (determinant) op. (Philippe Hamel)
* review/finish/doc: Cholesky determinant op. (David)
* review/finish/doc: ensure_sorted_indices. (Li Yao)
* review/finish/doc: spectral_radius_bound. (Xavier Glorot)
* review/finish/doc: sparse sum. (Valentin Bisson)
* review/finish/doc: Remove0 (Valentin)
* review/finish/doc: SquareDiagonal (Eric)
Sandbox New features (not enabled by default):
* CURAND_RandomStreams for uniform and normal (not picklable, GPU only) (James)
* New sandbox.linalg.ops.pinv(pseudo-inverse) op (Razvan)
Documentation:
* Many updates. (Many people)
* Updates to install doc on MacOS. (Olivier)
* Updates to install doc on Windows. (David, Olivier)
* Doc on the Rop function (Ian)
* Added how to use scan to loop with a condition as the number of iteration. (Razvan)
* Added how to wrap in Theano an existing python function (in numpy, scipy, ...). (Frederic)
* Refactored GPU installation of Theano. (Olivier)
Others:
* Better error messages in many places. (Many people)
* PEP8 fixes. (Many people)
* Add a warning about numpy bug when using advanced indexing on a
tensor with more than 2**32 elements (the resulting array is not
correctly filled and ends with zeros). (Pascal, reported by David WF)
* Added Scalar.ndim=0 and ScalarSharedVariable.ndim=0 (simplify code) (Razvan)
* New min_informative_str() function to print graph. (Ian)
* Fix catching of exceptions. (Sometimes we used to catch interrupts.) (Frederic, David, Ian, Olivier)
* Better support for UTF strings. (David)
* Fix pydotprint with a function compiled with a ProfileMode (Frederic)
* Was broken with change to the profiler.
* Warning when people have old cache entries. (Olivier)
* More tests for join on the GPU and CPU. (Frederic)
* Do not request to load the GPU module by default in scan module. (Razvan)
* Fixed some import problems. (Frederic and others)
* Filtering update. (James)
* On Windows, the default compiledir changed to be local to the
computer/user and not transferred with roaming profile. (Sebastian
Urban)
* New theano flag "on_shape_error". Defaults to "warn" (same as previous behavior):
it prints a warning when an error occurs when inferring the shape of some apply node.
The other accepted value is "raise" to raise an error when this happens. (Frederic)
* The buildbot now raises optimization/shape errors instead of just printing a warning. (Frederic)
* better pycuda tests (Frederic)
* check_blas.py now accepts the shape and the number of iterations as parameters (Frederic)
* Fix opt warning when the opt ShapeOpt is disabled (enabled by default) (Frederic)
* More internal verification on what each op.infer_shape returns. (Frederic, James)
* Argmax dtype to int64 (Olivier)
* Improved docstring and basic tests for the Tile Op (David).
Reviewers (alphabetical order):
* David, Frederic, Ian, James, Olivier, Razvan
Theano 0.4.1 (12 August 2011)
...
NEWS.txt
View file @ 547e54c8

@@ -8,294 +8,405 @@ https://github.com/Theano/Theano/wiki/Devnews
Release Notes
=============

Theano 0.6rc1 (1 October 2012)
==============================

Highlight:
 * Bug fixes, crash fixes, CPU and GPU speed up.
 * theano_var.eval({other_var: val[, ...]}) to simplify the usage of Theano. (Ian G.)
 * New default linker `cvm`. This is the execution engine that tells which op to run in which order.
   It is now implemented in C and enables lazy evaluation of the ifelse op.
 * Faster theano.function compilation. (Pascal L., Ian G.)
 * Big sparse submodule update and documentation of it. (Nicolas Bouchard)
 * Use of the GPU asynchronous functionality. (Frederic B.)
 * Better Windows support.

Known bug:
 * A few crash cases that will be fixed by the final release.
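For illustration, a minimal sketch of the new eval() shortcut (names and values are ours)::

    import theano.tensor as T

    x = T.dscalar('x')
    y = 2 * x + 1
    print(y.eval({x: 3.0}))  # evaluates y without an explicit theano.function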
Bug fixes:
 * Outputs of Scan nodes could contain corrupted values: some parts of the
   output would be repeated a second time, instead of the correct values.
   It happened randomly, and quite infrequently, but the bug has been present
   (both in Python and Cython) since April 2011. (Pascal L.)
 * In the Sparse sandbox, fix the grad of theano.sparse.sandbox.sp.row_scale.
   It did not return the right number of elements. (Frederic B.)
 * set_subtensor(x[int vector], new_value), when moved to the GPU,
   was transformed into inc_subtensor on the GPU. Now we have a correct
   (but slow) GPU implementation.
   Note 1: set_subtensor(x[slice[, ...]], new_value) was working correctly
   in all cases, as were all inc_subtensor.
   Note 2: If your code was affected by the incorrect behavior, we now print
   a warning by default. (Frederic B.)
 * Fixed an issue whereby config values were used as default arguments,
   with those defaults then stuck at old values if the config variables were
   changed during program execution. (David W-F)
 * Fixed many subtle bugs involving mutable default arguments which may have
   led to unexpected behaviour, such as objects sharing instance variables
   they were not supposed to share. (David W-F)
 * Correctly record the GPU device number used when we let the driver select it.
   (Frederic B.)
 * CAReduce with NaN in inputs didn't return the right output. (Pascal L.)
   * This is used in tensor.{all,any,max,mean,prod,sum} and in the grad of PermuteRowElements.
 * The grad of TensorDot was returning the wrong shape for some combinations of axes.
   We now raise NotImplementedError in those cases. (Frederic B.)
 * conv2d with subsample > 2 returned wrong values. (Pascal L.)
   * Fixed when mode == valid, disabled when mode == full.
 * The theano.sparse.CSMGrad op (generated by the grad of CSM) didn't
   correctly handle unsorted inputs, nor gradients that are more sparse
   than the input; in those cases, a bad result was returned. This could
   happen only when a sparse input of a Theano function was not sorted,
   for example with sparse advanced indexing from scipy. The consequence
   was usually NaN in the graph. (Yann Dauphin)
 * theano.sparse._dot(CSC matrix, dense): the optimized version UsmmCSCDense
   didn't correctly handle non-contiguous inputs/outputs. (Pascal L.)
 * Fix a corner case in CVM updates. (Pascal L.)
   This happens if, after optimization, the update to a shared variable is
   the variable itself. The CVM was not used by default.
 * Fix the view_map of sparse.Transpose and sparse.sandbox.sp.RowScale. (Frederic B.)
   This probably didn't cause problems, as only the UsmmCscDense op
   (used in calls to Usmm with CSC matrices) could interfere with them.

Deprecation:
 * Deprecated the Module class. (Ian G.)
   This was a predecessor of SharedVariable with a less Pythonic philosophy.
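For illustration, the set_subtensor pattern fixed above looks like this (a minimal sketch; the names are ours)::

    import theano.tensor as T

    x = T.vector('x')
    idx = T.ivector('idx')
    y = T.set_subtensor(x[idx], 0.)  # int-vector indexing now has a correct
                                     # (but slow) GPU implementation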
Interface changes:
 * The base version requirements are now numpy >= 1.5.0 and the optional scipy >= 0.8.
 * In Theano 0.5, we removed the deprecated sharedvar.value property.
   Now we raise an error if you access it. (Frederic B.)
 * theano.function does not accept duplicate inputs, so function([x, x], ...)
   does not work anymore. (Pascal L.)
 * theano.function now raises an error if some of the provided inputs are
   not part of the computational graph needed to compute the output, for
   instance, function([x, y], [y]). You can use the kwarg
   ``on_unused_input={'raise', 'warn', 'ignore'}`` to control this.
   (Pascal L.)
 * New Theano flag "on_unused_input" that defines the default value of the
   previous point. (Frederic B.)
 * tensor.alloc() now raises an error at graph build time
   when we try to create fewer dimensions than the provided value has.
   In the past, the error was raised at run time. (Frederic B.)
 * Removed theano.Value and related stuff. (Ian G.)
   This was a test of what ended up as SharedVariable.
 * Renamed Env to FunctionGraph, and the object attribute "env" to "fgraph". (Ian G.)
   A deprecation warning is printed when you try to access the "env" attribute.
 * Renamed the FunctionGraph.nodes attribute to FunctionGraph.apply_nodes. (Ian G.)
 * Warn when we don't correctly handle the parameters in the Theano flag `nvcc.flags`.
   (Frederic B.)
 * Do not reorder the user flags passed to the compiler; they get set after the other flags. (Frederic B.)
 * Make setuptools optional / remove the dependency. (Ilan Schnell)
 * We warn when a user tries to use an old GPU that we don't test with.
   This could cause crashes and will also be very slow. (Frederic B.)
 * Make theano.grad able to differentiate between not-implemented, undefined and disconnected grads.
   An Op.grad function should return theano.gradient.{grad_not_implemented,grad_undefined} or
   something of DisconnectedType. (Ian G.)
 * Make theano.grad expect to always receive a float or undefined
   gradient, and enforce that ops with integer output values always
   return 0. (Ian G.)
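For illustration, a minimal sketch of the new unused-input handling (the names are ours)::

    import theano
    import theano.tensor as T

    x = T.scalar('x')
    y = T.scalar('y')
    # x is not needed to compute y; the default would now raise an error
    f = theano.function([x, y], y, on_unused_input='warn')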
New memory output contract (announced in the release notes of Theano 0.5):
 * The output memory received can now have been preallocated by other code.
   In the past, it was always the previous output that the Apply node had allocated.
   This means the shape and strides can differ from those of the previous call,
   and there can be links to this memory at other places.
   This could also mean receiving preallocated output that is not c_contiguous,
   but we don't do that for now. (Pascal L.)
 * New Theano flag to test this: DebugMode.check_preallocated_output. (Pascal L.)
 * Updated a few ops to respect this contract. (Pascal L.)
New Features:
 * GPU scan now works (doesn't crash) when there is a mixture of float32 and other dtypes.
 * theano_var.eval({other_var: val[, ...]}) to simplify the usage of Theano. (Ian G.)
 * debugprint: new param ids=["CHAR", "id", "int", ""].
   This makes the printed identifier be the Python id, a unique char, a
   unique int, or not printed at all. We changed the default to "CHAR"
   as this is more readable. (Frederic B.)
 * debugprint: new param stop_on_name=[False, True]. If True, we don't print
   anything below an intermediate variable that has a name. Defaults to False.
   (Frederic B.)
 * debugprint no longer prints the "|" symbol in a column after the last input. (Frederic B.)
 * If you use the Enthought Python Distribution (EPD), we now use its blas
   implementation by default. (Frederic B., Graham Taylor, Simon McGregor)
 * MRG random now raises an error with a clear message when the passed shape
   contains dimensions with a bad value like 0. (Frederic B., reported by Ian G.)
 * "CudaNdarray[*] = ndarray" works in more cases. (Frederic B.)
 * "CudaNdarray[*] += ndarray" works in more cases. (Frederic B.)
 * We add dimensions to CudaNdarray to automatically broadcast more frequently.
   (Frederic B.)
 * New theano flag cmodule.warn_no_version. Default False. If True,
   prints a warning when compiling one or more Ops with C code that
   can't be cached because there is no c_code_cache_version() function
   associated with at least one of those Ops, which forces recompiling
   them every time. (Frederic B.)
 * CPU alloc now always generates C code. (Pascal L.)
 * C code reuses preallocated outputs (only done by Scan). (Pascal L.)
 * Garbage collection of intermediate results during Theano function calls
   for Ops with C code. (Pascal L.)
 * The Theano flag compiledir_format now supports the parameters "numpy_version" and "g++". (Frederic B.)
 * Theano GPU variables, shared variables and constants now support <, <=,
   > and >=, like those not on the GPU.
 * AdvancedIncSubtensor now supports the set_instead_of_inc parameter. (Eric L.)
 * Added advanced indexing support to inc_subtensor and set_subtensor. (Eric L.)
 * theano.tensor.{any,all,std,var,mean,prod,sum,argmin,argmax,min,max,max_and_argmax}
   have a new parameter, keepdims. (Eric L.)
   This allows broadcasting the result correctly against the input data, e.g. to normalize it.
 * The Updates object now checks that the keys are SharedVariables when we pass them
   to the __init__ function. (Pascal L.)
 * Set a Theano Variable name on the transposed op when the input has one. (Frederic B.)
 * The cvm linker now supports garbage collection (enabled by default). (James B., Arnaud B., Pascal L.)
 * The cvm linker is now the default linker.
   This puts the "loop" around the execution of apply nodes in C, lowering the overhead.
 * theano_variable[numpy.newaxis] is now supported. (James B.)
 * Enable ifelse on the GPU. (Frederic B.)
 * Correctly support numpy.memmap everywhere. (Pascal L.)
   We had partial support for them before. Just use the normal tensor operations
   on them and it should work.
   But take care not to exhaust your computer's memory! (we always generate normal ndarrays)
 * Added an optimization that stabilizes log(softmax(x)). (Ian G.)
 * Re-enabled the Images2Neibs grad. It was not broken; the problem was how we tested it. (Frederic B.)
 * If `theano_fn.trust_input` is set to True, don't check whether the inputs are valid
   when calling the theano function. (Frederic B.)
 * Added theano.tensor.blas.gem{m,v} as shortcuts.
 * theano.grad(..., add_names=True). False gives the old
   behavior; otherwise, it tries to name the grad variables. (Ian G.)
 * theano-nose (Pascal L.)
   A wrapper around nosetests that adds the needed extensions.
   * --profile-time option: print the time spent in each test. (Eric L.)
   * --batch option: allow running tests in batches to lower the memory requirement.
 * For m = mean(log(1 - sigm(x))) and x - scalar * theano.grad(m, x),
   there is a stabilization optimization.
   It is now applied more frequently. (Pascal L.)
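For illustration, a minimal sketch of the new keepdims parameter (the names are ours)::

    import theano.tensor as T

    x = T.matrix('x')
    m = T.mean(x, axis=1, keepdims=True)  # shape (rows, 1), so it broadcasts
    centered = x - m                      # against x without a dimshuffle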
New Op/function:
 * Added element-wise operation theano.tensor.{GammaLn,Psi}. (John Salvatier, Nicolas Bouchard)
 * Added element-wise operations theano.tensor.{arcsin,arctan,arccosh,arcsinh,arctanh,exp2,arctan2}. (Nicolas Bouchard)
 * Added element-wise operations theano.tensor.{gamma,conj,complex_from_polar,expm1,deg2rad,rad2deg,trunc}. (Nicolas Bouchard)
 * Added theano.tensor.argsort that wraps numpy.argsort. (Hani Almousli)
 * Added theano.tensor.diff that wraps numpy.diff. (Nicolas B.)
 * Added theano.tensor.bincount that wraps numpy.bincount. (Nicolas B., Pascal L., Frederic B.)
 * Added theano.tensor.squeeze. (Nicolas B.)
   This removes broadcasted dimensions from the variable.
   A Theano-esque version of numpy.squeeze.
 * Added theano.tensor.repeat that wraps numpy.repeat. (Nicolas B. + PL)
 * Added theano.tensor.bartlett that wraps numpy.bartlett. (Eric L.)
 * Added theano.tensor.fill_diagonal that wraps numpy.fill_diagonal. (Eric L., Frederic B.)
 * Added tensor.square, an alias for tensor.sqr, as in NumPy. (Ian G.)
 * Added a theano.tensor.load(path, dtype, broadcastable, mmap_mode=None) op
   that allows loading a .npy file into a theano graph. (Matthew Rocklin)
 * theano.sandbox.linalg.kron.py: Kron op (Kronecker product). (Eric L.)
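For illustration, a sketch of a few of the new wrappers (the names are ours)::

    import theano.tensor as T

    v = T.ivector('v')
    d = T.diff(v)      # wraps numpy.diff
    c = T.bincount(v)  # wraps numpy.bincount
    a = T.argsort(v)   # wraps numpy.argsort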
Speed up:
 * CPU convolutions are now parallelized. (Frederic B.)
   By default, they use all cores/hyper-threads.
   To control this, use the `OMP_NUM_THREADS=N` environment variable, where N is the
   number of parallel threads to use. By default, it is equal to the number of
   CPU cores/hyper-threads that you have.
   There is a new Theano flag `openmp` to allow/disallow openmp ops.
   If your BLAS library is parallelized, this flag won't affect it, but the
   env variable will.
 * Removed a corner case that duplicated dot22/gemm in the graph. (Frederic B., Ian G.)
 * Enable fusion of elemwise ops that have the same clients multiple times. (Frederic B.)
 * New optimization: remove reductions over broadcastable dimensions. (James B., Frederic B.)
 * Faster theano.function compilation. (Pascal L., Ian G.)
 * Removed GPU transfers around the specify_shape op. (Frederic B.)
 * Implemented/tested MANY op.infer_shape methods. (Eric Larsen)
   This allows Theano to make better shape inference.
 * Implemented Solve.infer_shape. (Matthew Rocklin)
 * The Scan memory optimization now works more frequently. (Razvan P.)
   There was a warning printed by the subtensor optimization in those cases.
 * Faster rng_mrg Python code (mostly used for tests). (Frederic B.)
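For illustration, one way to control the parallel CPU convolution from a script (a sketch; the thread count shown is arbitrary)::

    import os
    # must be set before the OpenMP runtime starts, i.e. before importing theano
    os.environ['OMP_NUM_THREADS'] = '4'
    import theano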
Speed up GPU:
 * Convolution on the GPU now checks the generation of the card to make
   it faster in some cases (especially for medium/big output images). (Frederic B.)
   * We had hardcoded 512 as the maximum number of threads per block. Newer cards
     support up to 1024 threads per block.
 * Faster GpuAdvancedSubtensor1, GpuSubtensor, GpuAlloc. (Frederic B.)
 * We now pass the GPU architecture to nvcc when compiling. (Frederic B.)
 * Now we use the GPU function async feature by default. (Frederic B.)
   Set the environment variable `CUDA_LAUNCH_BLOCKING` to `1` to disable this,
   for profiling or debugging.
 * Faster creation of CudaNdarray objects. (Frederic B.)
 * Some Max reductions are now implemented on the GPU. (Ian G.)
Sparse Sandbox graduates (moved from theano.sparse.sandbox.sp):
 * sparse.remove0 (Frederic B., Nicolas B.)
 * sparse.sp_sum(a, axis=None) (Nicolas B.)
   * bugfix: the non-structured grad was returning a structured grad.
 * sparse.{col_scale,row_scale,ensure_sorted_indices,clean} (Nicolas B.)
 * sparse.{diag,square_diagonal} (Nicolas B.)

Sparse:
 * Support for uint* dtypes.
 * Implemented theano.sparse.mul(sparse1, sparse2) when both inputs don't
   have the same sparsity pattern. (Frederic B.)
 * New Ops: sparse.{expm1,deg2rad,rad2deg,trunc} (Nicolas B.)
 * New Ops: sparse.{sqrt,sqr,log1p,floor,ceil,sgn,round_half_to_even} (Nicolas B.)
 * New Ops: sparse.{arctanh,tanh,arcsinh,sinh,arctan,arcsin,tan,sin} (Nicolas B.)
 * New functions: structured_{add,exp,log,pow,minimum,maximum,sigmoid} (Yann D., Nicolas B.)
   * Optimized ops: StructuredAddSV, StructuredAddSVCSR (inserted automatically)
 * New Op: sparse.mul_s_v, multiplication of a sparse matrix by a broadcasted vector (Yann D.)
 * New Op: sparse.Cast() (Yann D., Nicolas B.)
 * Added sparse_variable.astype() and theano.sparse.cast() and
   theano.sparse.{b,w,i,l,f,d,c,z}cast(), like their tensor equivalents (Nicolas B.)
 * Op class: SamplingDot (Yann D., Nicolas B.)
   * Optimized versions: SamplingDotCsr, StructuredDotCSC
   * Optimizations to insert the optimized versions: local_sampling_dot_csr, local_structured_add_s_v
 * New Ops: sparse.{Multinomial,Poisson,Binomial} (Yann D., NB)
 * Implemented the CSMProperties grad method. (Yann Dauphin)
 * Moved optimizations to theano/sparse/opt.py. (Nicolas B.)
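For illustration, a sketch of a few of the sparse additions (the names are ours; assumes SciPy is installed)::

    import theano.sparse as sparse

    x = sparse.csr_matrix('x')
    y = sparse.cast(x, 'float32')     # new sparse cast
    z = sparse.structured_sigmoid(x)  # one of the new structured functions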
New flags:
 * The `profile=True` flag now prints the sum of all printed profiles. (Frederic B.)
   * It works with the vm/cvm linkers (the default).
   * Also prints the compile time, optimizer time and linker time.
   * Also prints a summary by op class.
 * New flag "profile_optimizer". (Frederic B.)
   When profile=True, this will also print the time spent in each optimizer.
   Useful to find optimization bottlenecks.
 * New flag "cmodule.remove_gxx_opt". (Frederic B.)
   If True, removes the -O* parameters passed to g++.
   This is useful for debugging modules compiled by Theano in gdb.
   The -g parameter is passed to g++ by default.
 * New flag cmodule.compilation_warning.
   If True, compilation warnings are printed.
 * New flag `allow_gc`. (Frederic B.)
   When False, intermediate results are not garbage collected when they are no longer needed.
   This uses more memory, but allocates memory less frequently, so it is faster.
 * New flag `vm.lazy`. (Frederic B.)
   Useful only for the vm linkers. When lazy is None,
   auto-detect whether lazy evaluation is needed and use the appropriate
   version. If lazy is True/False, force the version used, between
   Loop/LoopGC and Stack.
 * New flag `cxx`. This is the C++ compiler to use. If empty, do not compile C code. (Frederic B.)
 * New flag `print_active_device`, defaulting to True. (Matthew R.)
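For illustration, the new flags can be combined through the THEANO_FLAGS environment variable (a sketch; the combination shown is arbitrary)::

    import os
    # must be set before importing theano
    os.environ['THEANO_FLAGS'] = 'profile=True,profile_optimizer=True,allow_gc=False'
    import theano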
Documentation:
 * Added tutorial documentation on how to extend Theano.
   This explains how to make a Theano Op from a Python function.
   http://deeplearning.net/software/theano/tutorial/extending_theano.html
   (Frederic B.)
 * New installation instructions for Windows using EPD. (Pascal L.)
 * New installation on Windows by using a Linux VM from ContinuumIO. (Frederic B.)
 * Revisions of the Theano tutorial and addition of exercises to it. (Eric L.)
 * New tutorial on sparse variables. (Nicolas B., Sebastien Lemieux, Frederic Bastien)
   http://www.deeplearning.net/software/theano/tutorial/sparse.html
 * Installation documentation for CentOS 6. (Frederic B.)
 * Installation documentation for Ubuntu (with GPU). (Frederic B., Matthias Zoehrer)
 * Doc typo fixes, doc updates, better error messages: Olivier D., David W.F., Frederic B., James B., Matthew Rocklin, Ian G.
 * Python Memory Management documentation. (Steven Pigeon, Olivier D.)
Proposal:
 * Math framework for complex gradients. (Pascal L.)
Internal changes:
 * Defined new exceptions MissingInputError and UnusedInputError, and use them
   in theano.function, instead of TypeError and ValueError. (Pascal L.)
 * Better handling of bitwidth and max values of integers and pointers
   across platforms. (Pascal L.)
 * Made a few Ops with C code versioned to reduce compilation time.
   (Frederic B., Pascal L.)
 * Better deletion of files in the compiledir. (Frederic B.)
 * Safer import of the sort op. (Nicolas Pinto)
 * hash_from_dict for elemwise ops. (Frederic B.)
 * Renamed BadCLinkerOutput to BadThunkOutput. (PL)
 * tensor.utils.shape_of_variables (Matthew R.)
 * Added the numpy ABI version and the g++/nvcc versions to the key of compiled code. (Frederic B.)
 * env.replace_all_validate_remove (Frederic B.)
   This allows a global optimizer to ensure it removed some nodes from the graph.
   It is a generic way to catch errors that would otherwise duplicate computation.
   * It was used for the GEMM and Scan optimizations. (Frederic B., Razvan P.)
 * Fixed how exceptions are raised in GPU code. (James B.)
 * Made code respect pep8: OD, Fred, Pascal L., Nicolas Bouchard, Eric Larsen and others.
 * TensorType and CudaNdarrayType now have a value_zeros method that calls CudaNdarray.zeros or
   numpy.zeros with the right dtype. (Pascal L., Olivier D.)
   This allows the same code to work with both types.
 * Renamed the FunctionGraph.extend function to FunctionGraph.attach_feature. (Ian G.)
 * New exception MissingGXX when we try to compile but there is no cxx compiler. (Frederic B.)
 * New function theano.gof.utils.give_variables_names(...) that gives unique names to variables. (Matthew R.)
 * Use the new NumPy C-API most of the time, for later NumPy releases. (Frederic B.)
 * New theano.gof.sched.sort_apply_nodes() that will allow other execution orderings. (Matthew R.)
   * New attribute sort_schedule_fn, a way to specify a scheduler to use. (Matthew R.)
Crash Fixes:
 * Fixed an import name conflict. (usaar33, Frederic B.)
   This makes Theano work with PiCloud.
 * Do not try to use the BLAS library when blas.ldflags is manually set to an
   empty string. (Frederic B., Pascal L.)
 * When importing theano on a computer without a GPU with the Theano
   flags 'device' or 'init_gpu_device' set to gpu*. (Frederic B., reported by Luo Heng)
 * Optimization printed a useless error when scipy was not available. (Frederic B.)
 * GPU conv crash/slowdown on newer hardware. (James B.)
 * Better error handling in GPU conv. (Frederic B.)
 * GPU optimization that moves element-wise Ops to the GPU: a crash happened with
   a particular execution order of this optimization and the
   element-wise fusion optimization when upcasting some inputs to
   float32 (to compute them on the GPU).
   (Frederic B., reported by Sander Dieleman)
 * GpuReshape in some particular cases when the input is not contiguous.
   (Frederic B., reported by Sander Dieleman)
 * GpuSoftmaxWithBias with shape (0, N) with N > 1.
   (Frederic B., reported by Razvan P.)
 * Fixed a crash under 64-bit Windows when taking subtensors of the form a[n:].
   (Pascal L., reported by Simon McGregor)
 * Fixed an issue with the MaxAndArgmax Op not properly preserving broadcastable
   dimensions, which could typically result in optimization crashes. (Olivier D.)
 * Fixed a crash when concatenating some arrays with specific broadcasting
   patterns. (Olivier D.)
 * Work around a known issue with nvcc 4.1 on MacOS X. (Graham Taylor)
 * In advanced indexing, if some inputs are constant, no need to call constant(...)
   on their value any more. (Pascal L., reported by John Salvatier)
 * Fixed a crash on the GPU when GpuSubtensor didn't set the right stride
   when the result tensor had a dimension with a size of 1. (Pascal L.,
   reported by Graham T.)
 * Fixed a scan crash that made it not run on the GPU in one case. (Guillaume D.)
 * If you grad again a random state, don't crash. (Razvan P.)
 * GpuDownsampleFactorMax and its grad with input dimensions 0 and 1 bigger than 65535.
   (Frederic B., reported by Gabe Schwartz)
 * Potential crash due to parallel compilation when importing theano.sandbox.cuda.
   (Olivier D.)
 * Crash fix on Python 2.4 with slicing. (Pascal L.)
 * grad of argmin and argmax. (Razvan P.)
 * Don't compute the Rop for shared variables with updates (mostly random ones).
   We don't use them and they caused crashes. (Razvan P.)
 * MaxArgmax.grad() when one of the gradients it receives is None. (Razvan P., reported by Mark Fenner)
 * Fixed a crash of GpuSum when some dimension's shape was 0. (Frederic B.)

Tests:
 * Use less memory. (Olivier D.) (fixes a crash on 32-bit computers)
 * Fixed tests with the Theano flag "blas.ldflags=". (Frederic B., Pascal L.)
 * Fixed a crash with advanced subtensor and numpy constants.
 * Fixed random test crashes due to random values. (Pascal L.)
 * Always introduce an Alloc node when calling alloc, and let the optimizer remove it if needed.
   This allows DebugMode to catch some shape errors. (Pascal L.)
 * DebugMode now checks the view_map for all types of Theano variables.
   It previously did so only for variables of tensor type. (Frederic B.)
Others:
Others:
* Better error messages in many places. (Many people)
* Remove python warning for some python version. (Gabe Schwartz)
* PEP8 fixes. (Many people)
* Remove useless fill op in fast_compile mode to make the graph more readable. (Fredric B.)
* Add a warning about numpy bug when using advanced indexing on a
* Remove GpuOuter as it is a subset of the new GpuGer (Frederic B.)
tensor with more than 2**32 elements (the resulting array is not
* Now we use http://travis-ci.org/ to run all CPU tests (without SciPy)
correctly filled and ends with zeros). (Pascal, reported by David WF)
with the default mode on all Pull Requests.
* Added Scalar.ndim=0 and ScalarSharedVariable.ndim=0 (simplify code) (Razvan)
This should make the trunk more stable. (Fredric B.)
* New min_informative_str() function to print graph. (Ian)
* Our nightly buildbot now check on python 2.4(Frederic B.)
* Fix catching of exception. (Sometimes we used to catch interrupts) (Frederic, David, Ian, Olivier)
This should make the trunk work on it more frequently.
* Better support for utf string. (David)
* Fix pydotprint with a function compiled with a ProfileMode (Frederic)
Other thanks:
* Was broken with change to the profiler.
* blaxill reported an error introduced into the trunk.
* Warning when people have old cache entries. (Olivier)
* More tests for join on the GPU and CPU. (Frederic)
* Do not request to load the GPU module by default in scan module. (Razvan)
* Fixed some import problems. (Frederic and others)
* Filtering update. (James)
* On Windows, the default compiledir changed to be local to the
computer/user and not transferred with roaming profile. (Sebastian
Urban)
* New theano flag "on_shape_error". Defaults to "warn" (same as previous behavior):
it prints a warning when an error occurs when inferring the shape of some apply node.
The other accepted value is "raise" to raise an error when this happens. (Frederic)
* The buidbot now raises optimization/shape errors instead of just printing a warning. (Frederic)
* better pycuda tests (Frederic)
* check_blas.py now accept the shape and the number of iteration as parameter (Frederic)
* Fix opt warning when the opt ShapeOpt is disabled (enabled by default) (Frederic)
* More internal verification on what each op.infer_shape return. (Frederic, James)
* Argmax dtype to int64 (Olivier)
* Improved docstring and basic tests for the Tile Op (David).
Reviewers (alphabetical order):
* David, Frederic, Ian, James, Olivier, Razvan

Other thanks:
* blaxill reported an error introduced into the trunk.

New stuff that will probably be reworked/removed before the release:
* Better PyCUDA sharing of the GPU context (fix crash at exit). (Frederic B.)
  TODO: there is still a crash at exit!
doc/NEWS.txt
View file @ 547e54c8
...
@@ -13,7 +13,7 @@ Highlight:
 * Theano vision: http://deeplearning.net/software/theano/introduction.html#theano-vision (Many people)
 * Theano with GPU works in some cases on Windows now. Still experimental. (Sebastian Urban)
 * Faster dot() call: New/Better direct call to cpu and gpu ger, gemv, gemm
-  and dot(vector, vector). (James, Frédéric, Pascal)
+  and dot(vector, vector). (James, Frederic, Pascal)
 * C implementation of Alloc. (James, Pascal)
 * theano.grad() now also work with sparse variable. (Arnaud)
 * Macro to implement the Jacobian/Hessian with theano.tensor.{jacobian,hessian} (Razvan)
...
@@ -24,7 +24,7 @@ Interface Behavior Changes:
 * The current default value of the parameter axis of
   theano.{max,min,argmax,argmin,max_and_argmax} is now the same as
   numpy: None. i.e. operate on all dimensions of the tensor.
-  (Frédéric Bastien, Olivier Delalleau) (was deprecated and generated
+  (Frederic Bastien, Olivier Delalleau) (was deprecated and generated
   a warning since Theano 0.3 released Nov. 23rd, 2010)
 * The current output dtype of sum with input dtype [u]int* is now always [u]int64.
   You can specify the output dtype with a new dtype parameter to sum.
...
@@ -209,7 +209,7 @@ Crashes fixed:
 * When the subtensor inputs had 0 dimensions and the outputs 0 dimensions. (Frederic)
 * Crash when the step to subtensor was not 1 in conjunction with some optimization. (Frederic, reported by Olivier Chapelle)
 * Runtime crash related to an optimization with subtensor of alloc (reported by Razvan, fixed by Frederic)
-* Fix dot22scalar cast of integer scalars (Justin Bayer, Frédéric, Olivier)
+* Fix dot22scalar cast of integer scalars (Justin Bayer, Frederic, Olivier)
 * Fix runtime crash in gemm, dot22. FB
 * Fix on 32bits computer: make sure all shape are int64.(Olivier)
 * Fix to deque on python 2.4 (Olivier)
...
doc/conf.py
View file @ 547e54c8
...
@@ -51,9 +51,9 @@ copyright = '2008--2012, LISA lab'
 # other places throughout the built documents.
 #
 # The short X.Y version.
-version = '0.5'
+version = '0.6'
 # The full version, including alpha/beta/rc tags.
-release = '0.5'
+release = '0.6rc1'

 # There are two options for replacing |today|: either, you set today to some
 # non-false value, then it is used:
...
doc/introduction.txt
View file @ 547e54c8
...
@@ -165,11 +165,11 @@ Note: There is no short term plan to support multi-node computation.
 Theano Vision State
 ===================

-Here is the state of that vision as of 24 October 2011 (after Theano release
-0.4.1):
+Here is the state of that vision as of 1 October 2012 (after Theano release
+0.6rc1):

 * We support tensors using the `numpy.ndarray` object and we support many operations on them.
-* We support sparse types by using the `scipy.{csc,csr}_matrix` object and support some operations on them (more are coming).
+* We support sparse types by using the `scipy.{csc,csr}_matrix` object and support some operations on them.
 * We have started implementing/wrapping more advanced linear algebra operations.
 * We have many graph transformations that cover the 4 categories listed above.
 * We can improve the graph transformation with better storage optimization
...
@@ -196,16 +196,15 @@ Here is the state of that vision as of 24 October 2011 (after Theano release
 * The profiler used by cvm is less complete than `ProfileMode`.
 * SIMD parallelism on the CPU comes from the compiler.
-* Multi-core parallelism is only supported for gemv and gemm, and only
-  if the external BLAS implementation supports it.
+* Multi-core parallelism is only supported Conv2d. If the external BLAS implementation supports it,
+  there is also, gemm, gemv and ger that are parallelized.
 * No multi-node support.
 * Many, but not all NumPy functions/aliases are implemented.
   * http://www.assembla.com/spaces/theano/tickets/781
-* Wrapping an existing Python function in easy, but better documentation of
-  it would make it even easier.
-* We need to find a way to separate the shared variable memory
+* Wrapping an existing Python function in easy and documented.
+* We know how to separate the shared variable memory
   storage location from its object type (tensor, sparse, dtype, broadcast
-  flags).
+  flags), but we need to do it.

 Contact us
...
doc/library/config.txt
View file @ 547e54c8
...
@@ -310,7 +310,7 @@ import theano and print the config variable, as in:
 .. attribute:: config.warn.ignore_bug_before

-    String value: 'None', 'all', '0.3', '0.4', '0.4.1', '0.5'
+    String value: 'None', 'all', '0.3', '0.4', '0.4.1', '0.5', '0.6'

     Default: 'None'
...
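For context, here is a minimal sketch of how this config option is set
(through the standard THEANO_FLAGS mechanism, before theano is imported;
my_script.py is a hypothetical name used for illustration):

    # Silence warnings about bugs fixed before a given release, e.g.:
    #   THEANO_FLAGS="warn.ignore_bug_before=0.6" python my_script.py
    import theano

    print(theano.config.warn.ignore_bug_before)  # 'None' unless overridden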
setup.py
View file @ 547e54c8
...
@@ -44,9 +44,9 @@ AUTHOR = "LISA laboratory, University of Montreal"
 AUTHOR_EMAIL = "theano-dev@googlegroups.com"
 PLATFORMS = ["Windows", "Linux", "Solaris", "Mac OS-X", "Unix"]
 MAJOR = 0
-MINOR = 5
+MINOR = 6
 MICRO = 0
-SUFFIX = ""  # Should be blank except for rc's, betas, etc.
+SUFFIX = "rc1"  # Should be blank except for rc's, betas, etc.
 ISRELEASED = False
 VERSION = '%d.%d.%d%s' % (MAJOR, MINOR, MICRO, SUFFIX)
...
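As a quick check of the new values, the version string computed by
setup.py evaluates to '0.6.0rc1':

    # Reproduce setup.py's version computation with the new values.
    MAJOR = 0
    MINOR = 6
    MICRO = 0
    SUFFIX = "rc1"  # blank except for rc's, betas, etc.
    VERSION = '%d.%d.%d%s' % (MAJOR, MINOR, MICRO, SUFFIX)
    print(VERSION)  # -> 0.6.0rc1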