Commit b1635336 authored by Frederic Bastien

Modif for release 0.4.1

Parent 8d0b2ecc
Modifications in the 0.4.1 (12 August 2011)

New features:
* `R_op <http://deeplearning.net/software/theano/tutorial/gradients.html>`_ macro, like theano.tensor.grad
   * Not all tests are done yet (TODO)
* Added aliases theano.tensor.bitwise_{and,or,xor,not}. They are the NumPy names.
* Updates returned by Scan (you need to pass them to theano.function) are now a new Updates class.
  That allows more checks and makes them easier to work with. The Updates class is a subclass of dict.
* Scan can now work in a "do while" loop style.
   * We scan until a condition is met.
   * There is a minimum of one iteration (a "while do" style loop is not possible).
* The "Interactive Debugger" (compute_test_value theano flags)
   * Now should work with all ops (even the ones with only C code).
   * In the past some errors were caught and re-raised as unrelated errors (ShapeMismatch replaced with NotImplemented). We don't do that anymore.
* The new Op.make_thunk function (introduced in 0.4.0) is now used by constant_folding and DebugMode.
* Added A_TENSOR_VARIABLE.astype() as a way to cast. NumPy allows this syntax.
* New BLAS GER implementation.
* Insert GEMV more frequently.
* Added a new ifelse(scalar condition, rval_if_true, rval_if_false) op (a minimal sketch follows this list).
   * This is a subset of the elemwise switch (tensor condition, rval_if_true, rval_if_false).
   * With the new feature in the sandbox, only one of rval_if_true or rval_if_false will be evaluated.
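A minimal sketch of the difference between switch and the new ifelse; the import path is an assumption (later releases expose theano.ifelse, while in 0.4.1 the lazy version lives in the sandbox, so the exact import may differ)::

    import theano
    import theano.tensor as T
    from theano.ifelse import ifelse  # assumed path; may differ in 0.4.1

    a, b = T.scalars('a', 'b')
    x, y = T.matrices('x', 'y')

    # switch is elemwise: both branches are always computed.
    z_switch = T.switch(T.lt(a, b), T.mean(x), T.mean(y))
    # ifelse takes a scalar condition; with a lazy linker only one branch runs.
    z_lazy = ifelse(T.lt(a, b), T.mean(x), T.mean(y))

    f = theano.function([a, b, x, y], [z_switch, z_lazy])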
Optimizations:

* Subtensor has C code.
* {Inc,Set}Subtensor has C code.
* ScalarFromTensor has C code.
* dot(zeros, x) and dot(x, zeros) are rewritten to zeros (see the sketch after this list).
* IncSubtensor(x, zeros, idx) -> x
* SetSubtensor(x, x[idx], idx) -> x (when x is a constant)
* subtensor(alloc, ...) -> alloc
* Many new scan optimizations:
   * Lower scan execution overhead with a Cython implementation.
   * Removed scan double compilation (by using the new Op.make_thunk mechanism).
   * Certain computations from the inner graph are now pushed out into the outer
     graph. This means they are not re-computed at every step of scan.
   * Different scan ops now get merged into a single op (if possible), reducing
     overhead and sharing computations between the instances.
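One quick way to see such graph rewrites is to print the optimized graph; a sketch, assuming the default optimizing mode::

    import theano
    import theano.tensor as T

    x = T.matrix('x')
    z = T.dot(x, T.zeros((3, 4)))  # candidate for the dot(x, zeros) rewrite
    f = theano.function([x], z)
    theano.printing.debugprint(f)  # the optimized graph should no longer contain a Dot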
GPU:

* PyCUDA/Theano bridge and `documentation <http://deeplearning.net/software/theano/tutorial/pycuda.html>`_ (see the sketch after this list).
   * New functions to easily convert PyCUDA GPUArray objects to and from CudaNdarray objects.
   * Fixed a bug when you created a view of a manually created CudaNdarray that is a view of a GPUArray.
* Removed a warning when nvcc is not available and the user did not request it.
* Renamed config option cuda.nvccflags -> nvcc.flags.
* Allow GpuSoftmax and GpuSoftmaxWithBias to work with bigger inputs.
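A hedged sketch of the bridge; the helper names (theano.misc.pycuda_utils.to_gpuarray / to_cudandarray) are assumptions based on the linked tutorial, and a CUDA-capable GPU is required::

    import numpy
    import pycuda.autoinit                 # set up PyCUDA's CUDA context
    import theano.misc.pycuda_utils as pu  # assumed module, see the tutorial
    import theano.sandbox.cuda as cuda

    x = cuda.CudaNdarray(numpy.ones((3, 4), dtype='float32'))
    g = pu.to_gpuarray(x)       # CudaNdarray -> pycuda GPUArray
    x2 = pu.to_cudandarray(g)   # pycuda GPUArray -> CudaNdarray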
Bugs fixed:

* In one case an AdvancedSubtensor1 could be converted to a GpuAdvancedIncSubtensor1 instead of a GpuAdvancedSubtensor1.
   * It probably did not happen due to the order of optimizations, but that order is not guaranteed to be the same on all computers.
* Derivative of set_subtensor was wrong.
* Derivative of Alloc was wrong.
Crash fixed:

* On an unusual Python 2.4.4 on Windows.
* When using a C cache copied from another location.
* On Windows 32 bits when setting a complex64 to 0.
* Compilation crash with CUDA 4.
* When wanting to copy the compilation cache from one computer to another.
   * This can be useful for using Theano on a computer without a compiler.
* GPU:
   * Compilation crash fixed under Ubuntu 11.04.
   * Compilation crash fixed with CUDA 4.0.
Known bug:

* CAReduce with NaN in its inputs does not return the correct output (`Ticket <http://trac-hg.assembla.com/theano/ticket/763>`_).
   * This is used in tensor.{max,mean,prod,sum} and in the grad of PermuteRowElements.
   * This is not a new bug, just one discovered since the last release that we did not have time to fix.
Deprecation (will be removed in Theano 0.5; a warning is generated if you use them):

* The string mode FAST_RUN_NOGC (accepted only by theano.function()). Use Mode(linker='c|py_nogc') instead (see the migration sketch after this list).
* The string mode STABILIZE (accepted only by theano.function()). Use Mode(optimizer='stabilize') instead.
* scan interface changes:
   * The use of `return_steps` for specifying how many entries of the output
     scan returns has been deprecated.
      * The same thing can be done by applying a subtensor to the output
        returned by scan to select a certain slice.
   * The inner function (that scan receives) should return its outputs and
     updates in this order:
     [outputs], [updates], [condition]. One can skip any of the three if it is
     not used, but the order has to stay unchanged.
* tensor.grad(cost, wrt) will return an object of the "same type" as wrt
  (list/tuple/TensorVariable).
   * Currently tensor.grad returns a list only when wrt is a list/tuple of
     more than one element.
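A migration sketch for the two string modes and return_steps, reusing the tutorial's power example; only APIs named in these notes are used::

    import theano
    import theano.tensor as T
    from theano import Mode

    v = T.vector('v')
    # instead of mode='FAST_RUN_NOGC':
    f = theano.function([v], v * 2, mode=Mode(linker='c|py_nogc'))

    # instead of return_steps, slice the output returned by scan:
    k = T.iscalar('k')
    A = T.vector('A')
    result, updates = theano.scan(lambda prior, A: prior * A,
                                  outputs_info=T.ones_like(A),
                                  non_sequences=A, n_steps=k)
    final = result[-1]  # the subtensor that replaces return_steps=1

Likewise, once the grad change lands, tensor.grad(cost, [x]) will return a one-element list instead of a bare variable.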
Deprecated in 0.4.0 (reminder; a warning is generated if you use them):

* Dividing integers with / is deprecated: use // for integer division, or
  cast one of the integers to a float type if you want a float result (you may
  also change this behavior with config.int_division). See the sketch after this list.
* tag.shape attribute deprecated (#633).
* CudaNdarray_new_null is deprecated in favour of CudaNdarray_New.
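A minimal sketch of the two replacements for integer division::

    import theano.tensor as T

    i = T.iscalar('i')
    j = T.iscalar('j')

    q = i // j                    # explicit integer division
    r = T.cast(i, 'float64') / j  # cast one operand for a float result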
Sandbox:

* MRG random generator now implements the same casting behavior as the regular random generator.

Sandbox New features (not enabled by default):

* New linkers (theano flags linker={vm,cvm}); see the sketch after this list.
   * The new linker allows lazy evaluation of the new ifelse op, meaning we compute only the true or false branch depending on the condition. This can speed up some types of computation.
   * Uses a new profiling system (that currently tracks less stuff).
   * The cvm is implemented in C, so it lowers Theano's overhead.
   * The vm is implemented in Python, so it can help debugging in some cases.
   * In the future, the default will be the cvm.
* Some new, not yet well tested sparse ops: theano.sparse.sandbox.{SpSum, Diag, SquareDiagonal, ColScaleCSC, RowScaleCSC, Remove0, EnsureSortedIndices, ConvolutionIndices}
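A sketch of opting in to the new linkers, assuming Mode accepts the same linker names as the theano flags::

    import theano
    import theano.tensor as T
    from theano import Mode

    v = T.vector('v')
    # linker='cvm' (C implementation) or linker='vm' (Python implementation)
    f = theano.function([v], T.exp(v), mode=Mode(linker='cvm'))

The same choice can be made globally with THEANO_FLAGS=linker=cvm.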
Documentation:

* How to compute the `Jacobian, Hessian, Jacobian times a vector, Hessian times a vector <http://deeplearning.net/software/theano/tutorial/gradients.html>`_.
* Slides for a 3-hour class with exercises, given at the HPCS2011 Conference in Montreal.
Others:

* Logger names renamed to be consistent.
* Logger functions simplified and made more consistent.
* Fixed errors being masked by other, unrelated errors when the compute_test_value Theano flag is used.
* Compilation cache enhancements.
* Made compatible with NumPy 1.6 and SciPy 0.9.
* Fixed tests for the case where NumPy has a new dtype that Theano does not support.
* Fixed some tests for when SciPy is not available.
* Don't compile anything when Theano is imported; compile support code when we compile the first C code.
* Python 2.4 fix:
   * For Python 2.4.4 on Windows, replaced float("inf") with numpy.inf.
* Removed useless inputs to a scan node.
   * Mostly beautification, making the graph more readable. Such inputs would appear as a consequence of other optimizations.
Core:

* There is a new mechanism that lets an Op permit one of its
  inputs to be aliased to another destroyed input. This will generally
  result in incorrect calculation, so it should be used with care! The
  right way to use it is when the caller can guarantee that even if
  these two inputs look aliased, they will actually never overlap. This
  mechanism can be used, for example, by a new alternative approach to
  implementing Scan. If an op has an attribute called
  "destroyhandler_tolerate_aliased", then this is what is going on.
  IncSubtensor is thus far the only Op to use this mechanism.
@@ -53,7 +53,7 @@ copyright = '2008--2011, LISA lab'
 # The short X.Y version.
 version = '0.4.1'
 # The full version, including alpha/beta/rc tags.
-release = '0.4.1rc2'
+release = '0.4.1'
 # There are two options for replacing |today|: either, you set today to some
 # non-false value, then it is used:
@@ -48,7 +48,7 @@ PLATFORMS = ["Windows", "Linux", "Solaris", "Mac OS-X", "Unix"]
 MAJOR = 0
 MINOR = 4
 MICRO = 1
-SUFFIX = "rc2"  # Should be blank except for rc's, betas, etc.
+SUFFIX = ""  # Should be blank except for rc's, betas, etc.
 ISRELEASED = False
 VERSION = '%d.%d.%d%s' % (MAJOR, MINOR, MICRO, SUFFIX)