Commit 8d2c43de authored by lamblin

Merge pull request #1011 from delallea/minor

Minor fixes
@@ -7,15 +7,15 @@ Old Release Notes
Theano 0.5 (23 February 2012)
=============================
Highlights:
* Moved to github: http://github.com/Theano/Theano/
* Old trac tickets moved to assembla tickets: http://www.assembla.com/spaces/theano/tickets
* Theano vision: http://deeplearning.net/software/theano/introduction.html#theano-vision (Many people)
* Theano with GPU works in some cases on Windows now. Still experimental. (Sebastian Urban)
* Faster dot() call: New/Better direct call to cpu and gpu ger, gemv, gemm
  and dot(vector, vector). (James, Frédéric, Pascal)
* C implementation of Alloc. (James, Pascal)
* theano.grad() now also works with sparse variables. (Arnaud)
* Macro to implement the Jacobian/Hessian with theano.tensor.{jacobian,hessian} (Razvan)
* See the Interface changes.
@@ -28,14 +28,14 @@ Interface Behavior Changes:
  a warning since Theano 0.3 released Nov. 23rd, 2010)
* The current output dtype of sum with input dtype [u]int* is now always [u]int64.
  You can specify the output dtype with a new dtype parameter to sum.
  The output dtype is the one used for the summation.
  There is no warning in previous Theano versions about this.
  The consequence is that the sum is done in a dtype with more precision than before.
  So the sum could be slower, but will be more resistant to overflow.
  This new behavior is the same as numpy. (Olivier, Pascal)
* When using a GPU, detect faulty nvidia drivers. This was detected
  when running Theano tests. Now this is always tested. Faulty
  drivers result in wrong results for reduce operations. (Frederic B.)
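The sum bullet above states that the new dtype behavior matches numpy, so it can be sketched with numpy itself (illustration only; on the Theano side the analogous call is tensor.sum with the new dtype parameter):

```python
import numpy as np

x = np.array([100, 27], dtype=np.int32)

# On a 64-bit platform, small integer dtypes are accumulated in int64,
# which may be slower but is far more resistant to overflow.
assert x.sum().dtype == np.dtype('int64')

# The accumulation (and output) dtype can be chosen explicitly:
assert x.sum(dtype='float64').dtype == np.dtype('float64')
```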
Interface Features Removed (most were deprecated):
@@ -69,7 +69,7 @@ New deprecation (will be removed in Theano 0.6, warning generated if you use the
Bug fixes (incorrect results):
* On CPU, if the convolution had received explicit shape information,
  the shapes were not checked at runtime. This caused wrong results if the
  input shape was not the one expected. (Frederic, reported by Sander
  Dieleman)
* Theoretical bug: in some cases GPUSum could return bad values.
@@ -95,21 +95,21 @@ Bug fixes (incorrect results):
Scan fixes:
* computing grad of a function of grad of scan (reported by Justin Bayer, fix by Razvan)
  before: most of the time a crash, but it could return wrong values with a bad number of dimensions (so a visible bug)
  now: does the right thing.
* gradient with respect to outputs using multiple taps (reported by Timothy, fix by Razvan)
  before: it used to return wrong values
  now: does the right thing.
  Note: The reported case of this bug was happening in conjunction with the
  save memory optimization of scan, which gives runtime errors. So if you didn't
  manually disable the save memory optimization (number 4 in the list),
  you are fine if you didn't manually request multiple taps.
* Rop of gradient of scan (reported by Timothy and Justin Bayer, fix by Razvan)
  before: compilation error when computing the R-op
  now: does the right thing.
* save memory optimization of scan (reported by Timothy and Nicolas BL, fix by Razvan)
  before: for certain corner cases, used to result in a runtime shape error
  now: does the right thing.
* Scan grad when the input of scan has sequences of different lengths. (Razvan, reported by Michael Forbes)
* Scan.infer_shape now works correctly when working with a condition for the number of loops.
  In the past, it returned n_steps as the length, which is not always true. (Razvan)
@@ -118,10 +118,10 @@ Scan fixes:
New features:
* AdvancedIncSubtensor grad defined and tested (Justin Bayer)
* Adding 1D advanced indexing support to inc_subtensor and set_subtensor (James Bergstra)
* tensor.{zeros,ones}_like now support the dtype param, as in numpy (Frederic)
* Added configuration flag "exception_verbosity" to control the verbosity of exceptions (Ian)
* theano-cache list: lists the content of the theano cache (Frederic)
* theano-cache unlock: removes the Theano cache lock (Olivier)
* tensor.ceil_int_div to compute ceil(a / float(b)) (Frederic)
* MaxAndArgMax.grad now works with any axis (the op supports only 1 axis) (Frederic)
  * used by tensor.{max,min,max_and_argmax}
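Two items above map directly onto numpy behavior; a small sketch of the intended semantics (the Theano functions take symbolic variables, and the ceil_int_div below is a plain-Python stand-in for the formula the notes give):

```python
import numpy as np

x = np.arange(6, dtype='int32')

# {zeros,ones}_like with an explicit dtype param, as in numpy:
z = np.zeros_like(x, dtype='float64')
assert z.shape == x.shape and z.dtype == np.dtype('float64')

# ceil_int_div computes ceil(a / float(b)); for positive integers this
# equals the integer-only identity -(-a // b), used here to avoid floats.
def ceil_int_div(a, b):
    return -(-a // b)

assert ceil_int_div(7, 2) == 4
assert ceil_int_div(6, 3) == 2
```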
@@ -142,12 +142,12 @@ New features:
* theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info() returns free and total gpu memory (Frederic)
* New Theano flag compiledir_format. Keeps the same default as before: compiledir_%(platform)s-%(processor)s-%(python_version)s. (Josh Bleecher Snyder)
  * We also support the "theano_version" substitution.
* IntDiv C code (faster and allows this elemwise to be fused with other elemwises) (Pascal)
* Internal filter_variable mechanism in Type. (Pascal, Ian)
  * Ifelse works on sparse.
  * It makes the use of gpu shared variables more transparent with the theano.function updates and givens parameters.
* Added a_tensor.transpose(axes); axes is optional (James)
  * theano.tensor.transpose(a_tensor, kwargs): we were ignoring kwargs; now they are used as the axes.
* a_CudaNdarray_object[*] = int now works (Frederic)
* tensor_variable.size (as numpy) computes the product of the shape elements. (Olivier)
* sparse_variable.size (as scipy) computes the number of stored values. (Olivier)
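The .size and transpose(axes) items above follow numpy conventions, so numpy arrays can illustrate them (sketch only; the Theano variables behave analogously at the symbolic level):

```python
import numpy as np

a = np.zeros((2, 3, 4))

# .size, as in numpy: the product of the shape elements.
assert a.size == 2 * 3 * 4

# transpose() with axes omitted reverses all axes; an explicit
# axes permutation is optional, as described above.
assert a.transpose().shape == (4, 3, 2)
assert a.transpose(1, 0, 2).shape == (3, 2, 4)
```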
@@ -168,11 +168,11 @@ New features:
* 'theano-cache list' prints the number of compiled modules per op class (Frederic B.)
* The Theano flag "nvcc.fastmath" is now also used for the cuda_ndarray.cu file.
* Add the header_dirs to the hard part of the compilation key. This is
  currently used only by cuda, but if we use libraries that are only headers,
  this can be useful. (Frederic B.)
* The Theano flag "nvcc.flags" is now included in the hard part of the key.
  This means that now we recompile all modules for each value of "nvcc.flags".
  A change in "nvcc.flags" used to be ignored for modules that were already
  compiled. (Frederic B.)
* Alloc, GpuAlloc are not always pre-computed (constant_folding optimization)
  at compile time if all their inputs are constant.
@@ -209,7 +209,7 @@ Crashes fixed:
* Runtime crash related to an optimization with subtensor of alloc (reported by Razvan, fixed by Frederic)
* Fix dot22scalar cast of integer scalars (Justin Bayer, Frédéric, Olivier)
* Fix runtime crash in gemm, dot22. (Frederic B.)
* Fix on 32-bit computers: make sure all shapes are int64. (Olivier)
* Fix to deque on python 2.4 (Olivier)
* Fix crash when not using C code (or using DebugMode) (not used by
  default) with numpy 1.6*. Numpy has a bug in the reduction code that
@@ -287,10 +287,9 @@ Others:
  The other accepted value is "raise" to raise an error when this happens. (Frederic)
* The buildbot now raises optimization/shape errors instead of just printing a warning. (Frederic)
* Better pycuda tests (Frederic)
* check_blas.py now accepts the shape and the number of iterations as parameters (Frederic)
* Fix opt warning when the opt ShapeOpt is disabled (enabled by default) (Frederic)
* More internal verification on what each op.infer_shape returns. (Frederic, James)
* Argmax dtype to int64 (Olivier)
* Improved docstring and basic tests for the Tile Op (David).
Reviewers (alphabetical order):
......
@@ -8,21 +8,23 @@ https://github.com/Theano/Theano/wiki/Devnews
Release Notes
=============
Theano 0.6rc1 (October 1st, 2012)
=================================
Highlights:
* Bug fixes, crash fixes, CPU and GPU speed-ups.
* theano_var.eval({other_var: val[,...]}) to simplify the usage of Theano (Ian G.)
* New default linker `cvm`. This is the execution engine that tells what op to run in which order.
  It is now implemented in C and enables lazy evaluation of the ifelse op.
* Faster theano.function compilation. (Pascal L., Ian G.)
* Big sparse submodule update and documentation of it. (Nicolas Bouchard)
* Use GPU asynchronous functionality (Frederic B.)
* Better Windows support.
Known bugs:
* A few crash cases that will be fixed by the final release.
* CAReduce with NaN in inputs does not return the correct output. (reported by Pascal L.)
  * This is used in tensor.{all,any,max,mean,prod,sum} and in the grad of PermuteRowElements.
Bug fixes:
* Outputs of Scan nodes could contain corrupted values: some parts of the
@@ -46,34 +48,32 @@ Bug fixes:
  they were not supposed to share. (David W-F)
* Correctly record the GPU device number used when we let the driver select it.
  (Frederic B.)
* The grad of TensorDot was returning the wrong shape for some combinations of axes.
  We now raise NotImplementedError in those cases. (Frederic B.)
* conv2d with subsample >2 returned wrong values. (Pascal L.)
  * Fixed when mode==valid, disabled when mode==full
* theano.sparse.CSMGrad op (generated by the grad of CSM) didn't
  correctly handle unsorted inputs or gradients that are sparser
  than the input. In those cases, a bad result was returned. This could
  happen only when a sparse input of a Theano function was not
  sorted, which happens for example with sparse advanced indexing from
  scipy. The result was usually NaN in the graph.
  (Yann Dauphin)
* theano.sparse._dot(CSC matrix, dense) optimized version UsmmCSCDense didn't
  correctly handle non-contiguous inputs/outputs. (Pascal L.)
* Fix a corner case in CVM updates. (Pascal L.)
  This happened if the update to a shared variable is the variable itself after optimization.
  The CVM was not used by default.
* Fix the view_map of sparse.Transpose and sparse.sandbox.sp.RowScale. (Frederic B.)
  This probably didn't cause problems, as only the UsmmCscDense op
  (used when Usmm is called with a CSC matrix) could interfere with them.
Deprecation:
* Deprecated the Module class (Ian G.)
  This was a predecessor of SharedVariable with a less pythonic philosophy.
Interface changes:
* Now the base version requirements are numpy >= 1.5.0 and the optional scipy >= 0.8.
* In Theano 0.5, we removed the deprecated sharedvar.value property.
  Now we raise an error if you access it. (Frederic B.)
* theano.function does not accept duplicate inputs, so function([x, x], ...)
@@ -83,7 +83,7 @@ Interface changes:
  instance, function([x, y], [y]). You can use the kwarg
  ``on_unused_input={'raise', 'warn', 'ignore'}`` to control this.
  (Pascal L.)
* New Theano flag "on_unused_input" that defines the default value of the
  previous point. (Frederic B.)
* tensor.alloc() now raises an error during graph build time
  when we try to create fewer dimensions than the number of dimensions
@@ -96,34 +96,34 @@ Interface changes:
* Renamed the FunctionGraph.nodes attribute to FunctionGraph.apply_nodes (Ian G.)
* Warn when we don't correctly handle a parameter in the Theano flag `nvcc.flags`
  (Frederic B.)
* Do not reorder the user flags passed to the compiler. They get set after other flags. (Frederic B.)
* Make setuptools optional (Ilan Schnell)
* We warn when a user tries to use an old GPU with which Theano is untested.
  This could cause crashes and will also be very slow. (Frederic B.)
* Make theano.grad able to differentiate between not implemented, undefined and disconnected grad.
  Op.grad functions should return theano.gradient.{grad_not_implemented,grad_undefined} or
  something of DisconnectedType (Ian G.)
* Make theano.grad expect to always receive a float or undefined
  gradient, and enforce that ops with integer output values always
  return 0. (Ian G.)
New memory output contract (was mentioned in the release notes of Theano 0.5):
* Now the output memory received can be preallocated elsewhere.
  In the past, it was always the output that the Apply node previously allocated.
  This means that the shape and strides can differ from previous calls,
  and there can be links to this memory at other places.
  This means it could receive preallocated output that is not c_contiguous.
  But we don't do that now. (Pascal L.)
* New Theano flag DebugMode.check_preallocated_output to test this (Pascal L.)
* Updated a few ops to respect this contract (Pascal L.)
New Features:
* GPU scan now works (does not crash) when there is a mixture of float32 and other dtypes.
* theano_var.eval({other_var: val[,...]}) to simplify the usage of Theano (Ian G.)
* debugprint new param ids=["CHAR", "id", "int", ""]
  This makes the printed identifier be a unique char, the Python id, a
  unique int, or omitted. We changed the default to "CHAR"
  as this is more readable. (Frederic B.)
* debugprint new param stop_on_name=[False, True]. If True, we don't print
@@ -149,87 +149,87 @@ New Features:
* C code reuses preallocated outputs (only done by Scan) (Pascal L.)
* Garbage collection of intermediate results during Theano function calls
  for Ops with C code (Pascal L.)
* Theano flag compiledir_format now supports the parameters "numpy_version" and "g++". (Frederic B.)
* Theano GPU variables, shared variables and constants now support <, <=,
  > and >=, just like those not on the GPU.
* AdvancedIncSubtensor now supports the set_instead_of_inc parameter. (Eric L.)
* Added Advanced Indexing support to inc_subtensor and set_subtensor. (Eric L.)
* theano.tensor.{any,all,std,var,mean,prod,sum,argmin,argmax,min,max,max_and_argmax}
  have a new parameter keepdims (Eric L.)
  This allows broadcasting the result correctly against the input data, e.g. to normalize it.
* The Updates object now checks that the keys are SharedVariables when we pass them
  in the __init__ function. (Pascal L.)
* Set a Theano Variable name on transposed ops when the input has one (Frederic B).
* The cvm linker now supports garbage collection (enabled by default). (James B., Arnaud B., Pascal L.)
* The cvm linker is now the default linker.
  This makes the "loop" around the execution of apply nodes in C, which lowers the overhead.
* theano_variable[numpy.newaxis] is now supported (James B.)
* Enable ifelse on the GPU. (Frederic B.)
* Correctly support numpy.memmap everywhere (Pascal L.)
  We added partial support for them before. Just use the normal tensor operations
  on them and it should work.
  But be careful not to exhaust your computer's memory! (we always generate normal ndarrays)
* Add an optimization that stabilizes log(softmax(x)). (Ian G.)
* Re-enable the Images2Neibs grad. It was not broken; the problem was how we tested it. (Frederic B.)
* If `theano_fn.trust_input` is set to True, do not check whether the inputs are valid
  when calling the theano function. (Frederic B.)
* Add theano.tensor.blas.gem{m,v} as shortcuts.
* theano.grad(..., add_names=True). False for the old
  behavior. Otherwise it tries to name the grad variables. (Ian G.)
* theano-nose (Pascal L.)
  A wrapper around nosetests that adds needed extensions.
  * --profile-time option, to print the time spent in each test (Eric L.)
  * --batch option, to allow running tests in batches to lower memory requirements.
* m = mean(log(1 - sigm(x)))
  x - scalar * theano.grad(m, x)
  There is a stabilization optimization for this.
  Now it is applied more frequently. (Pascal L.)
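The keepdims and numpy.newaxis items in the list above follow numpy semantics; a numpy sketch of why keepdims matters for normalization:

```python
import numpy as np

x = np.arange(1.0, 7.0).reshape(2, 3)

# keepdims keeps the reduced axis with size 1, so the result broadcasts
# correctly against the input, e.g. to normalize each row to sum to 1:
row_sums = x.sum(axis=1, keepdims=True)   # shape (2, 1) instead of (2,)
normalized = x / row_sums
assert np.allclose(normalized.sum(axis=1), 1.0)

# Indexing with numpy.newaxis inserts a new broadcastable dimension:
v = np.arange(3)
assert v[np.newaxis].shape == (1, 3)
assert v[:, np.newaxis].shape == (3, 1)
```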
New Ops/functions:
* Added element-wise operations theano.tensor.{GammaLn,Psi} (John Salvatier, Nicolas Bouchard)
* Added element-wise operations theano.tensor.{arcsin,arctan,arccosh,arcsinh,arctanh,exp2,arctan2} (Nicolas Bouchard)
* Added element-wise operations theano.tensor.{gamma,conj,complex_from_polar,expm1,deg2rad,rad2deg,trunc} (Nicolas Bouchard)
* Added theano.tensor.argsort that wraps numpy.argsort (Hani Almousli).
* Added theano.tensor.diff that wraps numpy.diff (Nicolas B.)
* Added theano.tensor.bincount that wraps numpy.bincount (Nicolas B., Pascal L., Frederic B.)
* Added theano.tensor.squeeze (Nicolas B.)
  This removes broadcasted dimensions from the variable.
  Theano-esque version of numpy.squeeze.
* Added theano.tensor.repeat that wraps numpy.repeat (Nicolas B., Pascal L.)
* Added theano.tensor.bartlett that wraps numpy.bartlett (Eric L.)
* Added theano.tensor.fill_diagonal that wraps numpy.fill_diagonal (Eric L., Frederic B.)
* Added tensor.square, an alias for tensor.sqr, as in NumPy (Ian G.)
* Added theano.tensor.load(path, dtype, broadcastable, mmap_mode=None) op
  that allows loading a .npy file in a theano graph (Matthew Rocklin)
* theano.sandbox.linalg.kron.py: Kron op. (Eric L.)
  Kronecker product
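Most of the new ops above are documented as thin wrappers around numpy functions, so their semantics can be previewed directly with numpy (the Theano versions operate on symbolic variables instead):

```python
import numpy as np

assert np.argsort([3, 1, 2]).tolist() == [1, 2, 0]         # indices that would sort the input
assert np.diff([1, 4, 9]).tolist() == [3, 5]               # differences of consecutive elements
assert np.bincount([0, 1, 1, 3]).tolist() == [1, 2, 0, 1]  # occurrence count per integer value
assert np.squeeze(np.ones((1, 3, 1))).shape == (3,)        # drop size-1 (broadcasted) dimensions
assert np.repeat([1, 2], 2).tolist() == [1, 1, 2, 2]

# fill_diagonal modifies its argument in place:
a = np.zeros((2, 2))
np.fill_diagonal(a, 5)
assert a.tolist() == [[5.0, 0.0], [0.0, 5.0]]

# Kron: the Kronecker product.
assert np.kron([1, 10], [1, 2]).tolist() == [1, 2, 10, 20]
```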
Speed up:
* CPU convolutions are now parallelized (Frederic B.)
By default, all cores/hyper-threads are used.
To control it, use the `OMP_NUM_THREADS=N` environment variable where N is the number of
parallel threads to use. By default it is equal to the number of CPU cores/hyper
threads that you have.
There is a new Theano flag `openmp` to allow/disallow openmp ops.
If your BLAS library is parallelized, this flag won't affect it, but the
env variable will.
* Removed a corner case that caused duplicated dot22/gemm in the graph. (Frederic B., Ian G.)
* Enable fusion of elemwise ops that have the same clients multiple times. (Frederic B.)
* New optimization: Remove reduction over broadcastable dimensions (James B., Frederic B.)
* Faster theano.function compilation. (Pascal L., Ian G.)
* Remove GPU transfer around specify_shape op. (Frederic B.)
* Implemented/tested MANY op.infer_shape methods (Eric Larsen)
This allows Theano to make better shape inference.
* Implement Solve.infer_shape (Matthew Rocklin)
* Scan memory optimizations now work more frequently. (Razvan P.)
There was a warning printed by the subtensor optimization in those cases.
* Faster rng_mrg Python code. (mostly used for tests) (Frederic B.)
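The parallelism controls above can be combined on the command line; a sketch, where `my_script.py` is a placeholder for any Theano program:

```shell
# Limit the parallelized CPU ops (e.g. convolution) to 2 threads.
OMP_NUM_THREADS=2 python my_script.py

# Disable the OpenMP ops entirely via the new Theano flag; a parallel
# BLAS is unaffected by this flag, only by OMP_NUM_THREADS.
THEANO_FLAGS='openmp=False' python my_script.py
```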
Speed up GPU:
* Convolution on the GPU now checks the generation of the card to make
it faster in some cases (especially medium/big output images) (Frederic B.)
* We had hardcoded 512 as the maximum number of threads per block. Newer cards
support up to 1024 threads per block.
* Faster GpuAdvancedSubtensor1, GpuSubtensor, GpuAlloc (Frederic B.)
* We now pass the GPU architecture to nvcc when compiling (Frederic B.)
...@@ -237,7 +237,7 @@ Speed up GPU:
Set the environment variable `CUDA_LAUNCH_BLOCKING` to `1` to disable this
for profiling or debugging.
* Faster creation of CudaNdarray objects (Frederic B.)
* Now some Max reductions are implemented on the GPU. (Ian G.)
Sparse Sandbox graduate (moved from theano.sparse.sandbox.sp):
* sparse.remove0 (Frederic B., Nicolas B.)
...@@ -254,21 +254,21 @@ Sparse:
* New Ops: sparse.{sqrt,sqr,log1p,floor,ceil,sgn,round_half_to_even} (Nicolas B.)
* New Ops: sparse.{arctanh,tanh,arcsinh,sinh,arctan,arcsin,tan,sin} (Nicolas B.)
* New functions: structured_{add,exp,log,pow,minimum,maximum,sigmoid} (Yann D., Nicolas B.)
* Optimized ops: StructuredAddSV, StrucutedAddSVCSR (inserted automatically)
* New Op: sparse.mul_s_v multiplication of sparse matrix by broadcasted vector (Yann D.)
* New Op: sparse.Cast() (Yann D., Nicolas B.)
* Add sparse_variable.astype() and theano.sparse.cast() and
theano.sparse.{b,w,i,l,f,d,c,z}cast() as their tensor equivalents (Nicolas B.)
* Op class: SamplingDot (Yann D., Nicolas B.)
* Optimized version: SamplingDotCsr, StructuredDotCSC
* Optimizations to insert the optimized versions: local_sampling_dot_csr, local_structured_add_s_v
* New Ops: sparse.{Multinomial,Poisson,Binomial} (Yann D., NB)
* Implement the CSMProperties grad method (Yann Dauphin)
* Move optimizations to theano/sparse/opt.py (Nicolas B.)
New flags:
* `profile=True` flag now prints the sum of all printed profiles. (Frederic B.)
* It works with the linkers vm/cvm (default).
* Also prints compile time, optimizer time and linker time.
* Also prints a summary by op class.
* new flag "profile_optimizer" (Frederic B.)
...@@ -282,14 +282,14 @@ New flags:
if True, will print compilation warnings.
* new flag `allow_gc` (Frederic B.)
When False, do not garbage collect intermediate results when they are not needed.
This uses more memory, but allocates memory less frequently, so it is faster.
* new flag `vm.lazy` (Frederic B.)
Useful only for the vm linkers. When lazy is None,
auto-detect if lazy evaluation is needed and use the appropriate
version. If lazy is True/False, force the version used between
Loop/LoopGC and Stack.
* new flag `cxx`. This is the C++ compiler to use. If empty, do not compile C code. (Frederic B.)
* New flag `print_active_device` that defaults to True. (Matthew R.)
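The new flags are set the same way as existing ones, via the `THEANO_FLAGS` environment variable (or `.theanorc`). A sketch combining the flags listed above; the script name is a placeholder:

```shell
# Profile a run, profile the optimizer too, and keep intermediate
# results allocated (more memory, fewer allocations).
THEANO_FLAGS='profile=True,profile_optimizer=True,allow_gc=False' python my_script.py

# An empty `cxx` flag disables C code compilation (pure-Python fallback).
THEANO_FLAGS='cxx=' python my_script.py
```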
Documentation:
* Added documentation in the tutorial on how to extend Theano.
...@@ -303,11 +303,11 @@ Documentation:
http://www.deeplearning.net/software/theano/tutorial/sparse.html
* Installation documentation for CentOS6 (Frederic B.)
* Installation documentation for Ubuntu (with GPU) (Frederic B., Matthias Zoehrer)
* Doc typo fixes, Doc updates, Better error messages: Olivier D., David W.F., Frederic B., James B., Matthew Rocklin, Ian G.
* Python Memory Management tutorial (Steven Pigeon, Olivier D.)
Proposal:
* Math framework for complex gradients (Pascal L.)
Internal changes:
...@@ -324,25 +324,25 @@ Internal changes:
* tensor.utils.shape_of_variables (Matthew R.)
* Add the numpy abi version and g++/nvcc version in the key of compiled code. (Frederic B.)
* env.replace_all_validate_remove (Frederic B.)
This allows a global optimizer to ensure it removed some nodes from the graph.
This is a generic way to catch errors that would otherwise duplicate
computation.
* It was used for GEMM and Scan optimization (Frederic B., Razvan P.)
* Fix how exceptions are raised in GPU code (James B.)
* Made code respect pep8: OD, Fred, Pascal L., Nicolas Bouchard, Eric Larsen and others.
* TensorType and CudaNdarrayType now have a value_zeros method that calls CudaNdarray.zeros or
numpy.zeros with the right dtype. (Pascal L., Olivier D.)
This allows the same code to work with both types.
* Renamed FunctionGraph.extend function to FunctionGraph.attach_feature. (Ian G.)
* New exception MissingGXX when we try to compile but there is no cxx compiler. (Frederic B.)
* New function theano.gof.utils.give_variables_names(...) that gives unique names to variables. (Matthew R.)
* Use the new NumPy C-API most of the time, for later NumPy releases. (Frederic B.)
* New theano.gof.sched.sort_apply_nodes() that will allow other execution orderings. (Matthew R.)
* New attribute sort_schedule_fn, a way to specify a scheduler to use. (Matthew R.)
Crash Fix:
* Fix import name conflict (usaar33, Frederic B.)
* This makes Theano work with PiCloud.
* Do not try to use the BLAS library when blas.ldflags is manually set to an
empty string (Frederic B., Pascal L.)
* When importing theano on a computer without GPU with the Theano
...@@ -369,7 +369,7 @@ Crash Fix:
* In advanced indexing, if some inputs are constant, no need to call constant(...)
on their value any more. (Pascal L., reported by John Salvatier)
* Fix crash on GPU when the GpuSubtensor didn't put the right stride
when the result tensor had a dimension with size of 1. (Pascal L,
reported by Graham T.)
* Fix scan crash that made it not run on the GPU in one case. (Guillaume D.)
* If you grad again a random state, don't crash (Razvan P.)
...@@ -379,20 +379,20 @@ Crash Fix:
(Olivier D.)
* Crash fix on python 2.4 with slicing. (Pascal L.)
* grad of argmin and argmax (Razvan P.)
* Don't compute the Rop for shared variables with updates (mostly random).
We don't use them and they caused crashes. (Razvan P.)
* MaxArgmax.grad() when one of the gradients it receives is None. (Razvan P, reported by Mark Fenner)
* Fix crash of GpuSum when some dimensions shape was 0. (Frederic B.)
Tests:
* Use less memory (Olivier D.) (fix crash on 32-bit computers)
* Fix test with Theano flag "blas.ldflags=". (Frederic B., Pascal L.)
* Fix crash with advanced subtensor and numpy constant.
* Fix random tests crash due to random value. (Pascal L.)
* Always introduce an Alloc node when calling alloc and let the optimizer remove it if needed.
This allows DebugMode to catch some shape errors. (Pascal L.)
* DebugMode now checks the view_map for all types of Theano variables.
It previously did so only for variables of tensor type. (Frederic B.)
Others:
* Remove python warnings for some python versions. (Gabe Schwartz)
...@@ -401,7 +401,7 @@ Others:
* Now we use http://travis-ci.org/ to run all CPU tests (without SciPy)
with the default mode on all Pull Requests.
This should make the trunk more stable. (Frederic B.)
* Our nightly buildbot now checks on python 2.4 (Frederic B.)
This should make the trunk work on it more frequently.
Other thanks:
...
.. _NEWS:
Updates in the Trunk since the last release:
https://github.com/Theano/Theano/wiki/Devnews
=============
Release Notes
=============
Theano 0.6rc1 (October 1st, 2012)
=================================
Interface Behavior Changes:
* The current default value of the parameter axis of
theano.{max,min,argmax,argmin,max_and_argmax} is now the same as
numpy: None. i.e. operate on all dimensions of the tensor.
(Frederic Bastien, Olivier Delalleau) (was deprecated and generated
a warning since Theano 0.3 released Nov. 23rd, 2010)
* The current output dtype of sum with input dtype [u]int* is now always [u]int64.
You can specify the output dtype with a new dtype parameter to sum.
The output dtype is the one used for the summation.
There was no warning in previous Theano versions about this.
The consequence is that the sum is done in a dtype with more precision than before.
So the sum could be slower, but will be more resistant to overflow.
This new behavior is the same as numpy. (Olivier, Pascal)
* When using a GPU, detect faulty nvidia drivers. This was detected
when running Theano tests. Now this is always tested. Faulty
drivers result in wrong results for reduce operations. (Frederic B.)
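The upcasting behavior described above matches what NumPy does; a quick sketch with NumPy directly (a 64-bit platform is assumed, where the default integer is int64):

```python
import numpy as np

# NumPy accumulates small-integer sums in the platform default integer;
# Theano's sum output dtype now matches this behavior.
small = np.array([100, 100, 100], dtype=np.int8)
total = small.sum()                  # accumulated in int64: no int8 overflow
wrapped = small.sum(dtype=np.int8)   # forcing int8 wraps around (300 -> 44)
```

Forcing the accumulator dtype back down, as in the second call, reproduces the old overflow-prone behavior; the new dtype parameter to Theano's sum plays the same role.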
Interface Features Removed (most were deprecated):
* The string modes FAST_RUN_NOGC and STABILIZE are not accepted. They
were accepted only by theano.function().
Use Mode(linker='c|py_nogc') or Mode(optimizer='stabilize') instead.
* tensor.grad(cost, wrt) now always returns an object of the "same type" as wrt
(list/tuple/TensorVariable). (Ian Goodfellow, Olivier)
* The few remaining tag.shape and Join.vec_length attributes have been removed. (Frederic)
* The .value attribute of shared variables is removed, use shared.set_value()
or shared.get_value() instead. (Frederic)
* Theano config option "home" is not used anymore as it was redundant with "base_compiledir".
If you use it, Theano will now raise an error. (Olivier D.)
* scan interface changes: (Razvan Pascanu)
* The use of `return_steps` for specifying how many entries of the output
to return has been removed. Instead, apply a subtensor to the output
returned by scan to select a certain slice.
* The inner function (that scan receives) should return its outputs and
updates following this order: [outputs], [updates], [condition].
One can skip any of the three if not used, but the order has to stay unchanged.
Interface bug fix:
* Rop in some cases should have returned a list of one Theano variable,
but returned the variable itself. (Razvan)
New deprecation (will be removed in Theano 0.6, warning generated if you use them):
* tensor.shared() renamed to tensor._shared(). You probably want to
call theano.shared() instead! (Olivier D.)
Bug fixes (incorrect results):
* On CPU, if the convolution had received explicit shape information,
it was not checked at runtime. This caused wrong results if the
input shape was not the one expected. (Frederic, reported by Sander
Dieleman)
* Theoretical bug: in some cases we could have GPUSum return bad values.
We were not able to reproduce this problem.
* patterns affected ({0,1}*nb dim, 0 no reduction on this dim, 1 reduction on this dim):
01, 011, 0111, 010, 10, 001, 0011, 0101 (Frederic)
* div by zero in verify_grad. This hid a bug in the grad of Images2Neibs. (James)
* theano.sandbox.neighbors.Images2Neibs grad was returning a wrong value.
The grad is now disabled and returns an error. (Frederic)
* An expression of the form "1 / (exp(x) +- constant)" was systematically matched to "1 / (exp(x) + 1)"
and turned into a sigmoid regardless of the value of the constant. A warning will be issued if your
code was affected by this bug. (Olivier, reported by Sander Dieleman)
* When indexing into a subtensor of negative stride (for instance, x[a:b:-1][c]),
an optimization replacing it with a direct indexing (x[d]) used an incorrect formula,
leading to incorrect results. (Pascal, reported by Razvan)
* The tile() function is now stricter in what it accepts to allow for better
error-checking/avoiding nonsensical situations. The gradient has been
disabled for the time being as it only implemented (incorrectly) one special
case. The `reps` argument must be a constant (not a tensor variable), and
must have the same length as the number of dimensions in the `x` argument;
this is now checked. (David)
Scan fixes:
* computing grad of a function of grad of scan (reported by Justin Bayer, fix by Razvan)
* before: most of the time it crashed, but it could return wrong values with a bad number of dimensions (so a visible bug)
* now: does the right thing.
* gradient with respect to outputs using multiple taps (reported by Timothy, fix by Razvan)
* before: it used to return wrong values
* now: does the right thing.
* Note: The reported case of this bug was happening in conjunction with the
save memory optimization of scan that gave run time errors. So if you didn't
manually disable that memory optimization (number in the list4),
you are fine if you didn't manually request multiple taps.
* Rop of gradient of scan (reported by Timothy and Justin Bayer, fix by Razvan)
* before: compilation error when computing the R-op
* now: does the right thing.
* save memory optimization of scan (reported by Timothy and Nicolas BL, fix by Razvan)
* before: for certain corner cases it used to result in a runtime shape error
* now: does the right thing.
* Scan grad when the input of scan has sequences of different lengths. (Razvan, reported by Michael Forbes)
* Scan.infer_shape now works correctly when working with a condition for the number of loops.
In the past, it returned n_steps as the length, which is not always true. (Razvan)
* Scan.infer_shape crash fix. (Razvan)
New features:
* AdvancedIncSubtensor grad defined and tested (Justin Bayer)
* Adding 1D advanced indexing support to inc_subtensor and set_subtensor (James Bergstra)
* tensor.{zeros,ones}_like now support the dtype param as numpy (Frederic)
* Added configuration flag "exception_verbosity" to control the verbosity of exceptions (Ian)
* theano-cache list: list the content of the theano cache (Frederic)
* theano-cache unlock: remove the Theano lock (Olivier)
* tensor.ceil_int_div to compute ceil(a / float(b)) (Frederic)
* MaxAndArgMax.grad now works with any axis (The op supports only 1 axis) (Frederic)
* used by tensor.{max,min,max_and_argmax}
* tensor.{all,any} (Razvan)
* tensor.roll as numpy: (Matthew Rocklin, David Warde-Farley)
* Theano with GPU works in some cases on Windows now. Still experimental. (Sebastian Urban)
* IfElse now allows to have a list/tuple as the result of the if/else branches.
* They must have the same length and corresponding type (Razvan)
* Argmax output dtype is now int64 instead of int32. (Olivier)
* Added the element-wise operation arccos. (Ian)
* Added sparse dot with dense grad output. (Yann Dauphin)
* Optimized to Usmm and UsmmCscDense in some cases (Yann)
* Note: theano.dot and theano.sparse.structured_dot() always had a gradient with the same sparsity pattern as the inputs.
The new theano.sparse.dot() has a dense gradient for all inputs.
* GpuAdvancedSubtensor1 supports broadcasted dimensions. (Frederic)
* TensorVariable.zeros_like() and SparseVariable.zeros_like()
* theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.device_properties() (Frederic)
* theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info() returns free and total gpu memory (Frederic)
* Theano flags compiledir_format. Keep the same default as before: compiledir_%(platform)s-%(processor)s-%(python_version)s. (Josh Bleecher Snyder)
* We also support the "theano_version" substitution.
* IntDiv c code (faster and allows this elemwise to be fused with other elemwise ops) (Pascal)
* Internal filter_variable mechanism in Type. (Pascal, Ian)
* Ifelse works on sparse.
* It makes the use of gpu shared variables more transparent with theano.function updates and givens parameters.
* Added a_tensor.transpose(axes); axes is optional (James)
* theano.tensor.transpose(a_tensor, kwargs). We were ignoring kwargs; now they are used as the axes.
* a_CudaNdarray_object[*] = int, now works (Frederic)
* tensor_variable.size (as numpy) computes the product of the shape elements. (Olivier)
* sparse_variable.size (as scipy) computes the number of stored values. (Olivier)
* sparse_variable[N, N] now works (Li Yao, Frederic)
* sparse_variable[M:N, O:P] now works (Li Yao, Frederic, Pascal)
M, N, O, and P can be Python int or scalar tensor variables, None, or
omitted (sparse_variable[:, :M] or sparse_variable[:M, N:] work).
* tensor.tensordot can now be moved to GPU (Sander Dieleman,
Pascal, based on code from Tijmen Tieleman's gnumpy,
http://www.cs.toronto.edu/~tijmen/gnumpy.html)
* Many infer_shape implemented on sparse matrix ops. (David W.F.)
* Added theano.sparse.verify_grad_sparse to easily allow testing grad of
sparse op. It supports testing the full and structured gradients.
* The keys in our cache now store the hash of constants and not the constant values
themselves. This is significantly more efficient for big constant arrays. (Frederic B.)
* 'theano-cache list' lists key files bigger than 1M (Frederic B.)
* 'theano-cache list' prints a histogram of the number of keys per compiled module (Frederic B.)
* 'theano-cache list' prints the number of compiled modules per op class (Frederic B.)
* The Theano flag "nvcc.fastmath" is now also used for the cuda_ndarray.cu file.
* Add the header_dirs to the hard part of the compilation key. This is
currently used only by cuda, but if we use libraries that contain only headers,
this can be useful. (Frederic B.)
* The Theano flag "nvcc.flags" is now included in the hard part of the key.
This means that now we recompile all modules for each value of "nvcc.flags".
A change in "nvcc.flags" used to be ignored for modules that were already
compiled. (Frederic B.)
* Alloc, GpuAlloc are not always pre-computed (constant_folding optimization)
at compile time if all their inputs are constant.
(Frederic B., Pascal L., reported by Sander Dieleman)
* New Op tensor.sort(), wrapping numpy.sort (Hani Almousli)
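The new sparse indexing features above follow scipy's sparse matrix indexing conventions. As a sketch of those conventions, using scipy directly rather than Theano's symbolic sparse_variable:

```python
import numpy as np
from scipy import sparse

# scipy semantics that sparse_variable[M:N, O:P] and
# sparse_variable[N, N] mirror on the symbolic side:
m = sparse.csr_matrix(np.arange(16).reshape(4, 4))
sub = m[1:3, 0:2]    # slice rows 1-2, columns 0-1; result stays sparse
single = m[2, 3]     # single-element indexing returns a scalar
```

As in the notes above, the slice bounds may also be omitted (`m[:, :2]`), which scipy likewise supports.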
New optimizations:
* AdvancedSubtensor1 reuses preallocated memory if available (scan, c|py_nogc linker) (Frederic)
* dot22, dot22scalar work with complex. (Frederic)
* Generate Gemv/Gemm more often. (James)
* Remove scan when all computations can be moved outside the loop. (Razvan)
* scan optimization done earlier. This allows other optimizations to be applied. (Frederic, Guillaume, Razvan)
* exp(x) * sigmoid(-x) is now correctly optimized to the more stable form sigmoid(x). (Olivier)
* Added Subtensor(Rebroadcast(x)) => Rebroadcast(Subtensor(x)) optimization. (Guillaume)
* Made the optimization process faster. (James)
* Allow fusion of elemwise when the scalar op needs support code. (James)
* Better opt that lifts transpose around dot. (James)
Crashes fixed:
* T.mean crash at graph building time. (Ian)
* "Interactive debugger" crash fix. (Ian, Frederic)
* Do not call gemm with strides 0, some blas refuse it. (Pascal Lamblin)
* Optimization crash with gemm and complex. (Frederic)
* GPU crash with elemwise. (Frederic, some reported by Chris Currivan)
* Compilation crash with amdlibm and the GPU. (Frederic)
* IfElse crash. (Frederic)
* Execution crash fix in AdvancedSubtensor1 on 32 bit computers. (Pascal)
* GPU compilation crash on MacOS X. (Olivier)
* Support for OSX Enthought Python Distribution 7.x. (Graham Taylor, Olivier)
* When the subtensor inputs had 0 dimensions and the outputs 0 dimensions. (Frederic)
* Crash when the step to subtensor was not 1 in conjunction with some optimization. (Frederic, reported by Olivier Chapelle)
* Runtime crash related to an optimization with subtensor of alloc (reported by Razvan, fixed by Frederic)
* Fix dot22scalar cast of integer scalars (Justin Bayer, Frederic, Olivier)
* Fix runtime crash in gemm, dot22. (Frederic B.)
* Fix on 32-bit computers: make sure all shapes are int64. (Olivier)
* Fix to deque on python 2.4 (Olivier)
* Fix crash when not using c code (or using DebugMode) (not used by
default) with numpy 1.6*. Numpy has a bug in the reduction code that
made it crash. (Pascal)
* Crashes of blas functions (Gemv on CPU; Ger, Gemv and Gemm on GPU)
when matrices had non-unit stride in both dimensions (CPU and GPU),
or when matrices had negative strides (GPU only). In those cases,
we are now making copies. (Pascal)
* More cases supported in AdvancedIncSubtensor1. (Olivier D.)
* Fix crash when a broadcasted constant was used as input of an
elemwise Op and needed to be upcasted to match the op's output.
(Reported by John Salvatier, fixed by Pascal L.)
* Fixed a memory leak with shared variable (we kept a pointer to the original value) (Ian G.)
Highlights:
* Bug fixes, crash fixes, CPU and GPU speed up.
* theano_var.eval({other_var: val[,...]}) to simplify the usage of Theano (Ian G.)
* New default linker `cvm`. This is the execution engine that tells what op to run in which order.
It is now implemented in C and enables lazy evaluation of ifelse op.
* Faster theano.function compilation. (Pascal L., Ian G.)
* Big sparse submodule update and documentation of it. (Nicolas Bouchard)
* Use GPU asynchronous functionality (Frederic B.)
* Better Windows support.
Known bugs:
* A few crash cases that will be fixed by the final release.
* CAReduce with NaN in inputs do not return the correct output. (reported by Pascal L.)
* This is used in tensor.{all,any,max,mean,prod,sum} and in the grad of PermuteRowElements.
Bug fixes:
* Outputs of Scan nodes could contain corrupted values: some parts of the
output would be repeated a second time, instead of the correct values.
It happened randomly, and quite infrequently, but the bug has been present
(both in Python and Cython) since April 2011. (Pascal L.)
* In Sparse sandbox, fix the grad of theano.sparse.sandbox.sp.row_scale.
It did not return the right number of elements. (Frederic B.)
* set_subtensor(x[int vector], new_value) when moved to the GPU
was transformed into inc_subtensor on the GPU. Now we have a correct
(but slow) GPU implementation.
Note 1: set_subtensor(x[slice[,...]], new_value) was working correctly
in all cases, as were all inc_subtensor.
Note 2: If your code was affected by the incorrect behavior, we now print
a warning by default (Frederic B.)
* Fixed an issue whereby config values were used as default arguments,
with those defaults then stuck at old values if the config variables were
changed during program execution. (David W-F)
* Fixed many subtle bugs involving mutable default arguments which may have
led to unexpected behaviour, such as objects sharing instance variables
they were not supposed to share. (David W-F)
* Correctly record the GPU device number used when we let the driver select it.
(Frederic B.)
* The grad of TensorDot was returning the wrong shape for some combinations of axes.
  We now raise NotImplementedError in those cases. (Frederic B.)
* conv2d with subsample >2 returned wrong values. (Pascal L.)
* Fixed when mode==valid, disabled when mode==full
* theano.sparse.CSMGrad op (generated by the grad of CSM) didn't correctly
  handle unsorted input, nor a gradient that is sparser than the input.
  In those cases, a bad result was returned. This could only happen when a
  sparse input of a Theano function was not sorted, which happens for example
  with sparse advanced indexing from scipy, and it usually resulted in NaNs
  in the graph. (Yann Dauphin)
* theano.sparse._dot(CSC matrix, dense): the optimized version UsmmCSCDense didn't
  correctly handle non-contiguous inputs/outputs. (Pascal L.)
* Fix a corner case in CVM updates. (Pascal L.)
  This happened when, after optimization, the update of a shared variable was
  the shared variable itself. The CVM was not used by default.
* Fix the view_map of sparse.Transpose and sparse.sandbox.sp.RowScale. (Frederic B.)
  This probably didn't cause problems, as only the UsmmCscDense op
  (which calls Usmm with a CSC matrix) could interfere with them.
Deprecation:
* Deprecated the Module class (Ian G.)
This was a predecessor of SharedVariable with a less pythonic philosophy.
Interface changes:
* Now the base version requirements are numpy >= 1.5.0 and the optional scipy >= 0.8.
* In Theano 0.5, we removed the deprecated sharedvar.value property.
Now we raise an error if you access it. (Frederic B.)
* theano.function does not accept duplicate inputs, so function([x, x], ...)
does not work anymore. (Pascal L.)
* theano.function now raises an error if some of the provided inputs are
not part of the computational graph needed to compute the output, for
instance, function([x, y], [y]). You can use the kwarg
``on_unused_input={'raise', 'warn', 'ignore'}`` to control this.
(Pascal L.)
* New Theano flag "on_unused_input" that defines the default value of the
previous point. (Frederic B.)
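As a sketch, the same default could also be set persistently in a `.theanorc` file (the value shown is illustrative):

```ini
[global]
# What to do when theano.function receives an unused input:
# one of raise, warn, ignore.
on_unused_input = warn
```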
* tensor.alloc() now raises an error at graph build time
  when we try to create fewer dimensions than the provided value has.
  In the past, the error was raised at run time.
  (Frederic B.)
* Remove theano.Value and related stuff (Ian G.)
This was a test of what ended up as SharedVariable.
* Renamed Env to FunctionGraph, and object attribute "env" to "fgraph" (Ian G.)
Deprecation warning printed when you try to access the "env" attribute.
* Renamed the FunctionGraph.nodes attribute to FunctionGraph.apply_nodes (Ian G.)
* Warn when we don't correctly handle a parameter in the Theano flag `nvcc.flags`.
  (Frederic B.)
* Do not reorder the user flags passed to the compiler. They get set after other flags. (Frederic B.)
* Make setuptools optional (Ilan Schnell)
* We warn when a user tries to use an old GPU with which Theano is untested.
  This could cause crashes and will also be very slow. (Frederic B.)
* Make theano.grad able to differentiate between not implemented, undefined and disconnected grads.
  An Op.grad function should return theano.gradient.{grad_not_implemented,grad_undefined} or
  something of DisconnectedType (Ian G.)
* Make theano.grad expect to always receive a float or undefined
  gradient, and enforce that ops with integer output values always
  return 0. (Ian G.)
New memory output contract (was mentioned in the release notes of Theano 0.5):
* The output memory an Op receives can now be preallocated by other means.
  In the past it was always the output that the same Apply node had allocated
  on a previous call. This means the shape and strides can differ from previous
  calls, and there can be other references to this memory. It also means an Op
  could receive preallocated output that is not c_contiguous, but we don't do
  that yet. (Pascal L.)
* New Theano flag DebugMode.check_preallocated_output to test this. (Pascal L.)
* Updated a few ops to respect this contract (Pascal L.)
New Features:
* GPU scan now works (does not crash) when there is a mixture of float32 and other dtypes.
* theano_var.eval({other_var: val[, ...]}) to simplify the usage of Theano (Ian G.)
* debugprint new param ids=["CHAR", "id", "int", ""]
This makes the identifier printed to be a unique char, the Python id, a
unique int, or not have it printed. We changed the default to be "CHAR"
as this is more readable. (Frederic B.)
* debugprint new param stop_on_name=[False, True]. If True, we don't print
anything below an intermediate variable that has a name. Defaults to False.
(Frederic B.)
* debugprint no longer prints the "|" symbol in a column after the last input. (Frederic B.)
* If you use the Enthought Python Distribution (EPD), we now use its BLAS
  implementation by default. (Frederic B., Graham Taylor, Simon McGregor)
* MRG random now raises an error with a clear message when the passed shape
  contains dimensions with a bad value, like 0. (Frederic B., reported by Ian G.)
* "CudaNdarray[*] = ndarray" works in more cases (Frederic B.)
* "CudaNdarray[*] += ndarray" works in more cases (Frederic B.)
* We add dimensions to CudaNdarray to automatically broadcast more frequently.
(Frederic B.)
* New theano flag cmodule.warn_no_version. Default False. If True,
will print a warning when compiling one or more Op with C code that
can't be cached because there is no c_code_cache_version() function
associated to at least one of those Ops. (Frederic B.)
* CPU alloc now always generates C code (Pascal L.)
* C code reuses preallocated outputs (only done by Scan) (Pascal L.)
* Garbage collection of intermediate results during Theano function calls
for Ops with C code (Pascal L.)
* Theano flag compiledir_format now supports the parameter "numpy_version" and "g++". (Frederic B.)
* Theano GPU variables, shared variables and constants now support <, <=,
> and >= similar to those not on the GPU.
* AdvancedIncSubtensor now supports the set_instead_of_inc parameter. (Eric L.)
* Added Advanced Indexing support to inc_subtensor and set_subtensor. (Eric L.)
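The difference between the increment and set behaviours can be sketched with plain NumPy semantics (this illustrates the meaning of set_instead_of_inc, not Theano's actual implementation):

```python
import numpy as np

x = np.zeros(5)
idx = [1, 1, 3]  # note the repeated index 1

# set semantics (set_instead_of_inc=True): like NumPy assignment,
# repeated indices just overwrite, so position 1 ends up at 1.0.
s = x.copy()
s[idx] = 1.0

# inc semantics (set_instead_of_inc=False): like np.add.at,
# repeated indices accumulate, so position 1 ends up at 2.0.
i = x.copy()
np.add.at(i, idx, 1.0)
```

The accumulating variant is what a correct gradient of advanced indexing needs when the same index appears twice.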
* theano.tensor.{any,all,std,var,mean,prod,sum,argmin,argmax,min,max,max_and_argmax}
  have a new parameter keepdims (Eric L.)
  This keeps the reduced dimensions with size 1, so the result broadcasts
  correctly against the input data, e.g. to normalize it.
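The keepdims semantics match NumPy's; a small NumPy sketch of why it helps normalization:

```python
import numpy as np

a = np.arange(6.0).reshape(2, 3)

m = a.mean(axis=1)                   # shape (2,): the reduced axis is dropped
mk = a.mean(axis=1, keepdims=True)   # shape (2, 1): kept as a size-1 axis

# With keepdims the result broadcasts directly against the input,
# e.g. to center each row around zero:
centered = a - mk
```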
* The Updates objects now check that the keys are SharedVariables when we pass
  them to the __init__ function. (Pascal L.)
* Set a Theano Variable name on the transposed variable when the input has one. (Frederic B.)
* The cvm linker now supports garbage collection (enabled by default). (James B. Arnaud B., Pascal L.)
* The cvm linker is now the default linker.
  This moves the "loop" around the execution of apply nodes to C, lowering the overhead.
* theano_variable[numpy.newaxis] is now supported (James B.)
* Enable ifelse on the GPU. (Frederic B.)
* Correctly support numpy.memmap everywhere (Pascal L.)
  We had partial support for them before. Just use the normal tensor operations
  on them and it should work.
  But be careful not to exhaust your computer's memory! (we always generate normal ndarrays)
* Add an optimization that stabilizes log(softmax(x)). (Ian G.)
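The motivation for such a stabilization can be sketched with plain NumPy (an illustration of the standard max-shift trick, not the actual Theano rewrite):

```python
import numpy as np

def log_softmax_naive(x):
    # exp(1000.0) overflows to inf, so the division yields nan
    e = np.exp(x)
    return np.log(e / e.sum())

def log_softmax_stable(x):
    # subtracting the max first keeps every exponent <= 0
    z = x - x.max()
    return z - np.log(np.exp(z).sum())

x = np.array([1000.0, 0.0])
with np.errstate(over='ignore', invalid='ignore', divide='ignore'):
    naive = log_softmax_naive(x)
stable = log_softmax_stable(x)
```

The naive form produces nan/-inf here, while the stabilized form gives the finite values [0.0, -1000.0].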
* Re-enable the Images2Neibs grad. It was not broken, the problem was how we tested it. (Frederic B.)
* If `theano_fn.trust_input` is set to True, do not check whether the inputs are valid
  when calling the theano function. (Frederic B.)
* Add theano.tensor.blas.gem{m,v} as shortcuts.
* theano.grad(..., add_names=True). Use False for the old
  behavior; otherwise it tries to name the grad variables. (Ian G.)
* theano-nose (Pascal L.)
A wrapper around nosetests that adds needed extensions.
* --profile-time option, to print time spent in each test (Eric L.)
* --batch option, to allow running tests in batches to lower the memory requirement.
* There is a stabilization optimization for the expression
  x - scalar * theano.grad(m, x), where m = mean(log(1 - sigm(x))).
  Now it is applied more frequently. (Pascal L.)
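The numerical issue behind stabilizing expressions involving log(1 - sigm(x)) can be sketched in NumPy: analytically, log(1 - sigmoid(x)) = -x - log1p(exp(-x)), but the naive form underflows for moderately large x (an illustration, not Theano's actual rewrite):

```python
import numpy as np

def log_one_minus_sigmoid_naive(x):
    # 1 - sigmoid(40) rounds to exactly 0.0 in float64, so the log is -inf
    return np.log(1.0 - 1.0 / (1.0 + np.exp(-x)))

def log_one_minus_sigmoid_stable(x):
    # analytically, log(1 - sigmoid(x)) == -x - log1p(exp(-x))
    return -x - np.log1p(np.exp(-x))

with np.errstate(divide='ignore'):
    bad = log_one_minus_sigmoid_naive(40.0)
good = log_one_minus_sigmoid_stable(40.0)
```

The naive form returns -inf at x = 40, while the rewritten form returns the correct value, approximately -40.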
New Op/functions:
* Added element-wise operation theano.tensor.{GammaLn,Psi} (John Salvatier, Nicolas Bouchard)
* Added element-wise operation theano.tensor.{arcsin,arctan,arccosh,arcsinh,arctanh,exp2,arctan2} (Nicolas Bouchard)
* Added element-wise operation theano.tensor.{gamma,conj,complex_from_polar,expm1,deg2rad,rad2deg,trunc,gamma} (Nicolas Bouchard)
* Added theano.tensor.argsort that wraps numpy.argsort (Hani Almousli).
* Added theano.tensor.diff that wraps numpy.diff (Nicolas B.)
* Added theano.tensor.bincount that wraps numpy.bincount (Nicolas B., Pascal L, Frederic B.)
* Added theano.tensor.squeeze (Nicolas B.)
This removes broadcasted dimensions from the variable.
Theano-esque version of numpy.squeeze.
* Added theano.tensor.repeat that wraps numpy.repeat (Nicolas B. + PL)
* Added theano.tensor.bartlett that wraps numpy.bartlett (Eric L.)
* Added theano.tensor.fill_diagonal that wraps numpy.fill_diagonal (Eric L., Frederic B.)
* Added tensor.square, an alias for tensor.sqr as in NumPy (Ian G.)
* Added theano.tensor.load(path, dtype, broadcastable, mmap_mode=None) op
  that allows loading a .npy file in a theano graph (Matthew Rocklin)
* theano.sandbox.linalg.kron.py: Kron op (Kronecker product). (Eric L.)
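Several of the ops above are thin wrappers around their NumPy counterparts, so the expected semantics can be checked directly with NumPy (the Theano versions operate on symbolic variables instead):

```python
import numpy as np

order = np.argsort(np.array([3, 1, 2]))       # indices that would sort the array
diffs = np.diff(np.array([1, 4, 9]))          # differences between neighbours
counts = np.bincount(np.array([0, 1, 1, 3]))  # occurrence count for each integer
reps = np.repeat(np.array([1, 2]), 3)         # each element repeated 3 times
squeezed = np.squeeze(np.zeros((2, 1, 3)))    # size-1 dimensions removed
window = np.bartlett(5)                       # triangular (Bartlett) window

b = np.zeros((3, 3))
np.fill_diagonal(b, 7)                        # modifies b in place
```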
Speed up:
* CPU convolutions are now parallelized (Frederic B.)
  By default, all cores/hyper-threads are used.
  To control this, use the `OMP_NUM_THREADS=N` environment variable, where N is
  the number of parallel threads to use (by default, the number of CPU
  cores/hyper-threads that you have).
  There is a new Theano flag `openmp` to allow/disallow openmp ops.
  If your BLAS library is parallelized, this flag won't affect it, but the
  environment variable will.
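For example, a run restricted to two OpenMP threads might be set up like this (a sketch; the right thread count depends on your workload):

```shell
# Limit Theano's OpenMP-parallelized ops (e.g. CPU convolution) to 2 threads.
# A parallel BLAS also honours this variable; the `openmp` Theano flag does not affect BLAS.
export OMP_NUM_THREADS=2
```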
* Remove a corner case causing duplicated dot22/gemm in the graph. (Frederic B., Ian G.)
* Enable fusion of elemwise that have the same clients multiple times. (Frederic B.)
* New optimization: Remove reduction over broadcastable dimensions (James B., Frederic B.)
* Faster theano.function compilation. (Pascal L., Ian G.)
* Remove GPU transfer around specify_shape op. (Frederic B.)
* Implemented/tested MANY op.infer_shape methods (Eric Larsen)
  This allows Theano to make better shape inference.
* Implement Solve.infer_shape (Matthew Rocklin)
* Scan memory optimizations now work more frequently. (Razvan P.)
There was a warning printed by the subtensor optimization in those cases.
* Faster rng_mrg Python code. (mostly used for tests) (Frederic B.)
Speed up GPU:
* Convolution on the GPU now checks the generation of the card to make
  it faster in some cases (especially medium/big output images) (Frederic B.)
* We had hardcoded 512 as the maximum number of threads per block. Newer cards
support up to 1024 threads per block.
* Faster GpuAdvancedSubtensor1, GpuSubtensor, GpuAlloc (Frederic B.)
* We now pass the GPU architecture to nvcc when compiling (Frederic B.)
* Now we use the GPU function async feature by default. (Frederic B.)
Set the environment variable `CUDA_LAUNCH_BLOCKING` to `1` to disable this
for profiling or debugging.
* Faster creation of CudaNdarray objects (Frederic B.)
* Now some Max reductions are implemented on the GPU. (Ian G.)
Sparse Sandbox graduate (moved from theano.sparse.sandbox.sp):
* sparse.remove0 (Frederic B., Nicolas B.)
* sparse.sp_sum(a, axis=None) (Nicolas B.)
* bugfix: the non-structured grad was returning a structured grad.
* sparse.{col_scale,row_scale,ensure_sorted_indices,clean} (Nicolas B.)
* sparse.{diag,square_diagonal} (Nicolas B.)
Sparse:
* Support for uint* dtype.
* Implement theano.sparse.mul(sparse1, sparse2) when both inputs don't
have the same sparsity pattern. (Frederic B.)
* New Ops: sparse.{expm1,deg2rad,rad2deg,trunc} (Nicolas B.)
* New Ops: sparse.{sqrt,sqr,log1p,floor,ceil,sgn,round_half_to_even} (Nicolas B.)
* New Ops: sparse.{arctanh,tanh,arcsinh,sinh,arctan,arcsin,tan,sin} (Nicolas B.)
* New functions: structured_{add,exp,log,pow,minimum,maximum,sigmoid} (Yann D., Nicolas B.)
* Optimized ops: StructuredAddSV, StructuredAddSVCSR (inserted automatically)
* New Op: sparse.mul_s_v multiplication of sparse matrix by broadcasted vector (Yann D.)
* New Op: sparse.Cast() (Yann D., Nicolas B.)
* Add sparse_variable.astype() and theano.sparse.cast() and
  theano.sparse.{b,w,i,l,f,d,c,z}cast(), like their tensor equivalents (Nicolas B.)
* Op class: SamplingDot (Yann D., Nicolas B.)
* Optimized version: SamplingDotCsr, StructuredDotCSC
* Optimizations to insert the optimized version: local_sampling_dot_csr, local_structured_add_s_v
* New Ops: sparse.{Multinomial,Poisson,Binomial} (Yann D., NB)
* Implement the CSMProperties grad method (Yann Dauphin)
* Move optimizations to theano/sparse/opt.py (Nicolas B.)
New flags:
* `profile=True` flag now prints the sum of all printed profiles. (Frederic B.)
* It works with the linkers vm/cvm (default).
* Also print compile time, optimizer time and linker time.
* Also print a summary by op class.
* new flag "profile_optimizer" (Frederic B.)
  When profile=True, this also prints the time spent in each optimizer.
  Useful to find optimization bottlenecks.
* new flag "cmodule.remove_gxx_opt" (Frederic B.)
  If True, removes the -O* parameters passed to g++.
  This is useful for debugging modules compiled by Theano in gdb.
  The -g parameter is passed to g++ by default.
* new flag cmodule.compilation_warning
  If True, prints compilation warnings.
* new flag `allow_gc` (Frederic B.)
  When False, do not garbage collect intermediate results when they are no longer
  needed. This uses more memory, but allocates memory less frequently, so it is faster.
* new flag `vm.lazy` (Frederic B.)
Useful only for the vm linkers. When lazy is None,
auto detect if lazy evaluation is needed and use the appropriate
version. If lazy is True/False, force the version used between
Loop/LoopGC and Stack.
* new flag `cxx`. This is the C++ compiler to use. If empty do not compile C code. (Frederic B.)
* New flag `print_active_device` that defaults to True. (Matthew R.)
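Several of these flags can also be set persistently in a `.theanorc` file; a sketch with illustrative values (assuming the usual section-per-prefix layout of `.theanorc`, with dotted flag names split into sections):

```ini
[global]
profile = True
profile_optimizer = True
allow_gc = False

[cmodule]
remove_gxx_opt = True
compilation_warning = True
```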
Documentation:
* Many updates. (Many people)
* Updates to install doc on MacOS. (Olivier)
* Updates to install doc on Windows. (David, Olivier)
* Doc on the Rop function (Ian)
* Added how to use scan to loop with a condition as the number of iterations. (Razvan)
* Added how to wrap an existing python function (in numpy, scipy, ...) in Theano. (Frederic)
* Refactored GPU installation of Theano. (Olivier)
* Added in the tutorial documentation on how to extend Theano.
  This explains how to make a Theano Op from a Python function.
  http://deeplearning.net/software/theano/tutorial/extending_theano.html
  (Frederic B.)
* New installation instructions for Windows using EPD (Pascal L.)
* New installation on Windows by using a Linux VM from ContinuumIO (Frederic B.)
* Revisions of Theano tutorial and addition of exercises to it. (Eric L.)
* New tutorial on Sparse variables. (Nicolas B., Sebastien Lemieux, Frederic Bastien)
  http://www.deeplearning.net/software/theano/tutorial/sparse.html
* Installation documentation for CentOS6 (Frederic B.)
* Installation documentation for Ubuntu (with GPU) (Frederic B., Matthias Zoehrer)
* Doc typo fixes, Doc updates, Better error messages: Olivier D., David W.F., Frederic B., James B., Matthew Rocklin, Ian G.
* Python Memory Management tutorial (Steven Pigeon, Olivier D.)
Proposal:
* Math framework for complex gradients (Pascal L.)
Internal changes:
* Define new exceptions MissingInputError and UnusedInputError, and use them
in theano.function, instead of TypeError and ValueError. (Pascal L.)
* Better handling of bitwidth and max values of integers and pointers
across platforms (Pascal L.)
* Made a few Ops with C code versioned to reduce compilation time.
(Frederic B, Pascal L.)
* Better deletion of files in the compiledir (Frederic B.)
* Safer import on sort op (Nicolas Pinto)
* hash_from_dict for elemwise ops (Frederic B.)
* Renamed BadCLinkerOutput into BadThunkOutput. (PL)
* tensor.utils.shape_of_variables (Matthew R.)
* Add the numpy abi version and g++/nvcc version in the key of compiled code. (Frederic B.)
* env.replace_all_validate_remove (Frederic B.)
  This allows a global optimizer to ensure that it removed certain nodes from the graph.
  This is a generic way to catch errors that would otherwise lead to duplicated
  computation.
* It was used for GEMM and Scan optimization (Frederic B., Razvan P.)
* Fix how exceptions are raised in GPU code (James B.)
* Made code respect pep8: OD, Fred, Pascal L., Nicolas Bouchard, Eric Larsen and others.
* TensorType and CudaNdarrayType now have a value_zeros method that calls
  CudaNdarray.zeros or numpy.zeros with the right dtype. (Pascal L., Olivier D.)
  This allows the same code to work with both types.
* Renamed FunctionGraph.extend function to FunctionGraph.attach_feature. (Ian G.)
* New exception MissingGXX when we try to compile but there is no cxx compiler. (Frederic B.)
* New function theano.gof.utils.give_variables_names(...) that gives unique names to variables. (Matthew R.)
* Use the new NumPy C-API most of the time, for compatibility with later NumPy releases. (Frederic B.)
* New theano.gof.sched.sort_apply_nodes() that will allow other execution orderings. (Matthew R.)
* New attribute sort_schedule_fn, a way to specify a scheduler to use. (Matthew R.)
Crash Fix:
* Fix an import name conflict (usaar33, Frederic B.)
* This makes Theano work with PiCloud.
* Do not try to use the BLAS library when blas.ldflags is manually set to an
empty string (Frederic B., Pascal L.)
* When importing theano on a computer without GPU with the Theano
flags 'device' or 'init_gpu_device' set to gpu* (Frederic B., reported by Luo Heng)
* Optimization printed a useless error when scipy was not available. (Frederic B.)
* GPU conv crash/slowdown on newer hardware (James B.)
* Better error handling in GPU conv (Frederic B.)
* GPU optimization that moves element-wise Ops to the GPU: a crash happened
  with a particular execution order of this optimization and the
  element-wise fusion optimization when upcasting some inputs to
  float32 (to compute them on the GPU).
  (Frederic B., reported by Sander Dieleman)
* GpuReshape in some particular case when the input is not contiguous
(Frederic B., reported by Sander Dieleman)
* GpuSoftmaxWithBias with shape (0, N) with N > 1.
(Frederic B., reported by Razvan P.)
* Fix crash under 64-bit Windows, when taking subtensors of the form a[n:]
(Pascal L., reported by Simon McGregor)
* Fixed issue with the MaxAndArgmax Op not properly preserving broadcastable
dimensions, which could typically result in optimization crashes (Olivier D.)
* Fixed crash when concatenating some arrays with specific broadcasting
patterns (Olivier D.)
* Work around a known issue with nvcc 4.1 on MacOS X. (Graham Taylor)
* In advanced indexing, if some inputs are constant, no need to call constant(...)
on their value any more. (Pascal L., reported by John Salvatier)
* Fix crash on GPU when GpuSubtensor didn't set the right stride
  when the result tensor had a dimension with size 1. (Pascal L.,
  reported by Graham T.)
* Fix scan crash that made it not run on the GPU in one case. (Guillaume D.)
* Don't crash when taking the grad of a random state a second time. (Razvan P.)
* GpuDownsampleFactorMax and its grad with input dimensions 0 and 1 bigger than 65535.
  (Frederic B., reported by Gabe Schwartz)
* Potential crash due to parallel compilation when importing theano.sandbox.cuda
(Olivier D.)
* Crash fix on python 2.4 with slicing. (Pascal L.)
* grad of argmin and argmax (Razvan P.)
* Don't compute the Rop for shared variables with updates (mostly random).
  We don't use them and they caused crashes. (Razvan P.)
* MaxAndArgmax.grad() when one of the gradients it receives is None. (Razvan P., reported by Mark Fenner)
* Fix crash of GpuSum when some dimensions had shape 0. (Frederic B.)
Tests:
* Use less memory (Olivier D.) (fix crash on 32-bit computers)
* Fix test with Theano flag "blas.ldflags=". (Frederic B., Pascal L.)
* Fix crash with advanced subtensor and numpy constant.
* Fix random tests crash due to random value. (Pascal L.)
* Always introduce Alloc node when calling alloc and let the optimizer remove them if needed.
This allows DebugMode to catch some shape error. (Pascal L.)
* DebugMode now checks the view_map for all types of Theano variables.
  It previously checked only variables of tensor type. (Frederic B.)
Others:
* Remove python warning for some python versions. (Gabe Schwartz)
* Remove useless fill op in fast_compile mode to make the graph more readable. (Frederic B.)
* Remove GpuOuter as it is a subset of the new GpuGer (Frederic B.)
* Now we use http://travis-ci.org/ to run all CPU tests (without SciPy)
  with the default mode on all Pull Requests.
  This should make the trunk more stable. (Frederic B.)
* Our nightly buildbot now checks on python 2.4 (Frederic B.)
  This should make the trunk work on it more frequently.

Other thanks:
* blaxill reported an error introduced into the trunk.

New stuff that will probably be reworked/removed before the release:
* Better PyCUDA sharing of the GPU context (fixes a crash at exit). (Frederic B.)
  TODO: there is still a crash at exit!

Others:
* Better error messages in many places. (Many people)
* PEP8 fixes. (Many people)
* Add a warning about a numpy bug when using advanced indexing on a
  tensor with more than 2**32 elements (the resulting array is not
  correctly filled and ends with zeros). (Pascal, reported by David WF)
* Added Scalar.ndim=0 and ScalarSharedVariable.ndim=0 (simplify code) (Razvan)
* New min_informative_str() function to print graphs. (Ian)
* Fix catching of exceptions. (Sometimes we used to catch interrupts) (Frederic, David, Ian, Olivier)
* Better support for utf strings. (David)
* Fix pydotprint with a function compiled with a ProfileMode (Frederic)
  * Was broken with a change to the profiler.
* Warning when people have old cache entries. (Olivier)
* More tests for join on the GPU and CPU. (Frederic)
* Do not request to load the GPU module by default in scan module. (Razvan)
* Fixed some import problems. (Frederic and others)
* Filtering update. (James)
* On Windows, the default compiledir changed to be local to the
computer/user and not transferred with roaming profile. (Sebastian
Urban)
* New theano flag "on_shape_error". Defaults to "warn" (same as previous behavior):
it prints a warning when an error occurs when inferring the shape of some apply node.
The other accepted value is "raise" to raise an error when this happens. (Frederic)
* The buildbot now raises optimization/shape errors instead of just printing a warning. (Frederic)
* better pycuda tests (Frederic)
* check_blas.py now accepts the shape and the number of iterations as parameters (Frederic)
* Fix opt warning when the opt ShapeOpt is disabled (enabled by default) (Frederic)
* More internal verification of what each op.infer_shape returns. (Frederic, James)
* Argmax dtype to int64 (Olivier)
* Improved docstring and basic tests for the Tile Op (David).
Reviewers (alphabetical order):
* David, Frederic, Ian, James, Olivier, Razvan
...@@ -165,7 +165,7 @@ Note: There is no short term plan to support multi-node computation.

Theano Vision State
===================

Here is the state of that vision as of October 1st, 2012 (after Theano release
0.6rc1):

* We support tensors using the `numpy.ndarray` object and we support many operations on them.
...@@ -196,8 +196,8 @@ Here is the state of that vision as of October 1st, 2012 (after Theano release

* The profiler used by cvm is less complete than `ProfileMode`.
* SIMD parallelism on the CPU comes from the compiler.
* Multi-core parallelism is only supported by Conv2d. If the external BLAS implementation supports it,
  gemm, gemv and ger are also parallelized.
* No multi-node support.
* Many, but not all NumPy functions/aliases are implemented.
  * http://www.assembla.com/spaces/theano/tickets/781
...
...@@ -251,7 +251,7 @@ import theano and print the config variable, as in:

    Default False

    Do the vm/cvm linkers profile the execution of Theano functions?

.. attribute:: profile_optimizer

...@@ -259,7 +259,7 @@ import theano and print the config variable, as in:

    Default False

    Do the vm/cvm linkers profile the optimization phase when compiling a Theano function?

.. attribute:: config.lib.amdlibm

...
...@@ -135,7 +135,7 @@ then be used like a normal Python function.

variables to the values to substitute for them, and it returned
the numerical value of the expression.

:func:`eval` will be slow the first time you call it on a variable --
it needs to call :func:`function` to compile the expression behind
the scenes. Subsequent calls to :func:`eval` on that same variable
will be fast, because the variable caches the compiled function.
...
...@@ -34,13 +34,12 @@ Also the ``-march=native`` flag must be used with care if you have NFS. In that

Faster Theano function
----------------------

You can set the Theano flag `allow_gc` to `False` to get a speed-up by using
more memory. By default, Theano frees intermediate results when we don't need
them anymore. Doing so prevents us from reusing this memory. So disabling the
garbage collection will keep the memory space of all intermediate results, so
they can be reused during the next call to the same Theano function if they are
of the correct shape. The shape could change if the shapes of the inputs change.

Faster Small Theano function
----------------------------
...
...@@ -221,7 +221,7 @@ class Apply(Node):
        return new_node

    def get_parents(self):
        return list(self.inputs)

    #convenience properties
    nin = property(lambda self: len(self.inputs), doc='same as len(self.inputs)')
...@@ -387,8 +387,8 @@ class Variable(Node):
    def get_parents(self):
        if self.owner is not None:
            return [self.owner]
        return []

    def env_getter(self):
        warnings.warn("Variable.env is deprecated, it has been renamed 'fgraph'",
...@@ -405,8 +405,7 @@ class Variable(Node):
                      stacklevel=2)
        del self.fgraph

    def eval(self, inputs_to_values=None):
        """ Evaluates this variable.

        inputs_to_values: a dictionary mapping theano Variables to values.
...@@ -418,13 +417,12 @@ class Variable(Node):
        if not hasattr(self, '_fn'):
            self._fn_inputs = inputs_to_values.keys()
            self._fn = theano.function(self._fn_inputs, self)

        args = [inputs_to_values[param] for param in self._fn_inputs]
        rval = self._fn(*args)
        return rval

    env = property(env_getter, env_setter, env_deleter)
...@@ -1030,6 +1028,7 @@ def view_roots(r):
    else:
        return [r]

def list_of_nodes(inputs, outputs):
    """ Return the apply nodes of the graph between inputs and outputs """
    return stack_search(
...
...@@ -1052,7 +1052,7 @@ CudaNdarray_TakeFrom(CudaNdarray * self, PyObject *args){
        // We are not 100% sure that cudaMemcpy waits for the async gpu kernels to
        // finish before doing the transfer. So we add this explicit sync, as it
        // is pretty fast. In a python loop, I ran 1 000 000 calls in 1 second.
        // It is better to be safe and not significantly slower than unsafe.
        cudaThreadSynchronize();
        err = cudaMemcpy(&cpu_err_var, err_var, sizeof(int),
...
...@@ -13,10 +13,10 @@ except ImportError:
    sys.stderr.write("WARNING: scipy can't be imported."
                     " We disable the sparse matrix code.")

from theano.sparse.type import *

if enable_sparse:
    from theano.sparse.basic import *
    from theano.sparse import opt
    from theano.sparse import sharedvar
    from theano.sparse.sharedvar import sparse_constructor as shared