Commit b6db35ce authored by Olivier Delalleau

Copied NEWS.txt to doc subfolder

Parent d7caa062
.. _NEWS: .. _NEWS:
Updates in the Trunk since the last release:
https://github.com/Theano/Theano/wiki/Devnews
============= =============
Release Notes Release Notes
============= =============
Theano 0.5 (23 February 2012) Theano 0.6rc1 (October 1st, 2012)
============================= =================================
Highlight: Highlights:
* Moved to github: http://github.com/Theano/Theano/ * Bug fixes, crash fixes, CPU and GPU speed up.
* Old trac ticket moved to assembla ticket: http://www.assembla.com/spaces/theano/tickets * theano_var.eval({other_var: val[,...]} to simplify the usage of Theano (Ian G.)
* Theano vision: http://deeplearning.net/software/theano/introduction.html#theano-vision (Many people) * New default linker `cvm`. This is the execution engine that tells what op to run in which order.
* Theano with GPU works in some cases on Windows now. Still experimental. (Sebastian Urban) It is now implemented in C and enables lazy evaluation of ifelse op.
* Faster dot() call: New/Better direct call to cpu and gpu ger, gemv, gemm * Faster theano.function compilation. (Pascal L., Ian G.)
and dot(vector, vector). (James, Frederic, Pascal) * Big sparse submodule update and documentation of it. (Nicolas Bouchard)
* C implementation of Alloc. (James, Pascal) * Use GPU asynchronous functionality (Frederic B.)
* theano.grad() now also work with sparse variable. (Arnaud) * Better Windows support.
* Macro to implement the Jacobian/Hessian with theano.tensor.{jacobian,hessian} (Razvan)
* See the Interface changes. Known bug:
* A few crash cases that will be fixed by the final release.
Interface Behavior Changes: Bug fixes:
* The current default value of the parameter axis of * Outputs of Scan nodes could contain corrupted values: some parts of the
theano.{max,min,argmax,argmin,max_and_argmax} is now the same as output would be repeated a second time, instead of the correct values.
numpy: None. i.e. operate on all dimensions of the tensor. It happened randomly, and quite infrequently, but the bug has been present
(Frederic Bastien, Olivier Delalleau) (was deprecated and generated (both in Python and Cython) since April 2011. (Pascal L.)
a warning since Theano 0.3 released Nov. 23rd, 2010) * In Sparse sandbox, fix the grad of theano.sparse.sandbox.sp.row_scale.
* The current output dtype of sum with input dtype [u]int* is now always [u]int64. It did not return the right number of elements. (Frederic B.)
You can specify the output dtype with a new dtype parameter to sum. * set_subtensor(x[int vector], new_value) when moved to the GPU
The output dtype is the one using for the summation. was transformed into inc_subtensor on the GPU. Now we have a correct
There is no warning in previous Theano version about this. (but slow) GPU implementation.
The consequence is that the sum is done in a dtype with more precision than before. Note 1: set_subtensor(x[slice[,...]], new_value) was working correctly
So the sum could be slower, but will be more resistent to overflow. in all cases as well as all inc_subtensor.
This new behavior is the same as numpy. (Olivier, Pascal) Note 2: If your code was affected by the incorrect behavior, we now print
* When using a GPU, detect faulty nvidia drivers. This was detected a warning by default (Frederic B.)
when running Theano tests. Now this is always tested. Faulty * Fixed an issue whereby config values were used as default arguments,
drivers results in wrong results for reduce operations. (Frederic B.) with those defaults then stuck at old values if the config variables were
changed during program execution. (David W-F)
* Fixed many subtle bugs involving mutable default arguments which may have
Interface Features Removed (most were deprecated): led to unexpected behaviour, such as objects sharing instance variables
* The string modes FAST_RUN_NOGC and STABILIZE are not accepted. They they were not supposed to share. (David W-F)
were accepted only by theano.function(). * Correctly record the GPU device number used when we let the driver select it.
Use Mode(linker='c|py_nogc') or Mode(optimizer='stabilize') instead. (Frederic B.)
* tensor.grad(cost, wrt) now always returns an object of the "same type" as wrt * CAReduce with NaN in inputs did not return the good output. (Pascal L.)
(list/tuple/TensorVariable). (Ian Goodfellow, Olivier) * This is used in tensor.{all,any,max,mean,prod,sum} and in the grad of PermuteRowElements.
* A few tag.shape and Join.vec_length left have been removed. (Frederic) * The grad of TensorDot, was returning the wrong shape for some combination of axes.
* The .value attribute of shared variables is removed, use shared.set_value() We now raise NotImplementedError in those cases. (Frederic B.)
or shared.get_value() instead. (Frederic) * conv2d with subsample >2 returned wrong values. (Pascal L.)
* Theano config option "home" is not used anymore as it was redundant with "base_compiledir". * Fixed when mode==valid, disabled when mode==full
If you use it, Theano will now raise an error. (Olivier D.) * theano.sparse.CSMGrad op (generated by the grad of CSM) didn't
* scan interface changes: (Razvan Pascanu) handle unsorted input correctly and gradient that is sparser
* The use of `return_steps` for specifying how many entries of the output than the input. In that case, a bad result was returned. But this could
to return has been removed. Instead, apply a subtensor to the output happen only when a sparse input of a Theano function was not
returned by scan to select a certain slice. sorted. This happens for example with sparse advanced indexing from
* The inner function (that scan receives) should return its outputs and scipy. The conclusion is most of time Nan in the graph.
updates following this order: [outputs], [updates], [condition]. (Yann Dauphin)
One can skip any of the three if not used, but the order has to stay unchanged. * theano.sparse._dot(CSC matrix, dense) optimized version UsmmCSCDense didn't handle
correctly not contiguous inputs/outputs. (Pascal L.)
Interface bug fix: * Fix a corner case CVM updates case. (Pascal L.)
* Rop in some case should have returned a list of one Theano variable, This happened if the update to a shared variable is itself after optimization.
but returned the variable itself. (Razvan) The CVM was not used by default.
* Fix the view_map of sparse.Transpose and sparse.sandbow.sp.RowScale. (Frederic B.)
New deprecation (will be removed in Theano 0.6, warning generated if you use them): This probably didn't cause problem as there is only the UsmmCscDense op
* tensor.shared() renamed to tensor._shared(). You probably want to (used call to Usmm with CSC matrix) that could interfere with them.
call theano.shared() instead! (Olivier D.)
Deprecation:
* Deprecated the Module class (Ian G.)
Bug fixes (incorrect results): This was a predecessor of SharedVariable with a less pythonic philosophy.
* On CPU, if the convolution had received explicit shape information,
they where not checked at runtime. This caused wrong result if the Interface changes:
input shape was not the one expected. (Frederic, reported by Sander * Now the base version requirements are numpy >= 1.5.0 and the optional scipy >= 0.8.
Dieleman) * In Theano 0.5, we removed the deprecated sharedvar.value property.
* Theoretical bug: in some case we could have GPUSum return bad value. Now we raise an error if you access it. (Frederic B.)
We were not able to reproduce this problem * theano.function does not accept duplicate inputs, so function([x, x], ...)
does not work anymore. (Pascal L.)
* patterns affected ({0,1}*nb dim, 0 no reduction on this dim, 1 reduction on this dim): * theano.function now raises an error if some of the provided inputs are
01, 011, 0111, 010, 10, 001, 0011, 0101 (Frederic) not part of the computational graph needed to compute the output, for
instance, function([x, y], [y]). You can use the kwarg
* div by zero in verify_grad. This hid a bug in the grad of Images2Neibs. (James) ``on_unused_input={'raise', 'warn', 'ignore'}`` to control this.
* theano.sandbox.neighbors.Images2Neibs grad was returning a wrong value. (Pascal L.)
The grad is now disabled and returns an error. (Frederic) * New Theano flag "on_unused_input" that defines the default value of the
* An expression of the form "1 / (exp(x) +- constant)" was systematically matched to "1 / (exp(x) + 1)" previous point. (Frederic B.)
and turned into a sigmoid regardless of the value of the constant. A warning will be issued if your * tensor.alloc() now raises an error during graph build time
code was affected by this bug. (Olivier, reported by Sander Dieleman) when we try to create less dimensions than the number of dimensions
* When indexing into a subtensor of negative stride (for instance, x[a:b:-1][c]), the provided value have. In the past, the error was at run time.
an optimization replacing it with a direct indexing (x[d]) used an incorrect formula, (Frederic B.)
leading to incorrect results. (Pascal, reported by Razvan) * Remove theano.Value and related stuff (Ian G.)
* The tile() function is now stricter in what it accepts to allow for better This was a test of what ended up as SharedVariable.
error-checking/avoiding nonsensical situations. The gradient has been * Renamed Env to FunctionGraph, and object attribute "env" to "fgraph" (Ian G.)
disabled for the time being as it only implemented (incorrectly) one special Deprecation warning printed when you try to access the "env" attribute.
case. The `reps` argument must be a constant (not a tensor variable), and * Renamed the FunctionGraph.nodes attribute to FunctionNodes.apply_nodes (Ian G.)
must have the same length as the number of dimensions in the `x` argument; * Warn when we don't handle correctly the parameter in Theano flags `nvcc.flags`
this is now checked. (David) (Frederic B.)
* Do not reorder the user flags passed to the compiler. They get set after other flags. (Frederic B.)
* Make setuptools optional (Ilan Schnell)
Scan fixes: * We warn when a user tries to use an old GPU with which TheNo is untested.
* computing grad of a function of grad of scan (reported by Justin Bayer, fix by Razvan) This could cause crash and will also be very slow. (Frederic B.)
before : most of the time crash, but could be wrong value with bad number of dimensions (so a visible bug) * Make theano.grad able to differentiate between not implemented, undefined and disconnected grad.
now : do the right thing. Op.grad function should return theano.gradient.{grad_not_implemented,grad_undefined} or
* gradient with respect to outputs using multiple taps (reported by Timothy, fix by Razvan) something of DisconectedType (Ian G.)
* Make theano.grad expect to always receive a float or undefined
* before : it used to return wrong values gradient and enforce that op with integer output values always
* now : do the right thing. return 0. (Ian G.)
* Note: The reported case of this bug was happening in conjunction with the
save optimization of scan that give run time errors. So if you didn't
manually disable the same memory optimization (number in the list4), New memory output contract (was mentioned in the release notes of Theano 0.5):
you are fine if you didn't manually request multiple taps. * Now the output memory received can be preallocated by other stuff.
* Rop of gradient of scan (reported by Timothy and Justin Bayer, fix by Razvan) In the past it was always the previous output an Apply node allocated.
before : compilation error when computing R-op So this means that the shape and strides can be different from previous calls
now : do the right thing. and there can be links to this memory at other places.
* save memory optimization of scan (reported by Timothy and Nicolas BL, fix by Razvan) This means it could receive preallocated output that is not c_contiguous.
before : for certain corner cases used to result in a runtime shape error But we don't do that now. (Pascal L.)
now : do the right thing. * New Theano flags to test this DebugMode.check_preallocated_output (Pascal L.)
* Scan grad when the input of scan has sequences of different lengths. (Razvan, reported by Michael Forbes) * Updated a few ops to respect this contract (Pascal L.)
* Scan.infer_shape now works correctly when working with a condition for the number of loops.
In the past, it returned n_steps as the length, which is not always true. (Razvan)
* Scan.infer_shape crash fix. (Razvan) New Features:
* GPU scan now works (does not crash) when there is a mixture of float32 and other dtypes.
New features: * theano_var.eval({other_var:val[,...]} to simplify the usage of Theano (Ian G.)
* AdvancedIncSubtensor grad defined and tested (Justin Bayer) * debugprint new param ids=["CHAR", "id", "int", ""]
* Adding 1D advanced indexing support to inc_subtensor and set_subtensor (James Bergstra) This makes the identifier printed to be a unique char, the Python id, a
* tensor.{zeros,ones}_like now support the dtype param as numpy (Frederic) unique int, or not have it printed. We changed the default to be "CHAR"
* Added configuration flag "exception_verbosity" to control the verbosity of exceptions (Ian) as this is more readable. (Frederic B.)
* theano-cache list: list the content of the theano cache (Frederic) * debugprint new param stop_on_name=[False, True]. If True, we don't print
* theano-cache unlock: remove the Theano lock (Olivier) anything below an intermediate variable that has a name. Defaults to False.
* tensor.ceil_int_div to compute ceil(a / float(b)) (Frederic) (Frederic B.)
* MaxAndArgMax.grad now works with any axis (The op supports only 1 axis) (Frederic) * debugprint does not print anymore the "|" symbol in a column after the last input. (Frederic B.)
* used by tensor.{max,min,max_and_argmax} * If you use Enthought Python Distribution (EPD) now we use its blas
* tensor.{all,any} (Razvan) implementation by default. (Frederic B., Graham Taylor, Simon McGregor)
* tensor.roll as numpy: (Matthew Rocklin, David Warde-Farley) * MRG random now raises an error with a clear message when the passed shape
* Theano with GPU works in some cases on Windows now. Still experimental. (Sebastian Urban) contains dimensions with bad value like 0. (Frederic B. reported by Ian G.)
* IfElse now allows to have a list/tuple as the result of the if/else branches. * "CudaNdarray[*] = ndarray" works in more cases (Frederic B.)
* They must have the same length and corresponding type (Razvan) * "CudaNdarray[*] += ndarray" works in more cases (Frederic B.)
* Argmax output dtype is now int64 instead of int32. (Olivier) * We add dimensions to CudaNdarray to automatically broadcast more frequently.
* Added the element-wise operation arccos. (Ian) (Frederic B.)
* Added sparse dot with dense grad output. (Yann Dauphin) * New theano flag cmodule.warn_no_version. Default False. If True,
* Optimized to Usmm and UsmmCscDense in some case (Yann) will print a warning when compiling one or more Op with C code that
* Note: theano.dot and theano.sparse.structured_dot() always had a gradient with the same sparsity pattern as the inputs. can't be cached because there is no c_code_cache_version() function
The new theano.sparse.dot() has a dense gradient for all inputs. associated to at least one of those Ops. (Frederic B.)
* GpuAdvancedSubtensor1 supports broadcasted dimensions. (Frederic) * CPU alloc now always generate C code (Pascal L.)
* TensorVariable.zeros_like() and SparseVariable.zeros_like() * New Theano flag cmodule.warn_no_version=False. When True, warn when an op
* theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.device_properties() (Frederic) with C code is not versioned (which forces to recompile it everytimes).
* theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info() return free and total gpu memory (Frederic) (Frederic B.)
* Theano flags compiledir_format. Keep the same default as before: compiledir_%(platform)s-%(processor)s-%(python_version)s. (Josh Bleecher Snyder) * C code reuses preallocated outputs (only done by Scan) (Pascal L.)
* We also support the "theano_version" substitution. * Garbage collection of intermediate results during Theano function calls
* IntDiv c code (faster and allow this elemwise to be fused with other elemwise) (Pascal) for Ops with C code (Pascal L.)
* Internal filter_variable mechanism in Type. (Pascal, Ian) * Theano flag compiledir_format now supports the parameter "numpy_version" and "g++". (Frederic B.)
* Ifelse works on sparse. * Theano GPU variables, shared variables and constants now support <, <=,
* It makes use of gpu shared variable more transparent with theano.function updates and givens parameter. > and >= similar to those not on the GPU.
* Added a_tensor.transpose(axes) axes is optional (James) * AdvancedIncSubtensor now supports the set_instead_of_inc parameter. (Eric L.)
* theano.tensor.transpose(a_tensor, kwargs) We where ignoring kwargs, now it is used as the axes. * Added Advanced Indexing support to inc_subtensor and set_subtensor. (Eric L.)
* a_CudaNdarray_object[*] = int, now works (Frederic) * theano.tensor.{any,all,std,var,mean,prod,sum,argmin,argmax,min,max,max_and_argman}
* tensor_variable.size (as numpy) computes the product of the shape elements. (Olivier) have a new parameter keepdims (Eric L.)
* sparse_variable.size (as scipy) computes the number of stored values. (Olivier) This allows to broadcast it correctly against the input data to normalize it.
* sparse_variable[N, N] now works (Li Yao, Frederic) * The Updates objects now check that the keys are SharedVariable when we pass them
* sparse_variable[M:N, O:P] now works (Li Yao, Frederic, Pascal) in the __init__ function. (Pascal L.)
M, N, O, and P can be Python int or scalar tensor variables, None, or * Set a Theano Variable name on transposed op when the input has one (Frederic B).
omitted (sparse_variable[:, :M] or sparse_variable[:M, N:] work). * The cvm linker now supports garbage collection (enabled by default). (James B. Arnaud B., Pascal L.)
* tensor.tensordot can now be moved to GPU (Sander Dieleman, * The cvm linker is now the default linker.
Pascal, based on code from Tijmen Tieleman's gnumpy, This makes the "loop" around the execution of apply node in C. So this lowers the overhead.
http://www.cs.toronto.edu/~tijmen/gnumpy.html) * theano_variable[numpy.newaxis] is now supported (James B.)
* Many infer_shape implemented on sparse matrices op. (David W.F.) * Enable ifelse on the GPU. (Frederic B.)
* Added theano.sparse.verify_grad_sparse to easily allow testing grad of * Correctly support numpy.memmap everywhere (Pascal L.)
sparse op. It supports testing the full and structured gradients. We add partial support for them before. Just use the normal tensor operation
* The keys in our cache now store the hash of constants and not the constant values on them and it should work.
themselves. This is significantly more efficient for big constant arrays. (Frederic B.) But be careful not to exhaust your computer memory! (we always generate normal ndarray)
* 'theano-cache list' lists key files bigger than 1M (Frederic B.) * Add an optimization that stabilizes log(softmax(x)). (Ian G.)
* 'theano-cache list' prints an histogram of the number of keys per compiled module (Frederic B.) * Re-enable the Images2Neibs grad. It was not broken, the problem was how we tested it. (Frederic B.)
* 'theano-cache list' prints the number of compiled modules per op class (Frederic B.) * If `theano_fn.trust_input` is set to False, do not check if the inputs are good
* The Theano flag "nvcc.fastmath" is now also used for the cuda_ndarray.cu file. when calling the theano function. (Frederic B.)
* Add the header_dirs to the hard part of the compilation key. This is * Add theano.tensor.blas,gem{m,v} as shortcut.
currently used only by cuda, but if we use library that are only headers, * theano.grad(..., add_names=True). False for the old
this can be useful. (Frederic B.) behavior. Otherwise it tries to name the grad variables. (Ian G.)
* The Theano flag "nvcc.flags" is now included in the hard part of the key. * theano-nose (Pascal L.)
This mean that now we recompile all modules for each value of "nvcc.flags". A wrapper around nosetests that adds needed extensions.
A change in "nvcc.flags" used to be ignored for module that were already * --profile-time option, to print time spent in each test (Eric L.)
compiled. (Frederic B.) * --batch option, to allow to run tests in batch to lower memory requirement.
* Alloc, GpuAlloc are not always pre-computed (constant_folding optimization) * m = mean(log(1 - sigm(x)))
at compile time if all their inputs are constant. x - scalar * theano.grad(m, x)
(Frederic B., Pascal L., reported by Sander Dieleman) There is a stabilization optimization for this.
* New Op tensor.sort(), wrapping numpy.sort (Hani Almousli) Now it is applied more frequently. (Pascal L.)
New optimizations: New Op/functions:
* AdvancedSubtensor1 reuses preallocated memory if available (scan, c|py_nogc linker) (Frederic) * Added element-wise operation theano.tensor.{GammaLn,Psi} (John Salvatier, Nicolas Bouchard)
* dot22, dot22scalar work with complex. (Frederic) * Added element-wise operation theano.tensor.{arcsin,arctan,arccosh,arcsinh,arctanh,exp2,arctan2} (Nicolas Bouchard)
* Generate Gemv/Gemm more often. (James) * Added element-wise operation theano.tensor.{gamma,conj,complex_from_polar,expm1,deg2rad,rad2deg,trunc,gamma} (Nicolas Bouchard)
* Remove scan when all computations can be moved outside the loop. (Razvan) * Added theano.tensor.argsort that wraps numpy.argsort (Hani Almousli).
* scan optimization done earlier. This allows other optimizations to be applied. (Frederic, Guillaume, Razvan) * Added theano.tensor.diff that wraps numpy.diff (Nicolas B.)
* exp(x) * sigmoid(-x) is now correctly optimized to the more stable form sigmoid(x). (Olivier) * Added theano.tensor.bincount that wraps numpy.bincount (Nicolas B., Pascal L, Frederic B.)
* Added Subtensor(Rebroadcast(x)) => Rebroadcast(Subtensor(x)) optimization. (Guillaume) * Added theano.tensor.squeeze (Nicolas B.)
* Made the optimization process faster. (James) This removes broadcasted dimensions from the variable.
* Allow fusion of elemwise when the scalar op needs support code. (James) Theano-esque version of numpy.squeeze.
* Better opt that lifts transpose around dot. (James) * Added theano.tensor.repeat that wraps numpy.repeat (Nicolas B. + PL)
* Added theano.tensor.bartlett that wraps numpy.bartlett (Eric L.)
* Added theano.tensor.fill_diagonal that wraps numpy.fill_diagonal (Eric L., Frederic B.)
Crashes fixed: * Added tensor.square that is an alias for tensor.sqr as NumPy (Ian G.)
* T.mean crash at graph building time. (Ian) * Added theano.tensor.load(path, dtype, broadcastable, mmap_mode=None) op
* "Interactive debugger" crash fix. (Ian, Frederic) that allows to load a .npy file in a theano graph (Matthew Rocklin)
* Do not call gemm with strides 0, some blas refuse it. (Pascal Lamblin) * theano.sandbox.linalg.kron.py:Kron op. (Eric L.)
* Optimization crash with gemm and complex. (Frederic) Kronecker product
* GPU crash with elemwise. (Frederic, some reported by Chris Currivan)
* Compilation crash with amdlibm and the GPU. (Frederic) Speed up:
* IfElse crash. (Frederic) * CPU convolutions are now parallelized (Frederic B.)
* Execution crash fix in AdvancedSubtensor1 on 32 bit computers. (Pascal) By default use all cores/hyper-threads.
* GPU compilation crash on MacOS X. (Olivier) To control it, use the `OMP_NUM_THREADS=N` environment variable where N is the number of
* Support for OSX Enthought Python Distribution 7.x. (Graham Taylor, Olivier) parallel threads to use. By default it is equal to the number of CPU cores/hyper
* When the subtensor inputs had 0 dimensions and the outputs 0 dimensions. (Frederic) threads that you have.
* Crash when the step to subtensor was not 1 in conjunction with some optimization. (Frederic, reported by Olivier Chapelle) There is a new Theano flag `openmp` to allow/disallow openmp op.
* Runtime crash related to an optimization with subtensor of alloc (reported by Razvan, fixed by Frederic) If your BLAS library is parallelized, this flag won't affect it, but the
* Fix dot22scalar cast of integer scalars (Justin Bayer, Frederic, Olivier) env variable will.
* Fix runtime crash in gemm, dot22. FB * Remove a corner case causing duplicated dot22/gemm in the graph. (Frederic B., Ian G.)
* Fix on 32bits computer: make sure all shape are int64.(Olivier) * Enable fusion of elemwise that have the same clients multiple times. (Frederic B.)
* Fix to deque on python 2.4 (Olivier) * New optimization: Remove reduction over broadcastable dimensions (James B., Frederic B.)
* Fix crash when not using c code (or using DebugMode) (not used by * Faster theano.function compilation. (Pascal L., Ian G.)
default) with numpy 1.6*. Numpy has a bug in the reduction code that * Remove GPU transfer around specify_shape op. (Frederic B.)
made it crash. (Pascal) * Implemented/tested MANY op.infer_shape method (Eric Larsen)
* Crashes of blas functions (Gemv on CPU; Ger, Gemv and Gemm on GPU) This allows Theano to make better shape inferance.
when matrices had non-unit stride in both dimensions (CPU and GPU), * Implement Solve.infer_shape (Matthew Rocklin)
or when matrices had negative strides (GPU only). In those cases, * Scan memory optimizations now work more frequently. (Razvan P.)
we are now making copies. (Pascal) There was a warning printed by the subtensor optimization in those cases.
* More cases supported in AdvancedIncSubtensor1. (Olivier D.) * Faster rng_mrg Python code. (mostly used for tests) (Frederic B.)
* Fix crash when a broadcasted constant was used as input of an
elemwise Op and needed to be upcasted to match the op's output. Speed up GPU:
(Reported by John Salvatier, fixed by Pascal L.) * Convolution on the GPU now checks the generation of the card to make
* Fixed a memory leak with shared variable (we kept a pointer to the original value) (Ian G.) it faster in some cases (especially medium/big ouput image) (Frederic B.)
* We had hardcoded 512 as the maximum number of threads per block. Newer cards
support up to 1024 threads per block.
Known bugs: * Faster GpuAdvancedSubtensor1, GpuSubtensor, GpuAlloc (Frederic B.)
* CAReduce with nan in inputs don't return the good output (`Ticket <https://www.assembla.com/spaces/theano/tickets/763>`_). * We now pass the GPU architecture to nvcc when compiling (Frederic B.)
* This is used in tensor.{max,mean,prod,sum} and in the grad of PermuteRowElements. * Now we use the GPU function async feature by default. (Frederic B.)
Set the environment variable `CUDA_LAUNCH_BLOCKING` to `1` to disable this
for profiling or debugging.
Sandbox: * Faster creation of CudaNdarray objects (Frederic B.)
* cvm interface more consistent with current linker. (James) * Now some Max reductions are implemented on the GPU. (Ian G.)
* Now all tests pass with the linker=cvm flags.
* vm linker has a callback parameter. (James) Sparse Sandbox graduate (moved from theano.sparse.sandbox.sp):
* review/finish/doc: diag/extract_diag. (Arnaud Bergeron, Frederic, Olivier) * sparse.remove0 (Frederic B., Nicolas B.)
* review/finish/doc: AllocDiag/diag. (Arnaud, Frederic, Guillaume) * sparse.sp_sum(a, axis=None) (Nicolas B.)
* review/finish/doc: MatrixInverse, matrix_inverse. (Razvan) * bugfix: the not structured grad was returning a structured grad.
* review/finish/doc: matrix_dot. (Razvan) * sparse.{col_scale,row_scale,ensure_sorted_indices,clean} (Nicolas B.)
* review/finish/doc: det (determinent) op. (Philippe Hamel) * sparse.{diag,square_diagonal} (Nicolas B.)
* review/finish/doc: Cholesky determinent op. (David)
* review/finish/doc: ensure_sorted_indices. (Li Yao) Sparse:
* review/finish/doc: spectral_radius_boud. (Xavier Glorot) * Support for uint* dtype.
* review/finish/doc: sparse sum. (Valentin Bisson) * Implement theano.sparse.mul(sparse1, sparse2) when both inputs don't
* review/finish/doc: Remove0 (Valentin) have the same sparsity pattern. (Frederic B.)
* review/finish/doc: SquareDiagonal (Eric) * New Ops: sparse.{expm1,deg2rad,rad2deg,trunc} (Nicolas B.)
* New Ops: sparse.{sqrt,sqr,log1p,floor,ceil,sgn,round_half_to_even} (Nicolas B.)
* New Ops: sparse.{arctanh,tanh,arcsinh,sinh,arctan,arcsin,tan,sin} (Nicolas B.)
Sandbox New features (not enabled by default): * New functions: structured_{add,exp,log,pow,minimum,maximum,sigmoid} (Yann D., Nicolas B.)
* CURAND_RandomStreams for uniform and normal (not picklable, GPU only) (James) * Optimized op: StructuredAddSV, StrucutedAddSVCSR (inserted automatically)
* New sandbox.linalg.ops.pinv(pseudo-inverse) op (Razvan) * New Op: sparse.mul_s_v multiplication of sparse matrix by broadcasted vector (Yann D.)
* New Op: sparse.Cast() (Yann D., Nicolas B.)
* Add sparse_variable.astype() and theano.sparse.cast() and
theano.sparse.{b,w,i,l,f,d,c,z}cast() as their tensor equivalent (Nicolas B.)
* Op class: SamplingDot (Yann D., Nicolas B.)
* Optimized version: SamplingDotCsr, StructuredDotCSC
* Optimizations to insert the optimized version: local_sampling_dot_csr, local_structured_add_s_v
* New Ops: sparse.{Multinomial,Poisson,Binomial} (Yann D., NB)
* Implement the CSMProperties grad method (Yann Dauphin)
* Move optimizations to theano/sparse/opt.py (Nicolas B.)
New flags:
* `profile=True` flag now prints the sum of all printed profiles. (Frederic B.)
* It works with the linkers vm/cvm (default).
* Also print compile time, optimizer time and linker time.
* Also print a summary by op class.
* new flag "profile_optimizer" (Frederic B.)
when profile=True, will also print the time spent in each optimizer.
Useful to find optimization bottleneck.
* new flag "cmodule.remove_gxx_opt" (Frederic B.)
If True, will remove -O* parameter passed to g++.
This is useful to debug in gdb module compiled by Theano.
The parameter -g is passed by default to g++.
* new flag cmodule.compilation_warning
if True, will print compilation warning.
* new flag `allow_gc` (Frederic B.)
When False, do not garbage collect intermediate results when they are not needed.
This uses more memory, but allocates memory less frequently so faster.
* new flag `vm.lazy` (Frederic B.)
Useful only for the vm linkers. When lazy is None,
auto detect if lazy evaluation is needed and use the apropriate
version. If lazy is True/False, force the version used between
Loop/LoopGC and Stack.
* new flag `cxx`. This is the C++ compiler to use. If empty do not compile C code. (Frederic B.)
* New flag `print_active_device` that defaults to True. (Matthew R.)
Documentation: Documentation:
* Many updates. (Many people) * Added in the tutorial documentation on how to extend Theano.
* Updates to install doc on MacOS. (Olivier) This explains how to make a Theano Op from a Python function.
* Updates to install doc on Windows. (David, Olivier) http://deeplearning.net/software/theano/tutorial/extending_theano.html
* Doc on the Rop function (Ian) (Frederic B.)
* Added how to use scan to loop with a condition as the number of iteration. (Razvan) * New installation instructions for Windows using EPD (Pascal L.)
* Added how to wrap in Theano an existing python function (in numpy, scipy, ...). (Frederic) * New installation on Windows by using a Linux VM from ContinuumIO (Frederic B.)
* Refactored GPU installation of Theano. (Olivier) * Revisions of Theano tutorial and addition of exercices to it. (Eric L.)
* New tutorial on Sparse variable. (Nicolas B., Sebastien Lemieux, Frederic Bastien
http://www.deeplearning.net/software/theano/tutorial/sparse.html
* Installation documentation for CentOS6 (Frederic B.)
* Installation documentation for Ubuntu (with GPU) (Frederic B., Matthias Zoehrer)
* Doc typo fixes, Doc updates, Better error messages: Olivier D., David W.F., Frederic B., James B., Matthew Rocklin, Ian G.
* Python Memory Management tutorial (Steven Pigeon, Olivier D.)
Proposal:
* Math framework for complex gradients (Pascal L.)
Internal changes:
* Define new exceptions MissingInputError and UnusedInputError, and use them
in theano.function, instead of TypeError and ValueError. (Pascal L.)
* Better handling of bitwidth and max values of integers and pointers
across platforms (Pascal L.)
* Made a few Ops with C code versioned to reduce compilation time.
(Frederic B, Pascal L.)
* Better deletion of files in the compiledir (Frederic B.)
* Safer import on sort op (Nicolas Pinto)
* hash_from_dict for elemwise op (Frederic B.)
* Renamed BadCLinkerOutput into BadThunkOutput. (PL)
* tensor.utils.shape_of_variables (Matthew R.)
* Add the numpy abi version and g++/nvcc version in the key of compiled code. (Frederic B.)
* env.replace_all_validate_remove (Frederic B.)
  This allows a global optimizer to ensure that it removed some nodes from the graph.
This is a generic way to catch errors that would otherwise duplicate
computation.
* It was used for GEMM and Scan optimization (Frederic B., Razvan P.)
* Fix how exceptions are raised in GPU code (James B.)
* Made code respect pep8: OD, Fred, Pascal L., Nicolas Bouchard, Eric Larsen and others.
* TensorType and CudaNdarrayType now have a value_zeros method that calls CudaNdarray.zeros or
  numpy.zeros with the right dtype. (Pascal L., Olivier D.)
  This allows the same code to work with both types.
* Renamed FunctionGraph.extend function to FunctionGraph.attach_feature. (Ian G.)
* New exception MissingGXX when we try to compile but there is no cxx compiler. (Frederic B.)
* New fct theano.gof.utils.give_variables_names(...) that gives unique names to variables. (Matthew R.)
* Use the new NumPy C-API most of the time, for compatibility with later NumPy releases. (Frederic B.)
* New theano.gof.sched.sort_apply_nodes() that will allow other execution ordering. (Matthew R.)
* New attribute sort_schedule_fn, a way to specify a scheduler to use. (Matthew R.)
Crash Fix:
* Fix an import name conflict (usaar33, Frederic B.)
* This makes Theano work with PiCloud.
* Do not try to use the BLAS library when blas.ldflags is manually set to an
empty string (Frederic B., Pascal L.)
* When importing theano on a computer without GPU with the Theano
flags 'device' or 'init_gpu_device' set to gpu* (Frederic B., reported by Luo Heng)
* Optimization printed a useless error when scipy was not available. (Frederic B.)
* GPU conv crash/slowdown on newer hardware (James B.)
* Better error handling in GPU conv (Frederic B.)
* GPU optimization that moves element-wise Ops to the GPU: a crash happened with
  a particular execution order of this optimization and the
  element-wise fusion optimization, when upcasting some inputs to
  float32 (to compute them on the GPU).
  (Frederic B., reported by Sander Dieleman)
* GpuReshape in a particular case when the input is not contiguous
(Frederic B., reported by Sander Dieleman)
* GpuSoftmaxWithBias with shape (0, N) with N > 1.
(Frederic B., reported by Razvan P.)
* Fix crash under 64-bit Windows, when taking subtensors of the form a[n:]
(Pascal L., reported by Simon McGregor)
* Fixed issue with the MaxAndArgmax Op not properly preserving broadcastable
dimensions, which could typically result in optimization crashes (Olivier D.)
* Fixed crash when concatenating some arrays with specific broadcasting
patterns (Olivier D.)
* Work around a known issue with nvcc 4.1 on MacOS X. (Graham Taylor)
* In advanced indexing, if some inputs are constant, no need to call constant(...)
on their value any more. (Pascal L., reported by John Salvatier)
* Fix crash on GPU when GpuSubtensor didn't set the right stride
  when the result tensor had a dimension of size 1. (Pascal L.,
  reported by Graham T.)
* Fix scan crash that made it not run on the GPU in one case. (Guillaume D.)
* Don't crash when computing the grad of a random state a second time. (Razvan P.)
* GpuDownsampleFactorMax and its grad with input dimensions 0 and 1 bigger than 65535.
  (Frederic B., reported by Gabe Schwartz)
* Potential crash due to parallel compilation when importing theano.sandbox.cuda
(Olivier D.)
* Crash fix on python 2.4 with slicing. (Pascal L.)
* grad of argmin and argmax (Razvan P.)
* Don't compute the Rop for shared variables with updates (mostly random ones).
  We don't use them and they caused crashes. (Razvan P.)
* MaxAndArgmax.grad() when one of the gradients it receives is None. (Razvan P., reported by Mark Fenner)
* Fix crash of GpuSum when some dimension's shape was 0. (Frederic B.)
Tests:
* Use less memory (Olivier D.) (fix crash on 32-bit computers)
* Fix test with Theano flag "blas.ldflags=". (Frederic B., Pascal L.)
* Fix crash with advanced subtensor and numpy constant.
* Fix crashes of random tests due to random values. (Pascal L.)
* Always introduce an Alloc node when calling alloc, and let the optimizer remove it if needed.
  This allows DebugMode to catch some shape errors. (Pascal L.)
* DebugMode now checks the view_map for all types of Theano variables.
  Previously it checked only variables of tensor type. (Frederic B.)
Others:
 * Remove python warning for some python versions. (Gabe Schwartz)
 * Remove useless fill op in fast_compile mode to make the graph more readable. (Frederic B.)
 * Remove GpuOuter as it is a subset of the new GpuGer. (Frederic B.)
 * Now we use http://travis-ci.org/ to run all CPU tests (without SciPy)
   with the default mode on all Pull Requests.
   This should make the trunk more stable. (Frederic B.)
 * Our nightly buildbot now checks on python 2.4. (Frederic B.)
   This should make the trunk work on it more frequently.

Other thanks:
 * blaxill reported an error introduced into the trunk.

New stuff that will probably be reworked/removed before the release:
 * Better PyCUDA sharing of the GPU context. (Fixes a crash at exit.) (Frederic B.)
   TODO: there is still a crash at exit!

Theano 0.5 (23 February 2012)
=============================

Others:
 * Better error messages in many places. (Many people)
 * PEP8 fixes. (Many people)
 * Add a warning about a numpy bug when using advanced indexing on a
   tensor with more than 2**32 elements (the resulting array is not
   correctly filled and ends with zeros). (Pascal, reported by David WF)
 * Added Scalar.ndim=0 and ScalarSharedVariable.ndim=0 (simplify code). (Razvan)
 * New min_informative_str() function to print graphs. (Ian)
 * Fix catching of exceptions. (Sometimes we used to catch interrupts.) (Frederic, David, Ian, Olivier)
 * Better support for utf strings. (David)
 * Fix pydotprint with a function compiled with a ProfileMode. (Frederic)
   * It was broken by a change to the profiler.
 * Warning when people have old cache entries. (Olivier)
 * More tests for join on the GPU and CPU. (Frederic)
 * Do not request to load the GPU module by default in the scan module. (Razvan)
 * Fixed some import problems. (Frederic and others)
* Filtering update. (James)
* On Windows, the default compiledir changed to be local to the
  computer/user and is no longer transferred with roaming profiles. (Sebastian
  Urban)
* New theano flag "on_shape_error". Defaults to "warn" (same as previous behavior):
  it prints a warning when an error occurs while inferring the shape of some apply node.
  The other accepted value is "raise", to raise an error when this happens. (Frederic)
* The buildbot now raises optimization/shape errors instead of just printing a warning. (Frederic)
* Better PyCUDA tests. (Frederic)
* check_blas.py now accepts the shape and the number of iterations as parameters. (Frederic)
* Fix an optimization warning when the ShapeOpt optimization is disabled (it is enabled by default). (Frederic)
* More internal verification on what each op.infer_shape returns. (Frederic, James)
* Argmax dtype changed to int64. (Olivier)
* Improved docstring and basic tests for the Tile Op (David).
Reviewers (alphabetical order):
* David, Frederic, Ian, James, Olivier, Razvan