Commit 72a7214a authored by lamblin

Merge pull request #863 from nouiz/mixed2

Mixed2
......@@ -2,148 +2,7 @@
Updates in the Trunk since the last release:
Bug fixes
* Outputs of Scan nodes could contain corrupted values: some parts of the
output would be repeated a second time, instead of the correct values.
It happened randomly, and quite infrequently, but the bug has been present
(both in Python and Cython) since April 2011. (Pascal L.)
* In the Sparse sandbox, fixed the grad of theano.sparse.sandbox.sp.row_scale:
it did not return the right number of elements. (Frederic B.)
* set_subtensor(x[int vector], new_value) when moved to the GPU
was transformed into inc_subtensor on the GPU. Now we have a correct
(but slow) GPU implementation.
Note 1: set_subtensor(x[slice[,...]], new_value) was working correctly
in all cases as well as inc_subtensor(*, *).
Note 2: If your code was affected by the incorrect behavior, we now print
a warning by default (Frederic B.)
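The intended semantics can be sketched with a plain NumPy analogue (the Theano Ops work on symbolic variables, so this is only an illustration of the element-wise behaviour):

```python
import numpy as np

x = np.ones(5)
idx = np.array([1, 3])             # an "int vector" index
new_value = np.array([10.0, 20.0])

# set_subtensor semantics: the indexed positions are overwritten.
x_set = x.copy()
x_set[idx] = new_value

# inc_subtensor semantics: the indexed positions are incremented.
x_inc = x.copy()
x_inc[idx] += new_value

print(x_set.tolist())  # [1.0, 10.0, 1.0, 20.0, 1.0]
print(x_inc.tolist())  # [1.0, 11.0, 1.0, 21.0, 1.0]
```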
* Fixed an issue whereby config values were used as default arguments,
with those defaults then stuck at old values if the config variables were
changed during program execution. (David W-F)
* Fixed many subtle bugs involving mutable default arguments which may have
led to unexpected behaviour, such as objects sharing instance variables
they were not supposed to share. (David W-F)
* Correctly record the GPU device number used when we let the driver select it.
(Frederic B.)
Documentation
* Added tutorial documentation on how to extend Theano.
This explains how to make a Theano Op from a Python function.
http://deeplearning.net/software/theano/tutorial/extending_theano.html
(Frédéric B.)
* New installation instructions for Windows using EPD (Pascal L.)
Interface changes
* In 0.5, we removed the deprecated sharedvar.value property.
Now we raise an error if you access it. (Frederic B.)
* theano.function does not accept duplicate inputs, so function([x, x], ...)
does not work anymore. (Pascal L.)
* theano.function now raises an error if some of the provided inputs are
not part of the computational graph needed to compute the output, for
instance, function([x, y], [y]). You can use the kwarg
``on_unused_input={'raise', 'warn', 'ignore'}`` to control this.
(Pascal L.)
* New Theano flag "on_unused_input" that defines the default value of the
previous point. (Frederic B.)
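A minimal sketch of what the three modes mean; `check_unused_inputs` is a hypothetical helper for illustration only, not Theano's actual code:

```python
import warnings

# Hypothetical sketch of the unused-input check (not Theano's real code).
def check_unused_inputs(inputs, needed, on_unused_input='raise'):
    # Inputs that are not needed to compute any requested output.
    unused = [i for i in inputs if i not in needed]
    if unused:
        msg = 'theano.function was given unused input(s): %s' % unused
        if on_unused_input == 'raise':
            raise ValueError(msg)
        elif on_unused_input == 'warn':
            warnings.warn(msg)
        # 'ignore': silently accept the unused inputs
    return unused

print(check_unused_inputs(['x', 'y'], ['y'], on_unused_input='ignore'))  # ['x']
```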
* tensor.alloc() now raises an error at graph build time
when we try to create fewer dimensions than the number of dimensions
the provided value has. Previously, the error was raised at run time.
(Frederic B.)
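The rule being enforced can be sketched as follows; `check_alloc_ndim` is a hypothetical helper, not Theano's actual code:

```python
# Hypothetical sketch of the build-time check (not Theano's real code):
# alloc must request at least as many dimensions as the value being
# broadcast already has.
def check_alloc_ndim(value_ndim, n_requested_dims):
    if n_requested_dims < value_ndim:
        raise TypeError('alloc: cannot create a tensor with %d dimensions '
                        'from a value that has %d dimensions'
                        % (n_requested_dims, value_ndim))
```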
Speed up
* Convolution on the GPU now checks the generation of the card to make
it faster in some cases (especially for medium/big output images). (Frédéric B.)
(We had hardcoded 512 as the maximum number of threads per block; newer
cards support up to 1024 threads per block.)
* CPU convolutions are now parallelized. (Frédéric B.)
By default, all cores/hyper-threads are used.
To control this, use the OMP_NUM_THREADS=N environment variable.
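The variable has to reach the OpenMP runtime before worker threads start, so it is normally exported in the shell; a minimal sketch of setting it from Python instead, assuming it runs before theano is imported and before any OpenMP threads exist:

```python
import os

# Limit OpenMP parallelism to 2 threads; must happen before the OpenMP
# runtime is initialized (e.g. before importing theano).
os.environ['OMP_NUM_THREADS'] = '2'
print(os.environ['OMP_NUM_THREADS'])  # 2
```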
New Features
* debugprint new param ids=["CHAR", "id", "int", ""]
This controls whether the printed identifier is the Python id, a unique
char, a unique int, or not printed at all. We changed the default to
"CHAR" as this is more readable. (Frederic B.)
* debugprint new param stop_on_name=[False, True]. If True, we don't print
anything below an intermediate variable that has a name. Defaults to False.
(Frederic B.)
* debugprint no longer prints the "|" symbol in a column after the last input. (Frederic B.)
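A hypothetical sketch of how the four `ids` modes could assign identifiers (this is not Theano's implementation, only an illustration of what each mode prints):

```python
import itertools
import string

# Hypothetical sketch of debugprint's `ids` modes (not Theano's code).
def make_id_printer(mode='CHAR'):
    counter = itertools.count()
    seen = {}  # maps id(var) -> assigned identifier

    def get_id(var):
        if mode == '':
            return ''              # no identifier printed at all
        if mode == 'id':
            return str(id(var))    # the raw Python id
        if id(var) not in seen:
            n = next(counter)
            seen[id(var)] = (string.ascii_uppercase[n % 26]  # unique char
                             if mode == 'CHAR' else str(n))  # unique int
        return seen[id(var)]       # same variable -> same identifier

    return get_id
```

For example, with `mode='CHAR'` the first variable seen is labelled "A", the second "B", and a repeated variable keeps its original label.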
* If you use the Enthought Python Distribution (EPD), we now use its BLAS
implementation by default (tested on Linux and Windows).
(Frederic B., Simon McGregor)
* MRG random now raises an error with a clear message when the passed shape
contains dimensions with an invalid value such as 0. (Frédéric B., reported by Ian G.)
* "CudaNdarray[*] = ndarray" works in more cases (Frederic B.)
* "CudaNdarray[*] += ndarray" works in more cases (Frederic B.)
* We now add broadcastable dimensions to CudaNdarray so that broadcasting
happens automatically in more cases. (Frederic B.)
* theano.tensor.argsort that wraps numpy.argsort (Hani Almousli).
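Since it wraps numpy.argsort, the semantics can be checked directly against NumPy: argsort returns the indices that would sort the array.

```python
import numpy as np

# theano.tensor.argsort follows numpy.argsort semantics.
a = np.array([3, 1, 2])
order = np.argsort(a)
print(order.tolist())     # [1, 2, 0]
print(a[order].tolist())  # [1, 2, 3] -- indexing by the order sorts a
```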
* New Theano flag cmodule.warn_no_version. Default False. If True,
will print a warning when compiling one or more Ops with C code that
can't be cached because there is no c_code_cache_version() function
associated with at least one of those Ops, which forces them to be
recompiled every time. (Frederic B.)
* CPU alloc now always generates C code (Pascal L.)
* Made a few Ops with C code versioned to reduce compilation time.
(Frédéric B, Pascal L.)
* C code reuses preallocated outputs (only done by Scan) (Pascal L.)
* Garbage collection of intermediate results during Theano function calls
for Ops with C code (Pascal L.)
* The Theano flag compiledir_format now supports the parameter numpy_version.
* Theano GPU variables, shared variables and constants now support <, <=,
> and >=, just like those not on the GPU.
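These comparisons are element-wise, matching the behaviour of CPU tensors; the NumPy analogue:

```python
import numpy as np

# Element-wise comparison semantics that GPU variables now share with
# CPU tensors (shown here on plain NumPy arrays).
a = np.array([1., 2., 3.])
b = np.array([2., 2., 2.])
print((a < b).tolist())   # [True, False, False]
print((a >= b).tolist())  # [False, True, True]
```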
Sparse
* Implement theano.sparse.mul(sparse1, sparse2) even when the two inputs
do not have the same sparsity pattern. (Frederic B.)
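A dense NumPy analogue of the element-wise product: the result is nonzero only where the two sparsity patterns intersect.

```python
import numpy as np

# Dense analogue of theano.sparse.mul with two different sparsity
# patterns: nonzeros survive only where both inputs are nonzero.
a = np.array([[1., 0.], [0., 2.]])
b = np.array([[3., 4.], [0., 0.]])
c = a * b
print(c.tolist())  # [[3.0, 0.0], [0.0, 0.0]]
```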
Sparse Sandbox graduate
* Remove0 op: it removes stored elements with value 0. (Frederic B.)
Sparse Sandbox Additions (not reviewed/documented/tested, but used by some people)
* They are all in the theano.sparse.sandbox.sp2 module
* Op class: Cast, Poisson, Multinomial, EliminateZeros, Sum, Binomial
* Op class: SamplingDot, SamplingDotCsr (inserted automatically)
* Op function: structured_sigmoid, structured_exp, structured_pow, structured_minimum
* Op class: StructuredAddSV, StructuredAddSVCSR (inserted automatically)
* opt: local_sampling_dot_csr, local_structured_add_s_v
Internal changes
* Define new exceptions MissingInputError and UnusedInputError, and use them
in theano.function, instead of TypeError and ValueError. (Pascal L.)
* Better handling of bitwidth and max values of integers and pointers
across platforms (Pascal L.)
Crash Fix
* Do not try to use the BLAS library when blas.ldflags is manually set to an
empty string (Frederic B.)
* When importing theano on a computer without GPU with the Theano
flags 'device' or 'init_gpu_device' set to gpu* (Frederic B., reported by Luo Heng)
* Optimization printed a useless error when scipy was not available. (Frederic B.)
* GPU conv crash/slowdown on newer hardware (James B.)
* Better error handling in GPU conv (Frederic B.)
* GPU optimization that moves element-wise Ops to the GPU. Crash happened in
a particular execution order of this optimization and the
element-wise fusion optimization when upcasting some inputs to
float32 (to compute them on the GPU).
(Frederic B., reported by Sander Dieleman)
* GpuReshape in some particular case when the input is not contiguous
(Frederic B., reported by Sander Dieleman)
* GpuSoftmaxWithBias with shape (0, N) with N > 1.
(Frédéric B., reported by Razvan P.)
* Fix crash under 64-bit Windows, when taking subtensors of the form a[n:]
(Pascal L., reported by Simon McGregor)
* Fixed issue with the MaxAndArgmax Op not properly preserving broadcastable
dimensions, which could typically result in optimization crashes (Olivier D.)
* Fixed crash when concatenating some arrays with specific broadcasting
patterns (Olivier D.)
* Work around a known issue with nvcc 4.1 on MacOS X. (Graham Taylor)
* In advanced indexing, if some inputs are constant, no need to call constant(...)
on their value any more. (Pascal L., reported by John Salvatier)
* Fix crash on GPU when GpuSubtensor didn't set the right stride
when the result tensor had a dimension with size of 1. (Pascal L.,
reported by Graham T.)
https://github.com/Theano/Theano/wiki/Devnews
=============
Release Notes
......
......@@ -26,6 +26,9 @@ with the option time_profile=True to conduct time-profiling of the tests.
option will be interpreted as an indication of the number of tests to be run
between notifications of progress to standard output.
If the '--theano' option is used, it is replaced with the path to theano.
Useful if you don't know where it was installed.
`run_tests_in_batch.py` will in turn call back this script in another process.
"""
......@@ -39,6 +42,12 @@ import sys
from nose.plugins import Plugin
def main():
# Handle the --theano arguments
if "--theano" in sys.argv:
i = sys.argv.index("--theano")
import theano
sys.argv[i] = theano.__path__[0]
# Handle --batch[=n] arguments
batch_args = [arg for arg in sys.argv if arg.startswith('--batch')]
for arg in batch_args:
......@@ -137,6 +146,11 @@ def help():
--without-knownfailure: Do not load the KnownFailure plugin.
--theano: This parameter is replaced with the path to the theano library.
As theano-nose is a wrapper to nosetests, it expects a path to the tests to run.
If you don't know where theano is installed, use this option
to have it inserted automatically.
The other options will be passed to nosetests, see ``nosetests -h``.
"""
......
......@@ -37,7 +37,7 @@ compiledir_format_dict = {"platform": platform.platform(),
"python_version": platform.python_version(),
"theano_version": theano.__version__,
"numpy_version": numpy.__version__,
"g++": gcc_version_str.replace(" ", "_"),
"gxx_version": gcc_version_str.replace(" ", "_"),
}
compiledir_format_keys = ", ".join(compiledir_format_dict.keys())
default_compiledir_format =\
......
......@@ -758,8 +758,10 @@ CudaNdarray_TakeFrom(CudaNdarray * self, PyObject *args){
PyObject * axis_obj = Py_None;
PyObject * out_obj = Py_None;
PyObject * clipmode_obj = NULL;
if (! PyArg_ParseTuple(args, "O|OOO", &indices_obj, &axis_obj,
&out_obj, &clipmode_obj))
int max_threads = 1; // max threads per blocks
if (! PyArg_ParseTuple(args, "O|OOOi", &indices_obj, &axis_obj,
&out_obj, &clipmode_obj, &max_threads))
return NULL;
//Check argument indices
......@@ -839,14 +841,14 @@ CudaNdarray_TakeFrom(CudaNdarray * self, PyObject *args){
PyObject * axis_iobj = PyNumber_Long(axis_obj);
if (!axis_iobj) {
PyErr_SetString(PyExc_NotImplementedError,"CudaNdarray_TakeFrom: axis must be convertable to a long");
Py_DECREF(indices_obj);
Py_DECREF(indices);
return NULL;
}
long axis = PyInt_AsLong(axis_iobj);
Py_DECREF(axis_iobj); axis_iobj=NULL;
if (axis != 0) {
PyErr_SetString(PyExc_NotImplementedError,"CudaNdarray_TakeFrom: only axis=0 is currently supported");
Py_DECREF(indices_obj);
Py_DECREF(indices);
return NULL;
}
......@@ -869,13 +871,13 @@ CudaNdarray_TakeFrom(CudaNdarray * self, PyObject *args){
if (!out) {
out = (CudaNdarray*)CudaNdarray_New();
if (!out){
Py_DECREF(indices_obj);
Py_DECREF(indices);
free(dims);
return NULL;
}
if (CudaNdarray_alloc_contiguous(out, self->nd, dims)) {
Py_DECREF(out);
Py_DECREF(indices_obj);
Py_DECREF(indices);
free(dims);
return NULL;
}
......@@ -887,19 +889,20 @@ CudaNdarray_TakeFrom(CudaNdarray * self, PyObject *args){
if (clipmode_obj) {
char * clipmode = PyString_AsString(clipmode_obj);
if (! clipmode){
Py_DECREF(indices_obj);
Py_DECREF(indices);
Py_DECREF(out);
free(dims);
return NULL;
}
if (strcmp(clipmode, "raise") != 0) {
PyErr_SetString(PyExc_NotImplementedError,"CudaNdarray_TakeFrom: only the raise mode is currently supported");
Py_DECREF(indices_obj);
PyErr_Format(PyExc_NotImplementedError,
"CudaNdarray_TakeFrom: only the raise mode is currently supported. Got '%s'",
clipmode);
Py_DECREF(indices);
Py_DECREF(out);
free(dims);
return NULL;
}
Py_DECREF(clipmode_obj);
}
void (*k3)(const int, const int, const int,
const npy_int64*,
......@@ -913,7 +916,7 @@ CudaNdarray_TakeFrom(CudaNdarray * self, PyObject *args){
if (err_var == NULL) {
err_var = (int*)device_malloc(sizeof(int));
if (!err_var) { // PyErr set by device_malloc
Py_DECREF(indices_obj);
Py_DECREF(indices);
Py_DECREF(out);
free(dims);
return NULL;
......@@ -928,7 +931,7 @@ CudaNdarray_TakeFrom(CudaNdarray * self, PyObject *args){
PyErr_Format(PyExc_RuntimeError,
"Error setting device error code to 0. %s",
cudaGetErrorString(err));
Py_DECREF(indices_obj);
Py_DECREF(indices);
Py_DECREF(out);
free(dims);
return NULL;
......@@ -936,13 +939,16 @@ CudaNdarray_TakeFrom(CudaNdarray * self, PyObject *args){
}
dim3 n_blocks(std::min(CudaNdarray_HOST_DIMS(out)[0],65535),1,1);
switch (self->nd) {
case 1:
{
dim3 n_threads(1, 1, 1);
if (verbose)
printf("kernel config: (n_blocks.x=%d, n_blocks.y=%d,"
printf("cudaGetLastError=%d, nd=%d"
" kernel config: (n_blocks.x=%d, n_blocks.y=%d,"
" n_threads.x=%i, n_threads.y=%i)\n",
self->nd, cudaGetLastError(),
n_blocks.x, n_blocks.y, n_threads.x, n_threads.y);
k3<<<n_blocks, n_threads>>>(
dims[0],
......@@ -963,11 +969,15 @@ CudaNdarray_TakeFrom(CudaNdarray * self, PyObject *args){
break;
case 2:
{
dim3 n_threads(std::min(CudaNdarray_HOST_DIMS(out)[1], 512), 1, 1);
dim3 n_threads(std::min(CudaNdarray_HOST_DIMS(out)[1], max_threads), 1, 1);
if (verbose)
printf("kernel config: (n_blocks.x=%d, n_blocks.y=%d,"
printf("cudaGetLastError=%d, nd=%d"
" kernel config: (n_blocks.x=%d, n_blocks.y=%d,"
" n_threads.x=%i, n_threads.y=%i)\n",
cudaGetLastError(), self->nd,
n_blocks.x, n_blocks.y, n_threads.x, n_threads.y);
k3<<<n_blocks, n_threads>>>(
dims[0], //dimensions
dims[1],
......@@ -987,12 +997,14 @@ CudaNdarray_TakeFrom(CudaNdarray * self, PyObject *args){
break;
case 3:
{
int ty = std::min(CudaNdarray_HOST_DIMS(out)[2], 512);
int tx = std::min(CudaNdarray_HOST_DIMS(out)[1], 512 / ty);
int ty = std::min(CudaNdarray_HOST_DIMS(out)[2], max_threads);
int tx = std::min(CudaNdarray_HOST_DIMS(out)[1], max_threads / ty);
dim3 n_threads(tx, ty, 1);
if (verbose)
printf("kernel config: (n_blocks.x=%d, n_blocks.y=%d,"
printf("cudaGetLastError=%d, nd=%d"
" kernel config: (n_blocks.x=%d, n_blocks.y=%d,"
" n_threads.x=%i, n_threads.y=%i)\n",
self->nd, cudaGetLastError(),
n_blocks.x, n_blocks.y, n_threads.x, n_threads.y);
k3<<<n_blocks, n_threads>>>(
dims[0], //dimensions
......@@ -1025,7 +1037,7 @@ CudaNdarray_TakeFrom(CudaNdarray * self, PyObject *args){
"Cuda error: %s: %s.\n",
"CudaNdarray_TakeFrom",
cudaGetErrorString(err));
Py_DECREF(indices_obj);
Py_DECREF(indices);
Py_DECREF(out);
return NULL;
}
......@@ -1040,7 +1052,7 @@ CudaNdarray_TakeFrom(CudaNdarray * self, PyObject *args){
"Cuda error: %s: %s when trying to get the error value.\n",
"CudaNdarray_TakeFrom",
cudaGetErrorString(err));
Py_DECREF(indices_obj);
Py_DECREF(indices);
Py_DECREF(out);
return NULL;
}
......@@ -1055,17 +1067,17 @@ CudaNdarray_TakeFrom(CudaNdarray * self, PyObject *args){
err = cudaMemset((void*)err_var, 0, sizeof(int));
if (cudaSuccess != err) {
PyErr_Format(PyExc_MemoryError, "Error setting device error code to 0 after having an index error. %s", cudaGetErrorString(err));
Py_DECREF(indices_obj);
Py_DECREF(indices);
Py_DECREF(out);
return NULL;
}
Py_DECREF(indices_obj);
Py_DECREF(indices);
Py_DECREF(out);
return NULL;
}
Py_DECREF(indices_obj);
Py_DECREF(indices);
if (verbose) printf("TAKE SUCCEDED\n");
return (PyObject *)out;
......
......@@ -7,6 +7,7 @@ import subprocess
import sys
import warnings
import theano
from theano.gof.cc import hash_from_file
from theano.gof.cmodule import (std_libs, std_lib_dirs,
std_include_dirs, dlimport,
......@@ -119,6 +120,16 @@ class NVCC_compiler(object):
cuda_ndarray_cuh_hash = hash_from_file(
os.path.join(os.path.split(__file__)[0], 'cuda_ndarray.cuh'))
flags.append('-DCUDA_NDARRAY_CUH=' + cuda_ndarray_cuh_hash)
# We compile cuda_ndarray.cu during import.
# We should not add device properties at that time.
# As the device is not selected yet!
# TODO: compile cuda_ndarray when we bind to a GPU?
import theano.sandbox.cuda
if hasattr(theano.sandbox, 'cuda'):
n = theano.sandbox.cuda.use.device_number
p = theano.sandbox.cuda.device_properties(n)
flags.append('-arch=sm_' + str(p['major']) + str(p['minor']))
return flags
@staticmethod
......@@ -217,7 +228,9 @@ class NVCC_compiler(object):
# '--gpu-code=compute_13',
#nvcc argument
preargs1 = [pa for pa in preargs
if pa.startswith('-O') or pa.startswith('--maxrregcount=')]
if pa.startswith('-O') or
pa.startswith('--maxrregcount=') or
pa.startswith('-arch=')]
preargs2 = [pa for pa in preargs
if pa not in preargs1] # other arguments
......@@ -337,6 +350,7 @@ class NVCC_compiler(object):
pass
print >> sys.stderr, l
print nvcc_stdout
print cmd
raise Exception('nvcc return status', p.returncode,
'for cmd', ' '.join(cmd))
elif config.cmodule.compilation_warning and nvcc_stdout:
......
......@@ -410,7 +410,8 @@ class T_Scan(unittest.TestCase):
for step in xrange(1, 4):
v_out[step] = v_u[step] * W_in + v_out[step - 1] * W
theano_values = f2(v_u, v_x0, W_in, W)
assert numpy.allclose(theano_values, v_out)
assert numpy.allclose(theano_values, v_out), (theano_values, v_out,
theano_values - v_out)
# TO DEL
topo = f2.maker.fgraph.toposort()
......@@ -591,8 +592,8 @@ class T_Scan(unittest.TestCase):
v_y[i] = numpy.dot(v_x[i - 1], vWout)
(theano_x, theano_y) = f4(v_u1, v_u2, v_x0, v_y0, vW_in1)
assert numpy.allclose(theano_x, v_x)
assert numpy.allclose(theano_y, v_y)
assert numpy.allclose(theano_x, v_x), (theano_x, v_x, theano_x - v_x)
assert numpy.allclose(theano_y, v_y), (theano_y, v_y, theano_y - v_y)
def test_multiple_outs_taps(self):
l = 5
......@@ -683,14 +684,13 @@ class T_Scan(unittest.TestCase):
ny1[4] = (ny1[3] + ny1[1]) * numpy.dot(ny0[3], vWout)
ny2[4] = numpy.dot(v_u1[4], vW_in1)
def test_using_taps_sequence(self):
# this test refers to a bug reported by Nicolas
# Boulanger-Lewandowski June 6th
x = theano.tensor.dvector()
y, updates = theano.scan(lambda x: [x],
sequences=dict(input=x, taps=[-1]),
outputs_info = [None])
outputs_info=[None])
inp = numpy.arange(5).astype('float64')
rval = theano.function([x], y, updates=updates)(inp)
assert numpy.all(rval == inp[:-1])
......@@ -840,8 +840,10 @@ class T_Scan(unittest.TestCase):
# equivalent is done
(theano_x0, theano_x1) = f9(vu0, vu1, vu2, vx0, vx1)
# assert that theano does what it should
assert numpy.allclose(theano_x0, numpy_x0)
assert numpy.allclose(theano_x1, numpy_x1), (theano_x1, numpy_x1, theano_x1 - numpy_x1)
assert numpy.allclose(theano_x0, numpy_x0), (theano_x0, numpy_x0,
theano_x0 - numpy_x0)
assert numpy.allclose(theano_x1, numpy_x1), (theano_x1, numpy_x1,
theano_x1 - numpy_x1)
# assert that it was done in place
# !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
......@@ -940,11 +942,11 @@ class T_Scan(unittest.TestCase):
vx1 = asarrayX(rng.uniform())
x0 = theano.shared(vx0)
x1 = theano.shared(vx1)
outputs, updates = theano.scan(lambda x,y: (x + asarrayX(1),
y + asarrayX(1)),
outputs, updates = theano.scan(lambda x, y: (x + asarrayX(1),
y + asarrayX(1)),
[],
[x0,x1],
n_steps = 3)
[x0, x1],
n_steps=3)
x0 = asarrayX(numpy.zeros((3,)))
x0[0] = vx0
x0 = theano.tensor.constant(x0)
......@@ -2447,7 +2449,6 @@ class T_Scan(unittest.TestCase):
v_eW = numpy.array(rng.uniform(size=(5, 5)) - .5, dtype=floatX)
v_eh0 = numpy.array(rng.uniform(size=(5,)) - .5, dtype=floatX)
def rnn_fn(_u, _y, _W):
srng = theano.tensor.shared_randomstreams.RandomStreams(seed)
......
......@@ -55,3 +55,5 @@ from theano.gradient import Rop, Lop, grad, numeric_grad, verify_grad, \
jacobian, hessian
from theano.tensor.sort import sort
from extra_ops import (DiffOp, bincount, squeeze,
repeat, bartlett, fill_diagonal)
......@@ -3,8 +3,8 @@ import numpy
import theano
import basic
from theano import gof, tensor, scalar
from theano.sandbox.linalg.ops import diag
from theano import gof, scalar
import basic as tensor
class DiffOp(theano.Op):
......@@ -446,7 +446,9 @@ class FillDiagonal(gof.Op):
raise NotImplementedError('%s: gradient is currently implemented'
' for matrices only' % self.__class__.__name__)
wr_a = fill_diagonal(grad, 0) # valid for any number of dimensions
wr_val = diag(grad).sum() # diag is only valid for matrices
# diag is only valid for matrices
import theano.sandbox.linalg
wr_val = theano.sandbox.linalg.ops.diag(grad).sum()
return [wr_a, wr_val]
fill_diagonal_ = FillDiagonal()
......