Commit 44a3e92c authored by abalkin

Merge remote-tracking branch 'upstream/master' into no-relative-imports

Conflicts: theano/compile/function.py theano/compile/pfunc.py theano/gof/fg.py theano/gof/type.py
# Prevent git from showing duplicate names with commands like "git shortlog"
# See the manpage of git-shortlog for details.
# The syntax is:
# Name that should be used <email that should be used> Bad name <bad email>
#
# You can skip Bad name if it is the same as the one that should be used, and is unique.
#
# This file is up-to-date if the command git log --format="%aN <%aE>" | sort -u
# gives no duplicates.
<abergeron@gmail.com> <anakha@kami.(none)>
David Warde-Farley <wardefar@iro.umontreal.ca> David Warde-Farley <dwf@cs.toronto.edu>
David Warde-Farley <wardefar@iro.umontreal.ca> David Warde Farley <dwf@cs.toronto.edu>
......
......@@ -4,6 +4,131 @@
Release Notes
=============
Theano in the development version since 0.6rc2
==============================================
up to merged PR gh-1220
Highlights:
* Speed-ups.
* Crash fixes.
* A few small interface changes.
* GPU memory leak fix.
* A few corner case fixes with no user-visible impact.
* More Theano determinism.
* tensor.{dot,tensordot} more complete/faster/more GPU friendly.
* tensor.tensordot now supports Rop/Lop.
* tensor.dot supports n-dimensional inputs, as in NumPy.
* To support more NumPy syntax:
* Add theano.tensor.take()
* Add a_tensor_variable.{sort,dot,std,argmin,argmax,argsort,clip,conj,conjugate,repeat,round,trace,real,imag,take}
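These additions mirror the NumPy API they are named after. As an illustration with NumPy itself (not Theano), `take` selects entries along an axis:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
# take([0, 2], axis=1) picks columns 0 and 2, like a[:, [0, 2]]
assert np.array_equal(a.take([0, 2], axis=1), a[:, [0, 2]])
assert a.take([0, 2], axis=1).shape == (3, 2)
```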
Committers for this rc2 only:
Bug fix:
* Fix memory leak on the GPU in some corner cases with the Theano flag `allow_gc=False`. (Frederic B., reported by Jonas Gehring)
* Fix copy of random state between graphs. (Guillaume D.)
http://deeplearning.net/software/theano/tutorial/examples.html#copying-random-state-between-theano-graphs
* Fix wrong dtype in sandbox.linalg.ExtractDiag with shape of 0. (Frederic B., reported by abalkin)
* Correctly support arrays with more than 2*10e32 elements in AdvancedSubtensor1. (Abalkin)
* Fix wrong broadcast dimensions of the output of the Repeat op. (Abalkin)
We were using the input's broadcasting pattern in some cases when we shouldn't have.
* Fix theano.sandbox.linalg.eigh grad that didn't always return the right dtype. (Frederic B., Olivier D.)
New Features:
* More Theano determinism (Ian G., Olivier D., Pascal L.)
* Add and use a new class OrderedSet.
* Modify theano.grad to be deterministic.
* Warn when using a dict as the updates argument to theano.compile.function, since this makes the returned function non-deterministic.
* The Updates class was not appropriate for representing updates because it is non-deterministic; replaced it with the OrderedUpdates class.
* Implemented GpuContiguous.grad. (Ian G.)
* tensor.tensordot now supports Rop/Lop (Jeremiah Lowin)
This removes the TensorDot and TensorDotGrad classes; the Dot/Elemwise ops are used instead.
* tensor.dot supports n-dimensional inputs, as in NumPy (Jeremiah Lowin)
Works on the GPU too.
* The Theano flag `nvcc.flags` now accepts `-ftz=true`, `--prec-div=false` and `--prec-sqrt=false` as values. (Frederic B.)
To enable all of them, use the Theano flag `nvcc.flags=--use_fast_math`.
* New op theano.sparse.ConstructSparseFromList (Rami Al-Rfou' Vivek Kulkarni)
* Make Theano work with Anaconda on Windows. (Pascal L.)
* Add tensor_var.diagonal and theano.tensor.{diag,diagonal}. (abalkin)
* AdvancedSubtensor1 can now have a sparse gradient. (Rami Al-Rfou', Vivek Kulkarni)
Interface Deprecation (a warning is printed):
* theano.misc.strutil.renderString -> render_string (Ian G.)
* A warning is printed when using a dictionary in some places, as this makes Theano non-deterministic.
Interface Change:
* Raise an error when theano.shared is called with a theano variable. (Frederic B.)
* Don't print warnings for bugs predating Theano 0.5 by default. (Frederic B.)
* Theano functions now always have a name field, defaulting to None. (Frederic B.)
* A Theano function's fct.fgraph has a copy of the Theano function's name field. (Ian G.)
This is needed for the fgraph to know it.
* In the grad method, when asked to raise an error if there is no path between the variables, we didn't always raise one. (Ian G.)
We returned the mathematically correct answer 0 instead.
* get_constant_value() was renamed to get_scalar_constant_value() and raises a new exception, tensor.basic.NotScalarConstantError. (Ian G.)
* theano.function raises an error when trying to replace inputs with the givens parameter. (Olivier D.)
This was doing nothing; the error message tells what the user probably wants to do.
New Interface (reuse existing functionality):
* tensor_var.sort() as a shortcut for theano.tensor.sort. (Jeremiah Lowin)
We were already doing this for argsort.
* Add theano.tensor.take() and a_tensor_var.take() to support NumPy syntax. (abalkin)
* Add a_tensor_variable.{dot,std,argmin,argmax,argsort,clip,conj,conjugate,repeat,round,trace,real,imag}. (abalkin)
New debug feature:
* DebugMode prints more info when there is an error. (Frederic B.)
* Better profiling of test time with `theano-nose --time-profile`. (Frederic B.)
* Detection of infinite loops in the global optimizer. (Pascal L.)
* DebugMode.check_preallocated_output now also works on Theano function outputs. (Pascal L.)
Speed-ups:
* c_code for SpecifyShape op. (Frederic B.)
* The cross-entropy optimization now works when specify_shape is used. (Pascal L.)
* The Scan optimizations ScanSaveMem and PushOutDot1 are applied more frequently. (Razvan P., reported by Abalkin)
Previously, a skipped-optimization warning was printed.
* dot(vector, vector) is now faster with some BLAS implementations. (Eric Hunsberger)
OpenBLAS and others didn't call {s,d}dot internally when we called {s,d}gemv.
MKL did.
* Compilation speed-up: take the compiledir lock only for ops that generate c_code. (Frederic B.)
* More scan optimization (Razvan P.)
* Optimizations to make RNNs fast in Theano.
* Optimize some cases of dot by moving them outside of Scan.
* Move some sequences outside of scan too.
* Merge more scan inputs, mostly byproduct of other Scan optimizations.
* c_code for theano.sparse.AddSD. (Rami Al-Rfou', Vivek Kulkarni)
Crash Fixes:
* Fix crash about dimshuffle. (abalkin)
* Fix crash at compilation. (Olivier D.)
* Fix openmp detection. (Pascal L.)
Resulted in a crash with EPD on Windows.
* Fix for new BLAS interface in SciPy. (Olivier D.)
Fix crash with some development version of SciPy.
* GpuSum works with bigger shapes when summing on the first dim of a 3d tensor. (Frederic B., reported by Chris Currivan)
* Windows compilation crash fix. (Frederic B.)
* Make CrossentropySoftmax1HotWithBiasDx and CrossentropySoftmaxArgmax1HotWithBias support uint* dtype. (Frederic B., reported by Mark Fenner)
* Fix GpuSoftmax and GpuSoftmaxWithBias crash on GTX285. (Frederic B.)
* Fix crash due to a race condition when importing theano. (Ian G.)
* Fix crash from path problem with `theano-nose --batch`. (Abalkin)
* Fix crash with tensor.roll(Var, iscalar). (Frederic B., reported by Jeremiah Lowin)
* Fix compilation crash with llvm on Mac. (Abalkin)
* Fix the grad of Scan that wrongly reported that there is no connection between the cost and the parameters. (Razvan P.)
* The infer_shape mechanism now forces broadcasted dimensions to have a shape known to be equal to one during compilation.
Sometimes we could not know this before run time, and this resulted in a crash. (Frederic B.)
* Fix compilation problems on GPU on Windows. (Frederic B.)
Theoretical bugfix (a bug that won't happen with current Theano code, but that could have affected you if you messed with the internals):
* GpuContiguous now checks the preallocated output's strides before using it. (Pascal L.)
Others:
* Fix race condition when determining if g++ is available. (Abalkin)
* Documentation improvements. (Many people including David W-F, abalkin, Amir Elaguizy, Olivier D., Frederic B.)
* The current GPU back-end has a new function CudaNdarray_prep_output(CudaNdarray ** arr, int nd, const int * dims) (Ian G)
=============
Release Notes
=============
Theano 0.6rc2 (November 21st, 2012)
===================================
......
......@@ -657,8 +657,8 @@ Theano dependencies is easy, but be aware that it will take a long time
Homebrew
~~~~~~~~
There are some :ref:`instructions
<https://github.com/samueljohn/homebrew-python>` by Samuel John on how to install
There are some `instructions
<https://github.com/samueljohn/homebrew-python>`__ by Samuel John on how to install
Theano dependencies with Homebrew instead of MacPort.
......
......@@ -39,7 +39,7 @@ probably do something similar on older computer.
Installation steps
~~~~~~~~~~~~~~~~~~
Ubuntu 11.10/12.04:
Ubuntu 11.10/12.04/12.10:
1) ``sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose g++ libopenblas-dev git``
2) ``sudo pip install Theano``
......@@ -70,7 +70,7 @@ Theano/BLAS speed test:
.. code-block:: bash
python /usr/lib/python2.*/site-packages/theano/misc/check_blas.py
python `python -c "import os, theano; print os.path.dirname(theano.__file__)"`/misc/check_blas.py
This will print a table with different versions of BLAS/numbers of
threads on multiple CPUs and GPUs. It will also print some Theano/NumPy
......@@ -163,6 +163,8 @@ Test GPU configuration
Ubuntu 12.04 LTS: default gcc version 4.6.3. gcc 4.4.7 and 4.5.3 available.
Ubuntu 12.10: default gcc version 4.7.2. gcc 4.4.7, 4.5.4 and 4.6.3 available.
......
......@@ -1229,6 +1229,7 @@ Linear Algebra
If an integer i, it is converted to an array containing
the last i dimensions of the first tensor and the first
i dimensions of the second tensor:
axes = [range(a.ndim - i, a.ndim), range(i)]
If an array, its two elements must contain compatible axes
......@@ -1251,6 +1252,8 @@ Linear Algebra
are compatible. The resulting tensor will have shape (2, 5, 6) -- the
dimensions that are not being summed:
.. code-block:: python
a = np.random.random((2,3,4))
b = np.random.random((5,6,4,3))
......@@ -1284,6 +1287,8 @@ Linear Algebra
In an extreme case, no axes may be specified. The resulting tensor
will have shape equal to the concatenation of the shapes of a and b:
.. code-block:: python
c = np.tensordot(a, b, 0)
print(a.shape) #(2,3,4)
print(b.shape) #(5,6,4,3)
......
......@@ -7,8 +7,11 @@
.. note::
Two similar implementations exist for conv2d:
:func:`signal.conv2d <theano.tensor.signal.conv.conv2d>` and
:func:`nnet.conv2d <theano.tensor.nnet.conv.conv2d>`. The former implements a traditional
:func:`nnet.conv2d <theano.tensor.nnet.conv.conv2d>`.
The former implements a traditional
2D convolution, while the latter implements the convolutional layers
present in convolutional neural networks (where filters are 3D and pool
over several input channels).
......
......@@ -74,11 +74,11 @@ cross-entropy (note that this assumes that x will contain values between 0 and
.. code-block:: python
x,y,b = T.dvectors('x','y','b')
x, y, b = T.dvectors('x', 'y', 'b')
W = T.dmatrix('W')
h = T.nnet.sigmoid(T.dot(W,x) + b)
x_recons = T.nnet.sigmoid(T.dot(V,h) + c)
recon_cost = T.nnet.binary_crossentropy(x_recons,x).mean()
h = T.nnet.sigmoid(T.dot(W, x) + b)
x_recons = T.nnet.sigmoid(T.dot(V, h) + c)
recon_cost = T.nnet.binary_crossentropy(x_recons, x).mean()
.. function:: categorical_crossentropy(coding_dist,true_dist)
......@@ -87,7 +87,7 @@ cross-entropy (note that this assumes that x will contain values between 0 and
needed to identify an event from a set of possibilities, if a coding scheme is used based
on a given probability distribution q, rather than the "true" distribution p. Mathematically, this
function computes :math:`H(p,q) = - \sum_x p(x) \log(q(x))`, where
p=coding_dist and q=true_dist
p=true_dist and q=coding_dist.
:Parameters:
......@@ -108,6 +108,6 @@ cross-entropy (note that this assumes that x will contain values between 0 and
.. code-block:: python
y = T.nnet.softmax(T.dot(W,x) + b)
cost = T.nnet.categorical_crossentropy(y,o)
y = T.nnet.softmax(T.dot(W, x) + b)
cost = T.nnet.categorical_crossentropy(y, o)
# o is either the above-mentioned 1-of-N vector or 2D tensor
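The corrected convention (p = true_dist, q = coding_dist) can be sketched with plain NumPy; this is an illustrative re-implementation, not Theano's code:

```python
import numpy as np

def categorical_crossentropy(coding_dist, true_dist):
    # H(p, q) = -sum_x p(x) * log(q(x)), with p = true_dist, q = coding_dist
    return -(true_dist * np.log(coding_dist)).sum(axis=-1)

p = np.array([0.0, 1.0, 0.0])   # "true" one-hot distribution
q = np.array([0.1, 0.8, 0.1])   # predicted coding distribution
# Only the coordinate where p(x) = 1 contributes: -log(q) at that index
assert np.isclose(categorical_crossentropy(q, p), -np.log(0.8))
```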
......@@ -7,8 +7,11 @@
.. note::
Two similar implementations exist for conv2d:
:func:`signal.conv2d <theano.tensor.signal.conv.conv2d>` and
:func:`nnet.conv2d <theano.tensor.nnet.conv.conv2d>. The former implements a traditional
:func:`nnet.conv2d <theano.tensor.nnet.conv.conv2d>`.
The former implements a traditional
2D convolution, while the latter implements the convolutional layers
present in convolutional neural networks (where filters are 3D and pool
over several input channels).
......
......@@ -161,8 +161,9 @@ def function(inputs, outputs=None, mode=None, updates=None, givens=None,
if updates is None:
updates = []
if isinstance(updates, dict) and \
not isinstance(updates, gof.python25.OrderedDict):
if (isinstance(updates, dict) and
not isinstance(updates, gof.python25.OrderedDict) and
len(updates) > 1):
warnings.warn(
"The parameter 'updates' of theano.function()"
" expects an OrderedDict,"
......@@ -183,8 +184,8 @@ def function(inputs, outputs=None, mode=None, updates=None, givens=None,
# compute some features of the arguments:
uses_In = any([isinstance(i, In) for i in inputs])  # N.B. the square brackets are necessary
uses_tuple = any([isinstance(i, (list, tuple)) for i in inputs])  # N.B. the square brackets are necessary
uses_updates = (updates != [])
uses_givens = (givens != [])
uses_updates = bool(updates)
uses_givens = bool(givens)
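The switch from `updates != []` to `bool(updates)` matters because an empty dict is not equal to an empty list, so empty-dict arguments were wrongly treated as real updates:

```python
updates = {}   # an empty updates/givens argument passed as a dict
# Old test: compares against a list literal, so an empty dict passes it
assert (updates != []) is True
# New test: truthiness treats every empty container the same way
assert bool(updates) is False
assert bool([]) is False
```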
# See if we have any mutable / borrow inputs
check_for_aliased_inputs = False
......@@ -198,7 +199,9 @@ def function(inputs, outputs=None, mode=None, updates=None, givens=None,
if profile:
raise NotImplementedError('profiling not supported in old-style function')
if uses_updates or uses_givens:
raise NotImplementedError("In() instances and tuple inputs triggers the old semantics, which disallow using updates and givens")
raise NotImplementedError(
"In() instances and tuple inputs trigger the old "
"semantics, which disallow using updates and givens")
fn = orig_function(inputs, outputs,
mode=mode,
accept_inplace=accept_inplace, name=name)
......
......@@ -232,8 +232,8 @@ def rebuild_collect_shared(outputs,
cloned_outputs.append(Out(cloned_v, borrow=v.borrow))
else:
raise TypeError('Outputs must be theano Variable or '
'Out instances. Received ' + str(v)\
+ ' of type '+str(type(v)))
'Out instances. Received ' + str(v)
+ ' of type ' + str(type(v)))
#computed_list.append(cloned_v)
else:
if isinstance(outputs, Variable):
......@@ -277,7 +277,8 @@ class Param(object):
def __init__(self, variable, default=None, name=None, mutable=False,
strict=False, allow_downcast=None, implicit=None, borrow=None):
"""
:param variable: A variable in an expression graph to use as a compiled-function parameter
:param variable: A variable in an expression graph to use as a
compiled-function parameter
:param default: The default value to use at call-time (can also be a Container where
the function will find a value at call-time.)
......@@ -289,10 +290,12 @@ class Param(object):
:param borrow: Whether the function is allowed to alias some output to
this input. Using None (default) means we re-use the same value as the
`mutable` flag.
False: do not permit any output to be aliased to the input
False: do not permit any output to be aliased to the input
:param strict: False -> function arguments may be copied or cast to match the
type required by the parameter `variable`. True -> function arguments must exactly match the type
type required by the parameter `variable`.
True -> function arguments must exactly match the type
required by `variable`.
:param allow_downcast: Only applies if `strict` is False.
......@@ -451,6 +454,27 @@ def pfunc(params, outputs=None, mode=None, updates=None, givens=None,
"provided for it being ignored. Please do not duplicate "
"variables in the inputs list." % (v, i, dup_v_i)))
# Check that we are not using `givens` to replace input variables, because
# this typically does nothing, contrary to what one may expect.
in_var_set = set(in_variables)
try:
givens_pairs = givens.items()
except AttributeError:
givens_pairs = givens
for x, y in givens_pairs:
if x in in_var_set:
raise RuntimeError(
'You are trying to replace variable \'%s\' through the '
'`givens` parameter, but this variable is an input to your '
'function. Replacing inputs is currently forbidden because it '
'has no effect. One way to modify an input `x` to a function '
'evaluating f(x) is to define a new input `y` and use '
'`theano.function([y], f(x), givens={x: g(y)})`. Another '
'solution consists in using `theano.clone`, e.g. like this: '
'`theano.function([x], '
'theano.clone(f(x), replace={x: g(x)}))`.'
% x)
output_vars = rebuild_collect_shared(outputs,
in_variables,
replace=givens,
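The try/except around `givens.items()` above is duck typing: `givens` may be either a mapping or an iterable of (variable, replacement) pairs. A minimal standalone sketch of that normalization:

```python
def as_pairs(givens):
    """Normalize `givens` to a list of (variable, replacement) pairs."""
    try:
        return list(givens.items())   # mapping case
    except AttributeError:
        return list(givens)           # already an iterable of pairs

assert as_pairs({"x": 1}) == [("x", 1)]
assert as_pairs([("x", 1), ("y", 2)]) == [("x", 1), ("y", 2)]
```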
......
......@@ -386,6 +386,14 @@ class T_function(unittest.TestCase):
self.assertRaises(UnusedInputError, function, [m, mt], mt*2)
f = function([m, mt], mt*2, on_unused_input='ignore')
def test_givens_input_var(self):
"""
Ensure error is raised when trying to replace an input variable.
"""
x = T.scalar('x')
y = x * 2
self.assertRaises(RuntimeError, function, [x], y, givens={x: x + 1})
class T_picklefunction(unittest.TestCase):
......@@ -680,6 +688,18 @@ class SomethingToPickle(object):
self.f2 = function([x, In(a, value=1.0,name='a'), In(s, value=self.f1.container[s], update=s+a*x, mutable=True)], s+a*x)
def test_empty_givens_updates():
"""
Regression test for bug fixed in 8625e03.
"""
# Empty givens / updates dictionaries were not properly detected before,
# triggering useless crashes at compile time.
x = T.scalar()
y = x * 2
function([theano.In(x)], y, givens={})
function([theano.In(x)], y, updates={})
if __name__ == '__main__':
if 1:
......
......@@ -420,6 +420,11 @@ else:
" want theano to use.")
default_openmp = count > 1
# Disable it by default for now, as currently only the ConvOp supports
# it, and this causes a slowdown by default since we do not disable it
# for too-small convolutions.
default_openmp = False
AddConfigVar('openmp',
"Allow (or not) parallel computation on the CPU with OpenMP. "
"This is the default value used when creating an Op that "
......
......@@ -1472,7 +1472,7 @@ class GCC_compiler(object):
#cxxflags.append("-D NPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION")
numpy_ver = [int(n) for n in numpy.__version__.split('.')[:2]]
# numpy 1.7 deprecated the following macro but the didn't
# numpy 1.7 deprecated the following macro but the new one didn't
# exist in the past
if bool(numpy_ver < [1, 7]):
cxxflags.append("-D NPY_ARRAY_ENSURECOPY=NPY_ENSURECOPY")
......
......@@ -2,12 +2,6 @@
Classes and functions for validating graphs that contain view
and inplace operations.
"""
import sys
if sys.version_info[:2] >= (2,5):
from collections import defaultdict
# otherwise it's implemented in python25.py
import theano
import toolbox
import graph
......
......@@ -763,6 +763,7 @@ class OpenMPOp(Op):
self.openmp = openmp
def c_compile_args(self):
self.update_self_openmp()
if self.openmp:
return ['-fopenmp']
return []
......@@ -807,7 +808,10 @@ class OpenMPOp(Op):
return False
return default_openmp
def make_thunk(self, node, storage_map, compute_map, no_recycling):
def update_self_openmp(self):
"""
Make sure self.openmp is not True when gxx does not support OpenMP.
"""
if self.openmp:
if OpenMPOp.gxx_support_openmp is None:
OpenMPOp.gxx_support_openmp = OpenMPOp.test_gxx_support()
......@@ -818,9 +822,13 @@ class OpenMPOp(Op):
" know this happen with some version of the EPD mingw"
" compiler. We disable openmp everywhere in Theano."
" To remove this warning set the theano flags `openmp`"
" to False.")
" to False.",
stacklevel=3)
if OpenMPOp.gxx_support_openmp is False:
self.openmp = False
theano.config.openmp = False
def make_thunk(self, node, storage_map, compute_map, no_recycling):
self.update_self_openmp()
return super(OpenMPOp, self).make_thunk(node, storage_map,
compute_map, no_recycling)
......@@ -22,7 +22,7 @@ from theano import gof
from theano.gof import Variable
from theano.gof.python25 import OrderedDict
from theano.gof.null_type import NullType
from theano.printing import min_informative_str
# we can't do "import theano.tensor"
# tensor depends on theano.compile
# theano.compile depends on theano.gradient (this file)
......
......@@ -194,41 +194,28 @@ if __name__ == "__main__":
goto2 1.13/16 3.16s
Test time in float32
(cuda version 3.2RC and up have a faster gemm on the Fermi/GTX[45]??)
gpu/cuda version
M2050(Amazon)/5.0 0.25s
GTX680/4.2 0.154s
GTX580/4.2 0.164s
GTX480/4.2 0.192s
GTX470/4.2 0.238s
C2075/4.2 0.25s
GTX285/4.2 0.452s #cuda 3.0 seems faster? driver version?
GT520/4.2 2.68s
GTX560/4.2 0.30s
GTX460/4.0 0.45s
GTX580/3.2 0.203s
GTX680/3.2 0.218s
GTX480/3.2 0.237s
GTX470/3.2 0.297s
GTX285/3.2 0.452s #cuda 3.0 seems faster? driver version?
GTX480/3.0 0.27s
M2070/4.1 0.27s
GTX470/3.2 0.29s
M2070/3.2 0.32s
GTX470/3.0 0.34s
GTX285/3.0 0.40s
C1060/3.2 0.46s
GTX550Ti/4.0 0.57s
520/3.2 3.06s
520M/3.2 3.19s with bumblebee on Ubuntu 12.04
GT220/3.2RC 3.80s
GT210/4.0 6.35s
8500GT/3.0 10.68s
cuda version 5.0 4.2 4.1 4.0 3.2 3.0 # note
gpu
M2070 0.25s 0.27s 0.32s
M2050(Amazon) 0.25s
C2075 0.25s
C1060 0.46s
GTX680 0.154s 0.218s
GTX580 0.164s 0.203s
GTX480 0.192s 0.237s 0.27s
GTX470 0.238s 0.297s 0.34s
GTX660 0.24s
GTX560 0.30s
GTX460 0.37s 0.45s
GTX285 0.452s 0.452s 0.40s # cuda 3.0 seems faster? driver version?
GTX550Ti 0.57s
GT520 2.68s 3.06s
520M 3.19s # with bumblebee on Ubuntu 12.04
GT220 3.80s
GT210 6.35s
8500GT 10.68s
"""
t, impl = execute(not options.print_only, not options.quiet,
......
def renderString(string, dict):
import warnings
def render_string(string, sub):
"""
string: a string, containing formatting instructions
sub: a dictionary containing keys and values to substitute for
them.
returns: string % sub
The only difference between this function and the % operator
is that it raises an exception with a more informative error
message than the % operator does.
"""
try:
finalCode = string % dict
finalCode = string % sub
except Exception, E:
#print 'could not render C code due to exception with message "'+str(E)+'", trying to find out why...'
# If unable to render the string, render longer and longer
# initial substrings until we find the minimal initial substring
# that causes an error
i = 0
while i <= len(string):
try:
finalCode = string[0:i] % dict
finalCode = string[0:i] % sub
except Exception, F:
if str(F) == str(E):
raise Exception(string[0:i]+"<<<< caused exception "+str(F))
i += 1
assert False
return finalCode
#
def renderString(string, dict):
warnings.warn("renderString is deprecated. It is now called render_string",
stacklevel=2)
return render_string(string, dict)
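The error-localization trick in render_string (re-render growing prefixes until the same exception reappears) ports directly to Python 3; this is an illustrative sketch, not the shipped Python 2 code:

```python
def render_string(template, sub):
    """Like `template % sub`, but points at the directive that failed."""
    try:
        return template % sub
    except Exception as exc:
        # Render longer and longer prefixes; the first prefix that raises
        # the *same* error ends right after the offending directive.
        for i in range(len(template) + 1):
            try:
                template[:i] % sub
            except Exception as inner:
                if str(inner) == str(exc):
                    raise Exception(
                        template[:i] + "<<<< caused exception " + str(inner))
        raise

assert render_string("x=%(x)d", {"x": 3}) == "x=3"
```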
def pretty_format(string):
lines = string.split('\n')
......@@ -34,11 +53,8 @@ def pretty_format(string):
rval = '\n'.join(lines)
return rval
#
def strip_leading_white_space(line):
while len(line) > 0 and (line[0] == ' ' or line[0] == '\t'):
line = line[1:]
#
return line
#
......@@ -13,5 +13,9 @@ def call_subprocess_Popen(command, **params):
startupinfo.dwFlags |= subprocess.STARTF_USESHOWWINDOW
except AttributeError:
startupinfo.dwFlags |= subprocess._subprocess.STARTF_USESHOWWINDOW
# Under Windows 7 64-bits, Anaconda's g++ is not found unless
# specifying "shell=True".
params['shell'] = True
proc = subprocess.Popen(command, startupinfo=startupinfo, **params)
return proc
......@@ -220,7 +220,7 @@ if(!work_complete){
}}}}}}} //extra scope so error handler jumps don't cross declarations
///////////// < /code generated by GpuConv3D >
"""
return strutil.renderString(codeSource,locals())
return strutil.render_string(codeSource,locals())
def c_support_code_apply(self, node, nodename):
# This code is not sensitive to the ignore_border flag.
......@@ -279,7 +279,7 @@ conv_rows_stack( float* img, float* kern, float* bias, float* out,
"""
return codeSource#renderString(codeSource,locals())
return codeSource
gpu_convd = GpuConv3D()
......
......@@ -336,7 +336,7 @@ convgrad_rows_stack( float* img, float* dCdH, float* dCdW,
dCdW[j,z,k,l,m] += dCdH[i,j,p,q,r] * V[i,z,dr*p+k,dc*q+l,dt*r+m]
*/
"""
return codeSource#renderString(codeSource,locals())
return codeSource
gpu_conv_grad3d = GpuConvGrad3D()
......
......@@ -263,7 +263,7 @@ if(!work_complete){
}}}}}} // for fail
///////////// < /code generated by GpuConvTransp3D >
"""
return strutil.renderString(codeSource,locals())
return strutil.render_string(codeSource,locals())
def c_support_code_apply(self, node, nodename):
# This code is not sensitive to the ignore_border flag.
......
......@@ -218,7 +218,7 @@ if cuda_available:
atexit.register(gpu_shutdown)
except EnvironmentError, e:
cuda_available = False
cuda_initialization_error_message = e.message
cuda_initialization_error_message = " ".join(e.args)
class GpuOp(theano.gof.Op):
......
......@@ -13,15 +13,20 @@ scal = scalar # somewhere scalar gets reassigned to be a function
from theano.gof.python25 import all, any
from theano.sandbox.cuda import GpuOp, device_properties
try:
# We must be able to import this file to create the full doc when nvcc
# is not available
from theano.sandbox.cuda import filter as type_support_filter
from theano.sandbox.cuda import device_properties
import cuda_ndarray
except ImportError:
pass
from theano.sandbox.cuda import GpuOp
from theano.sandbox.cuda.type import CudaNdarrayType
from theano.sandbox.cuda import filter as type_support_filter
from theano.sandbox.cuda.elemwise import NaiveAlgo
import cuda_ndarray
_logger_name = 'theano.sandbox.cuda.basic_ops'
_logger = logging.getLogger(_logger_name)
_logger.setLevel(logging.INFO)
......@@ -2267,9 +2272,17 @@ class GpuSubtensor(GpuOp, tensor.Subtensor):
set_dim='CudaNdarray_set_dim',
set_stride='CudaNdarray_set_stride',
update_flags="", strides_mul=4)
finish_view = ""
# For broadcasted dimensions, set the strides to 0.
# We can't do that only for broadcasted dimensions, as this can also
# happen for dimensions of size 0 that are rebroadcasted later.
for idx in range(node.outputs[0].ndim):
finish_view += """
if(CudaNdarray_HOST_DIMS(xview)[%(idx)s]==1)
CudaNdarray_set_stride(xview, %(idx)s, 0);
""" % locals()
finish_view = """
finish_view += """
//Set the base only now
if(CudaNdarray_set_device_data(xview, CudaNdarray_DEV_DATA(xview),
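Setting the stride of a size-1 dimension to 0 is the standard broadcasting trick: every index along that dimension aliases the same data. NumPy does the same thing on the host, which makes it easy to observe:

```python
import numpy as np

x = np.arange(4.0)                  # shape (4,), contiguous strides
b = np.broadcast_to(x, (3, 4))      # the broadcast dim gets stride 0:
assert b.strides[0] == 0            # every "row" aliases the same bytes
assert b.strides[1] == x.strides[0]
```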
......@@ -2287,6 +2300,13 @@ class GpuSubtensor(GpuOp, tensor.Subtensor):
return build_view + "{" + get_xview + "}" + finish_view
def c_code_cache_version(self):
hv = self.helper_c_code_cache_version()
# If `helper_c_code_cache_version` is not versioned we do not want to
# have a versioned version of this op's C code.
if len(hv) == 0:
return ()
return (3, hv)
class GpuAdvancedSubtensor1(tensor.AdvancedSubtensor1, GpuOp):
"""
......@@ -2455,7 +2475,7 @@ class GpuIncSubtensor(tensor.IncSubtensor, GpuOp):
:return: C code expression to make a copy of x
Base class uses PyArrayObject *, subclasses may override for
Base class uses `PyArrayObject *`, subclasses may override for
different types of arrays.
"""
return """(CudaNdarray*) CudaNdarray_Copy(%(x)s)""" % locals()
......
......@@ -166,7 +166,8 @@ CudaNdarray_set_dim(CudaNdarray * self, int idx, int d)
{
if ((idx >= self->nd) || (idx < 0) || (d < 0))
{
fprintf(stderr, "WARNING: probably bad CudaNdarray_set_dim arguments: %i %i\n", idx, d);
fprintf(stderr, "WARNING: probably bad CudaNdarray_set_dim arguments: self->ndim=%i, idx=%i stride=%i\n",
self->nd, idx, d);
}
if (d != self->host_structure[idx])
......
# This is work in progress
import theano
from theano import Op, Apply
import theano.tensor as T
from theano.gof import local_optimizer
from theano.sandbox.cuda import cuda_available, GpuOp
......
......@@ -288,7 +288,9 @@ class CudaNdarrayType(Type):
//std::cerr << "c_extract " << %(name)s << '\\n';
if (%(name)s->nd != %(nd)s)
{
PyErr_Format(PyExc_RuntimeError, "Some CudaNdarray has rank %%i, it was supposed to have rank %(nd)s", %(name)s->nd);
PyErr_Format(PyExc_RuntimeError,
"c_extract: Some CudaNdarray has rank %%i, it was supposed to have rank %(nd)s",
%(name)s->nd);
%(name)s = NULL;
%(fail)s;
}
......@@ -299,7 +301,9 @@ class CudaNdarrayType(Type):
print >> sio, """
if (CudaNdarray_HOST_DIMS(%(name)s)[%(i)s] != 1)
{
PyErr_Format(PyExc_RuntimeError, "Some CudaNdarray has dim %%i on broadcastable dimension %%i", CudaNdarray_HOST_DIMS(%(name)s)[%(i)s], %(i)s);
PyErr_Format(PyExc_RuntimeError,
"c_extract: Some CudaNdarray has dim %%i on broadcastable dimension %%i",
CudaNdarray_HOST_DIMS(%(name)s)[%(i)s], %(i)s);
%(name)s = NULL;
%(fail)s;
}
......@@ -309,7 +313,9 @@ class CudaNdarrayType(Type):
if (CudaNdarray_HOST_STRIDES(%(name)s)[%(i)s])
{
//std::cerr << "c_extract bad stride detected...\\n";
PyErr_Format(PyExc_RuntimeError, "Some CudaNdarray has a nonzero stride %%i on a broadcastable dimension %%i", CudaNdarray_HOST_STRIDES(%(name)s)[%(i)s], %(i)s);
PyErr_Format(PyExc_RuntimeError,
"c_extract: Some CudaNdarray has a nonzero stride %%i on a broadcastable dimension %%i",
CudaNdarray_HOST_STRIDES(%(name)s)[%(i)s], %(i)s);
%(name)s = NULL;
%(fail)s;
}
......
import numpy
import theano
from theano.gof import Op, Apply
from theano import tensor
......
......@@ -12,7 +12,7 @@ from theano.tensor.opt import (register_stabilize,
register_specialize, register_canonicalize)
from theano.gof import local_optimizer
from theano.gof.opt import Optimizer
from theano.gradient import grad_not_implemented, DisconnectedType
from theano.gradient import DisconnectedType
try:
import scipy.linalg
......@@ -433,16 +433,14 @@ class CholeskyGrad(Op):
return Apply(self, [x, l, dz], [x.type()])
def perform(self, node, inputs, outputs):
"""
Implements the "reverse-mode" gradient for the Cholesky factorization
of a positive-definite matrix.
"""Implements the "reverse-mode" gradient [1]_ for the
Cholesky factorization of a positive-definite matrix.
References
----------
.. [1] S. P. Smith. "Differentiation of the Cholesky Algorithm".
Journal of Computational and Graphical Statistics,
Vol. 4, No. 2 (Jun.,1995), pp. 134-147
http://www.jstor.org/stable/1390762
"""
x = inputs[0]
L = inputs[1]
......
......@@ -12,27 +12,18 @@ __authors__ = ("Razvan Pascanu "
__copyright__ = "(c) 2010, Universite de Montreal"
__contact__ = "Razvan Pascanu <r.pascanu@gmail>"
import itertools
import logging
import time
from itertools import izip
import numpy
import theano
from theano.compile import function, Param, Out
from theano import compile
from theano import gradient
from theano.gof.python25 import any
from theano.gof import PureOp, Apply
from theano import gof
from theano.tensor import TensorType
from theano import tensor
from theano.tensor.opt import Shape_i
#from theano.sandbox import cuda
from theano.compile.profiling import ScanProfileStats
import scan_utils
# Logging function for sending warning or info
_logger = logging.getLogger('theano.scan_module.scan_op')
......
......@@ -561,6 +561,9 @@ class ScalarVariable(_scalar_py_operators, Variable):
class ScalarConstant(_scalar_py_operators, Constant):
pass
# Register ScalarConstant as the type of Constant corresponding to Scalar
Scalar.Constant = ScalarConstant
# Easy constructors
......
......@@ -22,7 +22,7 @@ __contact__ = "theano-dev <theano-dev@googlegroups.com>"
__docformat__ = "restructuredtext en"
import numpy
from theano.compile import shared_constructor, SharedVariable
from theano.compile import SharedVariable
from basic import Scalar, _scalar_py_operators
class ScalarSharedVariable(_scalar_py_operators, SharedVariable):
......
......@@ -519,7 +519,6 @@ def get_scalar_constant_value(v):
if isinstance(v, numpy.ndarray):
return numpy_scalar(v)
if isinstance(v, Constant):
if getattr(v.tag, 'unique_value', None) is not None:
data = v.tag.unique_value
......@@ -528,11 +527,9 @@ def get_scalar_constant_value(v):
return numpy_scalar(data)
if v.owner:
if isinstance(v.owner.op, Alloc):
return get_scalar_constant_value(v.owner.inputs[0])
if isinstance(v.owner.op, DimShuffle):
return get_scalar_constant_value(v.owner.inputs[0])
if isinstance(v.owner.op, Rebroadcast):
if isinstance(v.owner.op, (Alloc, DimShuffle, Rebroadcast,
compile.ops.OutputGuard,
compile.DeepCopyOp)):
return get_scalar_constant_value(v.owner.inputs[0])
if isinstance(v.owner.op, Elemwise) and \
isinstance(v.owner.op.scalar_op, scal.Second):
......@@ -2007,6 +2004,13 @@ class TensorConstant(_tensor_py_operators, Constant):
def signature(self):
return TensorConstantSignature((self.type, self.data))
def equals(self, other):
# Override Constant.equals to allow comparison with a numpy.ndarray.
if isinstance(other, numpy.ndarray):
# Make a TensorConstant to be able to compare
other = constant(other)
return (isinstance(other, TensorConstant) and
self.signature() == other.signature())
TensorType.Constant = TensorConstant
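A minimal pure-Python sketch of the `equals` override added in the hunk above: a raw value is first wrapped as a constant so the signature comparison applies. Here `FakeConstant` and the tuple signature are illustrative stand-ins, with plain lists playing the role of `numpy.ndarray`:

```python
# Sketch: compare a constant against a raw value by wrapping the raw
# value as a constant first, then comparing signatures.

class FakeConstant:
    def __init__(self, data):
        self.data = list(data)

    def signature(self):
        # Stand-in for TensorConstantSignature: type tag plus values.
        return ('list', tuple(self.data))

    def equals(self, other):
        if isinstance(other, list):
            # Make a FakeConstant to be able to compare,
            # mirroring `other = constant(other)` in the diff.
            other = FakeConstant(other)
        return (isinstance(other, FakeConstant) and
                self.signature() == other.signature())
```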
......@@ -3641,6 +3645,10 @@ def var(input, axis=None, keepdims=False):
:param keepdims: If this is set to True, the axes which are reduced are
left in the result as dimensions with size one. With this option,
the result will broadcast correctly against the original tensor.
:note: It uses the two-pass algorithm for more stable results.
https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Two-pass_algorithm
Other implementations exist that are even more stable, but probably slower.
"""
input_ndim = input.type.ndim
......@@ -3676,6 +3684,10 @@ def std(input, axis=None, keepdims=False):
With this option,
the result will broadcast correctly against the
original tensor.
:note: It calls var, and var uses the two-pass algorithm for more stable results.
https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Two-pass_algorithm
Other implementations exist that are even more stable, but probably slower.
"""
return sqrt(var(input=input, axis=axis, keepdims=keepdims))
......
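The notes added to `var` and `std` above refer to the two-pass variance algorithm. A minimal pure-Python sketch of the idea: first compute the mean, then sum squared deviations from it, which avoids the catastrophic cancellation the naive E[x^2] - E[x]^2 formula suffers on data with a large mean:

```python
# Two-pass variance: numerically stable for data with a large mean.
def two_pass_var(xs):
    n = len(xs)
    mean = sum(xs) / n                             # pass 1: mean
    return sum((x - mean) ** 2 for x in xs) / n    # pass 2: deviations

# Naive one-pass formula: subject to catastrophic cancellation when
# the mean is much larger than the spread of the data.
def naive_var(xs):
    n = len(xs)
    return sum(x * x for x in xs) / n - (sum(xs) / n) ** 2
```

On well-conditioned data both agree; on data like `[1e9 + 1, 1e9 + 2, 1e9 + 3]` the two-pass version stays accurate while the naive one loses most of its significant digits.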
......@@ -492,11 +492,19 @@ def gemv_c_code(aa, xx, yy, zz, alpha, beta, destructive, fail):
{
if (PyArray_DESCR(%(xx)s)->type_num == NPY_FLOAT)
{
//fprintf(stderr, "B %%i %%i %%i %%i\\n",
// Nz0, Nz1, Sz0, Sz1);
float alpha = ((dtype_%(alpha)s*)PyArray_DATA(%(alpha)s))[0];
//fprintf(stderr, "alpha=%%f\\n", alpha);
//fprintf(stderr, "sx sy %%i %%i\\n", Sx, Sy);
// Check for vector-vector dot (Nx0 == 1). The code may work
// for Sx1 != 1 as well, but has not been tested for this case,
// so Sx1 == 1 is required for safety.
if (Nx0 == 1 && Sx1 == 1)
{
zz_data[0] = fbeta*zz_data[0] + alpha*sdot_(&Nx1,
(float*)(PyArray_DATA(%(xx)s)), &Sx1,
(float*)yy_data, &Sy);
}
else
{
sgemv_(&TRANS, &Nx1, &Nx0,
&alpha,
(float*)(PyArray_DATA(%(xx)s)), &Sx0,
......@@ -504,9 +512,22 @@ def gemv_c_code(aa, xx, yy, zz, alpha, beta, destructive, fail):
&fbeta,
(float*)zz_data, &Sz);
}
}
else if (PyArray_DESCR(%(xx)s)->type_num == NPY_DOUBLE)
{
double alpha = ((dtype_%(alpha)s*)PyArray_DATA(%(alpha)s))[0];
// Check for vector-vector dot (Nx0 == 1). The code may work
// for Sx1 != 1 as well, but has not been tested for this case,
// so Sx1 == 1 is required for safety.
if (Nx0 == 1 && Sx1 == 1)
{
zz_data[0] = dbeta*zz_data[0] + alpha*ddot_(&Nx1,
(double*)(PyArray_DATA(%(xx)s)), &Sx1,
(double*)yy_data, &Sy);
}
else
{
dgemv_(&TRANS, &Nx1, &Nx0,
&alpha,
(double*)(PyArray_DATA(%(xx)s)), &Sx0,
......@@ -514,6 +535,7 @@ def gemv_c_code(aa, xx, yy, zz, alpha, beta, destructive, fail):
&dbeta,
(double*)zz_data, &Sz);
}
}
else
{
PyErr_SetString(PyExc_AssertionError,
......@@ -556,7 +578,7 @@ class CGemv(BaseBLAS, Gemv):
return code
def c_code_cache_version(self):
return (9,)
return (10,)
@local_optimizer([gemv_inplace, gemv_no_inplace])
......
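The C changes above add a special case to gemv: when the matrix has a single row (`Nx0 == 1`), `z = beta*z + alpha*A.dot(y)` degenerates to one dot product, so the BLAS `sdot_`/`ddot_` routine can replace `sgemv_`/`dgemv_`. A pure-Python sketch of the equivalence (illustrative helpers, not the real BLAS calls):

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def gemv(alpha, A, y, beta, z):
    # z <- beta*z + alpha*A.dot(y), with A given as a list of rows.
    return [beta * z_i + alpha * dot(row, y) for z_i, row in zip(z, A)]

def gemv_1row(alpha, A, y, beta, z):
    # Special case for a single-row A: one dot call suffices.
    return [beta * z[0] + alpha * dot(A[0], y)]
```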
import theano
import numpy
import math
from theano import gof, tensor, function, scalar
from theano.sandbox.linalg.ops import diag
from theano import gof, tensor
class Fourier(gof.Op):
......
from basic import _scal_elemwise #, _transpose_inplace
from theano import scalar as scal
import elemwise
from theano import printing
from theano.printing import pprint
from theano.gof.python25 import any
def _scal_inplace(symbol):
"""Replace a symbol definition with an elementwise version of the corresponding scalar Op"""
......
......@@ -545,7 +545,7 @@ class Conv3D(theano.Op):
///////////// < /code generated by Conv3D >
"""
return strutil.renderString(codeSource,locals())
return strutil.render_string(codeSource,locals())
global conv3D
conv3D = Conv3D()
......
......@@ -271,7 +271,7 @@ class ConvGrad3D(theano.Op):
///////////// < /code generated by ConvGradW3D >
"""
return strutil.renderString(codeSource, locals())
return strutil.render_string(codeSource, locals())
convGrad3D = ConvGrad3D()
......
......@@ -324,7 +324,7 @@ class ConvTransp3D(theano.Op):
///////////// < /code generated by ConvTransp3D >
"""
return strutil.renderString(codeSource, locals())
return strutil.render_string(codeSource, locals())
convTransp3D = ConvTransp3D()
......
......@@ -813,7 +813,21 @@ class ShapeFeature(object):
"for a variable with %d dimensions." % (
len(s), r.ndim))
shape_vars = [self.unpack(s_i) for s_i in s]
shape_vars = []
for i in range(r.ndim):
if (hasattr(r.type, 'broadcastable') and
r.type.broadcastable[i]):
shape_vars.append(self.lscalar_one)
else:
shape_vars.append(self.unpack(s[i]))
assert all([not hasattr(r.type, "broadcastable") or
not r.type.broadcastable[i] or
# The two following comparisons are a speed optimization,
# but we never timed this speed optimization!
self.lscalar_one.equals(shape_vars[i]) or
self.lscalar_one.equals(
T.extract_constant(shape_vars[i]))
for i in range(r.ndim)])
self.shape_of[r] = tuple(shape_vars)
for sv in shape_vars:
self.shape_of_reverse_index.setdefault(sv, set()).add(r)
......@@ -855,6 +869,15 @@ class ShapeFeature(object):
merged_shape.append(r_shape[i])
else:
merged_shape.append(other_shape[i])
assert all([(not hasattr(r.type, "broadcastable") or
not r.type.broadcastable[i] and
not other_r.type.broadcastable[i]) or
# The two following comparisons are a speed optimization,
# but we never timed this speed optimization!
self.lscalar_one.equals(merged_shape[i]) or
self.lscalar_one.equals(
T.extract_constant(merged_shape[i]))
for i in range(r.ndim)])
self.shape_of[r] = tuple(merged_shape)
for sv in self.shape_of[r]:
self.shape_of_reverse_index.setdefault(sv, set()).add(r)
......@@ -871,6 +894,13 @@ class ShapeFeature(object):
new_shape.append(self.unpack(s_i))
else:
new_shape.append(s_j)
assert all([not hasattr(r.type, "broadcastable") or
not r.type.broadcastable[i] or
# The two following comparisons are a speed optimization,
# but we never timed this speed optimization!
self.lscalar_one.equals(new_shape[i]) or
self.lscalar_one.equals(T.extract_constant(new_shape[i]))
for i in range(r.ndim)])
self.shape_of[r] = tuple(new_shape)
for sv in self.shape_of[r]:
self.shape_of_reverse_index.setdefault(sv, set()).add(r)
......
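The assertions added to `ShapeFeature` above all enforce the same invariant: a dimension marked broadcastable must be recorded with the constant shape 1. A minimal standalone sketch of that check (names and the plain-int comparison are illustrative; the real code compares against `lscalar_one`):

```python
# Sketch of the ShapeFeature invariant: every broadcastable dimension
# must have shape entry 1 in the recorded shape tuple.
def check_shape_entries(broadcastable, shape_vars):
    for bcast, sv in zip(broadcastable, shape_vars):
        if bcast and sv != 1:
            raise AssertionError(
                "broadcastable dimension recorded with shape %r" % (sv,))
    return True
```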
......@@ -28,16 +28,10 @@ Also, we should make the fgraph refuse optimization that break the canonization
import logging
_logger = logging.getLogger('theano.tensor.opt')
import operator
import itertools
import sys
import theano
from theano import gof
from elemwise import CAReduce
import basic as T
from theano.gof.python25 import any, all
from theano.gof.opt import Optimizer
from theano.gof import InconsistencyError, toolbox
......
......@@ -4,11 +4,8 @@ graphs.
__docformat__ = "restructuredtext en"
import copy
import sys
import numpy
from theano.gof import Container
from theano.compile.sharedvalue import (SharedVariable, shared_constructor,
shared)
import raw_random
......
......@@ -5,11 +5,7 @@ generic 2D convolution.
__docformat__ = "restructuredtext en"
import numpy
import theano
import theano.tensor as tensor
import theano.tensor.nnet as nnet
from theano import gof, Op, tensor, config
from theano.tensor.nnet import conv
import logging
......
......@@ -5456,8 +5456,9 @@ class test_tensordot(unittest.TestCase):
f1 = inplace_func([avec, bvec], c)
aval = rand(5)
bval = rand(5)
self.assertTrue(numpy.tensordot(aval, bval, axes) == \
f1(aval, bval))
out0 = numpy.tensordot(aval, bval, axes)
out1 = f1(aval, bval)
self.assertTrue(numpy.allclose(out0, out1), (out0, out1))
utt.verify_grad(self.TensorDot(axes), [aval, bval])
# Test matrix-vector
......
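The test fix above replaces an exact `==` comparison of a tensordot result with `numpy.allclose`. A sketch of why, using `math.isclose` on plain floats: two mathematically equal reductions can differ in their last bits because of accumulated rounding:

```python
import math

a = sum([0.1] * 10)   # accumulates rounding error
b = 1.0

exact_equal = (a == b)                           # False
close_equal = math.isclose(a, b, rel_tol=1e-9)   # True
```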
......@@ -2475,6 +2475,57 @@ class test_shapeoptimizer(unittest.TestCase):
assert len(topo) == 1
assert topo[0].op == deep_copy_op
@staticmethod
def max_pool_c01b(c01b, pool_shp, pool_stride, img_shp):
"""Like max_pool but with input using axes ('c', 0, 1, 'b')
(Alex Krizhevsky format)
pool_shp, pool_stride and img_shp are int that represent
the same shp in x and y.
"""
mx = None
# Compute index in pooled space of last needed pool
# (needed = each input pixel must appear in at least one pool)
def last_pool(im_shp, p_shp, p_strd):
rval = int(numpy.ceil(float(im_shp - p_shp) / p_strd))
assert p_strd * rval + p_shp >= im_shp
assert p_strd * (rval - 1) + p_shp < im_shp
return rval
# Compute starting row of the last pool
last_pool_r = last_pool(img_shp, pool_shp, pool_stride) * pool_stride
# Compute number of rows needed in img for all indexes to work out
required_r = last_pool_r + pool_shp
last_pool_c = last_pool(img_shp, pool_shp, pool_stride) * pool_stride
required_c = last_pool_c + pool_shp
wide_infinity = T.alloc(-numpy.inf, c01b.shape[0],
required_r, required_c, c01b.shape[3])
c01b = T.set_subtensor(wide_infinity[:, 0:img_shp, 0:img_shp, :], c01b)
for row_within_pool in xrange(pool_shp):
row_stop = last_pool_r + row_within_pool + 1
for col_within_pool in xrange(pool_shp):
col_stop = last_pool_c + col_within_pool + 1
cur = c01b[:, row_within_pool:row_stop:pool_stride,
col_within_pool:col_stop:pool_stride, :]
if mx is None:
mx = cur
else:
mx = T.maximum(mx, cur)
return mx
def test_broadcasted_dims(self):
# This tests a case that caused a crash during optimization.
shp = (1, 1, 1, 1)
rng = numpy.random.RandomState(utt.fetch_seed())
a = shared(rng.rand(*shp).astype(config.floatX))
out = self.max_pool_c01b(a, 1, 1, 1)
f = theano.function([], out)
f()
def test_local_track_shape_i(self):
class IdentityNoShape(gof.Op):
'''Op that does not infer the output shape from the input one'''
......
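The `last_pool` helper in the test hunk above computes the index of the last pool needed so that every input pixel is covered by at least one pool window. A standalone sketch (using `math.ceil` in place of `numpy.ceil`), with the same sanity assertions:

```python
import math

def last_pool(im_shp, p_shp, p_strd):
    # Index of the last pool window: ceil((im_shp - p_shp) / p_strd).
    rval = int(math.ceil(float(im_shp - p_shp) / p_strd))
    # The last pool reaches (or passes) the end of the image...
    assert p_strd * rval + p_shp >= im_shp
    # ...and the one before it does not, so rval is minimal.
    assert p_strd * (rval - 1) + p_shp < im_shp
    return rval
```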
import theano
import numpy
from elemwise import Elemwise
......
......@@ -55,10 +55,12 @@ nosetests.
import cPickle
import datetime
import os
import subprocess
import sys
import time
import theano
from theano.misc.windows import call_subprocess_Popen
......@@ -261,8 +263,8 @@ def run(stdout, stderr, argv, theano_nose, batch_size, time_profile,
n_tests + 1)):
# Print the test we will start in the raw log to help
# debug tests that are too long.
f_rawlog.write("\nWill run test #%d %s\n" % (test_id,
data["ids"][test_id]))
f_rawlog.write("\n%s Will run test #%d %s\n" % (
time.ctime(), test_id, data["ids"][test_id]))
f_rawlog.flush()
proc = call_subprocess_Popen(
......
......@@ -64,7 +64,8 @@ class OrderedUpdates(OrderedDict):
# Warn about non-determinism.
warnings.warn('Updating an `OrderedUpdates` with a '
'non-ordered dictionary with 2+ elements could '
'make your code non-deterministic')
'make your code non-deterministic',
stacklevel=2)
for key, val in OrderedDict(other).iteritems():
if key in self:
if self[key] == val:
......
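The hunk above adds `stacklevel=2` to the `OrderedUpdates` warning. A sketch of what that changes: the warning is attributed to the caller of the warning function rather than to the function itself, pointing users at their own code (the `update` helper and message here are illustrative):

```python
import warnings

def update(other):
    # With stacklevel=2, the reported location is the caller's line,
    # not this warnings.warn call.
    warnings.warn('non-ordered dictionary could make your code '
                  'non-deterministic', stacklevel=2)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    update({})   # the warning is attributed to this line
```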