Commit b4c881d1 authored by Dumitru Erhan

merge

......@@ -43,8 +43,10 @@ Environment Variables
.. envvar:: THEANO_FLAGS
This is a list of comma-delimited key[=value] pairs that control Theano's behavior. A key that appears without an '=value' must be for a boolean value, and it acts as setting it to True.
This is a list of comma-delimited key[=value] pairs that control
Theano's behavior. A key that appears without an '=value' must be
for a boolean value, and it acts as setting it to True.
For example, in bash, you can override your :envvar:`THEANORC` defaults
for <myscript>.py by typing this:
......@@ -52,11 +54,15 @@ Environment Variables
THEANO_FLAGS='floatX=float32,device=gpu0,nvcc.fastmath' python <myscript>.py
If a value is defined several times in ``THEANO_FLAGS``,
the right-most definition is used. So, for instance, if
``THEANO_FLAGS='device=cpu,device=gpu0'``, then gpu0 will be used.
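The parsing rule described above can be sketched in a few lines. This is a minimal illustration written for Python 3, not Theano's actual parser; the function name ``parse_flags`` is hypothetical.

```python
def parse_flags(flags):
    """Parse a comma-delimited key[=value] string into a dict (illustrative only)."""
    parsed = {}
    for item in flags.split(','):
        if not item:
            continue  # tolerate empty entries such as a trailing comma
        key, sep, value = item.partition('=')
        # A bare key (no '=value') is treated as a boolean flag set to True.
        # Assigning into the dict means a later definition overwrites an
        # earlier one, so the right-most definition wins.
        parsed[key] = value if sep else True
    return parsed


flags = parse_flags('floatX=float32,device=cpu,nvcc.fastmath,device=gpu0')
print(flags['device'])
print(flags['nvcc.fastmath'])
```

Here ``device`` is defined twice and the second definition is the one that survives, which matches the right-most rule stated above.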
.. envvar:: THEANORC
The location[s] of the .theanorc file[s] in ConfigParser format.
It defaults to ``$HOME/.theanorc``.
It defaults to ``$HOME/.theanorc``.
Here is the .theanorc equivalent to the THEANO_FLAGS in the example above:
.. code-block:: text
......@@ -70,10 +76,10 @@ Environment Variables
Multiple configuration files can be specified by separating them with ':'
characters (as in $PATH). Multiple configuration files will be merged,
with earlier (left-most) files taking priority over later files in the
with later (right-most) files taking priority over earlier files in the
case that multiple files specify values for a common configuration option.
For example, to override system-wide settings with personal ones,
set ``THEANORC=~/.theanorc:/etc/theanorc``
For example, to override system-wide settings with personal ones,
set ``THEANORC=/etc/theanorc:~/.theanorc``.
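The merge order can be checked with the standard library alone. The sketch below is written for Python 3 (where the module is named ``configparser``) and uses two temporary files as stand-ins for a system-wide and a personal configuration file; the file names, the ``[global]`` section and the helper ``read_merged`` are assumptions made for the illustration.

```python
import configparser
import os
import tempfile


def read_merged(paths_value):
    """Read a ':'-separated list of config files; later files take priority."""
    parser = configparser.ConfigParser()
    # read() processes the files in order, so a value found in a later file
    # replaces the value found in an earlier one.
    parser.read(paths_value.split(':'))
    return parser


tmpdir = tempfile.mkdtemp()
system_rc = os.path.join(tmpdir, 'system_theanorc')      # stand-in for /etc/theanorc
personal_rc = os.path.join(tmpdir, 'personal_theanorc')  # stand-in for ~/.theanorc
with open(system_rc, 'w') as f:
    f.write('[global]\nfloatX = float64\ndevice = cpu\n')
with open(personal_rc, 'w') as f:
    f.write('[global]\ndevice = gpu0\n')

merged = read_merged(system_rc + ':' + personal_rc)
print(merged.get('global', 'device'))
print(merged.get('global', 'floatX'))
```

The personal file only overrides ``device``; ``floatX`` is still taken from the system-wide file because the personal file does not mention it.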
The rest of this page describes some of the more common and important flags
that you might want to use. For the complete list (including documentation),
......
......@@ -58,7 +58,7 @@ file and run it.
import numpy
import time
vlen = 100000
vlen = 10 * 30 * 768 # 10 x #cores x # threads per core
iters = 1000
rng = numpy.random.RandomState(22)
......@@ -74,28 +74,31 @@ The program just computes the exp() of a bunch of random numbers.
Note that we use the `shared` function to
make sure that the input `x` are stored on the graphics device.
If I run this program (in thing.py) with device=cpu, my computer takes a little over 3 seconds, whereas on the GPU it takes just over 0.2 seconds. Note that the results are close but not identical! The GPU will not always produce the exact same floating-point numbers as the CPU.
If I run this program (in thing.py) with device=cpu, my computer takes a little over 7 seconds,
whereas on the GPU it takes just over 0.4 seconds. Note that the results are close but not
identical! The GPU will not always produce the exact same floating-point numbers as the CPU.
As a point of reference, a loop that calls ``numpy.exp(x.value)`` also takes about 7 seconds.
.. code-block:: text
$ THEANO_FLAGS=mode=FAST_RUN,device=cpu python thing.py
Looping 100 times took 3.12647008896 seconds
Result is [ 1.23178032 1.61879341 1.52278065 ..., 1.74085572 2.55530456 1.88906098]
Looping 100 times took 7.17374897003 seconds
Result is [ 1.23178032 1.61879341 1.52278065 ..., 2.20771815 2.29967753 1.62323285]
bergstra@tikuanyin:~/tmp$ THEANO_FLAGS=mode=FAST_RUN,device=gpu0 python thing.py
Using gpu device 0: GeForce GTX 285
Looping 100 times took 0.217401981354 seconds
Result is [ 1.23178029 1.61879349 1.52278066 ..., 1.74085569 2.55530477 1.88906097]
Looping 100 times took 0.418929815292 seconds
Result is [ 1.23178029 1.61879349 1.52278066 ..., 2.20771813 2.29967761 1.62323296]
Returning a handle to device-allocated data
-------------------------------------------
The speedup is not greater in the example above because the function is
returning its result as a numpy ndarray (which has already copied from the
device to the host). This is what makes it so easy to swap in device=gpu0, but
if you want to be less portable, you can see a bigger speedup by changing
returning its result as a numpy ndarray which has already been copied from the
device to the host for your convenience. This is what makes it so easy to swap in device=gpu0, but
if you don't mind being less portable, you might prefer to see a bigger speedup by changing
the graph to express a computation with a GPU-stored result. The gpu_from_host
op means "copy the input from the host to the gpu" and it is optimized away
Op means "copy the input from the host to the gpu" and it is optimized away
after the T.exp(x) is replaced by a GPU version of exp().
.. code-block:: python
......@@ -105,7 +108,7 @@ after the T.exp(x) is replaced by a GPU version of exp().
import numpy
import time
vlen = 100000
vlen = 10 * 30 * 768 # 10 x #cores x # threads per core
iters = 1000
rng = numpy.random.RandomState(22)
......@@ -123,17 +126,71 @@ The output from this program is
.. code-block:: text
Using gpu device 0: GeForce GTX 285
Looping 100 times took 0.173671007156 seconds
Looping 100 times took 0.185714006424 seconds
Result is <CudaNdarray object at 0x3e9e970>
Numpy result is [ 1.23178029 1.61879349 1.52278066 ..., 1.74085569 2.55530477 1.88906097]
Numpy result is [ 1.23178029 1.61879349 1.52278066 ..., 2.20771813 2.29967761 1.62323296]
Here we've shaved off about 20% of the run-time by simply not copying the
Here we've shaved off about 50% of the run-time by simply not copying the
resulting array back to the host.
The object returned by each function call is now not a numpy array but a
"CudaNdarray" which can be converted to a numpy ndarray by the normal
numpy casting mechanism.
Running the GPU at Full Speed
------------------------------
To really get maximum performance in this simple example, we need to use an :class:`Out`
instance to tell Theano not to copy the output it returns to us. Theano allocates memory for
internal use like a working buffer, but by default it will never return a result that is
allocated in the working buffer. This is normally what you want, but our example is so simple
that it has the unwanted side effect of really slowing things down.
..
TODO:
The story here about copying and working buffers is misleading and potentially not correct
... why exactly does borrow=True cut 75% of the runtime ???
.. code-block:: python
from theano import function, config, shared, sandbox, Out
import theano.tensor as T
import numpy
import time
vlen = 10 * 30 * 768 # 10 x #cores x # threads per core
iters = 1000
rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([],
Out(sandbox.cuda.basic_ops.gpu_from_host(T.exp(x)),
borrow=True))
t0 = time.time()
for i in xrange(iters):
r = f()
print 'Looping 100 times took', time.time() - t0, 'seconds'
print 'Result is', r
print 'Numpy result is', numpy.asarray(r)
Running this version of the code takes just under 0.05 seconds, over 140x faster than
the CPU implementation!
.. code-block:: text
Using gpu device 0: GeForce GTX 285
Looping 100 times took 0.0497219562531 seconds
Result is <CudaNdarray object at 0x31eeaf0>
Numpy result is [ 1.23178029 1.61879349 1.52278066 ..., 2.20771813 2.29967761 1.62323296]
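As a quick sanity check of the ratios quoted on this page, the timings reported in the output blocks can be compared directly. The snippet below (Python 3) only does arithmetic on those reported numbers; the variable names are illustrative.

```python
# Wall-clock times (in seconds) copied from the output blocks on this page.
cpu = 7.17374897003
gpu_copy_back = 0.418929815292       # result copied back to the host
gpu_device_result = 0.185714006424   # result left on the device
gpu_borrow = 0.0497219562531         # result left on the device, borrow=True


def reduction(before, after):
    """Percentage of run time removed when going from `before` to `after`."""
    return 100.0 * (before - after) / before


print('GPU (with copy) vs CPU: %.1fx faster' % (cpu / gpu_copy_back))
print('Skipping the copy removes %.1f%% of the run time'
      % reduction(gpu_copy_back, gpu_device_result))
print('borrow=True removes a further %.1f%% of the run time'
      % reduction(gpu_device_result, gpu_borrow))
print('borrow=True vs CPU: %.1fx faster' % (cpu / gpu_borrow))
```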
This version of the code, using ``borrow=True``, is slightly less safe: if we had saved
the `r` returned from one function call, we would have to remember that its value might
be overwritten by a subsequent function call. Although ``borrow=True`` makes a dramatic difference in this example,
be careful! The advantage of
``borrow=True`` is much weaker in larger graphs, and there is a lot of potential for making a
mistake by failing to account for the resulting memory aliasing.
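The aliasing hazard can be reproduced without a GPU. The toy class below (Python 3) is not Theano code: ``Doubler`` is a hypothetical stand-in for a compiled function that owns a working buffer and either returns a copy of it or hands out the buffer itself.

```python
class Doubler(object):
    """Toy stand-in for a compiled function with an internal working buffer."""

    def __init__(self, size, borrow):
        self._buffer = [0.0] * size
        self._borrow = borrow

    def __call__(self, values):
        for i, v in enumerate(values):
            self._buffer[i] = 2.0 * v
        if self._borrow:
            # No copy: the caller receives the working buffer itself.
            return self._buffer
        # Default behaviour: return a fresh copy of the result.
        return list(self._buffer)


safe = Doubler(3, borrow=False)
safe_first = safe([1.0, 2.0, 3.0])
safe([10.0, 20.0, 30.0])
print(safe_first)        # still holds the result of the first call

fast = Doubler(3, borrow=True)
aliased_first = fast([1.0, 2.0, 3.0])
fast([10.0, 20.0, 30.0])
print(aliased_first)     # now holds the result of the second call
```

In the borrowed case the saved result silently changes after the second call, which is exactly the kind of mistake the warning above is about.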
What can be accelerated on the GPU?
------------------------------------
......
......@@ -428,9 +428,20 @@ class Function(object):
# Reinitialize each container's 'provided' counter
for c in self.input_storage:
c.provided = 0
# Set positional arguments
for i, arg in enumerate(args):
self[i] = arg
i = 0
for arg in args:
#TODO: provide a Param option for skipping the filter if we
# really want speed.
s = self.input_storage[i]
if arg is None:
s.storage[0] = arg
else:
s.storage[0] = s.type.filter(arg, strict=s.strict)
s.provided += 1
i+=1
# Set keyword arguments
for k, arg in kwargs.iteritems():
self[k] = arg
......@@ -448,7 +459,9 @@ class Function(object):
self.inv_finder[c]))
# Do the actual work
t0_fn = time.time()
self.fn()
dt_fn = time.time() - t0_fn
# Retrieve the values that were computed
outputs = [x.data for x in self.output_storage]
......@@ -486,6 +499,9 @@ class Function(object):
self.maker.mode.fct_call_time[self.name] += dt_call
self.maker.mode.fct_call[self.name] += 1
self.maker.mode.call_time += dt_call
self.maker.mode.fn_time += dt_fn
if self.return_none:
return None
elif self.unpack_single and len(outputs) == 1:
......
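The hunks above separate the time spent in the compiled function itself (``dt_fn``) from the time of the whole call (``dt_call``), which also includes input filtering. A minimal sketch of that bookkeeping, written for Python 3 and using a hypothetical ``TimedCallable`` wrapper rather than Theano's ``Function`` class:

```python
import time


class TimedCallable(object):
    """Accumulate total call time and inner-function time separately."""

    def __init__(self, fn):
        self.fn = fn
        self.call_time = 0.0  # whole call, including argument handling
        self.fn_time = 0.0    # only the wrapped function

    def __call__(self, *args):
        t0_call = time.perf_counter()
        # Stand-in for the input filtering step done before the real work.
        checked = [float(a) for a in args]
        t0_fn = time.perf_counter()
        result = self.fn(*checked)
        self.fn_time += time.perf_counter() - t0_fn
        self.call_time += time.perf_counter() - t0_call
        return result


timed = TimedCallable(lambda a, b: a + b)
total = timed(1, 2)
print(total, timed.call_time - timed.fn_time)  # overhead outside the function
```

The difference between the two counters is the per-call overhead, which is what makes the separate ``fn_time`` counter useful when profiling very small graphs.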
......@@ -172,6 +172,8 @@ class Mode(object):
if isinstance(optimizer, gof.Query):
self.provided_optimizer = optimizer
self._optimizer = optimizer
self.call_time = 0
self.fn_time = 0
def __str__(self):
return "Mode(linker = %s, optimizer = %s)" % (self.provided_linker, self.provided_optimizer)
......
import time, atexit, copy
from theano.gof.link import WrapLinkerMany
from theano.gof.link import WrapLinker
from theano.gof.cutils import run_cthunk
from theano.compile.mode import Mode, register_mode, predefined_modes, predefined_linkers, predefined_optimizers, default_linker, default_optimizer
from theano.gof.cc import OpWiseCLinker
from theano.gof.python25 import any
from theano import gof
from theano.configparser import config, AddConfigVar, IntParam
from theano.compile.function_module import FunctionMaker
import_time = time.time()
......@@ -18,44 +19,57 @@ AddConfigVar('ProfileMode.n_ops_to_print',
"Number of ops to print by default",
IntParam(20, lambda i: i > 0))
class Profile_Maker(FunctionMaker):
def create(self, input_storage=None, trustme=False):
ret = super(Profile_Maker,self).create(input_storage, trustme)
for i, node in enumerate(ret.maker.env.toposort()):
self.mode.apply_time[(i,node.op)]=0.0
self.mode.apply_call[(i,node.op)]=0
# self.mode.op_cimpl[node.op] =
return ret
class ProfileMode(Mode):
def __init__(self, linker=default_linker, optimizer=default_optimizer):
local_time = [0.0]
apply_time = {}
apply_call = {}
op_time = {}
op_cimpl = {}
op_call = {}
compile_time = 0 #time passed in theano.function()
fct_call_time = {}#time passed inside theano fct call including op time.
fct_call = {}
self.__setstate__((linker, optimizer, local_time,
apply_time, apply_call,
op_time, op_cimpl, op_call,
op_cimpl,
compile_time, fct_call_time, fct_call))
def function_maker(self, i,o,m, *args, **kwargs):
"""Return an instance of `Profile_Maker`, which initializes the counters."""
assert m is self
return Profile_Maker(i, o, self, *args, **kwargs)
def __getstate__(self):
#print "__getstate__",self.provided_linker,self.provided_optimizer
return (self.provided_linker, self.provided_optimizer, self.local_time,
self.apply_time, self.apply_call,
self.op_time, self.op_cimpl, self.op_call, self.compile_time, self.fct_call_time, self.fct_call)
self.op_cimpl, self.compile_time, self.fct_call_time, self.fct_call)
def __setstate__(self, (linker, optimizer, local_time,
apply_time, apply_call,
op_time, op_cimpl, op_call,
op_cimpl,
compile_time, fct_call_time, fct_call)):
self.local_time = local_time
self.apply_time = apply_time
self.apply_call = apply_call
self.op_time = op_time
self.op_cimpl = op_cimpl
self.op_call = op_call
self.compile_time = compile_time
self.fct_call_time = fct_call_time
self.fct_call = fct_call
self.call_time = 0
self.fn_time = 0
def blah(i, node, th):
if hasattr(th, 'cthunk'):
......@@ -63,7 +77,7 @@ class ProfileMode(Mode):
failure = run_cthunk(th.cthunk)
dt = time.time() - t0
if failure:
raise RuntimeError(('A C Op raised an exception. PerformLinker cannot'
raise RuntimeError(('A C Op raised an exception. PROFILE_MODE cannot'
' tell you what it was though. Use a standard mode such as'
' FAST_RUN_NOGC to correct the problem.'))
else:
......@@ -72,11 +86,9 @@ class ProfileMode(Mode):
dt = time.time() - t0
local_time[0] += dt
apply_time[(i,node.op)] = apply_time.get((i,node.op), 0.0) + dt
apply_call[(i,node.op)] = apply_call.get((i,node.op), 0) + 1
op_time[node.op] = op_time.get(node.op, 0.0) + dt
apply_time[(i,node.op)] += dt
apply_call[(i,node.op)] += 1
op_cimpl[node.op] = hasattr(th, 'cthunk')
op_call[node.op] = op_call.get(node.op,0) + 1
self.provided_linker = linker
......@@ -84,7 +96,7 @@ class ProfileMode(Mode):
if isinstance(linker, str) or linker is None:
linker = predefined_linkers[linker]
linker = WrapLinkerMany([linker], [blah])
linker = WrapLinker([linker], blah)
self.linker = linker
if isinstance(optimizer, str) or optimizer is None:
......@@ -113,18 +125,11 @@ class ProfileMode(Mode):
fct_call = self.fct_call
apply_time = self.apply_time
apply_call = self.apply_call
op_time = self.op_time
op_call = self.op_call
op_cimpl = self.op_cimpl
op_flops = {}
for a,t in op_time.items():
if hasattr(a,'flops'):
op_flops[a]=a.flops*op_call[a]/t/1e6
self.print_summary_("print_summary",local_time, compile_time, fct_call_time, fct_call,
apply_time, apply_call, op_time, op_call, op_cimpl,
op_flops, n_apply_to_print, n_ops_to_print)
apply_time, apply_call, op_cimpl,
n_apply_to_print, n_ops_to_print)
def print_diff_summary(self, other, n_apply_to_print=15, n_ops_to_print=20):
......@@ -153,42 +158,23 @@ class ProfileMode(Mode):
r[a]+=t
return r
def diff_dict_flops(a_time,b_time_,a_call,b_call):
flops = {}
b_time = copy.copy(b_time_)
for a,ta in a_time.items():
tb = b_time.pop(a,0)
if hasattr(a,'flops'):
flops[a]=a.flops*a_call[a]/ta - a.flops*b_call[a]/tb/1e6
#they are missing in a
for b,tb in b_time.items():
if hasattr(b,'flops'):
flops[b]=b.flops*b_call[b]/tb/1e6
return flops
local_time = self.local_time[0]-other.local_time[0]
compile_time = self.compile_time-other.compile_time
fct_call_time = diff_dict(self.fct_call_time,other.fct_call_time)
fct_call = diff_dict(self.fct_call,other.fct_call)
apply_time = diff_dict(self.apply_time, other.apply_time)
apply_call = diff_dict(self.apply_call, other.apply_call)
op_time = diff_dict(self.op_time, other.op_time)
op_call = diff_dict(self.op_call, other.op_call)
op_cimpl = self.op_cimpl and other.op_cimpl
op_flops = diff_dict_flops(self.op_time, other.op_time, self.op_call, other.op_call)
self.print_summary_("print_diff_summary",local_time, compile_time, fct_call_time, fct_call,
apply_time, apply_call, op_time, op_call, op_cimpl,
op_flops, n_apply_to_print=n_apply_to_print,
apply_time, apply_call, op_cimpl,
n_apply_to_print=n_apply_to_print,
n_ops_to_print=n_ops_to_print, print_apply=False)
@staticmethod
def print_summary_(fct_name, local_time, compile_time, fct_call_time, fct_call,
apply_time, apply_call, op_time, op_call, op_cimpl,
op_flops=None, n_apply_to_print=15, n_ops_to_print=20, print_apply=True):
apply_time, apply_call, op_cimpl,
n_apply_to_print=15, n_ops_to_print=20, print_apply=True):
"""
do the actual printing of print_summary and print_diff_summary.
......@@ -218,6 +204,19 @@ class ProfileMode(Mode):
sum(f for f, t, a, nb_call in atimes[n_apply_to_print:])*100,
sum(t for f, t, a, nb_call in atimes[n_apply_to_print:]))
op_time = {}
op_call = {}
for (i,a),t in apply_time.items():
op_time.setdefault(a,0)
op_call.setdefault(a,0)
op_time[a]+=t
op_call[a]+=apply_call[(i,a)]
op_flops = {}
for a,t in op_time.items():
if hasattr(a,'flops'):
op_flops[a]=a.flops*op_call[a]/t/1e6
flops_msg=''
if op_flops:
flops_msg=' <MFlops/s>'
......
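With this change the per-op totals are no longer stored but derived from the per-apply dictionaries at print time. The aggregation can be sketched as follows (Python 3); the op names are plain strings here and the helper names are hypothetical, whereas the real keys are ``(index, op)`` pairs holding Op instances.

```python
def aggregate_by_op(apply_time, apply_call):
    """Sum per-apply timings and call counts into per-op totals."""
    op_time = {}
    op_call = {}
    for (i, op), t in apply_time.items():
        op_time[op] = op_time.get(op, 0.0) + t
        op_call[op] = op_call.get(op, 0) + apply_call[(i, op)]
    return op_time, op_call


def mflops_per_second(flops_per_call, n_calls, seconds):
    """Same formula as in the hunk above: flops * calls / time / 1e6."""
    return flops_per_call * n_calls / seconds / 1e6


# Two apply nodes use the same 'exp' op; one uses 'dot'.
apply_time = {(0, 'exp'): 0.25, (1, 'dot'): 1.0, (2, 'exp'): 0.25}
apply_call = {(0, 'exp'): 10, (1, 'dot'): 10, (2, 'exp'): 10}
op_time, op_call = aggregate_by_op(apply_time, apply_call)
print(op_time['exp'], op_call['exp'])
print(mflops_per_second(2e6, op_call['dot'], op_time['dot']))
```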
......@@ -544,35 +544,20 @@ class Test_check_isfinite(unittest.TestCase):
theano.tensor.TensorType.filter_checks_isfinite = self.old_val
def test_check_isfinite(self):
x = theano.tensor.dvector()
x = theano.tensor.vector()
f = theano.function([x], (x+2) * 5, mode='DEBUG_MODE')
g = theano.function([x], theano.tensor.log(x), mode='DEBUG_MODE')
# this should work
f(numpy.log([3, 4, 5]))
# this should raise InvalidValueError
try:
# insert a NaN
f(numpy.log([3, -4, 5]))
assert False
except debugmode.InvalidValueError:
pass
# this should raise InvalidValueError
try:
# insert an Nan and Inf
f(numpy.asarray([0, 1.0, 0])/0)
assert False
except debugmode.InvalidValueError:
pass
# passing an invalid value as an input should trigger ValueError
self.failUnlessRaises(ValueError, f, numpy.log([3, -4, 5]))
self.failUnlessRaises(ValueError, f, numpy.asarray([0, 1.0, 0])/0)
self.failUnlessRaises(ValueError, f, numpy.asarray([1.0, 1.0, 1.0])/0)
# this should raise InvalidValueError
try:
# insert several Inf
f(numpy.asarray([1.0, 1.0, 1.0])/0)
assert False
except debugmode.InvalidValueError:
pass
# generating an invalid value internally should trigger InvalidValueError
self.failUnlessRaises(debugmode.InvalidValueError, g, [3,-4,5])
# this should disable the exception
theano.tensor.TensorType.filter_checks_isfinite = False
......
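The rewritten test distinguishes a non-finite value passed in as an input (rejected by the type's filter with a ``ValueError``) from one produced inside the graph. The input-side check can be sketched without Theano; ``filter_finite`` below is a hypothetical stand-in for the filter, and the test uses ``assertRaises``, of which ``failUnlessRaises`` is a deprecated alias. Written for Python 3.

```python
import math
import unittest


def filter_finite(values):
    """Toy input filter: convert to floats and reject NaN and Inf."""
    out = [float(v) for v in values]
    if not all(math.isfinite(v) for v in out):
        raise ValueError('non-finite elements not allowed')
    return out


class TestFilterFinite(unittest.TestCase):
    def test_accepts_finite(self):
        self.assertEqual(filter_finite([1, 2]), [1.0, 2.0])

    def test_rejects_nan_and_inf(self):
        self.assertRaises(ValueError, filter_finite, [0.0, float('nan')])
        self.assertRaises(ValueError, filter_finite, [0.0, float('inf')])


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestFilterFinite)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```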
......@@ -14,11 +14,12 @@ THEANO_FLAGS=os.getenv("THEANO_FLAGS","")
# [section.]option[=value] entries. If the section part is omitted, there should be only one
# section that contains the given option.
# THEANORC=~/.theanorc:~lisa/.theanorc
# THEANORC can contain a colon-delimited list of config files, like
# THEANORC=~lisa/.theanorc:~/.theanorc
# In that case, definitions in files on the right (here, ~/.theanorc) have
# precedence over those in files on the left.
def config_files_from_theanorc():
rval = [os.path.expanduser(s) for s in os.getenv('THEANORC', '~/.theanorc').split(':')]
rval.reverse()
print "THEANORC", rval
return rval
theano_cfg = ConfigParser.SafeConfigParser()
theano_cfg.read(config_files_from_theanorc())
......@@ -42,14 +43,15 @@ def fetch_val_for_key(key):
"""Return the overriding config value for a key.
A successful search returns a string value.
An unsuccessful search raises a KeyError.
The priority order is:
The (decreasing) priority order is:
- THEANO_FLAGS
- ~/.theanorc
"""
# first try to find it in the FLAGS
rval = None
for name_val in THEANO_FLAGS.split(','):
if not name_val:
continue
......@@ -60,7 +62,12 @@ def fetch_val_for_key(key):
name, val = name_val_tuple
if name == key:
return val
# rval might be overridden by a later definition in THEANO_FLAGS
rval = val
# If an rval is found, it should be a string
if rval is not None:
return rval
# next try to find it in the config file
......@@ -77,7 +84,7 @@ def fetch_val_for_key(key):
return theano_cfg.get(section, option)
except (ConfigParser.NoOptionError, ConfigParser.NoSectionError):
raise KeyError(key)
class TheanoConfigParser(object):
#properties are installed by AddConfigVar
......@@ -143,7 +150,7 @@ class ConfigParam(object):
self.val = val
deleter=None
class EnumStr(ConfigParam):
def __init__(self, default, *options):
self.default = default
......
......@@ -222,7 +222,7 @@ class PureType(object):
try:
self.filter(a, True)
return True
except TypeError:
except (TypeError, ValueError):
return False
def make_variable(self, name = None):
......
......@@ -18,18 +18,22 @@ def _asarray(a, dtype=None, order=None):
Currently, this issue has only been causing trouble when the target
data type is 'int32', on some computers. As a result, this is the only
situation where we do more than a simple call to ``numpy.asarray``. If it
turns out that a similar problem can occur for more data type, this
situation where we may do more than a simple call to ``numpy.asarray``. If
it turns out that a similar problem can occur for more data types, this
function should be updated accordingly.
This function's name starts with a '_' to indicate that it is meant to be
used internally. It is imported so as to be available directly through
theano._asarray
"""
dtype = numpy.dtype(dtype) # Convert into dtype object.
rval = numpy.asarray(a, dtype=dtype, order=order)
if dtype is numpy.int32 or dtype == 'int32':
# Make sure the type is properly set to the correct type.
return rval.view(dtype=numpy.int32)
numpy_int32 = numpy.dtype(numpy.int32)
if (dtype is numpy_int32 and rval.dtype is not numpy_int32):
# Enforce the numpy.int32 dtype.
return rval.view(dtype=numpy_int32)
else:
# Using ``numpy.asarray`` should work just fine.
# Debug assert if we want to detect other failure cases (untested):
# assert rval.dtype is dtype
return rval
......@@ -5,7 +5,17 @@ from theano import gof, Op, tensor, config
from theano.printing import Print
def getFilterOutShp(inshp, kshp, (dx,dy)=(1,1), mode='valid'):
"""Returns numpy ndarray of len 2
"""
Computes the shape (nb_rows, nb_col) of each output image.
:type inshp: tuple, list or 1D ndarray of length 2
:param inshp: shape of each (2D) input image
:type kshp: tuple, list or 1D ndarray of length 2
:param kshp: shape of each (2D) kernel filter
:type mode: string
:param mode: 'valid' or 'full' (see 'border_mode' in conv2d's doc)
:rtype: numpy 1D ndarray of len 2
:return: shape of each output "image" (or feature map)
"""
if mode=='valid': s = -1
else: s = 1
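The sign ``s`` chosen above encodes the usual shape rule: a 'valid' convolution shrinks each dimension to ``n - k + 1`` and a 'full' one grows it to ``n + k - 1``. A pure-Python sketch (Python 3) follows. The rest of the function body is not shown in this hunk, so the handling of the subsampling step ``(dx, dy)`` by rounding up is an assumption, and ``filter_out_shape`` is a hypothetical name.

```python
import math


def filter_out_shape(inshp, kshp, subsample=(1, 1), mode='valid'):
    """Shape (rows, cols) of one output image for a 2D convolution."""
    if mode == 'valid':
        s = -1
    elif mode == 'full':
        s = 1
    else:
        raise ValueError("mode must be 'valid' or 'full'")
    # n + s*k - s gives n - k + 1 for 'valid' and n + k - 1 for 'full'.
    # Assumption: subsampling keeps every d-th position, rounding up.
    return tuple(int(math.ceil((n + s * k - s) / float(d)))
                 for n, k, d in zip(inshp, kshp, subsample))


print(filter_out_shape((28, 28), (5, 5), mode='valid'))
print(filter_out_shape((28, 28), (5, 5), mode='full'))
print(filter_out_shape((28, 28), (5, 5), subsample=(2, 2), mode='valid'))
```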
......@@ -28,10 +38,12 @@ def conv2d(input, filters, border_mode='valid', subsample=(1,1),
:param filters: tensor containing filters for convolutional neural net.
Indexing is: (filter, filter input feature map, filter row, filter col).
:type border_mode: string
:param border_mode:'valid'(only apply kernel over complete patch of the image)
or 'full'(padd the image with 0 and apply the kernel over all full patch and partial patch of the image
:param border_mode: 'valid' (only apply the kernel over complete patches of the image) or
'full' (pad the image with zeros and apply the kernel over all full and partial patches of
the image)
:type subsample: tuple of len 2
:param subsample: how many pixel we move in the (row,col) direction of the image when we change of patch
:param subsample: how many pixels we move in the (row,col) direction of the image when
moving to the next patch
:type image_shape: tuple of len 4
:param image_shape: (batch size, stack size, nb row, nb col)
:type filter_shape: tuple of len 4
......@@ -60,18 +72,18 @@ def conv2d(input, filters, border_mode='valid', subsample=(1,1),
class ConvOp(Op):
"""
A convolution op that should extend scipy.signal.convolve2d, but much faster!
A convolution op that should behave like scipy.signal.convolve2d,
but much faster!
"""
__attrnames = ['imshp', 'kshp', 'nkern', 'bsize', 'dx', 'dy', 'out_mode',
'unroll_batch', 'unroll_kern', 'unroll_patch',
'imshp_logical', 'kshp_logical', 'kshp_logical_top_aligned']
"""These attributes uniquely identify the behaviour of this op for given inputs"""
def __init__(self, imshp=None, kshp=None, nkern=None, bsize=None, dx=None, dy=None, output_mode='valid',
unroll_batch=0,
def __init__(self, imshp=None, kshp=None, nkern=None, bsize=None,
dx=None, dy=None,
output_mode='valid', unroll_batch=0,
unroll_kern=0,
unroll_patch=True,
imshp_logical=None,
......@@ -80,7 +92,12 @@ class ConvOp(Op):
verbose=0,
version=-1):
"""
This Op implement the convolution of a kernel(tensor 4d,(nkern, stacksize, nb row, nb col)) on an image(tensor 4d, (batchsize, stacksize, nb row, nb col). The batch size is multiple image that we want to apply the same kernel over. The nkern is numtiple kernel that we want to apply to each image. The stack size is mostly used when their is multiple layer in the network. It is the sum of the convolution of multiple 2d image and kernel.
This Op implements the convolution of a kernel (4D tensor: (nkern, stacksize, nb row, nb
col)) on an image (4D tensor: (batchsize, stacksize, nb row, nb col)). The batch size is
the number of images to which we want to apply the same kernels. nkern is the number of
kernels that we want to apply to each image. The stack size is mostly used when there are
multiple layers in the network. The result is the sum, over the stack, of the convolutions
of each 2D image with the corresponding 2D kernel.
The reason that this op does the summation over convolutions within the 'stack' is that
it allows us to be memory-efficient about how gradients are calculated. If, for
......@@ -89,14 +106,22 @@ class ConvOp(Op):
point) then we would have to sum over a potentially very large tensor to get the
gradient on the filters.
If the imshp, kshp, nkern and bsize are provided, we can generate more optimal code. This make a significant difference for the full mode with unroll_patch version.
The most frequent faster code currently available on 64_x86 computer is unroll_batch=4, unroll_kern=4, unroll_patch=False and this request that all the optional shape information are gived. Those number are empirically tested and backed up by the article: Anatomy of High-Performance Matrix Multiplication by Kazushige Goto and Robert A. Van De Geijn, ACM Transactions on Mathematical Software, vol 34, No. 3, article 12, May 2008. It is in figure 12, it give the value mr x nr, those value are the optimum to use for unroll_batch and unroll_kern. For x86_64 bits computer it is 4x4. Other architecture can have different value.(2x4 for x86, 8x8 for itanium,...)
If imshp, kshp, nkern and bsize are provided, we can generate more optimal code.
This makes a significant difference for the full mode with the unroll_patch version. The
configuration that is most often fastest on x86_64 computers is unroll_batch=4,
unroll_kern=4, unroll_patch=False, and it requires that all the optional shape
information be given. These numbers were tested empirically and are backed up by the
article: Anatomy of High-Performance Matrix Multiplication by Kazushige Goto and Robert
A. Van De Geijn, ACM Transactions on Mathematical Software, vol 34, No. 3, article 12,
May 2008. Figure 12 gives the values mr x nr, which are the optimal values to
use for unroll_batch and unroll_kern. For x86_64 computers it is 4x4. Other
architectures can have different values (2x4 for x86, 8x8 for Itanium, ...).
:type out_mode: string
:param out_mode: 'valid'(give an output smaller then the image, 'full'(give an output bigger then the image)
:param out_mode: 'valid' (gives an output smaller than the image) or 'full' (gives an output
bigger than the image)
optional parameter(if provided will be used to generate more optinal c code):
optional parameters: (will generate more optimal c code)
:type imshp: tuple of len 2 or 3: 2 for 2d image, 3 for a stack of 2d images.
:param imshp: (stacksize, nb image row, nb image col)
......@@ -113,13 +138,17 @@ class ConvOp(Op):
param to select the version of code used:
:type unroll_patch: bool
:param unroll_patch: use a version of c_code that unroll the patch loop that don't request all shape information to work, but if all shape information are present, will use it to hardcode the value in the code for faster code.
:param unroll_patch: use a version of c_code that unrolls the patch loop. It does not
require all shape information to work, but if all shape information is present, it will
be used to hardcode the values in the code, making it faster.
:type unroll_batch:int
:param unroll_batch: use a version of c_code that unroll the batch(by unroll_batch) and the nkern(by unroll_kern) loop. The size must by a multiple of bsize or nkern respectively.
:param unroll_batch: use a version of c_code that unrolls the batch loop (by unroll_batch) and
the nkern loop (by unroll_kern). unroll_batch must be a divisor of bsize and unroll_kern
a divisor of nkern.
:type unroll_kern:int
:param unroll_kern: use a version of c_code that unroll the batch(by unroll_batch) and the nkern(by unroll_kern) loop. The size must by a multiple of bsize or nkern respectively.
:param unroll_kern: use a version of c_code that unrolls the batch loop (by unroll_batch) and
the nkern loop (by unroll_kern). unroll_batch must be a divisor of bsize and unroll_kern
a divisor of nkern.
:type verbose: int
:param verbose: passed to GpuConv
:type version: int
......@@ -130,26 +159,34 @@ class ConvOp(Op):
:param kshp_logical_top_aligned: idem
"""
all_shape = imshp is not None and kshp is not None and nkern is not None and bsize is not None
all_shape = imshp is not None and kshp is not None and \
nkern is not None and bsize is not None
if (unroll_batch>0 or unroll_kern>0) and not all_shape:
raise Exception("In ConvOp, when using unroll_batch and unroll_kern, all shapes are needed")
if not all_shape and (imshp is not None or kshp is not None or nkern is not None or bsize is not None):
print "OPTIMISATION WARNING: passing only a few shape to ConvOp for faster code is useless. We use all of them or none."
if not all_shape and (imshp is not None or kshp is not None \
or nkern is not None or bsize is not None):
print "OPTIMISATION WARNING: passing only some of the shapes to ConvOp "\
"for faster code is useless. We use all of them or none."
if not all_shape:
unroll_patch = True
if imshp is not None:
imshp = tuple(imshp)
if len(imshp)==2:
imshp = (1,)+imshp
elif len(imshp)==3:
imshp = imshp
else:
raise Exception("bad len for imshp")
self.imshp = imshp
if kshp is not None:
kshp = tuple(kshp)
self.kshp = kshp
self.nkern = nkern
self.bsize=bsize
......@@ -157,10 +194,12 @@ class ConvOp(Op):
self.dy=dy
self.verbose=verbose
self.version=version
# a triple
self.imshp_logical = self.imshp
if imshp_logical is not None: self.imshp_logical = tuple(imshp_logical)
assert (self.imshp is None and self.imshp_logical is None) or (len(self.imshp) == len(self.imshp_logical))
assert (self.imshp is None and self.imshp_logical is None) or \
(len(self.imshp) == len(self.imshp_logical))
# a pair
self.kshp_logical = self.kshp
......@@ -172,6 +211,7 @@ class ConvOp(Op):
self.unroll_patch=unroll_patch
if self.unroll_batch>0 and self.bsize % self.unroll_batch!=0:
if self.bsize<=self.unroll_batch:
self.unroll_batch = self.bsize
else:
......@@ -181,9 +221,15 @@ class ConvOp(Op):
while self.bsize % new!=0:
new-=1
print "OPTIMISATION WARNING: in ConvOp.__init__() unroll_batch(%s) must be 0 or a divisor of bsize(%s). We revert it to %d. This won't change the result, but may make it slower."%(str(self.unroll_batch),str(self.bsize),new)
print "OPTIMISATION WARNING: in ConvOp.__init__() unroll_batch(%s) "\
"must be 0 or a divisor of bsize(%s). We revert it to %d. This "\
"won't change the result, but may make it slower."%\
(str(self.unroll_batch),str(self.bsize),new)
self.unroll_batch=new
if self.unroll_kern>0 and self.nkern % unroll_kern!=0:
if self.nkern<=self.unroll_kern:
self.unroll_kern = self.nkern
else:
......@@ -192,22 +238,29 @@ class ConvOp(Op):
assert(new>=1)
while self.nkern % new!=0:
new-=1
print "OPTIMISATION WARNING: in ConvOp.__init__() unroll_kern(%s) should be 0 or a divisor of nkern(%s)We revert it to %d. This won't change the result, but may make it slower."%(str(self.unroll_kern),str(self.nkern),new)
print "OPTIMISATION WARNING: in ConvOp.__init__() unroll_kern(%s) "\
"should be 0 or a divisor of nkern(%s). We revert it to %d. "\
"This won't change the result, but may make it slower."\
%(str(self.unroll_kern),str(self.nkern),new)
self.unroll_kern=new
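# The fallback applied to unroll_batch and unroll_kern above amounts to
# "use the largest divisor of the loop size that does not exceed the
# requested unroll factor". A standalone sketch (Python 3) is given below;
# the helper name `clamp_unroll` is hypothetical and parts of the original
# logic are elided in this diff, so treat it as an approximation.

```python
def clamp_unroll(unroll, size):
    """Return an unroll factor that is 0 or a divisor of `size`."""
    if unroll <= 0 or size % unroll == 0:
        return unroll          # already acceptable, nothing to do
    if size <= unroll:
        return size            # cannot unroll more than the loop length
    new = unroll
    # Walk down until a divisor is found; 1 always divides, so this ends.
    while size % new != 0:
        new -= 1
    return new


print(clamp_unroll(4, 10))
print(clamp_unroll(4, 3))
print(clamp_unroll(4, 8))
```

# Falling back to a smaller factor only affects speed, not the result,
# which is why the code above prints a warning instead of raising.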
if all_shape:
self.outshp = getFilterOutShp(self.imshp_logical, self.kshp_logical, (dx,dy), output_mode)
self.fulloutshp = getFilterOutShp(self.imshp_logical, self.kshp_logical, (1,1), output_mode)
else:
self.outshp = None
self.fulloutshp = None
self.out_mode = output_mode
if not self.out_mode in ["valid", "full"]:
raise Exception("Mode %s not implemented"%self.out_mode)
if all_shape and not (self.outshp > 0).all():
raise Exception(("Bad size for the output shape. Verify that [post-supersampling] input shape (%s)"
"and kern shape(%s) are ok. (hint: kerns must fit inside image in"
"'valid' mode)")%(self.imshp_logical,self.kshp_logical))
raise Exception(("Bad size for the output shape. Verify that [post-"\
"supersampling] input shape (%s) and kern shape(%s) are ok. "\
"(Hint: kerns must fit inside image in valid mode)")%
(self.imshp_logical,self.kshp_logical))
self._rehash()
if config.op.set_flops:
......@@ -244,11 +297,16 @@ class ConvOp(Op):
self.flops*=self.outshp[0]*self.outshp[1]#nb flops by output image
self.flops*=self.imshp[0]*self.nkern*self.bsize#for all outputs images#n_stack==self.imshp[0]
else: #full mode not implemented
self.flops=0
for out_row in range(self.outshp[0]):#loop over output row
for out_col in range(self.outshp[1]):#loop over output col
for row in range(self.kshp[0]):#loop over kern row
if row+out_row-self.kshp[0]+1<0 or row+out_row-self.kshp[0]+1>=self.imshp[1]: continue
if (row+out_row-self.kshp[0]+1<0 or
row+out_row-self.kshp[0]+1>=self.imshp[1]):
continue
col=0
max_col=self.kshp[1]
img_col=out_col-self.kshp[1]+1
......@@ -263,7 +321,8 @@ class ConvOp(Op):
self.flops*=self.imshp[0]*self.nkern*self.bsize#for all outputs images#n_stack==self.imshp[0]
assert self.flops==self.bsize * self.nkern * self.imshp[0] * self.kshp[0] * self.kshp[1] * self.imshp[1] * self.imshp[2] * 2
assert self.flops == self.bsize * self.nkern * self.imshp[0] * \
self.kshp[0] * self.kshp[1] * self.imshp[1] * self.imshp[2] * 2
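The assertion above states that, in full mode, the counted flops equal twice the product of the kernel and image sizes (times batch, kernel count and stack size). A brute-force count for one image/kernel pair illustrates why. This is a sketch in modern Python; the function name `full_mode_flops` is hypothetical, and it assumes one multiply and one add per overlapping kernel/image element.

```python
def full_mode_flops(imshp, kshp):
    """Count flops of a 'full' 2D convolution for one image and one kernel.

    imshp and kshp are (rows, cols) tuples. Hypothetical helper.
    """
    ih, iw = imshp
    kh, kw = kshp
    out_h, out_w = ih + kh - 1, iw + kw - 1  # output shape in full mode
    macs = 0
    for out_row in range(out_h):
        for out_col in range(out_w):
            for kr in range(kh):
                img_row = out_row + kr - kh + 1
                if img_row < 0 or img_row >= ih:
                    continue  # kernel row falls outside the image
                for kc in range(kw):
                    img_col = out_col + kc - kw + 1
                    if img_col < 0 or img_col >= iw:
                        continue  # kernel column falls outside the image
                    macs += 1
    return 2 * macs  # one multiply and one add per overlap


if __name__ == "__main__":
    print(full_mode_flops((5, 7), (3, 2)), 2 * 5 * 7 * 3 * 2)
```

Every (kernel element, image element) pair overlaps at exactly one output position, so the count collapses to `kh * kw * ih * iw`, which matches the closed form in the assertion.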
def make_node(self, inputs, kerns):
# TODO: find a way to make ConvOp work for N-D (after NIPS09)
......@@ -375,21 +434,25 @@ class ConvOp(Op):
def grad(self, (inputs, kerns), (gz,)):
"""
In development. Works for test cases in test_sp.py
A few known issues:
* doesn't work for rectangular images or filters
* inputs needs to be a 4D tensor. Couldn't get 3D to work
* will crash if filter the same size as input image
WARNING: a few known issues:
* doesn't work for rectangular images or filters
* inputs needs to be a 4D tensor. Couldn't get 3D to work
* will crash if filter is the same size as input image
"""
if self.imshp != self.imshp_logical or self.kshp != self.kshp_logical:
raise NotImplementedError('todo')
if self.dx!=1 or self.dy!=1:
raise Exception("ERROR: We disable ConvOp.grad now when dx!=1 or dy!=1 as we think their is a high probability of bug in it. We need to raise the error on the gradient to .1!")
raise Exception("ERROR: We disable ConvOp.grad now when dx!=1 or "\
"dy!=1 as we think there is a high probability of bug in it. "\
"We need to raise the error on the gradient to .1!")
all_shape = self.imshp is not None and self.kshp is not None and self.nkern is not None and self.bsize is not None
all_shape = self.imshp is not None and self.kshp is not None and \
self.nkern is not None and self.bsize is not None
if not all_shape and (self.dx!=1 or self.dy!=1):
raise Exception("ConvOp.grad when dx!=1 or dy!=1 we must have all the optional shape information")
raise Exception("ConvOp.grad when dx!=1 or dy!=1 we must have all "\
"the optional shape information")
grad_hack_necessary = False
if grad_hack_necessary:
......@@ -411,6 +474,7 @@ class ConvOp(Op):
kshp = None
un_p = self.unroll_patch
imshp_logical = None
if self.out_mode == 'valid':
(img, filters) = (newin, newgz)
kshp_logical = self.fulloutshp
......@@ -445,13 +509,17 @@ class ConvOp(Op):
un_b = bsize
else:
un_b = 1
print "OPTIMISATION WARNING: in ConvOp.grad() we can't determine a good unroll value for the batch. Maybe you can optimize this!", bsize, un_b, self.unroll_batch, self.unroll_kern
print "OPTIMISATION WARNING: in ConvOp.grad() we can't determine "\
"a good unroll value for the batch. Maybe you can optimize this!",\
bsize, un_b, self.unroll_batch, self.unroll_kern
if un_k!=0 and nkern%un_k!=0:
if nkern<un_k:
un_k = nkern
else:
un_k = 1
print "OPTIMISATION WARNING: in ConvOp.grad() we can't determine a good unroll value for the kernel. Maybe you can optimize this!"
print "OPTIMISATION WARNING: in ConvOp.grad() we can't determine "\
"a good unroll value for the kernel. Maybe you can optimize this!"
dw = ConvOp(imshp, kshp, nkern, bsize, 1,1, output_mode='valid',
unroll_batch=un_b, unroll_kern=un_k, unroll_patch=un_p,
......@@ -460,9 +528,12 @@ class ConvOp(Op):
kshp_logical_top_aligned=kshp_logical_top_aligned,
version=self.version,
verbose=self.verbose)
if hasattr(self,'flops'):
dw.set_flops()
dw = dw(img,filters)
if all_shape:
assert (dw.owner.op.outshp==self.kshp).all()
if self.out_mode == 'valid':
......@@ -472,18 +543,21 @@ class ConvOp(Op):
####### Determine gradient on inputs ########
mode = 'valid'
if not self.out_mode == 'full': mode = 'full'
if not self.out_mode == 'full':
mode = 'full'
filters = kerns.dimshuffle((1,0,2,3))
filters = filters[:,:,::-1,::-1]
nkern = None
imshp = None
imshp_logical = None
kshp = None
if all_shape:
nkern = self.imshp[0]
imshp = (self.nkern, self.outshp[0], self.outshp[1])
imshp_logical=(self.nkern, self.fulloutshp[0], self.fulloutshp[1])
#print 'din', imshp, self.kshp, nkern
din = ConvOp(imshp, self.kshp, nkern, self.bsize,
1,1, output_mode=mode,
unroll_batch=un_b, unroll_kern=un_k, unroll_patch=un_p,
......@@ -491,10 +565,14 @@ class ConvOp(Op):
kshp_logical=None,
version=-1,#when we change the mode, we don't forward the version.
verbose=self.verbose)
if hasattr(self,'flops'):
din.set_flops()
din = din(gz,filters)
assert (din.owner.op.outshp is None and self.imshp is None) or (din.owner.op.outshp==self.imshp[1:]).all()
assert (din.owner.op.outshp is None and self.imshp is None) or \
(din.owner.op.outshp==self.imshp[1:]).all()
return [din, dw]
def c_headers(self):
......@@ -512,8 +590,10 @@ class ConvOp(Op):
#define MOD %
using namespace std;
""" + tensor.blas.blas_header_text()
def c_libraries(self):
return tensor.blas.ldflags()
def c_code(self, node, name, (img2d, filtersflipped), (z, ), sub):
if node.inputs[0].type.dtype != node.inputs[1].type.dtype:
raise NotImplementedError()
......@@ -521,7 +601,8 @@ using namespace std;
d=locals()
d.update(sub)
all_shape = self.imshp is not None and self.kshp is not None and self.nkern is not None and self.bsize is not None
all_shape = self.imshp is not None and self.kshp is not None and \
self.nkern is not None and self.bsize is not None
d["self_out_mode"]=self.out_mode
d["self_dx"]=self.dx
......@@ -587,7 +668,7 @@ using namespace std;
if self.unroll_patch:
if self.verbose:
print "return unroll patch version",self.dx,self.dy
print "return unroll patch version. all_shape=", all_shape
return _conv_op_code_unroll_patch%d
if self.unroll_batch>0 or self.unroll_kern>0:
if self.unroll_batch<=0: self.unroll_batch=1
......@@ -607,44 +688,6 @@ using namespace std;
print "return no gemm version"
return _conv_op_code_a % d
def convolve2(kerns, kshp, nkern, images, imshp, bsize, step=(1,1),
bias=None, mode='valid', **d):
"""
param kerns: kernel tensor
param kshp: tuple(kern row, kern wid)
param nkern: int the number of kernel
param images:image tensor
param imshp: tuple([stack size,] image row, image wid)
param bsize: batch size
param step: subsampling to apply to the output tuple(row, wid)
param bias: if True, will add a bias
param mode: 'valid' or 'full'
return: tuple(theano graph with the output of ConvOp flattened to 2 dimensions, ?)
"""
#TODO: remove the bias argument from this function because convolution has nothing to do with a bias
# if imshp, is a tuple, images contains one input dimension
if len(imshp)!=3:
nvis_dim = 1
else: nvis_dim = imshp[0]
# all these reshapes should happen in place
imrshp = tensor.as_tensor([bsize] + list(imshp))
imtensor = tensor.reshape(images, imrshp)
kernrshp = tensor.as_tensor([nkern, nvis_dim] + list(kshp))
kerntensor = tensor.reshape(kerns, kernrshp)
convop = ConvOp(imshp, kshp, nkern, bsize, step[0], step[1],
output_mode=mode, **d)
convout = convop(imtensor, kerntensor)
if bias:
biastensor = tensor.DimShuffle((False,), ('x',0,'x','x'), inplace=True)(bias)
convout = convout + biastensor
rval = tensor.flatten(convout, 2)
return rval, N.hstack((nkern, convop.outshp))
_conv_op_code_a = """
const int mode=%(mode)s;
......
......@@ -5,14 +5,15 @@ from theano import config
import logging, copy
_logger_name = 'theano.sandbox.cuda'
_logger = logging.getLogger(_logger_name)
_logger.setLevel(logging.INFO)
_logger.addHandler(logging.StreamHandler())
_logger.setLevel(logging.WARNING)
def error(*msg):
_logger.warning('ERROR (%s): %s'% ( _logger_name, ' '.join(str(m) for m in msg)))
def warning(*msg):
_logger.warning(_logger_name+'WARNING: '+' '.join(str(m) for m in msg))
_logger.warning('WARNING (%s): %s'% ( _logger_name, ' '.join(str(m) for m in msg)))
def info(*msg):
_logger.info(_logger_name+'INFO: '+' '.join(str(m) for m in msg))
_logger.warning('INFO (%s): %s'% ( _logger_name, ' '.join(str(m) for m in msg)))
def debug(*msg):
_logger.debug(_logger_name+'DEBUG: '+' '.join(str(m) for m in msg))
_logger.warning('DEBUG (%s): %s'% ( _logger_name, ' '.join(str(m) for m in msg)))
# Compile cuda_ndarray.cu
......@@ -63,23 +64,32 @@ if not compile_cuda_ndarray:
except ImportError:
compile_cuda_ndarray = True
if compile_cuda_ndarray:
import nvcc_compiler
if not nvcc_compiler.is_nvcc_available():
set_cuda_disabled()
try:
if compile_cuda_ndarray:
import nvcc_compiler
if not nvcc_compiler.is_nvcc_available():
set_cuda_disabled()
if enable_cuda:
code = open(os.path.join(cuda_path, "cuda_ndarray.cu")).read()
if enable_cuda:
code = open(os.path.join(cuda_path, "cuda_ndarray.cu")).read()
if not os.path.exists(cuda_ndarray_loc):
os.makedirs(cuda_ndarray_loc)
if not os.path.exists(cuda_ndarray_loc):
os.makedirs(cuda_ndarray_loc)
nvcc_compiler.nvcc_module_compile_str('cuda_ndarray', code, location = cuda_ndarray_loc,
include_dirs=[cuda_path], libs=['cublas'])
nvcc_compiler.nvcc_module_compile_str('cuda_ndarray', code, location = cuda_ndarray_loc,
include_dirs=[cuda_path], libs=['cublas'])
from cuda_ndarray.cuda_ndarray import *
from cuda_ndarray.cuda_ndarray import *
except Exception, e:
error( "Failed to compile cuda_ndarray.cu: %s" % str(e))
set_cuda_disabled()
if enable_cuda:
#check if there is an old cuda_ndarray that was loaded instead of the one we compiled!
import cuda_ndarray.cuda_ndarray
if os.path.join(config.compiledir,'cuda_ndarray','cuda_ndarray.so')!=cuda_ndarray.cuda_ndarray.__file__:
_logger.warning("WARNING: cuda_ndarray was loaded from %s. This is not expected as theano should compile it automatically for you. Do you have a directory called cuda_ndarray in your LD_LIBRARY_PATH environment variable? If so, please remove it as it is outdated!", cuda_ndarray.cuda_ndarray.__file__)
from theano.sandbox.cuda.type import CudaNdarrayType
from theano.sandbox.cuda.var import (CudaNdarrayVariable,
CudaNdarrayConstant,
......@@ -103,7 +113,7 @@ def use(device=config.device):
raise ValueError("Invalid device identifier", device)
if use.device_number is None:
# No successful call to use() has been made yet
if device=="-1" or device=="CPU":
if device<0:
return
if device in [None,""]:
device=0
......@@ -134,6 +144,5 @@ def handle_shared_float32(tf):
else:
raise NotImplementedError('removing our handler')
if enable_cuda and config.device.startswith('gpu'):
use()
......@@ -6,6 +6,13 @@ from theano import config
_logger=logging.getLogger("theano.sandbox.cuda.nvcc_compiler")
_logger.setLevel(logging.WARN)
from theano.configparser import config, AddConfigVar, StrParam
AddConfigVar('nvcc.compiler_bindir',
"if defined, nvcc compiler driver will seek g++ and gcc in this directory",
StrParam(""))
def error(*args):
#sys.stderr.write('ERROR:'+ ' '.join(str(a) for a in args)+'\n')
_logger.error("ERROR: "+' '.join(str(a) for a in args))
......@@ -68,6 +75,8 @@ def nvcc_module_compile_str(module_name, src_code, location=None, include_dirs=[
debug('Generating shared lib', lib_filename)
# TODO: Why do these args cause failure on gtx285 that has 1.3 compute capability? '--gpu-architecture=compute_13', '--gpu-code=compute_13',
cmd = ['nvcc', '-shared', '-g'] + [pa for pa in preargs if pa.startswith('-O')]
if config.nvcc.compiler_bindir:
cmd.extend(['--compiler-bindir', config.nvcc.compiler_bindir])
cmd.extend(['-Xcompiler', ','.join(pa for pa in preargs if not pa.startswith('-O'))])
cmd.extend('-I%s'%idir for idir in include_dirs)
cmd.extend(['-o',lib_filename])
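The command assembly in this hunk can be read as a pure function from options to an argument list. The sketch below restates it in modern Python. The function name `build_nvcc_cmd` and its parameters are hypothetical, and only the part of the command visible in this hunk is reproduced.

```python
def build_nvcc_cmd(preargs, include_dirs, lib_filename, compiler_bindir=""):
    """Assemble an nvcc argument list (hypothetical helper, partial command)."""
    # Optimisation flags (-O...) go directly to nvcc.
    cmd = ["nvcc", "-shared", "-g"] + [pa for pa in preargs if pa.startswith("-O")]
    # Optional directory in which nvcc should look for g++ and gcc.
    if compiler_bindir:
        cmd.extend(["--compiler-bindir", compiler_bindir])
    # All other flags are forwarded to the host compiler, comma-separated.
    cmd.extend(["-Xcompiler", ",".join(pa for pa in preargs if not pa.startswith("-O"))])
    cmd.extend("-I%s" % idir for idir in include_dirs)
    cmd.extend(["-o", lib_filename])
    return cmd


if __name__ == "__main__":
    print(build_nvcc_cmd(["-O3", "-fPIC"], ["/inc"], "out.so", "/usr/bin"))
```

Leaving `compiler_bindir` empty keeps the command identical to the one built before the new configuration variable was added.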
......
......@@ -140,20 +140,20 @@ def test_elemwise1():
b = tensor.fmatrix()
#let debugmode catch any mistakes
print >> sys.stderr, "STARTING FUNCTION 1"
print >> sys.stdout, "STARTING FUNCTION 1"
f = pfunc([b], [], updates=[(a, b**a)], mode=mode_with_gpu)
for i, node in enumerate(f.maker.env.toposort()):
print i, node
f(numpy.random.rand(*shape)+0.3)
print >> sys.stderr, "STARTING FUNCTION 2"
print >> sys.stdout, "STARTING FUNCTION 2"
#let debugmode catch any mistakes
f = pfunc([b], [], updates=[(a, tensor.exp(b**a))], mode=mode_with_gpu)
for i, node in enumerate(f.maker.env.toposort()):
print i, node
f(numpy.random.rand(*shape)+0.3)
print >> sys.stderr, "STARTING FUNCTION 3"
print >> sys.stdout, "STARTING FUNCTION 3"
#let debugmode catch any mistakes
f = pfunc([b], [], updates=[(a, a+b * tensor.exp(b**a))], mode=mode_with_gpu)
f(numpy.random.rand(*shape)+0.3)
......@@ -169,11 +169,11 @@ def test_elemwise2():
f = pfunc([b], [], updates=[(a, (a+b).dimshuffle(pattern))], mode=mode_with_gpu)
has_elemwise = False
for i, node in enumerate(f.maker.env.toposort()):
print >> sys.stderr, i, node
print >> sys.stdout, i, node
has_elemwise = has_elemwise or isinstance(node.op, tensor.Elemwise)
assert not has_elemwise
#let debugmode catch errors
print >> sys.stderr, 'pattern', pattern
print >> sys.stdout, 'pattern', pattern
f(rng.rand(*shape)*.3)
shape = (3,4,5,6)
......@@ -204,7 +204,7 @@ def test_elemwise3():
b**a).dimshuffle([2,0,3,1]))], mode=mode_with_gpu)
has_elemwise = False
for i, node in enumerate(f.maker.env.toposort()):
print >> sys.stderr, i, node
print >> sys.stdout, i, node
has_elemwise = has_elemwise or isinstance(node.op, tensor.Elemwise)
assert not has_elemwise
#let debugmode catch errors
......@@ -220,7 +220,7 @@ def test_elemwise4():
f = pfunc([b,c], [], updates=[(a, (a+b.dimshuffle('x', 0)*c.dimshuffle(0, 'x')))], mode=mode_with_gpu)
has_elemwise = False
for i, node in enumerate(f.maker.env.toposort()):
print >> sys.stderr, i, node
print >> sys.stdout, i, node
has_elemwise = has_elemwise or isinstance(node.op, tensor.Elemwise)
assert not has_elemwise
#let debugmode catch errors
......
......@@ -360,7 +360,7 @@ def test_subsample():
def test_logical_shapes():
# implement when
print >> sys.stderr, "INFO: test_logical_shapes not implemented (i.e. imshp_logical, kshp_logical, kshp_logical_top_aligned)"
print >> sys.stderr, "WARNING TODO: test_logical_shapes not implemented (i.e. imshp_logical, kshp_logical, kshp_logical_top_aligned)"
def _test_dummy():
......
......@@ -8,7 +8,7 @@ if cuda_ndarray.enable_cuda == False:
import numpy
def test_host_to_device():
print >>sys.stderr, 'starting test_host_to_dev'
print >>sys.stdout, 'starting test_host_to_dev'
for shape in ((), (3,), (2,3), (3,4,5,6)):
a = theano._asarray(numpy.random.rand(*shape), dtype='float32')
b = cuda_ndarray.CudaNdarray(a)
......@@ -53,7 +53,7 @@ def test_add():
def test_exp():
print >>sys.stderr, 'starting test_exp'
print >>sys.stdout, 'starting test_exp'
for shape in ((), (3,), (2,3), (1,10000000),(10,1000000), (100,100000),(1000,10000),(10000,1000)):
a0 = theano._asarray(numpy.random.rand(*shape), dtype='float32')
a1 = a0.copy()
......@@ -74,25 +74,25 @@ def test_exp():
def test_copy():
print >>sys.stderr, 'starting test_copy'
print >>sys.stdout, 'starting test_copy'
shape = (5,)
a = theano._asarray(numpy.random.rand(*shape), dtype='float32')
print >>sys.stderr, '.. creating device object'
print >>sys.stdout, '.. creating device object'
b = cuda_ndarray.CudaNdarray(a)
print >>sys.stderr, '.. copy'
print >>sys.stdout, '.. copy'
c = copy.copy(b)
print >>sys.stderr, '.. deepcopy'
print >>sys.stdout, '.. deepcopy'
d = copy.deepcopy(b)
print >>sys.stderr, '.. comparisons'
print >>sys.stdout, '.. comparisons'
assert numpy.allclose(a, numpy.asarray(b))
assert numpy.allclose(a, numpy.asarray(c))
assert numpy.allclose(a, numpy.asarray(d))
def test_dot():
print >>sys.stderr, 'starting test_dot'
print >>sys.stdout, 'starting test_dot'
a0 = theano._asarray(numpy.random.rand(4, 7), dtype='float32')
a1 = theano._asarray(numpy.random.rand(7, 6), dtype='float32')
......@@ -101,7 +101,7 @@ def test_dot():
assert numpy.allclose(numpy.dot(a0, a1), cuda_ndarray.dot(b0, b1))
print >> sys.stderr, 'WARNING test_dot: not testing all 8 transpose cases of dot'
print >> sys.stderr, 'WARNING TODO test_dot: not testing all 8 transpose cases of dot'
def test_sum():
shape = (2,3)
......@@ -147,7 +147,7 @@ def test_reshape():
]
def subtest(shape_1, shape_2):
#print >> sys.stderr, "INFO: shapes", shape_1, shape_2
#print >> sys.stdout, "INFO: shapes", shape_1, shape_2
a = theano._asarray(numpy.random.rand(*shape_1), dtype='float32')
b = cuda_ndarray.CudaNdarray(a)
......
......@@ -147,7 +147,7 @@ class DownsampleFactorMaxGrad(Op):
def c_code_cache_version(self):
return ()
def max_pool2D(input, ds, ignore_border=False):
"""
Takes as input a N-D tensor, where N >= 2. It downscales the input image by
......@@ -166,7 +166,7 @@ def max_pool2D(input, ds, ignore_border=False):
# extract image dimensions
img_shape = input.shape[-2:]
# count the number of "leading" dimensions, store as dmatrix
batch_size = tensor.prod(input.shape[:-2])
batch_size = tensor.shape_padright(batch_size,1)
......
......@@ -7,7 +7,7 @@ from theano.tests import unittest_tools as utt
from theano import function, Mode
import theano.tensor as T
from conv import ConvOp, convolve2, getFilterOutShp
from conv import ConvOp, getFilterOutShp
def flip(kern, kshp):
"flip the kernel, as scipy's convolve2d flips it."
......@@ -41,7 +41,7 @@ def flip(kern, kshp):
global_rng = N.random.RandomState(3423489)
dmatrix4=T.TensorType('float64', (False, False, False, False))
def exec_multilayer_conv_nnet(conv_mode, ss, bsize, imshp, kshps, nkerns, unroll_batch=0, unroll_kern=0, img=T.dmatrix(), validate=True, conv_op_py=False, do_convolve2=False, do_print=True, repeat=1, unroll_patch=0):
def exec_multilayer_conv_nnet(conv_mode, ss, bsize, imshp, kshps, nkerns, unroll_batch=0, unroll_kern=0, img=T.dmatrix(), validate=True, conv_op_py=False, do_print=True, repeat=1, unroll_patch=False, unroll_patch_size=False, verbose=0):
# build actual input images
imgval = global_rng.rand(bsize, imshp[0], imshp[1], imshp[2])
......@@ -92,41 +92,13 @@ def exec_multilayer_conv_nnet(conv_mode, ss, bsize, imshp, kshps, nkerns, unroll
imgval[b,i,...], w_flip[n,i,...],1,val, bval, 0)[0::ss[0],0::ss[1]]
ntot += time.time() - time1
if do_convolve2:
####### test with new sp.convolve2 function ######
time1 = time.time()
hid, outshp2 = convolve2(kern, kshp, nkern, img, imshp,
bsize, (ss[0],ss[1]), mode=conv_mode)
propup = function([kern, img], hid)
propup1 = function([kern, img], hid,mode=Mode(linker="py"))
hidval = propup(w_flip.reshape(nkern,-1), imgval.reshape(bsize,-1))
hidval = hidval.reshape(bsize,nkern,outshp2[-2],outshp2[-1])
# hidval = hidval[:,:,::ss[0],::ss[1]]
hidval = hidval.reshape(bsize, -1)
for i in range(repeat):
hidval1 = propup1(w_flip.reshape(nkern,-1), imgval.reshape(bsize,-1))
hidval1 = hidval1.reshape(bsize,nkern,outshp2[-2],outshp2[-1])
# hidval1 = hidval1[:,:,::ss[0],::ss[1]]
hidval1 = hidval1.reshape(bsize, -1)
assert (N.abs(hidval-hidval1)<1e-5).all()
temp = N.abs(outval.reshape(bsize,-1) - hidval)
if validate:
assert (temp < 1e-5).all()
else:
hid = img #we don't need it, but it make the flow easier flow
hidval=outval.copy()#to keep the same memory
hidval1=outval.copy()
# ConvOp
if unroll_patch:
if unroll_patch and not unroll_patch_size:
conv_op = ConvOp(dx=ss[0],dy=ss[1], output_mode=conv_mode,
unroll_patch=unroll_patch)(inputs4, kerns4)
unroll_patch=unroll_patch, verbose=verbose)(inputs4, kerns4)
else:
conv_op = ConvOp(imshp, kshp, nkern, bsize, ss[0],ss[1], conv_mode,
unroll_batch=unroll_batch, unroll_kern=unroll_kern, unroll_patch=unroll_patch)(inputs4, kerns4)
unroll_batch=unroll_batch, unroll_kern=unroll_kern, unroll_patch=unroll_patch, verbose=verbose)(inputs4, kerns4)
l1shp=N.hstack((nkern,
getFilterOutShp(imshp, kshp, ss, conv_mode)))
propup2 = function([inputs4, kerns4], conv_op)
......@@ -155,7 +127,7 @@ def exec_multilayer_conv_nnet(conv_mode, ss, bsize, imshp, kshps, nkerns, unroll
temp = N.abs(outval - hidval3)
assert (temp < 1e-5).all()
img, imshp = hid, tuple(outshp)
imshp = tuple(outshp)
imgval = outval.reshape(bsize,outshp[0],outshp[1],outshp[2])
return tctot, tpytot, ntot
......@@ -246,23 +218,9 @@ class TestConvOp(unittest.TestCase):
# print 'img2d', img2d
img1d = img2d.reshape(bsize,-1)
# create filters (need to be flipped to use convolve2d)
# create filters
filtersflipped = flip(filters.reshape((nkern,)+kshp), kshp)
# compute with new convolve2 (no timing info)
output4, outshp4 = convolve2(kerns, kshp, nkern, input,\
imshp, bsize, (ss[0],ss[1]), bias=bias, mode=conv_mode)
# print 'output4', output4
ttime1 = time.time()
f = function([kerns, bias, input], output4)
out4 = f(filtersflipped.reshape(nkern,-1), biasvals, img1d)
# print 'out4', out4, img1d, filtersflipped
tconv2 += [time.time() - ttime1]
out4 = out4.reshape(bsize, nkern, outshp4[1], outshp4[2])
out4 = out4#[:,:,0::ss[0],0::ss[1]]
out4 = out4.reshape(bsize, -1)
# compute with ConvOp
dmatrix3=T.TensorType('float64', (False, False, False))
inputs4=dmatrix4()
......@@ -307,9 +265,6 @@ class TestConvOp(unittest.TestCase):
# compare benchmark with ConvOp
temp = bench1.flatten() - out2.flatten()
assert (temp < 1e-5).all()
# compare benchmark with convolve2
temp = bench1.flatten() - out4.flatten()
assert (temp < 1e-5).all()
print '**** Convolution Profiling Results ****'
print 'Scipy convolve2d processing time: %.3fs'%sum(tscipy),tscipy
......@@ -319,55 +274,17 @@ class TestConvOp(unittest.TestCase):
d=N.asarray(tscipy)/tconvop
print 'speed up ConvOp vs convolve2d: %.3f'%d.mean(),d
def test_multilayer_conv(self):
print '\n\n*************************************************'
print ' TEST MULTILAYER CONVOLUTION'
print '*************************************************'
# fixed parameters
# test multiple configuration at the same time
bsizes = [6,6] # batch size
imshp_starts = [(1,13,14),(1,4,5)]
kshpss = ([[5,6],[7,4]],[[2,2],[2,2]])
nkernss = [[20,40],[2,2]] # per output pixel
ssizess = [[(1,1),(1,2)],[(1,1),(2,2)]]
convmodes = ['valid','full']
do_convolve2=True
unroll = [(0,0,True),(0,0,False),(1,1,False),(2,2,False),(3,2,False)]#(batch,kern,patch)
do_speed_test = False
# TODO: this version show a bug that was fixed
# the test is included in the upper test.
# imshp_start = (1,4,4)
# kshps = ([2,2],[2,2])#,[7,4])
# nkerns = [2,2] # per output pixel
# ssizes = [(1,1),(2,2)]#2,2)]
# bsizes = [1,1] # batch size
# imshp_starts = [(1,10,10),(1,5,6)]
# kshpss = ([[2,3],[3,2]],[[2,2],[2,2]])
# nkernss = [[1,1],[1,1]] # per output pixel
N.set_printoptions(threshold=N.nan)
# symbolic stuff
kerns = [T.matrix(),T.dmatrix()]
img = T.dmatrix()
rng = N.random.RandomState(3423489)
tctot, tpytot, ntot = [], [], []
for i in range(len(kshpss)):
assert len(kshpss[i])==len(nkernss[i])==len(kerns)
if do_speed_test:
def speed_multilayer_conv(self):
# calculate the speed up of different combinations of unroll
# set the parameters to the values you want to try.
validate=False# we don't validate the result to have it much faster!
verbose=1
unroll_batch = [1,2,4,5,10,20]
unroll_kern = [1,2,4,5,10,20]
unroll_batch = [1,4,5]
unroll_kern = [1,4,5]
unroll_patch = [True, False]
bsize = 20 # batch size
imshp_start = (1,48,48)#un square shape to test more corner case.
......@@ -381,15 +298,16 @@ class TestConvOp(unittest.TestCase):
assert len(kshps)==len(nkerns)==len(kerns)
timing = N.zeros((len(unroll_batch),len(unroll_kern),3))
timing = N.zeros((len(unroll_batch),len(unroll_kern),3,len(convmodes)*len(ssizes)))
t_b_k=[]
#calculate the timing with unrolling
print 'time unroll batch kern'
t_=[[ 7.60572791, 3.95069814, 3.74271464], [ 4.05631089, 2.90384555, 2.93613672], [ 3.90551591, 2.92595196, 3.00102282]]
best=[]
worst=[]
best=[0.52690219879150391, 2.4266397953033447]
worst=[0.92042708396911621, 6.8822150230407715]
best=[]
worst=[]
t_=[]
for unroll_b, n_b in zip(unroll_batch,range(len(unroll_batch))):
for unroll_k, n_k in zip(unroll_kern,range(len(unroll_kern))):
......@@ -398,30 +316,31 @@ class TestConvOp(unittest.TestCase):
tctot, tpytot, ntot=[],[],[]
for conv_mode, n_mode in zip(convmodes,range(len(convmodes))):
for ss, n_ss in zip(ssizes,range(len(ssizes))):
tctot_, tpytot_, ntot_ = exec_multilayer_conv_nnet(conv_mode, ss, bsize, imshp_start, kshps, nkerns, unroll_batch=unroll_b, unroll_kern=unroll_k, validate=validate)
tctot_, tpytot_, ntot_ = exec_multilayer_conv_nnet(conv_mode, ss, bsize, imshp_start, kshps, nkerns, unroll_batch=unroll_b, unroll_kern=unroll_k, validate=validate, verbose=verbose,do_print=False)
tctot+=[tctot_]
tpytot+=[tpytot_]
ntot+=[ntot_]
if unroll_b==4 and unroll_k==4:
print "unroll 4/4",tctot
#print "unroll 4/4",tctot
best=tctot
if unroll_b==1 and unroll_k==1:
print "unroll 1/1",tctot
#print "unroll 1/1",tctot
worst=tctot
timing[n_b,n_k]=[sum(tctot), sum(tpytot), sum(ntot)]
timing[n_b,n_k]=[tctot, tpytot, ntot]#[sum(tctot), sum(tpytot), sum(ntot)]
if not t_:
t=timing[:,:,0]#We select only the c timing.
t=timing[:,:,0,:]#We select only the c timing.
else:
t=t_
t=N.asarray(t)
#calculate the old timing
print 'time old version'
tctot_=[0.52555489540100098, 6.6634182929992676]
# tctot_=[]
tctot,tpytot,ntot=[],[],[]
tctot_=[]
if not tctot_:
for conv_mode, n_mode in zip(convmodes,range(len(convmodes))):
for ss, n_ss in zip(ssizes,range(len(ssizes))):
tctot_, tpytot_, ntot_ = exec_multilayer_conv_nnet(conv_mode, ss, bsize, imshp_start, kshps, nkerns, unroll_batch=0, unroll_kern=0, validate=validate)
tctot_, tpytot_, ntot_ = exec_multilayer_conv_nnet(conv_mode, ss, bsize, imshp_start, kshps, nkerns, unroll_batch=0, unroll_kern=0, validate=validate, verbose=verbose,do_print=False)
tctot+=[tctot_]
tpytot+=[tpytot_]
ntot+=[ntot_]
......@@ -432,29 +351,73 @@ class TestConvOp(unittest.TestCase):
print "timing for unrolled version"
print t_b_k
print t
t_detail=t
t = t.sum(axis=2)
print "max %.3fs"%t.max(), "max param(batch unloop size/kernel unloop size)", t_b_k[t.argmax()]
print "min %.3fs"%t.min(), "min param(batch unloop size/kernel unloop size)", t_b_k[t.argmin()]
print "speedup vs (1/1)%.3fx, vs old %.3fx"% (t.max()/t.min(),sum(tctot)/t.min())
print worst/best,tctot/best
#calculate the timing of unroll_patch
print 'time unroll_patch'
tctot_patch = []
tctot_patch_size = []
for conv_mode, n_mode in zip(convmodes,range(len(convmodes))):
for ss, n_ss in zip(ssizes,range(len(ssizes))):
tctot_, tpytot_, ntot_ = exec_multilayer_conv_nnet(conv_mode, ss, bsize, imshp_start, kshps, nkerns, unroll_batch=0, unroll_kern=0, validate=validate,unroll_patch=2)
tctot_patch += [tctot_]
tctot_, tpytot_, ntot_ = exec_multilayer_conv_nnet(conv_mode, ss, bsize, imshp_start, kshps, nkerns, unroll_batch=0, unroll_kern=0, validate=validate,unroll_patch=True,verbose=verbose,do_print=False)
tctot_patch += [tctot_]
tctot_, tpytot_, ntot_ = exec_multilayer_conv_nnet(conv_mode, ss, bsize, imshp_start, kshps, nkerns, unroll_batch=0, unroll_kern=0, validate=validate,unroll_patch=True,verbose=verbose,do_print=False,unroll_patch_size=True)
tctot_patch_size += [tctot_]
t_patch=sum(tctot_patch)
print "unroll_patch time", tctot_patch
print "unroll_patch without shape time", tctot_patch
print "speedup vs (1/1)%.3fx, vs old %.3fx"% (t.max()/t_patch,sum(tctot)/t_patch)
print best/tctot_patch, worst/tctot_patch
t_patch_size=sum(tctot_patch_size)
print "unroll_patch with shape time", tctot_patch_size
print "speedup vs (1/1)%.3fx, vs old %.3fx"% (t.max()/t_patch_size,sum(tctot)/t_patch_size)
print best/tctot_patch_size, worst/tctot_patch_size
print best
print worst
print tctot
print tctot_patch
return
def test_multilayer_conv(self):
print '\n\n*************************************************'
print ' TEST MULTILAYER CONVOLUTION'
print '*************************************************'
# fixed parameters
# test multiple configuration at the same time
bsizes = [6,6] # batch size
imshp_starts = [(1,13,14),(1,4,5)]
kshpss = ([[5,6],[7,4]],[[2,2],[2,2]])
nkernss = [[20,40],[2,2]] # per output pixel
ssizess = [[(1,1),(1,2)],[(1,1),(2,2)]]
convmodes = ['valid','full']
do_convolve2=True
unroll = [(0,0,True),(0,0,False),(1,1,False),(2,2,False),(3,2,False)]#(batch,kern,patch)
# TODO: this version show a bug that was fixed
# the test is included in the upper test.
# imshp_start = (1,4,4)
# kshps = ([2,2],[2,2])#,[7,4])
# nkerns = [2,2] # per output pixel
# ssizes = [(1,1),(2,2)]#2,2)]
# bsizes = [1,1] # batch size
# imshp_starts = [(1,10,10),(1,5,6)]
# kshpss = ([[2,3],[3,2]],[[2,2],[2,2]])
# nkernss = [[1,1],[1,1]] # per output pixel
N.set_printoptions(threshold=N.nan)
# symbolic stuff
kerns = [T.matrix(),T.dmatrix()]
img = T.dmatrix()
rng = N.random.RandomState(3423489)
tctot, tpytot, ntot = [], [], []
for i in range(len(kshpss)):
assert len(kshpss[i])==len(nkernss[i])==len(kerns)
for i in range(len(kshpss)):
for conv_mode, n_mode in zip(convmodes,range(len(convmodes))):
for ss, n_ss in zip(ssizess[i],range(len(ssizess[i]))):
......
import unittest, sys, time
import numpy as N
import theano.tensor as T
import numpy
import theano.tensor as tensor
from theano.tests import unittest_tools as utt
from theano.sandbox.downsample import DownsampleFactorMax
from theano.sandbox.downsample import DownsampleFactorMax, max_pool2D
from theano import function, Mode
def max_pool(images=None, imshp=None, maxpoolshp=None, ignore_border=True):
"""Implements a max pooling layer
Uses the same API as sp.max_pool but uses the Downsample op instead.
class TestDownsampleFactorMax(unittest.TestCase):
def setUp(self):
utt.seed_rng()
Takes as input a 2D tensor of shape batch_size x img_size and performs max pooling.
Max pooling downsamples by taking the max value in a given area, here defined by
maxpoolshp. Outputs a 2D tensor of shape batch_size x output_size.
@staticmethod
def numpy_max_pool2D(input, ds, ignore_border=False):
'''Helper function, implementing max_pool2D in pure numpy'''
if len(input.shape) < 2:
raise NotImplementedError('input should have at least 2 dim, shape is %s'\
% str(input.shape))
Parameters are keyword arguments in order to use func_to_mod.
xi=0
yi=0
if not ignore_border:
if input.shape[-2] % ds[0]:
xi += 1
if input.shape[-1] % ds[1]:
yi += 1
@param images: 2D tensor containing images on which to apply convolution.
Assumed to be of shape batch_size x img_size
@param imgshp: tuple containing image dimensions
@param maxpoolshp: tuple containing shape of area to max pool over
@output out1: symbolic result (2D tensor)
@output out2: logical shape of the output
out_shp = list(input.shape[:-2])
out_shp.append(input.shape[-2]/ds[0]+xi)
out_shp.append(input.shape[-1]/ds[1]+yi)
"""
if len(imshp) == 2:
imshp = (1,) + imshp
elif len(imshp)!=3:
raise NotImplementedError("!")
# all these reshapes should happen in place
imrshp = T.stack(images.shape[0],
*[T.as_tensor(x) for x in imshp])
imtensor = T.reshape(images, imrshp)
output_val = numpy.zeros(out_shp)
maxpop = DownsampleFactorMax(maxpoolshp, ignore_border)
rval = maxpop(imtensor)
for k in numpy.ndindex(input.shape[:-2]):
for i in range(output_val.shape[-2]):
ii = i*ds[0]
for j in range(output_val.shape[-1]):
jj = j*ds[1]
patch = input[k][ii:ii+ds[0],jj:jj+ds[1]]
output_val[k][i,j] = numpy.max(patch)
return output_val
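For readers on a current Python, the reference helper above can be restated as a standalone function. This is a sketch under stated assumptions: the name `numpy_max_pool_2d` is hypothetical, and `//` replaces the Python 2 integer `/` used in the test.

```python
import numpy


def numpy_max_pool_2d(x, ds, ignore_border=False):
    """Max-pool the last two dimensions of x by factors ds (hypothetical helper)."""
    if x.ndim < 2:
        raise NotImplementedError("input should have at least 2 dim")
    extra_r = extra_c = 0
    if not ignore_border:
        # A partial patch at the border produces one extra output row/column.
        if x.shape[-2] % ds[0]:
            extra_r = 1
        if x.shape[-1] % ds[1]:
            extra_c = 1
    out_shape = list(x.shape[:-2])
    out_shape.append(x.shape[-2] // ds[0] + extra_r)
    out_shape.append(x.shape[-1] // ds[1] + extra_c)
    out = numpy.zeros(out_shape)
    # Iterate over all leading ("batch-like") index combinations.
    for k in numpy.ndindex(*x.shape[:-2]):
        for i in range(out_shape[-2]):
            ii = i * ds[0]
            for j in range(out_shape[-1]):
                jj = j * ds[1]
                out[k][i, j] = x[k][ii:ii + ds[0], jj:jj + ds[1]].max()
    return out


if __name__ == "__main__":
    img = numpy.arange(25.0).reshape(5, 5)
    print(numpy_max_pool_2d(img, (2, 2), ignore_border=True).shape)
    print(numpy_max_pool_2d(img, (2, 2), ignore_border=False).shape)
```

Slicing past the end of an array is safe in NumPy, so the partial border patches need no special handling beyond the extra output row or column.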
return T.flatten(rval,2), maxpop.out_shape(imshp, maxpoolshp, ignore_border)
def test_DownsampleFactorMax(self):
rng = numpy.random.RandomState(utt.fetch_seed())
class TestDownsampleFactorMax(unittest.TestCase):
def test_maxpool(self):
# generate flatted images
# generate random images
maxpoolshps = ((1,1),(2,2),(3,3),(2,3))
imval = N.random.rand(4,10,64,64)
images = T.dmatrix()
dmatrix4=T.TensorType('float64', (False, False, False, False))
images4=dmatrix4()
tctot, tpytot, ntot = [],[],[]
imval = rng.rand(4,10,64,64)
images = tensor.dtensor4()
for maxpoolshp in maxpoolshps:
for border in [True,False]:
print 'maxpoolshp', maxpoolshp,'border', border
# numeric verification
xi=0
yi=0
if not border:
if imval.shape[-2] % maxpoolshp[0]:
xi += 1
if imval.shape[-1] % maxpoolshp[1]:
yi += 1
my_output_val = N.zeros((imval.shape[0], imval.shape[1],
imval.shape[2]/maxpoolshp[0]+xi,
imval.shape[3]/maxpoolshp[1]+yi))
time1=time.time()
for n in range(imval.shape[0]):
for k in range(imval.shape[1]):
for i in range(my_output_val.shape[2]):
ii = i*maxpoolshp[0]
for j in range(my_output_val.shape[3]):
jj = j*maxpoolshp[1]
patch = imval[n,k,ii:ii+maxpoolshp[0],jj:jj+maxpoolshp[1]]
my_output_val[n,k,i,j] = N.max(patch)
my_output_val = my_output_val.reshape(imval.shape[0],-1)
ntot+=[time.time()-time1]
# symbolic stuff
#### wrapper to DownsampleFactorMax op ####
output, outshp = max_pool(images, imval.shape[1:], maxpoolshp, border)
assert N.prod(my_output_val.shape[1:]) == N.prod(outshp)
assert N.prod(my_output_val.shape[1:]) == N.prod(outshp)
for ignore_border in [True,False]:
print 'maxpoolshp =', maxpoolshp
print 'ignore_border =', ignore_border
## Pure Numpy computation
numpy_output_val = self.numpy_max_pool2D(imval, maxpoolshp, ignore_border)
output = max_pool2D(images, maxpoolshp, ignore_border)
f = function([images,],[output,])
imval2=imval.reshape(imval.shape[0],-1)
output_val = f(imval2)
assert N.all(output_val == my_output_val)
output_val = f(imval)
assert numpy.all(output_val == numpy_output_val)
#DownsampleFactorMax op
maxpool_op = DownsampleFactorMax(maxpoolshp, ignore_border=border)(images4)
f = function([images4],maxpool_op,mode=Mode(linker="py"))
f2 = function([images4],maxpool_op,mode=Mode(linker="c"))
f3 = function([images4],maxpool_op)#for when we want to use the debug mode
time1=time.time()
maxpool_op = DownsampleFactorMax(maxpoolshp, ignore_border=ignore_border)(images)
f = function([images], maxpool_op)
output_val = f(imval)
tctot+=[time.time()-time1]
assert (N.abs(my_output_val.flatten()-output_val.flatten())<1e-5).all()
time1=time.time()
output_val = f2(imval)
tpytot+=[time.time()-time1]
assert (N.abs(my_output_val.flatten()-output_val.flatten())<1e-5).all()
output_val = f3(imval)
print 'Numpy processing time: %.3fs'%sum(ntot),ntot
print 'c Theano(DownsampleFactorMax) processing time: %.3fs'%sum(tctot),tctot
print 'py Theano(DownsampleFactorMax) processing time: %.3fs'%sum(tpytot),tpytot
d=N.asarray(ntot)/tctot
print 'speed up c theano(DownsampleFactorMax) vs manual: %.3f'%d.mean(),d
d=N.asarray(ntot)/tpytot
print 'speed up py theano(DownsampleFactorMax) vs manual: %.3f'%d.mean(),d
assert (numpy.abs(output_val - numpy_output_val) < 1e-5).all()
def test_DownsampleFactorMax_grad(self):
# generate flatted images
rng = numpy.random.RandomState(utt.fetch_seed())
maxpoolshps = ((1,1),(3,2),(2,3))
imval = N.random.rand(2,3,3,4) * 10.0 #more variance means numeric gradient will be more accurate
do_theano=True
imval = rng.rand(2,3,3,4) * 10.0 #more variance means numeric gradient will be more accurate
for maxpoolshp in maxpoolshps:
for ignore_border in [True,False]:
print 'maxpoolshp =', maxpoolshp
print 'ignore_border =', ignore_border
def mp(input):
return DownsampleFactorMax(maxpoolshp, ignore_border=ignore_border)(input)
utt.verify_grad(mp, [imval], rng=rng)
def test_max_pool2D_2D(self):
rng = numpy.random.RandomState(utt.fetch_seed())
maxpoolshps = ((1,1),(3,2))
imval = rng.rand(4,7)
images = tensor.dmatrix()
for maxpoolshp in maxpoolshps:
for ignore_border in [True,False]:
print 'maxpoolshp =', maxpoolshp
print 'ignore_border =', ignore_border
numpy_output_val = self.numpy_max_pool2D(imval, maxpoolshp, ignore_border)
output = max_pool2D(images, maxpoolshp, ignore_border)
output_val = function([images], output)(imval)
assert numpy.all(output_val == numpy_output_val)
def mp(input):
return max_pool2D(input, maxpoolshp, ignore_border)
utt.verify_grad(mp, [imval], rng=rng)
def test_max_pool2D_3D(self):
rng = numpy.random.RandomState(utt.fetch_seed())
maxpoolshps = [(1,2)]
imval = rng.rand(2,3,4)
images = tensor.dtensor3()
for maxpoolshp in maxpoolshps:
for border in [True,False]:
print 'maxpoolshp', maxpoolshp, 'border', border
for ignore_border in [True,False]:
print 'maxpoolshp =', maxpoolshp
print 'ignore_border =', ignore_border
numpy_output_val = self.numpy_max_pool2D(imval, maxpoolshp, ignore_border)
output = max_pool2D(images, maxpoolshp, ignore_border)
output_val = function([images], output)(imval)
assert numpy.all(output_val == numpy_output_val)
c = tensor.sum(output)
c_val = function([images], c)(imval)
g = tensor.grad(c, images)
g_val = function([images], [g.shape, tensor.min(tensor.min(tensor.min(g))), tensor.max(tensor.max(tensor.max(g)))])(imval)
def mp(input):
return DownsampleFactorMax(maxpoolshp, ignore_border=border)(input)
utt.verify_grad(mp, [imval])
return max_pool2D(input, maxpoolshp, ignore_border)
utt.verify_grad(mp, [imval], rng=rng)
def test_max_pool2D_6D(self):
rng = numpy.random.RandomState(utt.fetch_seed())
maxpoolshps = [(3,2)]
imval = rng.rand(2,1,1,1,3,4)
images = tensor.TensorType('float64', [False]*6)()
for maxpoolshp in maxpoolshps:
for ignore_border in [True,False]:
print 'maxpoolshp =', maxpoolshp
print 'ignore_border =', ignore_border
numpy_output_val = self.numpy_max_pool2D(imval, maxpoolshp, ignore_border)
output = max_pool2D(images, maxpoolshp, ignore_border)
output_val = function([images], output)(imval)
assert numpy.all(output_val == numpy_output_val)
def mp(input):
return max_pool2D(input, maxpoolshp, ignore_border)
utt.verify_grad(mp, [imval], rng=rng)
if __name__ == '__main__':
t = TestDownsampleFactorMax("test_maxpool").run()
#t.test_maxpool()
from theano.tests import main
# main("test_sp")
unittest.main()
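Review note: the `numpy_max_pool2D` reference used by the tests above can be sketched as a standalone function. This is a minimal sketch written for current Python and NumPy, not the code from the commit; the name `numpy_max_pool_2d` and its signature are assumptions made for illustration. It pools the last two axes and follows the `xi`/`yi` rule from the hunk: one extra output row or column is added only when `ignore_border` is false and a partial patch remains.

```python
import numpy


def numpy_max_pool_2d(x, ds, ignore_border=False):
    """Max-pool the last two axes of x over non-overlapping ds[0] x ds[1] patches."""
    rows, cols = x.shape[-2], x.shape[-1]
    # Integer division counts only the complete patches.
    out_rows = rows // ds[0]
    out_cols = cols // ds[1]
    if not ignore_border:
        # Keep the incomplete patches at the bottom/right border as well.
        if rows % ds[0]:
            out_rows += 1
        if cols % ds[1]:
            out_cols += 1
    out = numpy.zeros(x.shape[:-2] + (out_rows, out_cols), dtype=x.dtype)
    # Iterate over all leading (non-pooled) dimensions, whatever their number.
    for k in numpy.ndindex(*x.shape[:-2]):
        for i in range(out_rows):
            ii = i * ds[0]
            for j in range(out_cols):
                jj = j * ds[1]
                # Slicing past the edge simply yields a smaller patch.
                out[k][i, j] = x[k][ii:ii + ds[0], jj:jj + ds[1]].max()
    return out


x = numpy.arange(12.0).reshape(3, 4)
pooled_ignore = numpy_max_pool_2d(x, (2, 2), ignore_border=True)   # shape (1, 2)
pooled_keep = numpy_max_pool_2d(x, (2, 2), ignore_border=False)    # shape (2, 2)
```

Because only the last two axes are pooled, the same function serves as the reference for the 2D, 3D, 4D and 6D test inputs.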
......@@ -1125,7 +1125,7 @@ inv = Inv(upgrade_to_float, name = 'inv')
class Log(UnaryScalarOp):
""" log base e """
def impl(self, x):
return math.log(x)
return numpy.log(x)
def grad(self, (x, ), (gz, )):
if x.type in grad_types:
return gz / x,
......
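Review note: the commit does not state why `math.log` was replaced by `numpy.log`. One plausible motivation (an assumption, not something taken from the commit message) is that `numpy.log` preserves NumPy scalar types and accepts arrays, while `math.log` always returns a Python float. A short sketch of the difference, written for current Python and NumPy:

```python
import math
import numpy

x32 = numpy.float32(2.0)

# math.log converts its argument to a Python float (double precision).
math_result = math.log(x32)

# numpy.log keeps the NumPy scalar type of its input.
numpy_result = numpy.log(x32)

# numpy.log also works elementwise on arrays, which math.log cannot do.
arr = numpy.array([1.0, math.e], dtype='float64')
log_arr = numpy.log(arr)

print(type(math_result), numpy_result.dtype, log_arr)
```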
......@@ -330,6 +330,7 @@ class TensorType(Type):
self.broadcastable = tuple(broadcastable)
self.dtype_specs() # error checking is done there
self.name = name
self.numpy_dtype = numpy.dtype(self.dtype)
if shape is None:
#backport self.shape = tuple((1 if b else None) for b in self.broadcastable)
l=[]
......@@ -360,16 +361,16 @@ class TensorType(Type):
This function is not meant to be called in user code. It is for
`Linker` instances to use when running a compiled graph.
"""
_data = data
if strict:
if (type(data) is numpy.ndarray) and (data.dtype is self.numpy_dtype):
pass # fall through to ndim check
elif strict:
# this is its own subcase that doesn't fall through to anything
if not isinstance(data, numpy.ndarray):
raise TypeError("%s expected a ndarray object." % self, data, type(data))
if not str(data.dtype) == self.dtype:
raise TypeError("%s expected a ndarray object with dtype = %s (got %s)." % (self, self.dtype, data.dtype))
if not data.ndim == self.ndim:
raise TypeError("%s expected a ndarray object with %s dimensions (got %s)." % (self, self.ndim, data.ndim))
if self.filter_checks_isfinite and (not numpy.all(numpy.isfinite(data))):
raise TypeError("non-finite elements not allowed")
if TensorType.use_shape:
for si, di in zip(self.shape, data.shape):
......@@ -378,11 +379,17 @@ class TensorType(Type):
self, self.shape, data.shape))
return data
else:
data = theano._asarray(data, dtype = self.dtype)
if not self.ndim == data.ndim:
data = theano._asarray(data, dtype = self.dtype) #TODO - consider padding the shape with ones
# to make it consistent with self.broadcastable (e.g. vector -> row)
if self.ndim != data.ndim:
raise TypeError("Wrong number of dimensions: expected %s, got %s with shape %s." % (self.ndim, data.ndim, data.shape), data)
if any(b and d != 1 for d, b in zip(data.shape, self.broadcastable)):
raise TypeError("Non-unit value on shape on a broadcastable dimension.", data.shape, self.broadcastable)
i = 0
for b in self.broadcastable:
if b and data.shape[i] != 1:
raise TypeError("Non-unit value on shape on a broadcastable dimension.", data.shape, self.broadcastable)
i+=1
if self.filter_checks_isfinite and (not numpy.all(numpy.isfinite(data))):
raise ValueError("non-finite elements not allowed")
return data
def dtype_specs(self):
......@@ -1826,14 +1833,16 @@ class Default(gof.Op):
view_map = {0: [0]}
def make_node(self, x, default):
x, default = as_tensor_variable(x), as_tensor_variable(default)
assert x.type == default.type
if x.type != default.type:
raise TypeError('Both default() arguments must have same type', x, default)
return gof.Apply(self, [x, default], [default.type()])
def perform(self, node, (x, default), (out, )):
if x is None:
out[0] = default.copy()
else:
out[0] = x
#backport out[0] = default.copy() if x is None else x
if x is None:
# Why copy? Theano can't yet express out[0] being a view of either x or default,
# so the output may be a view of x, but only a copy of default.
out[0] = default.copy()
else:
out[0] = x
default = Default()
setdefault = default # legacy
......@@ -3588,8 +3597,10 @@ def verify_grad(op, pt, n_tests=2, rng=None, eps=None, tol=None, mode=None, cast
o_fn = function(tensor_pt, o_output)
o_fn_out = o_fn(*[p.copy() for p in pt])
random_projection = rng.rand(*o_fn_out.shape)
# random_projection should not have elements too small,
# otherwise too much precision is lost in numerical gradient
random_projection = rng.rand(*o_fn_out.shape) + 0.5
if cast_to_output_type:
random_projection = numpy.array(random_projection,
dtype=o_output.dtype)
......
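Review note: the idea behind the random projection in `verify_grad` can be illustrated with plain NumPy. The sketch below is not Theano's implementation. The helper `numeric_grad`, the example output `v ** 2`, and the step size are assumptions chosen for illustration. It projects an elementwise output onto random weights to get a scalar cost, then compares a finite-difference gradient with the analytic one. Adding 0.5 keeps every weight in [0.5, 1.5), so no output element contributes almost nothing to the cost.

```python
import numpy


def numeric_grad(cost, x, eps=1e-6):
    """Central finite-difference gradient of a scalar cost at x."""
    grad = numpy.zeros_like(x)
    for idx in numpy.ndindex(*x.shape):
        step = numpy.zeros_like(x)
        step[idx] = eps
        grad[idx] = (cost(x + step) - cost(x - step)) / (2 * eps)
    return grad


rng = numpy.random.RandomState(0)
x = rng.rand(3, 4)

# Weights bounded away from zero, as in the patched line.
projection = rng.rand(*x.shape) + 0.5


def cost(v):
    # Scalar cost: the elementwise output v**2 projected onto the weights.
    return numpy.sum(projection * v ** 2)


analytic = 2 * projection * x
numeric = numeric_grad(cost, x)
max_abs_err = numpy.abs(analytic - numeric).max()
print(max_abs_err)
```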
......@@ -822,7 +822,14 @@ class CAReduce(Op):
to_reduce = reversed(sorted(axis))
if to_reduce:
for dimension in to_reduce:
variable = self.ufunc.reduce(variable, dimension)
# If it's a zero-size array, use scalar_op.identity if available
if variable.shape[dimension] == 0:
if hasattr(self.scalar_op, 'identity'):
variable = self.scalar_op.identity
else:
raise ValueError("Input (%s) has zero-size on axis %s, but self.scalar_op (%s) has no attribute 'identity'" % (variable, dimension, self.scalar_op))
else:
variable = self.ufunc.reduce(variable, dimension)
output[0] = theano._asarray(variable, dtype = node.outputs[0].type.dtype)
else:
output[0] = numpy.copy(variable)
......
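Review note: NumPy ufuncs behave in a closely related way to the identity fallback in this hunk, although they return an array rather than the bare identity value. A reduction over a zero-size axis succeeds when the ufunc has an identity and raises `ValueError` when it does not. A small sketch using only NumPy (it does not call `CAReduce`), written for current Python and NumPy:

```python
import numpy

empty = numpy.zeros((5, 0))

# add has an identity, so reducing over the empty axis is well defined.
print(numpy.add.identity)                    # 0
row_sums = numpy.add.reduce(empty, axis=1)   # shape (5,), all zeros
col_sums = numpy.add.reduce(empty, axis=0)   # shape (0,)

# maximum has no identity, so the same reduction is an error.
try:
    numpy.maximum.reduce(empty, axis=1)
    max_failed = False
except ValueError:
    max_failed = True
```

The two new test shapes, `((5, 0), (0, ))` and `((5, 0), (1, ))`, correspond to `col_sums` and `row_sums` here.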
......@@ -133,6 +133,8 @@ class test_CAReduce(unittest.TestCase):
((5, 6), (1, )),
((5, 6), ()),
((2, 3, 4, 5), (0, 1, 3)),
((5, 0), (0, )),
((5, 0), (1, )),
((), ())]:
x = TensorType('float64', [(entry == 1) for entry in xsh])('x')
e = CAReduce(add, axis = tosum)(x)
......@@ -149,7 +151,7 @@ class test_CAReduce(unittest.TestCase):
def test_c(self):
self.with_linker(gof.CLinker())
if __name__ == '__main__':
unittest.main()