Commit b4c881d1 authored by Dumitru Erhan

merge

@@ -43,8 +43,10 @@ Environment Variables
.. envvar:: THEANO_FLAGS
This is a list of comma-delimited key[=value] pairs that control
Theano's behavior. A key that appears without an '=value' must be
for a boolean value, and it acts as setting it to True.
For example, in bash, you can override your :envvar:`THEANORC` defaults
for <myscript>.py by typing this:
@@ -52,11 +54,15 @@ Environment Variables
THEANO_FLAGS='floatX=float32,device=gpu0,nvcc.fastmath' python <myscript>.py
If a value is defined several times in ``THEANO_FLAGS``,
the right-most definition is used. So, for instance, if
``THEANO_FLAGS='device=cpu,device=gpu0'``, then gpu0 will be used.
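To illustrate the right-most-wins rule, here is a small sketch in plain Python (the `parse_flags` helper is invented for illustration; it is not part of Theano):

```python
def parse_flags(flags):
    """Parse a THEANO_FLAGS-style string into a dict.

    Later (right-most) definitions override earlier ones, and a key
    that appears without '=value' is treated as the boolean True.
    """
    parsed = {}
    for item in flags.split(','):
        if not item:
            continue
        key, sep, value = item.partition('=')
        # A bare key acts as setting the boolean flag to True.
        parsed[key] = value if sep else True
    return parsed
```

With `parse_flags('device=cpu,device=gpu0')`, the later `device=gpu0` wins, matching the behavior described above.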
.. envvar:: THEANORC
The location[s] of the .theanorc file[s] in ConfigParser format.
It defaults to ``$HOME/.theanorc``.
Here is the .theanorc equivalent to the THEANO_FLAGS in the example above:
.. code-block:: text
@@ -70,10 +76,10 @@ Environment Variables
Multiple configuration files can be specified by separating them with ':'
characters (as in $PATH). Multiple configuration files will be merged,
with later (right-most) files taking priority over earlier files in the
case that multiple files specify values for a common configuration option.
For example, to override system-wide settings with personal ones,
set ``THEANORC=/etc/theanorc:~/.theanorc``.
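This merge order matches how Python's ConfigParser reads a list of files: files read later silently override values from earlier ones. A minimal sketch, using hypothetical throw-away config files and the Python 3 ``configparser`` module:

```python
import configparser
import os
import tempfile

# Two hypothetical config files: a "system-wide" one and a "personal" one.
system_cfg = "[global]\ndevice = cpu\nfloatX = float64\n"
personal_cfg = "[global]\ndevice = gpu0\n"

paths = []
for text in (system_cfg, personal_cfg):
    fd, path = tempfile.mkstemp(suffix='.theanorc')
    with os.fdopen(fd, 'w') as f:
        f.write(text)
    paths.append(path)

cfg = configparser.ConfigParser()
# Files later in the list take priority, as with THEANORC's ':'-separated list.
cfg.read(paths)

device = cfg.get('global', 'device')   # the personal file wins
floatX = cfg.get('global', 'floatX')   # keys set only once survive the merge

for path in paths:
    os.remove(path)
```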
The rest of this page describes some of the more common and important flags
that you might want to use. For the complete list (including documentation),
...
@@ -58,7 +58,7 @@ file and run it.
import numpy
import time
vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000
rng = numpy.random.RandomState(22)
@@ -74,28 +74,31 @@ The program just computes the exp() of a bunch of random numbers.
Note that we use the `shared` function to
make sure that the input `x` is stored on the graphics device.
If I run this program (in thing.py) with device=cpu, my computer takes a little over 7 seconds,
whereas on the GPU it takes just over 0.4 seconds. Note that the results are close but not
identical! The GPU will not always produce the exact same floating-point numbers as the CPU.
As a point of reference, a loop that calls ``numpy.exp(x.value)`` also takes about 7 seconds.
.. code-block:: text
$ THEANO_FLAGS=mode=FAST_RUN,device=cpu python thing.py
Looping 100 times took 7.17374897003 seconds
Result is [ 1.23178032 1.61879341 1.52278065 ..., 2.20771815 2.29967753 1.62323285]
bergstra@tikuanyin:~/tmp$ THEANO_FLAGS=mode=FAST_RUN,device=gpu0 python thing.py
Using gpu device 0: GeForce GTX 285
Looping 100 times took 0.418929815292 seconds
Result is [ 1.23178029 1.61879349 1.52278066 ..., 2.20771813 2.29967761 1.62323296]
Returning a handle to device-allocated data
-------------------------------------------
The speedup is not greater in the example above because the function is
returning its result as a numpy ndarray which has already been copied from the
device to the host for your convenience. This is what makes it so easy to swap in device=gpu0, but
if you don't mind being less portable, you might prefer to see a bigger speedup by changing
the graph to express a computation with a GPU-stored result. The gpu_from_host
Op means "copy the input from the host to the gpu" and it is optimized away
after the T.exp(x) is replaced by a GPU version of exp().
.. code-block:: python
@@ -105,7 +108,7 @@ after the T.exp(x) is replaced by a GPU version of exp().
import numpy
import time
vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000
rng = numpy.random.RandomState(22)
@@ -123,17 +126,71 @@ The output from this program is
.. code-block:: text
Using gpu device 0: GeForce GTX 285
Looping 100 times took 0.185714006424 seconds
Result is <CudaNdarray object at 0x3e9e970>
Numpy result is [ 1.23178029 1.61879349 1.52278066 ..., 2.20771813 2.29967761 1.62323296]
Here we've shaved off about 50% of the run-time by simply not copying the
resulting array back to the host.
The object returned by each function call is now not a numpy array but a
"CudaNdarray" which can be converted to a numpy ndarray by the normal
numpy casting mechanism.
Running the GPU at Full Speed
------------------------------
To really get maximum performance in this simple example, we need to use an :class:`Out`
instance to tell Theano not to copy the output it returns to us. Theano allocates memory for
internal use like a working buffer, but by default it will never return a result that is
allocated in the working buffer. This is normally what you want, but our example is so simple
that it has the unwanted side-effect of really slowing things down.
..
TODO:
The story here about copying and working buffers is misleading and potentially not correct
... why exactly does borrow=True cut 75% of the runtime ???
.. code-block:: python
from theano import function, config, shared, sandbox, Out
import theano.tensor as T
import numpy
import time
vlen = 10 * 30 * 768 # 10 x #cores x # threads per core
iters = 1000
rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([],
Out(sandbox.cuda.basic_ops.gpu_from_host(T.exp(x)),
borrow=True))
t0 = time.time()
for i in xrange(iters):
r = f()
print 'Looping 100 times took', time.time() - t0, 'seconds'
print 'Result is', r
print 'Numpy result is', numpy.asarray(r)
Running this version of the code takes just under 0.05 seconds, over 140x faster than
the CPU implementation!
.. code-block:: text
Using gpu device 0: GeForce GTX 285
Looping 100 times took 0.0497219562531 seconds
Result is <CudaNdarray object at 0x31eeaf0>
Numpy result is [ 1.23178029 1.61879349 1.52278066 ..., 2.20771813 2.29967761 1.62323296]
This version of the code using ``borrow=True`` is slightly less safe, because if we had saved
the `r` returned from one function call, we would have to remember that its value might
be over-written by a subsequent function call. Although borrow=True makes a dramatic difference in this example,
be careful! The advantage of
borrow=True is much weaker in larger graphs, and there is a lot of potential for making a
mistake by failing to account for the resulting memory aliasing.
What can be accelerated on the GPU?
------------------------------------
...
@@ -428,9 +428,20 @@ class Function(object):
# Reinitialize each container's 'provided' counter
for c in self.input_storage:
c.provided = 0
# Set positional arguments
i = 0
for arg in args:
#TODO: provide a Param option for skipping the filter if we
# really want speed.
s = self.input_storage[i]
if arg is None:
s.storage[0] = arg
else:
s.storage[0] = s.type.filter(arg, strict=s.strict)
s.provided += 1
i+=1
# Set keyword arguments
for k, arg in kwargs.iteritems():
self[k] = arg
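The positional-argument loop added in this hunk can be mimicked in isolation: each argument is pushed through its input type's filter before landing in storage, with ``None`` passing through unfiltered. A stripped-down sketch (the `Storage` and `IntType` stand-ins are invented for illustration; the real objects live in theano.compile):

```python
class IntType(object):
    """Toy stand-in for a Theano type with a filter() method."""
    def filter(self, value, strict=False):
        if strict and not isinstance(value, int):
            raise TypeError('expected an int, got %r' % (value,))
        return int(value)

class Storage(object):
    """Toy stand-in for an input container."""
    def __init__(self, type_, strict=False):
        self.type = type_
        self.strict = strict
        self.storage = [None]
        self.provided = 0

def set_positional_args(input_storage, args):
    # Mirrors the loop in Function.__call__: None passes through
    # unfiltered, anything else is validated/converted by the filter.
    i = 0
    for arg in args:
        s = input_storage[i]
        if arg is None:
            s.storage[0] = arg
        else:
            s.storage[0] = s.type.filter(arg, strict=s.strict)
        s.provided += 1
        i += 1

storage = [Storage(IntType()), Storage(IntType())]
set_positional_args(storage, ['3', None])
```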
@@ -448,7 +459,9 @@ class Function(object):
self.inv_finder[c]))
# Do the actual work
t0_fn = time.time()
self.fn()
dt_fn = time.time() - t0_fn
# Retrieve the values that were computed
outputs = [x.data for x in self.output_storage]
@@ -486,6 +499,9 @@ class Function(object):
self.maker.mode.fct_call_time[self.name] += dt_call
self.maker.mode.fct_call[self.name] += 1
self.maker.mode.call_time += dt_call
self.maker.mode.fn_time += dt_fn
if self.return_none:
return None
elif self.unpack_single and len(outputs) == 1:
...
@@ -172,6 +172,8 @@ class Mode(object):
if isinstance(optimizer, gof.Query):
self.provided_optimizer = optimizer
self._optimizer = optimizer
self.call_time = 0
self.fn_time = 0
def __str__(self):
return "Mode(linker = %s, optimizer = %s)" % (self.provided_linker, self.provided_optimizer)
...
import time, atexit, copy
from theano.gof.link import WrapLinker
from theano.gof.cutils import run_cthunk
from theano.compile.mode import Mode, register_mode, predefined_modes, predefined_linkers, predefined_optimizers, default_linker, default_optimizer
from theano.gof.cc import OpWiseCLinker
from theano.gof.python25 import any
from theano import gof
from theano.configparser import config, AddConfigVar, IntParam
from theano.compile.function_module import FunctionMaker
import_time = time.time()
@@ -18,44 +19,57 @@ AddConfigVar('ProfileMode.n_ops_to_print',
"Number of ops to print by default",
IntParam(20, lambda i: i > 0))
class Profile_Maker(FunctionMaker):
def create(self, input_storage=None, trustme=False):
ret = super(Profile_Maker,self).create(input_storage, trustme)
for i, node in enumerate(ret.maker.env.toposort()):
self.mode.apply_time[(i,node.op)]=0.0
self.mode.apply_call[(i,node.op)]=0
# self.mode.op_cimpl[node.op] =
return ret
class ProfileMode(Mode):
def __init__(self, linker=default_linker, optimizer=default_optimizer):
local_time = [0.0]
apply_time = {}
apply_call = {}
op_cimpl = {}
compile_time = 0 #time passed in theano.function()
fct_call_time = {}#time passed inside theano fct call including op time.
fct_call = {}
self.__setstate__((linker, optimizer, local_time,
apply_time, apply_call,
op_cimpl,
compile_time, fct_call_time, fct_call))
def function_maker(self, i,o,m, *args, **kwargs):
"""Return an instance of `Profile_Maker`, which initializes the counters"""
assert m is self
return Profile_Maker(i, o, self, *args, **kwargs)
def __getstate__(self):
#print "__getstate__",self.provided_linker,self.provided_optimizer
return (self.provided_linker, self.provided_optimizer, self.local_time,
self.apply_time, self.apply_call,
self.op_cimpl, self.compile_time, self.fct_call_time, self.fct_call)
def __setstate__(self, (linker, optimizer, local_time,
apply_time, apply_call,
op_cimpl,
compile_time, fct_call_time, fct_call)):
self.local_time = local_time
self.apply_time = apply_time
self.apply_call = apply_call
self.op_cimpl = op_cimpl
self.compile_time = compile_time
self.fct_call_time = fct_call_time
self.fct_call = fct_call
self.call_time = 0
self.fn_time = 0
def blah(i, node, th):
if hasattr(th, 'cthunk'):
@@ -63,7 +77,7 @@ class ProfileMode(Mode):
failure = run_cthunk(th.cthunk)
dt = time.time() - t0
if failure:
raise RuntimeError(('A C Op raised an exception. PROFILE_MODE cannot'
' tell you what it was though. Use a standard mode such as'
' FAST_RUN_NOGC to correct the problem.'))
else:
@@ -72,11 +86,9 @@ class ProfileMode(Mode):
dt = time.time() - t0
local_time[0] += dt
apply_time[(i,node.op)] += dt
apply_call[(i,node.op)] += 1
op_cimpl[node.op] = hasattr(th, 'cthunk')
self.provided_linker = linker
@@ -84,7 +96,7 @@ class ProfileMode(Mode):
if isinstance(linker, str) or linker is None:
linker = predefined_linkers[linker]
linker = WrapLinker([linker], blah)
self.linker = linker
if isinstance(optimizer, str) or optimizer is None:
@@ -113,18 +125,11 @@ class ProfileMode(Mode):
fct_call = self.fct_call
apply_time = self.apply_time
apply_call = self.apply_call
op_cimpl = self.op_cimpl
self.print_summary_("print_summary",local_time, compile_time, fct_call_time, fct_call,
apply_time, apply_call, op_cimpl,
n_apply_to_print, n_ops_to_print)
def print_diff_summary(self, other, n_apply_to_print=15, n_ops_to_print=20):
@@ -153,42 +158,23 @@ class ProfileMode(Mode):
r[a]+=t
return r
local_time = self.local_time[0]-other.local_time[0]
compile_time = self.compile_time-other.compile_time
fct_call_time = diff_dict(self.fct_call_time,other.fct_call_time)
fct_call = diff_dict(self.fct_call,other.fct_call)
apply_time = diff_dict(self.apply_time, other.apply_time)
apply_call = diff_dict(self.apply_call, other.apply_call)
op_cimpl = self.op_cimpl and other.op_cimpl
self.print_summary_("print_diff_summary",local_time, compile_time, fct_call_time, fct_call,
apply_time, apply_call, op_cimpl,
n_apply_to_print=n_apply_to_print,
n_ops_to_print=n_ops_to_print, print_apply=False)
@staticmethod
def print_summary_(fct_name, local_time, compile_time, fct_call_time, fct_call,
apply_time, apply_call, op_cimpl,
n_apply_to_print=15, n_ops_to_print=20, print_apply=True):
"""
do the actual printing of print_summary and print_diff_summary.
@@ -218,6 +204,19 @@ class ProfileMode(Mode):
sum(f for f, t, a, nb_call in atimes[n_apply_to_print:])*100,
sum(t for f, t, a, nb_call in atimes[n_apply_to_print:]))
op_time = {}
op_call = {}
for (i,a),t in apply_time.items():
op_time.setdefault(a,0)
op_call.setdefault(a,0)
op_time[a]+=t
op_call[a]+=apply_call[(i,a)]
op_flops = {}
for a,t in op_time.items():
if hasattr(a,'flops'):
op_flops[a]=a.flops*op_call[a]/t/1e6
flops_msg=''
if op_flops:
flops_msg=' <MFlops/s>'
...
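The per-Op totals that print_summary_ now derives on the fly from apply_time can be reproduced standalone (using dummy op names in place of real Op instances):

```python
# apply_time maps (apply index, op) -> total seconds; apply_call counts calls.
apply_time = {(0, 'exp'): 0.5, (1, 'exp'): 0.25, (2, 'add'): 0.1}
apply_call = {(0, 'exp'): 10, (1, 'exp'): 10, (2, 'add'): 5}

# Fold per-apply statistics into per-op statistics, as in print_summary_.
op_time = {}
op_call = {}
for (i, a), t in apply_time.items():
    op_time.setdefault(a, 0)
    op_call.setdefault(a, 0)
    op_time[a] += t
    op_call[a] += apply_call[(i, a)]
```

Both apply nodes for 'exp' collapse into one entry, which is exactly why the commit can drop the separately-maintained op_time and op_call dicts.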
@@ -544,35 +544,20 @@ class Test_check_isfinite(unittest.TestCase):
theano.tensor.TensorType.filter_checks_isfinite = self.old_val
def test_check_isfinite(self):
x = theano.tensor.vector()
f = theano.function([x], (x+2) * 5, mode='DEBUG_MODE')
g = theano.function([x], theano.tensor.log(x), mode='DEBUG_MODE')
# this should work
f(numpy.log([3, 4, 5]))
# passing an invalid value as an input should trigger ValueError
self.failUnlessRaises(ValueError, f, numpy.log([3, -4, 5]))
self.failUnlessRaises(ValueError, f, numpy.asarray([0, 1.0, 0])/0)
self.failUnlessRaises(ValueError, f, numpy.asarray([1.0, 1.0, 1.0])/0)
# generating an invalid value internally should trigger InvalidValueError
self.failUnlessRaises(debugmode.InvalidValueError, g, [3,-4,5])
# this should disable the exception
theano.tensor.TensorType.filter_checks_isfinite = False
...
@@ -14,11 +14,12 @@ THEANO_FLAGS=os.getenv("THEANO_FLAGS","")
# [section.]option[=value] entries. If the section part is omitted, there should be only one
# section that contains the given option.
# THEANORC can contain a colon-delimited list of config files, like
# THEANORC=~lisa/.theanorc:~/.theanorc
# In that case, definitions in files on the right (here, ~/.theanorc) have
# precedence over those in files on the left.
def config_files_from_theanorc():
rval = [os.path.expanduser(s) for s in os.getenv('THEANORC', '~/.theanorc').split(':')]
rval.reverse()
print "THEANORC", rval
return rval
theano_cfg = ConfigParser.SafeConfigParser()
theano_cfg.read(config_files_from_theanorc())
@@ -42,14 +43,15 @@ def fetch_val_for_key(key):
"""Return the overriding config value for a key.
A successful search returns a string value.
An unsuccessful search raises a KeyError
The (decreasing) priority order is:
- THEANO_FLAGS
- ~/.theanorc
"""
# first try to find it in the FLAGS
rval = None
for name_val in THEANO_FLAGS.split(','):
if not name_val:
continue
@@ -60,7 +62,12 @@ def fetch_val_for_key(key):
name, val = name_val_tuple
if name == key:
# rval might be overridden by a later definition in THEANO_FLAGS
rval = val
# If an rval is found, it should be a string
if rval is not None:
return rval
# next try to find it in the config file
@@ -77,7 +84,7 @@ def fetch_val_for_key(key):
return theano_cfg.get(section, option)
except (ConfigParser.NoOptionError, ConfigParser.NoSectionError):
raise KeyError(key)
class TheanoConfigParser(object):
#properties are installed by AddConfigVar
@@ -143,7 +150,7 @@ class ConfigParam(object):
self.val = val
deleter=None
class EnumStr(ConfigParam):
def __init__(self, default, *options):
self.default = default
...
@@ -222,7 +222,7 @@ class PureType(object):
try:
self.filter(a, True)
return True
except (TypeError, ValueError):
return False
def make_variable(self, name = None):
...
@@ -18,18 +18,22 @@ def _asarray(a, dtype=None, order=None):
Currently, this issue has only been causing trouble when the target
data type is 'int32', on some computers. As a result, this is the only
situation where we may do more than a simple call to ``numpy.asarray``. If
it turns out that a similar problem can occur for more data types, this
function should be updated accordingly.
This function's name starts with a '_' to indicate that it is meant to be
used internally. It is imported so as to be available directly through
theano._asarray
"""
dtype = numpy.dtype(dtype) # Convert into dtype object.
rval = numpy.asarray(a, dtype=dtype, order=order)
numpy_int32 = numpy.dtype(numpy.int32)
if (dtype is numpy_int32 and rval.dtype is not numpy_int32):
# Enforce the numpy.int32 dtype.
return rval.view(dtype=numpy_int32)
else:
# Using ``numpy.asarray`` should work just fine.
# Debug assert if we want to detect other failure cases (untested):
# assert rval.dtype is dtype
return rval
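The int32 branch above relies on numpy's zero-copy ``view`` mechanism: ``asarray`` may hand back an array whose dtype object is not the canonical ``numpy.dtype(numpy.int32)``, and ``view`` re-labels the buffer without copying. A standalone sketch of the same pattern (``_asarray_sketch`` is a simplified stand-in, not the real theano._asarray):

```python
import numpy

def _asarray_sketch(a, dtype):
    # Mirrors the structure of theano._asarray: normalize dtype into a
    # dtype object, then force the exact int32 dtype via a zero-copy
    # view when the dtype objects disagree.
    dtype = numpy.dtype(dtype)  # convert into a dtype object
    rval = numpy.asarray(a, dtype=dtype)
    numpy_int32 = numpy.dtype(numpy.int32)
    if dtype is numpy_int32 and rval.dtype is not numpy_int32:
        return rval.view(dtype=numpy_int32)
    return rval

out = _asarray_sketch([1, 2, 3], 'int32')
```

Whichever branch runs, the result compares equal to ``numpy.dtype('int32')`` and shares the input's values.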
@@ -5,7 +5,17 @@ from theano import gof, Op, tensor, config
from theano.printing import Print
def getFilterOutShp(inshp, kshp, (dx,dy)=(1,1), mode='valid'):
"""
Computes the shape (nb_rows, nb_col) of each output image.
:type inshp: tuple, list or 1D ndarray of length 2
:param inshp: shape of each (2D) input image
:type kshp: tuple, list or 1D ndarray of length 2
:param kshp: shape of each (2D) kernel filter
:type mode: string
:param mode: 'valid' or 'full' (see 'border_mode' in conv2d's doc)
:rtype: numpy 1D ndarray of len 2
:return: shape of each output "image" (or feature map)
""" """
if mode=='valid': s = -1 if mode=='valid': s = -1
else: s = 1 else: s = 1
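The docstring above can be sanity-checked with a small reimplementation: for unit strides, each 'valid' output dimension is inshp - kshp + 1 and each 'full' dimension is inshp + kshp - 1. A simplified sketch that ignores the dx/dy subsampling handled by the real function:

```python
def filter_out_shape(inshp, kshp, mode='valid'):
    """Simplified sketch of getFilterOutShp for unit strides (dx=dy=1).

    'valid' keeps only positions where the kernel fits entirely inside
    the image; 'full' zero-pads so every overlap position is produced.
    """
    s = -1 if mode == 'valid' else 1
    return tuple(i + s * (k - 1) for i, k in zip(inshp, kshp))

vshape = filter_out_shape((5, 5), (3, 3), mode='valid')
fshape = filter_out_shape((5, 5), (3, 3), mode='full')
```

A 3x3 kernel over a 5x5 image gives a 3x3 'valid' output and a 7x7 'full' output, matching scipy.signal.convolve2d's conventions.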
@@ -28,10 +38,12 @@ def conv2d(input, filters, border_mode='valid', subsample=(1,1),
:param filters: tensor containing filters for convolutional neural net.
Indexing is: (filter, filter input feature map, filter row, filter col).
:type border_mode: string
:param border_mode: 'valid' (only apply kernel over complete patches of the image) or
'full' (pad the image with 0s and apply the kernel over all full and partial patches of
the image)
:type subsample: tuple of len 2
:param subsample: how many pixels we move in the (row,col) direction of the image when we
move to the next patch
:type image_shape: tuple of len 4
:param image_shape: (batch size, stack size, nb row, nb col)
:type filter_shape: tuple of len 4
@@ -60,18 +72,18 @@ def conv2d(input, filters, border_mode='valid', subsample=(1,1),
class ConvOp(Op):
"""
A convolution op that should behave like scipy.signal.convolve2d,
but much faster!
"""
__attrnames = ['imshp', 'kshp', 'nkern', 'bsize', 'dx', 'dy', 'out_mode',
'unroll_batch', 'unroll_kern', 'unroll_patch',
'imshp_logical', 'kshp_logical', 'kshp_logical_top_aligned']
"""These attributes uniquely identify the behaviour of this op for given inputs"""
def __init__(self, imshp=None, kshp=None, nkern=None, bsize=None,
dx=None, dy=None,
output_mode='valid', unroll_batch=0,
unroll_kern=0,
unroll_patch=True,
imshp_logical=None,
@@ -80,7 +92,12 @@ class ConvOp(Op):
verbose=0,
version=-1):
"""
This Op implements the convolution of a kernel (4d tensor, (nkern, stacksize,
nb row, nb col)) on an image (4d tensor, (batchsize, stacksize, nb row, nb col)).
The batch size is the number of images to which we want to apply the same
kernels. nkern is the number of kernels that we want to apply to each image.
The stack size is mostly used when there are multiple layers in the network:
the output is the sum of the convolutions of multiple 2d images and kernels.

The reason that this op does the summation over convolutions within the 'stack' is that
it allows us to be memory-efficient about how gradients are calculated. If, for
@@ -89,14 +106,22 @@ class ConvOp(Op):
point) then we would have to sum over a potentially very large tensor to get the
gradient on the filters.

If the imshp, kshp, nkern and bsize are provided, we can generate more optimal
code. This makes a significant difference for the full mode with the
unroll_patch version. The fastest code currently available on x86_64 computers
uses unroll_batch=4, unroll_kern=4, unroll_patch=False; this requires that all
the optional shape information is given. Those numbers were empirically tested
and are backed up by the article: Anatomy of High-Performance Matrix
Multiplication by Kazushige Goto and Robert A. Van De Geijn, ACM Transactions on
Mathematical Software, vol 34, No. 3, article 12, May 2008. Figure 12 gives the
values mr x nr, which are the optimal values to use for unroll_batch and
unroll_kern; for x86_64 computers it is 4x4. Other architectures can have
different values (2x4 for x86, 8x8 for Itanium, ...).

:type out_mode: string
:param out_mode: 'valid' (gives an output smaller than the image) or
'full' (gives an output bigger than the image)

optional parameters (if provided, they will be used to generate more optimal C code):
:type imshp: tuple of len 2 or 3: 2 for 2d image, 3 for a stack of 2d images.
:param imshp: (stacksize, nb image row, nb image col)
@@ -113,13 +138,17 @@ class ConvOp(Op):
param to select the version of code used:
:type unroll_patch: bool
:param unroll_patch: use a version of c_code that unrolls the patch loop. It
does not require all the shape information to work, but if all the shape
information is present, it will be hardcoded into the generated code for speed.
:type unroll_batch: int
:param unroll_batch: use a version of c_code that unrolls the batch loop (by
unroll_batch) and the nkern loop (by unroll_kern). bsize and nkern must be
multiples of unroll_batch and unroll_kern respectively.
:type unroll_kern: int
:param unroll_kern: use a version of c_code that unrolls the batch loop (by
unroll_batch) and the nkern loop (by unroll_kern). bsize and nkern must be
multiples of unroll_batch and unroll_kern respectively.
:type verbose: int
:param verbose: passed to GpuConv
:type version: int
@@ -130,26 +159,34 @@ class ConvOp(Op):
:param kshp_logical_top_aligned: idem
"""
all_shape = imshp is not None and kshp is not None and \
nkern is not None and bsize is not None

if (unroll_batch>0 or unroll_kern>0) and not all_shape:
raise Exception("In ConvOp, when using unroll_batch and unroll_kern, all shapes are needed")

if not all_shape and (imshp is not None or kshp is not None \
or nkern is not None or bsize is not None):
print "OPTIMISATION WARNING: passing only some of the shapes to ConvOp "\
"for faster code is useless. We use all of them or none."

if not all_shape:
unroll_patch = True

if imshp is not None:
imshp = tuple(imshp)
if len(imshp)==2:
imshp = (1,)+imshp
elif len(imshp)==3:
pass
else:
raise Exception("bad len for imshp")
self.imshp = imshp
if kshp is not None:
kshp = tuple(kshp)
self.kshp = kshp
self.nkern = nkern
self.bsize = bsize
@@ -157,10 +194,12 @@ class ConvOp(Op):
self.dy = dy
self.verbose = verbose
self.version = version

# a triple
self.imshp_logical = self.imshp
if imshp_logical is not None: self.imshp_logical = tuple(imshp_logical)
assert (self.imshp is None and self.imshp_logical is None) or \
(len(self.imshp) == len(self.imshp_logical))

# a pair
self.kshp_logical = self.kshp
@@ -172,6 +211,7 @@ class ConvOp(Op):
self.unroll_patch = unroll_patch

if self.unroll_batch>0 and self.bsize % self.unroll_batch!=0:
if self.bsize<=self.unroll_batch:
self.unroll_batch = self.bsize
else:
@@ -181,9 +221,15 @@ class ConvOp(Op):
while self.bsize % new!=0:
new-=1
print "OPTIMISATION WARNING: in ConvOp.__init__() unroll_batch(%s) "\
"must be 0 or a divisor of bsize(%s). We revert it to %d. This "\
"won't change the result, but may make it slower."%\
(str(self.unroll_batch),str(self.bsize),new)
self.unroll_batch=new

if self.unroll_kern>0 and self.nkern % unroll_kern!=0:
if self.nkern<=self.unroll_kern:
self.unroll_kern = self.nkern
else:
@@ -192,22 +238,29 @@ class ConvOp(Op):
assert(new>=1)
while self.nkern % new!=0:
new-=1
print "OPTIMISATION WARNING: in ConvOp.__init__() unroll_kern(%s) "\
"should be 0 or a divisor of nkern(%s). We revert it to %d. "\
"This won't change the result, but may make it slower."\
%(str(self.unroll_kern),str(self.nkern),new)
self.unroll_kern=new
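The fallback logic above (shrinking a bad unroll factor to a nearby divisor) can be sketched as a standalone helper. This is an illustrative reconstruction: the initial value of `new` comes from lines elided by the hunk, so we assume the search starts from the requested factor and walks down.

```python
def fallback_unroll(n, unroll):
    # Shrink `unroll` to the largest divisor of n that is <= the requested
    # value, as ConvOp.__init__ does for unroll_batch / unroll_kern.
    if n <= unroll:
        return n
    new = unroll
    while n % new != 0:
        new -= 1
    return new
```

For example, requesting unroll_batch=4 with bsize=10 falls back to 2, while bsize=12 keeps 4 unchanged.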
if all_shape:
self.outshp = getFilterOutShp(self.imshp_logical, self.kshp_logical, (dx,dy), output_mode)
self.fulloutshp = getFilterOutShp(self.imshp_logical, self.kshp_logical, (1,1), output_mode)
else:
self.outshp = None
self.fulloutshp = None

self.out_mode = output_mode
if not self.out_mode in ["valid", "full"]:
raise Exception("Mode %s not implemented"%self.out_mode)

if all_shape and not (self.outshp > 0).all():
raise Exception(("Bad size for the output shape. Verify that [post-"\
"supersampling] input shape (%s) and kern shape (%s) are ok. "\
"(Hint: kerns must fit inside image in valid mode)")%\
(self.imshp_logical,self.kshp_logical))

self._rehash()

if config.op.set_flops:
@@ -244,11 +297,16 @@ class ConvOp(Op):
self.flops*=self.outshp[0]*self.outshp[1]#nb flops by output image
self.flops*=self.imshp[0]*self.nkern*self.bsize#for all output images#n_stack==self.imshp[0]
else: #full mode
self.flops=0
for out_row in range(self.outshp[0]):#loop over output rows
for out_col in range(self.outshp[1]):#loop over output cols
for row in range(self.kshp[0]):#loop over kern rows
if (row+out_row-self.kshp[0]+1<0 or
row+out_row-self.kshp[0]+1>=self.imshp[1]):
continue
col=0
max_col=self.kshp[1]
img_col=out_col-self.kshp[1]+1
@@ -263,7 +321,8 @@ class ConvOp(Op):
self.flops*=self.imshp[0]*self.nkern*self.bsize#for all output images#n_stack==self.imshp[0]
assert self.flops == self.bsize * self.nkern * self.imshp[0] * \
self.kshp[0] * self.kshp[1] * self.imshp[1] * self.imshp[2] * 2
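The assert above states an identity for the full-mode flop count: one multiply-add (2 flops) per kernel element, per image pixel, per (batch, kernel, stack) triple. A sketch of that right-hand side as a function (`conv_total_flops` is an illustrative name, not a Theano API):

```python
def conv_total_flops(bsize, nkern, stacksize, kshp, imshp):
    # Right-hand side of the assert in ConvOp: 2 flops (multiply + add)
    # per kernel tap, per image pixel, per batch/kernel/stack combination.
    kr, kc = kshp      # kernel rows, cols
    ir, ic = imshp     # image rows, cols
    return bsize * nkern * stacksize * kr * kc * ir * ic * 2
```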

def make_node(self, inputs, kerns):
# TODO: find a way to make ConvOp work for N-D (after NIPS09)
@@ -375,21 +434,25 @@ class ConvOp(Op):
def grad(self, (inputs, kerns), (gz,)):
"""
In development. Works for the test cases in test_sp.py.

WARNING: a few known issues:
* doesn't work for rectangular images or filters
* inputs needs to be a 4D tensor. Couldn't get 3D to work
* will crash if the filter is the same size as the input image
"""
if self.imshp != self.imshp_logical or self.kshp != self.kshp_logical:
raise NotImplementedError('todo')

if self.dx!=1 or self.dy!=1:
raise Exception("ERROR: We disable ConvOp.grad now when dx!=1 or "\
"dy!=1 as we think there is a high probability of a bug in it. "\
"We need to raise the error on the gradient to .1!")

all_shape = self.imshp is not None and self.kshp is not None and \
self.nkern is not None and self.bsize is not None

if not all_shape and (self.dx!=1 or self.dy!=1):
raise Exception("In ConvOp.grad, when dx!=1 or dy!=1 we must have all "\
"the optional shape information")

grad_hack_necessary = False
if grad_hack_necessary:
@@ -411,6 +474,7 @@ class ConvOp(Op):
kshp = None
un_p = self.unroll_patch
imshp_logical = None

if self.out_mode == 'valid':
(img, filters) = (newin, newgz)
kshp_logical = self.fulloutshp
@@ -445,13 +509,17 @@ class ConvOp(Op):
un_b = bsize
else:
un_b = 1
print "OPTIMISATION WARNING: in ConvOp.grad() we can't determine "\
"a good unroll value for the batch. Maybe you can optimize this!",\
bsize, un_b, self.unroll_batch, self.unroll_kern

if un_k!=0 and nkern%un_k!=0:
if nkern<un_k:
un_k = nkern
else:
un_k = 1
print "OPTIMISATION WARNING: in ConvOp.grad() we can't determine "\
"a good unroll value for the kernel. Maybe you can optimize this!"

dw = ConvOp(imshp, kshp, nkern, bsize, 1,1, output_mode='valid',
unroll_batch=un_b, unroll_kern=un_k, unroll_patch=un_p,
@@ -460,9 +528,12 @@ class ConvOp(Op):
kshp_logical_top_aligned=kshp_logical_top_aligned,
version=self.version,
verbose=self.verbose)

if hasattr(self,'flops'):
dw.set_flops()

dw = dw(img,filters)

if all_shape:
assert (dw.owner.op.outshp==self.kshp).all()
if self.out_mode == 'valid':
@@ -472,18 +543,21 @@ class ConvOp(Op):
####### Determine gradient on inputs ########
mode = 'valid'
if not self.out_mode == 'full':
mode = 'full'

filters = kerns.dimshuffle((1,0,2,3))
filters = filters[:,:,::-1,::-1]

nkern = None
imshp = None
imshp_logical = None
kshp = None

if all_shape:
nkern = self.imshp[0]
imshp = (self.nkern, self.outshp[0], self.outshp[1])
imshp_logical = (self.nkern, self.fulloutshp[0], self.fulloutshp[1])
#print 'din', imshp, self.kshp, nkern
din = ConvOp(imshp, self.kshp, nkern, self.bsize,
1,1, output_mode=mode,
unroll_batch=un_b, unroll_kern=un_k, unroll_patch=un_p,
@@ -491,10 +565,14 @@ class ConvOp(Op):
kshp_logical=None,
version=-1,#when we change the mode, we don't forward the version.
verbose=self.verbose)

if hasattr(self,'flops'):
din.set_flops()

din = din(gz,filters)

assert (din.owner.op.outshp is None and self.imshp is None) or \
(din.owner.op.outshp==self.imshp[1:]).all()
return [din, dw]
def c_headers(self):
@@ -512,8 +590,10 @@ class ConvOp(Op):
#define MOD %
using namespace std;
""" + tensor.blas.blas_header_text()

def c_libraries(self):
return tensor.blas.ldflags()

def c_code(self, node, name, (img2d, filtersflipped), (z, ), sub):
if node.inputs[0].type.dtype != node.inputs[1].type.dtype:
raise NotImplementedError()
@@ -521,7 +601,8 @@ using namespace std;
d=locals()
d.update(sub)
all_shape = self.imshp is not None and self.kshp is not None and \
self.nkern is not None and self.bsize is not None
d["self_out_mode"]=self.out_mode
d["self_dx"]=self.dx
@@ -587,7 +668,7 @@ using namespace std;
if self.unroll_patch:
if self.verbose:
print "return unroll patch version. all_shape=", all_shape
return _conv_op_code_unroll_patch%d
if self.unroll_batch>0 or self.unroll_kern>0:
if self.unroll_batch<=0: self.unroll_batch=1
@@ -607,44 +688,6 @@ using namespace std;
print "return no gemm version"
return _conv_op_code_a % d
def convolve2(kerns, kshp, nkern, images, imshp, bsize, step=(1,1),
bias=None, mode='valid', **d):
"""
:param kerns: kernel tensor
:param kshp: tuple (kern row, kern wid)
:param nkern: int, the number of kernels
:param images: image tensor
:param imshp: tuple ([stack size,] image row, image wid)
:param bsize: batch size
:param step: subsampling to apply to the output, tuple (row, wid)
:param bias: if given, will be added to the output
:param mode: 'valid' or 'full'
:return: tuple (theano graph with the output of ConvOp flattened to 2
dimensions, output shape as (nkern, out row, out col))
"""
#TODO: remove the bias argument from this function because convolution has nothing to do with a bias

# if imshp is of length 2, the images have a single input dimension
if len(imshp)!=3:
nvis_dim = 1
else:
nvis_dim = imshp[0]

# all these reshapes should happen in place
imrshp = tensor.as_tensor([bsize] + list(imshp))
imtensor = tensor.reshape(images, imrshp)
kernrshp = tensor.as_tensor([nkern, nvis_dim] + list(kshp))
kerntensor = tensor.reshape(kerns, kernrshp)

convop = ConvOp(imshp, kshp, nkern, bsize, step[0], step[1],
output_mode=mode, **d)
convout = convop(imtensor, kerntensor)

if bias:
biastensor = tensor.DimShuffle((False,), ('x',0,'x','x'), inplace=True)(bias)
convout = convout + biastensor

rval = tensor.flatten(convout, 2)
return rval, N.hstack((nkern, convop.outshp))
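The reshape bookkeeping in convolve2 (flat 2d inputs reshaped into the 4d tensors ConvOp expects) can be mirrored with plain shape arithmetic. This is an illustrative stand-in (`convolve2_shapes` and `_prod` are hypothetical names; no convolution is performed):

```python
def _prod(t):
    # product of a shape tuple
    r = 1
    for x in t:
        r *= x
    return r

def convolve2_shapes(bsize, imshp, nkern, kshp):
    # Mirror convolve2's reshapes: images (bsize, prod(imshp)) becomes a 4d
    # image tensor; kerns (nkern, nvis_dim*prod(kshp)) becomes a 4d kernel
    # tensor indexed (kernel, input feature map, row, col).
    nvis_dim = imshp[0] if len(imshp) == 3 else 1
    imtensor_shape = (bsize,) + tuple(imshp)
    kerntensor_shape = (nkern, nvis_dim) + tuple(kshp)
    # sanity: element counts match the flat 2d inputs convolve2 receives
    assert _prod(imtensor_shape) == bsize * _prod(imshp)
    assert _prod(kerntensor_shape) == nkern * nvis_dim * _prod(kshp)
    return imtensor_shape, kerntensor_shape
```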
_conv_op_code_a = """
const int mode=%(mode)s;
...
@@ -5,14 +5,15 @@ from theano import config
import logging, copy
_logger_name = 'theano.sandbox.cuda'
_logger = logging.getLogger(_logger_name)
_logger.setLevel(logging.WARNING)

def error(*msg): _logger.warning('ERROR (%s): %s' % (_logger_name, ' '.join(str(m) for m in msg)))
def warning(*msg): _logger.warning('WARNING (%s): %s' % (_logger_name, ' '.join(str(m) for m in msg)))
def info(*msg): _logger.warning('INFO (%s): %s' % (_logger_name, ' '.join(str(m) for m in msg)))
def debug(*msg): _logger.warning('DEBUG (%s): %s' % (_logger_name, ' '.join(str(m) for m in msg)))
# Compile cuda_ndarray.cu
@@ -63,23 +64,32 @@ if not compile_cuda_ndarray:
except ImportError:
compile_cuda_ndarray = True

try:
if compile_cuda_ndarray:
import nvcc_compiler
if not nvcc_compiler.is_nvcc_available():
set_cuda_disabled()

if enable_cuda:
code = open(os.path.join(cuda_path, "cuda_ndarray.cu")).read()
if not os.path.exists(cuda_ndarray_loc):
os.makedirs(cuda_ndarray_loc)
nvcc_compiler.nvcc_module_compile_str('cuda_ndarray', code, location=cuda_ndarray_loc,
include_dirs=[cuda_path], libs=['cublas'])
from cuda_ndarray.cuda_ndarray import *
except Exception, e:
error("Failed to compile cuda_ndarray.cu: %s" % str(e))
set_cuda_disabled()

if enable_cuda:
# check if there is an old cuda_ndarray that was loaded instead of the one we compiled!
import cuda_ndarray.cuda_ndarray
if os.path.join(config.compiledir,'cuda_ndarray','cuda_ndarray.so') != cuda_ndarray.cuda_ndarray.__file__:
_logger.warning("WARNING: cuda_ndarray was loaded from %s. This is not expected as theano should compile it automatically for you. Do you have a directory called cuda_ndarray in your LD_LIBRARY_PATH environment variable? If so, please remove it as it is outdated!" % cuda_ndarray.cuda_ndarray.__file__)
from theano.sandbox.cuda.type import CudaNdarrayType
from theano.sandbox.cuda.var import (CudaNdarrayVariable,
CudaNdarrayConstant,
@@ -103,7 +113,7 @@ def use(device=config.device):
raise ValueError("Invalid device identifier", device)
if use.device_number is None:
# No successful call to use() has been made yet
if device<0:
return
if device in [None,""]:
device=0
@@ -134,6 +144,5 @@ def handle_shared_float32(tf):
else:
raise NotImplementedError('removing our handler')

if enable_cuda and config.device.startswith('gpu'):
use()
@@ -6,6 +6,13 @@ from theano import config
_logger=logging.getLogger("theano.sandbox.cuda.nvcc_compiler")
_logger.setLevel(logging.WARN)

from theano.configparser import config, AddConfigVar, StrParam

AddConfigVar('nvcc.compiler_bindir',
"If defined, the nvcc compiler driver will seek g++ and gcc in this directory",
StrParam(""))
def error(*args):
#sys.stderr.write('ERROR:'+ ' '.join(str(a) for a in args)+'\n')
_logger.error("ERROR: "+' '.join(str(a) for a in args))
@@ -68,6 +75,8 @@ def nvcc_module_compile_str(module_name, src_code, location=None, include_dirs=[
debug('Generating shared lib', lib_filename)
# TODO: Why do these args cause failure on gtx285 that has 1.3 compute capability? '--gpu-architecture=compute_13', '--gpu-code=compute_13',
cmd = ['nvcc', '-shared', '-g'] + [pa for pa in preargs if pa.startswith('-O')]
if config.nvcc.compiler_bindir:
cmd.extend(['--compiler-bindir', config.nvcc.compiler_bindir])
cmd.extend(['-Xcompiler', ','.join(pa for pa in preargs if not pa.startswith('-O'))])
cmd.extend('-I%s'%idir for idir in include_dirs)
cmd.extend(['-o',lib_filename])
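The command construction above, including the new `--compiler-bindir` branch, can be sketched as a pure function. This is an illustrative mirror of the hunk (`build_nvcc_cmd` is a hypothetical name; `compiler_bindir` stands in for `config.nvcc.compiler_bindir`):

```python
def build_nvcc_cmd(lib_filename, include_dirs, preargs, compiler_bindir=''):
    # -O flags go to nvcc itself; everything else is forwarded to the
    # host compiler through -Xcompiler, as in nvcc_module_compile_str
    cmd = ['nvcc', '-shared', '-g'] + [pa for pa in preargs if pa.startswith('-O')]
    if compiler_bindir:
        # new nvcc.compiler_bindir flag: tell nvcc where to find g++/gcc
        cmd.extend(['--compiler-bindir', compiler_bindir])
    cmd.extend(['-Xcompiler', ','.join(pa for pa in preargs if not pa.startswith('-O'))])
    cmd.extend('-I%s' % idir for idir in include_dirs)
    cmd.extend(['-o', lib_filename])
    return cmd
```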
...
@@ -140,20 +140,20 @@ def test_elemwise1():
b = tensor.fmatrix()
#let debugmode catch any mistakes
print >> sys.stdout, "STARTING FUNCTION 1"
f = pfunc([b], [], updates=[(a, b**a)], mode=mode_with_gpu)
for i, node in enumerate(f.maker.env.toposort()):
print i, node
f(numpy.random.rand(*shape)+0.3)

print >> sys.stdout, "STARTING FUNCTION 2"
#let debugmode catch any mistakes
f = pfunc([b], [], updates=[(a, tensor.exp(b**a))], mode=mode_with_gpu)
for i, node in enumerate(f.maker.env.toposort()):
print i, node
f(numpy.random.rand(*shape)+0.3)

print >> sys.stdout, "STARTING FUNCTION 3"
#let debugmode catch any mistakes
f = pfunc([b], [], updates=[(a, a+b * tensor.exp(b**a))], mode=mode_with_gpu)
f(numpy.random.rand(*shape)+0.3)
@@ -169,11 +169,11 @@ def test_elemwise2():
f = pfunc([b], [], updates=[(a, (a+b).dimshuffle(pattern))], mode=mode_with_gpu)
has_elemwise = False
for i, node in enumerate(f.maker.env.toposort()):
print >> sys.stdout, i, node
has_elemwise = has_elemwise or isinstance(node.op, tensor.Elemwise)
assert not has_elemwise
#let debugmode catch errors
print >> sys.stdout, 'pattern', pattern
f(rng.rand(*shape)*.3)

shape = (3,4,5,6)
...@@ -204,7 +204,7 @@ def test_elemwise3(): ...@@ -204,7 +204,7 @@ def test_elemwise3():
b**a).dimshuffle([2,0,3,1]))], mode=mode_with_gpu) b**a).dimshuffle([2,0,3,1]))], mode=mode_with_gpu)
has_elemwise = False has_elemwise = False
for i, node in enumerate(f.maker.env.toposort()): for i, node in enumerate(f.maker.env.toposort()):
print >> sys.stderr, i, node print >> sys.stdout, i, node
has_elemwise = has_elemwise or isinstance(node.op, tensor.Elemwise) has_elemwise = has_elemwise or isinstance(node.op, tensor.Elemwise)
assert not has_elemwise assert not has_elemwise
#let debugmode catch errors #let debugmode catch errors
...@@ -220,7 +220,7 @@ def test_elemwise4(): ...@@ -220,7 +220,7 @@ def test_elemwise4():
f = pfunc([b,c], [], updates=[(a, (a+b.dimshuffle('x', 0)*c.dimshuffle(0, 'x')))], mode=mode_with_gpu) f = pfunc([b,c], [], updates=[(a, (a+b.dimshuffle('x', 0)*c.dimshuffle(0, 'x')))], mode=mode_with_gpu)
has_elemwise = False has_elemwise = False
for i, node in enumerate(f.maker.env.toposort()): for i, node in enumerate(f.maker.env.toposort()):
print >> sys.stderr, i, node print >> sys.stdout, i, node
has_elemwise = has_elemwise or isinstance(node.op, tensor.Elemwise) has_elemwise = has_elemwise or isinstance(node.op, tensor.Elemwise)
assert not has_elemwise assert not has_elemwise
#let debugmode catch errors #let debugmode catch errors
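The update in test_elemwise4 builds an outer product through `dimshuffle`. As a rough numpy analogue (a sketch with arbitrary shapes, not the actual Theano graph), `dimshuffle('x', 0)` corresponds to inserting a broadcast axis with `b[None, :]`:

```python
import numpy

# Numpy analogue of the update in test_elemwise4:
#   a <- a + b.dimshuffle('x', 0) * c.dimshuffle(0, 'x')
# dimshuffle('x', 0) turns b into a 1xN row (b[None, :]);
# dimshuffle(0, 'x') turns c into an Nx1 column (c[:, None]).
rng = numpy.random.RandomState(0)
a = rng.rand(4, 4).astype('float32')
b = rng.rand(4).astype('float32')
c = rng.rand(4).astype('float32')

# Broadcasting the row against the column yields the outer product of c and b.
updated = a + b[None, :] * c[:, None]
```

The broadcasted product equals `numpy.outer(c, b)`, which is the per-element work the fused elemwise kernel performs.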
......
...@@ -360,7 +360,7 @@ def test_subsample(): ...@@ -360,7 +360,7 @@ def test_subsample():
def test_logical_shapes(): def test_logical_shapes():
# implement when # implement when
print >> sys.stderr, "INFO: test_logical_shapes not implemented (i.e. imshp_logical, kshp_logical, kshp_logical_top_aligned)" print >> sys.stderr, "WARNING TODO: test_logical_shapes not implemented (i.e. imshp_logical, kshp_logical, kshp_logical_top_aligned)"
def _test_dummy(): def _test_dummy():
......
...@@ -8,7 +8,7 @@ if cuda_ndarray.enable_cuda == False: ...@@ -8,7 +8,7 @@ if cuda_ndarray.enable_cuda == False:
import numpy import numpy
def test_host_to_device(): def test_host_to_device():
print >>sys.stderr, 'starting test_host_to_dev' print >>sys.stdout, 'starting test_host_to_dev'
for shape in ((), (3,), (2,3), (3,4,5,6)): for shape in ((), (3,), (2,3), (3,4,5,6)):
a = theano._asarray(numpy.random.rand(*shape), dtype='float32') a = theano._asarray(numpy.random.rand(*shape), dtype='float32')
b = cuda_ndarray.CudaNdarray(a) b = cuda_ndarray.CudaNdarray(a)
...@@ -53,7 +53,7 @@ def test_add(): ...@@ -53,7 +53,7 @@ def test_add():
def test_exp(): def test_exp():
print >>sys.stderr, 'starting test_exp' print >>sys.stdout, 'starting test_exp'
for shape in ((), (3,), (2,3), (1,10000000),(10,1000000), (100,100000),(1000,10000),(10000,1000)): for shape in ((), (3,), (2,3), (1,10000000),(10,1000000), (100,100000),(1000,10000),(10000,1000)):
a0 = theano._asarray(numpy.random.rand(*shape), dtype='float32') a0 = theano._asarray(numpy.random.rand(*shape), dtype='float32')
a1 = a0.copy() a1 = a0.copy()
...@@ -74,25 +74,25 @@ def test_exp(): ...@@ -74,25 +74,25 @@ def test_exp():
def test_copy(): def test_copy():
print >>sys.stderr, 'starting test_copy' print >>sys.stdout, 'starting test_copy'
shape = (5,) shape = (5,)
a = theano._asarray(numpy.random.rand(*shape), dtype='float32') a = theano._asarray(numpy.random.rand(*shape), dtype='float32')
print >>sys.stderr, '.. creating device object' print >>sys.stdout, '.. creating device object'
b = cuda_ndarray.CudaNdarray(a) b = cuda_ndarray.CudaNdarray(a)
print >>sys.stderr, '.. copy' print >>sys.stdout, '.. copy'
c = copy.copy(b) c = copy.copy(b)
print >>sys.stderr, '.. deepcopy' print >>sys.stdout, '.. deepcopy'
d = copy.deepcopy(b) d = copy.deepcopy(b)
print >>sys.stderr, '.. comparisons' print >>sys.stdout, '.. comparisons'
assert numpy.allclose(a, numpy.asarray(b)) assert numpy.allclose(a, numpy.asarray(b))
assert numpy.allclose(a, numpy.asarray(c)) assert numpy.allclose(a, numpy.asarray(c))
assert numpy.allclose(a, numpy.asarray(d)) assert numpy.allclose(a, numpy.asarray(d))
def test_dot(): def test_dot():
print >>sys.stderr, 'starting test_dot' print >>sys.stdout, 'starting test_dot'
a0 = theano._asarray(numpy.random.rand(4, 7), dtype='float32') a0 = theano._asarray(numpy.random.rand(4, 7), dtype='float32')
a1 = theano._asarray(numpy.random.rand(7, 6), dtype='float32') a1 = theano._asarray(numpy.random.rand(7, 6), dtype='float32')
...@@ -101,7 +101,7 @@ def test_dot(): ...@@ -101,7 +101,7 @@ def test_dot():
assert numpy.allclose(numpy.dot(a0, a1), cuda_ndarray.dot(b0, b1)) assert numpy.allclose(numpy.dot(a0, a1), cuda_ndarray.dot(b0, b1))
print >> sys.stderr, 'WARNING test_dot: not testing all 8 transpose cases of dot' print >> sys.stderr, 'WARNING TODO test_dot: not testing all 8 transpose cases of dot'
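The warning above notes that not all operand layouts are exercised. Purely as a host-side illustration (this helper and its name are mine, not part of the test suite; the remaining cases presumably also cover the output layout), the transposed-storage combinations can be enumerated with numpy, which is what the CUDA test would compare `cuda_ndarray.dot` against:

```python
import numpy

def check_dot_transpose_cases(seed=42):
    """Compute the same (4,7) x (7,6) product with every combination of
    C- and Fortran-ordered operands, mimicking the transposed-input
    cases a dot/gemm wrapper must handle."""
    rng = numpy.random.RandomState(seed)
    a = rng.rand(4, 7).astype('float32')
    b = rng.rand(7, 6).astype('float32')
    results = []
    for ta in (False, True):
        for tb in (False, True):
            # x.T.copy().T has the same values as x but transposed storage.
            a_op = a.T.copy().T if ta else a
            b_op = b.T.copy().T if tb else b
            results.append(numpy.dot(a_op, b_op))
    ref = results[0]
    return all(numpy.allclose(ref, r) for r in results)
```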
def test_sum(): def test_sum():
shape = (2,3) shape = (2,3)
...@@ -147,7 +147,7 @@ def test_reshape(): ...@@ -147,7 +147,7 @@ def test_reshape():
] ]
def subtest(shape_1, shape_2): def subtest(shape_1, shape_2):
#print >> sys.stderr, "INFO: shapes", shape_1, shape_2 #print >> sys.stdout, "INFO: shapes", shape_1, shape_2
a = theano._asarray(numpy.random.rand(*shape_1), dtype='float32') a = theano._asarray(numpy.random.rand(*shape_1), dtype='float32')
b = cuda_ndarray.CudaNdarray(a) b = cuda_ndarray.CudaNdarray(a)
......
...@@ -147,7 +147,7 @@ class DownsampleFactorMaxGrad(Op): ...@@ -147,7 +147,7 @@ class DownsampleFactorMaxGrad(Op):
def c_code_cache_version(self): def c_code_cache_version(self):
return () return ()
def max_pool2D(input, ds, ignore_border=False): def max_pool2D(input, ds, ignore_border=False):
""" """
Takes as input an N-D tensor, where N >= 2. It downscales the input image by Takes as input an N-D tensor, where N >= 2. It downscales the input image by
...@@ -166,7 +166,7 @@ def max_pool2D(input, ds, ignore_border=False): ...@@ -166,7 +166,7 @@ def max_pool2D(input, ds, ignore_border=False):
# extract image dimensions # extract image dimensions
img_shape = input.shape[-2:] img_shape = input.shape[-2:]
# count the number of "leading" dimensions, store as dmatrix # count the number of "leading" dimensions, store as dmatrix
batch_size = tensor.prod(input.shape[:-2]) batch_size = tensor.prod(input.shape[:-2])
batch_size = tensor.shape_padright(batch_size,1) batch_size = tensor.shape_padright(batch_size,1)
......
...@@ -7,7 +7,7 @@ from theano.tests import unittest_tools as utt ...@@ -7,7 +7,7 @@ from theano.tests import unittest_tools as utt
from theano import function, Mode from theano import function, Mode
import theano.tensor as T import theano.tensor as T
from conv import ConvOp, convolve2, getFilterOutShp from conv import ConvOp, getFilterOutShp
def flip(kern, kshp): def flip(kern, kshp):
"flip the kernel as scipy.convolv2d do it flipped." "flip the kernel as scipy.convolv2d do it flipped."
...@@ -41,7 +41,7 @@ def flip(kern, kshp): ...@@ -41,7 +41,7 @@ def flip(kern, kshp):
global_rng = N.random.RandomState(3423489) global_rng = N.random.RandomState(3423489)
dmatrix4=T.TensorType('float64', (False, False, False, False)) dmatrix4=T.TensorType('float64', (False, False, False, False))
def exec_multilayer_conv_nnet(conv_mode, ss, bsize, imshp, kshps, nkerns, unroll_batch=0, unroll_kern=0, img=T.dmatrix(), validate=True, conv_op_py=False, do_convolve2=False, do_print=True, repeat=1, unroll_patch=0): def exec_multilayer_conv_nnet(conv_mode, ss, bsize, imshp, kshps, nkerns, unroll_batch=0, unroll_kern=0, img=T.dmatrix(), validate=True, conv_op_py=False, do_print=True, repeat=1, unroll_patch=False, unroll_patch_size=False, verbose=0):
# build actual input images # build actual input images
imgval = global_rng.rand(bsize, imshp[0], imshp[1], imshp[2]) imgval = global_rng.rand(bsize, imshp[0], imshp[1], imshp[2])
...@@ -92,41 +92,13 @@ def exec_multilayer_conv_nnet(conv_mode, ss, bsize, imshp, kshps, nkerns, unroll ...@@ -92,41 +92,13 @@ def exec_multilayer_conv_nnet(conv_mode, ss, bsize, imshp, kshps, nkerns, unroll
imgval[b,i,...], w_flip[n,i,...],1,val, bval, 0)[0::ss[0],0::ss[1]] imgval[b,i,...], w_flip[n,i,...],1,val, bval, 0)[0::ss[0],0::ss[1]]
ntot += time.time() - time1 ntot += time.time() - time1
if do_convolve2:
####### test with new sp.convolve2 function ######
time1 = time.time()
hid, outshp2 = convolve2(kern, kshp, nkern, img, imshp,
bsize, (ss[0],ss[1]), mode=conv_mode)
propup = function([kern, img], hid)
propup1 = function([kern, img], hid,mode=Mode(linker="py"))
hidval = propup(w_flip.reshape(nkern,-1), imgval.reshape(bsize,-1))
hidval = hidval.reshape(bsize,nkern,outshp2[-2],outshp2[-1])
# hidval = hidval[:,:,::ss[0],::ss[1]]
hidval = hidval.reshape(bsize, -1)
for i in range(repeat):
hidval1 = propup1(w_flip.reshape(nkern,-1), imgval.reshape(bsize,-1))
hidval1 = hidval1.reshape(bsize,nkern,outshp2[-2],outshp2[-1])
# hidval1 = hidval1[:,:,::ss[0],::ss[1]]
hidval1 = hidval1.reshape(bsize, -1)
assert (N.abs(hidval-hidval1)<1e-5).all()
temp = N.abs(outval.reshape(bsize,-1) - hidval)
if validate:
assert (temp < 1e-5).all()
else:
hid = img #we don't need it, but it makes the flow easier
hidval=outval.copy()#to keep the same memory
hidval1=outval.copy()
# ConvOp # ConvOp
if unroll_patch: if unroll_patch and not unroll_patch_size:
conv_op = ConvOp(dx=ss[0],dy=ss[1], output_mode=conv_mode, conv_op = ConvOp(dx=ss[0],dy=ss[1], output_mode=conv_mode,
unroll_patch=unroll_patch)(inputs4, kerns4) unroll_patch=unroll_patch, verbose=verbose)(inputs4, kerns4)
else: else:
conv_op = ConvOp(imshp, kshp, nkern, bsize, ss[0],ss[1], conv_mode, conv_op = ConvOp(imshp, kshp, nkern, bsize, ss[0],ss[1], conv_mode,
unroll_batch=unroll_batch, unroll_kern=unroll_kern, unroll_patch=unroll_patch)(inputs4, kerns4) unroll_batch=unroll_batch, unroll_kern=unroll_kern, unroll_patch=unroll_patch, verbose=verbose)(inputs4, kerns4)
l1shp=N.hstack((nkern, l1shp=N.hstack((nkern,
getFilterOutShp(imshp, kshp, ss, conv_mode))) getFilterOutShp(imshp, kshp, ss, conv_mode)))
propup2 = function([inputs4, kerns4], conv_op) propup2 = function([inputs4, kerns4], conv_op)
...@@ -155,7 +127,7 @@ def exec_multilayer_conv_nnet(conv_mode, ss, bsize, imshp, kshps, nkerns, unroll ...@@ -155,7 +127,7 @@ def exec_multilayer_conv_nnet(conv_mode, ss, bsize, imshp, kshps, nkerns, unroll
temp = N.abs(outval - hidval3) temp = N.abs(outval - hidval3)
assert (temp < 1e-5).all() assert (temp < 1e-5).all()
img, imshp = hid, tuple(outshp) imshp = tuple(outshp)
imgval = outval.reshape(bsize,outshp[0],outshp[1],outshp[2]) imgval = outval.reshape(bsize,outshp[0],outshp[1],outshp[2])
return tctot, tpytot, ntot return tctot, tpytot, ntot
...@@ -246,23 +218,9 @@ class TestConvOp(unittest.TestCase): ...@@ -246,23 +218,9 @@ class TestConvOp(unittest.TestCase):
# print 'img2d', img2d # print 'img2d', img2d
img1d = img2d.reshape(bsize,-1) img1d = img2d.reshape(bsize,-1)
# create filters (need to be flipped to use convolve2d) # create filters
filtersflipped = flip(filters.reshape((nkern,)+kshp), kshp) filtersflipped = flip(filters.reshape((nkern,)+kshp), kshp)
# compute with new convolve2 (no timing info)
output4, outshp4 = convolve2(kerns, kshp, nkern, input,\
imshp, bsize, (ss[0],ss[1]), bias=bias, mode=conv_mode)
# print 'output4', output4
ttime1 = time.time()
f = function([kerns, bias, input], output4)
out4 = f(filtersflipped.reshape(nkern,-1), biasvals, img1d)
# print 'out4', out4, img1d, filtersflipped
tconv2 += [time.time() - ttime1]
out4 = out4.reshape(bsize, nkern, outshp4[1], outshp4[2])
out4 = out4#[:,:,0::ss[0],0::ss[1]]
out4 = out4.reshape(bsize, -1)
# compute with ConvOp # compute with ConvOp
dmatrix3=T.TensorType('float64', (False, False, False)) dmatrix3=T.TensorType('float64', (False, False, False))
inputs4=dmatrix4() inputs4=dmatrix4()
...@@ -307,9 +265,6 @@ class TestConvOp(unittest.TestCase): ...@@ -307,9 +265,6 @@ class TestConvOp(unittest.TestCase):
# compare benchmark with ConvOp # compare benchmark with ConvOp
temp = bench1.flatten() - out2.flatten() temp = bench1.flatten() - out2.flatten()
assert (temp < 1e-5).all() assert (temp < 1e-5).all()
# compare benchmark with convolve2
temp = bench1.flatten() - out4.flatten()
assert (temp < 1e-5).all()
print '**** Convolution Profiling Results ****' print '**** Convolution Profiling Results ****'
print 'Scipy convolve2d processing time: %.3fs'%sum(tscipy),tscipy print 'Scipy convolve2d processing time: %.3fs'%sum(tscipy),tscipy
...@@ -319,55 +274,17 @@ class TestConvOp(unittest.TestCase): ...@@ -319,55 +274,17 @@ class TestConvOp(unittest.TestCase):
d=N.asarray(tscipy)/tconvop d=N.asarray(tscipy)/tconvop
print 'speed up ConvOp vs convolve2d: %.3f'%d.mean(),d print 'speed up ConvOp vs convolve2d: %.3f'%d.mean(),d
def test_multilayer_conv(self): def speed_multilayer_conv(self):
print '\n\n*************************************************'
print ' TEST MULTILAYER CONVOLUTION'
print '*************************************************'
# fixed parameters
# test multiple configurations at the same time
bsizes = [6,6] # batch size
imshp_starts = [(1,13,14),(1,4,5)]
kshpss = ([[5,6],[7,4]],[[2,2],[2,2]])
nkernss = [[20,40],[2,2]] # per output pixel
ssizess = [[(1,1),(1,2)],[(1,1),(2,2)]]
convmodes = ['valid','full']
do_convolve2=True
unroll = [(0,0,True),(0,0,False),(1,1,False),(2,2,False),(3,2,False)]#(batch,kern,patch)
do_speed_test = False
# TODO: this version showed a bug that has since been fixed;
# the test is included in the test above.
# imshp_start = (1,4,4)
# kshps = ([2,2],[2,2])#,[7,4])
# nkerns = [2,2] # per output pixel
# ssizes = [(1,1),(2,2)]#2,2)]
# bsizes = [1,1] # batch size
# imshp_starts = [(1,10,10),(1,5,6)]
# kshpss = ([[2,3],[3,2]],[[2,2],[2,2]])
# nkernss = [[1,1],[1,1]] # per output pixel
N.set_printoptions(threshold=N.nan)
# symbolic stuff
kerns = [T.matrix(),T.dmatrix()]
img = T.dmatrix()
rng = N.random.RandomState(3423489)
tctot, tpytot, ntot = [], [], []
for i in range(len(kshpss)):
assert len(kshpss[i])==len(nkernss[i])==len(kerns)
if do_speed_test:
# calculate the speed up of different combinations of unroll # calculate the speed up of different combinations of unroll
# set the parameters to the same values you will try. # set the parameters to the same values you will try.
validate=False# we skip validating the result to make it much faster! validate=False# we skip validating the result to make it much faster!
verbose=1
unroll_batch = [1,2,4,5,10,20] unroll_batch = [1,2,4,5,10,20]
unroll_kern = [1,2,4,5,10,20] unroll_kern = [1,2,4,5,10,20]
unroll_batch = [1,4,5] unroll_batch = [1,4,5]
unroll_kern = [1,4,5] unroll_kern = [1,4,5]
unroll_patch = [True, False]
bsize = 20 # batch size bsize = 20 # batch size
imshp_start = (1,48,48)#a non-square shape to test more corner cases. imshp_start = (1,48,48)#a non-square shape to test more corner cases.
...@@ -381,15 +298,16 @@ class TestConvOp(unittest.TestCase): ...@@ -381,15 +298,16 @@ class TestConvOp(unittest.TestCase):
assert len(kshps)==len(nkerns)==len(kerns) assert len(kshps)==len(nkerns)==len(kerns)
timing = N.zeros((len(unroll_batch),len(unroll_kern),3)) timing = N.zeros((len(unroll_batch),len(unroll_kern),3,len(convmodes)*len(ssizes)))
t_b_k=[] t_b_k=[]
#calculate the timing with unrolling #calculate the timing with unrolling
print 'time unroll batch kern'
t_=[[ 7.60572791, 3.95069814, 3.74271464], [ 4.05631089, 2.90384555, 2.93613672], [ 3.90551591, 2.92595196, 3.00102282]] t_=[[ 7.60572791, 3.95069814, 3.74271464], [ 4.05631089, 2.90384555, 2.93613672], [ 3.90551591, 2.92595196, 3.00102282]]
best=[]
worst=[]
best=[0.52690219879150391, 2.4266397953033447] best=[0.52690219879150391, 2.4266397953033447]
worst=[0.92042708396911621, 6.8822150230407715] worst=[0.92042708396911621, 6.8822150230407715]
best=[]
worst=[]
t_=[] t_=[]
for unroll_b, n_b in zip(unroll_batch,range(len(unroll_batch))): for unroll_b, n_b in zip(unroll_batch,range(len(unroll_batch))):
for unroll_k, n_k in zip(unroll_kern,range(len(unroll_kern))): for unroll_k, n_k in zip(unroll_kern,range(len(unroll_kern))):
...@@ -398,30 +316,31 @@ class TestConvOp(unittest.TestCase): ...@@ -398,30 +316,31 @@ class TestConvOp(unittest.TestCase):
tctot, tpytot, ntot=[],[],[] tctot, tpytot, ntot=[],[],[]
for conv_mode, n_mode in zip(convmodes,range(len(convmodes))): for conv_mode, n_mode in zip(convmodes,range(len(convmodes))):
for ss, n_ss in zip(ssizes,range(len(ssizes))): for ss, n_ss in zip(ssizes,range(len(ssizes))):
tctot_, tpytot_, ntot_ = exec_multilayer_conv_nnet(conv_mode, ss, bsize, imshp_start, kshps, nkerns, unroll_batch=unroll_b, unroll_kern=unroll_k, validate=validate) tctot_, tpytot_, ntot_ = exec_multilayer_conv_nnet(conv_mode, ss, bsize, imshp_start, kshps, nkerns, unroll_batch=unroll_b, unroll_kern=unroll_k, validate=validate, verbose=verbose,do_print=False)
tctot+=[tctot_] tctot+=[tctot_]
tpytot+=[tpytot_] tpytot+=[tpytot_]
ntot+=[ntot_] ntot+=[ntot_]
if unroll_b==4 and unroll_k==4: if unroll_b==4 and unroll_k==4:
print "unroll 4/4",tctot #print "unroll 4/4",tctot
best=tctot best=tctot
if unroll_b==1 and unroll_k==1: if unroll_b==1 and unroll_k==1:
print "unroll 1/1",tctot #print "unroll 1/1",tctot
worst=tctot worst=tctot
timing[n_b,n_k]=[sum(tctot), sum(tpytot), sum(ntot)] timing[n_b,n_k]=[tctot, tpytot, ntot]#[sum(tctot), sum(tpytot), sum(ntot)]
if not t_: if not t_:
t=timing[:,:,0]#We select only the c timing. t=timing[:,:,0,:]#We select only the c timing.
else: else:
t=t_ t=t_
t=N.asarray(t) t=N.asarray(t)
#calculate the old timing #calculate the old timing
print 'time old version'
tctot_=[0.52555489540100098, 6.6634182929992676] tctot_=[0.52555489540100098, 6.6634182929992676]
# tctot_=[]
tctot,tpytot,ntot=[],[],[] tctot,tpytot,ntot=[],[],[]
tctot_=[]
if not tctot_: if not tctot_:
for conv_mode, n_mode in zip(convmodes,range(len(convmodes))): for conv_mode, n_mode in zip(convmodes,range(len(convmodes))):
for ss, n_ss in zip(ssizes,range(len(ssizes))): for ss, n_ss in zip(ssizes,range(len(ssizes))):
tctot_, tpytot_, ntot_ = exec_multilayer_conv_nnet(conv_mode, ss, bsize, imshp_start, kshps, nkerns, unroll_batch=0, unroll_kern=0, validate=validate) tctot_, tpytot_, ntot_ = exec_multilayer_conv_nnet(conv_mode, ss, bsize, imshp_start, kshps, nkerns, unroll_batch=0, unroll_kern=0, validate=validate, verbose=verbose,do_print=False)
tctot+=[tctot_] tctot+=[tctot_]
tpytot+=[tpytot_] tpytot+=[tpytot_]
ntot+=[ntot_] ntot+=[ntot_]
...@@ -432,29 +351,73 @@ class TestConvOp(unittest.TestCase): ...@@ -432,29 +351,73 @@ class TestConvOp(unittest.TestCase):
print "timing for unrolled version" print "timing for unrolled version"
print t_b_k print t_b_k
print t print t
t_detail=t
t = t.sum(axis=2)
print "max %.3fs"%t.max(), "max param(batch unloop size/kernel unloop size)", t_b_k[t.argmax()] print "max %.3fs"%t.max(), "max param(batch unloop size/kernel unloop size)", t_b_k[t.argmax()]
print "min %.3fs"%t.min(), "min param(batch unloop size/kernel unloop size)", t_b_k[t.argmin()] print "min %.3fs"%t.min(), "min param(batch unloop size/kernel unloop size)", t_b_k[t.argmin()]
print "speedup vs (1/1)%.3fx, vs old %.3fx"% (t.max()/t.min(),sum(tctot)/t.min()) print "speedup vs (1/1)%.3fx, vs old %.3fx"% (t.max()/t.min(),sum(tctot)/t.min())
print worst/best,tctot/best print worst/best,tctot/best
#calculate the timing of unroll_patch
print 'time unroll_patch'
tctot_patch = [] tctot_patch = []
tctot_patch_size = []
for conv_mode, n_mode in zip(convmodes,range(len(convmodes))): for conv_mode, n_mode in zip(convmodes,range(len(convmodes))):
for ss, n_ss in zip(ssizes,range(len(ssizes))): for ss, n_ss in zip(ssizes,range(len(ssizes))):
tctot_, tpytot_, ntot_ = exec_multilayer_conv_nnet(conv_mode, ss, bsize, imshp_start, kshps, nkerns, unroll_batch=0, unroll_kern=0, validate=validate,unroll_patch=2) tctot_, tpytot_, ntot_ = exec_multilayer_conv_nnet(conv_mode, ss, bsize, imshp_start, kshps, nkerns, unroll_batch=0, unroll_kern=0, validate=validate,unroll_patch=True,verbose=verbose,do_print=False)
tctot_patch += [tctot_] tctot_patch += [tctot_]
tctot_, tpytot_, ntot_ = exec_multilayer_conv_nnet(conv_mode, ss, bsize, imshp_start, kshps, nkerns, unroll_batch=0, unroll_kern=0, validate=validate,unroll_patch=True,verbose=verbose,do_print=False,unroll_patch_size=True)
tctot_patch_size += [tctot_]
t_patch=sum(tctot_patch) t_patch=sum(tctot_patch)
print "unroll_patch time", tctot_patch print "unroll_patch without shape time", tctot_patch
print "speedup vs (1/1)%.3fx, vs old %.3fx"% (t.max()/t_patch,sum(tctot)/t_patch) print "speedup vs (1/1)%.3fx, vs old %.3fx"% (t.max()/t_patch,sum(tctot)/t_patch)
print best/tctot_patch, worst/tctot_patch print best/tctot_patch, worst/tctot_patch
t_patch_size=sum(tctot_patch_size)
print "unroll_patch with shape time", tctot_patch_size
print "speedup vs (1/1)%.3fx, vs old %.3fx"% (t.max()/t_patch_size,sum(tctot)/t_patch_size)
print best/tctot_patch_size, worst/tctot_patch_size
print best
print worst
print tctot
print tctot_patch
return return
def test_multilayer_conv(self):
print '\n\n*************************************************'
print ' TEST MULTILAYER CONVOLUTION'
print '*************************************************'
# fixed parameters
# test multiple configurations at the same time
bsizes = [6,6] # batch size
imshp_starts = [(1,13,14),(1,4,5)]
kshpss = ([[5,6],[7,4]],[[2,2],[2,2]])
nkernss = [[20,40],[2,2]] # per output pixel
ssizess = [[(1,1),(1,2)],[(1,1),(2,2)]]
convmodes = ['valid','full']
do_convolve2=True
unroll = [(0,0,True),(0,0,False),(1,1,False),(2,2,False),(3,2,False)]#(batch,kern,patch)
# TODO: this version showed a bug that has since been fixed;
# the test is included in the test above.
# imshp_start = (1,4,4)
# kshps = ([2,2],[2,2])#,[7,4])
# nkerns = [2,2] # per output pixel
# ssizes = [(1,1),(2,2)]#2,2)]
# bsizes = [1,1] # batch size
# imshp_starts = [(1,10,10),(1,5,6)]
# kshpss = ([[2,3],[3,2]],[[2,2],[2,2]])
# nkernss = [[1,1],[1,1]] # per output pixel
N.set_printoptions(threshold=N.nan)
# symbolic stuff
kerns = [T.matrix(),T.dmatrix()]
img = T.dmatrix()
rng = N.random.RandomState(3423489)
tctot, tpytot, ntot = [], [], []
for i in range(len(kshpss)):
assert len(kshpss[i])==len(nkernss[i])==len(kerns)
for i in range(len(kshpss)): for i in range(len(kshpss)):
for conv_mode, n_mode in zip(convmodes,range(len(convmodes))): for conv_mode, n_mode in zip(convmodes,range(len(convmodes))):
for ss, n_ss in zip(ssizess[i],range(len(ssizess[i]))): for ss, n_ss in zip(ssizess[i],range(len(ssizess[i]))):
......
import unittest, sys, time import unittest, sys, time
import numpy as N import numpy
import theano.tensor as T import theano.tensor as tensor
from theano.tests import unittest_tools as utt from theano.tests import unittest_tools as utt
from theano.sandbox.downsample import DownsampleFactorMax from theano.sandbox.downsample import DownsampleFactorMax, max_pool2D
from theano import function, Mode from theano import function, Mode
def max_pool(images=None, imshp=None, maxpoolshp=None, ignore_border=True):
"""Implements a max pooling layer
Uses the same API as sp.max_pool but uses the Downsample op instead. class TestDownsampleFactorMax(unittest.TestCase):
def setUp(self):
utt.seed_rng()
Takes as input a 2D tensor of shape batch_size x img_size and performs max pooling. @staticmethod
Max pooling downsamples by taking the max value in a given area, here defined by def numpy_max_pool2D(input, ds, ignore_border=False):
maxpoolshp. Outputs a 2D tensor of shape batch_size x output_size. '''Helper function, implementing max_pool2D in pure numpy'''
if len(input.shape) < 2:
raise NotImplementedError('input should have at least 2 dim, shape is %s'\
% str(input.shape))
Parameters are keyword arguments in order to use func_to_mod. xi=0
yi=0
if not ignore_border:
if input.shape[-2] % ds[0]:
xi += 1
if input.shape[-1] % ds[1]:
yi += 1
@param images: 2D tensor containing images on which to apply convolution. out_shp = list(input.shape[:-2])
Assumed to be of shape batch_size x img_size out_shp.append(input.shape[-2]/ds[0]+xi)
@param imgshp: tuple containing image dimensions out_shp.append(input.shape[-1]/ds[1]+yi)
@param maxpoolshp: tuple containing shape of area to max pool over
@output out1: symbolic result (2D tensor)
@output out2: logical shape of the output
""" output_val = numpy.zeros(out_shp)
if len(imshp) == 2:
imshp = (1,) + imshp
elif len(imshp)!=3:
raise NotImplementedError("!")
# all these reshapes should happen in place
imrshp = T.stack(images.shape[0],
*[T.as_tensor(x) for x in imshp])
imtensor = T.reshape(images, imrshp)
maxpop = DownsampleFactorMax(maxpoolshp, ignore_border) for k in numpy.ndindex(input.shape[:-2]):
rval = maxpop(imtensor) for i in range(output_val.shape[-2]):
ii = i*ds[0]
for j in range(output_val.shape[-1]):
jj = j*ds[1]
patch = input[k][ii:ii+ds[0],jj:jj+ds[1]]
output_val[k][i,j] = numpy.max(patch)
return output_val
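The pure-numpy helper above pools over the last two axes. A self-contained sketch of the same idea (my simplified reimplementation, not the test's code) shows how `ignore_border` affects the output shape:

```python
import numpy

def max_pool_2d_numpy(x, ds, ignore_border=False):
    """Max pooling over the last two axes, mirroring numpy_max_pool2D:
    with ignore_border=False a partial trailing window is kept,
    with ignore_border=True it is dropped."""
    xi = 0 if ignore_border or x.shape[-2] % ds[0] == 0 else 1
    yi = 0 if ignore_border or x.shape[-1] % ds[1] == 0 else 1
    out_shape = x.shape[:-2] + (x.shape[-2] // ds[0] + xi,
                                x.shape[-1] // ds[1] + yi)
    out = numpy.zeros(out_shape, dtype=x.dtype)
    for k in numpy.ndindex(x.shape[:-2]):   # iterate leading dimensions
        for i in range(out_shape[-2]):
            for j in range(out_shape[-1]):
                patch = x[k][i*ds[0]:(i+1)*ds[0], j*ds[1]:(j+1)*ds[1]]
                out[k + (i, j)] = patch.max()
    return out

pooled = max_pool_2d_numpy(numpy.arange(16.).reshape(4, 4), (2, 2))
```

On the 4x4 ramp this produces [[5, 7], [13, 15]]; on a 5x5 input with ds=(2, 2) the output is 3x3 without ignore_border and 2x2 with it.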
return T.flatten(rval,2), maxpop.out_shape(imshp, maxpoolshp, ignore_border) def test_DownsampleFactorMax(self):
rng = numpy.random.RandomState(utt.fetch_seed())
class TestDownsampleFactorMax(unittest.TestCase): # generate random images
def test_maxpool(self):
# generate flattened images
maxpoolshps = ((1,1),(2,2),(3,3),(2,3)) maxpoolshps = ((1,1),(2,2),(3,3),(2,3))
imval = N.random.rand(4,10,64,64) imval = rng.rand(4,10,64,64)
images = T.dmatrix() images = tensor.dtensor4()
dmatrix4=T.TensorType('float64', (False, False, False, False))
images4=dmatrix4()
tctot, tpytot, ntot = [],[],[]
for maxpoolshp in maxpoolshps: for maxpoolshp in maxpoolshps:
for border in [True,False]: for ignore_border in [True,False]:
print 'maxpoolshp', maxpoolshp,'border', border print 'maxpoolshp =', maxpoolshp
print 'ignore_border =', ignore_border
# numeric verification
xi=0 ## Pure Numpy computation
yi=0 numpy_output_val = self.numpy_max_pool2D(imval, maxpoolshp, ignore_border)
if not border:
if imval.shape[-2] % maxpoolshp[0]: output = max_pool2D(images, maxpoolshp, ignore_border)
xi += 1
if imval.shape[-1] % maxpoolshp[1]:
yi += 1
my_output_val = N.zeros((imval.shape[0], imval.shape[1],
imval.shape[2]/maxpoolshp[0]+xi,
imval.shape[3]/maxpoolshp[1]+yi))
time1=time.time()
for n in range(imval.shape[0]):
for k in range(imval.shape[1]):
for i in range(my_output_val.shape[2]):
ii = i*maxpoolshp[0]
for j in range(my_output_val.shape[3]):
jj = j*maxpoolshp[1]
patch = imval[n,k,ii:ii+maxpoolshp[0],jj:jj+maxpoolshp[1]]
my_output_val[n,k,i,j] = N.max(patch)
my_output_val = my_output_val.reshape(imval.shape[0],-1)
ntot+=[time.time()-time1]
# symbolic stuff
#### wrapper to DownsampleFactorMax op ####
output, outshp = max_pool(images, imval.shape[1:], maxpoolshp, border)
assert N.prod(my_output_val.shape[1:]) == N.prod(outshp)
assert N.prod(my_output_val.shape[1:]) == N.prod(outshp)
f = function([images,],[output,]) f = function([images,],[output,])
imval2=imval.reshape(imval.shape[0],-1) output_val = f(imval)
output_val = f(imval2) assert numpy.all(output_val == numpy_output_val)
assert N.all(output_val == my_output_val)
#DownsampleFactorMax op #DownsampleFactorMax op
maxpool_op = DownsampleFactorMax(maxpoolshp, ignore_border=border)(images4) maxpool_op = DownsampleFactorMax(maxpoolshp, ignore_border=ignore_border)(images)
f = function([images4],maxpool_op,mode=Mode(linker="py")) f = function([images], maxpool_op)
f2 = function([images4],maxpool_op,mode=Mode(linker="c"))
f3 = function([images4],maxpool_op)#for when we want to use the debug mode
time1=time.time()
output_val = f(imval) output_val = f(imval)
tctot+=[time.time()-time1] assert (numpy.abs(output_val - numpy_output_val) < 1e-5).all()
assert (N.abs(my_output_val.flatten()-output_val.flatten())<1e-5).all()
time1=time.time()
output_val = f2(imval)
tpytot+=[time.time()-time1]
assert (N.abs(my_output_val.flatten()-output_val.flatten())<1e-5).all()
output_val = f3(imval)
print 'Numpy processing time: %.3fs'%sum(ntot),ntot
print 'c Theano(DownsampleFactorMax) processing time: %.3fs'%sum(tctot),tctot
print 'py Theano(DownsampleFactorMax) processing time: %.3fs'%sum(tpytot),tpytot
d=N.asarray(ntot)/tctot
print 'speed up c theano(DownsampleFactorMax) vs manual: %.3f'%d.mean(),d
d=N.asarray(ntot)/tpytot
print 'speed up py theano(DownsampleFactorMax) vs manual: %.3f'%d.mean(),d
def test_DownsampleFactorMax_grad(self): def test_DownsampleFactorMax_grad(self):
# generate flatted images rng = numpy.random.RandomState(utt.fetch_seed())
maxpoolshps = ((1,1),(3,2),(2,3)) maxpoolshps = ((1,1),(3,2),(2,3))
imval = N.random.rand(2,3,3,4) * 10.0 #more variance means numeric gradient will be more accurate imval = rng.rand(2,3,3,4) * 10.0 #more variance means numeric gradient will be more accurate
do_theano=True
for maxpoolshp in maxpoolshps:
for ignore_border in [True,False]:
print 'maxpoolshp =', maxpoolshp
print 'ignore_border =', ignore_border
def mp(input):
return DownsampleFactorMax(maxpoolshp, ignore_border=ignore_border)(input)
utt.verify_grad(mp, [imval], rng=rng)
def test_max_pool2D_2D(self):
rng = numpy.random.RandomState(utt.fetch_seed())
maxpoolshps = ((1,1),(3,2))
imval = rng.rand(4,7)
images = tensor.dmatrix()
for maxpoolshp in maxpoolshps:
for ignore_border in [True,False]:
print 'maxpoolshp =', maxpoolshp
print 'ignore_border =', ignore_border
numpy_output_val = self.numpy_max_pool2D(imval, maxpoolshp, ignore_border)
output = max_pool2D(images, maxpoolshp, ignore_border)
output_val = function([images], output)(imval)
assert numpy.all(output_val == numpy_output_val)
def mp(input):
return max_pool2D(input, maxpoolshp, ignore_border)
utt.verify_grad(mp, [imval], rng=rng)
def test_max_pool2D_3D(self):
rng = numpy.random.RandomState(utt.fetch_seed())
maxpoolshps = [(1,2)]
imval = rng.rand(2,3,4)
images = tensor.dtensor3()
for maxpoolshp in maxpoolshps: for maxpoolshp in maxpoolshps:
for border in [True,False]: for ignore_border in [True,False]:
print 'maxpoolshp', maxpoolshp, 'border', border print 'maxpoolshp =', maxpoolshp
print 'ignore_border =', ignore_border
numpy_output_val = self.numpy_max_pool2D(imval, maxpoolshp, ignore_border)
output = max_pool2D(images, maxpoolshp, ignore_border)
output_val = function([images], output)(imval)
assert numpy.all(output_val == numpy_output_val)
c = tensor.sum(output)
c_val = function([images], c)(imval)
g = tensor.grad(c, images)
g_val = function([images], [g.shape, tensor.min(tensor.min(tensor.min(g))), tensor.max(tensor.max(tensor.max(g)))])(imval)
def mp(input): def mp(input):
return DownsampleFactorMax(maxpoolshp, ignore_border=border)(input) return max_pool2D(input, maxpoolshp, ignore_border)
utt.verify_grad(mp, [imval]) utt.verify_grad(mp, [imval], rng=rng)
def test_max_pool2D_6D(self):
rng = numpy.random.RandomState(utt.fetch_seed())
maxpoolshps = [(3,2)]
imval = rng.rand(2,1,1,1,3,4)
images = tensor.TensorType('float64', [False]*6)()
for maxpoolshp in maxpoolshps:
for ignore_border in [True,False]:
print 'maxpoolshp =', maxpoolshp
print 'ignore_border =', ignore_border
numpy_output_val = self.numpy_max_pool2D(imval, maxpoolshp, ignore_border)
output = max_pool2D(images, maxpoolshp, ignore_border)
output_val = function([images], output)(imval)
assert numpy.all(output_val == numpy_output_val)
def mp(input):
return max_pool2D(input, maxpoolshp, ignore_border)
utt.verify_grad(mp, [imval], rng=rng)
if __name__ == '__main__': if __name__ == '__main__':
t = TestDownsampleFactorMax("test_maxpool").run() unittest.main()
#t.test_maxpool()
from theano.tests import main
# main("test_sp")
...@@ -1125,7 +1125,7 @@ inv = Inv(upgrade_to_float, name = 'inv') ...@@ -1125,7 +1125,7 @@ inv = Inv(upgrade_to_float, name = 'inv')
class Log(UnaryScalarOp): class Log(UnaryScalarOp):
""" log base e """ """ log base e """
def impl(self, x): def impl(self, x):
return math.log(x) return numpy.log(x)
def grad(self, (x, ), (gz, )): def grad(self, (x, ), (gz, )):
if x.type in grad_types: if x.type in grad_types:
return gz / x, return gz / x,
......
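The hunk above swaps `math.log` for `numpy.log` in `Log.impl`. A plausible reading (my inference; the diff states no rationale) is edge-case behavior: `math.log` raises on non-positive input, while `numpy.log` returns `-inf`/`nan` under a floating-point warning, which suits elementwise numeric code:

```python
import math
import numpy

# math.log raises ValueError on non-positive input...
try:
    math.log(0.0)
    math_raised = False
except ValueError:
    math_raised = True

# ...while numpy.log yields -inf / nan, signalled via errstate, not exceptions.
with numpy.errstate(divide='ignore', invalid='ignore'):
    at_zero = numpy.log(0.0)       # -inf
    at_negative = numpy.log(-1.0)  # nan
```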
...@@ -330,6 +330,7 @@ class TensorType(Type): ...@@ -330,6 +330,7 @@ class TensorType(Type):
self.broadcastable = tuple(broadcastable) self.broadcastable = tuple(broadcastable)
self.dtype_specs() # error checking is done there self.dtype_specs() # error checking is done there
self.name = name self.name = name
self.numpy_dtype = numpy.dtype(self.dtype)
if shape is None: if shape is None:
#backport self.shape = tuple((1 if b else None) for b in self.broadcastable) #backport self.shape = tuple((1 if b else None) for b in self.broadcastable)
l=[] l=[]
@@ -360,16 +361,16 @@ class TensorType(Type):
         This function is not meant to be called in user code. It is for
         `Linker` instances to use when running a compiled graph.
         """
-        _data = data
-        if strict:
+        if (type(data) is numpy.ndarray) and (data.dtype is self.numpy_dtype):
+            pass # fall through to ndim check
+        elif strict:
+            # this is its own subcase that doesn't fall through to anything
             if not isinstance(data, numpy.ndarray):
                 raise TypeError("%s expected a ndarray object.", data, type(data))
             if not str(data.dtype) == self.dtype:
                 raise TypeError("%s expected a ndarray object with dtype = %s (got %s)." % (self, self.dtype, data.dtype))
             if not data.ndim == self.ndim:
                 raise TypeError("%s expected a ndarray object with %s dimensions (got %s)." % (self, self.ndim, data.ndim))
-            if self.filter_checks_isfinite and (not numpy.all(numpy.isfinite(data))):
-                raise TypeError("non-finite elements not allowed")
             if TensorType.use_shape:
                 for si, di in zip(self.shape, data.shape):
@@ -378,11 +379,17 @@ class TensorType(Type):
                         self, self.shape, data.shape))
             return data
         else:
-            data = theano._asarray(data, dtype = self.dtype)
-            if not self.ndim == data.ndim:
+            data = theano._asarray(data, dtype = self.dtype) #TODO - consider to pad shape with ones
+            # to make it consistent with self.broadcastable... like vector->row type thing
+            if self.ndim != data.ndim:
                 raise TypeError("Wrong number of dimensions: expected %s, got %s with shape %s." % (self.ndim, data.ndim, data.shape), data)
-            if any(b and d != 1 for d, b in zip(data.shape, self.broadcastable)):
-                raise TypeError("Non-unit value on shape on a broadcastable dimension.", data.shape, self.broadcastable)
+            i = 0
+            for b in self.broadcastable:
+                if b and data.shape[i] != 1:
+                    raise TypeError("Non-unit value on shape on a broadcastable dimension.", data.shape, self.broadcastable)
+                i+=1
+            if self.filter_checks_isfinite and (not numpy.all(numpy.isfinite(data))):
+                raise ValueError("non-finite elements not allowed")
             return data
 
     def dtype_specs(self):
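The strict branch above amounts to a type gate: accept only an ndarray of exactly the declared dtype and ndim, unchanged. A minimal standalone sketch of that logic (the `strict_filter` helper is hypothetical, not part of Theano):

```python
import numpy

def strict_filter(data, dtype, ndim):
    # Hypothetical helper mirroring the strict branch above: reject anything
    # that is not an ndarray with exactly this dtype and ndim.
    if not isinstance(data, numpy.ndarray):
        raise TypeError("expected an ndarray, got %s" % type(data))
    if str(data.dtype) != dtype:
        raise TypeError("expected dtype %s, got %s" % (dtype, data.dtype))
    if data.ndim != ndim:
        raise TypeError("expected %s dimensions, got %s" % (ndim, data.ndim))
    return data

x = numpy.zeros((2, 3), dtype='float64')
assert strict_filter(x, 'float64', 2) is x  # valid input passes through as-is
```

The point of a strict mode is that the caller's array is never copied or coerced, which is what `Linker` code relies on for speed.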
@@ -1826,14 +1833,16 @@ class Default(gof.Op):
     view_map = {0: [0]}
     def make_node(self, x, default):
         x, default = as_tensor_variable(x), as_tensor_variable(default)
-        assert x.type == default.type
+        if x.type != default.type:
+            raise TypeError('Both default() arguments must have same type', x, default)
         return gof.Apply(self, [x, default], [default.type()])
     def perform(self, node, (x, default), (out, )):
         if x is None:
-            out[0] = default.copy()
-        else:
-            out[0] = x
-            #backport out[0] = default.copy() if x is None else x
+            # why copy? Theano can't yet understand out[0] being a view of either x or y,
+            # so we can be a view of x, but only a copy of y.
+            out[0] = default.copy()
+        else:
+            out[0] = x
 default = Default()
 setdefault = default # legacy
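The `default.copy()` in `perform` is what keeps the stored default value safe from in-place writes to the output; a view would leak mutations back. The same copy-vs-view distinction in plain NumPy:

```python
import numpy

default_value = numpy.zeros(3)

out = default_value.copy()  # copy, as in the perform branch above
out[0] = 1.0
assert default_value[0] == 0.0  # the stored default is untouched

view = default_value[:]  # a view would expose the default to mutation
view[0] = 1.0
assert default_value[0] == 1.0  # the write leaked through the view
```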
@@ -3588,8 +3597,10 @@ def verify_grad(op, pt, n_tests=2, rng=None, eps=None, tol=None, mode=None, cast
         o_fn = function(tensor_pt, o_output)
         o_fn_out = o_fn(*[p.copy() for p in pt])
-        random_projection = rng.rand(*o_fn_out.shape)
+        # random_projection should not have elements too small,
+        # otherwise too much precision is lost in numerical gradient
+        random_projection = rng.rand(*o_fn_out.shape) + 0.5
         if cast_to_output_type:
             random_projection = numpy.array(random_projection,
                                             dtype=o_output.dtype)
...
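The `+ 0.5` shift keeps every element of the random projection in [0.5, 1.5), bounded away from zero, so no component of the projected gradient gets drowned out in the finite-difference comparison. A quick check of the construction:

```python
import numpy

rng = numpy.random.RandomState(0)
# same construction as in the hunk above: uniform [0, 1) shifted by 0.5
random_projection = rng.rand(4, 5) + 0.5
assert random_projection.min() >= 0.5
assert random_projection.max() < 1.5
```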
@@ -822,7 +822,14 @@ class CAReduce(Op):
             to_reduce = reversed(sorted(axis))
             if to_reduce:
                 for dimension in to_reduce:
-                    variable = self.ufunc.reduce(variable, dimension)
+                    # If it's a zero-size array, use scalar_op.identity if available
+                    if variable.shape[dimension] == 0:
+                        if hasattr(self.scalar_op, 'identity'):
+                            variable = self.scalar_op.identity
+                        else:
+                            raise ValueError("Input (%s) has zero-size on axis %s, but self.scalar_op (%s) has no attribute 'identity'" % (variable, dimension, self.scalar_op))
+                    else:
+                        variable = self.ufunc.reduce(variable, dimension)
                 output[0] = theano._asarray(variable, dtype = node.outputs[0].type.dtype)
             else:
                 output[0] = numpy.copy(variable)
...
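NumPy ufuncs exhibit exactly the failure mode this hunk guards against: reducing over a zero-size axis succeeds only when the ufunc has an identity element (0 for `add`), and raises otherwise (`maximum` has no identity):

```python
import numpy

a = numpy.zeros((5, 0))

# add has identity 0, so reducing the empty axis yields the identity per row
assert numpy.array_equal(numpy.add.reduce(a, axis=1), numpy.zeros(5))

# maximum has no identity, so the same reduction raises ValueError
try:
    numpy.maximum.reduce(a, axis=1)
    raised = False
except ValueError:
    raised = True
assert raised
```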
@@ -133,6 +133,8 @@ class test_CAReduce(unittest.TestCase):
                 ((5, 6), (1, )),
                 ((5, 6), ()),
                 ((2, 3, 4, 5), (0, 1, 3)),
+                ((5, 0), (0, )),
+                ((5, 0), (1, )),
                 ((), ())]:
             x = TensorType('float64', [(entry == 1) for entry in xsh])('x')
             e = CAReduce(add, axis = tosum)(x)
@@ -149,7 +151,7 @@ class test_CAReduce(unittest.TestCase):
     def test_c(self):
         self.with_linker(gof.CLinker())
 if __name__ == '__main__':
     unittest.main()
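The two added `((5, 0), …)` cases reduce a zero-size array over each axis. With `add` as the scalar op, the expected reference results (computed here with plain NumPy) are:

```python
import numpy

a = numpy.zeros((5, 0))
# reducing over axis 0 leaves a zero-length result
assert numpy.add.reduce(a, axis=0).shape == (0,)
# reducing over the empty axis 1 yields the additive identity per row
assert numpy.array_equal(numpy.add.reduce(a, axis=1), numpy.zeros(5))
```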