Commit b4c881d1 authored by Dumitru Erhan

merge

......@@ -43,8 +43,10 @@ Environment Variables
.. envvar:: THEANO_FLAGS
This is a list of comma-delimited key[=value] pairs that control Theano's behavior. A key that appears without an '=value' must be for a boolean value, and it acts as setting it to True.
This is a list of comma-delimited key[=value] pairs that control
Theano's behavior. A key that appears without an '=value' must be
for a boolean value, and it acts as setting it to True.
For example, in bash, you can override your :envvar:`THEANORC` defaults
for <myscript>.py by typing this:
......@@ -52,11 +54,15 @@ Environment Variables
THEANO_FLAGS='floatX=float32,device=gpu0,nvcc.fastmath' python <myscript>.py
If a value is defined several times in ``THEANO_FLAGS``,
the right-most definition is used. So, for instance, if
``THEANO_FLAGS='device=cpu,device=gpu0'``, then gpu0 will be used.
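In other words, later definitions simply overwrite earlier ones. A minimal stand-alone sketch of this parsing rule (plain Python, not Theano's actual implementation):

```python
def parse_theano_flags(flags):
    """Parse a THEANO_FLAGS-style string into a dict.

    Later (right-most) definitions overwrite earlier ones, and a key
    that appears without '=value' is treated as a boolean True.
    """
    result = {}
    for item in flags.split(','):
        if not item:
            continue
        key, sep, value = item.partition('=')
        # plain dict assignment naturally implements "right-most wins"
        result[key] = value if sep else True
    return result

print(parse_theano_flags('device=cpu,device=gpu0,nvcc.fastmath'))
# {'device': 'gpu0', 'nvcc.fastmath': True}
```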
.. envvar:: THEANORC
The location[s] of the .theanorc file[s] in ConfigParser format.
It defaults to ``$HOME/.theanorc``.
Here is the .theanorc equivalent to the THEANO_FLAGS in the example above:
.. code-block:: text
......@@ -70,10 +76,10 @@ Environment Variables
Multiple configuration files can be specified by separating them with ':'
characters (as in $PATH). Multiple configuration files will be merged,
with earlier (left-most) files taking priority over later files in the
with later (right-most) files taking priority over earlier files in the
case that multiple files specify values for a common configuration option.
For example, to override system-wide settings with personal ones,
set ``THEANORC=~/.theanorc:/etc/theanorc``
For example, to override system-wide settings with personal ones,
set ``THEANORC=/etc/theanorc:~/.theanorc``.
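This mirrors how Python's ``ConfigParser`` itself merges multiple files: values read later replace values read earlier. A small stand-alone sketch with made-up file contents (hypothetical paths, not the real theanorc files):

```python
import configparser
import os
import tempfile

def read_merged(paths):
    # ConfigParser.read() processes files left to right; a later file's
    # value for the same option replaces the earlier one.
    cfg = configparser.ConfigParser()
    cfg.read(paths)
    return cfg

with tempfile.TemporaryDirectory() as d:
    system_rc = os.path.join(d, 'etc_theanorc')
    user_rc = os.path.join(d, 'user_theanorc')
    with open(system_rc, 'w') as f:
        f.write('[global]\ndevice = cpu\n')
    with open(user_rc, 'w') as f:
        f.write('[global]\ndevice = gpu0\n')
    # The right-most file wins, so personal settings override system ones.
    cfg = read_merged([system_rc, user_rc])
    print(cfg.get('global', 'device'))  # gpu0
```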
The rest of this page describes some of the more common and important flags
that you might want to use. For the complete list (including documentation),
......
......@@ -58,7 +58,7 @@ file and run it.
import numpy
import time
vlen = 100000
vlen = 10 * 30 * 768 # 10 x #cores x # threads per core
iters = 1000
rng = numpy.random.RandomState(22)
......@@ -74,28 +74,31 @@ The program just computes the exp() of a bunch of random numbers.
Note that we use the `shared` function to
make sure that the input `x` is stored on the graphics device.
If I run this program (in thing.py) with device=cpu, my computer takes a little over 3 seconds, whereas on the GPU it takes just over 0.2 seconds. Note that the results are close but not identical! The GPU will not always produce the exact same floating-point numbers as the CPU.
If I run this program (in thing.py) with device=cpu, my computer takes a little over 7 seconds,
whereas on the GPU it takes just over 0.4 seconds. Note that the results are close but not
identical! The GPU will not always produce the exact same floating-point numbers as the CPU.
As a point of reference, a loop that calls ``numpy.exp(x.value)`` also takes about 7 seconds.
.. code-block:: text
$ THEANO_FLAGS=mode=FAST_RUN,device=cpu python thing.py
Looping 100 times took 3.12647008896 seconds
Result is [ 1.23178032 1.61879341 1.52278065 ..., 1.74085572 2.55530456 1.88906098]
Looping 100 times took 7.17374897003 seconds
Result is [ 1.23178032 1.61879341 1.52278065 ..., 2.20771815 2.29967753 1.62323285]
bergstra@tikuanyin:~/tmp$ THEANO_FLAGS=mode=FAST_RUN,device=gpu0 python thing.py
Using gpu device 0: GeForce GTX 285
Looping 100 times took 0.217401981354 seconds
Result is [ 1.23178029 1.61879349 1.52278066 ..., 1.74085569 2.55530477 1.88906097]
Looping 100 times took 0.418929815292 seconds
Result is [ 1.23178029 1.61879349 1.52278066 ..., 2.20771813 2.29967761 1.62323296]
Returning a handle to device-allocated data
-------------------------------------------
The speedup is not greater in the example above because the function is
returning its result as a numpy ndarray (which has already copied from the
device to the host). This is what makes it so easy to swap in device=gpu0, but
if you want to be less portable, you can see a bigger speedup by changing
returning its result as a numpy ndarray which has already been copied from the
device to the host for your convenience. This is what makes it so easy to swap in device=gpu0, but
if you don't mind being less portable, you might prefer to see a bigger speedup by changing
the graph to express a computation with a GPU-stored result. The gpu_from_host
op means "copy the input from the host to the gpu" and it is optimized away
Op means "copy the input from the host to the gpu" and it is optimized away
after the T.exp(x) is replaced by a GPU version of exp().
.. code-block:: python
......@@ -105,7 +108,7 @@ after the T.exp(x) is replaced by a GPU version of exp().
import numpy
import time
vlen = 100000
vlen = 10 * 30 * 768 # 10 x #cores x # threads per core
iters = 1000
rng = numpy.random.RandomState(22)
......@@ -123,17 +126,71 @@ The output from this program is
.. code-block:: text
Using gpu device 0: GeForce GTX 285
Looping 100 times took 0.173671007156 seconds
Looping 100 times took 0.185714006424 seconds
Result is <CudaNdarray object at 0x3e9e970>
Numpy result is [ 1.23178029 1.61879349 1.52278066 ..., 1.74085569 2.55530477 1.88906097]
Numpy result is [ 1.23178029 1.61879349 1.52278066 ..., 2.20771813 2.29967761 1.62323296]
Here we've shaved off about 20% of the run-time by simply not copying the
Here we've shaved off about 50% of the run-time by simply not copying the
resulting array back to the host.
The object returned by each function call is now not a numpy array but a
"CudaNdarray" which can be converted to a numpy ndarray by the normal
numpy casting mechanism.
Running the GPU at Full Speed
------------------------------
To really get maximum performance in this simple example, we need to use an :class:`Out`
instance to tell Theano not to copy the output it returns to us. Theano allocates memory for
internal use like a working buffer, but by default it will never return a result that is
allocated in the working buffer. This is normally what you want, but our example is so simple
that it has the unwanted side effect of really slowing things down.
..
TODO:
The story here about copying and working buffers is misleading and potentially not correct
... why exactly does borrow=True cut 75% of the runtime ???
.. code-block:: python
from theano import function, config, shared, sandbox, Out
import theano.tensor as T
import numpy
import time
vlen = 10 * 30 * 768 # 10 x #cores x # threads per core
iters = 1000
rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([],
Out(sandbox.cuda.basic_ops.gpu_from_host(T.exp(x)),
borrow=True))
t0 = time.time()
for i in xrange(iters):
r = f()
print 'Looping 100 times took', time.time() - t0, 'seconds'
print 'Result is', r
print 'Numpy result is', numpy.asarray(r)
Running this version of the code takes just under 0.05 seconds, over 140x faster than
the CPU implementation!
.. code-block:: text
Using gpu device 0: GeForce GTX 285
Looping 100 times took 0.0497219562531 seconds
Result is <CudaNdarray object at 0x31eeaf0>
Numpy result is [ 1.23178029 1.61879349 1.52278066 ..., 2.20771813 2.29967761 1.62323296]
This version of the code using ``borrow=True`` is slightly less safe, because if we had saved
the `r` returned from one function call, we would have to remember that its value might
be over-written by a subsequent function call. Although borrow=True makes a dramatic difference in this example,
be careful! The advantage of
borrow=True is much weaker in larger graphs, and there is a lot of potential for making a
mistake by failing to account for the resulting memory aliasing.
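The aliasing hazard can be illustrated without any GPU: any function that hands back a handle to its reused internal buffer behaves the same way. A toy stand-in for borrow=True (not Theano code):

```python
class BorrowingFunction:
    """Toy model of a compiled function that returns its working buffer.

    Like borrow=True, the caller gets a direct handle to internal
    storage, so a later call silently overwrites earlier results.
    """
    def __init__(self):
        self._buffer = [0.0]

    def __call__(self, x):
        self._buffer[0] = x * 2.0
        return self._buffer  # borrowed: no defensive copy is made

f = BorrowingFunction()
r1 = f(1.0)
print(r1[0])  # 2.0
r2 = f(3.0)
# r1 aliases the same buffer as r2 -- its value has silently changed!
print(r1[0])  # 6.0
print(r1 is r2)  # True
```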
What can be accelerated on the GPU?
------------------------------------
......
......@@ -428,9 +428,20 @@ class Function(object):
# Reinitialize each container's 'provided' counter
for c in self.input_storage:
c.provided = 0
# Set positional arguments
for i, arg in enumerate(args):
self[i] = arg
i = 0
for arg in args:
#TODO: provide a Param option for skipping the filter if we
# really want speed.
s = self.input_storage[i]
if arg is None:
s.storage[0] = arg
else:
s.storage[0] = s.type.filter(arg, strict=s.strict)
s.provided += 1
i+=1
# Set keyword arguments
for k, arg in kwargs.iteritems():
self[k] = arg
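The container logic above can be summarized in a small sketch; the class and type names here are hypothetical simplifications of what Function and its input containers actually do:

```python
class Container:
    """Simplified input container: one storage cell plus bookkeeping."""
    def __init__(self, typ, strict=False):
        self.type = typ
        self.strict = strict
        self.storage = [None]
        self.provided = 0

class IntType:
    @staticmethod
    def filter(value, strict=False):
        # strict mode rejects anything that isn't already the right type
        if strict and not isinstance(value, int):
            raise TypeError('expected int, got %r' % (value,))
        return int(value)

def set_positional_args(input_storage, args):
    # Mirrors the loop above: None bypasses filtering, everything else
    # is validated/converted by the container's type before storage.
    for s, arg in zip(input_storage, args):
        if arg is None:
            s.storage[0] = None
        else:
            s.storage[0] = s.type.filter(arg, strict=s.strict)
        s.provided += 1

storage = [Container(IntType()), Container(IntType())]
set_positional_args(storage, ['7', None])
print([s.storage[0] for s in storage])  # [7, None]
```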
......@@ -448,7 +459,9 @@ class Function(object):
self.inv_finder[c]))
# Do the actual work
t0_fn = time.time()
self.fn()
dt_fn = time.time() - t0_fn
# Retrieve the values that were computed
outputs = [x.data for x in self.output_storage]
......@@ -486,6 +499,9 @@ class Function(object):
self.maker.mode.fct_call_time[self.name] += dt_call
self.maker.mode.fct_call[self.name] += 1
self.maker.mode.call_time += dt_call
self.maker.mode.fn_time += dt_fn
if self.return_none:
return None
elif self.unpack_single and len(outputs) == 1:
......
......@@ -172,6 +172,8 @@ class Mode(object):
if isinstance(optimizer, gof.Query):
self.provided_optimizer = optimizer
self._optimizer = optimizer
self.call_time = 0
self.fn_time = 0
def __str__(self):
return "Mode(linker = %s, optimizer = %s)" % (self.provided_linker, self.provided_optimizer)
......
import time, atexit, copy
from theano.gof.link import WrapLinkerMany
from theano.gof.link import WrapLinker
from theano.gof.cutils import run_cthunk
from theano.compile.mode import Mode, register_mode, predefined_modes, predefined_linkers, predefined_optimizers, default_linker, default_optimizer
from theano.gof.cc import OpWiseCLinker
from theano.gof.python25 import any
from theano import gof
from theano.configparser import config, AddConfigVar, IntParam
from theano.compile.function_module import FunctionMaker
import_time = time.time()
......@@ -18,44 +19,57 @@ AddConfigVar('ProfileMode.n_ops_to_print',
"Number of ops to print by default",
IntParam(20, lambda i: i > 0))
class Profile_Maker(FunctionMaker):
def create(self, input_storage=None, trustme=False):
ret = super(Profile_Maker,self).create(input_storage, trustme)
for i, node in enumerate(ret.maker.env.toposort()):
self.mode.apply_time[(i,node.op)]=0.0
self.mode.apply_call[(i,node.op)]=0
# self.mode.op_cimpl[node.op] =
return ret
class ProfileMode(Mode):
def __init__(self, linker=default_linker, optimizer=default_optimizer):
local_time = [0.0]
apply_time = {}
apply_call = {}
op_time = {}
op_cimpl = {}
op_call = {}
compile_time = 0 #time passed in theano.function()
fct_call_time = {}#time passed inside theano fct call including op time.
fct_call = {}
self.__setstate__((linker, optimizer, local_time,
apply_time, apply_call,
op_time, op_cimpl, op_call,
op_cimpl,
compile_time, fct_call_time, fct_call))
def function_maker(self, i,o,m, *args, **kwargs):
"""Return an instance of `Profiler_Maker` which init the count"""
assert m is self
return Profile_Maker(i, o, self, *args, **kwargs)
def __getstate__(self):
#print "__getstate__",self.provided_linker,self.provided_optimizer
return (self.provided_linker, self.provided_optimizer, self.local_time,
self.apply_time, self.apply_call,
self.op_time, self.op_cimpl, self.op_call, self.compile_time, self.fct_call_time, self.fct_call)
self.op_cimpl, self.compile_time, self.fct_call_time, self.fct_call)
def __setstate__(self, (linker, optimizer, local_time,
apply_time, apply_call,
op_time, op_cimpl, op_call,
op_cimpl,
compile_time, fct_call_time, fct_call)):
self.local_time = local_time
self.apply_time = apply_time
self.apply_call = apply_call
self.op_time = op_time
self.op_cimpl = op_cimpl
self.op_call = op_call
self.compile_time = compile_time
self.fct_call_time = fct_call_time
self.fct_call = fct_call
self.call_time = 0
self.fn_time = 0
def blah(i, node, th):
if hasattr(th, 'cthunk'):
......@@ -63,7 +77,7 @@ class ProfileMode(Mode):
failure = run_cthunk(th.cthunk)
dt = time.time() - t0
if failure:
raise RuntimeError(('A C Op raised an exception. PerformLinker cannot'
raise RuntimeError(('A C Op raised an exception. PROFILE_MODE cannot'
' tell you what it was though. Use a standard mode such as'
' FAST_RUN_NOGC to correct the problem.'))
else:
......@@ -72,11 +86,9 @@ class ProfileMode(Mode):
dt = time.time() - t0
local_time[0] += dt
apply_time[(i,node.op)] = apply_time.get((i,node.op), 0.0) + dt
apply_call[(i,node.op)] = apply_call.get((i,node.op), 0) + 1
op_time[node.op] = op_time.get(node.op, 0.0) + dt
apply_time[(i,node.op)] += dt
apply_call[(i,node.op)] += 1
op_cimpl[node.op] = hasattr(th, 'cthunk')
op_call[node.op] = op_call.get(node.op,0) + 1
self.provided_linker = linker
......@@ -84,7 +96,7 @@ class ProfileMode(Mode):
if isinstance(linker, str) or linker is None:
linker = predefined_linkers[linker]
linker = WrapLinkerMany([linker], [blah])
linker = WrapLinker([linker], blah)
self.linker = linker
if isinstance(optimizer, str) or optimizer is None:
......@@ -113,18 +125,11 @@ class ProfileMode(Mode):
fct_call = self.fct_call
apply_time = self.apply_time
apply_call = self.apply_call
op_time = self.op_time
op_call = self.op_call
op_cimpl = self.op_cimpl
op_flops = {}
for a,t in op_time.items():
if hasattr(a,'flops'):
op_flops[a]=a.flops*op_call[a]/t/1e6
self.print_summary_("print_summary",local_time, compile_time, fct_call_time, fct_call,
apply_time, apply_call, op_time, op_call, op_cimpl,
op_flops, n_apply_to_print, n_ops_to_print)
apply_time, apply_call, op_cimpl,
n_apply_to_print, n_ops_to_print)
def print_diff_summary(self, other, n_apply_to_print=15, n_ops_to_print=20):
......@@ -153,42 +158,23 @@ class ProfileMode(Mode):
r[a]+=t
return r
def diff_dict_flops(a_time,b_time_,a_call,b_call):
flops = {}
b_time = copy.copy(b_time_)
for a,ta in a_time.items():
tb = b_time.pop(a,0)
if hasattr(a,'flops'):
flops[a]=a.flops*a_call[a]/ta - a.flops*b_call[a]/tb/1e6
#they are missing in a
for b,tb in b_time.items():
if hasattr(b,'flops'):
flops[b]=b.flops*b_call[b]/tb/1e6
return flops
local_time = self.local_time[0]-other.local_time[0]
compile_time = self.compile_time-other.compile_time
fct_call_time = diff_dict(self.fct_call_time,other.fct_call_time)
fct_call = diff_dict(self.fct_call,other.fct_call)
apply_time = diff_dict(self.apply_time, other.apply_time)
apply_call = diff_dict(self.apply_call, other.apply_call)
op_time = diff_dict(self.op_time, other.op_time)
op_call = diff_dict(self.op_call, other.op_call)
op_cimpl = self.op_cimpl and other.op_cimpl
op_flops = diff_dict_flops(self.op_time, other.op_time, self.op_call, other.op_call)
self.print_summary_("print_diff_summary",local_time, compile_time, fct_call_time, fct_call,
apply_time, apply_call, op_time, op_call, op_cimpl,
op_flops, n_apply_to_print=n_apply_to_print,
apply_time, apply_call, op_cimpl,
n_apply_to_print=n_apply_to_print,
n_ops_to_print=n_ops_to_print, print_apply=False)
@staticmethod
def print_summary_(fct_name, local_time, compile_time, fct_call_time, fct_call,
apply_time, apply_call, op_time, op_call, op_cimpl,
op_flops=None, n_apply_to_print=15, n_ops_to_print=20, print_apply=True):
apply_time, apply_call, op_cimpl,
n_apply_to_print=15, n_ops_to_print=20, print_apply=True):
"""
do the actual printing of print_summary and print_diff_summary.
......@@ -218,6 +204,19 @@ class ProfileMode(Mode):
sum(f for f, t, a, nb_call in atimes[n_apply_to_print:])*100,
sum(t for f, t, a, nb_call in atimes[n_apply_to_print:]))
op_time = {}
op_call = {}
for (i,a),t in apply_time.items():
op_time.setdefault(a,0)
op_call.setdefault(a,0)
op_time[a]+=t
op_call[a]+=apply_call[(i,a)]
op_flops = {}
for a,t in op_time.items():
if hasattr(a,'flops'):
op_flops[a]=a.flops*op_call[a]/t/1e6
flops_msg=''
if op_flops:
flops_msg=' <MFlops/s>'
......
......@@ -544,35 +544,20 @@ class Test_check_isfinite(unittest.TestCase):
theano.tensor.TensorType.filter_checks_isfinite = self.old_val
def test_check_isfinite(self):
x = theano.tensor.dvector()
x = theano.tensor.vector()
f = theano.function([x], (x+2) * 5, mode='DEBUG_MODE')
g = theano.function([x], theano.tensor.log(x), mode='DEBUG_MODE')
# this should work
f(numpy.log([3, 4, 5]))
# this should raise InvalidValueError
try:
# insert a NaN
f(numpy.log([3, -4, 5]))
assert False
except debugmode.InvalidValueError:
pass
# this should raise InvalidValueError
try:
# insert an Nan and Inf
f(numpy.asarray([0, 1.0, 0])/0)
assert False
except debugmode.InvalidValueError:
pass
# passing an invalid value as an input should trigger ValueError
self.failUnlessRaises(ValueError, f, numpy.log([3, -4, 5]))
self.failUnlessRaises(ValueError, f, numpy.asarray([0, 1.0, 0])/0)
self.failUnlessRaises(ValueError, f, numpy.asarray([1.0, 1.0, 1.0])/0)
# this should raise InvalidValueError
try:
# insert several Inf
f(numpy.asarray([1.0, 1.0, 1.0])/0)
assert False
except debugmode.InvalidValueError:
pass
# generating an invalid value internally should trigger InvalidValueError
self.failUnlessRaises(debugmode.InvalidValueError, g, [3,-4,5])
# this should disable the exception
theano.tensor.TensorType.filter_checks_isfinite = False
......
......@@ -14,11 +14,12 @@ THEANO_FLAGS=os.getenv("THEANO_FLAGS","")
# [section.]option[=value] entries. If the section part is omitted, there should be only one
# section that contains the given option.
# THEANORC=~/.theanorc:~lisa/.theanorc
# THEANORC can contain a colon-delimited list of config files, like
# THEANORC=~lisa/.theanorc:~/.theanorc
# In that case, definitions in files on the right (here, ~/.theanorc) have
# precedence over those in files on the left.
def config_files_from_theanorc():
rval = [os.path.expanduser(s) for s in os.getenv('THEANORC', '~/.theanorc').split(':')]
rval.reverse()
print "THEANORC", rval
return rval
theano_cfg = ConfigParser.SafeConfigParser()
theano_cfg.read(config_files_from_theanorc())
......@@ -42,14 +43,15 @@ def fetch_val_for_key(key):
"""Return the overriding config value for a key.
A successful search returns a string value.
An unsuccessful search raises a KeyError.
The priority order is:
The (decreasing) priority order is:
- THEANO_FLAGS
- ~/.theanorc
"""
# first try to find it in the FLAGS
rval = None
for name_val in THEANO_FLAGS.split(','):
if not name_val:
continue
......@@ -60,7 +62,12 @@ def fetch_val_for_key(key):
name, val = name_val_tuple
if name == key:
return val
# rval might be overridden by a later definition in THEANO_FLAGS
rval = val
# If an rval is found, it should be a string
if rval is not None:
return rval
# next try to find it in the config file
......@@ -77,7 +84,7 @@ def fetch_val_for_key(key):
return theano_cfg.get(section, option)
except (ConfigParser.NoOptionError, ConfigParser.NoSectionError):
raise KeyError(key)
class TheanoConfigParser(object):
#properties are installed by AddConfigVar
......@@ -143,7 +150,7 @@ class ConfigParam(object):
self.val = val
deleter=None
class EnumStr(ConfigParam):
def __init__(self, default, *options):
self.default = default
......
......@@ -222,7 +222,7 @@ class PureType(object):
try:
self.filter(a, True)
return True
except TypeError:
except (TypeError, ValueError):
return False
def make_variable(self, name = None):
......
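Since ``filter`` may now raise ValueError (for example, for non-finite values) as well as TypeError, the validity check must catch both. A minimal sketch of the pattern, with a hypothetical ``filter_value`` standing in for the real Type.filter:

```python
import math

def filter_value(x, check_finite=True):
    # Two distinct failure modes: a wrong type raises TypeError,
    # a non-finite float raises ValueError.
    if not isinstance(x, float):
        raise TypeError('expected float, got %r' % (x,))
    if check_finite and not math.isfinite(x):
        raise ValueError('non-finite elements not allowed')
    return x

def is_valid_value(x):
    # Mirrors the change above: catch both exception types, otherwise
    # a NaN would escape the validity check instead of returning False.
    try:
        filter_value(x)
        return True
    except (TypeError, ValueError):
        return False

print(is_valid_value(1.5))           # True
print(is_valid_value('a'))           # False
print(is_valid_value(float('nan')))  # False
```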
......@@ -18,18 +18,22 @@ def _asarray(a, dtype=None, order=None):
Currently, this issue has only been causing trouble when the target
data type is 'int32', on some computers. As a result, this is the only
situation where we do more than a simple call to ``numpy.asarray``. If it
turns out that a similar problem can occur for more data types, this
situation where we may do more than a simple call to ``numpy.asarray``. If
it turns out that a similar problem can occur for more data types, this
function should be updated accordingly.
This function's name starts with a '_' to indicate that it is meant to be
used internally. It is imported so as to be available directly through
theano._asarray
"""
dtype = numpy.dtype(dtype) # Convert into dtype object.
rval = numpy.asarray(a, dtype=dtype, order=order)
if dtype is numpy.int32 or dtype == 'int32':
# Make sure the type is properly set to the correct type.
return rval.view(dtype=numpy.int32)
numpy_int32 = numpy.dtype(numpy.int32)
if (dtype is numpy_int32 and rval.dtype is not numpy_int32):
# Enforce the numpy.int32 dtype.
return rval.view(dtype=numpy_int32)
else:
# Using ``numpy.asarray`` should work just fine.
# Debug assert if we want to detect other failure cases (untested):
# assert rval.dtype is dtype
return rval
Diff is collapsed.
......@@ -5,14 +5,15 @@ from theano import config
import logging, copy
_logger_name = 'theano.sandbox.cuda'
_logger = logging.getLogger(_logger_name)
_logger.setLevel(logging.INFO)
_logger.addHandler(logging.StreamHandler())
_logger.setLevel(logging.WARNING)
def error(*msg):
_logger.warning('ERROR (%s): %s' % (_logger_name, ' '.join(str(m) for m in msg)))
def warning(*msg):
_logger.warning(_logger_name+'WARNING: '+' '.join(str(m) for m in msg))
_logger.warning('WARNING (%s): %s' % (_logger_name, ' '.join(str(m) for m in msg)))
def info(*msg):
_logger.info(_logger_name+'INFO: '+' '.join(str(m) for m in msg))
_logger.warning('INFO (%s): %s' % (_logger_name, ' '.join(str(m) for m in msg)))
def debug(*msg):
_logger.debug(_logger_name+'DEBUG: '+' '.join(str(m) for m in msg))
_logger.warning('DEBUG (%s): %s' % (_logger_name, ' '.join(str(m) for m in msg)))
# Compile cuda_ndarray.cu
......@@ -63,23 +64,32 @@ if not compile_cuda_ndarray:
except ImportError:
compile_cuda_ndarray = True
if compile_cuda_ndarray:
import nvcc_compiler
if not nvcc_compiler.is_nvcc_available():
set_cuda_disabled()
try:
if compile_cuda_ndarray:
import nvcc_compiler
if not nvcc_compiler.is_nvcc_available():
set_cuda_disabled()
if enable_cuda:
code = open(os.path.join(cuda_path, "cuda_ndarray.cu")).read()
if enable_cuda:
code = open(os.path.join(cuda_path, "cuda_ndarray.cu")).read()
if not os.path.exists(cuda_ndarray_loc):
os.makedirs(cuda_ndarray_loc)
if not os.path.exists(cuda_ndarray_loc):
os.makedirs(cuda_ndarray_loc)
nvcc_compiler.nvcc_module_compile_str('cuda_ndarray', code, location = cuda_ndarray_loc,
include_dirs=[cuda_path], libs=['cublas'])
nvcc_compiler.nvcc_module_compile_str('cuda_ndarray', code, location = cuda_ndarray_loc,
include_dirs=[cuda_path], libs=['cublas'])
from cuda_ndarray.cuda_ndarray import *
from cuda_ndarray.cuda_ndarray import *
except Exception, e:
error( "Failed to compile cuda_ndarray.cu: %s" % str(e))
set_cuda_disabled()
if enable_cuda:
#check if there is an old cuda_ndarray that was loaded instead of the one we compiled!
import cuda_ndarray.cuda_ndarray
if os.path.join(config.compiledir,'cuda_ndarray','cuda_ndarray.so')!=cuda_ndarray.cuda_ndarray.__file__:
_logger.warning("WARNING: cuda_ndarray was loaded from",cuda_ndarray.cuda_ndarray.__file__,"This is not expected as theano should compile it automatically for you. Do you have a directory called cuda_ndarray in your LD_LIBRARY_PATH environment variable? If so, please remove it as it is outdated!")
from theano.sandbox.cuda.type import CudaNdarrayType
from theano.sandbox.cuda.var import (CudaNdarrayVariable,
CudaNdarrayConstant,
......@@ -103,7 +113,7 @@ def use(device=config.device):
raise ValueError("Invalid device identifier", device)
if use.device_number is None:
# No successful call to use() has been made yet
if device=="-1" or device=="CPU":
if device<0:
return
if device in [None,""]:
device=0
......@@ -134,6 +144,5 @@ def handle_shared_float32(tf):
else:
raise NotImplementedError('removing our handler')
if enable_cuda and config.device.startswith('gpu'):
use()
......@@ -6,6 +6,13 @@ from theano import config
_logger=logging.getLogger("theano.sandbox.cuda.nvcc_compiler")
_logger.setLevel(logging.WARN)
from theano.configparser import config, AddConfigVar, StrParam
AddConfigVar('nvcc.compiler_bindir',
"if defined, nvcc compiler driver will seek g++ and gcc in this directory",
StrParam(""))
def error(*args):
#sys.stderr.write('ERROR:'+ ' '.join(str(a) for a in args)+'\n')
_logger.error("ERROR: "+' '.join(str(a) for a in args))
......@@ -68,6 +75,8 @@ def nvcc_module_compile_str(module_name, src_code, location=None, include_dirs=[
debug('Generating shared lib', lib_filename)
# TODO: Why do these args cause failure on gtx285 that has 1.3 compute capability? '--gpu-architecture=compute_13', '--gpu-code=compute_13',
cmd = ['nvcc', '-shared', '-g'] + [pa for pa in preargs if pa.startswith('-O')]
if config.nvcc.compiler_bindir:
cmd.extend(['--compiler-bindir', config.nvcc.compiler_bindir])
cmd.extend(['-Xcompiler', ','.join(pa for pa in preargs if not pa.startswith('-O'))])
cmd.extend('-I%s'%idir for idir in include_dirs)
cmd.extend(['-o',lib_filename])
......
......@@ -140,20 +140,20 @@ def test_elemwise1():
b = tensor.fmatrix()
#let debugmode catch any mistakes
print >> sys.stderr, "STARTING FUNCTION 1"
print >> sys.stdout, "STARTING FUNCTION 1"
f = pfunc([b], [], updates=[(a, b**a)], mode=mode_with_gpu)
for i, node in enumerate(f.maker.env.toposort()):
print i, node
f(numpy.random.rand(*shape)+0.3)
print >> sys.stderr, "STARTING FUNCTION 2"
print >> sys.stdout, "STARTING FUNCTION 2"
#let debugmode catch any mistakes
f = pfunc([b], [], updates=[(a, tensor.exp(b**a))], mode=mode_with_gpu)
for i, node in enumerate(f.maker.env.toposort()):
print i, node
f(numpy.random.rand(*shape)+0.3)
print >> sys.stderr, "STARTING FUNCTION 3"
print >> sys.stdout, "STARTING FUNCTION 3"
#let debugmode catch any mistakes
f = pfunc([b], [], updates=[(a, a+b * tensor.exp(b**a))], mode=mode_with_gpu)
f(numpy.random.rand(*shape)+0.3)
......@@ -169,11 +169,11 @@ def test_elemwise2():
f = pfunc([b], [], updates=[(a, (a+b).dimshuffle(pattern))], mode=mode_with_gpu)
has_elemwise = False
for i, node in enumerate(f.maker.env.toposort()):
print >> sys.stderr, i, node
print >> sys.stdout, i, node
has_elemwise = has_elemwise or isinstance(node.op, tensor.Elemwise)
assert not has_elemwise
#let debugmode catch errors
print >> sys.stderr, 'pattern', pattern
print >> sys.stdout, 'pattern', pattern
f(rng.rand(*shape)*.3)
shape = (3,4,5,6)
......@@ -204,7 +204,7 @@ def test_elemwise3():
b**a).dimshuffle([2,0,3,1]))], mode=mode_with_gpu)
has_elemwise = False
for i, node in enumerate(f.maker.env.toposort()):
print >> sys.stderr, i, node
print >> sys.stdout, i, node
has_elemwise = has_elemwise or isinstance(node.op, tensor.Elemwise)
assert not has_elemwise
#let debugmode catch errors
......@@ -220,7 +220,7 @@ def test_elemwise4():
f = pfunc([b,c], [], updates=[(a, (a+b.dimshuffle('x', 0)*c.dimshuffle(0, 'x')))], mode=mode_with_gpu)
has_elemwise = False
for i, node in enumerate(f.maker.env.toposort()):
print >> sys.stderr, i, node
print >> sys.stdout, i, node
has_elemwise = has_elemwise or isinstance(node.op, tensor.Elemwise)
assert not has_elemwise
#let debugmode catch errors
......
......@@ -360,7 +360,7 @@ def test_subsample():
def test_logical_shapes():
# implement when
print >> sys.stderr, "INFO: test_logical_shapes not implemented (i.e. imshp_logical, kshp_logical, kshp_logical_top_aligned)"
print >> sys.stderr, "WARNING TODO: test_logical_shapes not implemented (i.e. imshp_logical, kshp_logical, kshp_logical_top_aligned)"
def _test_dummy():
......
......@@ -8,7 +8,7 @@ if cuda_ndarray.enable_cuda == False:
import numpy
def test_host_to_device():
print >>sys.stderr, 'starting test_host_to_dev'
print >>sys.stdout, 'starting test_host_to_dev'
for shape in ((), (3,), (2,3), (3,4,5,6)):
a = theano._asarray(numpy.random.rand(*shape), dtype='float32')
b = cuda_ndarray.CudaNdarray(a)
......@@ -53,7 +53,7 @@ def test_add():
def test_exp():
print >>sys.stderr, 'starting test_exp'
print >>sys.stdout, 'starting test_exp'
for shape in ((), (3,), (2,3), (1,10000000),(10,1000000), (100,100000),(1000,10000),(10000,1000)):
a0 = theano._asarray(numpy.random.rand(*shape), dtype='float32')
a1 = a0.copy()
......@@ -74,25 +74,25 @@ def test_exp():
def test_copy():
print >>sys.stderr, 'starting test_copy'
print >>sys.stdout, 'starting test_copy'
shape = (5,)
a = theano._asarray(numpy.random.rand(*shape), dtype='float32')
print >>sys.stderr, '.. creating device object'
print >>sys.stdout, '.. creating device object'
b = cuda_ndarray.CudaNdarray(a)
print >>sys.stderr, '.. copy'
print >>sys.stdout, '.. copy'
c = copy.copy(b)
print >>sys.stderr, '.. deepcopy'
print >>sys.stdout, '.. deepcopy'
d = copy.deepcopy(b)
print >>sys.stderr, '.. comparisons'
print >>sys.stdout, '.. comparisons'
assert numpy.allclose(a, numpy.asarray(b))
assert numpy.allclose(a, numpy.asarray(c))
assert numpy.allclose(a, numpy.asarray(d))
def test_dot():
print >>sys.stderr, 'starting test_dot'
print >>sys.stdout, 'starting test_dot'
a0 = theano._asarray(numpy.random.rand(4, 7), dtype='float32')
a1 = theano._asarray(numpy.random.rand(7, 6), dtype='float32')
......@@ -101,7 +101,7 @@ def test_dot():
assert numpy.allclose(numpy.dot(a0, a1), cuda_ndarray.dot(b0, b1))
print >> sys.stderr, 'WARNING test_dot: not testing all 8 transpose cases of dot'
print >> sys.stderr, 'WARNING TODO test_dot: not testing all 8 transpose cases of dot'
def test_sum():
shape = (2,3)
......@@ -147,7 +147,7 @@ def test_reshape():
]
def subtest(shape_1, shape_2):
#print >> sys.stderr, "INFO: shapes", shape_1, shape_2
#print >> sys.stdout, "INFO: shapes", shape_1, shape_2
a = theano._asarray(numpy.random.rand(*shape_1), dtype='float32')
b = cuda_ndarray.CudaNdarray(a)
......
......@@ -147,7 +147,7 @@ class DownsampleFactorMaxGrad(Op):
def c_code_cache_version(self):
return ()
def max_pool2D(input, ds, ignore_border=False):
"""
Takes as input an N-D tensor, where N >= 2. It downscales the input image by
......@@ -166,7 +166,7 @@ def max_pool2D(input, ds, ignore_border=False):
# extract image dimensions
img_shape = input.shape[-2:]
# count the number of "leading" dimensions, store as dmatrix
batch_size = tensor.prod(input.shape[:-2])
batch_size = tensor.shape_padright(batch_size,1)
......
......@@ -1125,7 +1125,7 @@ inv = Inv(upgrade_to_float, name = 'inv')
class Log(UnaryScalarOp):
""" log base e """
def impl(self, x):
return math.log(x)
return numpy.log(x)
def grad(self, (x, ), (gz, )):
if x.type in grad_types:
return gz / x,
......
......@@ -330,6 +330,7 @@ class TensorType(Type):
self.broadcastable = tuple(broadcastable)
self.dtype_specs() # error checking is done there
self.name = name
self.numpy_dtype = numpy.dtype(self.dtype)
if shape is None:
#backport self.shape = tuple((1 if b else None) for b in self.broadcastable)
l=[]
......@@ -360,16 +361,16 @@ class TensorType(Type):
This function is not meant to be called in user code. It is for
`Linker` instances to use when running a compiled graph.
"""
_data = data
if strict:
if (type(data) is numpy.ndarray) and (data.dtype is self.numpy_dtype):
pass # fall through to ndim check
elif strict:
# this is its own subcase that doesn't fall through to anything
if not isinstance(data, numpy.ndarray):
raise TypeError("%s expected a ndarray object.", data, type(data))
if not str(data.dtype) == self.dtype:
raise TypeError("%s expected a ndarray object with dtype = %s (got %s)." % (self, self.dtype, data.dtype))
if not data.ndim == self.ndim:
raise TypeError("%s expected a ndarray object with %s dimensions (got %s)." % (self, self.ndim, data.ndim))
if self.filter_checks_isfinite and (not numpy.all(numpy.isfinite(data))):
raise TypeError("non-finite elements not allowed")
if TensorType.use_shape:
for si, di in zip(self.shape, data.shape):
......@@ -378,11 +379,17 @@ class TensorType(Type):
self, self.shape, data.shape))
return data
else:
data = theano._asarray(data, dtype = self.dtype)
if not self.ndim == data.ndim:
data = theano._asarray(data, dtype = self.dtype) #TODO: consider padding the shape with ones
# to make it consistent with self.broadcastable, like a vector->row promotion
if self.ndim != data.ndim:
raise TypeError("Wrong number of dimensions: expected %s, got %s with shape %s." % (self.ndim, data.ndim, data.shape), data)
if any(b and d != 1 for d, b in zip(data.shape, self.broadcastable)):
raise TypeError("Non-unit value on shape on a broadcastable dimension.", data.shape, self.broadcastable)
i = 0
for b in self.broadcastable:
if b and data.shape[i] != 1:
raise TypeError("Non-unit value on shape on a broadcastable dimension.", data.shape, self.broadcastable)
i+=1
if self.filter_checks_isfinite and (not numpy.all(numpy.isfinite(data))):
raise ValueError("non-finite elements not allowed")
return data
def dtype_specs(self):
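The broadcastable-dimension rule being enforced in ``filter`` can be exercised on its own; this sketch uses plain tuples instead of a TensorType:

```python
def check_broadcastable(shape, broadcastable):
    """Every dimension flagged broadcastable must have length exactly 1."""
    for dim, is_broadcastable in zip(shape, broadcastable):
        if is_broadcastable and dim != 1:
            raise TypeError(
                'Non-unit value on shape on a broadcastable dimension.',
                shape, broadcastable)

check_broadcastable((1, 5), (True, False))      # OK: broadcastable dim is 1
try:
    check_broadcastable((3, 5), (True, False))  # dim 0 should have been 1
except TypeError as e:
    print('rejected:', e.args[1], e.args[2])
```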
......@@ -1826,14 +1833,16 @@ class Default(gof.Op):
view_map = {0: [0]}
def make_node(self, x, default):
x, default = as_tensor_variable(x), as_tensor_variable(default)
assert x.type == default.type
if x.type != default.type:
raise TypeError('Both default() arguments must have same type', x, default)
return gof.Apply(self, [x, default], [default.type()])
def perform(self, node, (x, default), (out, )):
if x is None:
out[0] = default.copy()
else:
out[0] = x
#backport out[0] = default.copy() if x is None else x
if x is None:
# Why copy? Theano can't yet understand out[0] being a view of either x or default,
# so out[0] may be a view of x, but only a copy of default.
out[0] = default.copy()
else:
out[0] = x
default = Default()
setdefault = default # legacy
......@@ -3588,8 +3597,10 @@ def verify_grad(op, pt, n_tests=2, rng=None, eps=None, tol=None, mode=None, cast
o_fn = function(tensor_pt, o_output)
o_fn_out = o_fn(*[p.copy() for p in pt])
random_projection = rng.rand(*o_fn_out.shape)
# random_projection should not have elements too small,
# otherwise too much precision is lost in numerical gradient
random_projection = rng.rand(*o_fn_out.shape) + 0.5
if cast_to_output_type:
random_projection = numpy.array(random_projection,
dtype=o_output.dtype)
......
......@@ -822,7 +822,14 @@ class CAReduce(Op):
to_reduce = reversed(sorted(axis))
if to_reduce:
for dimension in to_reduce:
variable = self.ufunc.reduce(variable, dimension)
# If it's a zero-size array, use scalar_op.identity if available
if variable.shape[dimension] == 0:
if hasattr(self.scalar_op, 'identity'):
variable = self.scalar_op.identity
else:
raise ValueError("Input (%s) has zero-size on axis %s, but self.scalar_op (%s) has no attribute 'identity'" % (variable, dimension, self.scalar_op))
else:
variable = self.ufunc.reduce(variable, dimension)
output[0] = theano._asarray(variable, dtype = node.outputs[0].type.dtype)
else:
output[0] = numpy.copy(variable)
......
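Reducing over a zero-length axis has no data to fold, so the reduction must fall back to the scalar op's identity element (0 for add, 1 for mul). A simplified sketch of this fallback using ``functools.reduce`` (not the real CAReduce):

```python
from functools import reduce

def careduce(values, op, identity=None):
    # Reducing an empty sequence has no natural result; fall back to the
    # scalar op's identity element, as the change above does.
    if len(values) == 0:
        if identity is None:
            raise ValueError('zero-size reduction and op has no identity')
        return identity
    return reduce(op, values)

add = lambda a, b: a + b
print(careduce([1, 2, 3], add, identity=0))  # 6
print(careduce([], add, identity=0))         # 0
```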
......@@ -133,6 +133,8 @@ class test_CAReduce(unittest.TestCase):
((5, 6), (1, )),
((5, 6), ()),
((2, 3, 4, 5), (0, 1, 3)),
((5, 0), (0, )),
((5, 0), (1, )),
((), ())]:
x = TensorType('float64', [(entry == 1) for entry in xsh])('x')
e = CAReduce(add, axis = tosum)(x)
......@@ -149,7 +151,7 @@ class test_CAReduce(unittest.TestCase):
def test_c(self):
self.with_linker(gof.CLinker())
if __name__ == '__main__':
unittest.main()