This is a list of comma-delimited key[=value] pairs that control
Theano's behavior. A key that appears without an '=value' must
correspond to a boolean setting, and listing it acts as setting it to True.
For example, in bash, you can override your :envvar:`THEANORC` defaults
from the command line:
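A minimal sketch of such an override in bash (the specific flags chosen here, `floatX` and `device`, are just illustrative Theano config keys):

```shell
# Setting THEANO_FLAGS in the environment overrides the .theanorc defaults
# for any Theano process started from this shell.
export THEANO_FLAGS='floatX=float32,device=cpu'
echo "$THEANO_FLAGS"
```

The same assignment can also be placed inline before a single command so that it applies to that run only.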
import numpy

vlen = 10 * 30 * 768  # 10 x #cores x #threads per core
iters = 1000
rng = numpy.random.RandomState(22)
...
...
@@ -74,28 +74,31 @@ The program just computes the exp() of a bunch of random numbers.
Note that we use the `shared` function to
make sure that the input `x` is stored on the graphics device.
If I run this program (in thing.py) with device=cpu, my computer takes a little over 7 seconds,
whereas on the GPU it takes just over 0.4 seconds. Note that the results are close but not
identical! The GPU will not always produce exactly the same floating-point numbers as the CPU.
As a point of reference, a loop that calls ``numpy.exp(x.value)`` also takes about 7 seconds.
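As a rough sketch of that NumPy reference point (a plain loop over ``numpy.exp``; the sizes are taken from the snippet above, and actual timings will of course vary by machine):

```python
import time
import numpy

vlen = 10 * 30 * 768  # same vector length as in the snippet above
iters = 1000
rng = numpy.random.RandomState(22)
x = rng.rand(vlen).astype('float32')

# Time `iters` calls to numpy.exp on the same vector; this is the
# CPU-only baseline the text compares the Theano CPU/GPU runs against.
t0 = time.time()
for i in range(iters):
    r = numpy.exp(x)
t1 = time.time()
print("Looped %d times in %f seconds" % (iters, t1 - t0))
```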
This Op implements the convolution of a kernel (a 4d tensor: (nkern, stacksize, nb row,
nb col)) on an image (a 4d tensor: (batchsize, stacksize, nb row, nb col)). The batch
size is the number of images to which we want to apply the same kernels. nkern is the
number of kernels that we want to apply to each image. The stack size is mostly used
when there are multiple layers in the network: the output is the sum of the
convolutions of the 2d images and kernels across the stack.
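To make the shape conventions concrete, here is a minimal NumPy sketch of a 'valid'-mode version of this operation (not the Op's actual C implementation; the helper names are made up for illustration). Note the sum over the stack dimension, as described above:

```python
import numpy as np

def conv_valid_2d(img, kern):
    # Naive 'valid' 2d convolution of one image with one (flipped) kernel.
    ir, ic = img.shape
    kr, kc = kern.shape
    fk = kern[::-1, ::-1]  # true convolution flips the kernel
    out = np.empty((ir - kr + 1, ic - kc + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = (img[r:r + kr, c:c + kc] * fk).sum()
    return out

def conv_op_sketch(images, kernels):
    # images:  (batchsize, stacksize, nb row, nb col)
    # kernels: (nkern, stacksize, nb row, nb col)
    bsize, stack, ir, ic = images.shape
    nkern, _, kr, kc = kernels.shape
    out = np.zeros((bsize, nkern, ir - kr + 1, ic - kc + 1))
    for b in range(bsize):
        for k in range(nkern):
            for s in range(stack):  # summed over the stack, as described
                out[b, k] += conv_valid_2d(images[b, s], kernels[k, s])
    return out
```

For example, a (1, 1, 3, 3) image of ones convolved with a (1, 1, 2, 2) kernel of ones gives a (1, 1, 2, 2) output filled with 4s; doubling the stack size doubles each output value.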
The reason that this op does the summation over convolutions within the 'stack' is that
it allows us to be memory-efficient about how gradients are calculated. If, for
...
...
@@ -89,14 +106,22 @@ class ConvOp(Op):
point) then we would have to sum over a potentially very large tensor to get the
gradient on the filters.
If the imshp, kshp, nkern and bsize are provided, we can generate more optimal code.
This makes a significant difference for the full mode with the unroll_patch version.
The fastest code currently available on x86_64 computers uses unroll_batch=4,
unroll_kern=4, unroll_patch=False, and this requires that all the optional shape
information be given. Those numbers were tested empirically and are backed up by the
article "Anatomy of High-Performance Matrix Multiplication" by Kazushige Goto and
Robert A. Van De Geijn, ACM Transactions on Mathematical Software, vol 34, No. 3,
article 12, May 2008. Figure 12 of that article gives the values mr x nr, which are
the optimal values to use for unroll_batch and unroll_kern. For x86_64 computers it
is 4x4; other architectures can have different values (2x4 for x86, 8x8 for
Itanium, ...).
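The revert-to-divisor behaviour that the warnings in ``__init__`` below describe can be sketched as follows (``fallback_unroll`` is a hypothetical helper name, not part of the Op; it mirrors the loop that decrements ``new`` until it divides the size):

```python
def fallback_unroll(size, requested):
    # If `requested` does not divide `size` (the bsize or nkern of the
    # loop being unrolled), fall back to the largest divisor of `size`
    # that is <= requested; 1 always qualifies. 0 means "no unrolling".
    if requested <= 0 or size % requested == 0:
        return requested
    if size <= requested:
        return size
    new = requested
    while size % new != 0:
        new -= 1
    return new
```

So requesting unroll_batch=4 with bsize=10 silently falls back to 2, which is what the OPTIMISATION WARNING below reports.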
:type out_mode: string
:param out_mode: 'valid' (give an output smaller than the image), 'full' (give an
output bigger than the image)
optional parameters (if provided, they will be used to generate more optimal C code):
:type imshp: tuple of len 2 or 3: 2 for 2d image, 3 for a stack of 2d images.
:param imshp: (stacksize, nb image row, nb image col)
...
...
@@ -113,13 +138,17 @@ class ConvOp(Op):
parameters to select the version of code used:
:type unroll_patch: bool
:param unroll_patch: use a version of c_code that unrolls the patch loop. It does not
require all the shape information to work, but if all shape information is present,
it will hardcode those values in the generated code to make it faster.
:type unroll_batch: int
:param unroll_batch: use a version of c_code that unrolls the batch loop by
unroll_batch and the nkern loop by unroll_kern. bsize must be a multiple of
unroll_batch (i.e. unroll_batch must be a divisor of bsize).
:type unroll_kern: int
:param unroll_kern: use a version of c_code that unrolls the batch loop by
unroll_batch and the nkern loop by unroll_kern. nkern must be a multiple of
unroll_kern (i.e. unroll_kern must be a divisor of nkern).
print "OPTIMISATION WARNING: in ConvOp.__init__() unroll_batch(%s) "\
    "must be 0 or a divisor of bsize(%s). We revert it to %d. This "\
    "won't change the result, but may make it slower."%\
    (str(self.unroll_batch),str(self.bsize),new)
self.unroll_batch=new
if self.unroll_kern>0 and self.nkern%self.unroll_kern!=0:
    if self.nkern<=self.unroll_kern:
        self.unroll_kern=self.nkern
    else:
...
...
@@ -192,22 +238,29 @@ class ConvOp(Op):
assert new>=1
while self.nkern%new!=0:
    new-=1
print "OPTIMISATION WARNING: in ConvOp.__init__() unroll_kern(%s) "\
    "should be 0 or a divisor of nkern(%s). We revert it to %d. "\
    "This won't change the result, but may make it slower."%\
    (str(self.unroll_kern),str(self.nkern),new)
raise Exception("ERROR: We disable ConvOp.grad for now when dx!=1 or "\
    "dy!=1 as we think there is a high probability of a bug in it. "\
    "We need to raise the error on the gradient to .1!")
raise Exception("ConvOp.grad when dx!=1 or dy!=1 requires all "\
    "the optional shape information")
grad_hack_necessary=False
if grad_hack_necessary:
...
...
@@ -411,6 +474,7 @@ class ConvOp(Op):
kshp=None
un_p=self.unroll_patch
imshp_logical=None
if self.out_mode=='valid':
(img,filters)=(newin,newgz)
kshp_logical=self.fulloutshp
...
...
@@ -445,13 +509,17 @@ class ConvOp(Op):
un_b=bsize
else:
un_b=1
print"OPTIMISATION WARNING: in ConvOp.grad() we can't determine "\
"a good unroll value for the batch. Maybe you can optimize this!",\
bsize,un_b,self.unroll_batch,self.unroll_kern
if un_k!=0 and nkern%un_k!=0:
    if nkern<un_k:
        un_k=nkern
    else:
        un_k=1
print"OPTIMISATION WARNING: in ConvOp.grad() we can't determine "\
"a good unroll value for the kernel. Maybe you can optimize this!"
_logger.warning("WARNING: cuda_ndarray was loaded from %s. This is not "
    "expected as theano should compile it automatically for you. Do you "
    "have a directory called cuda_ndarray in your LD_LIBRARY_PATH "
    "environment variable? If so, please remove it as it is outdated!",
    cuda_ndarray.cuda_ndarray.__file__)