#FRED: I added both unroll parameters so that ops with different unroll values are not merged. Otherwise the tests for the unroll code don't work correctly.
"""These attributes uniquely identify the behaviour of this op for given inputs"""
#TODO: make the stacksize its own parameter, and make imshp a pair
...
@@ -48,12 +47,14 @@ class ConvOp(Op):
dx - patch stride rows
dy - patch stride cols
out_mode - 'valid', 'full'
unroll_patch - C code generation option (used when no shape is given)
unroll_batch - C code generation option
unroll_kern - C code generation option
verbose - passed to GpuConv
version - passed to GpuConv
If imshp, kshp, nkern and bsize are provided, we can generate more optimal code. This makes a significant difference for the full mode with the unroll_patch version.
The reason that this op does the summation over convolutions within the 'stack' is that
it allows us to be memory-efficient about how gradients are calculated. If, for
example, we had a convolution op that took a list of images, a list of kernels, and
...
@@ -70,16 +71,24 @@ class ConvOp(Op):
"Anatomy of High-Performance Matrix Multiplication" by Kazushige Goto and Robert A. van de Geijn, ACM Transactions on Mathematical Software, vol. 34, no. 3, article 12, May 2008.
Figure 12 of that paper gives the values mr x nr; these are the optimal values to use for unroll_batch and unroll_kern. For x86_64 computers it is 4x4. Other architectures can have different values (2x4 for x86, 8x8 for Itanium, ...).
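To illustrate the mr x nr register blocking that the Goto paper describes (and that unroll_batch/unroll_kern mirror in the generated C code), here is a minimal pure-Python sketch. It is an illustration of the blocking structure only, not the actual ConvOp code generator; the blocking changes iteration order, not the result.

```python
def blocked_matmul(A, B, mr=4, nr=4):
    """Multiply matrices A (m x k) and B (k x n), computing the output
    in mr x nr tiles. In an optimized C kernel, each tile would stay
    in registers while the inner loop over p runs."""
    m, k = len(A), len(A[0])
    n = len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, mr):          # loop over row tiles
        for j0 in range(0, n, nr):      # loop over column tiles
            # compute one mr x nr output tile
            for i in range(i0, min(i0 + mr, m)):
                for j in range(j0, min(j0 + nr, n)):
                    s = 0.0
                    for p in range(k):
                        s += A[i][p] * B[p][j]
                    C[i][j] = s
    return C
```

Whatever tile shape is chosen, the result matches an untiled multiply; the tile shape only affects how well the inner loops map onto the machine's registers, which is why the optimal mr x nr is architecture-dependent.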
print("OPTIMISATION WARNING: in ConvOp.__init__() unroll_kern(%s) should be 0 or a divisor of nkern(%s). We revert it to %d. This won't change the result, but may make it slower." % (str(self.unroll_kern), str(self.nkern), new))
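The fallback hinted at by the warning above can be sketched as follows. `largest_divisor_at_most` is a hypothetical helper for illustration, not part of the actual ConvOp code; it assumes the reverted unroll value is the largest divisor of nkern that does not exceed the requested unroll_kern.

```python
def largest_divisor_at_most(n, limit):
    """Return the largest divisor of n that is <= limit (at least 1).

    Sketch of the fallback: if the requested unroll factor does not
    divide n (e.g. nkern), fall back to the nearest smaller divisor,
    which preserves correctness but may be slower."""
    for d in range(min(n, limit), 0, -1):
        if n % d == 0:
            return d
    return 1

# e.g. requesting unroll_kern=5 with nkern=12 would revert to 4
```

Capping the search at `min(n, limit)` also covers the case where the requested unroll factor exceeds nkern, in which case nkern itself is used.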