Commit 7081a0d9, authored by Frederic Bastien

Changed the default value of ConvOp.unroll_batch and ConvOp.unroll_kern to 4, as this is the best value for x86_64 computers.
Parent 5170cb7c
@@ -28,8 +28,9 @@ class ConvOp(Op):
     #TODO: make the stacksize its own parameter, and make imshp a pair
-    def __init__(self, imshp, kshp, nkern, bsize, dx, dy, output_mode='valid', unroll_batch=0,
-            unroll_kern=0,
+    def __init__(self, imshp, kshp, nkern, bsize, dx, dy, output_mode='valid',
+            unroll_batch=4,
+            unroll_kern=4,
             imshp_logical=None,
             kshp_logical=None,
             kshp_logical_top_aligned=True):
@@ -57,6 +58,10 @@ class ConvOp(Op):
         unroll_batch. If >0 will use a version that will unroll the batch loop by the value of the option. By default don't use this version of the code.
         unroll_nkern. idem as unroll_batch but unroll the kernel loop.
+        The default version uses unroll_batch=4 and unroll_kern=4 when possible (currently it doesn't support logical shape != physical shape), as this gives the best performance in practice. This also means that, for best performance, the batch size and the number of kernels should be multiples of 4. In the article:
+        Anatomy of High-Performance Matrix Multiplication by Kazushige Goto and Robert A. Van De Geijn, ACM Transactions on Mathematical Software, vol 34, No. 3, article 12, May 2008.
+        In figure 12, it gives the values mr x nr; those values are the optimum to use for unroll_batch and unroll_kern. For x86_64 computers it is 4x4. Other architectures can have different values (2x4 for x86, 8x8 for Itanium, ...).
         """
         imshp = tuple(imshp)
         if len(imshp)==2:
......
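
The new docstring text recommends a batch size and a number of kernels that are multiples of 4. As a rough illustration only, the sketch below checks that condition and rounds a size up to the next multiple. The helper names `round_up_to_multiple` and `is_unroll_friendly` are hypothetical and are not part of ConvOp; how ConvOp itself handles sizes that are not multiples of the unroll factor is not shown in this diff.

```python
def round_up_to_multiple(n, factor=4):
    """Return the smallest multiple of `factor` that is >= n (hypothetical helper)."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    # Ceiling division written with integer arithmetic only.
    return -(-n // factor) * factor


def is_unroll_friendly(bsize, nkern, unroll_batch=4, unroll_kern=4):
    """True when both sizes divide evenly by their unroll factor (hypothetical helper)."""
    return bsize % unroll_batch == 0 and nkern % unroll_kern == 0


if __name__ == "__main__":
    bsize, nkern = 10, 16
    if not is_unroll_friendly(bsize, nkern):
        # One option a caller might consider: pad the batch up to a multiple of 4.
        bsize = round_up_to_multiple(bsize)
    print(bsize, nkern)
```

Whether padding the batch is worthwhile depends on the workload; the 4x4 figure is the value the commit cites for x86_64, and the docstring notes that other architectures can differ.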