Commit d3018ad3 authored by Arnaud Bergeron

Change the elemwise input fusion limit to 32.

Parent 2bfe3c82
@@ -5453,7 +5453,7 @@ for i in xrange(1,len(p64)): print i, 64[i]-p64[i-1]
 # ###############
 # # Loop fusion #
 # ###############
-def local_elemwise_fusion_op(OP, max_input_fct=lambda node: 1024,
+def local_elemwise_fusion_op(OP, max_input_fct=lambda node: 32,
                              maker=None):
     """
     We parametrize it to make it work for Elemwise and GpuElemwise op.
@@ -5468,10 +5468,8 @@ def local_elemwise_fusion_op(OP, max_input_fct=lambda node: 1024,
     limit how many ops we fuse together to avoid busting
     that 256 limit.
 
-    On the CPU we limit to 1024 input variable
-    to the resulting fused op. This is big
-    enough that if we hit it, I'm not sure it
-    will affect performance.
+    On the CPU we limit to 32 input variables
+    since that is the maximum numpy support.
     """
     if maker is None:
         def maker(node, scalar_op):
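The effect of `max_input_fct` can be illustrated with a toy greedy fusion pass. This is a minimal sketch under stated assumptions: `Var`, `Node` and `fused_inputs` are hypothetical names, not Theano's API, and the greedy "inline a producer only if the fused op still has at most `max_input_fct(node)` distinct inputs" strategy is an illustration of how a per-node input cap stops fusion, not the actual algorithm inside `local_elemwise_fusion_op`.

```python
from typing import Callable, List, Optional


class Var:
    """A graph variable; `owner` is the elementwise node that computes it, if any."""

    def __init__(self, name: str, owner: Optional["Node"] = None) -> None:
        self.name = name
        self.owner = owner


class Node:
    """Hypothetical stand-in for an elementwise Apply node (not Theano's class)."""

    def __init__(self, *inputs: Var) -> None:
        self.inputs = list(inputs)
        self.output = Var("out", owner=self)


def fused_inputs(node: Node,
                 max_input_fct: Callable[[Node], int] = lambda node: 32
                 ) -> List[Var]:
    """Greedily inline producer nodes into `node` while the fused op's number
    of distinct inputs stays <= max_input_fct(node). Returns those inputs."""
    limit = max_input_fct(node)
    inputs = list(node.inputs)
    changed = True
    while changed:
        changed = False
        for i, var in enumerate(inputs):
            if var.owner is None:
                continue  # graph input: nothing to inline
            # Candidate: replace `var` with the inputs of the node computing it.
            candidate = inputs[:i] + var.owner.inputs + inputs[i + 1:]
            deduped: List[Var] = []
            for v in candidate:  # identity comparison: shared inputs count once
                if v not in deduped:
                    deduped.append(v)
            if len(deduped) <= limit:
                inputs = deduped
                changed = True
                break  # rescan the updated input list from the start
    return inputs


# Usage: two producers with 20 inputs each feed one consumer.
xs = [Var(f"x{i}") for i in range(40)]
p, q = Node(*xs[:20]), Node(*xs[20:])
top = Node(p.output, q.output)
print(len(fused_inputs(top)))                      # 21: cap of 32 stops the second inline
print(len(fused_inputs(top, lambda node: 1024)))  # 40: old cap fuses everything
```

With the new default of 32, only one producer can be inlined, because fusing both would give the fused op 40 inputs; under the old 1024 default the whole graph collapses into a single op.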