Commit 24401f1c, authored by James Bergstra

Minor mods to Composite optimization

Parent 9ca91ae5
@@ -2236,7 +2236,7 @@ def local_elemwise_fusion_op(OP):
     """
     def local_fuse(node):
         """
-        As part of specialisation, we fusion two consecutif elemwise op of the same shape.
+        As part of specialisation, we fuse two consecutive elemwise op of the same shape.
         For mixed dtype, we let the Compise op do the cast. It let the C compile do the cast.
         The number of dimension is validated at call time by theano itself.
@@ -2269,7 +2269,7 @@ def local_elemwise_fusion_op(OP):
 for i in node.inputs:
     do_fusion = False
     catch = False
-    if i.owner and isinstance(i.owner.op, OP) and len(i.clients)<=1:
+    if i.owner and isinstance(i.owner.op, OP) and len(i.clients)==1:
         #if the scalar_op don't have a c implementation, we skip its fusion to allow the fusion of the other ops.
         do_fusion=True
         try:
@@ -2325,7 +2325,7 @@ def local_elemwise_fusion_op(OP):
 # There is a hard limit of 256 bytes for the formal argument list to a GPU kernel function.
 # Here, we estimate how many bytes the new Op will need, and abort if it needs too much.
-if True:
+if OP != T.Elemwise:
     argument_limit = 240 # 16 bytes are used for block and thread coords etc.
     #TODO: read in from architecture to make this 4 or 8
     int_size = 8
......
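
Note on the last hunk: the comments describe estimating how many bytes the fused Op's kernel argument list will need and aborting the fusion when it exceeds the budget, but the estimate itself is elided from the diff. A minimal sketch of that kind of check might look like the following. Only `argument_limit = 240` and `int_size = 8` come from the diff; the pointer size, the argument layout (one element count, the shape, then a data pointer and strides per array), and both function names are assumptions made for illustration and may not match what the commit actually does.

```python
# Values taken from the diff.
ARGUMENT_LIMIT = 240  # 256-byte hard limit minus 16 bytes for block and thread coords
INT_SIZE = 8          # the diff's TODO notes this should be 4 or 8 depending on architecture

# Assumption, not from the diff: 64-bit data pointers.
PTR_SIZE = 8


def estimate_kernel_arg_bytes(ndim, n_inputs, n_outputs,
                              int_size=INT_SIZE, ptr_size=PTR_SIZE):
    """Rough byte count for the formal argument list of a fused kernel.

    Hypothetical layout: one element count, `ndim` shape integers, then
    one data pointer plus `ndim` stride integers for every array.
    """
    n_arrays = n_inputs + n_outputs
    header = int_size + ndim * int_size
    per_array = ptr_size + ndim * int_size
    return header + n_arrays * per_array


def fits_argument_budget(ndim, n_inputs, n_outputs, limit=ARGUMENT_LIMIT):
    """Return True when the estimated argument list stays within the budget."""
    return estimate_kernel_arg_bytes(ndim, n_inputs, n_outputs) <= limit


if __name__ == "__main__":
    # A small fusion: three 2-d inputs and one output.
    print(estimate_kernel_arg_bytes(2, 3, 1), fits_argument_budget(2, 3, 1))
    # A larger fusion: eight 4-d inputs and one output.
    print(estimate_kernel_arg_bytes(4, 8, 1), fits_argument_budget(4, 8, 1))
```

Under this assumed layout the cost grows with both the number of fused inputs and their dimensionality, which is why a fusion pass would want to stop merging inputs before the limit is reached. With the new guard `if OP != T.Elemwise:`, the check is skipped when `OP` is `T.Elemwise` and runs for any other Op class.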