Apart from cleanup code, all code has access to the %(fail)s template. For three code blocks, the generated C code looks roughly like this:
{{{
int failure = 0;
{
  <code1>
  {
    <code2>
    {
      <code3>
      label3:
      <cleanup3>
    }
    label2:
    <cleanup2>
  }
  label1:
  <cleanup1>
}
return failure;
}}}
In the nth code block, %(fail)s expands to "{failure = n; goto label<n>;}". As a result, only the blocks executed up to the point of failure are cleaned up, and the return value indicates which block failed, which is handy for debugging.
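Below is a minimal hand-written C sketch of this layout, not actual generated code. It assumes each block acquires one heap buffer and writes each %(fail)s expansion out by hand as `{ failure = n; goto labeln; }`. The names `run_blocks`, `fail_at` and the `cleanup_log` array are hypothetical, added only so the cleanup order can be observed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical log of which cleanup sections ran, in order. */
static int cleanup_log[3];
static int cleanup_count;

static void log_cleanup(int block)
{
    assert(cleanup_count < 3);
    cleanup_log[cleanup_count++] = block;
}

/* Mimics the three-block layout. fail_at (1..3) picks the block that
 * triggers its %(fail)s expansion; 0 means no block fails.
 * Returns 0 on success, otherwise the number of the failing block. */
static int run_blocks(int fail_at)
{
    int failure = 0;
    cleanup_count = 0;
    {
        /* <code1> */
        char *buf1 = malloc(16);
        if (buf1 == NULL || fail_at == 1) { failure = 1; goto label1; }
        {
            /* <code2> */
            char *buf2 = malloc(16);
            if (buf2 == NULL || fail_at == 2) { failure = 2; goto label2; }
            {
                /* <code3> */
                char *buf3 = malloc(16);
                if (buf3 == NULL || fail_at == 3) { failure = 3; goto label3; }
            label3:
                /* <cleanup3> */
                free(buf3);
                log_cleanup(3);
            }
        label2:
            /* <cleanup2> */
            free(buf2);
            log_cleanup(2);
        }
    label1:
        /* <cleanup1> */
        free(buf1);
        log_cleanup(1);
    }
    return failure;
}
```

With `fail_at == 2`, block 3 never runs, so `run_blocks` returns 2 and only cleanup2 and then cleanup1 are logged. With `fail_at == 0`, all three cleanups run, innermost first, and the return value is 0.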
When compiling an Op, we want to sync the outputs so that the results can be retrieved from Python. In case of failure, we do not necessarily want to sync. Typical code therefore looks like this:
{{{
int failure = 0;
<declare input>
<declare output>
{
  <extract input>
  {
    <extract output>
    {
      <perform>
      label3:
      <clean up perform>
    }
    label2:
    if (!failure)
      <sync output>
    <clean up output>
  }
  label1:
  <clean up input>
}
return failure;
}}}
Furthermore, it is not necessary to extract the output, because we mean to overwrite it anyway. In that case, <extract output> is a no-op, but we may of course still need to clean up or sync whatever <perform> puts in the declared outputs.
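The same pattern applied to a toy Op, as a hedged sketch: the `storage_t` struct stands in for the Python-side storage, and the Op itself (doubling a single `double`) as well as the names `run_op` and `perform_fails` are made up for illustration. The output is synced only while `failure` is still 0; otherwise <clean up output> releases whatever <perform> allocated.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the storage shared with the Python side. */
typedef struct {
    const double *input; /* value supplied by the caller */
    double *output;      /* slot the caller reads afterwards; NULL until synced */
} storage_t;

/* perform_fails lets the caller force a failure inside <perform>. */
static int run_op(storage_t *st, int perform_fails)
{
    int failure = 0;
    double in_val;      /* <declare input> */
    double *out = NULL; /* <declare output> */
    {
        /* <extract input> */
        if (st->input == NULL) { failure = 1; goto label1; }
        in_val = *st->input;
        {
            /* <extract output>: no-op, <perform> overwrites the output anyway.
             * label2 below is its %(fail)s target, unused since it cannot fail. */
            {
                /* <perform> */
                out = malloc(sizeof *out);
                if (out == NULL || perform_fails) { failure = 3; goto label3; }
                *out = 2.0 * in_val;
            label3:
                ; /* <clean up perform>: nothing to release in this sketch */
            }
        label2:
            if (!failure) {
                /* <sync output>: hand ownership of the result to the caller */
                st->output = out;
                out = NULL;
            }
            /* <clean up output>: frees the result only if it was not handed over */
            free(out);
            out = NULL;
        }
    label1:
        ; /* <clean up input>: in_val is a plain copy, nothing to release */
    }
    return failure;
}
```

Ownership transfer is only one possible design for <sync output>: once the result is handed over, setting `out` to NULL turns the unconditional `free(out)` in <clean up output> into a no-op, so the same cleanup code serves both the success and the failure paths.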
...
The following ResultBase represents a double (we only care about the C part).
Here is the loop for {{{order == c}}}. Check for errors!
{{{
<initialize iterators>
i1 = -1
while (++i1 < dim1) {
  i2 = -1
  rank_N-1_accumulator = init
  while (++i2 < dim2) {
    ...
      iN = -1
      while (++iN < dimN) {
        <accumulate rank N input>
        <SET rank N output using broadcasted inputs>
        <NEXT rank N iterator>
      }
    ...
  }
  <SET rank 1 output using accumulated inputs>
  <NEXT rank 1 iterator>
}
}}}
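As a hypothetical concrete instance (rank N = 2, chosen only for illustration), the sketch below computes an elementwise output `y[i1][i2] = x[i1][i2] + b[i2]`, with `b` broadcast along dim1, and accumulates row sums into a rank-1 output `r`. The function and variable names are assumptions; the loop structure follows the template above, with pointer increments playing the role of the NEXT steps.

```c
#include <assert.h>

/* Hypothetical rank-2 instance of the C-order loop:
 *   y[i1][i2] = x[i1][i2] + b[i2]   (b is broadcast along dim1)
 *   r[i1]     = sum over i2 of x[i1][i2]
 * x and y are C-contiguous dim1 x dim2 buffers; b has length dim2. */
static void loop_c_order(const double *x, const double *b, double *y, double *r,
                         long dim1, long dim2)
{
    assert(dim1 >= 0 && dim2 >= 0);
    /* <initialize iterators> */
    const double *xp = x;
    double *yp = y;
    double *rp = r;
    long i1 = -1;
    while (++i1 < dim1) {
        const double *bp = b; /* b restarts on every row: broadcasting */
        double acc = 0.0;     /* rank_N-1_accumulator = init */
        long i2 = -1;
        while (++i2 < dim2) {
            acc += *xp;          /* <accumulate rank N input> */
            *yp = *xp + *bp;     /* <SET rank N output using broadcasted inputs> */
            ++xp; ++yp; ++bp;    /* <NEXT rank N iterator> */
        }
        *rp = acc; /* <SET rank 1 output using accumulated inputs> */
        ++rp;      /* <NEXT rank 1 iterator> */
    }
}
```

Resetting `bp` at the start of each row is what implements the broadcast of `b`: its iterator advances along dim2 but never along dim1.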
When {{{order == f}}}, the iterators ''ideally'' (but not necessarily) iterate in Fortran order, i.e. the while loops run over {{{dimN..dim1}}} instead of {{{dim1..dimN}}}.
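A small sketch of what this loop order buys, assuming a rank-2 array stored in Fortran (column-major) order, where element (i1, i2) lives at offset `i1 + i2 * dim1`. Running the outer loop on dim2 and the inner loop on dim1 visits the buffer at consecutive offsets. The names `f_offset` and `loop_f_order` are hypothetical.

```c
#include <assert.h>

/* Offset of element (i1, i2) in a Fortran-ordered dim1 x dim2 array. */
static long f_offset(long i1, long i2, long dim1)
{
    return i1 + i2 * dim1;
}

/* Loops run on dim2 (outer) then dim1 (inner), i.e. dimN..dim1 for N = 2.
 * Records the visited offsets in `visited` (dim1 * dim2 entries). */
static void loop_f_order(long dim1, long dim2, long *visited)
{
    assert(dim1 >= 0 && dim2 >= 0);
    long k = 0;
    long i2 = -1;
    while (++i2 < dim2) {
        long i1 = -1;
        while (++i1 < dim1) {
            visited[k++] = f_offset(i1, i2, dim1);
        }
    }
}
```

With the C-order nesting instead, the inner loop would step through the same buffer with a stride of dim1. The results are the same, but memory access is usually less cache-friendly.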