Hi All,
In my quest to find ways to prevent the compiler from reordering all my operations around timing instructions, I discovered what looks to me like an error.
To prevent code movement I came up with the idea of enclosing the code I did not want to move in an if() block that is always taken but cannot be optimized away. The template is this:
extern volatile int alwaysTrue;

void foo(void)
{
    unsigned ct;
    unsigned time;

    ct=CNT;
    if(alwaysTrue)
    {
        // do a bunch of stuff.
    }
    time=CNT-ct;
}
This seems to keep nearly all of the "// do a bunch of stuff" code from floating outside the timing block. I wrapped this up in a macro:
#define BEGIN_COMPILE_BARRIER(name) if(alwaysTrue){
#define END_COMPILE_BARRIER(name) }
(The "name" parameter is for my second implementation, which removes the overhead of the if() but is not shown here.)
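For anyone who wants to try the pattern off-target, here is a minimal host-side sketch of the macros in use. Several things in it are assumptions made purely for illustration: `fake_cnt()` is a hypothetical stand-in for the Propeller CNT register (it just advances by one tick per read), `alwaysTrue` is defined locally rather than declared extern, and `timed_sum` and `SUM_LOOP` are made-up names.

```c
#include <assert.h>
#include <stddef.h>

/* Defined here so the sketch links on its own; in the post it is extern. */
volatile int alwaysTrue = 1;

/* Hypothetical stand-in for CNT: every read advances the count by one tick. */
static unsigned fake_ticks = 0;
static unsigned fake_cnt(void) { return ++fake_ticks; }
#define CNT fake_cnt()

#define BEGIN_COMPILE_BARRIER(name) if(alwaysTrue){
#define END_COMPILE_BARRIER(name) }

/* Sums 1..n inside the barrier block and returns the elapsed fake ticks.
   The loop result is stored through sum_out so the work has a visible effect. */
unsigned timed_sum(int n, int *sum_out)
{
    unsigned ct;
    unsigned time;
    int sum = 0;

    assert(sum_out != NULL);

    ct = CNT;
    BEGIN_COMPILE_BARRIER(SUM_LOOP)
        for (int i = 1; i <= n; i++) {
            sum += i;
        }
    END_COMPILE_BARRIER(SUM_LOOP)
    time = CNT - ct;

    *sum_out = sum;
    return time;
}
```

Calling `timed_sum(10, &sum)` from a small main() exercises the block. Because the fake counter only moves when it is read, the elapsed value here says nothing about real timing; the sketch only shows where the macros sit relative to the two CNT reads.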
However, when I started applying this concept in general, I suddenly got some broken code. It turns out that the error does not seem to be related to my efforts but is more general in nature.
Below is a section of the C code in question.
BEGIN_COMPILE_BARRIER(MEASURE_CALC_LOADS);
extraBase=CNT;
END_COMPILE_BARRIER(MEASURE_CALC_LOADS);
if (status)
{
  status->stimLoopTime=(extraBase-ct)+extraTime; // <This goes wrong.
  data->doneStatus=status;
}
extraTime=CNT-extraBase;
The error is around the calculation of stimLoopTime.
The generated asm is:
412:cpx_stimdriver.cogc ****         BEGIN_COMPILE_BARRIER(MEASURE_CALC_LOADS);
603                    .loc 1 412 0
604 0534 0000BC08      rdlong r7, r12
605 0538 00007CC3      cmps r7, #0 wz,wc
413:cpx_stimdriver.cogc ****         extraBase=CNT;
606                    .loc 1 413 0
607 053c 00001408      IF_NE wrlong CNT, sp
414:cpx_stimdriver.cogc ****         END_COMPILE_BARRIER(MEASURE_CALC_LOADS);
415:cpx_stimdriver.cogc ****         if (status)
608                    .loc 1 415 0
609 0540 00007CC3      cmps r14, #0 wz,wc
610 0544 0000685C      IF_E jmp #.L33
416:cpx_stimdriver.cogc ****         {
417:cpx_stimdriver.cogc ****           status->stimLoopTime=(extraBase-ct)+extraTime;
611                    .loc 1 417 0
612 0548 0000BCA0      mov r7, r14        ; load address of status
613 054c 0000BC84      sub r11, r8        ; subtract ct (in r8) from extraTime (r11)
614                    .LVL31
615 0550 1000FC80      add r7, #16        ; calculate address of stimLoopTime
616 0554 0000BC08      rdlong r3, sp      ; read extraBase back.
617 0558 0000BC80      add r11, r3        ; add extraBase to extraTime-ct.
618 055c 00003C08      wrlong r11, r7     ; write result to stimLoopTime
418:cpx_stimdriver.cogc ****           data->doneStatus=status;
619                    .loc 1 418 0
620 0560 0000BCA0      mov r7, r13
621 0564 0C00FC80      add r7, #12
622 0568 00003C08      wrlong r14, r7
623                    .L33
419:cpx_stimdriver.cogc ****         }
420:cpx_stimdriver.cogc ****         extraTime=CNT-extraBase;
624                    .loc 1 420 0
625 056c 0000BCA0      mov r11, CNT       ; save CNT
626 0570 0000BC08      rdlong r5, sp      ; read extraBase
627 0574 0000BC84      sub r11, r5        ; calculate extraTime
628                    .LVL32
629                    .LBE3
421:cpx_stimdriver.cogc ****         }
630                    .loc 1 421 0
631 0578 00007C5C      jmp #.L34
I have commented the relevant areas. You can see that for some reason the compiler has decided to change the order of operations from
status->stimLoopTime=(extraBase-ct)+extraTime;
to
status->stimLoopTime=(extraTime-ct)+extraBase;
This is not really OK to do. extraTime has a value of about 20, while ct and extraBase are both raw CNT clock values. So extraTime-ct becomes negative, which then wraps to a big positive. Then we add another big positive in the form of extraBase, and now our total elapsed time, which should be on the order of 2500, is massive.
If I comment out the BEGIN_COMPILE_BARRIER(MEASURE_CALC_LOADS) block around extraBase=CNT, the code changes to the following:
412:cpx_stimdriver.cogc ****         //BEGIN_COMPILE_BARRIER(MEASURE_CALC_LOADS);
413:cpx_stimdriver.cogc ****         extraBase=CNT;
602                    .loc 1 413 0
603 0524 0000BCA0      mov r7, CNT        ; save extraBase into r7
604                    .LVL33
414:cpx_stimdriver.cogc ****         //END_COMPILE_BARRIER(MEASURE_CALC_LOADS);
415:cpx_stimdriver.cogc ****         if (status)
605                    .loc 1 415 0
606 0528 00007CC3      cmps r14, #0 wz,wc
607 052c 0000685C      IF_E jmp #.L32
416:cpx_stimdriver.cogc ****         {
417:cpx_stimdriver.cogc ****           status->stimLoopTime=(extraBase-ct)+extraTime;
608                    .loc 1 417 0
609 0530 0000BCA0      mov r6, r14        ; save address of status
610 0534 0000BC80      add r12, r7        ; add extraTime (r12) and extraBase (r7)
611                    .LVL34
612 0538 1000FC80      add r6, #16        ; calculate offset of stimLoopTime
613 053c 0000BC84      sub r12, r8        ; subtract ct from extraTime+extraBase
614 0540 00003C08      wrlong r12, r6     ; save result
418:cpx_stimdriver.cogc ****           data->doneStatus=status;
615                    .loc 1 418 0
616 0544 0000BCA0      mov r6, r13
617 0548 0C00FC80      add r6, #12
618 054c 00003C08      wrlong r14, r6
619                    .L32
419:cpx_stimdriver.cogc ****         }
420:cpx_stimdriver.cogc ****         extraTime=CNT-extraBase;
620                    .loc 1 420 0
621 0550 0000BCA0      mov r12, CNT
622 0554 0000BC84      sub r12, r7        ; calculate extraTime
623                    .LVL35
624                    .LBE3
421:cpx_stimdriver.cogc ****         }
625                    .loc 1 421 0
626 0558 00007C5C      jmp #.L33
It has now recast the math to
status->stimLoopTime=(extraBase+extraTime)-ct;
which is still wrong but at least works in this instance.
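The wraparound reasoning can be checked in isolation on a host machine. The sketch below assumes that ct, extraBase and extraTime are all 32-bit unsigned values (their declarations are not part of the excerpt), and the function names are made up for illustration. Under that assumption each intermediate wrap is undone by the next operation, so the three groupings should produce the same 32-bit result; if the target still reports a huge value, the regrouping on its own may not be the cause.

```c
#include <assert.h>
#include <stdint.h>

/* Grouping as written in the C source. */
uint32_t as_written(uint32_t extraBase, uint32_t ct, uint32_t extraTime)
{
    return (extraBase - ct) + extraTime;
}

/* Grouping seen in the first listing: (extraTime - ct) + extraBase. */
uint32_t extra_time_first(uint32_t extraBase, uint32_t ct, uint32_t extraTime)
{
    uint32_t t = extraTime - ct;   /* wraps to a large value when ct > extraTime */
    return t + extraBase;          /* wraps again, modulo 2^32 */
}

/* Grouping seen in the second listing: (extraBase + extraTime) - ct. */
uint32_t add_first(uint32_t extraBase, uint32_t ct, uint32_t extraTime)
{
    return (extraBase + extraTime) - ct;
}

/* Asserts that all three groupings agree and returns the common value. */
uint32_t check_groupings(uint32_t extraBase, uint32_t ct, uint32_t extraTime)
{
    uint32_t expected = as_written(extraBase, ct, extraTime);
    assert(extra_time_first(extraBase, ct, extraTime) == expected);
    assert(add_first(extraBase, ct, extraTime) == expected);
    return expected;
}
```

For example, `check_groupings(4000002480u, 4000000000u, 20u)` models a loop of 2480 ticks plus 20 ticks of extra time, and the same comparison can be repeated with a ct close to 0xFFFFFFFF to cover the case where the counter rolls over between the two reads.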
But the underlying issue is that it is completely ignoring the ordering specified by the parentheses. I know that compilers can do lots of optimizing, but I don't think they get to redefine things like this. Order of operations involving subtraction is important.
Any thoughts?
Best regards,
Tom