GNU Arm Embedded Toolchain

Missing optimization with predecrement in while-loop

Bug #1538629 reported by demiurg_spb on 2016-01-27

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	GNU Arm Embedded Toolchain	New	Undecided	Unassigned

Bug Description

foo.c:

void delay_cy(volatile unsigned int cy)
{
do
{
;
} while (--cy);
}

arm-none-eabi-gcc -S -mcpu=cortex-m1 -mthumb -msoft-float -Os foo.c

.L2:
ldr r3, [sp, #4]
subs r3, r3, #1
str r3, [sp, #4]
cmp r3, #0 - this operation is dummy because of SUBS allready sets status regiser correctly
bne .L2

.ident "GCC: (GNU Tools for ARM Embedded Processors) 5.2.1 20151202 (release) [ARM/embedded-5-branch revision 231848]"

Revision history for this message

David Brown (davidbrown) wrote on 2019-06-10:

I think this is an effect of the counter being "volatile". The line "while (--cy)" is being handled as two operations - first "cy" is decremented, then it is compared to 0. Since cy is marked volatile, these operations are not combined.

The C standards are not very clear on exactly how such code should be interpreted - I believe it is right for the compiler to be a bit conservative here.

Revision history for this message

demiurg_spb (demiurg-spb-h) wrote on 2019-06-11:

t is absolutely pointless. As the value in the r3 register does not reload again before the comparison operation with zero.

Revision history for this message

demiurg_spb (demiurg-spb-h) wrote on 2019-06-11:

just checked on the:
.ident "GCC: (15:7-2018-q2-6) 7.3.1 20180622 (release) [ARM/embedded-7-branch revision 261907]"
All the same.

Revision history for this message

demiurg_spb (demiurg-spb-h) wrote on 2019-06-11:

The volatile-qualifier forces the r3 register to reload at the very beginning of the cycle and saves the r3 register at the end of the cycle. This is absolutely right.

Revision history for this message

David Brown (davidbrown) wrote on 2019-06-11:

Since you are using a volatile variable, the code /is/ right - even with the "extra" compare. "Volatile" tells the compiler to avoid optimisations, and generate bigger and slower code. It doesn't make sense in code like this to say the compiler generated unnecessarily inefficient code - anything other than a "bx lr" instruction is inefficient for an empty loop. You don't use a loop like this for a specific timed delay - you use it merely for a "wait a little bit" delay where the exact timing doesn't matter. If the timing is important, use a timer.

A slightly better (IMHO) way to write such loops is:

void delay_cy(unsigned int cy)
{
do
{
asm volatile("");
} while (--cy);
}

This avoids stack and memory accesses, and keeps the loop shorter and simpler. It also minimises the volatile accesses, which keeps things more consistent.

Having said all that, there does appear to be an issue with code generation in loops for the M0 and M1. It is more general than just this case, but the loop here makes it clear. When compiled with "-mcpu=cortex-m1", or "m0" or "m0plus", the compiler generates an extra "cmp" instruction. When compiled with "m3", "m4" or "m7", this extra "cmp" is not there.

So while this should not be an issue for such volatile delay code, it /is/ an issue in more general code generation for the M0 and M1. And it happens on a range of loops, and still happens on relatively recent gcc 8.2 version.

Revision history for this message

demiurg_spb (demiurg-spb-h) wrote on 2019-06-11:

You are absolutely right. This problem has nothing to do with the use of a volatile-qualifier. This is a more common problem. I published a minimal test case that I was able to invent.

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.