GNU Arm Embedded Toolchain

Bug #1538629
Comment #5

Comment 5 for bug 1538629

Revision history for this message

David Brown (davidbrown) wrote on 2019-06-11:

Since you are using a volatile variable, the code /is/ right - even with the "extra" compare. "Volatile" tells the compiler to avoid optimisations, and generate bigger and slower code. It doesn't make sense in code like this to say the compiler generated unnecessarily inefficient code - anything other than a "bx lr" instruction is inefficient for an empty loop. You don't use a loop like this for a specific timed delay - you use it merely for a "wait a little bit" delay where the exact timing doesn't matter. If the timing is important, use a timer.

A slightly better (IMHO) way to write such loops is:

void delay_cy(unsigned int cy)
{
do
{
asm volatile("");
} while (--cy);
}

This avoids stack and memory accesses, and keeps the loop shorter and simpler. It also minimises the volatile accesses, which keeps things more consistent.

Having said all that, there does appear to be an issue with code generation in loops for the M0 and M1. It is more general than just this case, but the loop here makes it clear. When compiled with "-mcpu=cortex-m1", or "m0" or "m0plus", the compiler generates an extra "cmp" instruction. When compiled with "m3", "m4" or "m7", this extra "cmp" is not there.

So while this should not be an issue for such volatile delay code, it /is/ an issue in more general code generation for the M0 and M1. And it happens on a range of loops, and still happens on relatively recent gcc 8.2 version.