GNU Arm Embedded Toolchain

Bug #1722849
Comment #7

David Brown (davidbrown) wrote on 2017-10-12:

I agree that it is surprising that "asm volatile" statements can be rearranged with respect to each other and with respect to other volatile accesses. This seems to be a particular problem with asm statements that return outputs: there, "volatile" is used primarily to tell the compiler that the statement can produce different outputs at different times, even if the inputs (if any) are the same. For asm statements with no outputs, the compiler appears to assume they have a special function and should not be moved.

As far as I know, there is no way in C, or with a gcc extension, to specify the ordering of statements or executable code; you can only specify the ordering of memory accesses (via volatile accesses). Even the traditional method of declaring a function (like "foo" in the sample) externally is not safe: with link-time optimisation, the compiler can see the body of "foo" and can rearrange parts of it with respect to the asm statements or volatile accesses.

A related problem is well documented for the AVR gcc port:

http://www.nongnu.org/avr-libc/user-manual/optimization.html

There is, however, a solution to all this. (I have told the avr-libc folks about it a number of times, but they have not added my solution to their webpage. I have given up trying to persuade them.)

I have three macros defined that I use in circumstances like these:

#define forceDependency(val) \
    asm volatile("" :: "" (val) : )

#define forgetCompilerKnowledge(v) \
    asm ("" : "+g" (v))

#define forgetCompilerBlock(start, size) \
    do { typedef struct { char x[size]; } XS; XS *p = (XS *)(start); \
      asm ("" : "+m" (*p)); \
    } while (0)

The first one tells the compiler: "I am using val here, so you have to evaluate it before this point." The second one tells the compiler: "I am using v here, and I might change it, so you have to evaluate it before this point, and afterwards forget anything you know about its value." The third one is another version of the second that works on a block of memory of any size, "size" bytes starting at "start".
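As a minimal, host-compilable sketch of the three macros in use (assuming GCC or a compatible compiler), the block below exercises each one. The function names hidden_multiply and opaque_length are hypothetical. The macros are spelled with __asm__ so they also compile under strict -std= modes, and forceDependency uses the documented "g" constraint in place of the empty constraint string; that substitution is an assumption of this sketch, not the original macro.

```c
#include <stddef.h>
#include <string.h>

/* Sketch: "g" replaces the original empty constraint (an assumption). */
#define forceDependency(val) \
    __asm__ __volatile__("" :: "g" (val))

#define forgetCompilerKnowledge(v) \
    __asm__ ("" : "+g" (v))

/* No trailing semicolon, so the macro behaves as a single statement. */
#define forgetCompilerBlock(start, size) \
    do { typedef struct { char x[size]; } XS; XS *p = (XS *)(start); \
         __asm__ ("" : "+m" (*p)); \
    } while (0)

/* Hypothetical helper: returns 7 * 3, but the optimiser cannot fold it to
   a constant, because forgetCompilerKnowledge hides the value of n. */
int hidden_multiply(void) {
    int n = 7;
    forgetCompilerKnowledge(n);
    int result = n * 3;
    forceDependency(result);   /* result must be computed by this point */
    return result;
}

/* Hypothetical helper: after forgetCompilerBlock, the compiler must assume
   every byte of buf may have changed, so strlen really reads memory. */
size_t opaque_length(void) {
    char buf[16];
    strcpy(buf, "hello");
    forgetCompilerBlock(buf, sizeof buf);
    return strlen(buf);
}
```

The empty asm templates emit no instructions; they only change what the optimiser is allowed to assume, so the results are the same as without the macros.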

Putting "forceDependency(status)" after the "mrs" instruction, but before the "foo()" call, ensures that the compiler has evaluated "status" before calling foo. It makes most sense to put it before the "cpsid i" instruction, but that does not appear to be critical. The neatest arrangement is to combine it with the cpsid, by giving the "cpsid i" asm statement "status" as a dummy input operand:

    uint32_t status;
    asm volatile ("mrs %0, PRIMASK" : "=r" (status) :: );
    asm volatile ("cpsid i" :: "" (status) :);

    foo();

    asm volatile ("msr PRIMASK, %0" :: "r" (status) : );
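The sequence above can be packaged as a pair of helpers, sketched here under several assumptions. The names critical_enter and critical_exit are hypothetical. The "memory" clobbers are an addition not in the original (CMSIS's __disable_irq uses the same clobber) so that ordinary loads and stores stay inside the protected region. The cpsid input uses "r" rather than an empty constraint. The non-Cortex-M branch is a hypothetical stand-in that uses a plain variable for PRIMASK, so the save/restore logic can be exercised on a host.

```c
#include <stdint.h>

#if defined(__ARM_ARCH_PROFILE) && __ARM_ARCH_PROFILE == 'M'
/* Cortex-M: save PRIMASK, then mask interrupts. Passing status as an input
   to the cpsid asm makes the mrs result a dependency of the cpsid.
   The "memory" clobbers are an assumption, not part of the original. */
static inline uint32_t critical_enter(void) {
    uint32_t status;
    __asm__ __volatile__ ("mrs %0, PRIMASK" : "=r" (status) :: "memory");
    __asm__ __volatile__ ("cpsid i" :: "r" (status) : "memory");
    return status;
}

/* Restore the saved PRIMASK rather than unconditionally unmasking. */
static inline void critical_exit(uint32_t status) {
    __asm__ __volatile__ ("msr PRIMASK, %0" :: "r" (status) : "memory");
}
#else
/* Hypothetical host stand-in for testing only: a variable plays the role
   of PRIMASK, with 1 meaning "interrupts masked". */
static volatile uint32_t fake_primask;

static inline uint32_t critical_enter(void) {
    uint32_t status = fake_primask;
    fake_primask = 1u;
    return status;
}

static inline void critical_exit(uint32_t status) {
    fake_primask = status;
}
#endif
```

Because critical_exit restores the saved value instead of re-enabling interrupts outright, nested sections work: the inner exit leaves interrupts masked, and only the outermost exit restores the original state.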
