I think only the 2nd call is wrong. All three calls are to norm2 on the same matrix: the first at base, the 2nd at base+12 and the 3rd at base+24. It looks to me like some of the accesses in the 2nd call are being made to base rather than base+12, and are then shared with the loads from the 1st call.
From the RTL dumps, I think ns.c.169r.loop2_done is OK, but ns.c.193r.split2 is bad.
I'm not sure about ns.c.184r.subreg2; it looks like it's trying to post-increment the address by 12, but I suspect that post-increment isn't being properly followed through:
------------------------------------------------------------------------
(insn 2430 105 106 2 ns.c:1677 (set (reg/f:SI 571 [ D.9349 ])
        (reg/v/f:SI 849 [ box ])) 591 {*thumb2_movsi_vfp} (nil))

(insn 106 2430 107 2 ns.c:1677 (set (reg:SF 861)
        (mem/s/j:SF (post_modify:SI (reg/f:SI 571 [ D.9349 ])
                (plus:SI (reg/f:SI 571 [ D.9349 ])
                    (const_int 12 [0xc]))) [0 S4 A32])) 598 {*thumb2_movsf_vfp} (expr_list:REG_INC (reg/f:SI 571 [ D.9349 ])
        (nil)))

(insn 107 106 109 2 ns.c:1677 (set (mem/s/j:SF (plus:SI (reg/f:SI 25 sfp)
                (const_int -12 [0xfffffffffffffff4])) [0 box_size+0 S4 A32])
        (reg:SF 861)) 598 {*thumb2_movsf_vfp} (nil))

(insn 109 107 110 2 ns.c:1677 (set (reg:SF 862)
        (mem/s/j:SF (plus:SI (reg/f:SI 571 [ D.9349 ])
                (const_int 4 [0x4])) [0 S4 A32])) 598 {*thumb2_movsf_vfp} (nil))
------------------------------------------------------------------------
then later we have:
------------------------------------------------------------------------
(insn 131 129 133 4 vec.h:355 (set (reg:SF 389 [ D.11242 ])
        (mem:SF (plus:SI (reg/v/f:SI 849 [ box ])
                (const_int 12 [0xc])) [0 S4 A32])) 598 {*thumb2_movsf_vfp} (nil))

(insn 133 131 134 4 vec.h:355 (set (reg:SF 396 [ D.11235 ])
        (mem:SF (plus:SI (reg/f:SI 571 [ D.9349 ])
                (const_int 8 [0x8])) [0 S4 A32])) 598 {*thumb2_movsf_vfp} (nil))

(note 134 133 135 4 NOTE_INSN_DELETED)

(insn 135 134 136 4 vec.h:355 (set (reg:SF 869)
        (mult:SF (reg:SF 862)
            (reg:SF 862))) 615 {*mulsf3_vfp} (expr_list:REG_DEAD (reg:SF 862)
        (nil)))
------------------------------------------------------------------------
I believe that insns 131/133/134 should be using the value from box+12+8, and insn 135 is using SF 862, which I believe should be the value from box+12+4; that will only be the case if the post-increment actually happens.
Dave