So it seems my initial theory was incorrect: the Mali drivers (at least) do not choke if you submit an improperly tagged buffer to the hwc, nor does adding the proper bits to the software buffers affect the bug. Rather, something in the cursor and observer infrastructures seems to be hanging. Overlays exacerbate the problem, in particular the 2-way fencing that Mali and PowerVR have but Adreno doesn't, where we don't necessarily wait on makecurrent/swapbuffers.