Comment 77 for bug 653714

In , Nv28m (nv28m) wrote :

I had the feeling that the dump alone was unlikely to help, so I did some debugging on my own. After all, it can't be that hard if I have the hardware at hand to play with ;)

First I collected all the write locations from the dump, ignoring the part which looked like a frame buffer. I got around 260 locations; I guess most of these must be mapped registers.

Then I booted with the nvidia driver and kexec'd to nouveau, where everything works except that the GPU locks up after a while.

Anyway, it's stable enough to dump those 260 registers. Then I did a suspend to RAM and resumed. Screen trashing is back of course, but let's dump those registers again.

Comparing the dumps reveals that 30 locations are different now.
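For anyone who wants to repeat the comparison, it can be sketched roughly like this. This is only an illustration: holding each dump as a dict of address to value is my assumption, and the sample addresses and values are made-up placeholders, not real register contents.

```python
def diff_dumps(before, after):
    """Return {addr: (before_value, after_value)} for registers that changed.

    Both arguments are dicts mapping register address -> 32-bit value
    (an assumed in-memory format, not the output of any particular tool).
    """
    return {
        addr: (before[addr], after[addr])
        for addr in sorted(before.keys() & after.keys())
        if before[addr] != after[addr]
    }


# Placeholder data, for illustration only.
before_suspend = {0x0010: 0x00000001, 0x0014: 0x000000FF, 0x0018: 0xDEADBEEF}
after_resume = {0x0010: 0x00000001, 0x0014: 0x00000000, 0x0018: 0xDEADBEEF}

for addr, (old, new) in diff_dumps(before_suspend, after_resume).items():
    print(f"{addr:08x}: {old:08x} -> {new:08x}")
```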

I wrote all those registers back as they were before the suspend. Of course the console immediately went blank. But I can still start X blindly, which hangs of course, but at least I can move a flawless cursor around (normally it's trashed as well).

So it must be one of these. I resorted to normal bisecting from here to find the trashing/non-trashing set of registers. There were plenty of hangs and reboots ;)
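The bisecting itself is nothing special. A minimal sketch, assuming exactly one register is responsible: `is_fixed` is a hypothetical callback standing in for the manual step (write the subset back after resume and look at the screen), and the candidate addresses in the example are placeholders.

```python
def bisect_registers(candidates, is_fixed):
    """Narrow a list of register addresses down to the single culprit.

    Assumes exactly one register matters. is_fixed(subset) is a
    hypothetical check that returns True when restoring only `subset`
    makes the trashing go away.
    """
    candidates = list(candidates)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if is_fixed(half):
            candidates = half              # culprit is in the first half
        else:
            candidates = candidates[len(half):]  # otherwise in the second
    return candidates[0]


# Simulated run: 30 placeholder addresses, one of them is the culprit.
regs = [0x1000 + 4 * i for i in range(30)]
culprit = regs[17]
found = bisect_registers(regs, lambda subset: culprit in subset)
print(f"{found:08x}")
```

With 30 candidates this needs about five rounds, though in practice each round can cost a hang and a reboot.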

At the end I got what I wanted. The winner is:
nvapoke 00100080 e1000000

This register is e0000000 after suspend or after a clean boot. So all this trouble only because a single bit was not set ;(
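To see which bit that is, XOR the two values. A quick check (the variable names are just mine):

```python
good = 0xE1000000  # value written with nvapoke, no trashing
bad = 0xE0000000   # value after suspend or after a clean boot

changed = good ^ bad
bits = [i for i in range(32) if (changed >> i) & 1]

print(f"{changed:08x}", bits)  # prints: 01000000 [24]
```

So the difference is bit 24 of register 00100080.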

The earlier GPU lockup (with kexec to nouveau after nvidia driver init) does not happen anymore when I set only this bit and nothing else. The lockup is probably due to some other uninitialized state, but I'm not sure it's worth finding out what it is; after all, the default state on boot is ok.

From here I pass the ball back. Please, with this knowledge could someone fix up the driver? Most likely it would be better done on the kernel side as the KMS frame buffer is affected too. Of course testing patches should be no problem ;)

Thanks!