On Tue, Apr 13, 2010 at 09:13:49PM -0000, Phillip Susi wrote:
> On 4/13/2010 4:30 PM, Launchpad Bug Tracker wrote:
> >   * SAUCE: sync before umount to reduce time taken by ext4 umount
> >     - LP: #543617
>
> This sounds more like a temporary workaround than a fix of the real bug.
> Is that the case and why? Just can't find the real problem, or it will
> take too long to fix?

I recommended doing a sync in userspace (i.e., in various shutdown scripts and GNOME/KDE desktops) as a temporary workaround because I didn't have time to poke at this before the Lucid release deadlines (which are coming up quite rapidly, yes). I guess the Ubuntu kernel team decided it was easier to drop a forced sync into the kernel. I haven't examined the patch that they ultimately chose, but if it was coded correctly, it's presumably low-risk enough to insert less than two weeks before the final release date of Lucid. Me, I probably would have stuck the sync in userspace, but I'm super paranoid this close to an "enterprise-quality" release date, which is what the Lucid LTS release purports to be.

As far as "trying to find the real problem" goes, if Ubuntu were paying my salary I'd give it more time to find the root cause of this bug, but this is a low-priority bug given the other things on my plate. Red Hat employs several very high-powered file system developers, so they fix a lot more of their own distro-specific bugs. Interestingly, this hasn't shown up as a complaint on Fedora systems. I'm not sure why; the test case Kees provided shows that this is definitely an upstream problem, but apparently something about their choice of desktop components, how those are configured, or their init/hal scripts means that it's not showing up for their users in practice.

My problem is that I'm incredibly busy at the moment, and I've already done Ubuntu a huge favor by spending ten minutes on a quickie investigation.
Ubuntu needs to learn that it can't rely on upstream developers to jump through flaming hoops on short notice before an LTS release deadline as a cost-saving mechanism to avoid hiring its own senior kernel engineers. So hiring Surbhi is definitely a step in the right direction. (One step on a journey of ten thousand, but a step in the right direction nonetheless. :-) Surbhi will eventually have the experience of folks like Eric Sandeen and Josef Bacik, or Jan Kara at SuSE, and hopefully she'll then be able to fix bugs like this quickly.

Someone who is an ext4 expert could probably narrow this down in less than a day, especially given my "ten minute investigation" pointing them in the right direction. The fact that "sync" on the command line causes the right thing to happen, while "umount" with dirty inodes extant doesn't, is a pretty strong hint of where to look. And no, the root cause is probably not in the jbd2 layer, as Surbhi has suggested.

- Ted

P.S. Next thing for Ubuntu to learn: how to pay its engineers well enough, and how to give them enough time to work on upstream issues, so that once they gain that experience on Ubuntu's dime and become well known in the open source community, they don't end up jumping ship to companies like Red Hat or Google. :-) On the other hand, if Ubuntu management doesn't learn, that's also OK. Google is hiring. :-)