close or truncate of os-locked file gives EIO on some NFSv4 servers

Bug #137387 reported by yml
This bug affects 4 people

Affects: Bazaar
Status: Incomplete
Importance: Low
Assigned to: Unassigned

Bug Description

I am getting strange error messages at the end of the output of many bzr commands. Here is a sample:

yml@ssh1:~/workspace/dj_project/dj_survey$ bzr status
modified:
  dj_project/settings.py
unknown:
  dj_project/media/dj_survey/
  web_server_config/apache_modfcgi/admin_media@
  web_server_config/apache_modfcgi/site_media@
bzr: ERROR: [Errno 5] Input/output error
/usr/lib/python2.4/site-packages/bzrlib/lockable_files.py:110: UserWarning: file group LockableFiles(<bzrlib.transport.local.LocalTransport url=file:///nfs/http1/yml/www/workspace/.bzr/checkout/>) was not explicitly unlocked
  warn("file group %r was not explicitly unlocked" % self)
/usr/lib/python2.4/site-packages/bzrlib/lock.py:79: UserWarning: lock on <closed file u'/nfs/http1/yml/www/workspace/.bzr/checkout/dirstate', mode 'rb+' at 0xb757d890> not released
  warn("lock on %r not released" % self.f)
Exception exceptions.NotImplementedError: <exceptions.NotImplementedError instance at 0xb75fa7ec> in <bound method _fcntl_TemporaryWriteLock.__del__ of <bzrlib.lock._fcntl_TemporaryWriteLock object at 0xb75753ec>> ignored
/usr/lib/python2.4/site-packages/bzrlib/lock.py:79: UserWarning: lock on <open file u'/nfs/http1/yml/www/workspace/.bzr/checkout/dirstate', mode 'rb' at 0xb7552ec0> not released
  warn("lock on %r not released" % self.f)

In case the log file is useful, here are the last lines of ".bzr.log":

Plugin name __init__ already loaded
Plugin name __init__ already loaded
encoding stdout as sys.stdout encoding 'UTF-8'
return code 0

bzr arguments: [u'status']
looking for plugins in /home/yml/www/.bazaar/plugins
looking for plugins in /usr/lib/python2.4/site-packages/bzrlib/plugins
Plugin name __init__ already loaded
Plugin name __init__ already loaded
encoding stdout as sys.stdout encoding 'UTF-8'
opening working tree '/nfs/http1/yml/www/workspace'
check paths: None
Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/bzrlib/commands.py", line 718, in run_bzr_catch_errors
    return run_bzr(argv)
  File "/usr/lib/python2.4/site-packages/bzrlib/commands.py", line 679, in run_bzr
    ret = run(*run_argv)
  File "/usr/lib/python2.4/site-packages/bzrlib/commands.py", line 375, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/usr/lib/python2.4/site-packages/bzrlib/commands.py", line 689, in ignore_pipe
    result = func(*args, **kwargs)
  File "/usr/lib/python2.4/site-packages/bzrlib/builtins.py", line 205, in run
    to_file=self.outf, short=short, versioned=versioned)
  File "/usr/lib/python2.4/site-packages/bzrlib/status.py", line 183, in show_tree_status
    wt.unlock()
  File "/usr/lib/python2.4/site-packages/bzrlib/workingtree_4.py", line 1122, in unlock
    self._dirstate.save()
  File "/usr/lib/python2.4/site-packages/bzrlib/dirstate.py", line 1735, in save
    self._lock_token = self._lock_token.restore_read_lock()
  File "/usr/lib/python2.4/site-packages/bzrlib/lock.py", line 244, in restore_read_lock
    self._clear_f()
  File "/usr/lib/python2.4/site-packages/bzrlib/lock.py", line 73, in _clear_f
    self.f.close()
IOError: [Errno 5] Input/output error

Robert Collins (lifeless) wrote : Re: [Bug 137387] Strange error message decorating the output of bzr status

Something is wrong in the lock code.

 importance high
 status triaged

Changed in bzr:
importance: Undecided → High
status: New → Triaged
yml (yml) wrote : Re: Strange error message decorating the output of bzr status

Hello,
Is there any workaround that will allow me to silently ignore this error?
Thank you
--yml

Martin Pool (mbp) wrote :

yml,

Changing the python invocation in the bzr script to -Wignore will hide this. Alternatively, find the lines in lock.py and lockable_files.py that raise these warnings and comment them out.
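For reference, a minimal sketch of what -Wignore amounts to inside the interpreter. This is not bzr code: `noisy_unlock` is a hypothetical stand-in for a code path that first emits a "lock not released" style warning and then fails with EIO. It suggests that an ignore filter suppresses the UserWarning lines but cannot affect a raised IOError.

```python
import warnings


def noisy_unlock():
    # Hypothetical stand-in: warn first, then fail the way the traceback does.
    warnings.warn("lock on <file> not released", UserWarning)
    raise IOError(5, "Input/output error")


def run_with_warnings_ignored():
    """Return (number of warnings seen, errno of the error that escaped)."""
    with warnings.catch_warnings(record=True) as caught:
        # Same effect as starting the interpreter with `python -Wignore`.
        warnings.simplefilter("ignore")
        try:
            noisy_unlock()
        except IOError as exc:
            return len(caught), exc.errno
    return len(caught), None
```

The warning is filtered out, while the exception still propagates to the caller.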

Michael B. Trausch (mtrausch) wrote : Re: UserWarning: lock not released cluttering user output

I seem to be experiencing the same problem with a home directory which is mounted via NFSv4.

Adding -Wignore to the python invocation in the bzr script does not work around the error message; this impedes my use of GNU Emacs at the moment with bzr working copies, because it sees the error return code from bzr and thinks that an error has occurred, so it stops processing. This means that while I can modify and save things in bzr working trees, it's really in my face and annoying.

Here is the fstab line for my /home:
allspice:/home /home nfs4 rw,sec=krb5,proto=tcp 0 0

Attached, please find the output from the bzr status command and the ~/.bzr.log file, after creating a new, empty repository and running status on it. This is 100% reproducible on both an empty and already-populated repository.

Robert Collins (lifeless) wrote : Re: [Bug 137387] Re: UserWarning: lock not released cluttering user output

File "/usr/lib/python2.5/site-packages/bzrlib/dirstate.py", line 1997,
in save
    self._state_file.truncate()
IOError: [Errno 5] Input/output error

That seems to be the core error - truncate() is failing
on .bzr/checkout/dirstate. Do you have appropriate permissions?

-Rob

Michael B. Trausch (mtrausch) wrote : Re: truncate gives EIO on NFSv4

Yes, I do. This is within my home directory, which I own all files in. This includes creating a new, empty bzr repository and trying to just diff or status it.

I can do so on a local filesystem with no issues. It's _definitely_ because my $HOME is mounted over NFSv4. I do not have NFSv3 on my network, so I can't reasonably say if this is NFSv4 specific or not.

Martin Pool (mbp) wrote :

Could you please have a look in /var/log/messages or equivalent on the client and see if there is anything relevant? If you can login to the server please look there too.

What os and version are the server and client?

(btw -Wignore will hide the "lock not released" errors, which was the original summary of the bug, but that's an unimportant knock-on effect compared to the io error.)

Michael B. Trausch (mtrausch) wrote : Re: [Bug 137387] Re: truncate gives EIO on NFSv4

On Wed, 2008-03-19 at 22:37 +0000, Martin Pool wrote:
> Could you please have a look in /var/log/messages or equivalent on the
> client and see if there is anything relevant? If you can login to the
> server please look there too.

Neither server nor client output anything to /var/log/messages
or /var/log/syslog when doing the following:

mbt@sage:~/foo$ bzr init
mbt@sage:~/foo$ bzr status
bzr: ERROR: [Errno 5] Input/output error
mbt@sage:~/foo$ bzr diff
bzr: ERROR: [Errno 5] Input/output error

(I watched both server and client using 'tail -f' while doing this,
twice, once for /var/log/messages on both and once again
for /var/log/syslog on both.)

>
> What os and version are the server and client?
>

Ubuntu Hardy server, fully updated; Ubuntu Hardy Desktop, fully updated.

> (btw -Wignore will hide the "lock not released" errors, which was the
> original summary of the bug, but that's an unimportant knock-on effect
> compared to the io error.)

-Wignore had zero effect; what you see above in the pasted shell session
is exactly the problem I have been having the entire time. :-(

For whatever reason, I did not think to include the information on the
client or server. Sorry about that... I am usually better about such
things, I think.

If there is any more information that I can provide, I would be happy to
do so.

 --- Mike

--
Michael B. Trausch <email address hidden>
home: 404-592-5746, 1 www.trausch.us
cell: 678-522-7934 im: <email address hidden>, jabber
Ubuntu Unofficial Backports Project: http://backports.trausch.us/

Martin Pool (mbp) wrote :

I suspect we're provoking an NFS bug. I'd like to get a network trace,
though if Kerberos is doing encryption as well as access that may be
hard.

Michael B. Trausch (mtrausch) wrote :

On Thu, 2008-03-20 at 01:37 +0000, Martin Pool wrote:
> I suspect we're provoking an NFS bug. I'd like to get a network trace,
> though if Kerberos is doing encryption as well as access that may be
> hard.

Good news: It does not appear to be encrypted.

Bad news: I use a heavily loaded GUI and that means lots of file
access, lol.

Give me a few minutes, and I will have a (hopefully, completely
relevant) dump ready for you that won't have to be filtered or have
(hopefully) any extraneous crud in it.

 --- Mike

Michael B. Trausch (mtrausch) wrote : Re: truncate gives EIO on NFSv4

Here is the communication between my machine and the server for the following run (pasted from history because I forgot to screendump(1) the terminal to a text file every so often... d'oh!)

  489 mkdir {foo,bar}
  490 cd foo
  491 bzr init
  492 bzr diff
[error occurred here]

  493 cd ..
  494 rm -Rf foo
  495 cd bar
  496 bzr init
  497 bzr diff
[error occurred here]

  498 cd ..
  499 rm -Rf bar

Jelmer Vernooij (jelmer) wrote : Re: [Bug 137387] Re: truncate gives EIO on NFSv4

On Thu, 2008-03-20 at 01:37 +0000, Martin Pool wrote:
> I suspect we're provoking an NFS bug. I'd like to get a network trace,
> though if Kerberos is doing encryption as well as access that may be
> hard.
wireshark should be able to decrypt it if you give it a kerberos keytab.

Cheers,

Jelmer

Michael B. Trausch (mtrausch) wrote : Re: truncate gives EIO on NFSv4

Is the tcpdump I provided helpful?

Martin Pool (mbp) wrote :

assigned to me to read tcpdump

Changed in bzr:
assignee: nobody → mbp
Martin Pool (mbp) wrote :

I looked at this in wireshark with the filter "!(rpc.msgtyp == 0) && (nfs.nfsstat4 != 0)", which should show only non-successful replies.

Actually (nfs.nfsstat4 != 0) does just as well.

The only errors this finds are NFS4ERR_NOENT, and NFS4ERR_OPENMODE (10038).

http://209.85.173.104/search?q=cache:7n6xv9xCgPkJ:osdir.com/ml/ietf.nfsv4/2002-06/msg00027.html+nfs4err_openmode&hl=en&ct=clnk&cd=1 has some description of this second error:

> NFS4ERR_OPENMODE The client attempted a READ, WRITE, or SETATTR
> operation not sanctioned by the stateid passed
> (e.g. writing to a file opened only for read).

For example, one of these occurs in frame 628. In frame 623 the client did LOCK(locktype=WRITE_LT). Without going into the NFS4 spec further, it may be that the server wanted us to lock it in read/write mode. At any rate this seems like more of a bug in the client or server... We might be able to avoid it by doing different locking.

Ricardo B (ricardo-lip) wrote :

I'm encountering the same problem.

According to strace, bzr holds and locks two handles for .bzr/checkout/dirstate at the same time: the first in read-only mode, the second in read/write mode.
...
stat64("/home/ricardo/NHQxx2x/.bzr/checkout/dirstate", {st_mode=S_IFREG|0644, st_size=2079, ...}) = 0
lstat64("/home", {st_mode=S_IFDIR|0755, st_size=127, ...}) = 0
lstat64("/home/ricardo", {st_mode=S_IFDIR|0750, st_size=8192, ...}) = 0
lstat64("/home/ricardo/NHQxx2x", {st_mode=S_IFDIR|0755, st_size=61, ...}) = 0
lstat64("/home/ricardo/NHQxx2x/.bzr", {st_mode=S_IFDIR|0755, st_size=102, ...}) = 0
lstat64("/home/ricardo/NHQxx2x/.bzr/checkout", {st_mode=S_IFDIR|0755, st_size=61, ...}) = 0
lstat64("/home/ricardo/NHQxx2x/.bzr/checkout/dirstate", {st_mode=S_IFREG|0644, st_size=2079, ...}) = 0
open("/home/ricardo/NHQxx2x/.bzr/checkout/dirstate", O_RDONLY|O_LARGEFILE) = 4
fstat64(4, {st_mode=S_IFREG|0644, st_size=2079, ...}) = 0
fcntl64(4, F_SETLK64, {type=F_RDLCK, whence=SEEK_SET, start=0, len=0}, 0xbff81380) = 0
fstat64(4, {st_mode=S_IFREG|0644, st_size=2079, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7fa2000
read(4, "#bazaar dirstate flat format 3\nc"..., 4096) = 2079
_llseek(4, 0, [2079], SEEK_CUR) = 0
...
_llseek(4, 2079, [2079], SEEK_SET) = 0
fstat64(4, {st_mode=S_IFREG|0644, st_size=2079, ...}) = 0
_llseek(4, 0, [2079], SEEK_CUR) = 0
read(4, "", 4096)
...
open("/home/ricardo/NHQxx2x/.bzr/checkout/dirstate", O_RDWR|O_LARGEFILE) = 5
fstat64(5, {st_mode=S_IFREG|0644, st_size=2079, ...}) = 0
fcntl64(5, F_SETLK64, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}, 0xbff811c0) = 0
fstat64(5, {st_mode=S_IFREG|0644, st_size=2079, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7fa0000
_llseek(5, 0, [0], SEEK_SET) = 0
write(5, "#bazaar dirstate flat format 3\nc"..., 2078) = 2078
ftruncate64(5, 2078) = -1 EIO (Input/output error)
fcntl64(5, F_SETLKW64, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0}, 0xbff81760) = 0
close(5) = -1 EIO (Input/output error)
...
fcntl64(4, F_SETLKW64, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0}, 0xbff82110) = 0
close(4) = 0
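The syscall pattern in this trace can be replayed outside bzr. The following is a minimal sketch, assuming a POSIX system; `replay_dirstate_locking` is a hypothetical helper name, not part of bzrlib. It opens the same file twice, takes a read lock on the read-only descriptor and a write lock on the read/write one, then rewrites and truncates the file. On a local filesystem this is expected to succeed; per the trace, an affected NFSv4 mount would be expected to raise EIO at the ftruncate or at the close of the second descriptor.

```python
import fcntl
import os


def replay_dirstate_locking(path, payload):
    """Replay the lock/write/truncate sequence; return the final file size."""
    # First handle: read-only with a read lock (fd 4 in the trace).
    rd = os.open(path, os.O_RDONLY)
    wr = None
    try:
        fcntl.lockf(rd, fcntl.LOCK_SH | fcntl.LOCK_NB)  # F_SETLK, F_RDLCK
        os.read(rd, 4096)

        # Second handle on the same file: read/write with a write lock
        # (fd 5 in the trace), taken while the first handle is still open.
        wr = os.open(path, os.O_RDWR)
        fcntl.lockf(wr, fcntl.LOCK_EX | fcntl.LOCK_NB)  # F_SETLK, F_WRLCK
        os.lseek(wr, 0, os.SEEK_SET)
        os.write(wr, payload)
        os.ftruncate(wr, len(payload))  # the trace shows EIO here
        fcntl.lockf(wr, fcntl.LOCK_UN)
    finally:
        if wr is not None:
            os.close(wr)  # the trace shows EIO here as well
        fcntl.lockf(rd, fcntl.LOCK_UN)
        os.close(rd)
    return os.path.getsize(path)
```

Calling it on a scratch file under the NFSv4 mount and again on a local disk would give a bzr-independent comparison to attach to an NFS bug report.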

Ricardo B (ricardo-lip) wrote :

Hello,
I dug into bzr and wrote a patch that looks like it works around the issue.

As Martin Pool wrote, this looks more like an NFS bug than a bzr bug.

If I perform a "bzr stat" on an NFS4 repository using a patched bzr, subsequent "bzr stat" runs on the same repository using an unpatched bzr 1.3.1 also succeed without errors.

If anyone with the issue is still following this thread, please try it and let me know if it works for you.

Michael B. Trausch (mtrausch) wrote :

@Ricardo:

I will check it out as soon as I have a chance to set up an NFSv4 server again. I wound up pulling mine down because of all of the problems that I was having (many unrelated ones in trying to get things like Kerberos to work nicely and the whole thing not providing the nice smoothness I'd thought I'd get from it).

Stefan Monnier (monnier) wrote : Re: [Bug 137387] Re: truncate gives EIO on NFSv4

> I dug into bzr and wrote a patch that looks like it works around the issue.
> As Martin Pool wrote, this looks more like an NFS bug than a bzr bug.

I'm pretty sure there's a bug in the NFSv4 client indeed: not only do
I see the bug discussed in this thread, but I also regularly have to
reboot the machine if I use Bzr over a NFSv4+Krb filesystem because
apparently something gets deadlocked in the kernel and then user
processes freeze (since my home directory is accessed over NFSv4+Krb).

Sadly, my sysadmins will tell you it's no great achievement to find a bug
in there: it took them months of fiddling/bug reporting/tuning to come up
with something that doesn't freeze/crash once a day.

It appears that there simply isn't any reliable and secure distributed
filesystem for GNU/Linux :-(

> If I perform a "bzr stat" on an NFS4 repository using a patched bzr,
> subsequent "bzr stat" runs on the same repository using an unpatched
> bzr 1.3.1 also succeed without errors.

I'll try it and tell you if the machine survived ;-)

        Stefan

Michael B. Trausch (mtrausch) wrote : Re: truncate gives EIO on NFSv4

@Ricardo:

Well, I finally have the entire NFS thing working. Took nothing but lots of time and frustration. But, I found my way back here, because I was still encountering the lock problem.

I applied your patch on my local copy of bzr (1.6.1 presently, from Intrepid) and it fixed the problem; I can use bzr repositories on my home directory now, mounted via NFSv4.

Unfortunately, it doesn't seem like it works with bzr 1.9, though; it applies with fuzz but seemingly doesn't fix the problem with that version... I guess something must've changed?

Anyway, this is going to be my permanent situation now, though I guess I'll be keeping bzr local to my machine's hard drive and just not in my home directory, for now... any chance this'll be fixed in the mainline soon?

Martin Pool (mbp) wrote : Re: [Bug 137387] Re: truncate gives EIO on NFSv4

It seems like there should be no problem putting bzr repositories on
nfsv4, you just can't put working trees there. That's when we use
file locks of the type addressed by this patch.

Regarding the patch
<http://launchpadlibrarian.net/16300560/bzr.nfs4.diff>, it looks like
it's doing two things. One is to keep a better count if we fail to
lock, which makes sense. The other is to release the read lock before
trying to upgrade it to a write lock. I don't think that's valid as
it stands (I haven't checked it very carefully) but if this does fix
the problem it would be useful information.

I'm not sure how this relates to truncate() eventually failing. Maybe
the server thinks there's still a read lock on the file and therefore
won't let it be truncated.

--
Martin <http://launchpad.net/~mbp/>

Michael B. Trausch (mtrausch) wrote :

On Thu, 20 Nov 2008 01:35:57 -0000
Martin Pool <email address hidden> wrote:

> I'm not sure how this relates to truncate() eventually failing. Maybe
> the server thinks there's still a read lock on the file and therefore
> won't let it be truncated.

I don't know---I've read an awful lot of information on NFSv4 over the
past few days, noting some quirks between various implementations of it
and the like, and some on older NFS versions, too. I still am totally
in the dark with why it doesn't work.

A piece of interesting information, though: bzr 1.9 actually doesn't
need the patch, and while it does spit an error out still on an NFSv4
filesystem, it doesn't seem that the error means anything. I am
confused. :-) See the attached bzr.log for what I am talking about
here; I have the feeling, though, that it's a different bug.

 --- Mike

Martin Pool (mbp) wrote :

On Fri, Nov 21, 2008 at 3:07 AM, Michael B. Trausch <email address hidden> wrote:
> http://launchpadlibrarian.net/19812490/bzr.log

It's in the same kind of area though, file locking on the dirstate
file. It is weird that you would get an error while closing the file.

--
Martin <http://launchpad.net/~mbp/>

Vincent Ladeuil (vila) wrote : Re: truncate gives EIO on NFSv4

Well, closing the file may flush it, so an IO error makes sense.
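A minimal sketch of that point, assuming a buffered Python file object: data written through it may sit in a userspace buffer (and, on NFS, in the client's cache) until flush or close, so a server-side failure can first surface at close(). The helper name `save_and_close` is hypothetical and is not bzrlib's dirstate code. Flushing and syncing explicitly moves any such error to a point where the caller can attribute it to the write rather than to the close.

```python
import os


def save_and_close(f, data):
    """Rewrite an open binary file with data, surfacing I/O errors before close()."""
    f.seek(0)
    f.write(data)
    f.truncate()          # cut the file at the current position
    f.flush()             # push Python's buffer down to the kernel
    os.fsync(f.fileno())  # ask the kernel to push the data to the server/disk
    f.close()
```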

What I find weird though is that anytime I use virtual machines, I use NFS to share the files with the host machine. That means that 90% of my bzr commands run on NFS for branches, repositories and working trees.

In the last 3 years I only had a single corruption and that was with a knit repository...

Servers used include OSX 10.4, Feisty, Gutsy and Hardy.
Clients used include Feisty, Gutsy, Hardy, Solaris 2.10 and OpenSolaris 08.11.

NFS versions used, well, things are less clear here as some negotiations occur generally at mount time, but at least v2 and v3 were used and I'm pretty sure I now use v4.

I also used some options at export/mount time in the past (sorry I didn't take notes :-/) but I don't anymore.

I thought I should at least mention that as, maybe, something is wrong in Michael's and Ricardo's setups.

Martin Pool (mbp) wrote :

I'm not sure what to do with this bug. I'm close to saying it's a server bug or quirk and bzr is not doing anything wrong, especially as Vincent reports he can't reproduce it.

Releasing the read lock before acquiring the write lock as proposed in the original patch would be wrong. The contract of this locking code is that you can write-lock a file that was previously read locked.
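To illustrate that contract, here is a minimal sketch of a lock that goes from read to write without an unlock in between. It is an assumption-laden illustration, not what bzrlib or the proposed patch does: `UpgradableLock` and its method names are hypothetical, and it uses a single descriptor opened read/write up front, whereas the strace above shows bzr opening a second handle. With POSIX fcntl locks, requesting a new lock type on a region the process already holds converts the existing lock.

```python
import fcntl
import os


class UpgradableLock:
    """Hypothetical single-descriptor lock; not bzrlib's API."""

    def __init__(self, path):
        # Opened O_RDWR from the start so the same descriptor can
        # later hold a write lock.
        self.fd = os.open(path, os.O_RDWR)
        self.mode = None

    def lock_read(self):
        fcntl.lockf(self.fd, fcntl.LOCK_SH | fcntl.LOCK_NB)
        self.mode = "r"

    def upgrade_to_write(self):
        # Converts the held read lock; no explicit unlock is issued.
        fcntl.lockf(self.fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        self.mode = "w"

    def restore_read_lock(self):
        # Converts the write lock back to a read lock.
        fcntl.lockf(self.fd, fcntl.LOCK_SH | fcntl.LOCK_NB)
        self.mode = "r"

    def unlock(self):
        fcntl.lockf(self.fd, fcntl.LOCK_UN)
        os.close(self.fd)
        self.mode = None
```

One cost of this design is that read locking then requires write permission on the file, which may be why a separate read-only handle is used for read locks.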

Eventually, per bug 98836, we'd like to get rid of the use of os locks for dirstates and repositories, which will avoid the area that's causing trouble with this bug.

summary: - truncate gives EIO on NFSv4
+ close or truncate of os-locked file gives EIO on some NFSv4 servers
Changed in bzr:
assignee: Martin Pool (mbp) → nobody
importance: High → Low
status: Triaged → Incomplete
Michael B. Trausch (mtrausch) wrote :

Hrm... I think this could be a bug that would be a candidate for linking bugs together; if I understand it correctly, this bug basically won't go away until bug 98836 is fixed, and so should probably be dispo'd as "Status: blocked" or so... being that we don't have that, and bug 98836 is going to fix the issue when it is fixed, maybe this one should just be marked a dupe?

tags: added: dirstate dirstate2