kernel bug: rpcb with a mounted nfs share

Bug #244443 reported by Florian Hackenberger
This bug report is a duplicate of:  Bug #212485: kernel bug rpc nfs client. Edit Remove
4
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
New
Undecided
Unassigned

Bug Description

Description: Ubuntu 8.04.1
Release: 8.04

Steps to reproduce (on a Thinkpad X61t) using 2.6.24-19-generic:
- Mount an nfs4 share
- suspend
- unplug the network cable
- resume
- be lucky

[ 42.839213] eth0: no IPv6 routers present
[ 160.390966] nfs: server lancelot.acoveo.co not responding, timed out
[ 160.391075] ------------[ cut here ]------------
[ 160.391083] kernel BUG at /build/buildd/linux-2.6.24/net/sunrpc/rpcb_clnt.c:322!
[ 160.391088] invalid opcode: 0000 [#1] SMP
[ 160.391095] Modules linked in: iwl3945 iwlwifi_mac80211 cfg80211 e1000 des_generic joydev ppp_deflate zlib_deflate bsd_comp ppp_async crc_ccitt ppp_generic slhc sr_mod cdrom option airprime usbserial usb_storage libusual af_packet i915 drm rfcomm l2cap bluetooth kvm_intel kvm parport_pc ppdev ipv6 acpi_cpufreq cpufreq_stats cpufreq_conservative cpufreq_ondemand freq_table cpufreq_powersave cpufreq_userspace container sbs sbshc bay dock rpcsec_gss_krb5 auth_rpcgss nfs lockd nfs_acl sunrpc iptable_filter ip_tables x_tables uinput sbp2 lp parport loop arc4 ecb pcmcia snd_hda_intel snd_pcm_oss snd_mixer_oss snd_pcm snd_page_alloc snd_hwdep snd_seq_dummy sdhci ac mmc_core battery yenta_socket rsrc_nonstatic pcmcia_core snd_seq_oss snd_seq_midi video output snd_rawmidi snd_seq_midi_event psmouse serio_raw snd_seq snd_timer snd_seq_device thinkpad_acpi button nvram snd iTCO_wdt iTCO_vendor_support evdev intel_agp soundcore shpchp pci_hotplug agpgart pcspkr ext3 jbd mbcache sha256_generic aes_i586 cbc blkcipher sg pata_acpi sd_mod usbhid hid ata_piix ahci ata_generic libata ohci1394 scsi_mod ieee1394 dm_crypt ehci_hcd uhci_hcd usbcore raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 multipath linear md_mod dm_mirror dm_snapshot dm_mod thermal processor fan fbcon tileblit font bitblit softcursor fuse
[ 160.391321]
[ 160.391326] Pid: 31193, comm: gvfsd-trash Not tainted (2.6.24-19-generic #1)
[ 160.391333] EIP: 0060:[<f9051491>] EFLAGS: 00010206 CPU: 0
[ 160.391382] EIP is at rpcb_getport_async+0x251/0x3d0 [sunrpc]
[ 160.391388] EAX: c2bddf00 EBX: c2bddf00 ECX: c2bddf80 EDX: f906aa20
[ 160.391393] ESI: eaf48e00 EDI: ea885600 EBP: d3ccfc00 ESP: d071bbf4
[ 160.391399] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[ 160.391405] Process gvfsd-trash (pid: 31193, ti=d071a000 task=ef351700 task.ti=d071a000)
[ 160.391410] Stack: d071bc28 00000000 d071bc28 00000000 d071bc30 c031b2c9 f90490a0 c2bddf00
[ 160.391425] 00000001 f90490a0 c2bddf80 c031b33b c2bddf00 d3ccfc00 d071bd38 c2bddf80
[ 160.391439] f9042195 c0140c80 d071bc3c d071bc3c 00000000 00000000 c2bddf00 f9049542
[ 160.391453] Call Trace:
[ 160.391501] [<c031b2c9>] __wait_on_bit+0x59/0x70
[ 160.391517] [<f90490a0>] rpc_wait_bit_interruptible+0x0/0x20 [sunrpc]
[ 160.391590] [<f90490a0>] rpc_wait_bit_interruptible+0x0/0x20 [sunrpc]
[ 160.391651] [<c031b33b>] out_of_line_wait_on_bit+0x5b/0x70
[ 160.391689] [<f9042195>] call_bind+0x45/0x80 [sunrpc]
[ 160.391728] [<c0140c80>] wake_bit_function+0x0/0x60
[ 160.391777] [<f9049542>] __rpc_execute+0x62/0x280 [sunrpc]
[ 160.391837] [<c0137665>] sigprocmask+0x65/0x110
[ 160.391860] [<f9048bf4>] rpc_set_active+0x44/0x70 [sunrpc]
[ 160.391933] [<f9042f38>] rpc_do_run_task+0x68/0xc0 [sunrpc]
[ 160.392017] [<f904300d>] rpc_call_sync+0x1d/0x40 [sunrpc]
[ 160.392088] [<f90d7633>] nfs4_proc_access+0x73/0x1b0 [nfs]
[ 160.392155] [<f90c1d12>] nfs_lookup+0xc2/0x170 [nfs]
[ 160.392229] [<c01b4223>] ll_rw_block+0x23/0xc0
[ 160.392263] [<f8aac8d7>] ext3_find_entry+0x1c7/0x650 [ext3]
[ 160.392310] [<c0121084>] kunmap_atomic+0x84/0xb0
[ 160.392428] [<f90c255d>] nfs_do_access+0xbd/0x2e0 [nfs]
[ 160.392512] [<f904a5b8>] rpcauth_lookupcred+0x58/0xa0 [sunrpc]
[ 160.392599] [<f90c2834>] nfs_permission+0xb4/0x160 [nfs]
[ 160.392636] [<c01a3f95>] dput+0x85/0x100
[ 160.392661] [<c019916b>] __follow_mount+0x1b/0x70
[ 160.392685] [<f90c2780>] nfs_permission+0x0/0x160 [nfs]
[ 160.392745] [<c0198fe7>] permission+0x97/0x120
[ 160.392775] [<c019acb9>] __link_path_walk+0x79/0xe10
[ 160.392783] [<c012813d>] finish_task_switch+0x8d/0x90
[ 160.392818] [<c031f3bc>] kprobe_flush_task+0x4c/0xa0
[ 160.392872] [<c019ba95>] link_path_walk+0x45/0xc0
[ 160.392897] [<c017d988>] handle_mm_fault+0x118/0x730
[ 160.392969] [<c019bce7>] do_path_lookup+0x77/0x200
[ 160.392984] [<c019aa3a>] getname+0xaa/0xe0
[ 160.393009] [<c019c74b>] __user_walk_fd+0x3b/0x60
[ 160.393037] [<c01953af>] vfs_lstat_fd+0x1f/0x50
[ 160.393062] [<c017d988>] handle_mm_fault+0x118/0x730
[ 160.393134] [<c019545f>] sys_lstat64+0xf/0x30
[ 160.393147] [<c031e0bf>] do_page_fault+0x13f/0x730
[ 160.393212] [<c01043c2>] sysenter_past_esp+0x6b/0xa9
[ 160.393250] [<c0310000>] unix_stream_sendmsg+0x1a0/0x390
[ 160.393286] =======================
[ 160.393289] Code: fe ff ff 8b 54 24 1c 0f b7 82 ac 00 00 00 c7 44 24 08 44 69 05 f9 c7 04 24 70 a0 05 f9 89 44 24 04 e8 14 bd 0d c7 e9 3c fe ff ff <0f> 0b eb fe 0f b7 b0 ac 00 00 00 8b 57 24 8b 4f 20 8b 5f 2c 8b
[ 160.393366] EIP: [<f9051491>] rpcb_getport_async+0x251/0x3d0 [sunrpc] SS:ESP 0068:d071bbf4
[ 160.393425] ---[ end trace 6a6cff2fbe4f1a2c ]---
[ 274.568351] rpcbind: server lancelot.acoveo.co not responding, timed out
[ 274.576358] nfs: server lancelot.acoveo.co not responding, timed out
[ 331.381202] nfs: server lancelot.acoveo.co not responding, timed out
[ 446.130997] rpcbind: server lancelot.acoveo.co not responding, timed out

Revision history for this message
Daniel J Blueman (danielblueman) wrote :
Download full text (7.7 KiB)

Same NFSv4 server (Ubuntu 8.04.1 LTS x86-64 incidentally), Ubuntu 8.04.1 LTS x86-64 client:

[ 65.224366] Pid: 10397, comm: gnome-screensav Not tainted 2.6.24-19-generic #1
[ 65.224368] RIP: 0010:[<ffffffff8839d8b2>] [<ffffffff8839d8b2>] :sunrpc:rpcb_getport_async+0x272/0x3c0
[ 65.224384] RSP: 0018:ffff8100cb66d9b8 EFLAGS: 00010287
[ 65.224386] RAX: ffffffff883bb5c0 RBX: ffff8100cbd77400 RCX: 00000000c0000100
[ 65.224387] RDX: 0000000000000000 RSI: 0000000000000286 RDI: ffff8100853d8000
[ 65.224389] RBP: ffff8100cb559800 R08: ffff8100cb66c000 R09: 0000000000000000
[ 65.224390] R10: ffff810009028fe0 R11: ffffffff8839d640 R12: ffff8100cba92400
[ 65.224392] R13: ffff8100853d8000 R14: ffff8100cb66da88 R15: ffff8100cb66dbb8
[ 65.224393] FS: 00007fcbc0e6d7a0(0000) GS:ffff8100ccc01700(0000) knlGS:0000000000000000
[ 65.224395] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 65.224396] CR2: 00007fa7776dc000 CR3: 00000000b8c86000 CR4: 00000000000006a0
[ 65.224398] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 65.224399] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 65.224401] Process gnome-screensav (pid: 10397, threadinfo ffff8100cb66c000, task ffff8100b3c6c000)
[ 65.224402] Stack: ffffffff80253a30 ffff8100cb66d9c0 0000000000000286 ffff8100853d8090
[ 65.224406] 0000000000000000 ffff8100853d8000 ffffffff883a3390 ffff8100853d80f0
[ 65.224408] ffff8100cb66da88 ffffffff88394b3b ffffffff883a3390 ffff8100853d8000
[ 65.224411] Call Trace:
[ 65.224415] [<ffffffff80253a30>] wake_bit_function+0x0/0x30
[ 65.224429] [<ffffffff88394b3b>] :sunrpc:__rpc_execute+0x6b/0x290
[ 65.224441] [<ffffffff8838df86>] :sunrpc:rpc_do_run_task+0x76/0xd0
[ 65.224453] [<ffffffff8838e045>] :sunrpc:rpc_call_sync+0x15/0x40
[ 65.224467] [<ffffffff883f3fff>] :nfs:nfs4_proc_access+0x8f/0x1e0
[ 65.224477] [<ffffffff883e1f95>] :nfs:nfs_revalidate_inode+0x25/0x70
[ 65.224482] [<ffffffff802c692b>] dput+0xab/0x140
[ 65.224491] [<ffffffff883dc7b4>] :nfs:nfs_lookup_revalidate+0x244/0x3d0
[ 65.224509] [<ffffffff883dd82a>] :nfs:nfs_do_access+0xda/0x350
[ 65.224523] [<ffffffff883ddb78>] :nfs:nfs_permission+0xd8/0x1a0
[ 65.224528] [<ffffffff802bab90>] permission+0xb0/0x160
[ 65.224531] [<ffffffff802bce87>] __link_path_walk+0x87/0xe90
[ 65.224538] [<ffffffff802bdceb>] link_path_walk+0x5b/0x100
[ 65.224544] [<ffffffff802b0a67>] get_unused_fd_flags+0x77/0x120
[ 65.224547] [<ffffffff80467d72>] __down_read+0x12/0xb1
[ 65.224553] [<ffffffff802bdfaa>] do_path_lookup+0x8a/0x250
[ 65.224555] [<ffffffff802bcb49>] getname+0x1a9/0x220
[ 65.224559] [<ffffffff802bec0b>] __user_walk_fd+0x4b/0x80
[ 65.224563] [<ffffffff802b14eb>] sys_faccessat+0xbb/0x1b0
[ 65.224568] [<ffffffff802b0a67>] get_unused_fd_flags+0x77/0x120
[ 65.224571] [<ffffffff80467d72>] __down_read+0x12/0xb1
[ 65.224574] [<ffffffff8034d551>] __up_read+0x21/0xb0
[ 65.224579] [<ffffffff8020c37e>] system_call+0x7e/0x83
[ 65.224584]
[ 65.224585]
[ 65.224585] Code: 0f 0b eb fe 8b 85 b8 00 00 00 0f b7 b7 48 01 00 00 48 c7 c2
[ 65.224592] RIP [<ffffffff8839d8b2>] :sunrpc:rpcb_getport_async...

Read more...

Revision history for this message
Daniel J Blueman (danielblueman) wrote :

As symbolic information isn't built [1] into the kernel image, we can probably guess what line etc, but I'll reproduce and resolve line numbers etc after installing 'linux-image-debug-generic'...

--- [1]

$ modinfo sunrpc
filename: /lib/modules/2.6.24-19-generic/kernel/net/sunrpc/sunrpc.ko
$ objdump -dl /lib/modules/2.6.24-19-generic/kernel/net/sunrpc/sunrpc.ko >sunrpc.dump
$ grep '<rpcb_getport_async>' sunrpc.dump
0000000000010640 <rpcb_getport_async>:
$ printf '%x\n' $((0x10640 + 0x272))
108b2
$ grep -C10 108b2: sunrpc.dump
   1087e: f6 05 00 00 00 00 20 testb $0x20,0x0(%rip) # 10885 <rpcb_getport_async+0x245>
   10885: bb a0 ff ff ff mov $0xffffffa0,%ebx
   1088a: 0f 84 42 fe ff ff je 106d2 <rpcb_getport_async+0x92>
   10890: 41 0f b7 b5 48 01 00 movzwl 0x148(%r13),%esi
   10897: 00
   10898: 48 c7 c2 00 00 00 00 mov $0x0,%rdx
   1089f: 48 c7 c7 00 00 00 00 mov $0x0,%rdi
   108a6: 31 c0 xor %eax,%eax
   108a8: e8 00 00 00 00 callq 108ad <rpcb_getport_async+0x26d>
   108ad: e9 20 fe ff ff jmpq 106d2 <rpcb_getport_async+0x92>
   108b2: 0f 0b ud2a <------ cross-check matches 'Code: 0f 0b'
   108b4: eb fe jmp 108b4 <rpcb_getport_async+0x274>
   108b6: 8b 85 b8 00 00 00 mov 0xb8(%rbp),%eax
   108bc: 0f b7 b7 48 01 00 00 movzwl 0x148(%rdi),%esi
   108c3: 48 c7 c2 00 00 00 00 mov $0x0,%rdx
   108ca: 44 8b 4b 44 mov 0x44(%rbx),%r9d
   108ce: 44 8b 43 40 mov 0x40(%rbx),%r8d
   108d2: 48 c7 c7 00 00 00 00 mov $0x0,%rdi
   108d9: 48 8b 4b 50 mov 0x50(%rbx),%rcx
   108dd: 89 04 24 mov %eax,(%rsp)
   108e0: 31 c0 xor %eax,%eax

Revision history for this message
Daniel J Blueman (danielblueman) wrote :

void rpcb_getport_async(struct rpc_task *task)
{
        struct rpc_clnt *clnt = task->tk_client;
        int bind_version;
        struct rpc_xprt *xprt = task->tk_xprt;
        struct rpc_clnt *rpcb_clnt;
        static struct rpcbind_args *map;
        struct rpc_task *child;
        struct sockaddr addr;
        int status;
        struct rpcb_info *info;

        dprintk("RPC: %5u %s(%s, %u, %u, %d)\n",
                task->tk_pid, __FUNCTION__,
                clnt->cl_server, clnt->cl_prog, clnt->cl_vers, xprt->prot);

        /* Autobind on cloned rpc clients is discouraged */
        BUG_ON(clnt->cl_parent != clnt); <------

Indeed, we're hitting the same issue as we used to in LP#224750, so it looks like there is more than one case which needs addressing.

Related links:
http://www.linuxhq.com/kernel/v2.6/25-rc9/net/sunrpc/rpcb_clnt.c
http://kerneltrap.org/mailarchive/linux-nfs/2007/12/10/487250
http://lists.debian.org/debian-kernel/2008/04/msg00409.html

Revision history for this message
Daniel J Blueman (danielblueman) wrote :

We can check if a related fix is relevant or not by disabling ipv6 address assignment on our interfaces, and doing suspend/resume cycles:

http://groups.google.co.uk/group/linux.kernel/browse_thread/thread/599db16a8d9d84ed#

Revision history for this message
Florian Hackenberger (f-hackenberger) wrote :

I'm willing to test if someone can outline the required steps.

Revision history for this message
Daniel J Blueman (danielblueman) wrote :

Duplicate of LP#212485, which has been triaged as critical; moving updates to this one. Can this dup be closed?

Revision history for this message
Florian Hackenberger (f-hackenberger) wrote :

I reported the bug and the trace from LP#212485 looks similar enough. I'd say it's fine to close this one.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.