NFSd4 crashes system in unhash_delegation_locked

Bug #1885265 reported by Anton Ivanov
This bug affects 6 people
Affects: linux-signed-hwe (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

After running a fairly busy NFSv4 server with ZFS as the backend filesystem for some time, we get a system crash roughly once a week. Clients are mounted with delegation propagation enabled, and the client mount options are as follows:

type nfs4 (rw,nosuid,nodev,noexec,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,acregmin=600,acregmax=600,acdirmin=600,acdirmax=600,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=y.y.y.y,local_lock=none,addr=x.x.x.x)

The server-side configuration is:

PIPEFS_MOUNTPOINT=/run/rpc_pipefs
RPCNFSDARGS="--grace-time 10 32"
RPCMOUNTDARGS="--manage-gids --num-threads=8"
STATDARGS=""
RPCSVCGSSDARGS=""
SVCGSSDARGS=""

The error occurs while executing the unhash_delegation_locked function, which is called from laundromat_main. The console output before the reboot is below:

[2768169.862683] BUG: unable to handle page fault for address: ffffffffc09451a9
[2768169.863924] #PF: supervisor write access in kernel mode
[2768169.864790] #PF: error_code(0x0003) - permissions violation
[2768169.865695] PGD 3fe20e067 P4D 3fe20e067 PUD 3fe210067 PMD bf9c25067 PTE bf9f81161
[2768169.866895] Oops: 0003 [#1] SMP NOPTI
[2768169.867493] CPU: 8 PID: 4105769 Comm: kworker/u24:1 Tainted: P W OE 5.3.0-46-generic #38~18.04.1-Ubuntu
[2768169.869154] Hardware name: OpenStack Foundation OpenStack Nova, BIOS 1.12.0-1 04/01/2014
[2768169.870447] Workqueue: nfsd4 laundromat_main [nfsd]
[2768169.871235] RIP: 0010:_raw_spin_lock+0x10/0x30
[2768169.871959] Code: 01 00 00 75 06 48 89 d8 5b 5d c3 e8 0a 13 66 ff 48 89 d8 5b 5d c3 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 02 5d c3 89 c6 e8 c1 fa 65 ff 66 90 5d c3 0f 1f 00
[2768169.874528] RSP: 0018:ffffbe5ed12f7de0 EFLAGS: 00010246
[2768169.875177] RAX: 0000000000000000 RBX: ffffbe5ed12f7de8 RCX: 0000000000000000
[2768169.876084] RDX: 0000000000000001 RSI: ffff9508089084e0 RDI: ffffffffc09451a9
[2768169.876993] RBP: ffffbe5ed12f7de0 R08: 000000000000077e R09: 0000000000000004
[2768169.877942] R10: 0000000000000000 R11: 0000000000000001 R12: ffffffffc09451a9
[2768169.878793] R13: ffffbe5ed12f7e20 R14: ffffbe5ed12f7e40 R15: ffff9508089084e0
[2768169.879627] FS: 0000000000000000(0000) GS:ffff950d8fa00000(0000) knlGS:0000000000000000
[2768169.880624] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2768169.881359] CR2: ffffffffc09451a9 CR3: 0000000bd237c000 CR4: 0000000000340ee0
[2768169.882241] Call Trace:
[2768169.882571] unhash_delegation_locked+0x39/0xa0 [nfsd]
[2768169.883201] laundromat_main+0x235/0x5a0 [nfsd]
[2768169.883756] process_one_work+0x1fd/0x3f0
[2768169.884272] worker_thread+0x34/0x410
[2768169.884725] kthread+0x121/0x140
[2768169.885165] ? process_one_work+0x3f0/0x3f0
[2768169.885730] ? kthread_park+0xb0/0xb0
[2768169.886302] ret_from_fork+0x22/0x40
[2768169.886837] Modules linked in: ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs cpuid rpcsec_gss_krb5 rbd libceph ipt_REJECT nf_reject_ipv4 xt_set ip_set_hash_ipport xt_ipvs ip_set_hash_ip ip_set_hash_net ip_set dummy xt_tcpudp iptable_raw xt_CT veth xt_MASQUERADE xt_comment xt_mark iptable_nat iptable_filter bpfilter xt_conntrack nf_nat nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo aufs overlay zfs(POE) zunicode(PO) zavl(PO) icp(POE) zcommon(POE) znvpair(POE) spl(OE) zlua(POE) nls_iso8859_1 kvm_amd ccp kvm joydev input_leds irqbypass mac_hid serio_raw qemu_fw_cfg sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi br_netfilter bridge stp llc ip_vs_sh nfsd ip_vs_wrr ip_vs_rr auth_rpcgss ip_vs nfs_acl lockd nf_conntrack grace nf_defrag_ipv6 nf_defrag_ipv4 sunrpc ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear
[2768169.886885] hid_generic usbhid hid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel cirrus drm_kms_helper aesni_intel syscopyarea aes_x86_64 crypto_simd sysfillrect sysimgblt cryptd fb_sys_fops glue_helper psmouse virtio_scsi virtio_net drm net_failover i2c_piix4 failover pata_acpi floppy
[2768169.902274] CR2: ffffffffc09451a9
[2768169.902777] ---[ end trace dcbbef50958ba3f7 ]---
[2768169.903440] RIP: 0010:_raw_spin_lock+0x10/0x30
[2768169.904064] Code: 01 00 00 75 06 48 89 d8 5b 5d c3 e8 0a 13 66 ff 48 89 d8 5b 5d c3 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 02 5d c3 89 c6 e8 c1 fa 65 ff 66 90 5d c3 0f 1f 00
[2768169.907606] RSP: 0018:ffffbe5ed12f7de0 EFLAGS: 00010246
[2768169.908641] RAX: 0000000000000000 RBX: ffffbe5ed12f7de8 RCX: 0000000000000000
[2768169.910010] RDX: 0000000000000001 RSI: ffff9508089084e0 RDI: ffffffffc09451a9
[2768169.911399] RBP: ffffbe5ed12f7de0 R08: 000000000000077e R09: 0000000000000004
[2768169.912648] R10: 0000000000000000 R11: 0000000000000001 R12: ffffffffc09451a9
[2768169.913952] R13: ffffbe5ed12f7e20 R14: ffffbe5ed12f7e40 R15: ffff9508089084e0
[2768169.915217] FS: 0000000000000000(0000) GS:ffff950d8fa00000(0000) knlGS:0000000000000000
[2768169.916626] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2768169.917774] CR2: ffffffffc09451a9 CR3: 0000000bd237c000 CR4: 0000000000340ee0
[2768169.919025] Kernel panic - not syncing: Fatal exception
[2768169.920317] Kernel Offset: 0x12400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[2768169.922007] Rebooting in 10 seconds..
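
As a reading aid for the oops above, the following minimal sketch decodes the two values that explain the "permissions violation": the page-fault error code (0x0003) and the low flag bits of the PTE (bf9f81161). The bit meanings are the architectural x86 ones; the function names `decode_pf_error_code` and `decode_pte_flags` are hypothetical helpers written for this illustration and are not part of any kernel tooling.

```python
from typing import List

# x86 page-fault error code bits: (mask, meaning if set, meaning if clear).
PF_ERROR_BITS = [
    (0x01, "protection violation (page was present)", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "supervisor (kernel) mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]

# Low flag bits of an x86-64 page table entry.
PTE_FLAG_BITS = [
    (0x001, "present"),
    (0x002, "writable"),
    (0x004, "user"),
    (0x008, "write-through"),
    (0x010, "cache-disable"),
    (0x020, "accessed"),
    (0x040, "dirty"),
    (0x080, "pat"),
    (0x100, "global"),
]


def decode_pf_error_code(code: int) -> List[str]:
    """Return a human-readable description of each relevant error-code bit."""
    described = []
    for mask, if_set, if_clear in PF_ERROR_BITS:
        if code & mask:
            described.append(if_set)
        elif if_clear is not None:
            described.append(if_clear)
    return described


def decode_pte_flags(pte: int) -> List[str]:
    """Return the names of the low flag bits that are set in a PTE value."""
    return [name for mask, name in PTE_FLAG_BITS if pte & mask]


if __name__ == "__main__":
    # Values copied from the oops: error_code(0x0003) and PTE bf9f81161.
    print(decode_pf_error_code(0x0003))
    print(decode_pte_flags(0xBF9F81161))
```

For the values in this report, the error code decodes to a supervisor-mode write to a present page, and the PTE flags do not include "writable". Together with RDI, R12 and CR2 all holding ffffffffc09451a9, this suggests that _raw_spin_lock was handed a lock pointer that points into a read-only mapping, although the log alone does not show why that pointer was invalid.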

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: linux-image-5.3.0-46-generic 5.3.0-46.38~18.04.1
ProcVersionSignature: Ubuntu 5.3.0-53.47~18.04.1-generic 5.3.18
Uname: Linux 5.3.0-53-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
ApportVersion: 2.20.9-0ubuntu7.11
Architecture: amd64
Date: Fri Jun 26 10:36:00 2020
Ec2AMI: ami-00000005
Ec2AMIManifest: FIXME
Ec2AvailabilityZone: phx-c107
Ec2InstanceType: test-c4.4xlarge
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
ProcEnviron:
 LC_CTYPE=C.UTF-8
 TERM=xterm-256color
 PATH=(custom, no user)
 LANG=C.UTF-8
 SHELL=/bin/bash
SourcePackage: linux-signed-hwe
UpgradeStatus: No upgrade log present (probably fresh install)

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-signed-hwe (Ubuntu):
status: New → Confirmed
Anton Ivanov (biwwy)
description: updated
Anton Ivanov (biwwy) wrote :

This exact kernel panic happened to us again on a newer kernel version, 5.4.0-39-generic.
