NFSd4 crashes system in unhash_delegation_locked
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
linux-signed-hwe (Ubuntu) | Confirmed | Undecided | Unassigned | | |
Bug Description
After running a fairly busy NFSv4 server with ZFS as the backend filesystem for some time, we see the system crash about once a week. Clients are mounted with delegation propagation enabled, and the client mount options are as follows:
type nfs4 (rw,nosuid,
The server-side configuration is:
PIPEFS_
RPCNFSDARGS=
RPCMOUNTDARGS=
STATDARGS=""
RPCSVCGSSDARGS=""
SVCGSSDARGS=""
The error occurs while executing unhash_
[2768169.862683] BUG: unable to handle page fault for address: ffffffffc09451a9
[2768169.863924] #PF: supervisor write access in kernel mode
[2768169.864790] #PF: error_code(0x0003) - permissions violation
[2768169.865695] PGD 3fe20e067 P4D 3fe20e067 PUD 3fe210067 PMD bf9c25067 PTE bf9f81161
[2768169.866895] Oops: 0003 [#1] SMP NOPTI
[2768169.867493] CPU: 8 PID: 4105769 Comm: kworker/u24:1 Tainted: P W OE 5.3.0-46-generic #38~18.04.1-Ubuntu
[2768169.869154] Hardware name: OpenStack Foundation OpenStack Nova, BIOS 1.12.0-1 04/01/2014
[2768169.870447] Workqueue: nfsd4 laundromat_main [nfsd]
[2768169.871235] RIP: 0010:_raw_
[2768169.871959] Code: 01 00 00 75 06 48 89 d8 5b 5d c3 e8 0a 13 66 ff 48 89 d8 5b 5d c3 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 02 5d c3 89 c6 e8 c1 fa 65 ff 66 90 5d c3 0f 1f 00
[2768169.874528] RSP: 0018:ffffbe5ed1
[2768169.875177] RAX: 0000000000000000 RBX: ffffbe5ed12f7de8 RCX: 0000000000000000
[2768169.876084] RDX: 0000000000000001 RSI: ffff9508089084e0 RDI: ffffffffc09451a9
[2768169.876993] RBP: ffffbe5ed12f7de0 R08: 000000000000077e R09: 0000000000000004
[2768169.877942] R10: 0000000000000000 R11: 0000000000000001 R12: ffffffffc09451a9
[2768169.878793] R13: ffffbe5ed12f7e20 R14: ffffbe5ed12f7e40 R15: ffff9508089084e0
[2768169.879627] FS: 000000000000000
[2768169.880624] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2768169.881359] CR2: ffffffffc09451a9 CR3: 0000000bd237c000 CR4: 0000000000340ee0
[2768169.882241] Call Trace:
[2768169.882571] unhash_
[2768169.883201] laundromat_
[2768169.883756] process_
[2768169.884272] worker_
[2768169.884725] kthread+0x121/0x140
[2768169.885165] ? process_
[2768169.885730] ? kthread_
[2768169.886302] ret_from_
[2768169.886837] Modules linked in: ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs cpuid rpcsec_gss_krb5 rbd libceph ipt_REJECT nf_reject_ipv4 xt_set ip_set_hash_ipport xt_ipvs ip_set_hash_ip ip_set_hash_net ip_set dummy xt_tcpudp iptable_raw xt_CT veth xt_MASQUERADE xt_comment xt_mark iptable_nat iptable_filter bpfilter xt_conntrack nf_nat nf_conntrack_
[2768169.886885] hid_generic usbhid hid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel cirrus drm_kms_helper aesni_intel syscopyarea aes_x86_64 crypto_simd sysfillrect sysimgblt cryptd fb_sys_fops glue_helper psmouse virtio_scsi virtio_net drm net_failover i2c_piix4 failover pata_acpi floppy
[2768169.902274] CR2: ffffffffc09451a9
[2768169.902777] ---[ end trace dcbbef50958ba3f7 ]---
[2768169.903440] RIP: 0010:_raw_
[2768169.904064] Code: 01 00 00 75 06 48 89 d8 5b 5d c3 e8 0a 13 66 ff 48 89 d8 5b 5d c3 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 02 5d c3 89 c6 e8 c1 fa 65 ff 66 90 5d c3 0f 1f 00
[2768169.907606] RSP: 0018:ffffbe5ed1
[2768169.908641] RAX: 0000000000000000 RBX: ffffbe5ed12f7de8 RCX: 0000000000000000
[2768169.910010] RDX: 0000000000000001 RSI: ffff9508089084e0 RDI: ffffffffc09451a9
[2768169.911399] RBP: ffffbe5ed12f7de0 R08: 000000000000077e R09: 0000000000000004
[2768169.912648] R10: 0000000000000000 R11: 0000000000000001 R12: ffffffffc09451a9
[2768169.913952] R13: ffffbe5ed12f7e20 R14: ffffbe5ed12f7e40 R15: ffff9508089084e0
[2768169.915217] FS: 000000000000000
[2768169.916626] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2768169.917774] CR2: ffffffffc09451a9 CR3: 0000000bd237c000 CR4: 0000000000340ee0
[2768169.919025] Kernel panic - not syncing: Fatal exception
[2768169.920317] Kernel Offset: 0x12400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000
[2768169.922007] Rebooting in 10 seconds..
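To make the fault lines in the log above easier to read, here is a minimal sketch that decodes the page-fault error code (0x0003) and the low flag bits of the PTE value (bf9f81161) reported by the kernel. The bit meanings are the standard x86 ones; the helper names `decode_pf_error_code` and `decode_pte_flags` are hypothetical and exist only for this illustration, and only a subset of PTE bits is covered.

```python
# Sketch: decode the x86 page-fault error code and a few PTE flag bits.
# Helper names are illustrative only; they are not part of any kernel tool.

# (mask, meaning when set, meaning when clear or None if not worth reporting)
PF_FLAGS = [
    (0x01, "protection violation (page present)", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "supervisor (kernel) mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]

# Subset of the low PTE bits on x86-64.
PTE_FLAGS = [
    (0x001, "present"),
    (0x002, "writable"),
    (0x004, "user"),
    (0x020, "accessed"),
    (0x040, "dirty"),
    (0x100, "global"),
]


def decode_pf_error_code(code):
    """Return a human-readable description of each error-code bit."""
    described = []
    for mask, set_text, clear_text in PF_FLAGS:
        if code & mask:
            described.append(set_text)
        elif clear_text is not None:
            described.append(clear_text)
    return described


def decode_pte_flags(pte):
    """Return the names of the flag bits that are set in a PTE value."""
    return [name for mask, name in PTE_FLAGS if pte & mask]


if __name__ == "__main__":
    # Values copied from the oops above.
    print(decode_pf_error_code(0x0003))
    print(decode_pte_flags(0xBF9F81161))
```

For the values in this report, the error code decodes to a kernel-mode write to a page that is present, and the PTE flags do not include the writable bit. That is consistent with the "supervisor write access" and "permissions violation" lines: the write targeted a mapped but read-only page. It does not by itself explain why the pointer in RDI ended up at that address.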
ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: linux-image-
ProcVersionSign
Uname: Linux 5.3.0-53-generic x86_64
NonfreeKernelMo
ApportVersion: 2.20.9-0ubuntu7.11
Architecture: amd64
Date: Fri Jun 26 10:36:00 2020
Ec2AMI: ami-00000005
Ec2AMIManifest: FIXME
Ec2Availability
Ec2InstanceType: test-c4.4xlarge
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
ProcEnviron:
LC_CTYPE=C.UTF-8
TERM=xterm-
PATH=(custom, no user)
LANG=C.UTF-8
SHELL=/bin/bash
SourcePackage: linux-signed-hwe
UpgradeStatus: No upgrade log present (probably fresh install)
Status changed to 'Confirmed' because the bug affects multiple users.