Server crashes on soft lockup
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
linux-lts-xenial (Ubuntu) | New | Undecided | Unassigned | | |
Bug Description
Release: Ubuntu 14.04.5 LTS
Kernel: Linux 4.4.0-67-generic #88~14.04.1-Ubuntu SMP
Filesystems: ext4 on Hardware RAID 6
We regularly run a backup script that mainly uses rsync and mv. When a lot of data has changed, the server sometimes freezes and can only be recovered by power cycling. I thought it was a hardware problem, but we now have this problem on 2 out of 18 identical machines, and they have different BIOS versions. So it is probably related to the amount of data. During the backup I see high load from the rsync and chmod processes.
Kernel messages:
Apr 2 01:09:58 server kernel: [483707.688686] NMI watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [kswapd0:83]
Apr 2 01:09:58 server kernel: [483707.688716] Modules linked in: drbg ansi_cprng ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables 8021q garp mrp bridge stp llc dm_crypt intel_rapl x86_pkg_
Apr 2 01:09:58 server kernel: [483707.688718] CPU: 7 PID: 83 Comm: kswapd0 Tainted: G L 4.4.0-67-generic #88~14.04.1-Ubuntu
Apr 2 01:09:58 server kernel: [483707.688719] Hardware name: Dell Inc. PowerEdge T630, BIOS 1.5.4 10/04/2015
Apr 2 01:09:58 server kernel: [483707.688720] task: ffff881034ac6200 ti: ffff88102da44000 task.ti: ffff88102da44000
Apr 2 01:09:58 server kernel: [483707.688722] RIP: 0010:[<
Apr 2 01:09:58 server kernel: [483707.688723] RSP: 0018:ffff88102d
Apr 2 01:09:58 server kernel: [483707.688724] RAX: 0000000000000000 RBX: 000000000000037a RCX: ffff88103d3d7940
Apr 2 01:09:58 server kernel: [483707.688725] RDX: ffff88103d417940 RSI: 0000000000200000 RDI: ffffffff821dc7e0
Apr 2 01:09:58 server kernel: [483707.688725] RBP: ffff88102da47c58 R08: 0000000000000101 R09: 28f5c28f5c28f5c3
Apr 2 01:09:58 server kernel: [483707.688726] R10: 0000000000000000 R11: ffff88102da47a58 R12: 0000000000000080
Apr 2 01:09:58 server kernel: [483707.688727] R13: 0000000000000000 R14: ffffffff81e8ae40 R15: 0000000000007ace
Apr 2 01:09:58 server kernel: [483707.688728] FS: 000000000000000
Apr 2 01:09:58 server kernel: [483707.688728] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 2 01:09:58 server kernel: [483707.688729] CR2: 00007ff3c624c0f2 CR3: 0000000001e0c000 CR4: 00000000001426e0
Apr 2 01:09:58 server kernel: [483707.688730] Stack:
Apr 2 01:09:58 server kernel: [483707.688731] ffff88102da47c68 ffffffff81183477 ffff88102da47c78 ffffffff81806af0
Apr 2 01:09:58 server kernel: [483707.688733] ffff88102da47c88 ffffffff8125dfd5 ffff88102da47d60 ffffffff8119601a
Apr 2 01:09:58 server kernel: [483707.688734] 0000000000000000 0000000000000000 ffff880da9fdf340 0000000000e86866
Apr 2 01:09:58 server kernel: [483707.688735] Call Trace:
Apr 2 01:09:58 server kernel: [483707.688737] [<ffffffff81183
Apr 2 01:09:58 server kernel: [483707.688739] [<ffffffff81806
Apr 2 01:09:58 server kernel: [483707.688740] [<ffffffff8125d
Apr 2 01:09:58 server kernel: [483707.688742] [<ffffffff81196
Apr 2 01:09:58 server kernel: [483707.688744] [<ffffffff8119a
Apr 2 01:09:58 server kernel: [483707.688746] [<ffffffff8119b
Apr 2 01:09:58 server kernel: [483707.688749] [<ffffffff8119b
Apr 2 01:09:58 server kernel: [483707.688750] [<ffffffff8109c
Apr 2 01:09:58 server kernel: [483707.688752] [<ffffffff8109c
Apr 2 01:09:58 server kernel: [483707.688753] [<ffffffff81807
Apr 2 01:09:58 server kernel: [483707.688754] [<ffffffff8109c
Apr 2 01:09:58 server kernel: [483707.688772] Code: c2 c1 e8 12 48 c1 ea 0c 83 e8 01 83 e2 30 48 98 48 81 c2 40 79 01 00 48 03 14 c5 00 99 f3 81 48 89 0a 8b 41 08 85 c0 75 0d f3 90 <8b> 41 08 85 c0 74 f7 eb 02 f3 90 8b 17 66 85 d2 75 f7 39 f2 66
Apr 2 01:09:58 server kernel: [483707.698419] Modules linked in: drbg ansi_cprng ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables 8021q garp mrp bridge stp llc dm_crypt intel_rapl x86_pkg_
Apr 2 01:09:58 server kernel: [483707.698441] CPU: 3 PID: 3119 Comm: freshclam Tainted: G L 4.4.0-67-generic #88~14.04.1-Ubuntu
Apr 2 01:09:58 server kernel: [483707.698441] Hardware name: Dell Inc. PowerEdge T630, BIOS 1.5.4 10/04/2015
Apr 2 01:09:58 server kernel: [483707.698443] task: ffff88102b9b3800 ti: ffff88102ef28000 task.ti: ffff88102ef28000
Apr 2 01:09:58 server kernel: [483707.698444] RIP: 0010:[<
ued_spin_
Apr 2 01:09:58 server kernel: [483707.698447] RSP: 0018:ffff88102e
Apr 2 01:09:58 server kernel: [483707.698448] RAX: 0000000000000000 RBX: 000000000000037a RCX: ffff88103d2d7940
Apr 2 01:09:58 server kernel: [483707.698448] RDX: ffff88103d3d7940 RSI: 0000000000100000 RDI: ffffffff821dc7e0
Apr 2 01:09:58 server kernel: [483707.698449] RBP: ffff88102ef2b7c0 R08: 0000000000000101 R09: 28f5c28f5c28f5c3
Apr 2 01:09:58 server kernel: [483707.698450] R10: 0000000000000000 R11: ffff88102ef2b5c8 R12: 0000000000000080
Apr 2 01:09:58 server kernel: [483707.698451] R13: 0000000000000000 R14: ffffffff81e8ae40 R15: 0000000000007ace
Apr 2 01:09:58 server kernel: [483707.698452] FS: 00007fe59bc0278
0000000000000
Apr 2 01:09:58 server kernel: [483707.698453] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 2 01:09:58 server kernel: [483707.698454] CR2: 00007fe59bc13000 CR3: 000000102c83f000 CR4: 00000000001426e0
Apr 2 01:09:58 server kernel: [483707.698455] Stack:
Apr 2 01:09:58 server kernel: [483707.698456] ffff88102ef2b7d0 ffffffff81183477 ffff88102ef2b7e0 ffffffff81806af0
Apr 2 01:09:58 server kernel: [483707.698457] ffff88102ef2b7f0 ffffffff8125dfd5 ffff88102ef2b8c8 ffffffff8119601a
Apr 2 01:09:58 server kernel: [483707.698459] 0000000000000003 0000000000000001 0000000000000000 0000000000e876d8
Apr 2 01:09:58 server kernel: [483707.698461] Call Trace:
Apr 2 01:09:58 server kernel: [483707.698463] [<ffffffff81183
Apr 2 01:09:58 server kernel: [483707.698465] [<ffffffff81806
Apr 2 01:09:58 server kernel: [483707.698467] [<ffffffff8125d
Apr 2 01:09:58 server kernel: [483707.698469] [<ffffffff81196
Apr 2 01:09:58 server kernel: [483707.698471] [<ffffffff8119a
Apr 2 01:09:58 server kernel: [483707.698473] [<ffffffff8119a
Apr 2 01:09:58 server kernel: [483707.698475] [<ffffffff81197
Apr 2 01:09:58 server kernel: [483707.698477] [<ffffffff8119a
Apr 2 01:09:58 server kernel: [483707.698479] [<ffffffff811fb
Apr 2 01:09:58 server kernel: [483707.698482] [<ffffffff8118e
Apr 2 01:09:58 server kernel: [483707.698483] [<ffffffff811d4
Apr 2 01:09:58 server kernel: [483707.698485] [<ffffffff81185
Apr 2 01:09:58 server kernel: [483707.698487] [<ffffffff81186
Apr 2 01:09:58 server kernel: [483707.698488] [<ffffffff81186
Apr 2 01:09:58 server kernel: [483707.698490] [<ffffffff8128e
Apr 2 01:09:58 server kernel: [483707.698492] [<ffffffff81185
Apr 2 01:09:58 server kernel: [483707.698494] [<ffffffff8121a
Apr 2 01:09:58 server kernel: [483707.698496] [<ffffffff81187
Apr 2 01:09:58 server kernel: [483707.698498] [<ffffffff81283
Apr 2 01:09:58 server kernel: [483707.698500] [<ffffffff81200
Apr 2 01:09:58 server kernel: [483707.698501] [<ffffffff81200
Apr 2 01:09:58 server kernel: [483707.698503] [<ffffffff81200
Apr 2 01:09:58 server kernel: [483707.698504] [<ffffffff81201
Apr 2 01:09:58 server kernel: [483707.698506] [<ffffffff81806
Apr 2 01:09:58 server kernel: [483707.698507] Code: 12 48 c1 ea 0c 83 e8 01 83 e2 30 48 98 48 81 c2 40 79 01 00 48 03 14 c5 00 99 f3 81 48 89 0a 8b 41 08 85 c0 75 0d f3 90 8b 41 08 <85> c0 74 f7 eb 02 f3 90 8b 17 66 85 d2 75 f7 39 f2 66 90 75 0f
The problem has existed for a while now. None of the latest kernel updates helped. Can you please advise me what to do? Thank you!
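To compare the affected machines, it may help to count how often the watchdog fires and for which task and CPU. The following is a minimal sketch in Python, assuming syslog lines in the same format as the excerpt above; the function name `summarize_lockups` and the sample input are illustrative, not part of any existing tool.

```python
import re
from collections import Counter

# Matches watchdog lines such as:
#   "... NMI watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [kswapd0:83]"
# The task name may itself contain colons (e.g. "kworker/3:1"), so the
# PID is taken as the digits after the last colon inside the brackets.
LOCKUP_RE = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]]*):(?P<pid>\d+)\]"
)


def summarize_lockups(lines):
    """Count soft-lockup reports per (task name, CPU number)."""
    counts = Counter()
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match:
            counts[(match.group("comm"), int(match.group("cpu")))] += 1
    return counts


if __name__ == "__main__":
    # Sample line copied from the kernel messages in this report.
    sample = [
        "Apr 2 01:09:58 server kernel: [483707.688686] NMI watchdog: "
        "BUG: soft lockup - CPU#7 stuck for 22s! [kswapd0:83]",
    ]
    for (comm, cpu), n in summarize_lockups(sample).items():
        print(f"{comm} on CPU#{cpu}: {n} report(s)")
```

In practice the list of lines would come from the kernel log file on each server (the exact path depends on the syslog configuration), so the summaries from the affected and unaffected machines can be compared side by side.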
ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: linux-image-
ProcVersionSign
Uname: Linux 4.4.0-67-generic x86_64
ApportVersion: 2.14.1-0ubuntu3.23
Architecture: amd64
Date: Tue Apr 4 12:38:13 2017
InstallationDate: Installed on 2016-02-22 (406 days ago)
InstallationMedia: Ubuntu-Server 14.04 LTS "Trusty Tahr" - Release amd64 (20140416.2)
ProcEnviron:
TERM=xterm-
PATH=(custom, no user)
XDG_RUNTIME_
LANG=en_US.UTF-8
SHELL=/bin/bash
SourcePackage: linux-lts-xenial
UpgradeStatus: No upgrade log present (probably fresh install)
We ran all tests with Dell support, and they tell us there is definitely no hardware problem. Any ideas on how to proceed? Thanks