System hangs apparently randomly when disconnecting iScsi volumes

Bug #1449910 reported by Roberto
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Triaged
Importance: High
Assigned to: Unassigned
Milestone: (none)

Bug Description

We have several servers here running Ubuntu 14.04 LTS (Trusty Tahr). We use LVM snapshots for nightly backups. Randomly (about once every 10-12 days), the LVM snapshot creation runs into a problem: the server hangs and has to be hard-rebooted. There is no shell and no screen output at all, just a black screen; the machine does not answer pings, and the Magic SysRq key doesn't help.
No logs are written either (I also tried redirecting the logs to another machine, just in case). This happens with different hardware and different backup software; the only constant is LVM snapshots. The kernel is 3.13.0-49. We also tried the 14.10 kernel (3.16.0-34) and had the same hangs, but this time something was logged:

Apr 21 23:02:56 server-name kernel: [654840.108023] INFO: task kswapd0:50 blocked for more than 120 seconds.
Apr 21 23:02:56 server-name kernel: [654840.108145] Not tainted 3.16.0-34-generic #47~14.04.1-Ubuntu
Apr 21 23:02:56 server-name kernel: [654840.108245] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 21 23:02:56 server-name kernel: [654840.108361] kswapd0 D ffff88007fc130c0 0 50 2 0x00000000
Apr 21 23:02:56 server-name kernel: [654840.108367] ffff880077bcf998 0000000000000046 ffff880077bd0000 ffff880077bcffd8
Apr 21 23:02:56 server-name kernel: [654840.108372] 00000000000130c0 00000000000130c0 ffff88007bdef010 ffffc90010d3c000
Apr 21 23:02:56 server-name kernel: [654840.108377] ffffc90010d3c0d8 0000000000000000 0000000000000000 ffffc90010d3c000
Apr 21 23:02:56 server-name kernel: [654840.108382] Call Trace:
Apr 21 23:02:56 server-name kernel: [654840.108…] [<ffffffff817689d9>] schedule+0x29/0x70
Apr 21 23:02:56 server-name kernel: [654840.108419] [<ffffffffc0707aad>] reiserfs_wait_on_write_block+0x4d/0x80 [reiserfs]
Apr 21 23:02:56 server-name kernel: [654840.108426] [<ffffffff810b4d70>] ? prepare_to_wait_event+0x100/0x100
Apr 21 23:02:56 server-name kernel: [654840.108437] [<ffffffffc0709271>] do_journal_begin_r+0xe1/0x3e0 [reiserfs]
Apr 21 23:02:56 server-name kernel: [654840.108443] [<ffffffff810a5fa8>] ? __enqueue_entity+0x78/0x80
Apr 21 23:02:56 server-name kernel: [654840.108454] [<ffffffffc070967a>] journal_begin+0x8a/0x160 [reiserfs]
Apr 21 23:02:56 server-name kernel: [654840.108464] [<ffffffffc06f67fc>] reiserfs_release_dquot+0x4c/0xd0 [reiserfs]
Apr 21 23:02:56 server-name kernel: [654840.108470] [<ffffffff813ab4f1>] ? __percpu_counter_add+0x51/0x70
Apr 21 23:02:56 server-name kernel: [654840.108476] [<ffffffff8123003d>] dqput+0x9d/0x200
Apr 21 23:02:56 server-name kernel: [654840.108480] [<ffffffff8123166d>] __dquot_drop+0x5d/0x70
Apr 21 23:02:56 server-name kernel: [654840.108485] [<ffffffff812316ad>] dquot_drop+0x2d/0x40
Apr 21 23:02:56 server-name kernel: [654840.108494] [<ffffffffc06ec7a0>] reiserfs_evict_inode+0xa0/0x180 [reiserfs]
Apr 21 23:02:56 server-name kernel: [654840.108501] [<ffffffff811ff7de>] ? inode_wait_for_writeback+0x2e/0x40
Apr 21 23:02:56 server-name kernel: [654840.108507] [<ffffffff811eef34>] evict+0xb4/0x180
Apr 21 23:02:56 server-name kernel: [654840.108511] [<ffffffff811ef039>] dispose_list+0x39/0x50
Apr 21 23:02:56 server-name kernel: [654840.108515] [<ffffffff811eff47>] prune_icache_sb+0x47/0x60
Apr 21 23:02:56 server-name kernel: [654840.108520] [<ffffffff811d78f5>] super_cache_scan+0x105/0x170
Apr 21 23:02:56 server-name kernel: [654840.108526] [<ffffffff81171b78>] shrink_slab_node+0x138/0x290
Apr 21 23:02:56 server-name kernel: [654840.108532] [<ffffffff810f8ccb>] ? css_next_descendant_pre+0x3b/0x40
Apr 21 23:02:56 server-name kernel: [654840.108536] [<ffffffff8117367b>] shrink_slab+0x8b/0x160
Apr 21 23:02:56 server-name kernel: [654840.108540] [<ffffffff811770e2>] balance_pgdat+0x3f2/0x620
Apr 21 23:02:56 server-name kernel: [654840.108544] [<ffffffff8117746b>] kswapd+0x15b/0x3f0
Apr 21 23:02:56 server-name kernel: [654840.108549] [<ffffffff810b4d70>] ? prepare_to_wait_event+0x100/0x100
Apr 21 23:02:56 server-name kernel: [654840.108552] [<ffffffff81177310>] ? balance_pgdat+0x620/0x620
Apr 21 23:02:56 server-name kernel: [654840.108558] [<ffffffff81091332>] kthread+0xd2/0xf0
Apr 21 23:02:56 server-name kernel: [654840.108562] [<ffffffff81091260>] ? kthread_create_on_node+0x1c0/0x1c0
Apr 21 23:02:56 server-name kernel: [654840.108567] [<ffffffff8176c9bc>] ret_from_fork+0x7c/0xb0
Apr 21 23:02:56 server-name kernel: [654840.108571] [<ffffffff81091260>] ? kthread_create_on_node+0x1c0/0x1c0
Apr 21 23:02:56 server-name kernel: [654840.108595] INFO: task kworker/1:1:25606 blocked for more than 120 seconds.
Apr 21 23:02:56 server-name kernel: [654840.108709] Not tainted 3.16.0-34-generic #47~14.04.1-Ubuntu
Apr 21 23:02:56 server-name kernel: [654840.108809] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 21 23:02:56 server-name kernel: [654840.108923] kworker/1:1 D ffff88007fc530c0 0 25606 2 0x00000000
Apr 21 23:02:56 server-name kernel: [654840.108936] Workqueue: events_long flush_old_commits [reiserfs]
Apr 21 23:02:56 server-name kernel: [654840.108939] ffff88000311fc90 0000000000000046 ffff880079db65e0 ffff88000311ffd8
Apr 21 23:02:56 server-name kernel: [654840.108943] 00000000000130c0 00000000000130c0 ffff880036b065e0 ffffc90010d3c000
Apr 21 23:02:56 server-name kernel: [654840.108948] ffffc90010d3c0d8 0000000000000000 0000000000000000 ffffc90010d3c000
Apr 21 23:02:56 server-name kernel: [654840.108952] Call Trace:
Apr 21 23:02:56 server-name kernel: [654840.108956] [<ffffffff817689d9>] schedule+0x29/0x70
Apr 21 23:02:56 server-name kernel: [654840.108967] [<ffffffffc0707aad>] reiserfs_wait_on_write_block+0x4d/0x80 [reiserfs]
Apr 21 23:02:56 server-name kernel: [654840.108972] [<ffffffff810b4d70>] ? prepare_to_wait_event+0x100/0x100
Apr 21 23:02:56 server-name kernel: [654840.108983] [<ffffffffc0709271>] do_journal_begin_r+0xe1/0x3e0 [reiserfs]
Apr 21 23:02:56 server-name kernel: [654840.108989] [<ffffffff8101c355>] ? native_sched_clock+0x35/0x90
Apr 21 23:02:56 server-name kernel: [654840.109000] [<ffffffffc070967a>] journal_begin+0x8a/0x160 [reiserfs]
Apr 21 23:02:56 server-name kernel: [654840.109010] [<ffffffffc06f5a44>] reiserfs_sync_fs+0x34/0x70 [reiserfs]
Apr 21 23:02:56 server-name kernel: [654840.109019] [<ffffffffc06f5aca>] flush_old_commits+0x4a/0x60 [reiserfs]
Apr 21 23:02:56 server-name kernel: [654840.109024] [<ffffffff8108a322>] process_one_work+0x182/0x450
Apr 21 23:02:56 server-name kernel: [654840.109028] [<ffffffff8108aa91>] worker_thread+0x121/0x570
Apr 21 23:02:56 server-name kernel: [654840.109032] [<ffffffff8108a970>] ? rescuer_thread+0x380/0x380
Apr 21 23:02:56 server-name kernel: [654840.109036] [<ffffffff81091332>] kthread+0xd2/0xf0
Apr 21 23:02:56 server-name kernel: [654840.109040] [<ffffffff81091260>] ? kthread_create_on_node+0x1c0/0x1c0
Apr 21 23:02:56 server-name kernel: [654840.109044] [<ffffffff8176c9bc>] ret_from_fork+0x7c/0xb0
Apr 21 23:02:56 server-name kernel: [654840.109048] [<ffffffff81091260>] ? kthread_create_on_node+0x1c0/0x1c0
Apr 21 23:02:56 server-name kernel: [654840.109053] INFO: task lvcreate:25648 blocked for more than 120 seconds.
Apr 21 23:02:56 server-name kernel: [654840.109159] Not tainted 3.16.0-34-generic #47~14.04.1-Ubuntu
Apr 21 23:02:56 server-name kernel: [654840.109258] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 21 23:02:56 server-name kernel: [654840.109372] lvcreate D ffff88007fc530c0 0 25648 25623 0x00000000
Apr 21 23:02:56 server-name kernel: [654840.109377] ffff880000053b60 0000000000000086 ffff8800773432f0 ffff880000053fd8
Apr 21 23:02:56 server-name kernel: [654840.109381] 00000000000130c0 00000000000130c0 ffff88007c0c9460 ffff8800773432f0
Apr 21 23:02:56 server-name kernel: [654840.109385] ffff880037a23080 ffff880037a23068 ffffffff00000000 ffff880037a23070
Apr 21 23:02:56 server-name kernel: [654840.109390] Call Trace:
Apr 21 23:02:56 server-name kernel: [654840.109394] [<ffffffff817689d9>] schedule+0x29/0x70
Apr 21 23:02:56 server-name kernel: [654840.109398] [<ffffffff8176b77d>] rwsem_down_write_failed+0x1ed/0x390
Apr 21 23:02:56 server-name kernel: [654840.109404] [<ffffffff81011627>] ? __switch_to+0x167/0x590
Apr 21 23:02:56 server-name kernel: [654840.109409] [<ffffffff81394013>] call_rwsem_down_write_failed+0x13/0x20
Apr 21 23:02:56 server-name kernel: [654840.109416] [<ffffffff815e6c05>] ? dm_free_md_mempools+0x35/0x40
Apr 21 23:02:56 server-name kernel: [654840.109419] [<ffffffff8176b0ad>] ? down_write+0x2d/0x40
Apr 21 23:02:56 server-name kernel: [654840.109423] [<ffffffff811d6a6d>] thaw_super+0x1d/0xb0
Apr 21 23:02:56 server-name kernel: [654840.109428] [<ffffffff81209935>] thaw_bdev+0x65/0x80
Apr 21 23:02:56 server-name kernel: [654840.109432] [<ffffffff815e4a80>] unlock_fs.part.23+0x20/0x40
Apr 21 23:02:56 server-name kernel: [654840.109436] [<ffffffff815e6898>] dm_resume+0xb8/0xd0
Apr 21 23:02:56 server-name kernel: [654840.109441] [<ffffffff815eba5b>] dev_suspend+0x12b/0x220
Apr 21 23:02:56 server-name kernel: [654840.109445] [<ffffffff815eb930>] ? table_load+0x350/0x350
Apr 21 23:02:56 server-name kernel: [654840.109449] [<ffffffff815ec2f5>] ctl_ioctl+0x255/0x500
Apr 21 23:02:56 server-name kernel: [654840.109454] [<ffffffff812d75ef>] ? SYSC_semtimedop+0x23f/0xcf0
Apr 21 23:02:56 server-name kernel: [654840.109459] [<ffffffff815ec5b3>] dm_ctl_ioctl+0x13/0x20
Apr 21 23:02:56 server-name kernel: [654840.109463] [<ffffffff811e7090>] do_vfs_ioctl+0x2e0/0x4c0
Apr 21 23:02:56 server-name kernel: [654840.109468] [<ffffffff812ebff6>] ? security_sem_associate+0x16/0x20
Apr 21 23:02:56 server-name kernel: [654840.109472] [<ffffffff812d65a9>] ? sem_security+0x9/0x10
Apr 21 23:02:56 server-name kernel: [654840.109476] [<ffffffff812d4a09>] ? ipcget+0x149/0x1c0
Apr 21 23:02:56 server-name kernel: [654840.109480] [<ffffffff811e72f1>] SyS_ioctl+0x81/0xa0
Apr 21 23:02:56 server-name kernel: [654840.109484] [<ffffffff8176ca6d>] system_call_fastpath+0x1a/0x1f
Apr 21 23:04:56 server-name kernel: [654960.108025] INFO: task kswapd0:50 blocked for more than 120 seconds.
Apr 21 23:04:56 server-name kernel: [654960.108124] Not tainted 3.16.0-34-generic #47~14.04.1-Ubuntu
Apr 21 23:04:56 server-name kernel: [654960.108224] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 21 23:04:56 server-name kernel: [654960.108339] kswapd0 D ffff88007fc130c0 0 50 2 0x00000000
Apr 21 23:04:56 server-name kernel: [654960.108346] ffff880077bcf998 0000000000000046 ffff880077bd0000 ffff880077bcffd8
Apr 21 23:04:56 server-name kernel: [654960.108351] 00000000000130c0 00000000000130c0 ffff88007bdef010 ffffc90010d3c000
Apr 21 23:04:56 server-name kernel: [654960.108355] ffffc90010d3c0d8 0000000000000000 0000000000000000 ffffc90010d3c000
Apr 21 23:04:56 server-name kernel: [654960.108360] Call Trace:
Apr 21 23:04:56 server-name kernel: [654960.108372] [<ffffffff817689d9>] schedule+0x29/0x70
Apr 21 23:04:56 server-name kernel: [654960.108397] [<ffffffffc0707aad>] reiserfs_wait_on_write_block+0x4d/0x80 [reiserfs]
Apr 21 23:04:56 server-name kernel: [654960.108404] [<ffffffff810b4d70>] ? prepare_to_wait_event+0x100/0x100
Apr 21 23:04:56 server-name kernel: [654960.108416] [<ffffffffc0709271>] do_journal_begin_r+0xe1/0x3e0 [reiserfs]
Apr 21 23:04:56 server-name kernel: [654960.108422] [<ffffffff810a5fa8>] ? __enqueue_entity+0x78/0x80
Apr 21 23:04:56 server-name kernel: [654960.108433] [<ffffffffc070967a>] journal_begin+0x8a/0x160 [reiserfs]
Apr 21 23:04:56 server-name kernel: [654960.108443] [<ffffffffc06f67fc>] reiserfs_release_dquot+0x4c/0xd0 [reiserfs]
Apr 21 23:04:56 server-name kernel: [654960.108449] [<ffffffff813ab4f1>] ? __percpu_counter_add+0x51/0x70
Apr 21 23:04:56 server-name kernel: [654960.108455] [<ffffffff8123003d>] dqput+0x9d/0x200
Apr 21 23:04:56 server-name kernel: [654960.108459] [<ffffffff8123166d>] __dquot_drop+0x5d/0x70
Apr 21 23:04:56 server-name kernel: [654960.108464] [<ffffffff812316ad>] dquot_drop+0x2d/0x40
Apr 21 23:04:56 server-name kernel: [654960.108473] [<ffffffffc06ec7a0>] reiserfs_evict_inode+0xa0/0x180 [reiserfs]
Apr 21 23:04:56 server-name kernel: [654960.108480] [<ffffffff811ff7de>] ? inode_wait_for_writeback+0x2e/0x40
Apr 21 23:04:56 server-name kernel: [654960.108485] [<ffffffff811eef34>] evict+0xb4/0x180
Apr 21 23:04:56 server-name kernel: [654960.108490] [<ffffffff811ef039>] dispose_list+0x39/0x50
Apr 21 23:04:56 server-name kernel: [654960.108494] [<ffffffff811eff47>] prune_icache_sb+0x47/0x60
Apr 21 23:04:56 server-name kernel: [654960.108499] [<ffffffff811d78f5>] super_cache_scan+0x105/0x170
Apr 21 23:04:56 server-name kernel: [654960.108504] [<ffffffff81171b78>] shrink_slab_node+0x138/0x290
Apr 21 23:04:56 server-name kernel: [654960.108510] [<ffffffff810f8ccb>] ? css_next_descendant_pre+0x3b/0x40
Apr 21 23:04:56 server-name kernel: [654960.108515] [<ffffffff8117367b>] shrink_slab+0x8b/0x160
Apr 21 23:04:56 server-name kernel: [654960.108519] [<ffffffff811770e2>] balance_pgdat+0x3f2/0x620
Apr 21 23:04:56 server-name kernel: [654960.108523] [<ffffffff8117746b>] kswapd+0x15b/0x3f0
Apr 21 23:04:56 server-name kernel: [654960.108528] [<ffffffff810b4d70>] ? prepare_to_wait_event+0x100/0x100
Apr 21 23:04:56 server-name kernel: [654960.108531] [<ffffffff81177310>] ? balance_pgdat+0x620/0x620
Apr 21 23:04:56 server-name kernel: [654960.108536] [<ffffffff81091332>] kthread+0xd2/0xf0
Apr 21 23:04:56 server-name kernel: [654960.108541] [<ffffffff81091260>] ? kthread_create_on_node+0x1c0/0x1c0
Apr 21 23:04:56 server-name kernel: [654960.108546] [<ffffffff8176c9bc>] ret_from_fork+0x7c/0xb0
Apr 21 23:04:56 server-name kernel: [654960.108550] [<ffffffff81091260>] ? kthread_create_on_node+0x1c0/0x1c0
Apr 21 23:04:56 server-name kernel: [654960.108573] INFO: task kworker/1:1:25606 blocked for more than 120 seconds.
Apr 21 23:04:56 server-name kernel: [654960.108688] Not tainted 3.16.0-34-generic #47~14.04.1-Ubuntu
Apr 21 23:04:56 server-name kernel: [654960.108788] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 21 23:04:56 server-name kernel: [654960.108905] kworker/1:1 D ffff88007fc530c0 0 25606 2 0x00000000
Apr 21 23:04:56 server-name kernel: [654960.108918] Workqueue: events_long flush_old_commits [reiserfs]
Apr 21 23:04:56 server-name kernel: [654960.108920] ffff88000311fc90 0000000000000046 ffff880079db65e0 ffff88000311ffd8
Apr 21 23:04:56 server-name kernel: [654960.108925] 00000000000130c0 00000000000130c0 ffff880036b065e0 ffffc90010d3c000
Apr 21 23:04:56 server-name kernel: [654960.108929] ffffc90010d3c0d8 0000000000000000 0000000000000000 ffffc90010d3c000
Apr 21 23:04:56 server-name kernel: [654960.108934] Call Trace:
Apr 21 23:04:56 server-name kernel: [654960.108938] [<ffffffff817689d9>] schedule+0x29/0x70
Apr 21 23:04:56 server-name kernel: [654960.108949] [<ffffffffc0707aad>] reiserfs_wait_on_write_block+0x4d/0x80 [reiserfs]
Apr 21 23:04:56 server-name kernel: [654960.108953] [<ffffffff810b4d70>] ? prepare_to_wait_event+0x100/0x100
Apr 21 23:04:56 server-name kernel: [654960.108964] [<ffffffffc0709271>] do_journal_begin_r+0xe1/0x3e0 [reiserfs]
Apr 21 23:04:56 server-name kernel: [654960.108971] [<ffffffff8101c355>] ? native_sched_clock+0x35/0x90
Apr 21 23:04:56 server-name kernel: [654960.108982] [<ffffffffc070967a>] journal_begin+0x8a/0x160 [reiserfs]
Apr 21 23:04:56 server-name kernel: [654960.108991] [<ffffffffc06f5a44>] reiserfs_sync_fs+0x34/0x70 [reiserfs]
Apr 21 23:04:56 server-name kernel: [654960.109001] [<ffffffffc06f5aca>] flush_old_commits+0x4a/0x60 [reiserfs]
Apr 21 23:04:56 server-name kernel: [654960.109006] [<ffffffff8108a322>] process_one_work+0x182/0x450
Apr 21 23:04:56 server-name kernel: [654960.109010] [<ffffffff8108aa91>] worker_thread+0x121/0x570
Apr 21 23:04:56 server-name kernel: [654960.109014] [<ffffffff8108a970>] ? rescuer_thread+0x380/0x380
Apr 21 23:04:56 server-name kernel: [654960.109018] [<ffffffff81091332>] kthread+0xd2/0xf0
Apr 21 23:04:56 server-name kernel: [654960.109022] [<ffffffff81091260>] ? kthread_create_on_node+0x1c0/0x1c0
Apr 21 23:04:56 server-name kernel: [654960.109026] [<ffffffff8176c9bc>] ret_from_fork+0x7c/0xb0
Apr 21 23:04:56 server-name kernel: [654960.109030] [<ffffffff81091260>] ? kthread_create_on_node+0x1c0/0x1c0
Apr 21 23:04:56 server-name kernel: [654960.109035] INFO: task lvcreate:25648 blocked for more than 120 seconds.
Apr 21 23:04:56 server-name kernel: [654960.109141] Not tainted 3.16.0-34-generic #47~14.04.1-Ubuntu
Apr 21 23:04:56 server-name kernel: [654960.109240] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 21 23:04:56 server-name kernel: [654960.109354] lvcreate D ffff88007fc530c0 0 25648 25623 0x00000000
Apr 21 23:04:56 server-name kernel: [654960.109359] ffff880000053b60 0000000000000086 ffff8800773432f0 ffff880000053fd8
Apr 21 23:04:56 server-name kernel: [654960.109363] 00000000000130c0 00000000000130c0 ffff88007c0c9460 ffff8800773432f0
Apr 21 23:04:56 server-name kernel: [654960.109367] ffff880037a23080 ffff880037a23068 ffffffff00000000 ffff880037a23070
Apr 21 23:04:56 server-name kernel: [654960.109372] Call Trace:
Apr 21 23:04:56 server-name kernel: [654960.109376] [<ffffffff817689d9>] schedule+0x29/0x70
Apr 21 23:04:56 server-name kernel: [654960.109380] [<ffffffff8176b77d>] rwsem_down_write_failed+0x1ed/0x390
Apr 21 23:04:56 server-name kernel: [654960.109386] [<ffffffff81011627>] ? __switch_to+0x167/0x590
Apr 21 23:04:56 server-name kernel: [654960.109391] [<ffffffff81394013>] call_rwsem_down_write_failed+0x13/0x20
Apr 21 23:04:56 server-name kernel: [654960.109398] [<ffffffff815e6c05>] ? dm_free_md_mempools+0x35/0x40
Apr 21 23:04:56 server-name kernel: [654960.109401] [<ffffffff8176b0ad>] ? down_write+0x2d/0x40
Apr 21 23:04:56 server-name kernel: [654960.109405] [<ffffffff811d6a6d>] thaw_super+0x1d/0xb0
Apr 21 23:04:56 server-name kernel: [654960.109410] [<ffffffff81209935>] thaw_bdev+0x65/0x80
Apr 21 23:04:56 server-name kernel: [654960.109414] [<ffffffff815e4a80>] unlock_fs.part.23+0x20/0x40
Apr 21 23:04:56 server-name kernel: [654960.109419] [<ffffffff815e6898>] dm_resume+0xb8/0xd0
Apr 21 23:04:56 server-name kernel: [654960.109424] [<ffffffff815eba5b>] dev_suspend+0x12b/0x220
Apr 21 23:04:56 server-name kernel: [654960.109428] [<ffffffff815eb930>] ? table_load+0x350/0x350
Apr 21 23:04:56 server-name kernel: [654960.109431] [<ffffffff815ec2f5>] ctl_ioctl+0x255/0x500
Apr 21 23:04:56 server-name kernel: [654960.109437] [<ffffffff812d75ef>] ? SYSC_semtimedop+0x23f/0xcf0
Apr 21 23:04:56 server-name kernel: [654960.109441] [<ffffffff815ec5b3>] dm_ctl_ioctl+0x13/0x20
Apr 21 23:04:56 server-name kernel: [654960.109446] [<ffffffff811e7090>] do_vfs_ioctl+0x2e0/0x4c0
Apr 21 23:04:56 server-name kernel: [654960.109451] [<ffffffff812ebff6>] ? security_sem_associate+0x16/0x20
Apr 21 23:04:56 server-name kernel: [654960.109455] [<ffffffff812d65a9>] ? sem_security+0x9/0x10
Apr 21 23:04:56 server-name kernel: [654960.109459] [<ffffffff812d4a09>] ? ipcget+0x149/0x1c0
Apr 21 23:04:56 server-name kernel: [654960.109462] [<ffffffff811e72f1>] SyS_ioctl+0x81/0xa0
Apr 21 23:04:56 server-name kernel: [654960.109467] [<ffffffff8176ca6d>] system_call_fastpath+0x1a/0x1f
Apr 21 23:06:10 server-name nmbd[962]: [2015/04/21 23:06:10.798735, 0] ../source3/nmbd/nmbd_packets.c:759(queue_query_name)
Apr 21 23:06:10 server-name nmbd[962]: queue_query_name: interface 0 has NULL IP address !
Apr 21 23:06:56 server-name kernel: [655080.108025] INFO: task kswapd0:50 blocked for more than 120 seconds.
Apr 21 23:06:56 server-name kernel: [655080.108123] Not tainted 3.16.0-34-generic #47~14.04.1-Ubuntu
Apr 21 23:06:56 server-name kernel: [655080.108223] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 21 23:06:56 server-name kernel: [655080.108347] kswapd0 D ffff88007fc130c0 0 50 2 0x00000000
Apr 21 23:06:56 server-name kernel: [655080.108354] ffff880077bcf998 0000000000000046 ffff880077bd0000 ffff880077bcffd8
Apr 21 23:06:56 server-name kernel: [655080.108359] 00000000000130c0 00000000000130c0 ffff88007bdef010 ffffc90010d3c000
Apr 21 23:06:56 server-name kernel: [655080.108363] ffffc90010d3c0d8 0000000000000000 0000000000000000 ffffc90010d3c000
Apr 21 23:06:56 server-name kernel: [655080.108369] Call Trace:
Apr 21 23:06:56 server-name kernel: [655080.108379] [<ffffffff817689d9>] schedule+0x29/0x70
Apr 21 23:06:56 server-name kernel: [655080.108…] [<ffffffffc0707aad>] reiserfs_wait_on_write_block+0x4d/0x80 [reiserfs]
Apr 21 23:06:56 server-name kernel: [655080.108413] [<ffffffff810b4d70>] ? prepare_to_wait_event+0x100/0x100
Apr 21 23:06:56 server-name kernel: [655080.108424] [<ffffffffc0709271>] do_journal_begin_r+0xe1/0x3e0 [reiserfs]
Apr 21 23:06:56 server-name kernel: [655080.108430] [<ffffffff810a5fa8>] ? __enqueue_entity+0x78/0x80
Apr 21 23:06:56 server-name kernel: [655080.108441] [<ffffffffc070967a>] journal_begin+0x8a/0x160 [reiserfs]
Apr 21 23:06:56 server-name kernel: [655080.108451] [<ffffffffc06f67fc>] reiserfs_release_dquot+0x4c/0xd0 [reiserfs]
Apr 21 23:06:56 server-name kernel: [655080.108457] [<ffffffff813ab4f1>] ? __percpu_counter_add+0x51/0x70
Apr 21 23:06:56 server-name kernel: [655080.108463] [<ffffffff8123003d>] dqput+0x9d/0x200
Apr 21 23:06:56 server-name kernel: [655080.108467] [<ffffffff8123166d>] __dquot_drop+0x5d/0x70
Apr 21 23:06:56 server-name kernel: [655080.108472] [<ffffffff812316ad>] dquot_drop+0x2d/0x40
Apr 21 23:06:56 server-name kernel: [655080.108481] [<ffffffffc06ec7a0>] reiserfs_evict_inode+0xa0/0x180 [reiserfs]
Apr 21 23:06:56 server-name kernel: [655080.108488] [<ffffffff811ff7de>] ? inode_wait_for_writeback+0x2e/0x40
Apr 21 23:06:56 server-name kernel: [655080.108493] [<ffffffff811eef34>] evict+0xb4/0x180
Apr 21 23:06:56 server-name kernel: [655080.108498] [<ffffffff811ef039>] dispose_list+0x39/0x50
Apr 21 23:06:56 server-name kernel: [655080.108502] [<ffffffff811eff47>] prune_icache_sb+0x47/0x60
Apr 21 23:06:56 server-name kernel: [655080.108507] [<ffffffff811d78f5>] super_cache_scan+0x105/0x170
Apr 21 23:06:56 server-name kernel: [655080.108512] [<ffffffff81171b78>] shrink_slab_node+0x138/0x290
Apr 21 23:06:56 server-name kernel: [655080.108519] [<ffffffff810f8ccb>] ? css_next_descendant_pre+0x3b/0x40
Apr 21 23:06:56 server-name kernel: [655080.108523] [<ffffffff8117367b>] shrink_slab+0x8b/0x160
Apr 21 23:06:56 server-name kernel: [655080.108527] [<ffffffff811770e2>] balance_pgdat+0x3f2/0x620
Apr 21 23:06:56 server-name kernel: [655080.108531] [<ffffffff8117746b>] kswapd+0x15b/0x3f0
Apr 21 23:06:56 server-name kernel: [655080.108536] [<ffffffff810b4d70>] ? prepare_to_wait_event+0x100/0x100
Apr 21 23:06:56 server-name kernel: [655080.108539] [<ffffffff81177310>] ? balance_pgdat+0x620/0x620
Apr 21 23:06:56 server-name kernel: [655080.108545] [<ffffffff81091332>] kthread+0xd2/0xf0
Apr 21 23:06:56 server-name kernel: [655080.108549] [<ffffffff81091260>] ? kthread_create_on_node+0x1c0/0x1c0
Apr 21 23:06:56 server-name kernel: [655080.108554] [<ffffffff8176c9bc>] ret_from_fork+0x7c/0xb0
Apr 21 23:06:56 server-name kernel: [655080.108558] [<ffffffff81091260>] ? kthread_create_on_node+0x1c0/0x1c0
Apr 21 23:06:56 server-name kernel: [655080.108582] INFO: task kworker/1:1:25606 blocked for more than 120 seconds.
Apr 21 23:06:56 server-name kernel: [655080.108697] Not tainted 3.16.0-34-generic #47~14.04.1-Ubuntu
Apr 21 23:06:56 server-name kernel: [655080.108796] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 21 23:06:56 server-name kernel: [655080.108910] kworker/1:1 D ffff88007fc530c0 0 25606 2 0x00000000
Apr 21 23:06:56 server-name kernel: [655080.108923] Workqueue: events_long flush_old_commits [reiserfs]
Apr 21 23:06:56 server-name kernel: [655080.108925] ffff88000311fc90 0000000000000046 ffff880079db65e0 ffff88000311ffd8

and this keeps going for some minutes, then the next morning the server is hung.
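
For anyone triaging a log like the one above, a small script can show at a glance which tasks the kernel reported as blocked and how often. The following is only a minimal sketch: the function name `summarize_hung_tasks` and the regular expression are illustrative assumptions based on the "INFO: task ... blocked for more than ... seconds" lines quoted in this report, not part of any Ubuntu or kernel tool.

```python
import re
from collections import Counter

# Matches hung-task header lines such as:
#   "INFO: task kswapd0:50 blocked for more than 120 seconds."
# The task name may itself contain colons (e.g. "kworker/1:1"), so the
# PID is taken to be the last colon-separated number before " blocked".
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def summarize_hung_tasks(lines):
    """Count hung-task reports per (task name, pid) pair (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:
            counts[(match.group("name"), int(match.group("pid")))] += 1
    return counts


if __name__ == "__main__":
    # Three header lines copied from the log excerpt in this report.
    sample = [
        "Apr 21 23:02:56 server-name kernel: [654840.108023] INFO: task kswapd0:50 blocked for more than 120 seconds.",
        "Apr 21 23:02:56 server-name kernel: [654840.108595] INFO: task kworker/1:1:25606 blocked for more than 120 seconds.",
        "Apr 21 23:02:56 server-name kernel: [654840.109053] INFO: task lvcreate:25648 blocked for more than 120 seconds.",
    ]
    for (name, pid), count in summarize_hung_tasks(sample).items():
        print(f"{name} (pid {pid}): {count} report(s)")
```

Run against the full excerpt, this would list kswapd0, kworker/1:1 and lvcreate, the three tasks that appear in every round of the messages above.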

description: updated
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1449910

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: utopic
Roberto (roberto-colnaghi) wrote : Re: System hangs apparently randomly when creating LVM snapshots

I installed this apport thing, but it doesn't seem to work: it tries to open the website via links, but the website thinks I'm a bot.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Roberto (roberto-colnaghi) wrote :

The automated script did "tags: added: utopic ", but it's actually a Trusty (LTS) bug.

tags: added: trusty
removed: utopic
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.0 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.1-rc1-vivid/

Changed in linux (Ubuntu):
importance: Undecided → Medium
importance: Medium → High
tags: added: kernel-da-key
Roberto (roberto-colnaghi) wrote :

Thank you, Joseph, for your answer. I guess I should test the 4.1 kernel, which can be found at the link you provided. I'll do this next Monday. Just keep in mind that, since the problem is apparently random, we should wait some time before drawing conclusions (I'd say one month without hangs, taking snapshots every night, to be sufficiently sure).
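
As a rough check on how convincing a hang-free month would be, one can estimate the chance of seeing no hang in that time if the bug were still present. The sketch below assumes, purely for illustration, that hangs occur as independent random events (a Poisson process) with the mean interval of 10-12 days mentioned in the bug description; the function name `prob_no_hang` is hypothetical.

```python
import math


def prob_no_hang(nights, mean_days_between_hangs):
    """Probability of zero hangs in `nights` nights on one server.

    Assumes a Poisson process with a constant rate of
    1 / mean_days_between_hangs events per day (an illustrative
    assumption, not something established in this report).
    """
    return math.exp(-nights / mean_days_between_hangs)


if __name__ == "__main__":
    # Mean intervals taken from the "1 event every 10-12 days" estimate.
    for interval in (10, 12):
        print(f"mean interval {interval} days: "
              f"P(no hang in 30 nights) = {prob_no_hang(30, interval):.3f}")
```

Under these assumptions, a single server that still had the bug would get through 30 nights without a hang only about 5 to 8 percent of the time, so a clean month is reasonably strong evidence, and more so when several servers are tested in parallel.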

Roberto (roberto-colnaghi) wrote :

OK, I've just installed the latest mainline kernel (4.1.0-040100rc1-generic); let's see if this happens again. By the way, this kernel doesn't include the firmware file bnx2-mips-06-6.2.3.fw, which I needed on 2 of the 4 servers I'm testing this on. It is needed by a network driver. I fixed the problem by manually copying the required file from /lib/firmware/3.13.0-51-generic/, i.e. from the old kernel's set of firmware files.
I'll report back here if something happens or if the problem seems fixed. As I said in the previous comment, I'll test this every night for at least one month.

Roberto (roberto-colnaghi) wrote :

So, all the systems have now been up and running for 35 nights. The upstream kernel is doing well. What can we do from here?
Thank you for your help.

Changed in linux (Ubuntu):
status: Confirmed → Triaged
Roberto (roberto-colnaghi) wrote :

Two nights ago something happened. I have a 5th server which doesn't use LVM snapshots but is still backed up every night. I didn't install the mainline kernel on this machine, because we thought the problem was due to snapshots. It crashed two nights ago, in the same way the other machines crashed, without using snapshots, and it crashed after finishing the backup. I think the problem lies in the disconnection of the iSCSI volume to which the machines write their backups. On the first reboot, I found the following in the kernel log:

[ 209.312097] ------------[ cut here ]------------
[ 209.312106] WARNING: CPU: 0 PID: 1808 at /build/buildd/linux-3.13.0/drivers/pci/pci.c:1444 pci_disable_device+0x9c/0xb0()
[ 209.312108] ipmi_si 0000:01:04.6: disabling already-disabled device
[ 209.312110] Modules linked in: ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi sit tunnel4 ip_tunnel dm_crypt gpio_ich coretemp kvm joydev serio_raw hpilo lpc_ich ipmi_si(-) i3200_edac shpchp edac_core mac_hid lp parport reiserfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid0 multipath linear raid1 hid_generic radeon i2c_algo_bit ttm drm_kms_helper psmouse drm pata_acpi tg3 usbhid hid ptp pps_core
[ 209.312151] CPU: 0 PID: 1808 Comm: modprobe Not tainted 3.13.0-53-generic #89-Ubuntu
[ 209.312153] Hardware name: HP ProLiant DL320 G5p, BIOS W05 04/03/2008
[ 209.312154] 0000000000000009 ffff88007abb3d40 ffffffff81722e1e ffff88007abb3d88
[ 209.312158] ffff88007abb3d78 ffffffff810677fd ffff88007c311000 ffff88007c2c5580
[ 209.312161] ffff88007c311000 00007f459c5473f0 00007ffd2b319338 ffff88007abb3dd8
[ 209.312164] Call Trace:
[ 209.312170] [<ffffffff81722e1e>] dump_stack+0x45/0x56
[ 209.312175] [<ffffffff810677fd>] warn_slowpath_common+0x7d/0xa0
[ 209.312177] [<ffffffff8106786c>] warn_slowpath_fmt+0x4c/0x50
[ 209.312182] [<ffffffff811a259d>] ? kfree+0xfd/0x140
[ 209.312186] [<ffffffff813a9c7c>] pci_disable_device+0x9c/0xb0
[ 209.312192] [<ffffffffa0398059>] ipmi_pci_remove+0x29/0x30 [ipmi_si]
[ 209.312195] [<ffffffff813ac68b>] pci_device_remove+0x3b/0xb0
[ 209.312200] [<ffffffff81498c3f>] __device_release_driver+0x7f/0xf0
[ 209.312203] [<ffffffff81499608>] driver_detach+0xb8/0xc0
[ 209.312207] [<ffffffff81498875>] bus_remove_driver+0x55/0xd0
[ 209.312210] [<ffffffff81499c7c>] driver_unregister+0x2c/0x50
[ 209.312213] [<ffffffff813ab179>] pci_unregister_driver+0x29/0x90
[ 209.312218] [<ffffffffa03984c4>] cleanup_ipmi_si+0xd4/0xf0 [ipmi_si]
[ 209.312222] [<ffffffff810e05d2>] SyS_delete_module+0x162/0x200
[ 209.312227] [<ffffffff81013ed7>] ? do_notify_resume+0x97/0xb0
[ 209.312231] [<ffffffff8173391d>] system_call_fastpath+0x1a/0x1f
[ 209.312233] ---[ end trace f6143eeb3c0e8dba ]---

I don't know if this is related; in any case, it happened only on this particular reboot. I have now installed the mainline kernel on this machine as well.
I changed the title of this bug to reflect the additional information I got.

summary: - System hangs apparently randomly when creating LVM snapshots
+ System hangs apparently randomly when disconnecting iScsi volumes
Roberto (roberto-colnaghi) wrote :

After the installation of the mainline kernel on the remaining machine, the problem did not appear again for more than one month. Unfortunately, I then couldn't continue testing because of other, unrelated hardware problems. I plan to start again in the next few days, but I still think the problem is solved by the mainline kernel. So, what can we do now? Will the new release kernel be free of this bug?

Roberto (roberto-colnaghi) wrote :

After another month of testing, the mainline kernel never failed. So, is this bug fixed in some release version of the kernel?

Roberto (roberto-colnaghi) wrote :

It's been a full year now without any problems. The 4.1 kernel definitely works well (as does 4.0.1, which I'm running on one of the machines). Tomorrow I will start upgrading these machines to the latest LTS release (16.04.1), which runs the 4.4 kernel; I trust it will run well too.
Is there any news about this bug?
