deadlock on balloon deflation
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
linux (Ubuntu) | Triaged | High | Gavin Guo | |
Trusty | Fix Released | High | Gavin Guo | |
Bug Description
The latest Ubuntu Trusty with kernel 3.13.0-91-generic, running in a KVM virtual machine with virtio_balloon, hangs when a previously inflated balloon is deflated.
The problem is in the recently committed backport:
commit 838478a8496ef96
Author: Konstantin Khlebnikov <email address hidden>
AuthorDate: Mon May 16 14:43:10 2016 +0800
Commit: Kamal Mostafa <email address hidden>
CommitDate: Fri Jun 10 07:15:37 2016 -0700
mm/
BugLink: http://
Sasha Levin reported KASAN splash inside isolate_
Problem is in the function __is_movable_
AS_BALLOON_MAP in page->mapping-
against anonymous pages. As result it tried to check address space flags
inside struct anon_vma.
Further investigation shows more problems in current implementation:
* Special branch in __unmap_and_move() never works:
balloon_
_
balloon_
normal migration path. virtballoon_
MIGRATEPA
move_
newpage-
balloon an all ability for further migration.
* lru_lock erroneously required in isolate_
isolation ballooned page. This function releases lru_lock periodically,
this makes migration mostly impossible for some pages.
* balloon_
balloon_
picking page from list and locking page_lock. Race is rare because they
use trylock_page() for locking.
This patch fixes all of them.
Instead of fake mapping with special flag this patch uses special state of
page-
PAGE_
directly in struct page makes everything safer and easier.
PagePrivate is used to mark pages present in page list (i.e. not
isolated, like PageLRU for normal pages). It replaces special rules for
reference counter and makes balloon migration similar to migration of
normal pages. This flag is protected by page_lock together with link to
the balloon device.
Signed-off-by: Konstantin Khlebnikov <email address hidden>
Reported-by: Sasha Levin <email address hidden>
Link: http://<email address hidden>
Cc: Rafael Aquini <email address hidden>
Cc: Andrey Ryabinin <email address hidden>
Cc: <email address hidden> [3.8+]
Signed-off-by: Andrew Morton <email address hidden>
Signed-off-by: Linus Torvalds <email address hidden>
(backported from commit d6d86c0a7f8ddc5
Signed-off-by: Gavin Guo <email address hidden>
Conflicts:
mm/
mm/migrate.c
Acked-by: Stefan Bader <email address hidden>
Signed-off-by: Kamal Mostafa <email address hidden>
It was applied after another backport:
commit 47618e32c2a7295
Author: Minchan Kim <email address hidden>
AuthorDate: Mon Dec 28 08:35:13 2015 +0900
Commit: Luis Henriques <email address hidden>
CommitDate: Mon Feb 22 19:31:53 2016 +0000
virtio_balloon: fix race between migration and ballooning
BugLink: http://
commit 21ea9fb69e7c4b1
In balloon_
(ie, list_for_
be isolated by compaction and then list_del by isolation could
poison the page->lru.
access wrong address like this. This patch fixes the bug.
general protection fault: 0000 [#1] SMP
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 2 PID: 82 Comm: vballoon Not tainted 4.4.0-rc5-
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff8800a7ff0000 ti: ffff8800a7fec000 task.ti: ffff8800a7fec000
RIP: 0010:[<
RSP: 0018:ffff8800a7
RAX: ffff88013fff9a70 RBX: ffffea000056fe00 RCX: 0000000000002b7d
RDX: ffff88013fff9a70 RSI: ffffea000056fe00 RDI: ffff88013fff9a68
RBP: ffff8800a7fefde8 R08: ffffea000056fda0 R09: 0000000000000000
R10: ffff8800a7fefd90 R11: 0000000000000001 R12: dead0000000000e0
R13: ffffea000056fe20 R14: ffff880138809070 R15: ffff880138809060
FS: 000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f229c10e000 CR3: 00000000b8b53000 CR4: 00000000000006a0
Stack:
0000000000
0000000000
ffff880138
Call Trace:
[<
[<
[<
[<
[<
[<
[<
[<
[<
Code: 8d 60 e0 0f 84 af 00 00 00 48 8b 43 20 a8 01 75 3b 48 89 d8 f0 0f ba 28 00 72 10 48 8b 03 f6 c4 08 75 2f 48 89 df e8 8c 83 f9 ff <49> 8b 44 24 20 4d 8d 6c 24 20 48 83 e8 20 4d 39 f5 74 7a 4c 89
RIP [<ffffffff8115e
RSP <ffff8800a7fefdc0>
---[ end trace 43cf28060d708d5f ]---
Kernel panic - not syncing: Fatal exception
Dumping ftrace buffer:
(ftrace buffer empty)
Kernel Offset: disabled
Signed-off-by: Minchan Kim <email address hidden>
Signed-off-by: Michael S. Tsirkin <email address hidden>
Acked-by: Rafael Aquini <email address hidden>
Signed-off-by: Kamal Mostafa <email address hidden>
This resulted in the following code:
82 struct page *balloon_
83 {
84 struct page *page, *tmp;
85 unsigned long flags;
86 bool dequeued_page;
87
88 dequeued_page = false;
89 spin_lock_
90 list_for_
91 /*
92 * Block others from accessing the 'page' while we get around
93 * establishing additional references and preparing the 'page'
94 * to be released by the balloon driver.
95 */
96 if (trylock_
97 #ifdef CONFIG_
98 if (!PagePrivate(
99 /* raced with isolation */
100 unlock_page(page);
101 continue;
102 }
103 #endif
104 spin_lock_
105 balloon_
106 spin_unlock_
107 unlock_page(page);
108 dequeued_page = true;
109 break;
110 }
111 }
112 spin_unlock_
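Note that line 104 takes the spinlock that is already held since line 89. Kernel spinlocks are not recursive, so the thread spins forever waiting for a lock it owns itself, which is the observed hang.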
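The nested locking can be reproduced outside the kernel with a user-space analogy. The sketch below is not the kernel code: it assumes a POSIX error-checking mutex in place of the spinlock, so the second lock attempt returns EDEADLK instead of hanging. The names demo_balloon, demo_balloon_init, demo_dequeue_nested and demo_dequeue_single are hypothetical and exist only for illustration. The second function shows one possible shape of a correction (taking the lock once); it is not necessarily the fix that was released.

```c
/* Ask for POSIX/XSI declarations so PTHREAD_MUTEX_ERRORCHECK is visible
 * even in strict ISO C compiler modes. */
#define _XOPEN_SOURCE 700

#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the balloon device: one lock guarding a
 * page counter. Not a kernel structure. */
struct demo_balloon {
    pthread_mutex_t pages_lock;
    int pages;
};

/* Initialise the lock as an error-checking mutex, so that relocking by
 * the owner is reported as EDEADLK rather than blocking forever. */
static int demo_balloon_init(struct demo_balloon *b, int pages)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);

    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(&b->pages_lock, &attr);
    pthread_mutexattr_destroy(&attr);
    b->pages = pages;
    return rc;
}

/* Same shape as the reported code: the lock is taken once around the
 * loop (line 89) and again around the delete (line 104).
 * Returns the result of the inner lock attempt. */
static int demo_dequeue_nested(struct demo_balloon *b)
{
    int outer = pthread_mutex_lock(&b->pages_lock);
    int inner;

    if (outer != 0)
        return outer;
    inner = pthread_mutex_lock(&b->pages_lock); /* already owned */
    if (inner == 0) {
        b->pages--;
        pthread_mutex_unlock(&b->pages_lock);
    }
    pthread_mutex_unlock(&b->pages_lock);
    return inner;
}

/* One possible correction: do the delete under the lock that is
 * already held instead of taking it a second time. */
static int demo_dequeue_single(struct demo_balloon *b)
{
    int rc = pthread_mutex_lock(&b->pages_lock);

    if (rc != 0)
        return rc;
    assert(b->pages > 0);
    b->pages--;
    return pthread_mutex_unlock(&b->pages_lock);
}
```

Calling demo_dequeue_nested() on an initialised demo_balloon reports the self-deadlock through its return value and leaves the counter untouched, while demo_dequeue_single() removes one page. With a real spinlock there is no error return, so the nested variant simply never makes progress.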
tags: | added: kernel-da-key |
Changed in linux-lts-trusty (Ubuntu): | |
importance: | Undecided → High |
affects: | linux-lts-trusty (Ubuntu) → linux (Ubuntu) |
Changed in linux (Ubuntu Trusty): | |
status: | New → Triaged |
Changed in linux (Ubuntu): | |
status: | Confirmed → Triaged |
Changed in linux (Ubuntu Trusty): | |
importance: | Undecided → High |
Changed in linux (Ubuntu Trusty): | |
assignee: | nobody → Gavin Guo (mimi0213kimo) |
Changed in linux (Ubuntu): | |
assignee: | nobody → Gavin Guo (mimi0213kimo) |
Changed in linux (Ubuntu Trusty): | |
status: | Triaged → Fix Committed |
tags: | added: verification-done-trusty removed: verification-needed-trusty |
Status changed to 'Confirmed' because the bug affects multiple users.