crash (on ppc64) when restarting numad while huge guest is active
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
The Ubuntu-power-systems project | Incomplete | Low | Unassigned |
numad (Ubuntu) | Incomplete | Low | Unassigned |
Bionic | Expired | Undecided | Unassigned |
Bug Description
While verifying bug 1832915 I found "by accident" that this crash seems to happen often (at least on our POWER9 box).
Case:
- huge kvm guest running
- restart numad
=> Numad crashes.
Steps to recreate:
1. deploy P9 Bionic (or later) system
2. install uvtool
$ apt install uvtool-libvirt
3. log out & in to get permissions right
4. sync images
$ uvt-simplestrea
5. install and manually start numad
$ apt install numad
$ systemctl start numad
6. spawn guest
$ uvt-kvm create --memory $((1024*64)) --cpu 64 --password ubuntu eoan arch=ppc64el release=eoan label=daily
7. restart numad
$ systemctl restart numad
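For reference, the `--memory $((1024*64))` argument in the guest-creation step is plain shell arithmetic. A minimal sketch of what it expands to, assuming uvt-kvm interprets the value in megabytes (the variable names here are only illustrative):

```shell
# Shell arithmetic as used in the uvt-kvm call above.
mem_mib=$((1024*64))          # value handed to --memory (assumed to be MiB)
mem_gib=$((mem_mib / 1024))   # the same amount expressed in GiB
echo "guest memory: ${mem_mib} MiB (${mem_gib} GiB)"
```

So the "huge guest" in this report is a 64-vCPU guest with roughly 64 GiB of RAM.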
The crash seems related to some re-init of a static structure:
--- stack trace ---
#0 tcache_get (tc_idx=<optimized out>) at malloc.c:2950
e = 0x9a5ddc1950
e = <optimized out>
#1 __GI___libc_malloc (bytes=16) at malloc.c:3058
ar_ptr = <optimized out>
victim = <optimized out>
hook = <optimized out>
tbytes = <optimized out>
tc_idx = <optimized out>
#2 0x0000009a300279a0 in ?? ()
No symbol table info available.
#3 0x0000009a3002cad8 in ?? ()
No symbol table info available.
#4 0x0000009a30023794 in ?? ()
No symbol table info available.
#5 0x00007a6150998278 in generic_start_main (main=0x9a30022a00, argc=<optimized out>, argv=0x7fffe93a
self = 0x7a6150dc38d0
result = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {84650536672305
#6 0x00007a6150998484 in __libc_start_main (argc=<optimized out>, argv=<optimized out>, ev=<optimized out>, auxvec=<optimized out>, rtld_fini=
No locals.
#7 0x0000000000000000 in ?? ()
No symbol table info available.
--- source code stack trace ---
#0 tcache_get (tc_idx=<optimized out>) at malloc.c:2950
[Error: malloc.c was not found in source tree]
#1 __GI___libc_malloc (bytes=16) at malloc.c:3058
[Error: malloc.c was not found in source tree]
#2 0x0000009a300279a0 in ?? ()
#3 0x0000009a3002cad8 in ?? ()
#4 0x0000009a30023794 in ?? ()
#5 0x00007a6150998278 in generic_start_main (main=0x9a30022a00, argc=<optimized out>, argv=0x7fffe93a
[Error: libc-start.c was not found in source tree]
#6 0x00007a6150998484 in __libc_start_main (argc=<optimized out>, argv=<optimized out>, ev=<optimized out>, auxvec=<optimized out>, rtld_fini=
[Error: libc-start.c was not found in source tree]
#7 0x0000000000000000 in ?? ()
I thought at first this would be related to my debug rebuilds, but it also happens with the unmodified version from the Ubuntu Archive.
summary: |
- crash (on ppc64) hen restarting numad while huge guest is active
+ crash (on ppc64) when restarting numad while huge guest is active |
description: | updated |
Changed in numad (Ubuntu): | |
assignee: | nobody → bugproxy (bugproxy) |
tags: | added: architecture-ppc64le bugnameltc-179340 severity-low targetmilestone-inin--- |
Changed in ubuntu-power-systems: | |
status: | New → Confirmed |
assignee: | nobody → bugproxy (bugproxy) |
importance: | Undecided → Medium |
tags: | added: universe |
tags: | added: reverse-proxy-bugzilla |
Changed in ubuntu-power-systems: | |
status: | Triaged → Incomplete |
tags: | added: hwe-long-running |
Changed in numad (Ubuntu): | |
status: | Confirmed → Incomplete |
Changed in ubuntu-power-systems: | |
assignee: | bugproxy (bugproxy) → nobody |
Changed in numad (Ubuntu): | |
assignee: | bugproxy (bugproxy) → nobody |
The crashing call seems to be a malloc of only 16 bytes, so this is clearly not an out-of-memory condition.
Needs to be analyzed deeper ...
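As a starting point for repeated reproduction attempts, the restart step can be wrapped in a small helper that reports whether the service stayed up. This is only a sketch: `restart_and_check`, `SYSTEMCTL` and `SETTLE_SECONDS` are hypothetical names introduced here, and the two-second settle time is an assumption, not something measured on the affected system.

```shell
# Hypothetical helper: restart a unit and report whether it is still active.
# SYSTEMCTL may be overridden (e.g. for a dry run); it defaults to systemctl.
SYSTEMCTL=${SYSTEMCTL:-systemctl}

restart_and_check() {
    unit=$1
    # Ask the service manager to restart the unit.
    "$SYSTEMCTL" restart "$unit" || return 1
    # Give the daemon a moment to either settle or crash (assumed delay).
    sleep "${SETTLE_SECONDS:-2}"
    if "$SYSTEMCTL" is-active --quiet "$unit"; then
        echo "$unit: still running after restart"
    else
        echo "$unit: not running after restart"
        return 1
    fi
}
```

With the guest running, calling `restart_and_check numad` as root should print the "not running" line and return non-zero whenever the crash described above is hit.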