I tried to reproduce this on x86 with a 128G guest and 64 CPUs.
I see numad action:
Thu Jul 18 10:51:22 2019: Advising pid 13197 (qemu-system-x86) move from nodes (0-1) to nodes (1)
Thu Jul 18 10:51:23 2019: PID 13197 moved to node(s) 1 in 0.19 seconds
Running stressapptest [1] on the host and in the guest for a while triggered more of these moves, without crashes (as expected).
Restarting numad did not break it on this system.
After the shutdown and re-registration, numad seems to re-evaluate placement and then carry on as usual:
Thu Jul 18 11:00:54 2019: Shutting down numad
Thu Jul 18 11:00:54 2019: Registering numad version 20150602 PID 15629
Thu Jul 18 11:01:01 2019: Advising pid 15500 (stressapptest) move from nodes (0-1) to nodes (0-1)
Thu Jul 18 11:01:01 2019: PID 15500 moved to node(s) 0-1 in 0.0 seconds
Thu Jul 18 11:01:06 2019: Advising pid 13197 (qemu-system-x86) move from nodes (0-1) to nodes (0-1)
Thu Jul 18 11:01:06 2019: PID 13197 moved to node(s) 0-1 in 0.0 seconds
So the assumption for now is that this is either ppc64el-specific or even specific to our particular P9 machine (dradis).
Lowering importance as it seems not to be a general issue.
I'll ping Frank to ask whether he wants to reverse-mirror this to IBM.
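For anyone comparing logs between the x86 box and the P9, here is a rough sketch of how the "Advising pid" lines quoted above could be picked apart to tell real moves from no-op ones (source and target node sets identical). The regex is only derived from the lines pasted in this comment, and `parse_advice` is a made-up helper name, not part of numad; other numad versions may log a different format.

```python
import re

# Pattern derived from the log lines quoted in this comment, e.g.:
#   Thu Jul 18 10:51:22 2019: Advising pid 13197 (qemu-system-x86) move from nodes (0-1) to nodes (1)
# Assumption: other numad versions may word this differently.
ADVICE_RE = re.compile(
    r"Advising pid (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"move from nodes \((?P<src>[^)]*)\) to nodes \((?P<dst>[^)]*)\)"
)


def parse_advice(lines):
    """Return (pid, name, src, dst, is_noop) for every 'Advising pid' line.

    is_noop is True when the source and target node sets are textually equal.
    Lines that do not match the pattern are skipped.
    """
    result = []
    for line in lines:
        m = ADVICE_RE.search(line)
        if m:
            src, dst = m.group("src"), m.group("dst")
            result.append((int(m.group("pid")), m.group("name"), src, dst, src == dst))
    return result


if __name__ == "__main__":
    sample = [
        "Thu Jul 18 10:51:22 2019: Advising pid 13197 (qemu-system-x86) move from nodes (0-1) to nodes (1)",
        "Thu Jul 18 11:00:54 2019: Shutting down numad",
        "Thu Jul 18 11:01:06 2019: Advising pid 13197 (qemu-system-x86) move from nodes (0-1) to nodes (0-1)",
    ]
    for entry in parse_advice(sample):
        print(entry)
```

On the excerpts here, the 10:51 advice for the qemu process is a real move (0-1 to 1), while the ones after the restart are no-ops (0-1 to 0-1).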
[1]: https://github.com/stressapptest/stressapptest/releases