corosync 2.3.4 memory leak
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Mirantis OpenStack | Fix Released | High | MOS Linux | |
6.1.x | Won't Fix | High | MOS Maintenance | |
7.0.x | Fix Released | High | Sergii Rizvan | |
8.0.x | Fix Released | High | Sergii Rizvan | |
9.x | Fix Released | High | MOS Linux | |
Bug Description
Encountered a memory leak with corosync on all three nodes in a cluster:
Jun 13 20:36:35 XXXXXXXXX1 kernel: [929808.525991] Out of memory: Kill process 4846 (corosync) score 941 or sacrifice child
Jun 13 20:36:35 XXXXXXXXX1 kernel: [929808.620411] Killed process 4846 (corosync) total-vm:
Jun 29 02:26:17 XXXXXXXXX1 kernel: [2247790.069557] Out of memory: Kill process 27791 (corosync) score 938 or sacrifice child
Jun 29 02:26:17 XXXXXXXXX1 kernel: [2247790.166524] Killed process 27791 (corosync) total-vm:
Jun 14 14:00:03 XXXXXXXXX2 kernel: [993027.615377] Out of memory: Kill process 5167 (corosync) score 943 or sacrifice child
Jun 14 14:00:03 XXXXXXXXX2 kernel: [993027.709419] Killed process 5167 (corosync) total-vm:
Jun 28 22:56:30 XXXXXXXXX2 kernel: [2235753.617203] Out of memory: Kill process 27073 (corosync) score 941 or sacrifice child
Jun 28 22:56:30 XXXXXXXXX2 kernel: [2235753.713521] Killed process 27073 (corosync) total-vm:
Mar 21 22:19:17 XXXXXXXXX2 kernel: [956727.096937] Out of memory: Kill process 5422 (corosync) score 942 or sacrifice child
Mar 21 22:19:17 XXXXXXXXX2 kernel: [956727.191025] Killed process 5422 (corosync) total-vm:
Apr 26 00:30:04 XXXXXXXXX2 kernel: [1017203.359940] Out of memory: Kill process 5183 (corosync) score 927 or sacrifice child
Apr 26 00:30:04 XXXXXXXXX2 kernel: [1017203.455015] Killed process 5183 (corosync) total-vm:
Jun 29 09:00:02 XXXXXXXXX3 kernel: [2276334.347836] Out of memory: Kill process 24183 (corosync) score 937 or sacrifice child
Jun 29 09:00:02 XXXXXXXXX3 kernel: [2276334.444000] Killed process 24183 (corosync) total-vm:
Mar 22 04:58:18 XXXXXXXXX3 kernel: [979377.041372] Out of memory: Kill process 5088 (corosync) score 941 or sacrifice child
Mar 22 04:58:18 XXXXXXXXX3 kernel: [979377.135414] Killed process 5088 (corosync) total-vm:
Apr 26 09:26:02 XXXXXXXXX3 kernel: [1014911.175029] Out of memory: Kill process 5255 (corosync) score 925 or sacrifice child
Apr 26 09:26:02 XXXXXXXXX3 kernel: [1014911.270203] Killed process 5255 (corosync) total-vm:
Jun 13 22:46:23 XXXXXXXXX3 kernel: [942502.987771] Out of memory: Kill process 5230 (corosync) score 940 or sacrifice child
Jun 13 22:46:23 XXXXXXXXX3 kernel: [942503.081826] Killed process 5230 (corosync) total-vm:
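To get a quick per-node overview from kernel messages like the ones above, the "Out of memory: Kill process" lines can be tallied with a short script. This is only a sketch: the function names `parse_oom_kills` and `kills_per_host` are made up for illustration, and the regular expression assumes the syslog layout shown in this report (month, day, time, hostname, `kernel:`).

```python
import re
from collections import Counter

# Matches kernel lines of the form quoted in this report:
#   "<Mon> <day> <time> <host> kernel: [...] Out of memory: Kill process <pid> (<name>) score <n> ..."
OOM_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d+)\s+(?P<time>[\d:]+)\s+(?P<host>\S+)\s+kernel:.*"
    r"Out of memory: Kill process (?P<pid>\d+) \((?P<name>[^)]+)\) score (?P<score>\d+)"
)


def parse_oom_kills(lines, process="corosync"):
    """Return one dict per OOM-killer selection of `process` found in the lines."""
    kills = []
    for line in lines:
        m = OOM_RE.search(line)
        if m and m.group("name") == process:
            kills.append({
                "host": m.group("host"),
                "when": f"{m.group('month')} {m.group('day')} {m.group('time')}",
                "pid": int(m.group("pid")),
                "score": int(m.group("score")),
            })
    return kills


def kills_per_host(kills):
    """Count OOM kills per hostname."""
    return Counter(k["host"] for k in kills)


if __name__ == "__main__":
    import sys

    # Hypothetical usage: pipe a syslog or kern.log file into the script.
    found = parse_oom_kills(sys.stdin)
    for host, count in sorted(kills_per_host(found).items()):
        print(f"{host}: {count} corosync OOM kill(s)")
```

The "Killed process" follow-up lines are deliberately ignored so that each kill is counted once.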
The memory leak was confirmed by analyzing atop logs: corosync's memory utilization grew from 47% to 97% over the course of several days before the process was killed.
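Where atop logs are not available, the same growth trend could be tracked by sampling the process's resident set size from `/proc/<pid>/status`. The sketch below is an assumption about how one might do this, not what was done in this report; the helper names are hypothetical, and it relies only on the `VmRSS` field that Linux exposes for user-space processes.

```python
import re


def rss_kib_from_status(status_text):
    """Extract VmRSS in KiB from the text of /proc/<pid>/status, or None if absent."""
    m = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(m.group(1)) if m else None


def read_rss_kib(pid):
    """Read the current resident set size (KiB) of a running process."""
    with open(f"/proc/{pid}/status") as f:
        return rss_kib_from_status(f.read())


def growth_kib_per_hour(samples):
    """Average RSS growth between the first and last (timestamp_seconds, rss_kib) samples."""
    (t0, r0), (t1, r1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time interval")
    return (r1 - r0) * 3600.0 / (t1 - t0)
```

Calling `read_rss_kib` with the corosync PID at a fixed interval (for example from cron) and feeding the collected pairs to `growth_kib_per_hour` gives a rough leak rate; a steadily positive value over several days would match the pattern described above.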
There are many memory leaks identified for the version of corosync currently shipped in MOS6.1:
# dpkg -l | grep corosync
ii corosync 2.3.4-0u~
ii libcorosync-common4 2.3.4-0u~
Steps to reproduce:
Unsure how to reproduce at this point, as logging is not detailed enough. Will enable debug when possible.
Expected results:
Impact:
corosync has crashed relatively frequently on all three nodes; it is unclear whether this has also occurred in other zones.
Environment description:
- Operating system: Ubuntu 14.04.2 LTS - 3.13.0-61-generic
- Versions of components:
# dpkg -l | egrep 'corosync|
ii corosync 2.3.4-0u~
ii crmsh 2.1.0-1~u14.04+mos1 all CRM shell for the pacemaker cluster manager
ii libcorosync-common4 2.3.4-0u~
ii pacemaker 1.1.12-
ii pacemaker-cli-utils 1.1.12-
# uname -r
3.13.0-61-generic
- Reference architecture:
MOS6.1 - unable to provide more information due to restrictions, but at scale
- Network model:
Neutron+GRE+vlan
- Related projects installed:
N/A
tags: added: customer-found
tags: added: on-verification
https://github.com/corosync/corosync/issues