Bug Description:
Encountered a memory leak with corosync on all three nodes in a cluster:
Jun 13 20:36:35 XXXXXXXXX1 kernel: [929808.525991] Out of memory: Kill process 4846 (corosync) score 941 or sacrifice child
Jun 13 20:36:35 XXXXXXXXX1 kernel: [929808.620411] Killed process 4846 (corosync) total-vm:267928256kB, anon-rss:257475632kB, file-rss:37816kB
Jun 29 02:26:17 XXXXXXXXX1 kernel: [2247790.069557] Out of memory: Kill process 27791 (corosync) score 938 or sacrifice child
Jun 29 02:26:17 XXXXXXXXX1 kernel: [2247790.166524] Killed process 27791 (corosync) total-vm:265216168kB, anon-rss:255941644kB, file-rss:28580kB
Jun 14 14:00:03 XXXXXXXXX2 kernel: [993027.615377] Out of memory: Kill process 5167 (corosync) score 943 or sacrifice child
Jun 14 14:00:03 XXXXXXXXX2 kernel: [993027.709419] Killed process 5167 (corosync) total-vm:265023016kB, anon-rss:256668244kB, file-rss:33844kB
Jun 28 22:56:30 XXXXXXXXX2 kernel: [2235753.617203] Out of memory: Kill process 27073 (corosync) score 941 or sacrifice child
Jun 28 22:56:30 XXXXXXXXX2 kernel: [2235753.713521] Killed process 27073 (corosync) total-vm:261875792kB, anon-rss:255939160kB, file-rss:24760kB
Mar 21 22:19:17 XXXXXXXXX2 kernel: [956727.096937] Out of memory: Kill process 5422 (corosync) score 942 or sacrifice child
Mar 21 22:19:17 XXXXXXXXX2 kernel: [956727.191025] Killed process 5422 (corosync) total-vm:264643868kB, anon-rss:256189360kB, file-rss:33976kB
Apr 26 00:30:04 XXXXXXXXX2 kernel: [1017203.359940] Out of memory: Kill process 5183 (corosync) score 927 or sacrifice child
Apr 26 00:30:04 XXXXXXXXX2 kernel: [1017203.455015] Killed process 5183 (corosync) total-vm:271136904kB, anon-rss:251953372kB, file-rss:33760kB
Jun 29 09:00:02 XXXXXXXXX3 kernel: [2276334.347836] Out of memory: Kill process 24183 (corosync) score 937 or sacrifice child
Jun 29 09:00:02 XXXXXXXXX3 kernel: [2276334.444000] Killed process 24183 (corosync) total-vm:270476488kB, anon-rss:255257476kB, file-rss:32248kB
Mar 22 04:58:18 XXXXXXXXX3 kernel: [979377.041372] Out of memory: Kill process 5088 (corosync) score 941 or sacrifice child
Mar 22 04:58:18 XXXXXXXXX3 kernel: [979377.135414] Killed process 5088 (corosync) total-vm:265582012kB, anon-rss:255851792kB, file-rss:36000kB
Apr 26 09:26:02 XXXXXXXXX3 kernel: [1014911.175029] Out of memory: Kill process 5255 (corosync) score 925 or sacrifice child
Apr 26 09:26:02 XXXXXXXXX3 kernel: [1014911.270203] Killed process 5255 (corosync) total-vm:269154272kB, anon-rss:251736288kB, file-rss:35740kB
Jun 13 22:46:23 XXXXXXXXX3 kernel: [942502.987771] Out of memory: Kill process 5230 (corosync) score 940 or sacrifice child
Jun 13 22:46:23 XXXXXXXXX3 kernel: [942503.081826] Killed process 5230 (corosync) total-vm:265560916kB, anon-rss:256339740kB, file-rss:33788kB
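For a sense of scale, the anon-rss figures in the kill records above can be converted to GiB. The following is a minimal sketch, not part of the original report: the function name `parse_killed_line` and the regular expression are illustrative assumptions based only on the line format shown in this log, and the kernel's "kB" is treated as 1024-byte units.

```python
import re

# Matches the "Killed process" kernel line format shown in the log above.
# The pattern is an assumption derived from these samples only.
KILLED_RE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\] Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB, "
    r"file-rss:(?P<file_rss>\d+)kB"
)

KIB_PER_GIB = 1024 * 1024
SECONDS_PER_DAY = 86400


def parse_killed_line(line):
    """Return selected fields from one 'Killed process' line, or None."""
    match = KILLED_RE.search(line)
    if match is None:
        return None
    return {
        # The bracketed kernel timestamp is seconds since boot.
        "uptime_days": float(match.group("uptime")) / SECONDS_PER_DAY,
        "pid": int(match.group("pid")),
        "name": match.group("name"),
        "anon_rss_gib": int(match.group("anon_rss")) / KIB_PER_GIB,
    }


sample = (
    "Jun 13 20:36:35 XXXXXXXXX1 kernel: [929808.620411] Killed process 4846 "
    "(corosync) total-vm:267928256kB, anon-rss:257475632kB, file-rss:37816kB"
)
record = parse_killed_line(sample)
print(
    f"{record['name']} pid {record['pid']}: "
    f"{record['anon_rss_gib']:.1f} GiB anon-rss "
    f"after {record['uptime_days']:.1f} days of uptime"
)
```

Running this over every "Killed process" line gives the resident size corosync had reached at each kill, which makes the events easier to compare across nodes.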
Analysis of atop logs confirmed the leak: memory utilization by corosync grew from 47% to 97% over the course of several days, after which the kernel OOM killer terminated the process.
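The same growth can be tracked without atop by sampling the process's resident set size from /proc. The sketch below is an illustration under stated assumptions, not the method used in this report: the helper names (`parse_kib_field`, `rss_percent`, `sample_pid`) are hypothetical, and it relies only on the standard Linux `VmRSS` field in `/proc/<pid>/status` and the `MemTotal` field in `/proc/meminfo`.

```python
import re


def parse_kib_field(text, field):
    """Extract a '<field>: <n> kB' value (integer KiB) from /proc-style text."""
    match = re.search(rf"^{re.escape(field)}:\s+(\d+)\s+kB", text, re.MULTILINE)
    if match is None:
        raise ValueError(f"{field} not found")
    return int(match.group(1))


def rss_percent(status_text, meminfo_text):
    """Resident set size of a process as a percentage of total RAM."""
    rss_kib = parse_kib_field(status_text, "VmRSS")
    total_kib = parse_kib_field(meminfo_text, "MemTotal")
    return 100.0 * rss_kib / total_kib


def sample_pid(pid):
    """Read live values for one PID (Linux only)."""
    with open(f"/proc/{pid}/status") as status, open("/proc/meminfo") as meminfo:
        return rss_percent(status.read(), meminfo.read())
```

Calling `sample_pid` periodically with the corosync PID (for example from cron) and logging the result with a timestamp would give a growth curve to attach to the bug.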
There are many memory leaks identified for the current version of corosync in MOS6.1:
# dpkg -l | grep corosync
ii corosync 2.3.4-0u~u14.04+mos1 amd64 Standards-based cluster framework (daemon and modules)
ii libcorosync-common4 2.3.4-0u~u14.04+mos1 amd64 Standards-based cluster framework, common library
Steps to reproduce:
Unsure how to reproduce at this point, as logging is not detailed enough.
Expected results:
Impact:
corosync has crashed relatively frequently on all three nodes, however
Environment description:
- Operating system: Ubuntu 14.04.2 LTS - 3.13.0-61-generic
- Versions of components:
# dpkg -l | egrep 'corosync|pacemaker'
ii corosync 2.3.4-0u~u14.04+mos1 amd64 Standards-based cluster framework (daemon and modules)
ii crmsh 2.1.0-1~u14.04+mos1 all CRM shell for the pacemaker cluster manager
ii libcorosync-common4 2.3.4-0u~u14.04+mos1 amd64 Standards-based cluster framework, common library
ii pacemaker 1.1.12-0u~u14.04+mos6.1 amd64 HA cluster resource manager
ii pacemaker-cli-utils 1.1.12-0u~u14.04+mos6.1 amd64 Command line interface utilities for Pacemaker
# uname -r
3.13.0-61-generic
- Reference architecture:
MOS6.1 - unable to provide more information due to restrictions, but at scale
- Network model:
Neutron+GRE+vlan
- Related projects installed:
N/A