Contrail config node manager and analytics node manager crashes

Bug #1747725 reported by Hemachandra Reddy
This bug affects 1 person
Affects: Juniper Openstack
Status: New
Importance: Undecided
Assigned to: mkheni
Milestone: none listed

Bug Description

Both the Contrail config node manager and analytics node manager pods have entered the CrashLoopBackOff state.
In the last 15 hours they have restarted more than 200 times.

Here is the log from the config node manager container:

ubuntu@contrail5x-vm2:~$ sudo docker logs bed4f1e9e903e860cbb0008f2c70eb672d49d12436861c1ff414f449c2e9174c
02/06/2018 05:47:55 PM [contrail-config-nodemgr]: SANDESH: CONNECT TO COLLECTOR: True
02/06/2018 05:47:55 PM [contrail-config-nodemgr]: SANDESH: Logging: LEVEL: [SYS_INFO] -> [SYS_NOTICE]
ls: cannot access /var/crashes: No such file or directory
Traceback (most recent call last):
  File "/usr/bin/contrail-nodemgr", line 9, in <module>
    load_entry_point('nodemgr==0.1dev', 'console_scripts', 'contrail-nodemgr')()
  File "/usr/lib/python2.7/site-packages/nodemgr/main.py", line 244, in main
    gevent.spawn(prog.run_periodically(prog.do_periodic_events, 60))])
  File "/usr/lib/python2.7/site-packages/nodemgr/common/event_manager.py", line 758, in run_periodically
    function(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/nodemgr/config_nodemgr/config_event_manager.py", line 75, in do_periodic_events
    self.event_tick_60()
  File "/usr/lib/python2.7/site-packages/nodemgr/common/event_manager.py", line 719, in event_tick_60
    process_mem_cpu_usage = self.get_group_processes_mem_cpu_usage(group)
  File "/usr/lib/python2.7/site-packages/nodemgr/common/event_manager.py", line 565, in get_group_processes_mem_cpu_usage
    for key in self.process_state_db[group_name]:
RuntimeError: dictionary changed size during iteration
ubuntu@contrail5x-vm2:~$
---
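
The traceback ends in `RuntimeError: dictionary changed size during iteration`, which Python raises when keys are added to or removed from a dict while a `for` loop is walking it. The following is a minimal sketch of that failure and of one common mitigation (iterating over a snapshot of the keys). It is not the nodemgr code or a confirmed fix: the dict contents, the function names `unsafe_walk` and `safe_walk`, and the idea that another greenlet modifies the dict mid-loop are all assumptions made for illustration.

```python
def unsafe_walk(db, group_name):
    """Iterate the live dict while it grows; this raises RuntimeError."""
    seen = []
    for key in db[group_name]:
        seen.append(key)
        # Assumption: stands in for some other code path (for example another
        # greenlet) adding an entry while the loop is still running.
        db[group_name]["proc-new"] = 3
    return seen


def safe_walk(db, group_name):
    """Iterate a snapshot of the keys, so later changes cannot break the loop."""
    seen = []
    for key in list(db[group_name]):
        seen.append(key)
        db[group_name]["proc-new"] = 3
    return seen


if __name__ == "__main__":
    # Hypothetical stand-in for process_state_db; the real contents are not
    # shown in the report.
    try:
        unsafe_walk({"group": {"proc-a": 1, "proc-b": 2}}, "group")
    except RuntimeError as exc:
        print("unsafe_walk failed:", exc)

    print("safe_walk visited:", safe_walk({"group": {"proc-a": 1, "proc-b": 2}}, "group"))
```

The snapshot only protects the loop itself. Keys removed after the snapshot is taken can still be missing when looked up, so code in the loop body would need to tolerate that.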

Environment: All-in-one containerized Contrail 5x, built by following https://github.com/Juniper/contrail-helm-deployer/blob/master/doc/contrail-osh-aio-install.md

The host is an Ubuntu 16.04 VM with 16 CPUs, 32 GB RAM, and 150 GB disk space. 17 GB of RAM remained free after the deployment.

Jim Reilly (jpreilly)
information type: Proprietary → Private
tags: added: att-aic-contrail
Qasim Arham (qarham-h)
information type: Private → Public
tags: added: csg
Sachin Bansal (sbansal)
Changed in juniperopenstack:
assignee: nobody → Sundaresan Rajangam (srajanga)
Changed in juniperopenstack:
assignee: Sundaresan Rajangam (srajanga) → mkheni (mkheni)
tags: added: blocker
removed: contrail5 csg kubernetes openstack