Active contrail-device-manager started flipping between initializing and active for a span of time.
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Juniper Openstack | Status tracked in Trunk | | |
R2.21.x | Fix Committed | Undecided | Sandeep Sridhar |
R3.1 | Fix Committed | Undecided | Sandeep Sridhar |
R3.1.1.x | Fix Committed | Undecided | Sandeep Sridhar |
R3.2 | Fix Committed | Undecided | Sandeep Sridhar |
R4.0 | Fix Committed | Undecided | Sandeep Sridhar |
R4.1 | Fix Committed | Undecided | Sandeep Sridhar |
Trunk | Fix Committed | Undecided | Sandeep Sridhar |
Bug Description
The customer has three config/control nodes: kw1np-coct0001n, kw1np-coct0002n, and kw1np-coct0003n. coct0003n was the node running the active contrail-
-------
09/24/2017 08:01:27 PM [DeviceManager]: RabbitMQ connection down
09/24/2017 08:01:27 PM [DeviceManager]: RabbitMQ connection ESTABLISHED <Connection: amqp://
09/24/2017 08:01:27 PM [DeviceManager]: Error in rabbitmq drainer greenlet: Queue.declare: (404) NOT_FOUND - queue ‘device_
estart
-------
All logs can be found here:
[root@LocalStorage 2017-0924-0252]# pwd
/home/ssandeep/
[root@LocalStorage 2017-0924-0252]# ls -lrt
drwxr-xr-x. 2 root root 4096 Sep 26 08:27 kw1np-coct0002n
drwxr-xr-x. 2 root root 4096 Sep 26 09:59 kw1np-coct0001n
drwxr-xr-x. 5 root root 4096 Sep 26 10:05 kw1np-coct0003n
Can you please take a look to see if you can find something?
Changed in juniperopenstack:
assignee: Ignatious Johnson Christopher (ijohnson-x) → Sachin Bansal (sbansal)
Changed in juniperopenstack:
assignee: Sachin Bansal (sbansal) → Nagendra Prasath (npchandran)
Changed in juniperopenstack:
assignee: Nagendra Prasath (npchandran) → Sandeep Sridhar (ssandeep)
tags: added: config device-manager
description: updated
Suresh's analysis here:
Just to summarize based on logs, I see some issues. I see there was a RabbitMQ connection failure:
[3:15]
Traceback (most recent call last):
  File "/usr/bin/contrail-device-manager", line 9, in <module>
    load_entry_point('device-manager==0.1dev', 'console_scripts', 'contrail-device-manager')()
  File "/usr/lib/python2.7/dist-packages/device_manager/device_manager.py", line 538, in server_main
    main()
  File "/usr/lib/python2.7/dist-packages/device_manager/device_manager.py", line 527, in main
    args)
  File "/usr/lib/python2.7/dist-packages/cfgm_common/zkclient.py", line 293, in master_election
    self._election.run(self._zk_election_callback, func, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/kazoo/recipe/election.py", line 48, in run
    func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/cfgm_common/zkclient.py", line 285, in _zk_election_callback
    func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/device_manager/device_manager.py", line 532, in run_device_manager
    device_manager = DeviceManager(args)
  File "/usr/lib/python2.7/dist-packages/device_manager/device_manager.py", line 175, in __init__
    self.config_log)
  File "/usr/lib/python2.7/dist-packages/cfgm_common/vnc_kombu.py", line 218, in __init__
    self._start()
  File "/usr/lib/python2.7/dist-packages/cfgm_common/vnc_kombu.py", line 143, in _start
    self._reconnect(delete_old_q=True)
  File "/usr/lib/python2.7/dist-packages/cfgm_common/vnc_kombu.py", line 99, in _reconnect
    callbacks=[self._subscribe])
  File "/usr/lib/python2.7/dist-packages/kombu/messaging.py", line 357, in __init__
    self.revive(self.channel)
  File "/usr/lib/python2.7/dist-packages/kombu/messaging.py", line 369, in revive
    self.declare()
  File "/usr/lib/python2.7/dist-packages/kombu/messaging.py", line 379, in declare
    queue.declare()
  File "/usr/lib/python2.7/dist-packages/kombu/entity.py", line 504, in declare
    self.exchange.declare(nowait)
  File "/usr/lib/python2.7/dist-packages/kombu/entity.py", line 166, in declare
    nowait=nowait, passive=passive,
  File "/usr/lib/python2.7/dist-packages/amqp/channel.py", line 620, in exchange_declare
    (40, 11), # Channel.exchange_declare_ok
  File "/usr/lib/python2.7/dist-packages/amqp/abstract_channel.py", line 67, in wait
    self.channel_id, allowed_methods)
  File "/usr/lib/python2.7/dist-packages/amqp/connection.py", line 237, in _wait_method
    self.method_reader.read_method()
  File "/usr/lib/python2.7/dist-packages/amqp/method_framing.py", line 189, in read_method
    raise m
IOError: Socket closed
[3:17]
And there is an exception while draining events from Rabbit MQ:
[3:17]
09/24/2017 08:00:49 PM [DeviceManager]: RabbitMQ connection ESTABLISHED <Connection: amqp://guest@10.3.135.126:5673// at 0x7fab7de4c650>
09/24/2017 08:00:49 PM [DeviceManager]: Error in rabbitmq drainer greenlet: Queue.declare: (404) NOT_FOUND - queue 'device_manager.kw1np-coct0003n' in vhost '/' has crashed and failed to restart
09/24/2017 08:00:49 PM [DeviceManager]: RabbitMQ connection down
09/24/2017 08:00:49 PM [DeviceManager]: RabbitMQ connection ESTABLISHED <Connection: ...
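To gauge how often the connection flapped, the down / ESTABLISHED / NOT_FOUND messages in a log like the excerpts above can be tallied per minute. The following is only a rough sketch: it assumes every line follows the `MM/DD/YYYY HH:MM:SS AM/PM [Module]: message` layout seen in the excerpts, and the names `LINE_RE`, `classify`, and `summarize_flaps` are hypothetical helpers, not part of contrail-device-manager.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed line layout, taken from the excerpts in this report, e.g.:
# 09/24/2017 08:00:49 PM [DeviceManager]: RabbitMQ connection down
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2} [AP]M) "
    r"\[(?P<module>[^\]]+)\]: (?P<msg>.*)$"
)


def classify(msg):
    """Map a log message to an event kind, or None if it is not of interest."""
    if msg.startswith("RabbitMQ connection down"):
        return "down"
    if msg.startswith("RabbitMQ connection ESTABLISHED"):
        return "established"
    if "Queue.declare: (404) NOT_FOUND" in msg:
        return "queue_not_found"
    return None


def summarize_flaps(lines):
    """Count events per (minute, kind) over an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip traceback lines and anything in another format
        kind = classify(match.group("msg"))
        if kind is None:
            continue
        ts = datetime.strptime(match.group("ts"), "%m/%d/%Y %I:%M:%S %p")
        counts[(ts.replace(second=0), kind)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "09/24/2017 08:00:49 PM [DeviceManager]: RabbitMQ connection down",
        "09/24/2017 08:00:49 PM [DeviceManager]: RabbitMQ connection ESTABLISHED <Connection: ...",
        "09/24/2017 08:01:27 PM [DeviceManager]: RabbitMQ connection down",
    ]
    for (minute, kind), n in sorted(summarize_flaps(sample).items()):
        print(minute.isoformat(), kind, n)
```

A minute with many down/established pairs and matching `queue_not_found` entries would be consistent with the reconnect loop described in the analysis, though the counts alone do not identify the cause.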