Active contrail-device-manager started flipping between initializing and active for a span of time.

Bug #1724132 reported by Sandeep Sridhar
This bug affects 1 person
Affects             Status          Importance  Assigned to      Milestone
Juniper Openstack   Status tracked in Trunk
  R2.21.x           Fix Committed   Undecided   Sandeep Sridhar
  R3.1              Fix Committed   Undecided   Sandeep Sridhar
  R3.1.1.x          Fix Committed   Undecided   Sandeep Sridhar
  R3.2              Fix Committed   Undecided   Sandeep Sridhar
  R4.0              Fix Committed   Undecided   Sandeep Sridhar
  R4.1              Fix Committed   Undecided   Sandeep Sridhar
  Trunk             Fix Committed   Undecided   Sandeep Sridhar

Bug Description

The customer has three config/control nodes: kw1np-coct0001n, kw1np-coct0002n and kw1np-coct0003n. coct0003n was running the active contrail-device-manager when this issue occurred; the other two nodes reported their status as backup, as expected. On coct0003n, the device-manager status was shown as initializing, yet neither coct0001n nor coct0002n assumed mastership. The customer mentioned that the issue healed on its own. The issue lasted from 2017/09/23 18:10 (UTC) to 2017/09/24 20:00:00 (UTC). The Sep 23 logs for the problematic node have rolled over, so the earliest contrail-device-manager log we have for coct0003n is from Sep 24. It is filled with the following messages:

--------------------------------------------
09/24/2017 08:01:27 PM [DeviceManager]: RabbitMQ connection down
09/24/2017 08:01:27 PM [DeviceManager]: RabbitMQ connection ESTABLISHED <Connection: amqp://guest@10.3.135.126:5673// at 0x7fab7de4c650>
09/24/2017 08:01:27 PM [DeviceManager]: Error in rabbitmq drainer greenlet: Queue.declare: (404) NOT_FOUND - queue 'device_manager.kw1np-coct0003n' in vhost '/' has crashed and failed to restart
---------------------------------------------
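
To gauge how often the connection was cycling, the messages in the excerpt above can be tallied per minute. The following is only a minimal sketch: the function names and event labels (`classify`, `summarize`, `down`, `established`, `queue_crashed`) are made up for illustration, and the timestamp format is assumed to match the lines shown in this report.

```python
import re
from collections import Counter
from datetime import datetime

# Matches lines shaped like the excerpt in this report:
# "09/24/2017 08:01:27 PM [DeviceManager]: <message>"
LINE_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2} [AP]M) \[DeviceManager\]: (.*)$"
)


def classify(message):
    """Map a log message to a coarse event label (labels are illustrative)."""
    if message.startswith("RabbitMQ connection down"):
        return "down"
    if message.startswith("RabbitMQ connection ESTABLISHED"):
        return "established"
    # Match on the stable part of the error so wrapped lines still count.
    if "NOT_FOUND" in message and "has crashed" in message:
        return "queue_crashed"
    return None


def summarize(lines):
    """Count events per (minute, label) so reconnect storms stand out."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # continuation lines, tracebacks, and so on
        label = classify(match.group(2))
        if label is None:
            continue
        stamp = datetime.strptime(match.group(1), "%m/%d/%Y %I:%M:%S %p")
        counts[(stamp.strftime("%Y-%m-%d %H:%M"), label)] += 1
    return counts


sample = [
    "09/24/2017 08:01:27 PM [DeviceManager]: RabbitMQ connection down",
    "09/24/2017 08:01:27 PM [DeviceManager]: RabbitMQ connection ESTABLISHED "
    "<Connection: amqp://guest@10.3.135.126:5673// at 0x7fab7de4c650>",
    "09/24/2017 08:01:27 PM [DeviceManager]: Error in rabbitmq drainer greenlet: "
    "Queue.declare: (404) NOT_FOUND - queue 'device_manager.kw1np-coct0003n' "
    "in vhost '/' has crashed and failed to restart",
]

for (minute, label), n in sorted(summarize(sample).items()):
    print("%s %s %d" % (minute, label, n))
```

Running this against the full contrail-device-manager log (for example by passing an open file object to `summarize`) should show whether the down/established/crashed triplet repeats many times within the same minute.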

All logs can be found here:

[root@LocalStorage 2017-0924-0252]# pwd
/home/ssandeep/2017-0924-0252
[root@LocalStorage 2017-0924-0252]# ls -lrt
drwxr-xr-x. 2 root root 4096 Sep 26 08:27 kw1np-coct0002n
drwxr-xr-x. 2 root root 4096 Sep 26 09:59 kw1np-coct0001n
drwxr-xr-x. 5 root root 4096 Sep 26 10:05 kw1np-coct0003n

Can you please take a look to see if you can find something?

Revision history for this message
Sandeep Sridhar (ssandeep) wrote :

Suresh's analysis here:

Just to summarize based on logs, I see some issues. I see there was a RabbitMQ connection failure:

[3:15]
Traceback (most recent call last):
  File "/usr/bin/contrail-device-manager", line 9, in <module>
    load_entry_point('device-manager==0.1dev', 'console_scripts', 'contrail-device-manager')()
  File "/usr/lib/python2.7/dist-packages/device_manager/device_manager.py", line 538, in server_main
    main()
  File "/usr/lib/python2.7/dist-packages/device_manager/device_manager.py", line 527, in main
    args)
  File "/usr/lib/python2.7/dist-packages/cfgm_common/zkclient.py", line 293, in master_election
    self._election.run(self._zk_election_callback, func, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/kazoo/recipe/election.py", line 48, in run
    func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/cfgm_common/zkclient.py", line 285, in _zk_election_callback
    func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/device_manager/device_manager.py", line 532, in run_device_manager
    device_manager = DeviceManager(args)
  File "/usr/lib/python2.7/dist-packages/device_manager/device_manager.py", line 175, in __init__
    self.config_log)
  File "/usr/lib/python2.7/dist-packages/cfgm_common/vnc_kombu.py", line 218, in __init__
    self._start()
  File "/usr/lib/python2.7/dist-packages/cfgm_common/vnc_kombu.py", line 143, in _start
    self._reconnect(delete_old_q=True)
  File "/usr/lib/python2.7/dist-packages/cfgm_common/vnc_kombu.py", line 99, in _reconnect
    callbacks=[self._subscribe])
  File "/usr/lib/python2.7/dist-packages/kombu/messaging.py", line 357, in __init__
    self.revive(self.channel)
  File "/usr/lib/python2.7/dist-packages/kombu/messaging.py", line 369, in revive
    self.declare()
  File "/usr/lib/python2.7/dist-packages/kombu/messaging.py", line 379, in declare
    queue.declare()
  File "/usr/lib/python2.7/dist-packages/kombu/entity.py", line 504, in declare
    self.exchange.declare(nowait)
  File "/usr/lib/python2.7/dist-packages/kombu/entity.py", line 166, in declare
    nowait=nowait, passive=passive,
  File "/usr/lib/python2.7/dist-packages/amqp/channel.py", line 620, in exchange_declare
    (40, 11), # Channel.exchange_declare_ok
  File "/usr/lib/python2.7/dist-packages/amqp/abstract_channel.py", line 67, in wait
    self.channel_id, allowed_methods)
  File "/usr/lib/python2.7/dist-packages/amqp/connection.py", line 237, in _wait_method
    self.method_reader.read_method()
  File "/usr/lib/python2.7/dist-packages/amqp/method_framing.py", line 189, in read_method
    raise m
IOError: Socket closed

[3:17]
And there is an exception while draining events from Rabbit MQ:

[3:17]
09/24/2017 08:00:49 PM [DeviceManager]: RabbitMQ connection ESTABLISHED <Connection: amqp://guest@10.3.135.126:5673// at 0x7fab7de4c650>
09/24/2017 08:00:49 PM [DeviceManager]: Error in rabbitmq drainer greenlet: Queue.declare: (404) NOT_FOUND - queue 'device_manager.kw1np-coct0003n' in vhost '/' has crashed and failed to restart
09/24/2017 08:00:49 PM [DeviceManager]: RabbitMQ connection down
09/24/2017 08:00:49 PM [DeviceManager]: RabbitMQ connection ESTABLISHED <Connection: ...


Revision history for this message
Sandeep Sridhar (ssandeep) wrote :

Hi Suresh,

Please use this bug to change our provisioning scripts to pick up the latest RabbitMQ version (for future Contrail releases).

I checked R3.1.3-85, which the customer will eventually move to, and even there RabbitMQ is at 3.5:

root@contrail60:~# contrail-version
Package Version Build-ID | Repo | Package Name
-------------------------------------- ------------------------------ ----------------------------------
contrail-analytics 3.1.3.0-85 85
contrail-config 3.1.3.0-85 85
contrail-config-openstack 3.1.3.0-85 85
contrail-control 3.1.3.0-85 85
contrail-database-common 3.1.3.0-85 85
^C
root@contrail60:~# rabbitmqctl status
Status of node 'rabbit@contrail60-ctrl' ...
[{pid,2148},
 {running_applications,[{rabbit,"RabbitMQ","3.5.0"},

Greetings,
Sandeep.
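
The version check above can be scripted when many nodes need to be audited. This is a rough sketch that assumes `rabbitmqctl status` prints the `{rabbit,"RabbitMQ","<version>"}` entry in the form shown in this report (other RabbitMQ releases may format it differently). The helper names `rabbit_version` and `is_older_than` are hypothetical, and the threshold 3.6.14 is simply the version that the upgrade commits recorded in this report move to.

```python
import re

# Pattern for the RabbitMQ entry in the running_applications list, e.g.
# {rabbit,"RabbitMQ","3.5.0"}
RABBIT_VERSION_RE = re.compile(r'\{rabbit,"RabbitMQ","(\d+(?:\.\d+)*)"\}')


def rabbit_version(status_text):
    """Return the broker version as a tuple of ints, or None if absent."""
    match = RABBIT_VERSION_RE.search(status_text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def is_older_than(version, minimum):
    """Compare version tuples numerically, so (3, 5, 0) < (3, 6, 14)."""
    return version < minimum


# Text copied from the rabbitmqctl output quoted in this report.
status = """Status of node 'rabbit@contrail60-ctrl' ...
[{pid,2148},
 {running_applications,[{rabbit,"RabbitMQ","3.5.0"},
"""

found = rabbit_version(status)
print(found, is_older_than(found, (3, 6, 14)))
```

Comparing tuples of integers rather than raw strings avoids the pitfall where "3.10.0" would sort before "3.5.0" lexicographically.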

information type: Proprietary → Public
Revision history for this message
Suresh Balineni (sbalineni) wrote :

@ignatious: We will need to upgrade to the latest RabbitMQ; the customer is hitting the issue reported in https://github.com/rabbitmq/rabbitmq-server/issues/714.

Changed in juniperopenstack:
assignee: nobody → Ignatious Johnson Christopher (ijohnson-x)
Changed in juniperopenstack:
assignee: Ignatious Johnson Christopher (ijohnson-x) → Sachin Bansal (sbansal)
Sachin Bansal (sbansal)
Changed in juniperopenstack:
assignee: Sachin Bansal (sbansal) → Nagendra Prasath (npchandran)
Revision history for this message
Nagendra Prasath (npchandran) wrote :

Could you please provide a list of steps to reproduce this failure?

There is no 3.6.x version for Xenial/Trusty, so I created an upstream bug: https://bugs.launchpad.net/ubuntu/+source/rabbitmq-server/+bug/1729671

If we can justify it with proper reproduction steps, we can request an update.

Changed in juniperopenstack:
status: New → Incomplete
Sachin Bansal (sbansal)
Changed in juniperopenstack:
assignee: Nagendra Prasath (npchandran) → Sandeep Sridhar (ssandeep)
Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.2

Review in progress for https://review.opencontrail.org/38212
Submitter: Nagendra Prasath (<email address hidden>)

Jeba Paulaiyan (jebap)
tags: added: config device-manager
description: updated
Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/38212
Committed: http://github.com/Juniper/contrail-packaging/commit/b41ae74b0463c82009c0fb934dc47bbf6a206aae
Submitter: Zuul (<email address hidden>)
Branch: R3.2

commit b41ae74b0463c82009c0fb934dc47bbf6a206aae
Author: Nagendra Maynattamai <email address hidden>
Date: Fri Dec 8 00:36:34 2017 -0800

Upgrade rabbitmq-server from 3.5.0-1 to 3.6.14-1
Closes-Bug: 1724132

Change-Id: Ia5225960f2c185cf11d11083083c216b9326dbaa

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.21.x

Review in progress for https://review.opencontrail.org/38533
Submitter: Nagendra Prasath (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.1.1.x

Review in progress for https://review.opencontrail.org/38546
Submitter: Nagendra Prasath (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/38533
Committed: http://github.com/Juniper/contrail-packaging/commit/1c3abb7aeaf6689f8a8735aab391cea32678cbb7
Submitter: Zuul (<email address hidden>)
Branch: R2.21.x

commit 1c3abb7aeaf6689f8a8735aab391cea32678cbb7
Author: Nagendra Maynattamai <email address hidden>
Date: Fri Dec 22 11:31:13 2017 -0800

Upgrade rabbitmq-server from 3.5.0-1 to 3.6.14-1
Closes-Bug: 1724132

Change-Id: I059f9fd82f0bc7051cb94dc13a59855462bd39a5

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/38546
Committed: http://github.com/Juniper/contrail-packaging/commit/f7a6676aa572274546f0f55f3e9a0479e9ae0003
Submitter: Zuul (<email address hidden>)
Branch: R3.1.1.x

commit f7a6676aa572274546f0f55f3e9a0479e9ae0003
Author: Nagendra Maynattamai <email address hidden>
Date: Fri Dec 22 15:19:35 2017 -0800

Upgrade rabbitmq-server from 3.5.0-1 to 3.6.14-1
Closes-Bug: 1724132

Change-Id: I08a60a21e3f52314f4a69d62c28af2b96e228401

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R4.0

Review in progress for https://review.opencontrail.org/38643
Submitter: Nagendra Prasath (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R4.1

Review in progress for https://review.opencontrail.org/38644
Submitter: Nagendra Prasath (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/38645
Submitter: Nagendra Prasath (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/38643
Committed: http://github.com/Juniper/contrail-packaging/commit/ba5aefb77c0e40c459588571a35e6459ea398bba
Submitter: Zuul (<email address hidden>)
Branch: R4.0

commit ba5aefb77c0e40c459588571a35e6459ea398bba
Author: Nagendra Maynattamai <email address hidden>
Date: Fri Dec 8 00:36:34 2017 -0800

Upgrade rabbitmq-server from 3.5.0-1 to 3.6.14-1
Closes-Bug: 1724132

Change-Id: Ia5225960f2c185cf11d11083083c216b9326dbaa
(cherry picked from commit b41ae74b0463c82009c0fb934dc47bbf6a206aae)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/38644
Committed: http://github.com/Juniper/contrail-packaging/commit/8612d6c1b1611cbb3c6fa10cdbe4a10a4485d439
Submitter: Zuul (<email address hidden>)
Branch: R4.1

commit 8612d6c1b1611cbb3c6fa10cdbe4a10a4485d439
Author: Nagendra Maynattamai <email address hidden>
Date: Fri Dec 8 00:36:34 2017 -0800

Upgrade rabbitmq-server from 3.5.0-1 to 3.6.14-1
Closes-Bug: 1724132

Change-Id: Ia5225960f2c185cf11d11083083c216b9326dbaa
(cherry picked from commit b41ae74b0463c82009c0fb934dc47bbf6a206aae)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/38645
Committed: http://github.com/Juniper/contrail-packaging/commit/49f2347c6ab196969a45faaebed216abc3a55e0c
Submitter: Zuul (<email address hidden>)
Branch: master

commit 49f2347c6ab196969a45faaebed216abc3a55e0c
Author: Nagendra Maynattamai <email address hidden>
Date: Fri Dec 8 00:36:34 2017 -0800

Upgrade rabbitmq-server from 3.5.0-1 to 3.6.14-1
Closes-Bug: 1724132

Change-Id: Ia5225960f2c185cf11d11083083c216b9326dbaa
(cherry picked from commit b41ae74b0463c82009c0fb934dc47bbf6a206aae)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.1

Review in progress for https://review.opencontrail.org/38814
Submitter: Nagendra Prasath (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/38814
Committed: http://github.com/Juniper/contrail-packaging/commit/fda17ee85ac423bcaefda988e680b6aae280e57f
Submitter: Zuul (<email address hidden>)
Branch: R3.1

commit fda17ee85ac423bcaefda988e680b6aae280e57f
Author: Nagendra Maynattamai <email address hidden>
Date: Fri Dec 22 15:19:35 2017 -0800

Upgrade rabbitmq-server from 3.5.0-1 to 3.6.14-1
Closes-Bug: 1724132

Change-Id: I08a60a21e3f52314f4a69d62c28af2b96e228401
(cherry picked from commit f7a6676aa572274546f0f55f3e9a0479e9ae0003)
