Sanity: Rabbitmq cluster does not form

Bug #1716012 reported by Pavana
Affects: Juniper Openstack (status tracked in Trunk)

Series  Status         Importance  Assigned to      Milestone
R4.0    Won't Fix      High        Siva Gurumurthy  -
R4.1    Fix Committed  High        Siva Gurumurthy  -
Trunk   Fix Committed  High        Siva Gurumurthy  -

Bug Description

Build Version: 4.0.1.0-37 Ubuntu 16.04.2 Ocata

Topology: 4 nodes with 2 computes (Contrail VMs) esxi

On a multi-node vcenter setup, the vrouter agent gets stuck at ‘initializing’ after a restart.

root@nodei36-compute-vm:~# contrail-status
== Contrail vRouter ==
contrail-vrouter-agent: initializing (No Configuration for self)
contrail-vrouter-nodemgr: active

As Manish debugged:
On restart, the agent, which was previously connected to control node i27, connects to the other control node, i32.
Since i32 has no config for the agent, the agent receives no config and shows the above state.
This can be seen on introspect: http://10.204.217.139:8083/Snh_IFMapTableShowReq?table_name=&search_string= (nodei27) has an entry for nodei36-compute-vm (the compute contrail VM where the issue is seen), while http://10.204.217.144:8083/Snh_IFMapTableShowReq?table_name=&search_string= (nodei32) does not have the config.

Pavana (pavanap)
information type: Proprietary → Public
Revision history for this message
Hari Prasad Killi (haripk) wrote :

As mentioned above, the other control node doesn't have the config. Please check why this is so.

Revision history for this message
Sachin Bansal (sbansal) wrote :

From Pramodh:

Nodei32 is misconfigured.
In rabbitmq.config you have ..
{cluster_nodes, {['rabbit@puppet','rabbit@nodei32'], disc}},{vm_memory_high_watermark, 0.4},

The rabbitMQ cluster is not forming because of this.

Regards,
Pramodh
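
A minimal sketch of how the misconfiguration above could be spotted automatically: it pulls the node names out of the cluster_nodes tuple and reports any that are not one of the expected controllers. The function names are hypothetical, and the controller hostnames nodei27 and nodei32 are assumed from the introspect labels in the bug description.

```python
import re

def cluster_nodes(config_text):
    """Return the node names listed in the cluster_nodes tuple."""
    match = re.search(r"\{cluster_nodes,\s*\{\[(.*?)\]", config_text, re.S)
    if not match:
        return []
    # Node names are single-quoted Erlang atoms, e.g. 'rabbit@nodei32'.
    return re.findall(r"'([^']+)'", match.group(1))

def unexpected_nodes(config_text, expected_hosts):
    """Return configured nodes that do not belong to an expected host."""
    expected = {"rabbit@" + host for host in expected_hosts}
    return [node for node in cluster_nodes(config_text) if node not in expected]

# Line quoted from rabbitmq.config on nodei32.
line = ("{cluster_nodes, {['rabbit@puppet','rabbit@nodei32'], disc}},"
        "{vm_memory_high_watermark, 0.4},")

# Assumed controller hostnames; adjust to the actual setup.
print(unexpected_nodes(line, ["nodei27", "nodei32"]))
```

With the line from this comment, the check flags rabbit@puppet, which is the entry that should have been the other controller's hostname.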

Revision history for this message
Sachin Bansal (sbansal) wrote :

Please let us know how this was provisioned and why rabbitmq is not provisioned correctly. Also, is there any reason to test with a 2-controller setup?

Revision history for this message
kamlesh parmar (kparmar) wrote :

Pavana, please provide the setup details in problem state.

Revision history for this message
Dheeraj Gautam (dgautam) wrote :

Looked into this issue.
This issue can happen if the following conditions are met:
1. SMLite provisioning. With SM provisioning this issue won't occur, as puppet resolves to a different IP address.
2. Rabbitmq is managed by the controller instead of by openstack or an external rabbitmq.
3. There are 2 or more controllers (HA).
4. Nodes are on a single network. This will not happen in the dual-network case, as rabbitmq would use the control-data interface there.
5. One of the contrail-controller nodes is on the smlite node.

Revision history for this message
Dheeraj Gautam (dgautam) wrote :

Possible workarounds:

Use openstack_manage_amqp=True

OR use the openstack node as the smlite node (applies to multi-node setups; with a single node there is no issue)

OR edit /etc/hosts on the contrail-controller nodes and move the puppet entry to the last position
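
The third workaround can be sketched as below, assuming "move to the last" means placing the line that carries the puppet alias at the end of the file. The function name and the sample addresses and hostnames are hypothetical; the sketch only transforms text and does not write to /etc/hosts.

```python
def move_puppet_entry_last(hosts_text, alias="puppet"):
    """Return hosts-file text with lines naming `alias` moved to the end."""
    kept, moved = [], []
    for line in hosts_text.splitlines():
        # Drop any trailing comment, then split into address and names.
        fields = line.split("#", 1)[0].split()
        # fields[0] is the address; the remaining fields are host names.
        if alias in fields[1:]:
            moved.append(line)
        else:
            kept.append(line)
    return "\n".join(kept + moved) + "\n"

# Hypothetical hosts file in which the puppet alias shadows the real hostname.
sample = (
    "127.0.0.1 localhost\n"
    "10.0.0.5 puppet\n"
    "10.0.0.5 node-a\n"
    "10.0.0.6 node-b\n"
)
print(move_puppet_entry_last(sample), end="")
```

After the reorder, the node's real hostname is listed before the puppet alias for the same address.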

Revision history for this message
Abhay Joshi (abhayj) wrote :

Not a critical problem. It happens only with a particular setup, for which a workaround has been specified above.

Revision history for this message
Dheeraj Gautam (dgautam) wrote :

Siva is working on removing the puppet entry from /etc/hosts.

Revision history for this message
Siva Gurumurthy (sgurumurthy) wrote :

This needs thorough testing in the upgrade scenario, as this is a disruptive change.
I don't want to rush this in for 4.0.2.
This bug also requires another fix from Nitish.
I will test it thoroughly and check it in for the next release.

Revision history for this message
Abhay Joshi (abhayj) wrote :

We will add 4.0.3 target when available.

Revision history for this message
Pavana (pavanap) wrote :

Observed the rabbitmq cluster failing to form on the latest ubuntu 14.04~35 mitaka.
Siva is working on the fix. Adding sanity/provisioning tags.

root@nodec4(controller):/# rabbitmqctl cluster_status
Cluster status of node rabbit@nodec4 ...
[{nodes,[{disc,[rabbit@nodec4]}]},
 {running_nodes,[rabbit@nodec4]},
 {cluster_name,<<"<email address hidden>">>},
 {partitions,[]}]

root@nodec4(controller):/# service rabbitmq-server status
Status of node rabbit@nodec4 ...
[{pid,15562},
 {running_applications,[{rabbit,"RabbitMQ","3.5.0"},
                        {os_mon,"CPO CXC 138 46","2.2.14"},
                        {mnesia,"MNESIA CXC 138 12","4.11"},
                        {xmerl,"XML parser","1.3.5"},
                        {sasl,"SASL CXC 138 11","2.3.4"},
                        {stdlib,"ERTS CXC 138 10","1.19.4"},
                        {kernel,"ERTS CXC 138 10","2.16.4"}]},
 {os,{unix,linux}},
 {erlang_version,"Erlang R16B03 (erts-5.10.4) [source] [64-bit] [smp:4:4] [async-threads:30] [kernel-poll:true]\n"},
 {memory,[{total,53673784},
          {connection_readers,155944},
          {connection_writers,41640},
          {connection_channels,120528},
          {connection_other,295712},
          {queue_procs,263288},
          {queue_slave_procs,0},
          {plugins,0},
          {other_proc,13464336},
          {mnesia,87632},
          {mgmt_db,0},
          {msg_index,64016},
          {other_ets,801624},
          {binary,16898848},
          {code,16351158},
          {atom,561761},
          {other_system,4567297}]},
 {alarms,[]},
 {listeners,[{clustering,25672,"::"},{amqp,5672,"0.0.0.0"}]},
 {vm_memory_high_watermark,0.4},
 {vm_memory_limit,13483900928},
 {disk_free_limit,50000000},
 {disk_free,415870029824},
 {file_descriptors,[{total_limit,3996},
                    {total_used,18},
                    {sockets_limit,3594},
                    {sockets_used,16}]},
 {processes,[{limit,1048576},{used,302}]},
 {run_queue,0},
 {uptime,1911}]
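
The cluster_status output earlier in this comment lists only rabbit@nodec4 under running_nodes. A rough sketch of how a sanity check could detect that: parse running_nodes and compare it with the controllers expected in the cluster. The function names are hypothetical, and the second hostname, nodec5, is a made-up placeholder for the other controller.

```python
import re

def running_nodes(status_text):
    """Return the set of node names under running_nodes in cluster_status output."""
    match = re.search(r"\{running_nodes,\s*\[([^\]]*)\]", status_text)
    if not match:
        return set()
    # Erlang quotes atoms that contain special characters; strip the quotes.
    return {n.strip().strip("'") for n in match.group(1).split(",") if n.strip()}

def missing_from_cluster(status_text, expected_hosts):
    """Return expected rabbit nodes that are not running in the cluster."""
    expected = {"rabbit@" + host for host in expected_hosts}
    return sorted(expected - running_nodes(status_text))

# Output quoted from `rabbitmqctl cluster_status` on nodec4.
status = """Cluster status of node rabbit@nodec4 ...
[{nodes,[{disc,[rabbit@nodec4]}]},
 {running_nodes,[rabbit@nodec4]},
 {partitions,[]}]
"""

# "nodec5" is a hypothetical name for the second controller.
print(missing_from_cluster(status, ["nodec4", "nodec5"]))
```

An empty result would mean every expected controller has joined; here the second controller is reported as missing.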

summary: - vcenter only provisioning: Vrouter agent issue seen on restart
+ vcenter-only provisioning: Rabbitmq cluster does not form
tags: added: provisioning sanityblocker
Revision history for this message
Pavana (pavanap) wrote : Re: vcenter-only provisioning: Rabbitmq cluster does not form

Issue seen with ubuntu 16-04~35 ocata as well

Jeba Paulaiyan (jebap)
tags: added: sanity
removed: sanityblocker
Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R4.1

Review in progress for https://review.opencontrail.org/36774
Submitter: sgurumurthy (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/36775
Submitter: sgurumurthy (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R4.1

Review in progress for https://review.opencontrail.org/36776
Submitter: sgurumurthy (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/36777
Submitter: sgurumurthy (<email address hidden>)

Revision history for this message
Siva Gurumurthy (sgurumurthy) wrote : Re: vcenter-only provisioning: Rabbitmq cluster does not form

The following are the upgrade scenarios with failure conditions after the removal of ‘$server puppet’ from ‘/etc/hosts’.

1. SM is upgraded but contrail images are not upgraded

Problem:
      SM: Provision following a reimage will fail, as SM on reimage won’t put the ‘$server puppet’ entry in /etc/hosts but the provision logic expects it.
      SMLITE: Provision will fail, as preconfig.py will not put the ‘$server puppet’ entry in /etc/hosts but the puppet code still expects that entry.

Solution: Add and provision the latest contrail image

2. SM is the old one but contrail images are upgraded

Problem:
      SM: Cannot reimage the nodes using SM, as it will insert ‘$server puppet’ in /etc/hosts. Reimage followed by provision will fail.
      SMLITE: smlite cannot be used to provision the images, as it will update /etc/hosts with the ‘$server puppet’ entry but the puppet files will be the latest ones without that reference.
              Openstack provision will fail.

Solution: Upgrade the SM

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/36775
Committed: http://github.com/Juniper/contrail-puppet/commit/8bc19fd57bff261fb14c5d26aee72dbb025dc609
Submitter: Zuul (<email address hidden>)
Branch: master

commit 8bc19fd57bff261fb14c5d26aee72dbb025dc609
Author: sgurumurthy <email address hidden>
Date: Tue Oct 24 12:20:01 2017 -0700

Closes-Bug: #1716012
Use the servername instead of the 'puppet'

Change-Id: I9909f484d95f37e87b158c85f9bbd9927787782d

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/36774
Committed: http://github.com/Juniper/contrail-puppet/commit/35f9fd9289a2d12a5242f25e499a69dddfea4095
Submitter: Zuul (<email address hidden>)
Branch: R4.1

commit 35f9fd9289a2d12a5242f25e499a69dddfea4095
Author: sgurumurthy <email address hidden>
Date: Tue Oct 24 12:20:01 2017 -0700

Closes-Bug: #1716012
Use the servername instead of the 'puppet'

Change-Id: I9909f484d95f37e87b158c85f9bbd9927787782d

Revision history for this message
Jeba Paulaiyan (jebap) wrote :

R4.1-Mitaka-CB#30, R4.1-Newton-CB#32, and all the latest R5.0 CBs are affected by this. Hence changing the subject to reflect the generic status.

summary: - vcenter-only provisioning: Rabbitmq cluster does not form
+ Sanity: Rabbitmq cluster does not form
tags: added: sanityblocker
removed: sanity
Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R4.1

Review in progress for https://review.opencontrail.org/37037
Submitter: sgurumurthy (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/37038
Submitter: sgurumurthy (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/37037
Committed: http://github.com/Juniper/contrail-puppet/commit/4e17c518a66241adef3ab81a0f77423d4b463540
Submitter: Zuul (<email address hidden>)
Branch: R4.1

commit 4e17c518a66241adef3ab81a0f77423d4b463540
Author: sgurumurthy <email address hidden>
Date: Tue Oct 31 15:35:48 2017 -0700

Revert "Closes-Bug: #1716012 Use the servername instead of the 'puppet'"

This reverts commit 35f9fd9289a2d12a5242f25e499a69dddfea4095.

Change-Id: I5c4e2788a5f80e0ec9d85f27441d3f540cb2cdc4

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/37038
Committed: http://github.com/Juniper/contrail-puppet/commit/8724db0c5534c3351b66cef5c0724d9323796ca8
Submitter: Zuul (<email address hidden>)
Branch: master

commit 8724db0c5534c3351b66cef5c0724d9323796ca8
Author: sgurumurthy <email address hidden>
Date: Tue Oct 31 15:36:25 2017 -0700

Revert "Closes-Bug: #1716012 Use the servername instead of the 'puppet'"

This reverts commit 8bc19fd57bff261fb14c5d26aee72dbb025dc609.

Change-Id: Ibfd8cd90eb3170cc2ad1570a11514235104b833a
