hacluster charm fails to start corosync and cluster

Bug #1597548 reported by David Ames
Affects: hacluster (Juju Charms Collection)
Status: Invalid
Importance: Undecided
Assigned to: David Ames

Bug Description

It seems the most recent commit to the hacluster charm (41dc7b3fad59ea1bbe35ada4c1557f699a7a67a6) has introduced a bug.

Corosync may fail to start properly on one or more nodes. The symptom is a juju hook that stays in the executing state forever.

root 17499 0.0 0.0 19700 3276 ? Ss 20:17 0:00 bash /var/lib/juju/init/jujud-unit-hacluster-keystone-2/exec-start.sh
root 17510 0.0 0.0 1686172 52192 ? Sl 20:17 0:01 \_ /var/lib/juju/tools/unit-hacluster-keystone-2/jujud unit --data-dir /var/lib/juju
root 31870 1.1 0.0 123400 54460 ? S 20:21 1:03 \_ /usr/bin/python /var/lib/juju/agents/unit-hacluster-keystone-2/charm/hooks/ha-
root 223353 0.0 0.0 4508 712 ? S 21:58 0:00 \_ sh -c { crm node list; } 2>&1
root 223354 0.0 0.0 102056 19268 ? R 21:58 0:00 \_ /usr/bin/python /usr/sbin/crm node list

ubuntu@juju-machine-3-lxc-4:~$ sudo crm status
sudo: unable to resolve host juju-machine-3-lxc-4
ERROR: status: crm_mon (rc=107): Connection to cluster failed: Transport endpoint is not connected
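
The process listing above shows the hook still running `crm node list` while the cluster is unreachable. As a minimal sketch, assuming a caller would rather have a bounded wait than an open-ended one, the helper below wraps any command in a timeout using Python's standard `subprocess.run`. The name `run_with_timeout` and the 30-second default are illustrative assumptions and are not part of the hacluster charm.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout=30.0):
    """Run cmd; return (returncode, combined output), or None on timeout."""
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout, like `2>&1`
            universal_newlines=True,   # decode output as text
            timeout=timeout,           # child is killed if this expires
        )
    except subprocess.TimeoutExpired:
        return None
    return proc.returncode, proc.stdout


# Demo with a harmless command (the Python interpreter itself).
demo = run_with_timeout([sys.executable, "-c", "print('ok')"], timeout=10)
print(demo)
```

On a cluster node the same helper could be called as `run_with_timeout(["crm", "node", "list"])`; a `None` result would then indicate that the command did not return within the limit.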

Corosync fails to start, and systemd reports a timeout:

Jun 29 20:25:02 juju-machine-3-lxc-4 corosync[32209]: info [QB ] withdrawing server sockets
Jun 29 20:25:02 juju-machine-3-lxc-4 corosync[32209]: info [QB ] withdrawing server sockets
Jun 29 20:25:02 juju-machine-3-lxc-4 corosync[32209]: [QB ] withdrawing server sockets
Jun 29 20:25:02 juju-machine-3-lxc-4 corosync[32209]: [QB ] withdrawing server sockets
Jun 29 20:25:02 juju-machine-3-lxc-4 corosync[32209]: [QB ] withdrawing server sockets
Jun 29 20:25:02 juju-machine-3-lxc-4 corosync[32209]: notice [TOTEM ] Retransmit List: 2 3 4 7
Jun 29 20:25:02 juju-machine-3-lxc-4 corosync[32209]: [TOTEM ] Retransmit List: 2 3 4 7
Jun 29 20:25:02 juju-machine-3-lxc-4 systemd[1]: Failed to start Corosync Cluster Engine.
Jun 29 20:25:02 juju-machine-3-lxc-4 systemd[1]: corosync.service: Unit entered failed state.
Jun 29 20:25:02 juju-machine-3-lxc-4 systemd[1]: corosync.service: Failed with result 'timeout'.

Jun 29 22:17:18 juju-machine-3-lxc-4 systemd[1]: Starting Corosync Cluster Engine...
Jun 29 22:17:18 juju-machine-3-lxc-4 corosync[322592]: notice [TOTEM ] Initializing transport (UDP/IP Unicast).
Jun 29 22:17:18 juju-machine-3-lxc-4 corosync[322592]: notice [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none
Jun 29 22:17:18 juju-machine-3-lxc-4 corosync[322592]: [TOTEM ] Initializing transport (UDP/IP Unicast).
Jun 29 22:17:18 juju-machine-3-lxc-4 corosync[322592]: [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none
Jun 29 22:17:18 juju-machine-3-lxc-4 corosync[322592]: notice [TOTEM ] The network interface [10.5.1.111] is now up.
Jun 29 22:17:18 juju-machine-3-lxc-4 corosync[322592]: info [QB ] server name: cmap
Jun 29 22:17:18 juju-machine-3-lxc-4 corosync[322592]: info [QB ] server name: cfg
Jun 29 22:17:18 juju-machine-3-lxc-4 corosync[322592]: [TOTEM ] The network interface [10.5.1.111] is now up.
Jun 29 22:17:18 juju-machine-3-lxc-4 corosync[322592]: info [QB ] server name: cpg
Jun 29 22:17:18 juju-machine-3-lxc-4 corosync[322592]: [QB ] server name: cmap
Jun 29 22:17:18 juju-machine-3-lxc-4 corosync[322592]: info [QB ] server name: votequorum
Jun 29 22:17:18 juju-machine-3-lxc-4 corosync[322592]: info [QB ] server name: quorum
Jun 29 22:17:18 juju-machine-3-lxc-4 corosync[322592]: notice [TOTEM ] adding new UDPU member {10.5.1.88}
Jun 29 22:17:18 juju-machine-3-lxc-4 corosync[322592]: notice [TOTEM ] adding new UDPU member {10.5.1.111}
Jun 29 22:17:18 juju-machine-3-lxc-4 corosync[322592]: [QB ] server name: cfg
Jun 29 22:17:18 juju-machine-3-lxc-4 corosync[322592]: notice [TOTEM ] A new membership (10.5.1.111:64) was formed. Members joined: 1002
Jun 29 22:17:18 juju-machine-3-lxc-4 corosync[322592]: [QB ] server name: cpg
Jun 29 22:17:18 juju-machine-3-lxc-4 corosync[322592]: [QB ] server name: votequorum
Jun 29 22:17:18 juju-machine-3-lxc-4 corosync[322592]: [QB ] server name: quorum
Jun 29 22:17:18 juju-machine-3-lxc-4 corosync[322592]: [TOTEM ] adding new UDPU member {10.5.1.88}
Jun 29 22:17:18 juju-machine-3-lxc-4 corosync[322592]: [TOTEM ] adding new UDPU member {10.5.1.111}
Jun 29 22:17:18 juju-machine-3-lxc-4 corosync[322592]: [TOTEM ] A new membership (10.5.1.111:64) was formed. Members joined: 1002
Jun 29 22:17:18 juju-machine-3-lxc-4 corosync[322592]: notice [TOTEM ] A new membership (10.5.1.88:68) was formed. Members joined: 1000 1001
Jun 29 22:17:18 juju-machine-3-lxc-4 corosync[322592]: [TOTEM ] A new membership (10.5.1.88:68) was formed. Members joined: 1000 1001
Jun 29 22:17:18 juju-machine-3-lxc-4 corosync[322592]: notice [TOTEM ] Retransmit List: 2 3 4
Jun 29 22:17:18 juju-machine-3-lxc-4 corosync[322592]: [TOTEM ] Retransmit List: 2 3 4
Jun 29 22:17:18 juju-machine-3-lxc-4 corosync[322592]: [TOTEM ] Retransmit List: 2 3 4
Jun 29 22:17:18 juju-machine-3-lxc-4 corosync[322592]: notice [TOTEM ] Retransmit List: 2 3 4
Jun 29 22:17:18 juju-machine-3-lxc-4 corosync[322592]: notice [TOTEM ] Retransmit List: 2 3 4 7
Jun 29 22:17:18 juju-machine-3-lxc-4 corosync[322592]: [TOTEM ] Retransmit List: 2 3 4 7
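
The membership lines in the log above can be tallied to see which node IDs corosync reported as joined. The following is a rough sketch that assumes the message format shown in this log ("A new membership (...) was formed. Members joined: ..."); the regular expression and the function name `joined_members` are illustrative and may need adjusting for other corosync versions.

```python
import re

# Matches lines such as:
#   ... A new membership (10.5.1.88:68) was formed. Members joined: 1000 1001
MEMBERSHIP_RE = re.compile(
    r"A new membership \((?P<addr>[\d.]+):(?P<ring>\d+)\) was formed\."
    r" Members joined: (?P<joined>[\d ]+)"
)


def joined_members(log_lines):
    """Return the set of node IDs found in 'Members joined' log lines."""
    seen = set()
    for line in log_lines:
        match = MEMBERSHIP_RE.search(line)
        if match:
            seen.update(int(n) for n in match.group("joined").split())
    return seen


# Lines copied from the log excerpt in this report.
SAMPLE = [
    "Jun 29 22:17:18 juju-machine-3-lxc-4 corosync[322592]: notice [TOTEM ] "
    "A new membership (10.5.1.111:64) was formed. Members joined: 1002",
    "Jun 29 22:17:18 juju-machine-3-lxc-4 corosync[322592]: notice [TOTEM ] "
    "A new membership (10.5.1.88:68) was formed. Members joined: 1000 1001",
    "Jun 29 22:17:18 juju-machine-3-lxc-4 corosync[322592]: notice [TOTEM ] "
    "Retransmit List: 2 3 4",
]

print(sorted(joined_members(SAMPLE)))  # [1000, 1001, 1002]
```

Because the log repeats each message with and without the priority prefix, collecting IDs into a set avoids counting a node twice.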

David Ames (thedac)
affects: charms → hacluster (Juju Charms Collection)
Changed in hacluster (Juju Charms Collection):
assignee: nobody → David Ames (thedac)
status: New → Triaged
milestone: none → 16.07
David Ames (thedac) wrote :

This turned out to be caused by a missing cluster_count setting. Setting the bug status to Invalid.
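
For illustration only: on the assumption that `cluster_count` states how many peer units are expected before cluster services are configured, the sketch below shows that kind of gate in isolation and makes a missing value an explicit error rather than a silent wait. The function `ready_to_configure` and its parameters are hypothetical and are not taken from the charm's source.

```python
def ready_to_configure(peer_addresses, cluster_count):
    """Return True once enough distinct peers are known.

    Hypothetical helper: raises ValueError when cluster_count is unset,
    so a missing setting is reported instead of being waited on.
    """
    if cluster_count is None:
        raise ValueError("cluster_count is not set")
    return len(set(peer_addresses)) >= int(cluster_count)


# The two addresses below appear in the corosync log in this report.
peers = ["10.5.1.88", "10.5.1.111"]
print(ready_to_configure(peers, 3))  # False: only two distinct peers known
print(ready_to_configure(peers, 2))  # True
```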

Changed in hacluster (Juju Charms Collection):
status: Triaged → Invalid