no visible sign that HA is degraded when lost

Bug #1602749 reported by Richard Harding
This bug affects 1 person
Affects         Status   Importance  Assigned to  Milestone
Canonical Juju  Triaged  High        Unassigned

Bug Description

I bootstrapped on lxd and ran enable-ha. I was then presented with three state servers, each showing controller-member-status: has-vote

I then used lxc delete to remove controller #1 (leaving #0 and #2). Nothing in show-controller, juju status, or juju status --format=yaml indicated any degradation or failure, and I could find no way to query Juju to see that #1 was gone. Only by pinging each controller IP address did I find that #1 was not responding.

Current output:

juju status --format=yaml
model:
  name: controller
  controller: uxtest
  cloud: lxd
  version: 2.0-beta11
machines:
  "0":
    juju-status:
      current: started
      since: 13 Jul 2016 11:38:08-04:00
      version: 2.0-beta11
    dns-name: 10.90.136.71
    instance-id: juju-8c982a-0
    machine-status:
      current: running
      message: Running
      since: 13 Jul 2016 11:22:34-04:00
    series: xenial
    hardware: arch=amd64 cpu-cores=0 mem=0M
    controller-member-status: has-vote
  "1":
    juju-status:
      current: started
      since: 13 Jul 2016 11:38:08-04:00
      version: 2.0-beta11
    dns-name: 10.90.136.92
    instance-id: juju-8c982a-1
    machine-status:
      current: running
      message: Running
      since: 13 Jul 2016 11:37:36-04:00
    series: xenial
    hardware: arch=amd64 cpu-cores=0 mem=0M
    controller-member-status: has-vote
  "2":
    juju-status:
      current: started
      since: 13 Jul 2016 11:38:08-04:00
      version: 2.0-beta11
    dns-name: 10.90.136.249
    instance-id: juju-8c982a-2
    machine-status:
      current: running
      message: Running
      since: 13 Jul 2016 11:37:20-04:00
    series: xenial
    hardware: arch=amd64 cpu-cores=0 mem=0M
    controller-member-status: has-vote
applications: {}
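
The manual ping check described above could be scripted as an external workaround until status reports this itself. The following is only a sketch under stated assumptions: it expects the status output already parsed into a dict with the same keys as the YAML above, and it probes TCP port 17070, assumed here to be the controller API port. The names tcp_probe and unreachable_controllers are hypothetical and not part of Juju.

```python
import socket
from typing import Callable, Dict, List

API_PORT = 17070  # assumption: the controller API listens on Juju's default port


def tcp_probe(host: str, port: int = API_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unreachable_controllers(
    status: Dict, probe: Callable[[str], bool] = tcp_probe
) -> List[str]:
    """List machine ids that report a controller-member-status but fail the probe."""
    missing = []
    for machine_id, machine in sorted(status.get("machines", {}).items()):
        if "controller-member-status" not in machine:
            continue  # not a controller machine
        address = machine.get("dns-name")
        if not address or not probe(address):
            missing.append(machine_id)
    return missing
```

One way to feed it would be to parse the output of juju status --format=json with json.loads, assuming the JSON output uses the same key names as the YAML shown above. The probe is passed in as a parameter so the check can be exercised without touching the network.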

Changed in juju-core:
importance: Undecided → High
Curtis Hovey (sinzui)
Changed in juju-core:
status: Confirmed → Triaged
Changed in juju-core:
milestone: none → 2.0.0
Changed in juju-core:
milestone: 2.0.0 → 2.0.1
affects: juju-core → juju
Changed in juju:
milestone: 2.0.1 → none
milestone: none → 2.0.1
Curtis Hovey (sinzui)
Changed in juju:
milestone: 2.0.1 → none
John A Meinel (jameinel) wrote :

related to bug #1766576

Nobuto Murata (nobuto) wrote :

This is still reproducible with Juju 2.5.1 and the maas provider with maas 2.5.0-7442-gdf68e30a5-0ubuntu1~18.04.1.

I simulated a dead Juju controller machine by suspending the MAAS Pod VM hosting one of the controllers, but Juju still reports all controller nodes as healthy.

$ virsh suspend juju-1 ## simulate an unresponsive controller

$ juju ssh -m controller 0 ## 0 = juju-1
ERROR cannot connect to any address: [192.168.151.22:22 192.168.151.22:22]

$ juju status -m controller; juju show-controller | grep ha-status:

Model       Controller        Cloud/Region      Version  SLA          Timestamp
controller  foundations-maas  foundations-maas  2.5.1    unsupported  18:39:21Z

Machine  State    DNS             Inst id  Series  AZ     Message
0        started  192.168.151.22  cg6qmw   bionic  zone1  Deployed
1        started  192.168.153.22  4skcpe   bionic  zone3  Deployed
2        started  192.168.152.21  wshrgx   bionic  zone2  Deployed

      ha-status: ha-enabled
      ha-status: ha-enabled
      ha-status: ha-enabled

-> status doesn't report any down / unhealthy controller node even after 5 minutes.

Nobuto Murata (nobuto) wrote :

It took roughly 15 minutes for the machine to be marked down, and even then ha-status still looks healthy.

$ juju status -m controller; juju show-controller | grep ha-status:
Model       Controller        Cloud/Region      Version  SLA          Timestamp
controller  foundations-maas  foundations-maas  2.5.1    unsupported  18:53:25Z

Machine  State    DNS             Inst id  Series  AZ     Message
0        down     192.168.151.22  cg6qmw   bionic  zone1  Deployed
1        started  192.168.153.22  4skcpe   bionic  zone3  Deployed
2        started  192.168.152.21  wshrgx   bionic  zone2  Deployed

      ha-status: ha-enabled
      ha-status: ha-enabled
      ha-status: ha-enabled
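
For illustration, here is the kind of summary that would have made the state above visible. It is a sketch only, not Juju's behaviour or a proposed implementation: the function name ha_summary and the labels "healthy", "degraded" and "quorum-lost" are hypothetical, and it assumes the controller set needs a strict majority of its members started to keep operating.

```python
from typing import Dict


def ha_summary(controller_states: Dict[str, str]) -> str:
    """Summarise HA health from a mapping of machine id to machine state."""
    total = len(controller_states)
    if total == 0:
        raise ValueError("no controller machines given")
    started = sum(1 for state in controller_states.values() if state == "started")
    majority = total // 2 + 1  # assumption: a strict majority must be up
    if started == total:
        label = "healthy"
    elif started >= majority:
        label = "degraded"
    else:
        label = "quorum-lost"
    return f"{label}: {started} of {total} controllers started"


# Machine states taken from the status table above.
print(ha_summary({"0": "down", "1": "started", "2": "started"}))
```

With the states from the table above this yields "degraded: 2 of 3 controllers started", whereas ha-status reports only ha-enabled.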

tags: added: reviewed
removed: 2.0