vault unit moves between ready and error repeatedly

Bug #1779730 reported by Chris Procter
This bug affects 3 people
Affects      Status      Importance  Assigned to  Milestone
vault-charm  Incomplete  Medium      Liam Young

Bug Description

We have a 3-node HA vault cluster. The two slave nodes switch repeatedly (roughly every 10 minutes)
between the active (Unit is ready) and error (hook failed: "update-status") states.

The associated error message in the logs is:

2018-07-02 16:38:10 DEBUG update-status active
2018-07-02 16:38:10 DEBUG update-status Traceback (most recent call last):
2018-07-02 16:38:10 DEBUG update-status File "/var/lib/juju/agents/unit-vault-2/charm/hooks/update-status", line 19, in <module>
2018-07-02 16:38:10 DEBUG update-status main()
2018-07-02 16:38:10 DEBUG update-status File "/var/lib/juju/agents/unit-vault-2/.venv/lib/python3.5/site-packages/charms/reactive/__init__.py", line 82, in main
2018-07-02 16:38:10 DEBUG update-status hookenv._run_atexit()
2018-07-02 16:38:10 DEBUG update-status File "/var/lib/juju/agents/unit-vault-2/.venv/lib/python3.5/site-packages/charmhelpers/core/hookenv.py", line 1128, in _run_atexit
2018-07-02 16:38:10 DEBUG update-status callback(*args, **kwargs)
2018-07-02 16:38:10 DEBUG update-status File "/var/lib/juju/agents/unit-vault-2/charm/reactive/vault_handlers.py", line 585, in _assess_status
2018-07-02 16:38:10 DEBUG update-status application_version_set(health.get('version'))
2018-07-02 16:38:10 DEBUG update-status File "/var/lib/juju/agents/unit-vault-2/.venv/lib/python3.5/site-packages/charmhelpers/core/hookenv.py", line 970, in application_version_set
2018-07-02 16:38:10 DEBUG update-status subprocess.check_call(cmd)
2018-07-02 16:38:10 DEBUG update-status File "/usr/lib/python3.5/subprocess.py", line 576, in check_call
2018-07-02 16:38:10 DEBUG update-status retcode = call(*popenargs, **kwargs)
2018-07-02 16:38:10 DEBUG update-status File "/usr/lib/python3.5/subprocess.py", line 557, in call
2018-07-02 16:38:10 DEBUG update-status with Popen(*popenargs, **kwargs) as p:
2018-07-02 16:38:10 DEBUG update-status File "/usr/lib/python3.5/subprocess.py", line 947, in __init__
2018-07-02 16:38:10 DEBUG update-status restore_signals, start_new_session)
2018-07-02 16:38:10 DEBUG update-status File "/usr/lib/python3.5/subprocess.py", line 1490, in _execute_child
2018-07-02 16:38:10 DEBUG update-status restore_signals, start_new_session, preexec_fn)
2018-07-02 16:38:10 DEBUG update-status TypeError: Can't convert 'NoneType' object to str implicitly

The error seems to originate at /var/lib/juju/agents/unit-vault-2/charm/lib/charm/vault.py line 209, which gets an invalid response:
    response = requests.get(VAULT_HEALTH_URL.format(vault_addr=VAULT_LOCALHOST_URL))

I added some print statements to the charm. It is connecting to http://127.0.0.1:8220/v1/sys/health, but the response it receives is {'errors': []}.
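
The traceback appears consistent with that response: {'errors': []} has no 'version' key, so health.get('version') returns None, and that None is what reaches subprocess.check_call. Below is a minimal sketch of a guard, assuming the health payload is a plain dict. The names extract_version and set_version_if_known are hypothetical and not part of the charm; setter stands in for charmhelpers' application_version_set.

```python
from typing import Callable, Optional


def extract_version(health: dict) -> Optional[str]:
    """Return the 'version' field only if it is a non-empty string."""
    version = health.get("version")
    if isinstance(version, str) and version:
        return version
    return None


def set_version_if_known(health: dict, setter: Callable[[str], None]) -> bool:
    """Call setter(version) only when the payload carries a usable version.

    'setter' is a stand-in (assumption) for application_version_set.
    Returns True if the setter was called, False if it was skipped.
    """
    version = extract_version(health)
    if version is None:
        # An error payload such as {'errors': []} has no version;
        # skip the call rather than passing None on to a subprocess.
        return False
    setter(version)
    return True
```

With this guard, a payload of {'errors': []} skips the version update instead of failing the hook. It only masks the symptom; why the charm receives an error payload still needs explaining.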

But when I try it by hand (on a machine showing the error) I get:
root@juju-3732e2-18-lxd-14:~# curl http://127.0.0.1:8220/v1/sys/health
{"initialized":true,"sealed":false,"standby":false,"replication_performance_mode":"disabled","replication_dr_mode":"disabled","server_time_utc":1530554818,"version":"0.10.1","cluster_name":"vault-cluster-c5a6c830","cluster_id":"916d5cca-98c5-1e53-23d5-90d13c5bc5c5"}
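
Neither the curl call above nor the charm's print statements show the HTTP status code, which may help explain the difference between the two responses. The sketch below records the status code alongside the body, using only the standard library. fetch_health and describe_health are hypothetical helper names, not charm code. The status meanings listed are Vault's documented defaults for /v1/sys/health and are not confirmed as the cause of this bug.

```python
import json
import urllib.error
import urllib.request

# Default status codes documented by Vault for /v1/sys/health.
HEALTH_STATUS_MEANINGS = {
    200: "initialized, unsealed, active",
    429: "unsealed, standby",
    501: "not initialized",
    503: "sealed",
}


def fetch_health(url, timeout=5.0):
    """Return (status_code, parsed_body) without raising on non-2xx replies."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status, raw = resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the body is still readable from the error.
        status, raw = err.code, err.read()
    try:
        body = json.loads(raw.decode("utf-8"))
    except ValueError:
        # Empty or non-JSON body: fall back to an empty dict.
        body = {}
    return status, body


def describe_health(status, body):
    """Summarise a health reply as a single log-friendly line."""
    meaning = HEALTH_STATUS_MEANINGS.get(status, "unrecognised status")
    version = body.get("version") if isinstance(body, dict) else None
    return "HTTP {}: {}; version={!r}".format(status, meaning, version)
```

On an affected unit, print(describe_health(*fetch_health("http://127.0.0.1:8220/v1/sys/health"))) would show whether the reply the hook sees carries a non-200 status.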

The leader of the cluster appears to be stable.

James Page (james-page)
Changed in vault-charm:
importance: Undecided → Medium
James Page (james-page)
Changed in vault-charm:
assignee: nobody → Liam Young (gnuoy)
status: New → Incomplete
John George (jog) wrote :

This bug is subscribed to Canonical Field Critical and tracked by the SLA process.
Does the incomplete status mean development needs something from the submitter?
Could a comment be added with what's needed or an estimate on when this bug is targeted for a fix?

Ryan Beisner (1chb1n) wrote :

Please attach sanitized unit logs, a reproducer bundle, and ideally a juju crashdump.

Ashley Lai (alai) wrote :

The latest charm provided for us to use on customer's deployment does not go into error state.

Alex Kavanagh (ajkavanagh) wrote :

It's not clear whether this is still an outstanding bug or not?

@Ashley, are you using a site-specific version of the vault charm, please? Or the stable one from the charm store?

If the latter, then I think we can close the bug. Thanks.
