[SRU] Watcher crashes on creation of multiple audits and gets stuck in PENDING
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ubuntu Cloud Archive | Status tracked in Epoxy | | |
Antelope | New | Undecided | Unassigned |
Bobcat | New | Undecided | Unassigned |
Caracal | New | Undecided | Unassigned |
Dalmatian | Fix Released | Undecided | Unassigned |
Epoxy | Fix Released | Undecided | Unassigned |
Yoga | New | Undecided | Bryan Fraschetti |
Zed | New | Undecided | Unassigned |
watcher (Ubuntu) | Status tracked in Plucky | | |
Focal | Confirmed | Undecided | Unassigned |
Jammy | Confirmed | Undecided | Bryan Fraschetti |
Noble | Confirmed | Undecided | Unassigned |
Oracular | Fix Released | Undecided | Unassigned |
Plucky | Fix Released | Undecided | Unassigned |
Bug Description
[ Impact ]
* In the Watcher releases targeted by this SRU, only one audit of type CONTINUOUS can be created. Any subsequently created audit gets stuck in the PENDING state. The root cause is the conversion of an improperly typed date, which crashes Watcher. The function converting the date format, utc_timestamp_
* The commit landed upstream in 2024.2.
[ Test Plan ]
* Deploy OpenStack Yoga on Jammy with the Watcher and Gnocchi services
* Create two watcher audits of CONTINUOUS type and monitor their status
openstack optimize audit create --name test_audit_1 -s workload_
openstack optimize audit create --name test_audit_2 -s workload_
openstack optimize audit list
* Without the patch, the second audit will get stuck in state PENDING and systemctl status watcher-
[ What can go wrong ]
* This commit overrides apscheduler's implementation of get_next_run_time, since apscheduler's implementation obtains a decimal.Decimal object, which crashes the engine. This should expand compatibility to include SQLAlchemy 2.0 but may otherwise have side effects. It shouldn't, since the function it overrides is what precipitates the issue, but it may affect legacy software (e.g. older SQLAlchemy).
[1] https:/
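The failure mode described under "What can go wrong" can be illustrated with a minimal Python sketch. This is not the upstream patch: the helper name timestamp_to_utc_datetime and the sample value are hypothetical, chosen only to show the general idea of coercing a decimal.Decimal timestamp (as a database driver may return for a numeric column) to float before building a datetime.

```python
from datetime import datetime, timezone
from decimal import Decimal


def timestamp_to_utc_datetime(timestamp):
    """Hypothetical helper (not from watcher or apscheduler).

    Converts a numeric UTC timestamp to an aware datetime, coercing
    the value to float first so that a decimal.Decimal read from the
    database is handled the same way as a plain float.
    """
    if timestamp is None:
        # No scheduled run time stored: nothing to convert.
        return None
    return datetime.fromtimestamp(float(timestamp), timezone.utc)


# Arbitrary sample value standing in for a stored next-run timestamp.
raw = Decimal("1700000000.5")
result = timestamp_to_utc_datetime(raw)
print(result.isoformat())
```

The explicit float() call is the whole point of the sketch: the conversion no longer depends on how the driver typed the column value.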
-------
Original Description:
A customer is facing an issue where the watcher-
Environment Details:
1. Deploy OpenStack Yoga on Jammy with Watcher and Gnocchi as Watcher's storage backend
2. Create an audit
openstack optimize audit create --name workload_
3. Check the audit state
openstack optimize audit list
Observe it says "CONTINUOUS ONGOING"
4. Create a second audit
openstack optimize audit create --name workload_
5. Check the audit state
openstack optimize audit list
Observe the second audit is stuck in "CONTINUOUS PENDING"
6. Check watcher's status and observe that it crashed with the following traceback
systemctl status watcher-
Nov 27 19:53:54 juju-2752e1-
Nov 27 19:53:54 juju-2752e1-
Nov 27 19:53:54 juju-2752e1-
Nov 27 19:53:54 juju-2752e1-
Nov 27 19:53:54 juju-2752e1-
Nov 27 19:53:54 juju-2752e1-
Nov 27 19:53:54 juju-2752e1-
Nov 27 19:53:54 juju-2752e1-
Nov 27 19:53:54 juju-2752e1-
Nov 27 19:53:54 juju-2752e1-
This was fixed upstream in 2024.2 at https:/
Changed in watcher (Ubuntu Oracular): | |
status: | New → Fix Released |
Changed in watcher (Ubuntu Plucky): | |
status: | New → Fix Released |
no longer affects: | watcher |
summary: |
- Watcher crashes on creation of multiple audits and gets stuck in PENDING
+ [SRU] Watcher crashes on creation of multiple audits and gets stuck in PENDING |
Changed in watcher (Ubuntu Jammy): | |
assignee: | nobody → Bryan Fraschetti (bryanfraschetti) |
Changed in watcher (Ubuntu Plucky): | |
assignee: | Bryan Fraschetti (bryanfraschetti) → nobody |
description: | updated |
description: | updated |
Debdiff for Bobcat that applies the upstream patch and accompanying unit tests (squashed into one diff called omnibus-fixes.patch).