(latest/edge) prometheus-relation-joined hook can cause a mysql error during deployment
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MySQL InnoDB Cluster Charm | Fix Committed | Undecided | Unassigned |
Jammy | New | Undecided | Unassigned |
Bug Description
During the deployment of a cluster in the gate, there is a race-hazard error when the prometheus-
The issue is basically that if a transaction is attempted (with a commit) whilst the cluster is recovering Group Replication, then that commit will hard fail with the following error:
MySQLdb.
The trace from the error.log file provides more details:
2023-04-
2023-04-
ber of missing transactions being higher than the configured threshold of 1.'
2023-04-
or.'
2023-04-
:3306 on view 168214951498077
2023-04-
2023-04-
2023-04-
2023-04-
n when the server is ONLINE.'
Essentially, what seems to be happening is that the prometheus-
Possible solution:
------------------
The solution is to retry the commit several times if the 3100 error occurs (to allow Group Replication to finish) and, if it still fails, just return False so that the handler will try again on the next hook execution. This would allow the unit to recover gracefully from the error.
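A minimal sketch of that retry idea, not the charm's actual code. The helper name `commit_with_retry`, the `attempts` and `delay` values, and the injectable `sleep` are all hypothetical. It also assumes the driver exception carries the numeric error code as its first argument, as MySQLdb exceptions normally do.

```python
import time

# Error code named in this bug report; seen while Group Replication recovers.
GR_RECOVERING_ERROR = 3100


def commit_with_retry(commit, attempts=5, delay=2.0, sleep=time.sleep):
    """Call commit() and retry while it fails with error 3100.

    Returns True once a commit succeeds. Returns False if every attempt
    failed with 3100, so the caller can defer to the next hook execution.
    Any other error is re-raised unchanged.
    """
    for attempt in range(1, attempts + 1):
        try:
            commit()
            return True
        except Exception as exc:
            # Assumption: the numeric code is the first exception argument.
            code = exc.args[0] if exc.args else None
            if code != GR_RECOVERING_ERROR:
                raise
            if attempt < attempts:
                # Give Group Replication time to finish recovering.
                sleep(delay)
    return False
```

In a hook handler this could be called as `commit_with_retry(connection.commit)`, where `connection` stands for whatever database connection object the handler already holds. A False result means the relation data has not been written yet and the handler should exit without error.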
tags: | added: sts |
Changed in charm-mysql-innodb-cluster: | |
assignee: | nobody → Alex Kavanagh (ajkavanagh) |
Changed in charm-mysql-innodb-cluster: | |
assignee: | Alex Kavanagh (ajkavanagh) → nobody |
Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/charm-mysql-innodb-cluster/+/883300