LP bug 1868326 handles two situations where the 'seeded' file is lost, but it doesn't cover another one. The following customer logs show that an SST can be triggered by data inconsistency; the SST removes the 'seeded' file and the charm never recreates it.
2022-10-18T04:10:55.963214Z 48 [ERROR] Slave SQL: Could not execute Delete_rows event on table workloadmgr.setting_metadata; Can't find record in 'setting_metadata', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 352, Error_code: 1032
2022-10-18T04:10:55.963233Z 48 [Warning] WSREP: RBR event 3 Delete_rows apply warning: 120, 1426184637
......
2022-10-18T04:10:55.965096Z 48 [ERROR] WSREP: Failed to apply trx 1426184637 4 times
2022-10-18T04:10:55.965107Z 48 [ERROR] WSREP: Node consistency compromised, aborting...
2022-10-18T04:10:55.965217Z 48 [Note] WSREP: turning isolation on
......
2022-10-18T04:11:00.966995Z 48 [Note] WSREP: /usr/sbin/mysqld: Terminated.
Aborted
2022-10-18T04:11:01.269881Z mysqld_safe Number of processes running now: 0
2022-10-18T04:11:01.274788Z mysqld_safe WSREP: sleeping 15 seconds before restart
2022-10-18T04:11:16.280616Z mysqld_safe mysqld restarted
2022-10-18T04:11:16.308401Z mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/percona-xtradb-cluster/wsrep_recovery.9kU4kC' --pid-file='/var/lib/percona-xtradb-cluster/juju-de2b34-26-lxd-7-recover.pid'
2022-10-18T04:11:25.664909Z mysqld_safe WSREP: Recovered position ea5dc9d5-351f-11eb-9431-aa4bc52e10af:1426184636
Log of wsrep recovery (--wsrep-recover):
2022-10-18T04:11:16.759233Z 0 [Note] /usr/sbin/mysqld (mysqld 5.7.20-18-18-log) starting as process 368837 ...
This is reproducible by triggering an SST to a secondary node. One way to do this:
1. Deploy the charm with 3 units and min-cluster-size set to 2, e.g.:
juju deploy -n 3 --series bionic percona-cluster --config min-cluster-size=2
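Before moving on, wait until all three units report 'Unit is ready':
juju status percona-cluster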
2. Log into a non-leader node, stop mysql, delete /var/lib/percona-xtradb-cluster/grastate.dat, and restart mysql:
juju ssh {non leader}
sudo systemctl stop mysql
sudo rm /var/lib/percona-xtradb-cluster/grastate.dat
sudo systemctl start mysql
This will trigger an SST that juju does not know about and that wipes out most of /var/lib/percona-xtradb-cluster/ (normal operation, as documented in [1]), including the 'seeded' file.
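Note (not from the customer logs above): once the SST finishes, the joiner should report itself as Synced and rejoin the cluster; this can be confirmed with the standard Galera status variables, assuming the unit's MySQL root credentials are at hand:
juju ssh {non leader}
sudo mysql -uroot -p -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment'"
sudo mysql -uroot -p -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'"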
3. After the SST completes, the Percona cluster itself is fully recovered. However, /var/lib/percona-xtradb-cluster/seeded is missing on the non-leader node (see the check after the status output below), and juju status shows the unit stuck waiting to bootstrap:
Unit Workload Agent Machine Public address Ports Message
percona-cluster/0* active idle 37 10.133.201.64 3306/tcp Unit is ready
percona-cluster/1 waiting idle 38 10.133.201.185 3306/tcp Unit waiting to bootstrap ('seeded' file missing)
percona-cluster/2 active idle 39 10.133.201.124 3306/tcp Unit is ready
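The missing marker can be confirmed directly on the stuck unit (percona-cluster/1 above); the SST repopulated the rest of the data directory, but 'seeded' is gone and the charm never writes it again. For example:
juju ssh percona-cluster/1 sudo ls -l /var/lib/percona-xtradb-cluster/seeded
ls: cannot access '/var/lib/percona-xtradb-cluster/seeded': No such file or directory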
[1] https://docs.percona.com/percona-xtradb-cluster/5.7/manual/xtrabackup_sst.html