MySQL dump created via charm action cannot be restored on a different cluster

Bug #2040159 reported by Gabriel Samfira
Affects: MySQL InnoDB Cluster Charm
Status: In Progress
Importance: Undecided
Assigned to: Unassigned

Bug Description

We had a scenario where we needed to migrate an entire keystone service from one model to a different model. This was needed because the existing cloud was to be expanded to multiple regions, with a single keystone responsible for all of them.

So the need arose to dump the keystone database and restore it into the MySQL InnoDB Cluster present in the other model. The problem is that as soon as we restored the database dump, all secondary units exited the cluster.

If we restored the dump without first creating the database, each of the secondary nodes would fail with:

2023-10-23T08:38:23.635226Z 15 [ERROR] [MY-010584] [Repl] Replica SQL for channel 'group_replication_applier': Worker 1 failed executing transaction '60736a51-717d-11ee-8f2e-00163e69f32a:51'; Error 'Can't drop database 'keystone'; database doesn't exist' on query. Default database: 'keystone'. Query: 'drop database keystone', Error_code: MY-001008

If the database was created prior to restoring the dump, it would fail with:

2023-10-21T13:31:28.452077Z 15 [ERROR] [MY-010584] [Repl] Replica SQL for channel 'group_replication_applier': Worker 1 failed executing transaction 'd0ba5d0d-700f-11ee-b708-00163e53fc6f:1545'; Could not execute Update_rows event on table keystone.local_user; Cannot add or update a child row: a foreign key constraint fails (`keystone`.`local_user`, CONSTRAINT `local_user_user_id_fkey` FOREIGN KEY (`user_id`, `domain_id`) REFERENCES `user` (`id`, `domain_id`) ON DELETE CASCADE ON UPDATE CASCADE), Error_code: 1452; handler error HA_ERR_NO_REFERENCED_ROW, Error_code: MY-001452
2023-10-21T13:31:28.453576Z 14 [Warning] [MY-010584] [Repl] Replica SQL for channel 'group_replication_applier': ... The replica coordinator and worker threads are stopped, possibly leaving data in inconsistent state. A restart should restore consistency automatically, although using non-transactional storage for data or info tables or DDL queries could lead to problems. In such cases you have to examine your data (see documentation for details). Error_code: MY-001756

This turned out to be because of the way the MySQL database dump was created. Currently, the action passes --set-gtid-purged=COMMENTED to mysqldump, which, according to the documentation[1], also adds a SET @@SESSION.sql_log_bin=0 statement to the dump. Replaying the dump therefore disables binary logging for that session, so the restored writes never reach the other group members, and replication breaks as soon as later transactions touch the restored data. We had to remove both secondary units and add new ones to recover.
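
For illustration, the header of a dump taken with --set-gtid-purged=COMMENTED looks roughly like the following (a sketch based on the mysqldump documentation, with a placeholder GTID set, not output captured from this deployment):

SET @@SESSION.SQL_LOG_BIN= 0;
/* SET @@GLOBAL.GTID_PURGED='<source-server-uuid>:1-N'; */

The first statement is what stops the restored writes from being logged and replicated; the commented-out second statement is what the COMMENTED value refers to.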

Funnily enough, this warning is displayed when creating a dump:

gabriel@arrakis:~/work/cloudbase/charm-mysql-innodb-cluster$ juju run mysql-innodb-cluster/11 mysqldump databases=keystone --wait=10m
Running operation 33 with 1 task
  - task 34 on unit-mysql-innodb-cluster-11

Waiting for task 34...
mysqldump-file: /var/backups/mysql/mysqldump-keystone-202310231145.gz
outcome: Success

Warning: A partial dump from a server that has GTIDs will by default include the GTIDs of all transactions, even those that changed suppressed parts of the database. If you don't want to restore GTIDs, pass --set-gtid-purged=OFF. To make a complete dump, pass --all-databases --triggers --routines --events.
Warning: A dump from a server that has GTIDs enabled will by default include the GTIDs of all transactions, even those that were executed during its extraction and might not be represented in the dumped data. This might result in an inconsistent data dump.
In order to ensure a consistent backup of the database, pass --single-transaction or --lock-all-tables or --master-data.

For this use case, the proper value for that option is "OFF". With OFF, the dump contains neither the SET @@GLOBAL.gtid_purged statement nor the SET @@SESSION.sql_log_bin=0 statement, so MySQL group replication handles the GTIDs itself and replicates the restored data to all members as ordinary writes.
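
For comparison, a manual dump and restore along these lines side-steps the problem (a hypothetical invocation, assuming shell access to the units and suitable credentials; the flags other than --set-gtid-purged follow the consistency advice in the warning above):

# On a unit of the source cluster: dump without GTIDs, so the target
# cluster can assign its own.
mysqldump --set-gtid-purged=OFF --single-transaction \
    --triggers --routines --events \
    --databases keystone | gzip > /var/backups/mysql/keystone.sql.gz

# On the RW (primary) member of the target cluster: with sql_log_bin
# left enabled, the restored writes replicate to the secondaries.
zcat /var/backups/mysql/keystone.sql.gz | mysql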

[1] https://dev.mysql.com/doc/refman/8.0/en/mysqldump.html#option_mysqldump_set-gtid-purged

Changed in charm-mysql-innodb-cluster:
status: New → In Progress