Charm cannot scale out the cluster

Bug #1945002 reported by Peter Matulis
This bug affects 5 people
Affects: MySQL InnoDB Cluster Charm
Status: Fix Released
Importance: High
Assigned to: Unassigned

Bug Description

Scaling out the mysql-innodb-cluster application involves the Juju command `add-unit` and the mysql-innodb-cluster charm action 'add-instance'. On a freshly deployed (and functional) cloud with three database units, I attempted to add a fourth node. It did not work. The workload status of the new unit remained at:

"Instance not yet configured for clustering"

I tried both revisions 11 and 85 (via upgrade-charm) of the charm. Running the action 'update-unit-acls' available in Rev 85, followed by the action 'add-instance', had no effect. I did not expect it to, as all IP addresses are on the same subnet.

Please see attachment 'mysql-innodb-cluster-bug-notes.txt' for details.

I could not find anything of interest in the unit logs.

p.s. I was able to deploy a cloud with four database units out of the box.
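
For reference, the scale-out procedure described above (Juju `add-unit` followed by the charm action 'add-instance') can be sketched as the two CLI invocations below. This is only a sketch under assumptions: the `address=` parameter name, the `/leader` target, the `run-action --wait` form, and the example address 10.0.0.14 are not taken from this report and may differ between Juju and charm revisions.

```python
from typing import List

# Application name as used in this report.
APP = "mysql-innodb-cluster"


def scale_out_commands(new_unit_address: str, app: str = APP) -> List[List[str]]:
    """Build the argv lists for the two scale-out steps.

    Assumption: 'add-instance' is run on the leader and takes an
    'address=' parameter, mirroring 'remove-instance address=X.X.X.X'.
    """
    return [
        # Step 1: ask Juju for one more unit of the application.
        ["juju", "add-unit", app],
        # Step 2: ask the charm to join the new unit to the cluster.
        ["juju", "run-action", "--wait", f"{app}/leader",
         "add-instance", f"address={new_unit_address}"],
    ]


if __name__ == "__main__":
    # 10.0.0.14 is a hypothetical address for the new unit.
    for argv in scale_out_commands("10.0.0.14"):
        print(" ".join(argv))
```

The commands are only built, not executed, so they can be reviewed before being run against a real model.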

Tags: scale-out
Peter Matulis (petermatulis) wrote :
Jake Hill (routergod) wrote :

FWIW these are the error.log and the juju debug from my model where I also hit this problem. I was not able to deduce anything from them unfortunately.

Jake Hill (routergod) wrote :
tags: added: scale-out
Changed in charm-mysql-innodb-cluster:
status: New → Confirmed
importance: Undecided → High
milestone: none → 22.04
Peter Matulis (petermatulis) wrote :

As a datapoint, the following scenario works:

1. Started with a three-node lxd-based cluster
2. Removed one node (using action `remove-instance address=X.X.X.X` and command `juju remove-unit`)
3. Added one node (using command `juju add-unit`)
4. The new node got the same IP address as the previously removed node
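
The working scenario in steps 1 to 4 can be sketched as a command sequence. Again a sketch under assumptions: the report does not say which unit ran 'remove-instance' (the leader is assumed here), and the unit name and address in the usage example are hypothetical.

```python
from typing import List


def replace_node_commands(old_address: str, old_unit: str,
                          app: str = "mysql-innodb-cluster") -> List[List[str]]:
    """Build the argv lists for removing one node and adding a new one."""
    return [
        # Step 2a: remove the instance from the cluster metadata.
        # Assumption: the action is run on the leader unit.
        ["juju", "run-action", "--wait", f"{app}/leader",
         "remove-instance", f"address={old_address}"],
        # Step 2b: remove the Juju unit itself.
        ["juju", "remove-unit", old_unit],
        # Step 3: add a replacement unit.
        ["juju", "add-unit", app],
    ]


if __name__ == "__main__":
    # Hypothetical unit name and address, for illustration only.
    for argv in replace_node_commands("10.0.0.13", "mysql-innodb-cluster/2"):
        print(" ".join(argv))
```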

Peter Matulis (petermatulis) wrote :

Following up on comment #4, I added a fourth node to my scenario and that worked too.

Kirill Andrienko (andrico) wrote :

It looks like the issue is that the mysql.user table on the existing cluster nodes isn't updated with the new node's IP address.

I also got stuck on the 'Instance not yet configured for clustering' issue and, after several days of unproductive troubleshooting, finally realized what is going on. The new node tries unsuccessfully to log in to the existing cluster and ends up stuck in the 'Instance not yet configured for clustering' status.

After reviewing the mysql.user table, I realized that no 'clusteruser' had been created with the new node's IP address. I also noticed that old users for deleted instances had not been cleaned up.

mysql> select user,host from mysql.user where user like 'clusteruser';
+-------------+---------------+
| user        | host          |
+-------------+---------------+
| clusteruser | 10.69.157.162 |
| clusteruser | 10.69.157.163 |
| clusteruser | 10.69.157.164 |
| clusteruser | 10.69.157.165 |
| clusteruser | 10.69.157.166 |
| clusteruser | 10.69.157.167 |
| clusteruser | 10.69.157.206 |
| clusteruser | 10.69.157.207 |
| clusteruser | 10.69.157.208 |
| clusteruser | 10.69.157.236 |
| clusteruser | 10.69.157.237 |
| clusteruser | 10.69.157.238 |
| clusteruser | localhost     |
+-------------+---------------+
13 rows in set (0.00 sec)

So the table above shows 12 IP-based 'clusteruser' entries accumulated over previous scale-out attempts, but the app currently has only 3 units: .165, .167 and .206. Both the removal part and the add part of the charm need to be fixed.
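
The mismatch between the table and the current units is a plain set difference. A minimal sketch, assuming the abbreviated unit addresses expand to 10.69.157.165, 10.69.157.167 and 10.69.157.206, and that the 'localhost' entry should always be kept; the function name is made up for illustration:

```python
from typing import Iterable, List

# Last octets of the 'clusteruser' hosts listed in the query output above.
TABLE_OCTETS = [162, 163, 164, 165, 166, 167, 206, 207, 208, 236, 237, 238]
USER_HOSTS = [f"10.69.157.{n}" for n in TABLE_OCTETS] + ["localhost"]

# Assumption: ".165, .167, .206" expand to addresses in the same /24.
CURRENT_UNITS = ["10.69.157.165", "10.69.157.167", "10.69.157.206"]


def stale_clusteruser_hosts(user_hosts: Iterable[str],
                            unit_addresses: Iterable[str]) -> List[str]:
    """Return hosts in mysql.user that match no current unit."""
    keep = set(unit_addresses) | {"localhost"}
    return sorted(h for h in set(user_hosts) if h not in keep)


if __name__ == "__main__":
    stale = stale_clusteruser_hosts(USER_HOSTS, CURRENT_UNITS)
    print(len(stale), "stale entries:")
    for host in stale:
        print(" ", host)
```

With the data above, 9 of the 12 IP-based entries belong to units that no longer exist.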

So the workaround is to create the user with the right IP address just before adding the unit. After adding the user manually, the DB charm works like a charm :)
CREATE USER 'clusteruser'@'<new_node_ip_address>' IDENTIFIED BY '<cluster_password_you_have_to_acquire_from_leader>';
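
To avoid typos in the host part, the statement above can be generated from the new node's address. This is a sketch only: the helper name and the example address 10.69.157.209 are hypothetical, and the password is deliberately left as a placeholder to be filled in by hand.

```python
import ipaddress


def create_clusteruser_sql(new_node_ip: str,
                           password_placeholder: str = "<cluster_password>") -> str:
    """Return the CREATE USER statement for a new node's address.

    Raises ValueError if new_node_ip is not a valid IP address.
    """
    ip = ipaddress.ip_address(new_node_ip)  # validates the address
    return (f"CREATE USER 'clusteruser'@'{ip}' "
            f"IDENTIFIED BY '{password_placeholder}';")


if __name__ == "__main__":
    # 10.69.157.209 is a hypothetical address for the new node.
    print(create_clusteruser_sql("10.69.157.209"))
```

Validating the address first means a mistyped IP fails early instead of creating yet another unusable row in mysql.user.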

Kirill Andrienko (andrico) wrote :

The 'coordinator' list is also not updated: I see all units, both existing and non-existing, in the 'coordinator' list returned by 'juju run --unit mysql-innodb-cluster/leader leader-get'.

Alex Kavanagh (ajkavanagh) wrote :

This now works and is tested in gate. Therefore marking as fix released. If the bug still persists then please add new debugging information and re-open the bug.

Changed in charm-mysql-innodb-cluster:
status: Confirmed → Fix Released