Having wsrep_causal_reads enabled globally on the donor node potentially breaks SST

Bug #1398284 reported by Doug Barth
This bug affects 1 person
Affects: Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

We (PagerDuty) have wsrep_causal_reads enabled globally on all our cluster nodes. As we made changes to the cluster (adding new nodes to vertically scale it), we noticed that xtrabackup-based SST tended to fail near the end of the backup process with the following error:

DBD::mysql::db selectall_hashref failed: Lock wait timeout exceeded; try restarting transaction at /usr//bin/innobackupex line 3016.
innobackupex: Error:
Error executing 'SHOW STATUS': DBD::mysql::db selectall_hashref failed: Lock wait timeout exceeded; try restarting transaction at /usr//bin/innobackupex line 3016.
140605 22:35:15 innobackupex: Waiting for ibbackup (pid=6698) to finish

I originally posted about this issue on the mailing list: https://groups.google.com/forum/#!topic/percona-discussion/0X9Zeon3lpY

We tracked the problem to having wsrep_causal_reads enabled on the donor node. Disabling that setting on the donor node just before kicking off SST allows the xtrabackup SST process to complete successfully. We have followed this procedure for several SSTs (probably about 5) and have not experienced the issue since.
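The workaround above can be sketched as a small wrapper that turns the setting off on the donor for the duration of an SST and turns it back on afterwards. This is only an illustration, not the SST script's behavior: `causal_reads_disabled` and the `execute` callable are hypothetical names, `execute` is assumed to run one SQL statement against the donor node, and the sketch assumes the setting was ON beforehand. Note that SET GLOBAL only affects sessions opened after the change.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def causal_reads_disabled(execute: Callable[[str], None]) -> Iterator[None]:
    """Disable wsrep_causal_reads globally on the donor, then restore it.

    `execute` is a hypothetical callable that runs one SQL statement on the
    donor node (for example, a thin wrapper around a database cursor).
    Assumption: the variable was ON before, so it is set back to ON on exit.
    """
    execute("SET GLOBAL wsrep_causal_reads = OFF")
    try:
        # The caller starts the joiner node and waits for SST to finish here.
        yield
    finally:
        # Restore the setting even if the SST step raised an error.
        execute("SET GLOBAL wsrep_causal_reads = ON")
```

Passing `print` as `execute` gives a dry run: `with causal_reads_disabled(print): ...` prints the two statements around whatever runs inside the block.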

We were also able to determine that this issue only ever shows up during SST on a cluster under load. Our load test cluster performs SSTs on a weekly basis and has never experienced this issue. The production cluster is frequently affected.

We are in the process of removing the global setting of wsrep_causal_reads in our my.cnf file, but perhaps the SST script should explicitly disable causal reads on the donor node before taking the backup to avoid this issue altogether.

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

@Doug,

This shouldn't happen with the latest PXC/PXB combination, where backup locks are used. Can you provide the versions of the packages installed?

Doug Barth (dougbarth) wrote :

We're currently running these packages. Unfortunately, I can't confirm that this particular combination experiences the issue, since we now disable causal reads before starting a new node.

ii percona-toolkit 2.2.7-1~dfsg1 all Command-line tools for MySQL and system tasks
ii percona-xtrabackup 2.1.9-744-1.trusty amd64 Open source backup tool for InnoDB and XtraDB
ii percona-xtradb-cluster-client-5.5 5.5.39-25.11-816.trusty amd64 Percona XtraDB Cluster database client binaries
ii percona-xtradb-cluster-common-5.5 5.5.39-25.11-816.trusty all Percona XtraDB Cluster database common files (e.g. /etc/mysql/my.cnf)
ii percona-xtradb-cluster-galera-2.x 165-0ubuntu1 amd64 Synchronous multi-master replication plugin for transactional applications
ii percona-xtradb-cluster-server-5.5 5.5.39-25.11-816.trusty amd64 Percona XtraDB Cluster database server binaries

It looks like backup locks require MySQL 5.6? We are still on MySQL 5.5.

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

PXB 2.2.3 fixed the causal-reads issue with respect to SST that is described here:
https://bugs.launchpad.net/percona-xtrabackup/+bug/1320441

Upgrading PXB should help. Until then, you can use rsync as the SST method if
you want to keep causal reads enabled globally (which is not recommended).

 Regards,
 --
 Raghavendra Prabhu | http://about.me/raghavendra.prabhu
 Contact: http://wnohang.net/contact | GPG: 0xD72BE977


Alexey Kopytov (akopytov) wrote :

Marking as a duplicate of bug #1320441. Please reopen if the problem can be reproduced with PXB 2.2.3 or later versions.
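
To check whether an installed percona-xtrabackup package is at or above the release named in this thread, the package version string can be compared against 2.2.3. The following is a minimal sketch: `pxb_version` and `has_causal_reads_fix` are hypothetical helper names, and the parsing assumes the version string starts with three dot-separated numbers, as in the package listing above.

```python
import re
from typing import Tuple

# PXB release named in this report as containing the fix.
FIXED_IN = (2, 2, 3)


def pxb_version(package_version: str) -> Tuple[int, ...]:
    """Extract the leading MAJOR.MINOR.PATCH part of a package version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", package_version)
    if match is None:
        raise ValueError(f"unrecognized version: {package_version!r}")
    return tuple(int(part) for part in match.groups())


def has_causal_reads_fix(package_version: str) -> bool:
    """Return True if the version is 2.2.3 or later (numeric tuple comparison)."""
    return pxb_version(package_version) >= FIXED_IN
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as "2.2.10" sorting before "2.2.3".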
