Primary node leaving the cluster causes other nodes to crash

Bug #1323412 reported by Fernando Laudares Camargos
Affects: Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC; status tracked in 5.6)

  Series   Status         Importance   Assigned to
  5.5      Fix Released   High         Unassigned
  5.6      Fix Released   High         Unassigned

Bug Description

The environment is a cluster composed of three nodes: db1, db2 and db3. db2 was acting as the primary node when it warned of a gap in the state sequence and killed some local connections:

2014-05-26 14:42:25 27548 [Warning] WSREP: Gap in state sequence. Need state transfer.
2014-05-26 14:42:25 27548 [Warning] WSREP: last inactive check more than PT1.5S ago (PT2.42941S), skipping check
2014-05-26 14:42:27 27548 [Note] WSREP: killing local connection: 134143377
2014-05-26 14:42:46 27548 [Note] WSREP: killing local connection: 134129242
2014-05-26 14:42:46 27548 [Note] WSREP: killing local connection: 134143482
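
As an aside on the state transfer the node was heading for: whether a rejoining node can use the much cheaper IST instead of a full SST depends on the donor still holding the missing write-sets in its gcache ring buffer. A minimal sketch, assuming the default 128M gcache is the limiting factor (the report does not say):

# my.cnf -- illustrative value only, not from this cluster:
wsrep_provider_options="gcache.size=1G"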

It got overloaded a few more times; during one of these episodes the gcomm background thread stalled for around 20 seconds:

2014-05-26 14:42:49 27548 [Warning] WSREP: last inactive check more than PT1.5S ago (PT21.1142S), skipping check
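
The timers that govern suspicion and eviction are tunable through wsrep_provider_options. As an illustration only (these values are assumptions, not the reporter's settings), loosening the EVS timeouts gives a briefly stalled node more headroom before the group gives up on it, at the cost of slower detection of real failures:

# my.cnf on each node -- illustrative values, not from this cluster:
[mysqld]
wsrep_provider_options="evs.suspect_timeout=PT10S; evs.inactive_timeout=PT30S; evs.install_timeout=PT15S"

The "last inactive check" warning itself just means the gcomm thread ran its periodic liveness pass much later than evs.inactive_check_period expects -- a symptom of the stall rather than a cause.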

Because of that, the node dropped from the group while it was requesting an SST, and soon after it decided to abort, given that the SST was not possible:

2014-05-26 14:43:06 27548 [ERROR] WSREP: Requesting state transfer failed: -125(Operation canceled)
2014-05-26 14:43:06 27548 [ERROR] WSREP: State transfer request failed unrecoverably: 125 (Operation canceled). Most likely it is due to inability to communicate with the cluster primary component. Restart required.
2014-05-26 14:43:06 27548 [Note] WSREP: Closing send monitor...
2014-05-26 14:43:06 27548 [Note] WSREP: Closed send monitor.
2014-05-26 14:43:06 27548 [Note] WSREP: gcomm: terminating thread
2014-05-26 14:43:06 27548 [Note] WSREP: gcomm: joining thread
2014-05-26 14:43:06 27548 [Note] WSREP: gcomm: closing backend
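
Since the error states that a restart is required, bringing db2 back is a manual step; assuming the stock init-script layout of this RHEL6 package (a guess, not stated in the report), the node rejoins and requests state transfer again with:

# on db2, after the abort:
service mysql restart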

While aborting, it tried to leave the cluster gracefully by closing the gcomm connection, which caused some message exchange between the nodes and triggered a bug on db1 and db3, effectively crashing both nodes:

db1:
2014-05-26 14:43:21 2979 [Warning] WSREP: evs::proto(9bdb737e-df4a-11e3-87c9-eab020c42bd0, GATHER, view_id(REG,9bdb737e-df4a-11e3-87c9-eab020c42bd0,174)) install timer expired
2014-05-26 14:43:21 2979 [ERROR] WSREP: exception from gcomm, backend must be restarted: NodeMap::value(i).leave_message() == 0: (FATAL)

db3:
2014-05-26 14:43:21 30104 [Warning] WSREP: evs::proto(c7a2a117-daa4-11e3-8b73-863bb950f40a, GATHER, view_id(REG,9bdb737e-df4a-11e3-87c9-eab020c42bd0,174)) install timer expired
2014-05-26 14:43:21 30104 [ERROR] WSREP: exception from gcomm, backend must be restarted: NodeMap::value(i).leave_message() == 0: (FATAL)
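
To make the failure mode concrete, here is a simplified C++ sketch of the kind of invariant check behind that message. It is a reconstruction for illustration, not the actual gcomm source; the types and function below are invented. The point is that the membership re-validation throws a fatal exception instead of handling a member whose leave message arrived during the view change:

// Simplified illustration, not the actual gcomm source. When the EVS
// install timer expires in GATHER state, the known-node map is
// re-validated; a member still present in the map is expected to have
// no pending leave message. db2's graceful leave during the membership
// change breaks that expectation.
#include <map>
#include <stdexcept>

struct LeaveMessage { };

struct Node
{
    const LeaveMessage* leave_msg_;            // non-null once a leave was announced
    const LeaveMessage* leave_message() const { return leave_msg_; }
};

typedef std::map<int, Node> NodeMap;           // keyed by node UUID in the real code

void on_install_timer_expired(const NodeMap& known)
{
    for (NodeMap::const_iterator i = known.begin(); i != known.end(); ++i)
    {
        // The failing assertion from the log: rather than tolerating a
        // member that left mid-transition, the check throws, and the
        // backend treats the exception as unrecoverable ("backend must
        // be restarted") -- which is what crashed db1 and db3.
        if (i->second.leave_message() != 0)
        {
            throw std::logic_error("NodeMap::value(i).leave_message() == 0: (FATAL)");
        }
    }
}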

This happened with Percona-XtraDB-Cluster-galera-3-3.5-1.216.rhel6.x86_64, built with Galera 25.3.4.

I've also opened a bug on Galera's GitHub as suggested by Teemu: https://github.com/codership/galera/issues/41

Tags: i42454
Shahriyar Rzayev (rzayev-sehriyar) wrote:

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXC-1005
