Issue with signals and wsrep_sst_* scripts

Bug #1190787 reported by Raghavendra D Prabhu
This bug affects 3 people

Affects                       Status                 Importance  Assigned to
MySQL patches by Codership    New                    Undecided   Unassigned
  5.5                         Won't Fix              Undecided   Unassigned
  5.6                         Won't Fix              Undecided   Unassigned
Percona XtraDB Cluster        Status tracked in 5.6
  5.5                         Confirmed              Low         Unassigned
  5.6                         Fix Committed          Undecided   Unassigned

Percona XtraDB Cluster has moved to https://jira.percona.com/projects/PXC

Bug Description

I will add more details on this as I get more info.

However, what I have seen is that if and when SST gets stuck (or
under a few other conditions), mysqld shuts down but wsrep_sst_*,
xbstream, etc. remain running. They can't be killed with SIGTERM
either; SIGKILL is required in the end.

I surmise that this may be due to signal handling in the posix_spawn
call used by wsrep.
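The surmise is plausible on general POSIX grounds: unless the spawn attributes reset them (POSIX_SPAWN_SETSIGMASK, POSIX_SPAWN_SETSIGDEF), a posix_spawn child inherits the parent's signal mask, and signals the parent ignores stay ignored in the child. A child that starts with SIGTERM blocked or ignored shrugs off SIGTERM, while SIGKILL, which cannot be blocked or ignored, still works. Whether that is what happens with mysqld here is not established in this thread. The bash demo below only covers the "ignored" case, because a shell cannot hand a blocked mask to its children; `sleep` stands in for an SST helper and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Demo only (not PXC code): a child that starts with SIGTERM ignored
# survives SIGTERM but not SIGKILL. `sleep` stands in for an SST helper.

demo_sigterm_vs_sigkill() {
  # The subshell ignores SIGTERM and then execs sleep; an ignored
  # disposition survives exec, so sleep starts with SIGTERM ignored.
  ( trap '' TERM; exec sleep 30 ) &
  local pid=$!
  sleep 0.5                      # let the subshell install the ignore and exec

  kill -TERM "$pid"
  sleep 0.5
  if kill -0 "$pid" 2>/dev/null; then
    echo "still running after SIGTERM"
  else
    echo "exited after SIGTERM"
  fi

  kill -KILL "$pid"              # SIGKILL cannot be caught, blocked or ignored
  wait "$pid"
  echo "exit status after SIGKILL: $?"   # 128 + signal number 9
}

demo_sigterm_vs_sigkill
```

The final status, 137 (128 + 9), is the same number innobackupex reports after being killed with kill -9 further down in this thread.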

Changed in percona-xtradb-cluster:
milestone: none → 5.5.31-25

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

To add, here is what happens:

mysqld on the joiner spawns wsrep_sst_xtrabackup, which in turn spawns
netcat, xbcrypt and xbstream.

Now when xbcrypt dies, the whole setup hangs.

This may be a different bug altogether though.
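For comparison, in an ordinary local pipeline a middle stage that dies does not by itself hang everything: the stage writing into it is killed by SIGPIPE on its next write (or gets EPIPE if SIGPIPE is ignored), and the stage reading from it sees end-of-file. A stage that is not currently writing, for example one still waiting on the network, gets no SIGPIPE until it writes. Bash's PIPESTATUS array, which the SST script itself captures (see the `RC=( ${PIPESTATUS[@]} )` in the joiner log further down), records every stage's status once all of them have exited. In this sketch `yes`, `false` and `cat` are hypothetical stand-ins for the streamer, xbcrypt and xbstream; it does not reproduce the reported hang.

```shell
#!/usr/bin/env bash
# Sketch: record every stage's exit status with PIPESTATUS.
# yes | false | cat stand in for streamer | xbcrypt | xbstream;
# `false` plays the middle stage that dies.

run_demo_pipeline() {
  yes | false | cat > /dev/null
  # Copy PIPESTATUS immediately; the next command overwrites it.
  stage_status=( "${PIPESTATUS[@]}" )
}

describe_status() {
  # In bash, a status above 128 means "killed by signal (status - 128)".
  if [ "$1" -gt 128 ]; then
    echo "killed by signal $(( $1 - 128 ))"
  else
    echo "exited with $1"
  fi
}

run_demo_pipeline
names=( streamer decryptor extractor )
for i in "${!stage_status[@]}"; do
  echo "${names[$i]}: $(describe_status "${stage_status[$i]}")"
done
```

With default signal handling the writer typically reports 141 (128 + 13, SIGPIPE) even though it did nothing wrong, so the first non-zero status is not necessarily the stage that failed first.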

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

I see the issue mentioned in the bug description on the donor: when the joiner hangs and I send SIGQUIT to the donor, it quits with

=================
130614 6:22:49 InnoDB: Shutdown completed; log sequence number 9618707
130614 6:22:49 [ERROR] Plugin 'InnoDB' has ref_count=1 after shutdown.
130614 6:22:49 [Note] /pxc/bin/mysqld: Shutdown complete

Error in my_thread_global_end(): 1 threads didn't exit
==================

It leaves behind:

mysql 2453 0.0 0.2 147764 17832 pts/21 S 06:17 0:00 perl /usr/bin/innobackupex --galera-info --stream=xbstream --defaults-file=/pxc/etc/my.cnf.local --socket=/pxc/datadir/pxc.sock --user=root --password=test --encrypt=AES256 --encrypt-key=6F3AD9F428143F133FD7D50D77D91EA4 /tmp
mysql 2464 0.0 0.0 269048 5764 pts/21 Sl 06:17 0:00 xtrabackup_55 --defaults-file=/pxc/etc/my.cnf.local --defaults-group=mysqld --backup --suspend-at-end --target-dir=/tmp --tmpdir=/tmp --encrypt=AES256 --encrypt-key=6F3AD9F428143F133FD7D50D77D91EA4 --encrypt-threads=1 --stream=xbstream

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

After I kill -9 innobackupex, it displays

WSREP_SST: [ERROR] innobackupex finished with error: 137. Check /pxc/datadir//innobackup.backup.log (20130614 06:25:37.905)

on the terminal, along with this:

xtrabackup: innodb_log_files_in_group = 2
xtrabackup: innodb_log_file_size = 20971520
xtrabackup: using O_DIRECT
130614 6:17:57 InnoDB: Warning: allocated tablespace 10, old maximum was 0
>> log scanned up to (9618553)
[01] Encrypting and streaming ./ibdata1
^Gxtrabackup_55: Error writing file 'UNOPENED' (Errcode: 32)
encrypt: write to the destination file failed.
xb_stream_write_data() failed.
>> log scanned up to (9618553)
>> log scanned up to (9618553)
>> log scanned up to (9618553)
>> log scanned up to (9618553)
>> log scanned up to (9618553)
>> log scanned up to (9618553)
>> log scanned up to (9618553)
>> log scanned up to (9618553)
>> log scanned up to (9618553)

This is probably an xtrabackup bug.

Changed in percona-xtradb-cluster:
importance: Undecided → Low
Changed in percona-xtradb-cluster:
milestone: 5.5.33-23.7.6 → future-5.5

Anthony Somerset (anthonysomerset) wrote :

I can confirm similar issues. I was able to solve them by downgrading percona-xtrabackup to percona-xtrabackup-20 on Debian Wheezy; after that, SST works correctly, as it should.

Amol (ajkedar) wrote :

Hi, this bug affects us in production and we would like to know when it will be fixed.
Or is it already fixed in 5.5.33?

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

@Anthony,

If downgrading xtrabackup helped, then it may be a different issue, since this bug is not related to xtrabackup per se. Please report it separately with logs. Also, which version of XtraBackup did you downgrade from?

@Amol,

Again, we need details on this. In general, this should affect you only if SST gets stuck because of a bug in SST/xtrabackup.

Also, PXB now has the 'xtrabackup alive after innobackupex death' bug fixed.

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

The last bug fix mentioned above is https://bugs.launchpad.net/percona-xtrabackup/+bug/1135441, which was fixed and released in 2.1.5.

Przemek (pmalkowski) wrote :

I can confirm this still happens in the latest PXC versions. For example, when we forget to open the SST TCP port (4444), the joiner does not clean up its processes after the failed SST attempt.

percona33 mysql> select @@version,@@version_comment;
+--------------------+---------------------------------------------------------------------------------------------------+
| @@version | @@version_comment |
+--------------------+---------------------------------------------------------------------------------------------------+
| 5.6.20-68.0-56-log | Percona XtraDB Cluster (GPL), Release rel68.0, Revision 888, WSREP version 25.7, wsrep_25.7.r4126 |
+--------------------+---------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

[root@percona33 ~]# iptables -I INPUT -p tcp --dport 4444 -j REJECT

[root@percona33 ~]# service mysql stop
Shutting down MySQL (Percona XtraDB Cluster).... SUCCESS!
[root@percona33 ~]# rm -f /var/lib/mysql/grastate.dat

[root@percona33 ~]# service mysql start
Starting MySQL (Percona XtraDB Cluster)...State transfer in progress, setting sleep higher
. ERROR! The server quit without updating PID file (/var/lib/mysql/percona33.pid).
 ERROR! MySQL (Percona XtraDB Cluster) server startup failed!

-- in the error log on the joiner:
2014-09-08 11:01:09 16425 [Note] WSREP: New cluster view: global state: c3b203a1-3435-11e4-aa44-9605577e3230:0, view# 5: Primary, number of nodes: 3, my index: 0, protocol version 3
2014-09-08 11:01:09 16425 [Warning] WSREP: Gap in state sequence. Need state transfer.
2014-09-08 11:01:09 16425 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.4.40' --auth 'root:' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --parent '16425' --binlog 'percona33-bin' '
WSREP_SST: [INFO] Streaming with xbstream (20140908 11:01:09.931)
WSREP_SST: [INFO] Using socat as streamer (20140908 11:01:09.933)
WSREP_SST: [INFO] Evaluating timeout 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20140908 11:01:10.679)
2014-09-08 11:01:10 16425 [Note] WSREP: Prepared SST request: xtrabackup-v2|192.168.4.40:4444/xtrabackup_sst
2014-09-08 11:01:10 16425 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2014-09-08 11:01:10 16425 [Note] WSREP: REPL Protocols: 6 (3, 2)
2014-09-08 11:01:10 16425 [Note] WSREP: Service thread queue flushed.
2014-09-08 11:01:10 16425 [Note] WSREP: Assign initial position for certification: 0, protocol version: 3
2014-09-08 11:01:10 16425 [Note] WSREP: Service thread queue flushed.
2014-09-08 11:01:10 16425 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (c3b203a1-3435-11e4-aa44-9605577e3230): 1 (Operation not permitted)
         at galera/src/replicator_str.cpp:prepare_for_IST():455. IST will be unavailable.
2014-09-08 11:01:10 16425 [Note] WSREP: Member 0.0 (percona33) requested state transfer from '*any*'. Selected 1.0 (percona22)(SYNCED) as donor.
2014-09-08 11:...

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

This has been addressed in recent fixes; marking as Fix Committed.

Alex Yurchenko (ayurchen) wrote :

As has been noted, posix_spawn does not appear to provide a way to pass a signal from parent to child. The alternative fix, https://bugs.launchpad.net/percona-xtradb-cluster/+bug/1382797, seems to be non-standard and Linux-specific.

This leaves us with the SST script needing to watch the parent process status via the supplied PID and take appropriate action when the parent dies.
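A minimal bash sketch of that approach, assuming the script knows its parent's PID (the joiner log above shows mysqld passing it as --parent '16425') and the PIDs of the helpers it started. The function names and the polling interval are hypothetical, not taken from the actual wsrep_sst_* scripts. `kill -0` sends no signal; it only checks that the PID exists and may be signalled. Two caveats: a recycled PID would look like a live parent, and `kill -0` also fails on a permission error, which should not arise when mysqld and the script run as the same user.

```shell
#!/usr/bin/env bash
# Sketch of a parent watchdog for an SST-style script (hypothetical names).
# Assumes the parent's PID is known, e.g. from a --parent argument.

# Succeeds while the PID exists and may be signalled; sends no signal.
parent_alive() {
  kill -0 "$1" 2>/dev/null
}

# Poll the parent every $2 seconds; once it is gone, send SIGTERM to the
# remaining arguments (PIDs of helpers this script started).
watch_parent() {
  local parent=$1 interval=$2
  shift 2
  while parent_alive "$parent"; do
    sleep "$interval"
  done
  if [ "$#" -gt 0 ]; then
    kill -TERM "$@" 2>/dev/null
  fi
  return 0
}
```

In a script the watchdog would run in the background next to the transfer, for instance `watch_parent "$parent_pid" 1 "$helper_pid" &`. Polling with `kill -0` is portable, which fits the concern about Linux-only behaviour; the cost is up to one interval of delay, and helpers that ignore SIGTERM would still need a SIGKILL follow-up.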

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

@Alex,

It is possible for us to add a Linux #ifdef to the fix for lp:1382797, so that it works as expected on Linux and as before elsewhere.

Alexey Kopytov (akopytov) wrote :

Shouldn't this bug be marked as a duplicate of bug #1382797?

Alex Yurchenko (ayurchen) wrote :

Raghu,

I think it is a question for Alexey (akopytov) first. But as far as I'm concerned:
1) as I understand it, that will be a very big ifdef;
2) it will make the requirements for SST scripts differ between Linux and other platforms.
The latter consideration kinda kills the idea for me.

And yes, this and lp:1382797 must be duplicates.

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

Marked as duplicate as requested.

But yes, even if we add an ifdef, some bits need to be in the non-Linux code as well, setsid being one of them.
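setsid() makes the calling process the leader of a new session and of a new process group, which lets the SST script and everything it starts be signalled together by sending to the negative process-group ID. The bash sketch below shows that group-wide kill; to stay within plain bash it gets the separate process group from job control (`set -m`) rather than from setsid itself, and `bash -c 'sleep 60 & sleep 60 & wait'` is a hypothetical stand-in for the SST script and its helpers.

```shell
#!/usr/bin/env bash
# Sketch: run a process tree in its own process group and stop it with a
# single signal. Uses bash job control instead of setsid(); the tree is a
# stand-in for an SST script and its helpers.

set -m   # job control: each background job is placed in a new process group

start_tree() {
  # Stand-in for an SST script that starts two helpers and waits for them.
  bash -c 'sleep 60 & sleep 60 & wait' &
  tree_pgid=$!   # with job control on, the job's PGID equals its leader's PID
}

stop_tree() {
  # A negative PID addresses every process in the group, grandchildren included.
  kill -TERM -- "-$1"
}
```

On the mysqld side, the comparable step would be taken when the child is spawned: standard posix_spawn offers POSIX_SPAWN_SETPGROUP for a new process group, and glibc 2.26 and later also provide POSIX_SPAWN_SETSID.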
