cs:xenial/percona-cluster-247 startup race condition: MySQL server has gone away
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Landscape Server | Fix Released | High | David Britton |
OpenStack Percona Cluster Charm | Fix Released | High | David Ames |
percona-cluster (Juju Charms Collection) | Invalid | High | David Ames |
Bug Description
Found this in the mysql unit logs:
2017-01-04 20:06:13 INFO juju-log shared-db:71: Writing file /var/lib/
2017-01-04 20:06:13 INFO shared- [identical truncated line repeated 20 times]
2017-01-04 20:06:13 ERROR juju.worker.
Getting on the node wasn't a lot more help... The service can't be restarted because of already existing processes, and killing those just yields different errors. Feels like something has gotten into a pretty bad bind.
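A typical triage sequence for this state looks something like the following (a sketch, assuming systemd on xenial and stock service/process names; the sst_in_progress marker path is an assumption based on how PXC's SST scripts behave):

# What does systemd think the state is?
systemctl status mysql
# Look for leftover mysqld / SST helper processes holding the datadir
ps -ef | grep -E 'mysqld|wsrep_sst|socat' | grep -v grep
# An interrupted SST can leave its in-progress marker behind in the datadir
ls -l /var/lib/mysql/sst_in_progress 2>/dev/null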
Now, it looks like it originally hit this error:
WSREP_SST: [INFO] Streaming with xbstream (20170104 20:05:55.435)
WSREP_SST: [INFO] Using socat as streamer (20170104 20:05:55.453)
WSREP_SST: [INFO] Using /tmp/tmp.TN5S6deAAd as innobackupex temporary directory (20170104 20:05:56.241)
WSREP_SST: [INFO] Streaming GTID file before SST (20170104 20:05:56.295)
WSREP_SST: [INFO] Evaluating xbstream -c ${INFO_FILE} | socat -u stdio TCP:10.5.1.90:4444; RC=( ${PIPESTATUS[@]} ) (20170104 20:05:56.320)
WSREP_SST: [INFO] Sleeping before data transfer for SST (20170104 20:05:56.378)
WSREP_SST: [INFO] Streaming the backup to joiner at 10.5.1.90 4444 (20170104 20:06:06.408)
WSREP_SST: [INFO] Evaluating innobackupex --defaults-
20:06:11 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona XtraDB Cluster better by reporting any
bugs at https:/
key_buffer_
read_buffer_
max_used_
max_threads=25002
thread_count=4
connection_count=2
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x171a450
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f945c031e58 thread_stack 0x30000
/usr/sbin/
/usr/sbin/
/lib/x86_
[0x7f92080000a8]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): is an invalid pointer
Connection ID (thread ID): 188
Status: KILL_CONNECTION
You may download the Percona XtraDB Cluster operations manual by visiting
http://
You may find information in the manual which will help you identify the cause of the crash.
WSREP_SST: [ERROR] innobackupex finished with error: 9. Check /var/lib/
WSREP_SST: [ERROR] Cleanup after exit with status:22 (20170104 20:06:11.747)
WSREP_SST: [INFO] Cleaning up temporary directories (20170104 20:06:11.757)
After talking with thedac, it seems our max-connections setting is too high for xenial (25000). thedac and beisner suggested 2000 as a default; I suggest the charm should ship that as well.
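To put numbers on that, here is the crash dump's own formula evaluated with stock 5.6 buffer defaults (an assumption; the real values are truncated in the log above), plus a workaround until the charm fix lands (assumes a Juju 2.x client and an application named percona-cluster):

# Worst-case memory per the crash-dump formula, with stock 5.6 defaults
# (key_buffer_size=8M, read_buffer_size=128K, sort_buffer_size=256K):
#   8M + (128K + 256K) * 25002 ~= 9.2 GiB
#   8M + (128K + 256K) * 2000  ~= 0.75 GiB
# Lower the ceiling on a deployed application:
juju config percona-cluster max-connections=2000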
Changed in landscape:
milestone: none → 16.12
tags: added: landscape
Changed in landscape:
status: New → In Progress
assignee: nobody → David Britton (davidpbritton)
importance: Undecided → High
Changed in landscape:
status: In Progress → Fix Committed
Changed in landscape:
milestone: 16.12 → 17.01
Changed in landscape:
status: Fix Committed → Fix Released
Changed in charm-percona-cluster:
assignee: nobody → David Ames (thedac)
importance: Undecided → High
status: New → In Progress
Changed in percona-cluster (Juju Charms Collection):
status: In Progress → Invalid
Changed in charm-percona-cluster:
status: In Progress → Fix Committed
Changed in charm-percona-cluster:
milestone: 17.05 → 17.08
Changed in charm-percona-cluster:
status: Fix Committed → Fix Released
Theory: I think it's very specifically a 5.5-to-5.6 thing, not necessarily the init system or server release. performance_schema flipped its default value (to ON in 5.6.6), and it now grabs its memory in advance [1].
So with 5.5 there was no downside to setting max connections way higher than it needed to be. With 5.6 you pay for that headroom in up-front memory allocation (or potentially an OOM kill, which sends you on a wild goose chase of unrelated-looking errors).
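This is easy to verify on a given unit; both the setting and the pre-allocated memory are visible with stock mysqld statements on 5.6:

mysql> SHOW VARIABLES LIKE 'performance_schema';
mysql> SHOW ENGINE PERFORMANCE_SCHEMA STATUS;

The final performance_schema.memory row from the second statement reports the total bytes allocated at startup; with max_connections at 25000 it should be dramatically larger than with 2000.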
I'd lean toward keeping the charm's low default max-connections so that it is usable for non-OpenStack contexts out of the box. Exposing a config flag to flip the performance_schema setting might be worth exploring (see the sketch after the reference below). Failing that, 5.6 users will have to get nitty-gritty about tuning their DB settings to the resources available.
[1] http://dev.mysql.com/doc/relnotes/mysql/5.6/en/news-5-6-6.html
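If the charm did grow such a flag, the rendered mysqld config change would amount to something like this (a sketch, not current charm output; both options are standard mysqld settings, and 2000 is the default suggested above):

[mysqld]
# Trade performance_schema instrumentation for the 5.5-era memory profile
performance_schema = OFF
max_connections = 2000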