Nova becomes slow after controller shutdown because one of the memcached servers is unavailable
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Mirantis OpenStack | Status tracked in 10.0.x | | |
10.0.x | Fix Committed | Medium | MOS Nova |
8.0.x | Won't Fix | Medium | MOS Nova |
9.x | Fix Committed | Medium | MOS Nova |
Bug Description
Fuel version info (7.0 build #298): http://
If some of the memcached instances used by Nova are unavailable, all Nova API operations take more than twice as long, and Horizon and the CLI respond slowly. For example, here are the results of Rally tests on a bare-metal lab:
# All (3) controller nodes online
+------
| Response Times (sec) |
+------
| action | min | median | 90%ile | 95%ile | max | avg | success | count |
+------
| nova.list_servers | 0.239 | 0.283 | 0.319 | 0.361 | 1.084 | 0.297 | 100.0% | 601 |
| total | 0.24 | 0.283 | 0.319 | 0.361 | 1.084 | 0.297 | 100.0% | 601 |
+------
Load duration: 60.2184360027
Full duration: 70.9516170025
Opening the 'instances' tab in Horizon takes ~2 seconds (1 instance is running)
# 1 controller node is down, 2 controllers are online
+------
| Response Times (sec) |
+------
| action | min | median | 90%ile | 95%ile | max | avg | success | count |
+------
| nova.list_servers | 0.228 | 0.356 | 3.257 | 3.285 | 6.003 | 1.421 | 100.0% | 121 |
| total | 0.228 | 0.356 | 3.257 | 3.285 | 6.003 | 1.422 | 100.0% | 121 |
+------
Load duration: 60.1724669933
Full duration: 104.123668909
Opening the 'instances' tab in Horizon takes ~12 seconds (1 instance is running). I extended logging in Apache, and it shows that request handling took >8 seconds:
192.168.0.2 - - [22/Sep/
# 1 controller node is down, memcached usage is disabled for Nova services (I just commented out the 'memcached_servers' setting in nova.conf on the live controllers and restarted the Nova services)
+------
| Response Times (sec) |
+------
| action | min | median | 90%ile | 95%ile | max | avg | success | count |
+------
| nova.list_servers | 0.217 | 0.253 | 0.55 | 1.277 | 2.669 | 0.39 | 100.0% | 460 |
| total | 0.217 | 0.253 | 0.55 | 1.277 | 2.669 | 0.391 | 100.0% | 460 |
+------
Load duration: 61.1688148975
Full duration: 89.9979431629
Opening the 'instances' tab in Horizon takes ~6 seconds (1 instance is running):
192.168.0.2 - - [22/Sep/
5.0 (X11; Ubuntu; Linux x86_64; rv:36.0) Gecko/20100101 Firefox/36.0" **5/5374409**
As you can see, the cloud API performance degradation is significant, and it is not caused by Keystone (Keystone also works a little slower, but that looks like expected behaviour; see bug #1405549). I tried to play with different options for memcached in nova.conf (e.g. memcache_
Steps to reproduce:
1. Deploy a cloud with 3 controllers
2. Run benchmark tests for the Nova API
3. Shut down one of the controllers
4. Run the benchmark tests for the Nova API again
5. Compare the benchmark test results
Expected result: there is no performance degradation after a controller shutdown, or the degradation is lower than 33%
Actual result: the degradation is about 80%
I also tested a case where 2 of 5 controller nodes are down, and the results were even worse: about 90% performance degradation in the Nova API, and the simple CLI command 'nova list' took up to 15 seconds.
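The degradation figures can be cross-checked against the Rally tables above. The following is a minimal sketch, assuming the degradation is measured as the drop in completed requests per second (request count divided by load duration); the report does not state how the 80% figure was derived, so that definition and the helper names are assumptions.

```python
# Rally figures copied from the tables in this report:
# (completed request count, load duration in seconds).
RUNS = {
    "all 3 controllers online": (601, 60.2184360027),
    "1 controller down": (121, 60.1724669933),
    "1 controller down, memcached disabled": (460, 61.1688148975),
}

BASELINE = "all 3 controllers online"


def throughput(name):
    """Completed requests per second for the named run."""
    count, load_duration = RUNS[name]
    return count / load_duration


def degradation_percent(name):
    """Throughput drop relative to the baseline run, in percent (assumed metric)."""
    return (1.0 - throughput(name) / throughput(BASELINE)) * 100.0


if __name__ == "__main__":
    for run_name in RUNS:
        print(
            f"{run_name}: {throughput(run_name):.2f} req/s, "
            f"degradation {degradation_percent(run_name):.1f}%"
        )
```

Under this definition the run with one controller down loses roughly 80% of its throughput, while the run with memcached disabled loses roughly 25%, which is inside the 33% bound from the expected result.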
Changed in fuel: | |
milestone: | none → 8.0 |
Changed in fuel: | |
importance: | Undecided → Medium |
status: | New → Confirmed |
assignee: | MOS Nova (mos-nova) → MOS Keystone (mos-keystone) |
tags: | added: long-haul-testing |
Changed in fuel: | |
assignee: | MOS Keystone (mos-keystone) → MOS Nova (mos-nova) |
tags: | added: area-mos |
Summary:
1. Nova doesn’t use memcached in MOS 7.0: nova.conf isn’t configured to use it
2. Keystone uses a 1 sec timeout and a 30 sec dead_retry for memcached
3. dead_retry is set to 30 sec because of Fuel requirements: Fuel shouldn’t have to wait 300 sec at installation time
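To illustrate how the timeout and dead_retry values from the summary interact, here is a toy model, not the actual memcached client logic. It assumes that every request is routed to a single dead server, that a probe blocks for the full socket timeout, and that the server is then skipped until dead_retry seconds have passed. The function and parameter names are hypothetical.

```python
def simulate_stalls(total_seconds, request_interval, socket_timeout, dead_retry):
    """Count requests that block on a dead server in a simplified model.

    Assumptions (not taken from any real client library):
    - every request is routed to the same dead server;
    - probing the server blocks for exactly `socket_timeout` seconds;
    - after a failed probe the server is skipped for `dead_retry` seconds.
    """
    now = 0.0
    dead_until = 0.0  # the server is probed again once the clock reaches this
    stalled = 0
    served = 0
    while now < total_seconds:
        if now >= dead_until:
            # The client probes the dead server and waits for the timeout.
            now += socket_timeout
            dead_until = now + dead_retry
            stalled += 1
        # Otherwise the server is still marked dead and is skipped at once.
        served += 1
        now += request_interval
    return stalled, served


if __name__ == "__main__":
    for retry in (30.0, 300.0):
        stalled, served = simulate_stalls(300.0, 0.5, 1.0, retry)
        print(f"dead_retry={retry:.0f}s: {stalled} of {served} requests stalled")
```

In this model a shorter dead_retry means the dead server is probed more often, so more requests pay the timeout; a longer dead_retry reduces the stalls but delays the moment a recovered server is used again.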
What needs to be done:
1. Validate that Nova (keystone client) doesn’t use memcached
2. Validate that the long delay (40 sec) comes from interactions with Keystone and isn’t internal to the Nova client
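The first validation step could start with a check of the configuration file. The sketch below scans every section of a nova.conf-style text for an active (uncommented) 'memcached_servers' option, the setting named earlier in this report. The sample content, section names, and host names are hypothetical, and a real check would read the file from the controller instead.

```python
import configparser

# Hypothetical sample; the hosts and sections are illustrative only.
SAMPLE_NOVA_CONF = """
[DEFAULT]
# memcached_servers = node-1:11211,node-2:11211

[keystone_authtoken]
memcached_servers = node-1:11211,node-2:11211
"""


def find_memcached_settings(conf_text, option="memcached_servers"):
    """Return {section: value} for every section that sets `option`."""
    parser = configparser.ConfigParser(
        # Treat [DEFAULT] as an ordinary section so that its values are
        # not inherited by (and reported for) every other section.
        default_section="__unused_default__",
        interpolation=None,
        strict=False,
    )
    parser.read_string(conf_text)
    return {
        section: parser.get(section, option)
        for section in parser.sections()
        if parser.has_option(section, option)
    }


if __name__ == "__main__":
    settings = find_memcached_settings(SAMPLE_NOVA_CONF)
    if settings:
        for section, servers in settings.items():
            print(f"[{section}] memcached_servers = {servers}")
    else:
        print("memcached_servers is not set in any section")
```

An empty result would support the claim that the service is not configured to use memcached; a non-empty one shows which section still points at the memcached hosts.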