Too few wsgi workers for keystone

Bug #1556102 reported by Leontiy Istomin
Affects: Mirantis OpenStack
Status: Invalid
Importance: Undecided
Assigned to: MOS Keystone
Milestone: 8.0-updates

Bug Description

We ran into an issue during the boot_and_list Rally scenario: http://paste.openstack.org/show/490154/
We ran Rally on a dedicated hardware node with 48 CPUs and 256 GB of RAM:
rally@6bb2d5d3882c:~$ rally --version
0.3.2~dev84
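
For reference, boot_and_list is driven by a Rally task file along these lines. The actual task file isn't attached to this report, so the flavor, image, and runner numbers below are placeholders:

{
    "NovaServers.boot_and_list_server": [
        {
            "args": {
                "flavor": {"name": "m1.tiny"},
                "image": {"name": "TestVM"},
                "detailed": true
            },
            "runner": {
                "type": "constant",
                "times": 1000,
                "concurrency": 50
            },
            "context": {
                "users": {"tenants": 5, "users_per_tenant": 2}
            }
        }
    ]
}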

We got a 500 HTTP error:
from the rally log: http://paste.openstack.org/show/490147/
The error came from nova-api:
the 500 HTTP code at that time can be found in the haproxy log: http://paste.openstack.org/show/490146/
the nova log on the node identified from the haproxy log: http://paste.openstack.org/show/490145/
we can't find the keystone node that served the request in the haproxy log: http://paste.openstack.org/show/490150/
There is no error in the keystone logs at the same time. But I've noticed that there are only 6 processes for keystone wsgi workers: http://paste.openstack.org/show/490153/
And it seems all of the workers were overloaded:
atop from node-1 controller: http://paste.openstack.org/show/490156/
atop from node-2 controller: http://paste.openstack.org/show/490158/
atop from node-3 controller: http://paste.openstack.org/show/490159/
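
For context, the keystone worker count comes from the mod_wsgi daemon-process settings in the Apache vhost that fronts keystone. A minimal sketch of the relevant directives (the file path, process-group name, and script alias below are illustrative of a typical puppet-deployed layout, not copied from this environment):

# e.g. /etc/apache2/sites-enabled/25-keystone_wsgi_main.conf
<VirtualHost *:5000>
    # processes=6 matches the count observed above; raising processes
    # (and/or threads) is the knob for adding keystone wsgi workers.
    WSGIDaemonProcess keystone_main user=keystone group=keystone \
        processes=6 threads=1 display-name=%{GROUP}
    WSGIProcessGroup keystone_main
    WSGIScriptAlias / /usr/lib/cgi-bin/keystone/main
</VirtualHost>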

Environment description:
3 controllers, 20 computes+ceph, 176 computes, Ceph_for_all, vxlan

[root@fuel ~]# cat /etc/fuel/8.0/version.yaml
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "8.0"
  api: "1.0"
  build_number: "570"
  build_id: "570"
  fuel-nailgun_sha: "558ca91a854cf29e395940c232911ffb851899c1"
  python-fuelclient_sha: "4f234669cfe88a9406f4e438b1e1f74f1ef484a5"
  fuel-agent_sha: "658be72c4b42d3e1436b86ac4567ab914bfb451b"
  fuel-nailgun-agent_sha: "b2bb466fd5bd92da614cdbd819d6999c510ebfb1"
  astute_sha: "b81577a5b7857c4be8748492bae1dec2fa89b446"
  fuel-library_sha: "c2a335b5b725f1b994f78d4c78723d29fa44685a"
  fuel-ostf_sha: "3bc76a63a9e7d195ff34eadc29552f4235fa6c52"
  fuel-mirror_sha: "fb45b80d7bee5899d931f926e5c9512e2b442749"
  fuelmenu_sha: "78ffc73065a9674b707c081d128cb7eea611474f"
  shotgun_sha: "63645dea384a37dde5c01d4f8905566978e5d906"
  network-checker_sha: "a43cf96cd9532f10794dce736350bf5bed350e9d"
  fuel-upgrade_sha: "616a7490ec7199f69759e97e42f9b97dfc87e85b"
  fuelmain_sha: "d605bcbabf315382d56d0ce8143458be67c53434"

!!! NOTE: RabbitMQ was replaced by ZeroMQ in this installation. RabbitMQ wasn't actually removed, but ZeroMQ was installed and oslo.messaging worked via ZeroMQ.
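
For context, in the Liberty-era oslo.messaging used by MOS 8.0 the switch to ZeroMQ is made per service in each project's config file. A minimal sketch, with a hypothetical Redis matchmaker address (exact matchmaker options vary between oslo.messaging releases):

[DEFAULT]
# Select the ZeroMQ driver instead of the default RabbitMQ one.
rpc_backend = zmq
# Name under which this host is addressed over zmq.
rpc_zmq_host = node-1

[matchmaker_redis]
# Hypothetical address of the Redis instance the zmq driver uses
# for name resolution.
host = 192.168.0.2
port = 6379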

I will try to generate a Diagnostic Snapshot, but there is a bug which may prevent that (https://bugs.launchpad.net/fuel/+bug/1546023).

Logs from controller nodes are here: http://mos-scale-share.mirantis.com/bug-1556102.tar.gz

Tags: scale
description: updated
Revision history for this message
Ilya Shakhat (shakhat) wrote :

According to keystone.log, the average API request takes >20 seconds, e.g.:
<46>Mar 11 12:48:03 node-1 keystone_wsgi_main_access 192.168.0.103 - - [11/Mar/2016:12:47:31 +0000] "GET /v3/auth/tokens HTTP/1.1" 200 7649 22387336 "-" "python-keystoneclient"
<46>Mar 11 12:48:03 node-1 keystone_wsgi_main_access 192.168.0.103 - - [11/Mar/2016:12:47:43 +0000] "POST /v2.0/tokens HTTP/1.1" 200 4577 10758361 "-" "neutron/7.0.2 keystonemiddleware.auth_token/2.3.2"
<46>Mar 11 12:48:03 node-1 keystone_wsgi_main_access 192.168.0.103 - - [11/Mar/2016:12:47:32 +0000] "GET /v3/auth/tokens HTTP/1.1" 200 7649 22235663 "-" "python-keystoneclient"
<46>Mar 11 12:48:03 node-1 keystone_wsgi_main_access 192.168.0.101 - - [11/Mar/2016:12:47:33 +0000] "GET /v3/auth/tokens HTTP/1.1" 200 7649 21911819 "-" "python-keystoneclient"
<46>Mar 11 12:48:03 node-1 keystone_wsgi_main_access 192.168.0.103 - - [11/Mar/2016:12:47:32 +0000] "GET /v3/auth/tokens HTTP/1.1" 200 7649 23379620 "-" "python-keystoneclient"
<46>Mar 11 12:48:03 node-1 keystone_wsgi_main_access 192.168.0.28 - - [11/Mar/2016:12:47:35 +0000] "GET /v3/auth/tokens HTTP/1.1" 200 7649 22400081 "-" "python-keystoneclient"
<46>Mar 11 12:48:03 node-1 keystone_wsgi_main_access 192.168.0.28 - - [11/Mar/2016:12:47:35 +0000] "GET /v3/auth/tokens HTTP/1.1" 200 7649 23467167 "-" "python-keystoneclient"
<46>Mar 11 12:48:03 node-1 keystone_wsgi_main_access 192.168.0.28 - - [11/Mar/2016:12:47:36 +0000] "GET /v3/auth/tokens HTTP/1.1" 200 7649 22843466 "-" "python-keystoneclient"
<46>Mar 11 12:48:03 node-1 keystone_wsgi_main_access 192.168.0.103 - - [11/Mar/2016:12:47:36 +0000] "GET /v3/auth/tokens HTTP/1.1" 200 7649 23528153 "-" "python-keystoneclient"
<46>Mar 11 12:48:03 node-1 keystone_wsgi_main_access 192.168.0.28 - - [11/Mar/2016:12:47:37 +0000] "GET /v3/auth/tokens HTTP/1.1" 200 7649 23840234 "-" "python-keystoneclient"
<46>Mar 11 12:48:03 node-1 keystone_wsgi_main_access - - - [11/Mar/2016:12:47:48 +0000] "GET /v3 HTTP/1.0" 500 251 12117172 "-" "-"
<46>Mar 11 12:48:03 node-1 keystone_wsgi_main_access 192.168.0.101 - - [11/Mar/2016:12:47:39 +0000] "GET /v3/auth/tokens HTTP/1.1" 200 7687 22441397 "-" "python-keystoneclient"
<46>Mar 11 12:48:03 node-1 keystone_wsgi_main_access - - - [11/Mar/2016:12:47:49 +0000] "GET /v3 HTTP/1.0" 500 251 12886038 "-" "-"
<46>Mar 11 12:48:03 node-1 keystone_wsgi_main_access 192.168.0.28 - - [11/Mar/2016:12:47:38 +0000] "GET /v3/auth/tokens HTTP/1.1" 200 7649 23778944 "-" "python-keystoneclient"
<46>Mar 11 12:48:03 node-1 keystone_wsgi_main_access - - - [11/Mar/2016:12:47:49 +0000] "GET /v3 HTTP/1.0" 500 251 13210862 "-" "-"
<46>Mar 11 12:48:03 node-1 keystone_wsgi_main_access 192.168.0.101 - - [11/Mar/2016:12:47:41 +0000] "GET /v3/auth/tokens HTTP/1.1" 200 7649 21187830 "-" "python-keystoneclient"
<46>Mar 11 12:48:03 node-1 keystone_wsgi_main_access 192.168.0.101 - - [11/Mar/2016:12:47:40 +0000] "GET /v3/auth/tokens HTTP/1.1" 200 7649 22641118 "-" "python-keystoneclient"
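
The large integer after the byte count looks like the response time in microseconds (Apache's %D log field), i.e. every request above took roughly 21-24 seconds. A quick sanity check over a whole access log (a sketch that assumes exactly the log format shown above):

awk -F'"' '
    # After splitting on double quotes, $3 holds " <status> <bytes> <usec> ";
    # the third number is assumed to be Apache %D (microseconds).
    split($3, f, " ") >= 3 && f[3] ~ /^[0-9]+$/ {
        v = f[3] + 0; sum += v; n++; if (v > max) max = v
    }
    END {
        if (n) printf "requests: %d  avg: %.1f s  max: %.1f s\n",
                      n, sum / n / 1e6, max / 1e6
    }
' keystone_wsgi_main_access.log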

description: updated
Revision history for this message
Boris Bobrov (bbobrov) wrote :

6 processes has been enough for all of our scale labs.

The problem is that the remote environment is not owned by Mirantis and access to it is very complicated. Could you please post the keystone logs from /var/log/keystone/* from all controllers?
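
A minimal way to grab those, assuming the usual MOS setup where the Fuel master has passwordless ssh to the controllers (node names taken from the logs above):

for node in node-1 node-2 node-3; do
    # Pack /var/log/keystone from each controller into a local tarball.
    ssh "$node" 'tar czf - /var/log/keystone' > "keystone-logs-$node.tar.gz"
done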

Changed in mos:
status: New → Incomplete
description: updated
summary: - too few wsgi workers for keystone
+ Too few wsgi workers for keystone
Revision history for this message
Leontiy Istomin (listomin) wrote :

rally.log is attached

Revision history for this message
Leontiy Istomin (listomin) wrote :

The issue hasn't been reproduced with default MOS 8.0 (without ZeroMQ) on the same environment.
It could be related to https://bugs.launchpad.net/bugs/1555007.

Changed in mos:
status: Incomplete → Invalid
Changed in mos:
milestone: 8.0-mu-1 → 8.0-updates