Keystone ends up in an error state when revoking a large number of tokens at once
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Fuel for OpenStack | Fix Committed | High | Dmitry Ilyin |
Mitaka | Fix Released | High | Dmitry Ilyin |
Newton | Fix Committed | High | Dmitry Ilyin |
Mirantis OpenStack | Status tracked in 10.0.x | | |
10.0.x | Fix Committed | High | Dmitry Ilyin |
Bug Description
Environment:
Reproduced on RackSpace lab, 3 controllers, 197 computes, VxLAN+DVR, MOS 9.0 ISO 188
Detailed description:
Keystone caches the whole revoke tree, which can exceed the 1M memcached object size limit if a huge number of tokens are revoked at the same time (details: https:/
from the keystone admin log file http://
After that, keystone stops working and the cluster is not usable.
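To illustrate why a single cached revoke tree can exceed the limit, here is a rough sketch that serializes a list of made-up revocation events and compares the result with the 1M limit mentioned above. The event fields, the helper names, and the use of `pickle` are assumptions for illustration only; the serializer Keystone actually uses for its cache may differ.

```python
import pickle

# memcached's default maximum item size (1 MiB), the "1M" limit in the report.
MEMCACHED_ITEM_LIMIT = 1024 * 1024


def make_fake_revocation_events(count):
    """Build hypothetical revocation events; the field names are illustrative only."""
    return [
        {
            "user_id": f"{i:032x}",
            "project_id": f"{i + 1:032x}",
            "issued_before": "2016-04-18T09:44:57Z",
            "revoked_at": "2016-04-18T09:44:57Z",
        }
        for i in range(count)
    ]


def serialized_size(events):
    """Size in bytes of the whole event list when stored as one cache value."""
    return len(pickle.dumps(events))


def fits_in_memcached(events, limit=MEMCACHED_ITEM_LIMIT):
    """True if the whole list could be stored as a single memcached item."""
    return serialized_size(events) <= limit


if __name__ == "__main__":
    for count in (10, 1000, 50000):
        events = make_fake_revocation_events(count)
        print(count, serialized_size(events), fits_in_memcached(events))
```

Because the whole tree is stored as one value, the size grows with the number of revocation events, so a burst of revocations can push a previously cacheable value over the limit.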
Keystone error:
2016-04-18 09:44:57.484 33105 ERROR keystone.
2016-04-18 09:44:57.484 33105 ERROR keystone.
Steps to reproduce:
1. set backend = dogpile.
2. Perform Rally tests. All Rally tests failed except three (the results of those three tests are attached: rally_report.html)
3. found the following bug https:/
4. tried http://
run rally scenario KeystoneBasic.
{
"kw": {
"runner": {
"type": "constant",
"
"times": 1970
},
"sla": {
"
"max": 0
}
},
"context": {
"
"keystone": {
}
}
}
},
"name": "KeystoneBasic.
"pos": 0
}
diagnostic snapshot: http://
etc and log folders from controller nodes: http://
Changed in mos: | |
assignee: | Boris Bobrov (bbobrov) → MOS Keystone (mos-keystone) |
tags: | added: area-keystone |
Changed in mos: | |
status: | New → Confirmed |
importance: | Undecided → High |
milestone: | none → 9.0 |
description: | updated |
tags: | added: blocker-for-qa |
Changed in mos: | |
assignee: | MOS Keystone (mos-keystone) → MOS Puppet Team (mos-puppet) |
Changed in mos: | |
assignee: | MOS Puppet Team (mos-puppet) → Dmitry Ilyin (idv1985) |
Changed in mos: | |
status: | Confirmed → Fix Committed |
Changed in mos: | |
status: | Fix Committed → Fix Released |
At first glance: the revocation tree grows larger than 1M and becomes unacceptable for caching in memcached.
I.e. memcached doesn't accept objects of that size.
The problem appears to be well known in the community, and the suggested workaround is simply to turn revocation caching off.
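A minimal sketch of that workaround, assuming the relevant option is `caching` in the `[revoke]` section of keystone.conf (please verify this against the configuration reference for your release). The helper name and the sample configuration text are hypothetical.

```python
import configparser
import io


def disable_revoke_caching(conf_text):
    """Return conf_text with caching in the [revoke] section set to false."""
    # interpolation=None so that '%' characters in other values are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section("revoke"):
        parser.add_section("revoke")
    parser.set("revoke", "caching", "false")
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical sample, not taken from the affected environment.
    sample = "[cache]\nenabled = true\n\n[revoke]\ncaching = true\n"
    print(disable_revoke_caching(sample))
```

Note that `configparser` drops comments when it writes the file back, so on a real controller it is probably safer to edit the option by hand or through the deployment tooling; keystone also has to be restarted for the change to take effect.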