Keystone connection issue during big heat stack creation

Bug #1680430 reported by Sergey Galkin
This bug affects 1 person
Affects    Status    Importance    Assigned to    Milestone
fuel-ccp   New       Undecided     Unassigned

Bug Description

Steps to reproduce
1. Deploy ccp with the configs from https://review.openstack.org/#/c/451419/ on 152 nodes
2. Try to run shaker (http://pyshaker.readthedocs.io/en/latest/) scenario openstack/full_l2

Creation of the Heat stack always fails with the error: Unable to establish connection to http://keystone.ccp.svc.cluster.local:35357/v3/auth/tokens: ('Connection aborted.', BadStatusLine("''",))
For example:
2017-04-06 11:30:46.379 33032 ERROR shaker.engine.server Exception: Failed to deploy Heat stack 76fa9649-fecb-434c-b0e4-c3380666f318. Expected status COMPLETE, but got FAILED. Reason: Resource CREATE failed: ConnectFailure: resources.shaker_uskadi_slave_25: Unable to establish connection to http://keystone.ccp.svc.cluster.local:35357/v3/auth/tokens: ('Connection aborted.', BadStatusLine("''",))

Increasing the number of keystone replicas to 10 in topology.yaml helps a little: the Heat stack then deploys successfully from time to time.

Logs from all keystone pods do not show any errors.

Tags: scale
Yuriy Taraday (yorik-sar) wrote:

I've investigated this issue and here's what I've found.

Judging by the issue in requests [0], the most likely reason for this error is the server dropping the connection before answering the request. In this scenario [1], httplib in the Python 2.7 stdlib raises BadStatusLine, which means there is no way to handle it higher up in the libraries (urllib3 or requests). This was fixed in the stdlib in Python 3.5 [2] by raising a different exception in this case, but it still doesn't seem to be handled higher in the stack.
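To illustrate the scenario described above, here is a minimal Python 3 sketch of a server that accepts a request and then closes the connection without sending a status line. The helper names (start_dropping_server, request_against_dropping_server) are made up for this illustration; the request path is taken from the error message in this report. On Python 3.5+ the client sees http.client.RemoteDisconnected, which is a subclass of BadStatusLine; on Python 2.7 the equivalent code with httplib would surface only the generic BadStatusLine("''").

```python
import http.client
import socket
import threading


def start_dropping_server():
    """Start a local TCP server that reads one request and closes without replying."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    def serve():
        conn, _ = listener.accept()
        data = b""
        # Read until the end of the request headers so the close is a clean FIN.
        while b"\r\n\r\n" not in data:
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
        conn.close()  # drop the connection without sending any response
        listener.close()

    threading.Thread(target=serve, daemon=True).start()
    return listener.getsockname()[1]


def request_against_dropping_server():
    """Send one request to the dropping server and return the exception raised."""
    port = start_dropping_server()
    client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        client.request("GET", "/v3/auth/tokens")
        client.getresponse()
    except http.client.BadStatusLine as exc:
        return exc
    finally:
        client.close()
    return None
```

Calling request_against_dropping_server() returns the exception object, so its type can be inspected to see which class the stdlib raised.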

I took a look at the traffic between heat-engine and keystone and found these three problems:
1. heat-engine does a lot of token creation requests during stack creation (about 30-60 requests per second);
2. Keystone (or rather Apache in front of it) eventually drops the keep-alive connection from heat-engine;
3. There's no way for heat-engine to retry on such failure (in Python 2.7).

Re 1: Heat doesn't seem to properly cache the token that it uses (or reuse the keystoneclient session). I didn't find a relevant issue in Heat upstream.
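As a rough sketch of what caching a token could look like, the snippet below reuses one token until shortly before it expires instead of requesting a new one for every call. This is not Heat's code: TokenCache, fetch_token, and margin are hypothetical names, and fetch_token is assumed to return a (token, lifetime_in_seconds) pair.

```python
import threading
import time


class TokenCache:
    """Reuse one token until it is close to expiry (illustrative only)."""

    def __init__(self, fetch_token, margin=30.0, clock=time.monotonic):
        # fetch_token: hypothetical callable returning (token, lifetime_seconds)
        self._fetch_token = fetch_token
        self._margin = margin  # refresh this many seconds before expiry
        self._clock = clock
        self._lock = threading.Lock()
        self._token = None
        self._expires_at = 0.0

    def get(self):
        """Return the cached token, fetching a new one only when needed."""
        with self._lock:
            now = self._clock()
            if self._token is None or now >= self._expires_at - self._margin:
                self._token, lifetime = self._fetch_token()
                self._expires_at = now + lifetime
            return self._token
```

With a cache like this, a burst of calls to get() results in a single token request rather than one request per call, which is the kind of reduction that would ease the load described in problem 1.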

Re 2: This seems like legitimate behavior, and I don't see a way to work around it. My guess would be to make keep-alive connections persist longer, but the default limit is already really high.

Re 3: As I understand it, Heat has claimed to support Python 3.x for some time, but we would still need to adjust the clients to handle this situation.
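One possible shape for such a client-side adjustment on Python 3 is sketched below: retry the call when the connection is dropped before a response arrives. The function name call_with_retry and its parameters are invented for this example and are not part of Heat, keystoneclient, or requests. Blind retries are only safe for requests that can be repeated without side effects; repeating a token creation request just yields another token.

```python
import http.client
import time


def call_with_retry(func, attempts=3, delay=0.1, sleep=time.sleep):
    """Call func(), retrying when the server drops the connection.

    Hypothetical helper: func is any zero-argument callable that performs
    the HTTP request and returns its result.
    """
    # RemoteDisconnected (Python 3.5+) is both a ConnectionResetError and a
    # BadStatusLine, so either base class catches it.
    retriable = (ConnectionError, http.client.BadStatusLine)
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retriable:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the failure
            sleep(delay * attempt)  # simple linear backoff between tries
```

A caller would wrap the request in a closure, for example call_with_retry(lambda: do_token_request()), where do_token_request stands in for whatever function actually issues the request.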

[0] https://github.com/kennethreitz/requests/issues/2364
[1] https://bugs.python.org/issue8450
[2] https://bugs.python.org/issue3566
