Keystone connection issue during big heat stack creation
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
fuel-ccp | New | Undecided | Unassigned |
Bug Description
Steps to reproduce
1. Deploy ccp with configs from https:/
2. Try to run shaker (http://
Creation of the heat stack always fails with the error: Unable to establish connection to http://
For example:
2017-04-06 11:30:46.379 33032 ERROR shaker.
Increasing keystone replicas to 10 in topology.yaml helps somewhat, and the heat stack then deploys successfully from time to time.
Logs from all keystone pods do not show any errors.
summary: |
- Keystone connection issue during big heat stake creation
+ Keystone connection issue during big heat stack creation
I've investigated this issue and here's what I've found.
From the issue in requests [0], it seems that the most likely cause of this error is the server dropping the connection before answering the request. In this scenario [1], httplib in the Python 2.7 stdlib raises this error, which means there is no way to handle it higher up the stack (in urllib3 or requests). This was fixed in the stdlib in Python 3.5 [2] by raising a different exception in this case, but that exception still does not seem to be handled higher in the stack.
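The scenario above can be reproduced with a short sketch (not the actual heat-engine/keystone code): a throwaway local server accepts a request and closes the connection without answering, the way a server drops an exhausted keep-alive connection. On Python 3.5+ the client side raises `http.client.RemoteDisconnected`; on 2.7 the equivalent is `httplib.BadStatusLine`.

```python
import http.client
import socket
import threading

# A server that reads the request and hangs up without responding,
# simulating the dropped keep-alive connection described above.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]

def drop_connection():
    conn, _ = srv.accept()
    conn.recv(1024)   # read the request...
    conn.close()      # ...and close without sending any response

threading.Thread(target=drop_connection, daemon=True).start()

err_name = None
try:
    client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    client.request("GET", "/")
    client.getresponse()
except http.client.RemoteDisconnected as exc:  # Python 3.5+; BadStatusLine on 2.7
    err_name = type(exc).__name__

print(err_name)  # RemoteDisconnected
```

Note that `RemoteDisconnected` subclasses both `ConnectionResetError` and the old `BadStatusLine`, which is what makes the Python 3.5 behavior distinguishable (and therefore retryable) by callers.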
I took a look at the traffic between heat-engine and keystone and found these three problems:
1. heat-engine does a lot of token creation requests during stack creation (about 30-60 requests per second);
2. Keystone (or rather Apache in front of it) eventually drops keep-alive'd connection from heat-engine;
3. There's no way for heat-engine to retry on such failure (in Python 2.7).
Re 1.: Heat does not seem to be properly caching the token it uses (or reusing the keystoneclient session); I did not find a relevant issue in Heat upstream.
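To illustrate what proper token caching would buy, here is a hypothetical sketch (not Heat's actual code; `fetch` stands in for whatever call issues the keystone token request): reuse the token until shortly before expiry instead of requesting a new one 30-60 times per second.

```python
import time

class TokenCache:
    """Hypothetical sketch: fetch a token once and reuse it until
    shortly before it expires, renewing only when needed."""

    def __init__(self, fetch, ttl=3600, slack=60):
        self._fetch = fetch      # callable returning a fresh token string
        self._ttl = ttl          # assumed token lifetime in seconds
        self._slack = slack      # renew this many seconds before expiry
        self._token = None
        self._expires = 0.0

    def get(self):
        now = time.time()
        if self._token is None or now >= self._expires - self._slack:
            self._token = self._fetch()
            self._expires = now + self._ttl
        return self._token
```

With something like this in place, a stack creation that currently triggers thousands of token requests would hit keystone only when the cached token approaches expiry.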
Re 2.: This seems like legitimate behavior, and I do not see a way to work around it. My guess would be to make keep-alive connections persist longer, but the default limit is really high as it is.
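For context, these are the Apache directives that govern this behavior (stock defaults shown; the values actually shipped in the ccp images may differ):

```apache
KeepAlive On
MaxKeepAliveRequests 100   # requests served per connection before Apache closes it
KeepAliveTimeout 5         # seconds an idle keep-alive connection is kept open
```

If heat-engine really issues 30-60 requests per second over one keep-alive connection, a `MaxKeepAliveRequests`-style cap would be reached every couple of seconds, so periodic connection drops on the server side are expected.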
Re 3.: As I understand it, Heat has claimed to support Python 3.x for some time, but we would still need to adjust the clients to handle this situation.
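A hypothetical sketch of what "adjust the clients to handle this situation" could look like on Python 3.5+ (`get_with_retry` is an illustrative helper, not an existing client API): when the server drops the connection before answering, open a fresh connection and retry instead of failing the whole operation.

```python
import http.client

def get_with_retry(host, port, path, retries=1):
    # Hypothetical helper: retry when the server closes the connection
    # before responding. Python 3.5+ raises RemoteDisconnected here,
    # which is what makes the failure distinguishable and retryable;
    # on 2.7 the generic BadStatusLine made this much harder.
    for attempt in range(retries + 1):
        conn = http.client.HTTPConnection(host, port, timeout=5)
        try:
            conn.request("GET", path)
            return conn.getresponse()
        except http.client.RemoteDisconnected:
            conn.close()
            if attempt == retries:
                raise
```

Retrying blindly is only safe for idempotent requests, so a real client would need to decide per-request whether a replay is acceptable.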
[0] https://github.com/kennethreitz/requests/issues/2364
[1] https://bugs.python.org/issue8450
[2] https://bugs.python.org/issue3566