ISO:{"build_id": "2014-08-11_12-45-06", "mirantis": "yes", "build_number": "169", "ostf_sha": "09b6bccf7d476771ac859bb3c76c9ebec9da9e1f", "nailgun_sha": "04ada3cd7ef14f6741a05fd5d6690260f9198095", "production": "docker", "api": "1.0", "fuelmain_sha": "43374c706b4fdce28aeb4ef11e69a53f41646740", "astute_sha": "6db5f5031b74e67b92fcac1f7998eaa296d68025", "release": "5.0.1", "fuellib_sha": "a31dbac8fff9cf6bc4cd0d23459670e34b27a9ab"}
Steps to reproduce:
1. Install OpenStack (CentOS, Neutron GRE, Murano, Savanna, Ceilometer, 3 controllers, 1 compute)
2. Wait for ~12 hours (a night for example)
3. Go to controller and execute 'nova-manage service list'
Actual result:
We can see:
root@node-1:~# nova-manage service list
Binary            Host    Zone      Status   State  Updated_At
nova-conductor    node-1  internal  enabled  :-)    2014-08-12 08:00:53
nova-consoleauth  node-1  internal  enabled  :-)    2014-08-12 08:00:53
nova-cert         node-1  internal  enabled  :-)    2014-08-12 08:00:53
nova-scheduler    node-1  internal  enabled  :-)    2014-08-12 08:00:53
nova-consoleauth  node-2  internal  enabled  XXX    2014-08-11 19:19:18
nova-conductor    node-2  internal  enabled  XXX    2014-08-11 19:19:11
nova-scheduler    node-2  internal  enabled  :-)    2014-08-12 08:00:48
nova-cert         node-2  internal  enabled  XXX    2014-08-11 19:19:18
nova-conductor    node-3  internal  enabled  XXX    2014-08-11 19:19:12
nova-scheduler    node-3  internal  enabled  XXX    2014-08-11 19:19:16
nova-cert         node-3  internal  enabled  XXX    2014-08-11 19:19:20
nova-consoleauth  node-3  internal  enabled  XXX    2014-08-11 19:19:14
nova-compute      node-4  nova      enabled  :-)    2014-08-12 08:00:48
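To check an environment for this symptom without reading the table by eye, something like the following sketch could be used. It assumes the whitespace-separated column layout shown above (Binary, Host, Zone, Status, State, Updated_At); the function name down_services is made up for illustration and is not part of nova.

```python
def down_services(output):
    """Return (binary, host) pairs whose State column is XXX.

    Assumes the column order shown in this report:
    Binary Host Zone Status State Updated_At
    """
    down = []
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines, the shell prompt line, and the header row.
        if len(fields) < 5 or fields[0] == "Binary":
            continue
        binary, host, _zone, _status, state = fields[:5]
        if state == "XXX":
            down.append((binary, host))
    return down


if __name__ == "__main__":
    sample = (
        "Binary Host Zone Status State Updated_At\n"
        "nova-conductor node-1 internal enabled :-) 2014-08-12 08:00:53\n"
        "nova-conductor node-2 internal enabled XXX 2014-08-11 19:19:11\n"
    )
    print(down_services(sample))
```

The output of 'nova-manage service list' would be captured and passed in as a string.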
After looking at the env a bit, I see the following:
1. Nova services are actually up, but they fail to update their status in the database, so 'nova-manage service list' shows them as down (XXX)
2. Nova services fail to update their state in the database because SQLAlchemy refuses to open a new db connection (http://paste.openstack.org/show/93736/), as it thinks 10+30 connections are already open (10 in the pool, 30 - overflow)
3. lsof shows that nova-conductor and other services don't have *any* open connections to MySQL
4. This looks very similar to the known SQLAlchemy issue - https://bitbucket.org/zzzeek/sqlalchemy/issue/2772 , which was fixed in 0.8.3 (http://docs.sqlalchemy.org/en/latest/changelog/changelog_08.html#change-a95b0f6765fdf5ebdf844806fb2aa122) and we are using 0.8.2 right now
5. Restarting the services helps, but that's not an acceptable fix, of course
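Points 2 and 3 look contradictory at first (the pool is "full", yet lsof shows no connections). A toy model makes the suspected mechanism concrete. This is only a sketch, not SQLAlchemy's QueuePool code: the class ToyPool and its method names are invented, and it assumes the cause is a checked-out counter that is not decremented when a connection goes away, which is my reading of the linked issue.

```python
class ToyPool:
    """Toy bounded pool. NOT SQLAlchemy's implementation."""

    def __init__(self, pool_size=10, max_overflow=30):
        self.limit = pool_size + max_overflow
        self.checked_out = 0   # what the pool believes is in use
        self.open_sockets = 0  # what lsof would actually show

    def connect(self):
        if self.checked_out >= self.limit:
            raise RuntimeError("pool limit of %d reached" % self.limit)
        self.checked_out += 1
        self.open_sockets += 1

    def release(self):
        # Correct path: both numbers go down together.
        self.open_sockets -= 1
        self.checked_out -= 1

    def drop_without_accounting(self):
        # Hypothetical buggy path: the socket is closed,
        # but the counter is never decremented.
        self.open_sockets -= 1


if __name__ == "__main__":
    pool = ToyPool()
    for _ in range(pool.limit):
        pool.connect()
        pool.drop_without_accounting()
    print(pool.open_sockets, pool.checked_out)
    try:
        pool.connect()
    except RuntimeError as exc:
        print(exc)
```

After enough leaked checkouts the counter sits at the limit while no sockets are open, so every new connect is refused until the process (and with it the counter) is restarted. That matches what we see on the controllers.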
Upgrading to the latest SQLAlchemy 0.8.x release should fix this.
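To confirm whether a given node already has a fixed version, a simple comparison against 0.8.3 should be enough. This is a sketch: the helper name has_pool_fix is made up, it assumes a plain numeric version string such as "0.8.2", and it does not account for any backports to older release series.

```python
def has_pool_fix(version):
    """True if the version is 0.8.3 or newer.

    Assumes a purely numeric dotted version string, e.g. "0.8.2".
    """
    parts = tuple(int(p) for p in version.split(".")[:3])
    return parts >= (0, 8, 3)


if __name__ == "__main__":
    # On a node, the installed version is available as
    # sqlalchemy.__version__; a literal is used here so the
    # example runs without SQLAlchemy installed.
    print(has_pool_fix("0.8.2"))
```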