[Errno 24] Too many open files error on simple deployment with neutron

Bug #1290968 reported by Tatyanka
This bug affects 1 person
Affects: Fuel for OpenStack
Status: Invalid
Importance: Medium
Assigned to: Tatyanka

Bug Description

Steps to reproduce:
ISO 235:
Ubuntu, simple mode, with Murano
Network: neutron with untagged VLAN
Nodes: 1 controller + 4 compute

1. Set each network on a separate interface
2. Untag the networks
3. Run network verification
4. Run the deployment
5. When the deployment finishes successfully, run all OSTF tests

Expected result:
All tests pass.

Actual result:
Some instances fail to reach ACTIVE state and stay stuck in BUILD status. I also cannot delete them. All qemu processes are still running.
As soon as the tests fail at the step that waits for the instance to become ACTIVE, the following trace appears in nova-api:
http://paste.openstack.org/show/73128/

On the controller we can see that nova-api has more than 1024 open file descriptors (1024 is the default open-files limit; ulimit -n shows 1024):
root@node-14:~# lsof -p 16906 | wc -l
2150
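
A minimal sketch of the same check, assuming a Linux host with /proc mounted. The helper names (parse_nofile_limit, fd_usage) are made up for illustration and are not part of nova or Fuel. It counts the entries in /proc/<pid>/fd and reads the "Max open files" row of /proc/<pid>/limits, so the number can be compared directly with the limit. Note that lsof -p also prints a header line and non-descriptor entries such as cwd, txt and mem, so its line count can be higher than the real descriptor count.

```python
import os


def _to_int(value):
    # /proc/<pid>/limits prints "unlimited" when no limit is set.
    return None if value == "unlimited" else int(value)


def parse_nofile_limit(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns: Max open files <soft> <hard> files
            fields = line.split()
            return _to_int(fields[3]), _to_int(fields[4])
    raise ValueError("no 'Max open files' row found")


def fd_usage(pid):
    """Return (open_fds, soft_limit, hard_limit) for a process (Linux only)."""
    # Each entry in /proc/<pid>/fd is one open file descriptor.
    open_fds = len(os.listdir("/proc/%d/fd" % pid))
    with open("/proc/%d/limits" % pid) as fh:
        soft, hard = parse_nofile_limit(fh.read())
    return open_fds, soft, hard
```

For example, fd_usage(16906) run as root on the controller would return the descriptor count of that nova-api process together with its soft and hard limits.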

If we raise this limit to 65534 (in /etc/init/nova-api.conf)
root@node-14:~# cat /etc/init/nova-api.conf

http://paste.openstack.org/show/73168/

restart the nova services, and run OSTF one more time, MySQL goes away:
2014-03-11T16:37:39.409592+00:00 debug: 2014-03-11 16:37:36.545 16906 DEBUG routes.middleware [-] Route path: '/{project_id}/flavors/:(id)', defaults: {'action': u'show', 'controller': <nova.api.openstack.wsgi.Resource object at 0x311d190>} __call__ /usr/lib/python2.7/dist-packages/routes/middleware.py:102
2014-03-11T16:37:39.409592+00:00 debug: 2014-03-11 16:37:36.546 16906 DEBUG routes.middleware [-] Match dict: {'action': u'show', 'controller': <nova.api.openstack.wsgi.Resource object at 0x311d190>, 'project_id': u'97b7d83d8c4d4af29193cdfd9568fd9b', 'id': u'7867'} __call__ /usr/lib/python2.7/dist-packages/routes/middleware.py:103
2014-03-11T16:37:39.409592+00:00 debug: 2014-03-11 16:37:36.548 16906 DEBUG nova.api.openstack.wsgi [req-db385363-e90d-477f-b22b-a7429d9c544d a8565eac62a84d5787d3e9b9e6d1aa4a 97b7d83d8c4d4af29193cdfd9568fd9b] No Content-Type provided in request get_body /usr/lib/python2.7/dist-packages/nova/api/openstack/wsgi.py:835
2014-03-11T16:37:39.409592+00:00 debug: 2014-03-11 16:37:36.548 16906 DEBUG nova.api.openstack.wsgi [req-db385363-e90d-477f-b22b-a7429d9c544d a8565eac62a84d5787d3e9b9e6d1aa4a 97b7d83d8c4d4af29193cdfd9568fd9b] Calling method <bound method Controller.show of <nova.api.openstack.compute.flavors.Controller object at 0x311d110>> _process_stack /usr/lib/python2.7/dist-packages/nova/api/openstack/wsgi.py:962
2014-03-11T16:37:39.409592+00:00 debug: 2014-03-11 16:37:36.555 16906 WARNING nova.openstack.common.db.sqlalchemy.session [req-db385363-e90d-477f-b22b-a7429d9c544d a8565eac62a84d5787d3e9b9e6d1aa4a 97b7d83d8c4d4af29193cdfd9568fd9b] Got mysql server has gone away: (2006, 'MySQL server has gone away')

Also, if we run netstat, there are a lot of rows like the following:
tcp 0 0 10.108.1.2:8774 10.108.1.2:54348 ESTABLISHED 16906/python
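
To put a number on "a lot", the rows can be counted per owning process. This is a rough sketch, assuming the seven-column layout of the row above (as printed by netstat with numeric addresses and the PID/program column). The function name count_established is hypothetical, and every line in SAMPLE except the first is invented for illustration.

```python
from collections import Counter

# First row is taken from the report; the other rows are made-up examples.
SAMPLE = """\
tcp 0 0 10.108.1.2:8774 10.108.1.2:54348 ESTABLISHED 16906/python
tcp 0 0 10.108.1.2:8774 10.108.1.2:54350 ESTABLISHED 16906/python
tcp 0 0 10.108.1.2:54348 10.108.1.2:8774 ESTABLISHED 2201/python
tcp 0 0 0.0.0.0:8774 0.0.0.0:* LISTEN 16906/python
"""


def count_established(netstat_text, local_port):
    """Count ESTABLISHED TCP rows whose local port matches, keyed by PID/program."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers, non-TCP rows, and rows without a PID/program column.
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue
        local_addr, state, owner = fields[3], fields[5], fields[6]
        if state != "ESTABLISHED":
            continue
        # The port is the part after the last colon of the local address.
        if local_addr.rsplit(":", 1)[-1] != str(local_port):
            continue
        counts[owner] += 1
    return counts
```

Calling count_established(SAMPLE, 8774) counts only the rows where 8774 is the local (server-side) port, which is the side held open by nova-api in the row quoted above.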

Reproduced 2 times out of 2 attempts.
In the logs, the controller is node 14 for the second deployment and node 11 for the first attempt.

Revision history for this message
Tatyanka (tatyana-leontovich) wrote :
Changed in fuel:
milestone: none → 4.1.1
importance: Undecided → Medium
Revision history for this message
Tatyanka (tatyana-leontovich) wrote :

Also, qemu processes are not killed. After several rounds of creating an instance and then deleting it, we hit a race condition and the qemu processes eat all resources.

Revision history for this message
Ryan Moe (rmoe) wrote :

I'm not able to reproduce this problem. Is there any other info you could provide to help recreate this?

Revision history for this message
Tatyanka (tatyana-leontovich) wrote :

Hm :( In my deployments it reproduces 100% of the time, and on our bare-metal tests too.
I deploy a simple environment with neutron VLAN on Ubuntu (on CentOS everything is fine), set each network on a separate eth interface, and untag all networks in the Settings tab. In my deployment I have 1 controller and 2-4 computes, each with 1 CPU and 2 GB of memory.
As soon as the deployment finishes successfully, run the full OSTF test suite, or use Rally to create and delete several VMs (with qemu). Trigger this case several times and you can see that instances stay in the building state with a deleting task that never stops. At that moment you can see that one of the 3 nova-api processes hits the limit for open fds, and if we go to the compute node we can see a lot of qemu processes.
Ryan, I can reproduce this on Monday, leave the environment up, and provide access for you, if that would be helpful.

Revision history for this message
Ryan Moe (rmoe) wrote :

Those are the same steps I followed. It would be very helpful to have access to your environment on Monday.

Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

Marking as incomplete as there is no confirmation of the issue.

Changed in fuel:
status: New → Incomplete
Changed in fuel:
milestone: 4.1.1 → 5.0
tags: added: backports-4.1.1
Changed in fuel:
status: Incomplete → Opinion
status: Opinion → Confirmed
Changed in fuel:
status: Confirmed → Incomplete
Mike Scherbakov (mihgen)
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Tatyana (tatyana-leontovich)
Revision history for this message
Tatyanka (tatyana-leontovich) wrote :

Cannot reproduce on {"build_id": "2014-04-24_15-01-23", "mirantis": "yes", "build_number": "135", "nailgun_sha": "dcaed120b34dd2c8cf817463a243eedfb844f096", "production": "prod", "ostf_sha": "134765fcb5a07dce0cd1bb399b2290c988c3c63b", "fuelmain_sha": "387d2e931ee14d5773f21f1e2860a13b50b37d04", "astute_sha": "6e8fa4cc12968d7b468fc590b2f06bb59bf74511", "release": "5.0", "fuellib_sha": "fa57b98344e7cdd342b7c08a5e692525b102af8b"}

Changed in fuel:
status: Incomplete → Invalid