test_negative_servers in tempest is disabled because of nova bugs

Bug #1046870 reported by David Kranz
14
This bug affects 3 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
Medium
Sean Dague
tempest
Fix Released
High
Sean Dague

Bug Description

Recently all tempest runs have been failing with some selection of errors in test_servers_negative.py and test_images.py. The n-cpu log is loaded with errors and stacktraces. Here is a sample of the most recent case but there is some flakiness and not all builds show exactly the same set of failures: http://logs.openstack.org/11936/13/check/gate-tempest-devstack-vm/10571/

Here is some output from my local run:

======================================================================
ERROR: Reboot a deleted server
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/stack/tempest/tempest/tests/compute/test_servers_negative.py", line 127, in test_reboot_deleted_server
    self.client.wait_for_server_termination(self.server_id)
  File "/opt/stack/tempest/tempest/services/nova/json/servers_client.py", line 167, in wait_for_server_termination
    raise exceptions.BuildErrorException
BuildErrorException: Server %(server_id)s failed to build and is in ERROR status

======================================================================
ERROR: Rebuild a deleted server
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/stack/tempest/tempest/tests/compute/test_servers_negative.py", line 144, in test_rebuild_deleted_server
    self.client.wait_for_server_termination(self.server_id)
  File "/opt/stack/tempest/tempest/services/nova/json/servers_client.py", line 167, in wait_for_server_termination
    raise exceptions.BuildErrorException
BuildErrorException: Server %(server_id)s failed to build and is in ERROR status

======================================================================
ERROR: Return error when creating an image of a server that is building
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/stack/tempest/tempest/tests/compute/test_images.py", line 380, in tearDown
    ImagesTestBase.tearDown(self)
  File "/opt/stack/tempest/tempest/tests/compute/test_images.py", line 37, in tearDown
    self.servers_client.wait_for_server_termination(server['id'])
  File "/opt/stack/tempest/tempest/services/nova/json/servers_client.py", line 167, in wait_for_server_termination
    raise exceptions.BuildErrorException
BuildErrorException: Server %(server_id)s failed to build and is in ERROR status
Details: Server %(server_id)s failed to build and is in ERROR status
-------------------- >> begin captured logging << --------------------
tempest.common.rest_client: ERROR: Request URL: http://172.18.0.156:8774/v2/8e694b8a45df44c18fb54aabdeb48c71/servers/01defe3f-1aed-419c-8b02-43ece401a73b/action
tempest.common.rest_client: ERROR: Request Body: {"createImage": {"name": "test-snap-56810686560"}}
tempest.common.rest_client: ERROR: Response Headers: {'date': 'Thu, 06 Sep 2012 14:31:56 GMT', 'status': '409', 'content-length': '111', 'content-type': 'application/json; charset=UTF-8', 'x-compute-request-id': 'req-ba860a75-c914-4611-87e0-105c53ddf2a6'}
tempest.common.rest_client: ERROR: Response Body: {u'conflictingRequest': {u'message': u"Cannot 'createImage' while instance is in vm_state building", u'code': 409}}
--------------------- >> end captured logging << ---------------------

======================================================================
ERROR: Return error when creating an image of a server that is building
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/stack/tempest/tempest/tests/compute/test_images.py", line 406, in tearDown
    ImagesTestBase.tearDown(self)
  File "/opt/stack/tempest/tempest/tests/compute/test_images.py", line 37, in tearDown
    self.servers_client.wait_for_server_termination(server['id'])
  File "/opt/stack/tempest/tempest/services/nova/json/servers_client.py", line 167, in wait_for_server_termination
    raise exceptions.BuildErrorException
BuildErrorException: Server %(server_id)s failed to build and is in ERROR status
Details: Server %(server_id)s failed to build and is in ERROR status
-------------------- >> begin captured logging << --------------------
tempest.common.rest_client: ERROR: Request URL: http://172.18.0.156:8774/v2/6d20fdf0dac54e0c96f52c5032aa7cdf/servers/18dacb06-d135-40a4-98a2-c75211a55fac/action
tempest.common.rest_client: ERROR: Request Body: {"createImage": {"name": "test-snap-99444707830"}}
tempest.common.rest_client: ERROR: Response Headers: {'date': 'Thu, 06 Sep 2012 14:34:58 GMT', 'status': '409', 'content-length': '111', 'content-type': 'application/json; charset=UTF-8', 'x-compute-request-id': 'req-d130e4ad-6c90-416b-8624-571ca2e6f9ab'}
tempest.common.rest_client: ERROR: Response Body: {u'conflictingRequest': {u'message': u"Cannot 'createImage' while instance is in vm_state building", u'code': 409}}
--------------------- >> end captured logging << ---------------------

-------

Tags: tempest
Revision history for this message
Russell Bryant (russellb) wrote :

Can you narrow this down a bit? Which specific tests are failing? What do those tests do? What does compute.log have in it? You mentioned stack traces, can you share them?

Changed in nova:
status: New → Incomplete
Revision history for this message
David Kranz (david-kranz) wrote :

Please follow the link in the description and go to the bottom of the console log. You will see the errors. Note that the current
versions of test_images.py and test_servers_negative.py have skips. The problem seems to be more general than one particular test. If you follow the link and go into the logs directory you can see the stacktraces. Like in

http://logs.openstack.org/11936/13/check/gate-tempest-devstack-vm/10571/logs/screen-n-cpu.txt

Thierry Carrez (ttx)
Changed in nova:
status: Incomplete → New
Revision history for this message
Thierry Carrez (ttx) wrote :
Download full text (3.5 KiB)

Looks like all errors are due to the same cause:

2012-09-06 06:00:56 TRACE nova.compute.manager Traceback (most recent call last):
2012-09-06 06:00:56 TRACE nova.compute.manager File "/opt/stack/nova/nova/compute/manager.py", line 167, in decorated_function
2012-09-06 06:00:56 TRACE nova.compute.manager return function(self, context, *args, **kwargs)
2012-09-06 06:00:56 TRACE nova.compute.manager File "/opt/stack/nova/nova/compute/manager.py", line 202, in decorated_function
2012-09-06 06:00:56 TRACE nova.compute.manager kwargs['instance']['uuid'], e, sys.exc_info())
2012-09-06 06:00:56 TRACE nova.compute.manager File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
2012-09-06 06:00:56 TRACE nova.compute.manager self.gen.next()
2012-09-06 06:00:56 TRACE nova.compute.manager File "/opt/stack/nova/nova/compute/manager.py", line 196, in decorated_function
2012-09-06 06:00:56 TRACE nova.compute.manager return function(self, context, *args, **kwargs)
2012-09-06 06:00:56 TRACE nova.compute.manager File "/opt/stack/nova/nova/compute/manager.py", line 816, in run_instance
2012-09-06 06:00:56 TRACE nova.compute.manager do_run_instance()
2012-09-06 06:00:56 TRACE nova.compute.manager File "/opt/stack/nova/nova/utils.py", line 751, in inner
2012-09-06 06:00:56 TRACE nova.compute.manager retval = f(*args, **kwargs)
2012-09-06 06:00:56 TRACE nova.compute.manager File "/opt/stack/nova/nova/compute/manager.py", line 815, in do_run_instance
2012-09-06 06:00:56 TRACE nova.compute.manager admin_password, is_first_time, instance)
2012-09-06 06:00:56 TRACE nova.compute.manager File "/opt/stack/nova/nova/compute/manager.py", line 512, in _run_instance
2012-09-06 06:00:56 TRACE nova.compute.manager self._set_instance_error_state(context, instance['uuid'])
2012-09-06 06:00:56 TRACE nova.compute.manager File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
2012-09-06 06:00:56 TRACE nova.compute.manager self.gen.next()
2012-09-06 06:00:56 TRACE nova.compute.manager File "/opt/stack/nova/nova/compute/manager.py", line 477, in _run_instance
2012-09-06 06:00:56 TRACE nova.compute.manager self._start_building(context, instance)
2012-09-06 06:00:56 TRACE nova.compute.manager File "/opt/stack/nova/nova/compute/manager.py", line 687, in _start_building
2012-09-06 06:00:56 TRACE nova.compute.manager None))
2012-09-06 06:00:56 TRACE nova.compute.manager File "/opt/stack/nova/nova/compute/manager.py", line 254, in _instance_update
2012-09-06 06:00:56 TRACE nova.compute.manager context, instance_uuid, kwargs)
2012-09-06 06:00:56 TRACE nova.compute.manager File "/opt/stack/nova/nova/db/api.py", line 675, in instance_update_and_get_original
2012-09-06 06:00:56 TRACE nova.compute.manager values)
2012-09-06 06:00:56 TRACE nova.compute.manager File "/opt/stack/nova/nova/db/sqlalchemy/api.py", line 130, in wrapper
2012-09-06 06:00:56 TRACE nova.compute.manager return f(*args, **kwargs)
2012-09-06 06:00:56 TRACE nova.compute.manager File "/opt/stack/nova/nova/db/sqlalchemy/api.py", line 1768, in instance_update_and_get_original
2012-09-06 06:00:56 TRAC...

Read more...

Changed in nova:
importance: Undecided → High
status: New → Confirmed
summary: - Tempest showing nova compute failures
+ expecting ('scheduling', None) but the actual state is deleting (was:
+ Tempest showing compute fails)
Revision history for this message
Thierry Carrez (ttx) wrote : Re: expecting ('scheduling', None) but the actual state is deleting (was: Tempest showing compute fails)

Seems to no longer happen in recent runs...

Changed in nova:
importance: High → Undecided
status: Confirmed → Incomplete
Revision history for this message
David Kranz (david-kranz) wrote :

Thierry, maybe you missed the bit in comment #2 that these tests are now skipped in tempest so that they do not continue to fail and block other checkins. I tried enabling them again and the same failures happen.

Revision history for this message
David Kranz (david-kranz) wrote :

Even with these tests commented out the n-cpu log is strewn with ERRORs and backtraces, some of them on the scary side at least to me. Is any one concerned about this?

Revision history for this message
Nithya Ganesan (nithya-ganesan) wrote :

Hi David,

From the last comment I understand that the errors seen are not related to the tests that are commented. I see the tempest tests to be skipped even in top of the tree.

Would it make sense to enable them now. If you are fine with enabling the tests, I will raise a bug in launchpad and get the tests enabled.

Please let me know.

Regards
Nithya

Thierry Carrez (ttx)
Changed in nova:
status: Incomplete → New
Sean Dague (sdague)
Changed in nova:
importance: Undecided → Medium
assignee: nobody → Sean Dague (sdague-b)
tags: added: tempest
Changed in nova:
status: New → Triaged
Sean Dague (sdague)
Changed in nova:
assignee: Sean Dague (sdague-b) → nobody
assignee: nobody → Sean Dague (sdague-b)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/23199

Changed in nova:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/23203

Sean Dague (sdague)
summary: - expecting ('scheduling', None) but the actual state is deleting (was:
- Tempest showing compute fails)
+ test_negative_servers in tempest is disabled because of nova bugs
Changed in tempest:
status: New → In Progress
importance: Undecided → High
assignee: nobody → Sean Dague (sdague-b)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/23199
Committed: http://github.com/openstack/nova/commit/d6999fbe8c545292a98c4e3d20f3b80e3e66d901
Submitter: Jenkins
Branch: master

commit d6999fbe8c545292a98c4e3d20f3b80e3e66d901
Author: Sean Dague <email address hidden>
Date: Thu Feb 28 14:58:42 2013 -0500

    don't stack trace if long ints are passed to db

    If we pass a integer, which is longer than the id field in the
    database, but a valid python integer into any of the REST API
    calls SQLalchemy throws a DataError, and the user sees a 500
    instead of InstanceNotFound. This can be triggered by using a
    id of 9223372036854775808 (for instance).

    This issue is seen on postgresql, mysql and sqlite don't hit it.

    Instead, catch the DataError exception, and return an InvalidID
    exception. If InvalidID is thrown catch that at the API layer
    and return the expected not found.

    This was caught when reenabling tempest tests for bug #1046870
    and is one of serveral patches in nova needed to enable
    test_servers_negative.py in tempest.

    Change-Id: I6130863a95567eabd3ae05a09e4a9f26bb5f5098

Changed in nova:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/23203
Committed: http://github.com/openstack/nova/commit/1c6cc18b0cef25bfff8566c2a646cd8bcbbe132a
Submitter: Jenkins
Branch: master

commit 1c6cc18b0cef25bfff8566c2a646cd8bcbbe132a
Author: Sean Dague <email address hidden>
Date: Thu Feb 28 15:32:33 2013 -0500

    validate security_groups on server create

    security_groups were getting passed all the way into the database
    without ever checking for their existance. This allowed for crudding
    up the db with bad data, as well as succeeding when an admin typoed
    a security group, and getting no active security group.

    Also, security_group is actually an array, which is not implied by
    it being a singular variable. Make it a plural variable to cause
    less confusion in the future.

    And given that we multiply change security_groups from None =>
    ['default'], get rid of a utility function that serves no purpose
    to handle the None or string cases.

    To test this correctly, stub out the db call, so we manage to get
    all the way down to the bottom of the stack sanely. Also ensure that
    the negative case gets us the expected HTTPBadRequest.

    Partial fix for nova bugs needed to reenable test_servers_negative
    in Tempest. Fixes bug #1046870

    Change-Id: I5b477f7bc1992abb47e014f137b8010cc76b2113

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tempest (master)

Reviewed: https://review.openstack.org/23110
Committed: http://github.com/openstack/tempest/commit/e623f75ca687db6ada2efded3509d7be1bf738fe
Submitter: Jenkins
Branch: master

commit e623f75ca687db6ada2efded3509d7be1bf738fe
Author: Sean Dague <email address hidden>
Date: Wed Feb 27 14:52:15 2013 -0500

    enable test_servers_negative

    this was disabled when we were seeing flakey fails during Folsom
    release, however all but one of these tests current pass. And the
    failure is a new nova bug. Pushing to CI to verify that bug on both
    databases before providing the fixes to nova (of which there will
    be at least 2)

    fix xml clients for negative tests

    the xml clients were failing additional tests because the clients
    didn't support all the features as the json tests. This enables
    security_groups and networks in the xml client for create.

    It also skips the test_create_numeric_server_name as that can't be
    tested in xml, it always turns it into a string.

    Fixes bug #1046870

    Change-Id: I8a3a0bef9b1d134da369242a0cf14c3dcb61c6de

Changed in tempest:
status: In Progress → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: none → grizzly-rc1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: grizzly-rc1 → 2013.1
Sean Dague (sdague)
Changed in tempest:
milestone: none → havana-3
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.