periodic-tripleo-ci-centos-8-ovb-1ctlr_2comp-featureset020-master is failing while executing cold migration & resize tempest tests with Stderr: 'mkdir: cannot create directory ‘/var/lib/nova/instances/<uuid>’: Permission denied\n'

Bug #1903033 reported by Sandeep Yadav
This bug affects 1 person

Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Martin Schuppert
Milestone: wallaby-2

Bug Description

Description:

periodic-tripleo-ci-centos-8-ovb-1ctlr_2comp-featureset020-master is failing while executing cold migration & resize tempest tests with Stderr: 'mkdir: cannot create directory ‘/var/lib/nova/instances/<uuid>’: Permission denied\n'

Logs:

https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-1ctlr_2comp-featureset020-master/a542b39/logs/undercloud/var/log/tempest/stestr_results.html.gz
tempest.scenario.test_network_advanced_server_ops.TestNetworkAdvancedServerOps:

Test: test_server_connectivity_cold_migration

~~~
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/tempest/common/utils/__init__.py", line 89, in wrapper
    return f(*func_args, **func_kwargs)
  File "/usr/lib/python3.6/site-packages/tempest/scenario/test_network_advanced_server_ops.py", line 227, in test_server_connectivity_cold_migration
    'VERIFY_RESIZE')
  File "/usr/lib/python3.6/site-packages/tempest/common/waiters.py", line 96, in wait_for_server_status
    raise lib_exc.TimeoutException(message)
tempest.lib.exceptions.TimeoutException: Request timed out
Details: (TestNetworkAdvancedServerOps:test_server_connectivity_cold_migration) Server 6665fc35-4c29-434e-9254-7b4a2d5976ec failed to reach VERIFY_RESIZE status and task state "None" within the required time (300 s). Current status: ACTIVE. Current task state: None.
~~~

https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-1ctlr_2comp-featureset020-master/a542b39/logs/overcloud-novacompute-1/var/log/containers/nova/nova-compute.log.txt.gz

~~~
Stderr: 'mkdir: cannot create directory ‘/var/lib/nova/instances/6665fc35-4c29-434e-9254-7b4a2d5976ec’: Permission denied\n'
~~~

Test: test_server_connectivity_resize

~~~
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/tempest/common/utils/__init__.py", line 89, in wrapper
    return f(*func_args, **func_kwargs)
  File "/usr/lib/python3.6/site-packages/tempest/scenario/test_network_advanced_server_ops.py", line 194, in test_server_connectivity_resize
    'VERIFY_RESIZE')
  File "/usr/lib/python3.6/site-packages/tempest/common/waiters.py", line 96, in wait_for_server_status
    raise lib_exc.TimeoutException(message)
tempest.lib.exceptions.TimeoutException: Request timed out
Details: (TestNetworkAdvancedServerOps:test_server_connectivity_resize) Server 6c855bbc-35ae-4022-87a8-5ac742f16ac5 failed to reach VERIFY_RESIZE status and task state "None" within the required time (300 s). Current status: ACTIVE. Current task state: None.
~~~

https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-1ctlr_2comp-featureset020-master/a542b39/logs/overcloud-novacompute-0/var/log/containers/nova/nova-compute.log.txt.gz
~~~
Stderr: 'mkdir: cannot create directory ‘/var/lib/nova/instances/6c855bbc-35ae-4022-87a8-5ac742f16ac5’: Permission denied\n'
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 10259, in migrate_disk_and_power_off
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server self._remotefs.create_dir(dest, inst_base)
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/volume/remotefs.py", line 94, in create_dir
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server on_completion=on_completion)
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/volume/remotefs.py", line 185, in create_dir
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server on_execute=on_execute, on_completion=on_completion)
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/utils.py", line 105, in ssh_execute
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server return processutils.execute(*ssh_cmd, **kwargs)
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py", line 441, in execute
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server cmd=sanitized_cmd)
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server Command: ssh -o BatchMode=yes 172.17.0.60 mkdir -p /var/lib/nova/instances/6c855bbc-35ae-4022-87a8-5ac742f16ac5
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server Exit code: 1
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server Stdout: ''
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server Stderr: 'mkdir: cannot create directory ‘/var/lib/nova/instances/6c855bbc-35ae-4022-87a8-5ac742f16ac5’: Permission denied\n'
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server During handling of the above exception, another exception occurred:
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 10079, in _error_out_instance_on_exception
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server yield
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 5580, in _resize_instance
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server timeout, retry_interval)
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 10263, in migrate_disk_and_power_off
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server exception.ResizeError(reason=reason))
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server nova.exception.InstanceFaultRollback: Instance rollback performed due to: Resize error: not able to execute ssh command: Unexpected error while running command.
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server Command: ssh -o BatchMode=yes 172.17.0.60 mkdir -p /var/lib/nova/instances/6c855bbc-35ae-4022-87a8-5ac742f16ac5
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server Exit code: 1
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server Stdout: ''
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server Stderr: 'mkdir: cannot create directory ‘/var/lib/nova/instances/6c855bbc-35ae-4022-87a8-5ac742f16ac5’: Permission denied\n'
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server During handling of the above exception, another exception occurred:
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 309, in dispatch
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 229, in _do_dispatch
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/exception_wrapper.py", line 78, in wrapped
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server function_name, call_dict, binary)
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server self.force_reraise()
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/six.py", line 703, in reraise
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server raise value
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/exception_wrapper.py", line 69, in wrapped
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server return f(self, context, *args, **kw)
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 189, in decorated_function
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server "Error: %s", e, instance=instance)
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server self.force_reraise()
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/six.py", line 703, in reraise
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server raise value
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 159, in decorated_function
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/compute/utils.py", line 1434, in decorated_function
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 217, in decorated_function
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server kwargs['instance'], e, sys.exc_info())
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server self.force_reraise()
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/six.py", line 703, in reraise
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server raise value
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 205, in decorated_function
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 5544, in resize_instance
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server self._revert_allocation(context, instance, migration)
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server self.force_reraise()
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/six.py", line 703, in reraise
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server raise value
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 5541, in resize_instance
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server instance_type, clean_shutdown, request_spec)
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 5599, in _resize_instance
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server request_spec)
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server File "/usr/lib64/python3.6/contextlib.py", line 99, in __exit__
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server self.gen.throw(type, value, traceback)
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 10092, in _error_out_instance_on_exception
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server raise error.inner_exception
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server nova.exception.ResizeError: Resize error: not able to execute ssh command: Unexpected error while running command.
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server Command: ssh -o BatchMode=yes 172.17.0.60 mkdir -p /var/lib/nova/instances/6c855bbc-35ae-4022-87a8-5ac742f16ac5
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server Exit code: 1
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server Stdout: ''
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server Stderr: 'mkdir: cannot create directory ‘/var/lib/nova/instances/6c855bbc-35ae-4022-87a8-5ac742f16ac5’: Permission denied\n'
2020-11-05 03:30:39.845 8 ERROR oslo_messaging.rpc.server
~~~
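The traceback above shows nova's libvirt driver creating the destination instance directory on the target compute over ssh during cold migration/resize, and the non-zero exit from `mkdir` surfacing as a `ResizeError`. A minimal sketch of that flow (hypothetical helper names, not nova's actual code):

```python
# Sketch (hypothetical names, not nova's actual code) of what the libvirt
# driver's remotefs helper does during cold migration/resize: create the
# destination instance directory on the target compute over ssh. A non-zero
# exit (here, Permission denied from mkdir) bubbles up as the ResizeError
# seen in the traceback above.
import subprocess

def build_mkdir_cmd(host, path):
    # Matches the command visible in the log:
    #   ssh -o BatchMode=yes <host> mkdir -p <path>
    return ["ssh", "-o", "BatchMode=yes", host, "mkdir", "-p", path]

def create_remote_dir(host, path, runner=subprocess.run):
    result = runner(build_mkdir_cmd(host, path),
                    capture_output=True, text=True)
    if result.returncode != 0:
        # nova wraps this as ProcessExecutionError, then ResizeError
        raise RuntimeError("not able to execute ssh command: "
                           "exit %d, stderr: %r"
                           % (result.returncode, result.stderr))
    return result
```

Because the remote `mkdir -p` runs as the ssh user, any ownership problem on the parent directory (`/var/lib/nova/instances`) fails the whole resize, exactly as logged.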

Revision history for this message
Marios Andreou (marios-b) wrote :

This is not a blocker because we have waived fs20 on master promotion (http://38.145.34.55/config/CentOS-8/master.ini) - it is quite broken [1]

For anyone looking into this bug: in the latest builds [1] over the last few days we seem to have a new, infra-related issue (failing node introspection), for which I will file a new bug.

That said, the issue reported in this bug is still happening. Some more recent examples from the 14th are at [2][3]:

    * Stderr: 'mkdir: cannot create directory ‘/var/lib/nova/instances/5581e212-2b2e-4a7c-ae9e-56734970f3c6’: Permission denied\n'

[1] https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-8-ovb-1ctlr_2comp-featureset020-master
[2] https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-1ctlr_2comp-featureset020-master/bebf3e9/logs/undercloud/var/log/tempest/stestr_results.html.gz
[3] https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-1ctlr_2comp-featureset020-master/bebf3e9/logs/overcloud-novacompute-0/var/log/containers/nova/nova-compute.log.txt.gz

Revision history for this message
Martin Schuppert (mschuppert) wrote :

I have a reproducer for this and will check it in more detail.

The root cause is that the base instance directory /var/lib/nova/instances has the wrong ownership:
it needs to be owned by nova:nova (42436:42436), the nova user inside the container:

[root@overcloud-novacompute-0 instances]# ll /var/lib/nova/
total 0
drwxr-xr-x. 4 root root 32 Nov 19 15:14 instances

Will keep checking why the ownership ends up wrong.
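Martin's diagnosis can be verified with a quick ownership check; `check_ownership()` is a hypothetical helper for illustration, and the uid/gid 42436 come from the comment above:

```python
# Quick check for the diagnosis above: /var/lib/nova/instances must be owned
# by the in-container nova user, uid/gid 42436 (values from the comment).
# check_ownership() is a hypothetical helper, not part of nova or tripleo.
import os

def check_ownership(path, expected_uid=42436, expected_gid=42436):
    st = os.stat(path)
    return st.st_uid == expected_uid and st.st_gid == expected_gid
```

On a broken node this returns False for /var/lib/nova/instances (owned root:root, as in the `ll` output above); after the fix it returns True.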

Changed in tripleo:
assignee: nobody → Martin Schuppert (mschuppert)
Revision history for this message
Martin Schuppert (mschuppert) wrote :

With change Ic6f053d56194613046ae0a4a908206ebb453fcf4 the call that triggered run() was
removed, so the script no longer actually runs.
I'll submit a fix as soon as Gerrit is up again.
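The failure mode Martin describes is a classic one: a script that defines its entry point but no longer invokes it becomes a silent no-op. An illustration (not the actual tripleo script):

```python
# Illustration (not the actual tripleo script) of the failure mode described
# above: a script that defines run() but no longer calls it does nothing when
# executed, so the ownership setup it performs never happens.

def run():
    # ... would set ownership of /var/lib/nova/instances here ...
    return "ran"

# The refactor dropped the invocation below; restoring it is the fix.
if __name__ == "__main__":
    run()
```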

tags: added: queens-backport-potential
Revision history for this message
Martin Schuppert (mschuppert) wrote :
Changed in tripleo:
status: Triaged → In Progress
Revision history for this message
yatin (yatinkarel) wrote :

Only master and queens were impacted; they are fixed with https://review.opendev.org/q/I5050f198f0109faa9299de85e01b0dbe4e5a30ab.

On the other branches the original cherry-pick has not merged yet; once it merges, the issue will appear on those branches too, so it would be better to squash the fix into those unmerged patches: https://review.opendev.org/q/Ic6f053d56194613046ae0a4a908206ebb453fcf4

Changed in tripleo:
milestone: wallaby-1 → wallaby-2
Revision history for this message
Martin Schuppert (mschuppert) wrote :

I have squashed the fix into the unmerged patches for victoria, ussuri and train at: https://review.opendev.org/q/Ic6f053d56194613046ae0a4a908206ebb453fcf4

Revision history for this message
yatin (yatinkarel) wrote :

All required patches merged, this can be closed. Thanks Martin.

Changed in tripleo:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 14.0.0

This issue was fixed in the openstack/tripleo-heat-templates 14.0.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates queens-eol

This issue was fixed in the openstack/tripleo-heat-templates queens-eol release.
