rdo ovb featureset001 fails baremetal provisioning Permission denied: '/var/lib/ironic/images/

Bug #1907272 reported by Marios Andreou
This bug affects 2 people
Affects   Status         Importance   Assigned to   Milestone
tripleo   Fix Released   Critical     Unassigned

Bug Description

In the third-party check, the tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001 and tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-vexxhost jobs are failing during the overcloud deploy with an error in baremetal node provisioning (see [1][2][3][4]). The trace looks like:

        * 2020-12-08 11:27:16 | 2020-12-08 11:27:16.298192 | fa163ecb-4831-0f29-ca02-000000000017 | FATAL | Provision instances | localhost | error={"changed": false, "logging": "Created port overcloud-controller-0-ctlplane (UUID a5d1ef30-f342-4461-a79d-be07c161b318) for node baremetal-765835-5-38436-3 (UUID 71018750-8224-4a3b-a05b-48a936d30c7c) with {'network_id': 'eb0f1c97-6068-4ed1-8e61-7ae103c522aa', 'name': 'overcloud-controller-0-ctlplane'}\nCreated port overcloud-novacompute-0-ctlplane (UUID 5dfba319-a36f-407b-a129-0532cdadb904) for node baremetal-765835-5-38436-2 (UUID 443f72e7-a302-48ed-a309-5bb4136f55d9) with {'network_id': 'eb0f1c97-6068-4ed1-8e61-7ae103c522aa', 'name': 'overcloud-novacompute-0-ctlplane'}\nCreated port overcloud-controller-2-ctlplane (UUID 40276e99-0c0d-4725-a5d9-7eb13ab93fcb) for node baremetal-765835-5-38436-1 (UUID 4ef3a8db-7f3d-436c-a67f-0234404704f6) with {'network_id': 'eb0f1c97-6068-4ed1-8e61-7ae103c522aa', 'name': 'overcloud-controller-2-ctlplane'}\nCreated port overcloud-controller-1-ctlplane (UUID a84aa7f3-2d90-46d8-918f-706a08408326) for node baremetal-765835-5-38436-0 (UUID a8be5cdb-0cc0-4955-8c61-945865104ec0) with {'network_id': 'eb0f1c97-6068-4ed1-8e61-7ae103c522aa', 'name': 'overcloud-controller-1-ctlplane'}\nAttached port overcloud-controller-0-ctlplane (UUID a5d1ef30-f342-4461-a79d-be07c161b318) to node baremetal-765835-5-38436-3 (UUID 71018750-8224-4a3b-a05b-48a936d30c7c)\nAttached port overcloud-controller-1-ctlplane (UUID a84aa7f3-2d90-46d8-918f-706a08408326) to node baremetal-765835-5-38436-0 (UUID a8be5cdb-0cc0-4955-8c61-945865104ec0)\nAttached port overcloud-novacompute-0-ctlplane (UUID 5dfba319-a36f-407b-a129-0532cdadb904) to node baremetal-765835-5-38436-2 (UUID 443f72e7-a302-48ed-a309-5bb4136f55d9)\nAttached port overcloud-controller-2-ctlplane (UUID 40276e99-0c0d-4725-a5d9-7eb13ab93fcb) to node baremetal-765835-5-38436-1 (UUID 4ef3a8db-7f3d-436c-a67f-0234404704f6)\nProvisioning started on node 
baremetal-765835-5-38436-3 (UUID 71018750-8224-4a3b-a05b-48a936d30c7c)\nProvisioning started on node baremetal-765835-5-38436-2 (UUID 443f72e7-a302-48ed-a309-5bb4136f55d9)\nProvisioning started on node baremetal-765835-5-38436-0 (UUID a8be5cdb-0cc0-4955-8c61-945865104ec0)\nProvisioning started on node baremetal-765835-5-38436-1 (UUID 4ef3a8db-7f3d-436c-a67f-0234404704f6)\n", "msg": "Node 71018750-8224-4a3b-a05b-48a936d30c7c reached failure state \"deploy failed\"; the last error is Failed to prepare to deploy. Exception: [Errno 13] Permission denied: '/var/lib/ironic/images/71018750-8224-4a3b-a05b-48a936d30c7c'"}
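For triage across several job logs, the node UUID and the denied path can be pulled out of the "msg" text shown above. The following is a minimal sketch, assuming the message keeps the wording seen in this trace; the function name and the returned field names are hypothetical and not part of any TripleO or Ironic tool.

```python
import re

# Matches the failure text quoted in the trace above. The quotes around the
# state may or may not be backslash-escaped, depending on where the text
# was copied from, so both forms are accepted.
FAILURE_RE = re.compile(
    r"Node (?P<node>[0-9a-f-]{36}) reached failure state "
    r'\\?"(?P<state>[^"\\]+)\\?"'
    r".*?\[Errno (?P<errno>\d+)\] (?P<reason>[^:]+): "
    r"'(?P<path>[^']+)'"
)


def parse_provision_failure(msg):
    """Return a dict with node, state, errno, reason and path, or None."""
    match = FAILURE_RE.search(msg)
    return match.groupdict() if match else None
```

Feeding it the "msg" value from the trace should yield the node UUID, the state "deploy failed", errno 13 and the per-node directory under /var/lib/ironic/images.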

The third-party check doesn't vote, so this is not blocking the gates.

[1] https://logserver.rdoproject.org/35/765835/5/openstack-check/tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001/98731b6/logs/undercloud/home/zuul/overcloud_deploy.log.gz
[2] https://logserver.rdoproject.org/35/765835/5/openstack-check/tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-vexxhost/1539f38/logs/undercloud/home/zuul/overcloud_deploy.log.gz
[3] https://logserver.rdoproject.org/34/765834/7/openstack-check/tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001/691c31b/logs/undercloud/home/zuul/overcloud_deploy.log.gz
[4] https://logserver.rdoproject.org/46/765746/1/openstack-check/tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-vexxhost/4fe6998/logs/undercloud/home/zuul/overcloud_deploy.log.gz

wes hayutin (weshayutin) wrote :

adding promotion-blocker so we can track the issue in degraded.

tags: added: promotion-blocker
wes hayutin (weshayutin) wrote :

https://logserver.rdoproject.org/35/765835/5/openstack-check/tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001/98731b6/logs/undercloud/var/log/containers/ironic/

2020-12-08 11:49:28.713 8 ERROR ironic.conductor.utils [req-6f92d89b-4e04-4f22-824d-f7a9b941cdfa e8f063ef807645e9b737bd99b943fc2c 7a16516f899a4055ae35ab010cbe99ca - default default] Unexpected error while preparing to deploy to node 9e7e75e9-13f9-4d72-a13e-ae95217c8d70: PermissionError: [Errno 13] Permission denied: '/var/lib/ironic/images/9e7e75e9-13f9-4d72-a13e-ae95217c8d70'
2020-12-08 11:49:28.713 8 ERROR ironic.conductor.utils Traceback (most recent call last):
2020-12-08 11:49:28.713 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/conductor/deployments.py", line 165, in do_node_deploy
2020-12-08 11:49:28.713 8 ERROR ironic.conductor.utils task.driver.deploy.prepare(task)
2020-12-08 11:49:28.713 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic_lib/metrics.py", line 59, in wrapped
2020-12-08 11:49:28.713 8 ERROR ironic.conductor.utils result = f(*args, **kwargs)
2020-12-08 11:49:28.713 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/conductor/task_manager.py", line 148, in wrapper
2020-12-08 11:49:28.713 8 ERROR ironic.conductor.utils return f(*args, **kwargs)
2020-12-08 11:49:28.713 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/agent.py", line 635, in prepare
2020-12-08 11:49:28.713 8 ERROR ironic.conductor.utils _update_instance_info()
2020-12-08 11:49:28.713 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/agent.py", line 555, in _update_instance_info
2020-12-08 11:49:28.713 8 ERROR ironic.conductor.utils deploy_utils.build_instance_info_for_deploy(task))
2020-12-08 11:49:28.713 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic_lib/metrics.py", line 59, in wrapped
2020-12-08 11:49:28.713 8 ERROR ironic.conductor.utils result = f(*args, **kwargs)
2020-12-08 11:49:28.713 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/deploy_utils.py", line 1107, in build_instance_info_for_deploy
2020-12-08 11:49:28.713 8 ERROR ironic.conductor.utils _cache_and_convert_image(task, instance_info)
2020-12-08 11:49:28.713 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/deploy_utils.py", line 1000, in _cache_and_convert_image
2020-12-08 11:49:28.713 8 ERROR ironic.conductor.utils force_raw=force_raw)
2020-12-08 11:49:28.713 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic_lib/metrics.py", line 59, in wrapped
2020-12-08 11:49:28.713 8 ERROR ironic.conductor.utils result = f(*args, **kwargs)
2020-12-08 11:49:28.713 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/deploy_utils.py", line 928, in cache_instance_image
2020-12-08 11:49:28.713 8 ERROR ironic.conductor.utils fileutils.ensure_tree(_get_image_dir_path(node.uuid))
2020-12-08 11:49:28.713 8 ERROR ironic.conductor.utils File "...
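The last visible frame calls fileutils.ensure_tree() on a per-node directory under /var/lib/ironic/images. A minimal sketch of that step, assuming ensure_tree behaves roughly like os.makedirs(..., exist_ok=True); the helper name create_node_image_dir is hypothetical and only illustrates where errno 13 would surface and which ownership details are worth reporting.

```python
import os


def create_node_image_dir(images_root, node_uuid):
    """Try to create <images_root>/<node_uuid>; return (ok, detail)."""
    path = os.path.join(images_root, node_uuid)
    try:
        # Stand-in for ensure_tree(): create the directory if it is missing.
        os.makedirs(path, exist_ok=True)
    except PermissionError as exc:
        # Report who owns the parent, since that decides the outcome.
        st = os.stat(images_root)
        detail = "errno %d: %s is owned by %d:%d with mode %o" % (
            exc.errno, images_root, st.st_uid, st.st_gid, st.st_mode & 0o777)
        return False, detail
    return True, path
```

Run as a non-root user against a parent directory that user cannot write to, this should return False along with the parent's numeric owner and mode.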


wes hayutin (weshayutin) wrote :
Harald Jensås (harald-jensas) wrote :

So, I just reproduced this issue locally.
2020-12-10 09:55:08.910489 | fa163e86-b65a-e367-7890-000000000017 | FATAL | Provision instances | localhost | error={"changed": false, "logging": "Created port compute-0-ctlplane (UUID 8dc41646-695f-4fe3-bdbf-ef704270aa01) for node baremetal-69096-leaf2-0 (UUID f4f581bb-5f20-4857-ad0c-f0a2e2a3fdfa) with {'network_id': '7575929f-4388-4011-ab4e-275c5264bd87', 'name': 'compute-0-ctlplane'}\nCreated port controller-0-ctlplane (UUID 717b8e89-320f-4a2d-b3f0-2a4cc6d2609d) for node baremetal-69096-leaf1-0 (UUID 16cdeeb7-15bd-4cee-9bd2-f8a2fd728d76) with {'network_id': '7575929f-4388-4011-ab4e-275c5264bd87', 'name': 'controller-0-ctlplane'}\nAttached port compute-0-ctlplane (UUID 8dc41646-695f-4fe3-bdbf-ef704270aa01) to node baremetal-69096-leaf2-0 (UUID f4f581bb-5f20-4857-ad0c-f0a2e2a3fdfa)\nAttached port controller-0-ctlplane (UUID 717b8e89-320f-4a2d-b3f0-2a4cc6d2609d) to node baremetal-69096-leaf1-0 (UUID 16cdeeb7-15bd-4cee-9bd2-f8a2fd728d76)\nProvisioning started on node baremetal-69096-leaf2-0 (UUID f4f581bb-5f20-4857-ad0c-f0a2e2a3fdfa)\nProvisioning started on node baremetal-69096-leaf1-0 (UUID 16cdeeb7-15bd-4cee-9bd2-f8a2fd728d76)\n", "msg": "Node f4f581bb-5f20-4857-ad0c-f0a2e2a3fdfa reached failure state \"deploy failed\"; the last error is Failed to prepare to deploy. Exception: [Errno 13] Permission denied: '/var/lib/ironic/images/f4f581bb-5f20-4857-ad0c-f0a2e2a3fdfa'"

(undercloud) [centos@undercloud ~]$ ls -l /var/lib/ironic
total 0
drwxr-xr-x. 2 42422 42422 86 Dec 9 17:53 httpboot
drwxrwxr-x. 2 root root 91 Dec 9 17:53 images
drwxr-xr-x. 3 42422 42422 133 Dec 9 17:29 tftpboot

(undercloud) [centos@undercloud ~]$ ls -l /var/lib/ironic/images
total 3495876
-rw-r--r--. 1 root root 53915577 Dec 9 17:53 overcloud-full.initrd
-rw-r--r--. 1 root root 3573940224 Dec 9 17:53 overcloud-full.raw
-rwxr-xr-x. 1 root root 9514120 Dec 9 17:53 overcloud-full.vmlinuz

(undercloud) [centos@undercloud ~]$ podman exec -it ironic_conductor
Error: no container with name or ID ironic_conductor found: no such container
(undercloud) [centos@undercloud ~]$ sudo podman exec -it ironic_conductor /bin/bash
bash-4.4$ mkdir /var/lib/ironic/images/test
mkdir: cannot create directory ‘/var/lib/ironic/images/test’: Permission denied
bash-4.4$

bash-4.4$ ls -l /var/lib/ironic
total 0
drwxr-xr-x. 2 ironic ironic 86 Dec 9 17:53 httpboot
drwxrwxr-x. 2 root root 91 Dec 9 17:53 images
drwxr-xr-x. 3 ironic ironic 133 Dec 9 17:29 tftpboot

bash-4.4$ ls -ln /var/lib/ironic
total 0
drwxr-xr-x. 2 42422 42422 86 Dec 9 17:53 httpboot
drwxrwxr-x. 2 0 0 91 Dec 9 17:53 images
drwxr-xr-x. 3 42422 42422 133 Dec 9 17:29 tftpboot

  ^^ If I am not mistaken, in previous versions the "images" directory was owned by "ironic" when listed from inside the container. This is no longer the case.

podman-2.0.5-5.module_el8.3.0+512+b3b58dca.x86_64
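The listings above are enough to explain the mkdir failure with plain POSIX mode bits. Below is a simplified sketch, assuming the process in the container runs as uid 42422 with only gid 42422 (the numeric IDs shown for "ironic" in the listing). It ignores ACLs, capabilities and SELinux, and the function name is hypothetical.

```python
import stat


def can_create_entries(dir_mode, dir_uid, dir_gid, uid, gids):
    """Simplified POSIX check: may (uid, gids) create entries in a directory?

    Creating an entry needs both write and search (execute) permission on
    the directory, taken from the owner, group or other bits in that order.
    """
    if uid == 0:
        return True  # root normally bypasses these checks
    if uid == dir_uid:
        needed = (stat.S_IWUSR, stat.S_IXUSR)
    elif dir_gid in gids:
        needed = (stat.S_IWGRP, stat.S_IXGRP)
    else:
        needed = (stat.S_IWOTH, stat.S_IXOTH)
    return all(dir_mode & bit for bit in needed)


# "images" above: drwxrwxr-x owned by 0:0
images_ok = can_create_entries(0o775, 0, 0, 42422, {42422})
# "httpboot" above: drwxr-xr-x owned by 42422:42422
httpboot_ok = can_create_entries(0o755, 42422, 42422, 42422, {42422})
```

Under these assumptions the "images" check falls through to the "other" bits (r-x), which lack write permission, while "httpboot" is writable through the owner bits.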

Rabi Mishra (rabi) wrote :

I noticed the same issue when deploying locally from master.

I noticed that Steve has proposed a patch [1], which did not work for me, so I proposed an alternate solution [2], which did.

[1] https://review.opendev.org/c/openstack/python-tripleoclient/+/766126
[2] https://review.opendev.org/c/openstack/tripleo-heat-templates/+/766404

Harald Jensås (harald-jensas) wrote :

      kolla_config:
        /var/lib/kolla/config_files/ironic_conductor.json:
          command: /usr/bin/ironic-conductor
          config_files:
            - source: "/var/lib/kolla/config_files/src/*"
              dest: "/"
              merge: true
              preserve_properties: true
          permissions:
            - path: /var/lib/ironic
              owner: ironic:ironic
              recurse: true
            - path: /var/log/ironic
              owner: ironic:ironic
              recurse: true

Is that supposed to make everything in that container-mounted volume owned by ironic?
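For reference, a permissions entry with "recurse: true" conceptually amounts to a recursive chown of the given path. The sketch below only illustrates that idea and is not kolla's actual implementation; the function name is hypothetical, and whether the entry is applied to the bind-mounted directory in this deployment is exactly the open question.

```python
import os


def chown_recursive(root, uid, gid):
    """Set the owner of root and everything beneath it; return paths touched."""
    changed = [root]
    os.chown(root, uid, gid)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lchown changes a symlink itself rather than its target.
            os.lchown(path, uid, gid)
            changed.append(path)
    return changed
```

If something like chown_recursive("/var/lib/ironic", 42422, 42422) had taken effect on the mounted volume, the "images" directory would show 42422:42422 instead of 0:0 in the listings above.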

wes hayutin (weshayutin) wrote :
Oliver Walsh (owalsh) wrote :
Alex Schultz (alex-schultz) wrote :

/var/lib/ironic/images should be created via host_prep_tasks if it's required for a container.

wes hayutin (weshayutin)
Changed in tripleo:
status: Triaged → Fix Released
wes hayutin (weshayutin)
Changed in tripleo:
status: Fix Released → In Progress
Cédric Jeanneret (cjeanner) wrote :

hitting the same issue locally, even using "file://" href:

metalsmith-0.yaml
---
- name: Controller
  count: 1
  defaults:
    image:
      href: "file:///var/lib/ironic/images/overcloud-full.raw"
      kernel: "file:///var/lib/ironic/images/overcloud-full.vmlinuz"
      ramdisk: "file:///var/lib/ironic/images/overcloud-full.initrd"
  instances:
    - hostname: oc0-controller-0
      name: oc0-controller-0

Script:
#!/bin/bash
# This file is managed by ansible
set -xeo pipefail

export PROVISION_OUTPUT=overcloud-baremetal-deployed-0.yaml
export PROVISION_STACK=overcloud-0
export PROVISION_USER=None
export PROVISION_KEY=None
export PROVISION_CONCURRENCY=None
export PROVISION_TIMEOUT=None
source /home/stack/stackrc; openstack overcloud node provision -o $PROVISION_OUTPUT --stack $PROVISION_STACK ~/metalsmith-0.yaml >/home/stack/overcloud_node_provision.log 2>&1

[...]
2020-12-17 08:15:39.397852 | 24420175-467f-3696-1e0e-000000000017 | FATAL | Provision instances | localhost | error={"changed": false, "logging": "Created port oc0-controller-0-ctlplane (UUID 54f64e6e-6ea2-4f5e-8e5b-1b58af748d50) for node oc0-controller-0 (UUID bfa6f708-eb5b-4dd6-b015-9f26d85378c8) with {'network_id': '68159e2c-9d0e-424f-9d3f-0f0baec0af3a', 'name': 'oc0-controller-0-ctlplane'}\nAttached port oc0-controller-0-ctlplane (UUID 54f64e6e-6ea2-4f5e-8e5b-1b58af748d50) to node oc0-controller-0 (UUID bfa6f708-eb5b-4dd6-b015-9f26d85378c8)\nProvisioning started on node oc0-controller-0 (UUID bfa6f708-eb5b-4dd6-b015-9f26d85378c8)\n", "msg": "Node bfa6f708-eb5b-4dd6-b015-9f26d85378c8 reached failure state \"deploy failed\"; the last error is Failed to prepare to deploy. Exception: [Errno 13] Permission denied: '/var/lib/ironic/images/bfa6f708-eb5b-4dd6-b015-9f26d85378c8'"}

ls -lZd /var/lib/ironic/
drwxr-xr-x. 7 42422 42422 system_u:object_r:container_file_t:s0 87 Dec 17 08:15 /var/lib/ironic/
ls -lZ /var/lib/ironic/images
total 3628356
-rw-r--r--. 1 root root unconfined_u:object_r:container_file_t:s0 61642348 Dec 17 07:50 overcloud-full.initrd
-rw-r--r--. 1 root root unconfined_u:object_r:container_file_t:s0 4667932672 Dec 17 07:50 overcloud-full.raw
-rwxr-xr-x. 1 root root unconfined_u:object_r:container_file_t:s0 9514120 Dec 17 07:50 overcloud-full.vmlinuz

IIRC, using the "file://" href bypasses ironic, so there might be something else going on. Does it need "sudo" at some point? Or is a "become" missing in some ansible parts?

Cédric Jeanneret (cjeanner) wrote :

OK - can confirm Rabi's patch is doing the job properly, I could deploy everything. So just a matter of waiting for a promotion apparently.

wes hayutin (weshayutin) wrote :

@Cedric, what branch were you using? I would assume via your new tool you can be more definitive w/ regards to where the patch is :) lolz

wes hayutin (weshayutin) wrote :
Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 14.0.0

This issue was fixed in the openstack/tripleo-heat-templates 14.0.0 release.
