Issues with the new CI image

Bug #1991660 reported by daniel.pawlik
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
tripleo | Triaged | Medium | Unassigned |

Bug Description

Hello,
we are providing a new image for the CI check gates that was created with virt-customize instead of Diskimage Builder [1][2]. It causes failures in some test jobs, which are currently being tested in [3].
Once we know which changes the virt-customize image needs, that image can later be replaced by a "pure" CentOS 9 Stream image (only the Zuul SSH authorized key will be injected) with pre-tasks that apply all the current changes. Image customization would then be simpler and would give us more control over the system.

In the current virt-customize image, some functionality from DIB has been ported:
- simple-init element: https://opendev.org/openstack/diskimage-builder/src/branch/master/diskimage_builder/elements/simple-init/install.d/50-simple-init#L25 and https://opendev.org/openstack/diskimage-builder/src/branch/master/diskimage_builder/elements/simple-init/install.d/60-simple-init-remove-interfaces#L13-L14, which have been added in
https://softwarefactory-project.io/r/c/config/+/26166/18/nodepool/virt_images/roles/network-config/tasks/main.yaml#13

Current errors:

- for job: periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp_1supp-featureset064-master
```
2022-10-03 14:32:03.574994 | primary | TASK [Set eth2 public IP address] **********************************************
2022-10-03 14:32:03.575162 | primary | Monday 03 October 2022 14:32:03 +0000 (0:00:00.049) 0:04:11.661 ********
2022-10-03 14:32:11.064281 | primary | fatal: [supplemental]: FAILED! => {"changed": false, "cmd": "ip a add dev eth2 10.0.0.250/24;\nip l set eth2 up;\nip link set dev eth2 mtu 1450;\nping 10.0.0.1 -c 4 -q;\n", "delta": "0:00:06.150737", "end": "2022-10-03 10:32:10.823018", "msg": "non-zero return code", "rc": 1, "start": "2022-10-03 10:32:04.672281", "stderr": "", "stderr_lines": [], "stdout": "PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.\n\n--- 10.0.0.1 ping statistics ---\n4 packets transmitted, 0 received, +4 errors, 100% packet loss, time 3062ms\npipe 3", "stdout_lines": ["PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.", "", "--- 10.0.0.1 ping statistics ---", "4 packets transmitted, 0 received, +4 errors, 100% packet loss, time 3062ms", "pipe 3"]}
```
- for job: periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-master
```
2022-10-03 15:11:07.317795 | primary | TASK [did the deployment pass or fail?] ****************************************
2022-10-03 15:11:07.317869 | primary | Monday 03 October 2022 15:11:07 +0000 (0:00:00.050) 1:07:02.872 ********
2022-10-03 15:11:07.344113 | primary | fatal: [undercloud -> localhost]: FAILED! => {
2022-10-03 15:11:07.344176 | primary | "failed_when_result": true,
2022-10-03 15:11:07.344188 | primary | "overcloud_deploy_result": "failed"
2022-10-03 15:11:07.344198 | primary | }
```
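
In the first failure, the `ip` commands produce no stderr and the last command in the task is `ping`, so the non-zero return code most likely comes from the 100% packet loss. A minimal sketch for pulling the numbers out of the task's stdout follows; `parse_ping_summary` is a hypothetical helper, not part of the CI tooling, and the regular expression assumes the iputils summary format seen in the log above.

```python
import re

# Matches the summary line printed by ping, for example:
# "4 packets transmitted, 0 received, +4 errors, 100% packet loss, time 3062ms"
PING_SUMMARY = re.compile(
    r"(?P<sent>\d+) packets transmitted, (?P<received>\d+) received"
    r"(?:, \+(?P<errors>\d+) errors)?"
    r", (?P<loss>\d+(?:\.\d+)?)% packet loss"
)


def parse_ping_summary(stdout):
    """Return sent/received/errors/loss parsed from ping stdout."""
    match = PING_SUMMARY.search(stdout)
    if match is None:
        raise ValueError("no ping summary line found")
    return {
        "sent": int(match["sent"]),
        "received": int(match["received"]),
        # The "+N errors" part is optional in ping output.
        "errors": int(match["errors"] or 0),
        "loss_percent": float(match["loss"]),
    }


# stdout copied from the failed "Set eth2 public IP address" task.
stdout = (
    "PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.\n\n"
    "--- 10.0.0.1 ping statistics ---\n"
    "4 packets transmitted, 0 received, +4 errors, 100% packet loss, time 3062ms\n"
    "pipe 3"
)
summary = parse_ping_summary(stdout)
print(summary)
```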

Other jobs seem to have a similar issue.
More information and logs are available in the test job patch set [3].

Thanks for help,
Dan

[1] https://softwarefactory-project.io/r/c/config/+/26030
[2] https://softwarefactory-project.io/r/c/config/+/26166
[3] https://review.rdoproject.org/r/c/testproject/+/44982

Tags: alert
Changed in tripleo:
milestone: none → zed-1
importance: Undecided → Medium
tags: added: alert
description: updated
Harald Jensås (harald-jensas) wrote :

I am not 100% sure, but we have this task:

2022-10-04 12:24:06.953686 | primary | TASK [Add eth2 interface from eth2.conf] ***************************************
2022-10-04 12:24:06.953712 | primary | Tuesday 04 October 2022 12:24:06 +0000 (0:00:00.052) 0:07:10.718 *******
2022-10-04 12:24:15.045581 | primary | changed: [undercloud]

I believe this should set 10.0.0.1/24 on eth2 on the undercloud. Looking at the log [1], eth2 has no address. It is, however, configured in network-scripts with 10.0.0.1.
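
One way to make that check repeatable is to parse the JSON that `ip -j addr show` prints and look for the expected address. The sketch below is only an illustration: `ipv4_addresses` is a hypothetical helper, and the sample input is hand-written to mirror the "eth2 has no address" situation, not copied from the job logs.

```python
import json


def ipv4_addresses(ip_json, ifname):
    """Return 'address/prefix' strings for ifname from `ip -j addr show` output."""
    result = []
    for link in json.loads(ip_json):
        if link.get("ifname") != ifname:
            continue
        for addr in link.get("addr_info", []):
            # "inet" is IPv4; "inet6" entries are skipped.
            if addr.get("family") == "inet":
                result.append(f"{addr['local']}/{addr['prefixlen']}")
    return result


# Hand-written sample (assumption): eth2 is up but carries no address.
sample = json.dumps([
    {"ifname": "eth2", "operstate": "UP", "addr_info": []},
    {"ifname": "lo", "operstate": "UNKNOWN",
     "addr_info": [{"family": "inet", "local": "127.0.0.1", "prefixlen": 8}]},
])

expected = "10.0.0.1/24"
found = ipv4_addresses(sample, "eth2")
print(f"eth2 has {expected}: {expected in found}")
```

On a real node the input would come from running `ip -j addr show` on the undercloud and passing its stdout to the function.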

This is what the journal on the undercloud logs related to that task:

Oct 04 12:24:08 node-0003111047 ansible-ansible.legacy.command[72162]: Invoked with _raw_params=os-net-config -c /home/zuul/eth2.conf -v _uses_shell=False warn=False stdin_add_newline=True strip_empty_ends=True argv=None chdir=None executable=None creates=None removes=None stdin=None
Oct 04 12:24:09 node-0003111047 ifdown[72172]: You are using 'ifdown' script provided by 'network-scripts', which are now deprecated.
Oct 04 12:24:09 node-0003111047 ifdown[72173]: 'network-scripts' will be removed from distribution in near future.
Oct 04 12:24:09 node-0003111047 ifdown[72174]: It is advised to switch to 'NetworkManager' instead - it provides 'ifup/ifdown' scripts as well.
Oct 04 12:24:09 node-0003111047 NetworkManager[789]: <info> [1664886249.3584] device (eth2): state change: disconnected -> unavailable (reason 'carrier-changed', sys-iface-state: 'managed')
Oct 04 12:24:09 node-0003111047 ifup[72231]: You are using 'ifup' script provided by 'network-scripts', which are now deprecated.
Oct 04 12:24:09 node-0003111047 ifup[72232]: 'network-scripts' will be removed from distribution in near future.
Oct 04 12:24:09 node-0003111047 ifup[72233]: It is advised to switch to 'NetworkManager' instead - it provides 'ifup/ifdown' scripts as well.
Oct 04 12:24:09 node-0003111047 NetworkManager[789]: <info> [1664886249.4719] device (eth2): carrier: link connected
Oct 04 12:24:09 node-0003111047 NetworkManager[789]: <info> [1664886249.4726] device (eth2): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed')
Oct 04 12:24:09 node-0003111047 NetworkManager[789]: <info> [1664886249.4745] policy: auto-activating connection 'Wired connection 2' (9db43117-3135-37cf-9ee1-8ceece888f45)
Oct 04 12:24:09 node-0003111047 NetworkManager[789]: <info> [1664886249.4753] device (eth2): Activation: starting connection 'Wired connection 2' (9db43117-3135-37cf-9ee1-8ceece888f45)
Oct 04 12:24:09 node-0003111047 NetworkManager[789]: <info> [1664886249.4754] device (eth2): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
Oct 04 12:24:09 node-0003111047 NetworkManager[789]: <info> [1664886249.4759] device (eth2): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
Oct 04 12:24:09 node-0003111047 NetworkManager[789]: <info> [1664886249.4766] device (eth2): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Oct 04 12:24:09 node-0003111047 NetworkManager[789]: <info> [1664886249.4773] dhcp4 (eth2): activation: beginning transaction (timeout in 45 seconds)
Oct 04 12:24:12 node-0003111047 i...

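
The journal excerpt shows NetworkManager auto-activating 'Wired connection 2' on eth2 and starting a DHCP transaction right after the `ifup` call. To follow that sequence in a longer journal, the state transitions can be extracted per device. This is a rough sketch: `state_changes` is a hypothetical helper, and the pattern assumes the NetworkManager message format visible in the lines above.

```python
import re

# NetworkManager journal messages of the form:
# "device (eth2): state change: config -> ip-config (reason 'none', ...)"
STATE_CHANGE = re.compile(
    r"device \((?P<dev>[^)]+)\): state change: "
    r"(?P<old>\S+) -> (?P<new>\S+) \(reason '(?P<reason>[^']*)'"
)


def state_changes(journal, device):
    """Return (old, new, reason) tuples for one device, in log order."""
    changes = []
    for line in journal.splitlines():
        m = STATE_CHANGE.search(line)
        if m and m["dev"] == device:
            changes.append((m["old"], m["new"], m["reason"]))
    return changes


# Three lines copied from the journal excerpt above.
journal = """\
Oct 04 12:24:09 node-0003111047 NetworkManager[789]: <info> [1664886249.4754] device (eth2): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
Oct 04 12:24:09 node-0003111047 NetworkManager[789]: <info> [1664886249.4759] device (eth2): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
Oct 04 12:24:09 node-0003111047 NetworkManager[789]: <info> [1664886249.4766] device (eth2): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
"""

for old, new, reason in state_changes(journal, "eth2"):
    print(f"{old} -> {new} ({reason})")
```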

daniel.pawlik (daniel-pawlik) wrote :

Thanks Harald for checking.
It seems that the cloud-centos-9-stream image is working much better for TripleO CI check jobs [1]
than the newly proposed image based on the Opendev DIB image [2].
We were trying to use their image because instances were not booting after a kernel update in the CentOS 9 image.

Dan

[1] https://review.rdoproject.org/r/c/testproject/+/45470
[2] https://review.rdoproject.org/r/c/testproject/+/44982

daniel.pawlik (daniel-pawlik) wrote :

The bug can be closed.

yatin (yatinkarel) wrote :

<< It seems that cloud-centos-9-stream image is working much better for TripleO CI check jobs [1],
Yes, it's glean, which is also managing the interfaces with NetworkManager, that is causing issues in the DIB image. glean is not used in cloud-centos-9-stream, hence the jobs work fine with that image.
Maybe we can try stopping glean and its related services in the OVB jobs; with that, it should also work with the DIB image?
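
A rough sketch of how such a pre-task could find what to stop: filter the text output of `systemctl list-units` for units whose name starts with `glean` and build one `systemctl stop` command per match. Both helpers are hypothetical, the `glean*` naming pattern and the sample unit name are assumptions rather than something taken from the job logs, and the commands are only printed, never executed.

```python
def glean_units(list_units_output):
    """Pick unit names starting with 'glean' from `systemctl list-units` text."""
    units = []
    for line in list_units_output.splitlines():
        fields = line.split()
        # The unit name is the first whitespace-separated column.
        if fields and fields[0].startswith("glean"):
            units.append(fields[0])
    return units


def stop_commands(units):
    """Build (but do not run) one `systemctl stop` argv per unit."""
    return [["systemctl", "stop", unit] for unit in units]


# Hypothetical output; the unit names are an assumption, not from the logs.
sample = """\
glean-nm@eth0.service loaded active exited Glean for interface eth0 with NetworkManager
NetworkManager.service loaded active running Network Manager
"""

for argv in stop_commands(glean_units(sample)):
    print(" ".join(argv))
```

Whether stopping these units is enough for the OVB jobs to pass with the DIB image is untested here.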

daniel.pawlik (daniel-pawlik) wrote :

After double-checking, it fails on the same CI jobs where the cloud-centos-9-stream image works normally.

Rabi Mishra (rabi)
Changed in tripleo:
status: New → Triaged
daniel.pawlik (daniel-pawlik) wrote :

Removing the upstream-centos-9-stream image.
https://softwarefactory-project.io/r/c/config/+/26280
