tripleoci: excludes in yum/dnf conf images can break testing beds

Bug #1821575 reported by Sorin Sbarnea
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: Sorin Sbarnea

Bug Description

It seems that the prepared images used on rdo or openstack-upstream can have some package excludes added to them, excludes that would invalidate our testing code because we would run with adulterated systems instead of clean ones.

While I do understand that we may need to pre-configure some stuff (like a mirror/proxy, DNS, NTP...), I do not think that any yum/dnf package excludes should be part of this in any case.

One such example was found at: https://logs.rdoproject.org/26/645626/1/openstack-check/tripleo-build-containers-fedora-28/a68d963/job-output.txt.gz

msg: 'Failed to find required executable virtualenv in paths: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin'

Imagine that the ansible pip module failed with this error when the previous task had succeeded in installing `python3-virtualenv`. That was a surprise!

After some digging, I was able to figure out what seems to be the cause (yumlog):
DEBUG Excludes in dnf.conf: python2-pip, python2-setuptools, python2-virtualenv, python3-pip, python3-setuptools, python3-virtualenv

This means that if packages are listed as excludes, the ansible package module will return SUCCESS even if it does not install them. Clearly there is little joy in this, and I am not sure whether this counts as a feature or a bug of ansible or yum/dnf.
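A job could guard against this by checking the excludes before trusting a "successful" install. The following is only a minimal sketch, not anything used in the CI: the function names `read_excludes` and `excluded` and the sample config are hypothetical, and it assumes the excludes live in the `[main]` section under `exclude` or `excludepkgs`, separated by commas or whitespace, with shell-style globs.

```python
import configparser
import fnmatch
import re


def read_excludes(conf_text):
    """Return the exclude patterns from the [main] section of a dnf.conf."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    patterns = []
    # Assumption: both spellings may appear; values are comma- or
    # whitespace-separated.
    for key in ("exclude", "excludepkgs"):
        value = parser.get("main", key, fallback="")
        patterns.extend(p for p in re.split(r"[,\s]+", value) if p)
    return patterns


def excluded(requested, patterns):
    """Return the requested package names that match an exclude pattern."""
    return [
        name for name in requested
        if any(fnmatch.fnmatchcase(name, pat) for pat in patterns)
    ]


# Sample config built from the exclude list quoted in the log line above.
SAMPLE_CONF = """\
[main]
gpgcheck=1
exclude=python2-pip, python2-setuptools, python2-virtualenv, python3-pip, python3-setuptools, python3-virtualenv
"""

if __name__ == "__main__":
    print(excluded(["python3-virtualenv", "git"], read_excludes(SAMPLE_CONF)))
```

On a real node the text would come from reading `/etc/dnf/dnf.conf`; a non-empty result means the package manager may report success without installing those packages.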

Tags: alert ci
Sorin Sbarnea (ssbarnea)
Changed in tripleo:
importance: Undecided → High
Ian Wienand (iwienand) wrote :

In infra images, this is coming from [1].

The whole history of that is long, and there are many comments in there about what's going on. The crux is that, due to many setuptools/pip bugs over time, we *need* the latest versions (to be able to even deal with requirements files, but many other little bugs have crept in too). However, in the centos7 era, pip and system packages do not live nicely together. That's why we install the packages, *then* overwrite them with the latest versions from pip, *then* put the packages on hold to stop them being removed and re-installed again, which overwrites the upstream versions and creates a big mess (yes ... this happens :) ).

Two things -- Fedora now does a better job of keeping pip installs and rpm installs separate (as debuntu has always done). I would welcome someone really digging into this to understand what we can do. Also, pabelanger was mentioning that there might be better ways to put the packages on hold with dnf, such that they don't appear as completely missing. If someone has ideas on that, they're welcome too.

[1] http://git.openstack.org/cgit/openstack/diskimage-builder/tree/diskimage_builder/elements/pip-and-virtualenv/install.d/pip-and-virtualenv-source-install/04-install-pip#n188

Sorin Sbarnea (ssbarnea)
Changed in tripleo:
assignee: nobody → Sorin Sbarnea (ssbarnea)
milestone: none → stein-rc1
Sorin Sbarnea (ssbarnea) wrote :

It seems that the decision to hack the system is hitting us hard, at least on fedora. I discovered that even removing the excludes does not repair the missing `virtualenv-3` executable. Now I have to perform a dnf reinstall and *hope* that this will bring the executable I am looking for.

I really need the virtualenv-3 executable because I want to ensure that a python3 approach works, and the ansible pip module, which creates virtualenvs, requires an executable name (it cannot use the module calling method: `python -m virtualenv`).

I am not sure how much damage control I will need to do in order to make it work.

I think that we are better off if we stop altering system images. If an individual job really needs some hacks, it should apply them on its own (and at its own risk).

As removing the package hack would be really intrusive, I propose to stop doing it for fedora only initially, keeping it for centos-7.

tags: added: ci quickstart
Paul Belanger (pabelanger) wrote :

How I worked around the issue for testing:

  https://opendev.org/openstack/ansible-role-virtualenv/src/branch/master/tests/playbooks/pre.yaml#L30

Long term, I think we can switch from excludes to dnf versionlock plugin (untested).

Sorin Sbarnea (ssbarnea) wrote :

Thanks Paul.

I am currently testing https://review.openstack.org/#/c/645626/ which should make the logic of picking the correct virtualenv command a bit more flexible, so it would work on both pure and adulterated systems.
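For illustration only, a flexible lookup of this kind could be sketched as below. This is not the code from the review above; the function name `pick_virtualenv_command` and the candidate order are assumptions. It tries a list of executable names on PATH and falls back to the module calling method when none is found.

```python
import shutil
import sys


def pick_virtualenv_command(candidates=("virtualenv-3", "virtualenv"),
                            which=shutil.which):
    """Return an argv prefix that can be used to create a virtualenv.

    Tries each executable name in order and falls back to running the
    virtualenv module with the current interpreter when none is on PATH.
    The `which` parameter exists only so the lookup can be faked in tests.
    """
    for name in candidates:
        path = which(name)
        if path:
            return [path]
    # Fallback: the module calling method, e.g. `python -m virtualenv`.
    return [sys.executable, "-m", "virtualenv"]


if __name__ == "__main__":
    print(pick_virtualenv_command())
```

The fallback only helps callers that run the command themselves; as noted earlier in this thread, the ansible pip module needs an executable name, so there the caller would have to treat a multi-element result as "no usable executable found".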

wes hayutin (weshayutin)
Changed in tripleo:
status: New → Triaged
tags: added: alert
removed: quickstart
Changed in tripleo:
status: Triaged → Fix Released