Problems using a Juju multi-cloud controller with multiple OpenStack clouds that have the same region name

Bug #1999824 reported by Nikolay Vinogradov
This bug affects 3 people
Affects: Canonical Juju
Status: Triaged
Importance: High
Assigned to: Unassigned

Bug Description

Hi all,

As per [1], in order to bootstrap and use a Juju controller on OpenStack, the user must specify the image metadata that the controller will use when provisioning new machines for the models running on it.

[2] introduced Juju's multi-cloud controller feature, which allows Juju models to be deployed to multiple cloud substrates without having to bootstrap a controller per cloud. One would therefore expect a single controller to be able to deploy models on two or more OpenStack clouds. Those OpenStack clouds will obviously have different sets of images, so the Juju controller should be able to select the right images for the machines depending on the cloud.

The problem we're seeing is that it is easy to get into a situation where the image metadata from a cloud newly added to the multi-cloud controller overwrites the metadata of the previously registered OpenStack cloud, making it impossible to add machines to the models running on the first cloud. The output of the juju metadata set of commands confirms this.

The code at [3] suggests that the root cause is that Juju uses only the region name in the image metadata, so an image record is overwritten whenever the OpenStack clouds used with the controller have the same region name (RegionOne, for example).
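
To make the suspected collision concrete, here is a minimal Python sketch. It is not Juju code: the `ImageMetadata` class, the `region_only_key` function and the in-memory `store` are hypothetical stand-ins, and the key layout is an assumption inferred from the `_id` value visible in the MongoDB output quoted later in this report ("released:RegionOne:focal:amd64:::custom").

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageMetadata:
    """Hypothetical stand-in for one image metadata record."""
    stream: str
    region: str
    series: str
    arch: str
    source: str
    image_id: str
    virt_type: str = ""
    root_storage_type: str = ""


def region_only_key(m):
    # Assumed layout, inferred from the "_id" shown in the report:
    # stream:region:series:arch:virt_type:root_storage_type:source
    # Note that nothing here identifies the cloud itself.
    return ":".join([m.stream, m.region, m.series, m.arch,
                     m.virt_type, m.root_storage_type, m.source])


# Toy stand-in for a controller-wide collection keyed by that "_id".
store = {}


def save(m):
    store[region_only_key(m)] = m


# Image registered for the first cloud (region RegionOne).
site1_image = ImageMetadata("released", "RegionOne", "focal", "amd64",
                            "custom", "fbc3a64c-bd93-48d5-8e6f-cf8d9f1f55bb")
# Image registered for the second cloud, which also uses RegionOne.
site2_image = ImageMetadata("released", "RegionOne", "focal", "amd64",
                            "custom", "dd63d4be-8a9e-4a64-a4db-bc7f307ece1f")

save(site1_image)
save(site2_image)  # same key, so the first record is replaced

print(len(store))
print(store[region_only_key(site1_image)].image_id)
```

Under these assumptions both records map to the same key, so only the image saved last survives. That matches the behaviour described above, where the second cloud's metadata replaces the first cloud's.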

[1] https://juju.is/docs/olm/cloud-image-metadata
[2] https://discourse.charmhub.io/t/feature-highlight-multi-cloud-controller/2219
[3] https://github.com/juju/juju/blob/5edf504b654c923215c1a04abf1cf1e61262408e/provider/openstack/provider.go#L307

Tags: sts
Ian Booth (wallyworld)
Changed in juju:
milestone: none → 2.9.39
status: New → Triaged
importance: Undecided → High
Changed in juju:
milestone: 2.9.39 → 2.9.40
Changed in juju:
milestone: 2.9.40 → 2.9.41
Changed in juju:
milestone: 2.9.41 → 2.9.42
Changed in juju:
milestone: 2.9.42 → 2.9.43
Nikolay Vinogradov (nikolay.vinogradov) wrote :

Hi team,

Please fix this as this has been impacting us for some time and it is getting more urgent now.

Thanks.

Harry Pidcock (hpidcock)
Changed in juju:
assignee: nobody → Harry Pidcock (hpidcock)
Harry Pidcock (hpidcock)
Changed in juju:
assignee: Harry Pidcock (hpidcock) → nobody
Arno van Huyssteen (avanhuys) wrote :

Any updates here? There has been no update since March 2nd and it's impacting a customer.

Heather Lanigan (hmlanigan) wrote :

Do you have any logs and output to share on this issue?

Brian Holmes (holmesb5) wrote :

Hi, I'm the end customer who worked with Nikolay to identify this bug affecting my multi-cloud OpenStack environment. Here's a sanitized version of the Juju CLI output that I shared with him previously via internal e-mail:

# A new model 'site1-model' created on the same OpenStack site 'site1-cloud' as the controller
# inherits the valid Glance image ID that was provided when bootstrapping Juju in that site
holmesb5@runner1:~$ juju add-model site1-model site1-cloud/RegionOne
Added 'site1-model' model on site1-cloud/RegionOne with credential 'my-credential' for user 'admin'

holmesb5@runner1:~$ juju metadata list-images -m site1-model
Source Version Arch Region Image ID Stream Virt Type Storage Type
custom 20.04 amd64 RegionOne fbc3a64c-bd93-48d5-8e6f-cf8d9f1f55bb released

# A second new model 'site2-model' in the distinct OpenStack cloud 'site2-cloud' (which uses
# the same region name as site1-cloud) unexpectedly picks up the image ID from site1-cloud,
# which of course does not exist in site2-cloud's Glance
holmesb5@runner1:~$ juju add-model site2-model site2-cloud/RegionOne
Added 'site2-model' model on site2-cloud/RegionOne with credential 'my-credential' for user 'admin'

holmesb5@runner1:~$ juju metadata list-images -m site2-model
Source Version Arch Region Image ID Stream Virt Type Storage Type
custom 20.04 amd64 RegionOne fbc3a64c-bd93-48d5-8e6f-cf8d9f1f55bb released

# Attempt to fix this by putting a valid site2-cloud image ID in manually
holmesb5@runner1:~$ juju metadata add-image -m site2-model --series focal --region RegionOne dd63d4be-8a9e-4a64-a4db-bc7f307ece1f

holmesb5@runner1:~$ juju metadata list-images -m site2-model
Source Version Arch Region Image ID Stream Virt Type Storage Type
custom 20.04 amd64 RegionOne dd63d4be-8a9e-4a64-a4db-bc7f307ece1f released

# This overwrote the Glance image ID stored for site1-cloud, leaving *that* site with an
# invalid image reference and blocking the creation of new machines there
holmesb5@runner1:~$ juju metadata list-images -m site1-model
Source Version Arch Region Image ID Stream Virt Type Storage Type
custom 20.04 amd64 RegionOne dd63d4be-8a9e-4a64-a4db-bc7f307ece1f released

# Looking at the Juju controller's actual MongoDB content, the 'cloudimagemetadata'
# collection only accounts for the cloud's region name in each image record. The cloud's
# endpoint URL needs to be stored as well to properly disambiguate images when
# multiple clouds share a common region name.
root@juju-c37988-controller-0:/var/snap/juju-db/common# juju-db.mongo localhost:37017/juju --authenticationDatabase admin --ssl --sslAllowInvalidCertificates --username machine-0 --password xxxxxxxx

juju:PRIMARY> db.cloudimagemetadata.find({})
{ "_id" : "released:RegionOne:focal:amd64:::custom", "image_id" : "fbc3a64c-bd93-48d5-8e6f-cf8d9f1f55bb", "stream" : "released", "region" : "RegionOne", "version" : "20.04", "series" : "focal", "arch" : "amd64", "root_storage_size" : NumberLong(0), "date_created" : NumberLong("1...


Changed in juju:
status: Triaged → Won't Fix
status: Won't Fix → Triaged
Changed in juju:
milestone: 2.9.43 → 2.9.44
Arif Ali (arif-ali)
tags: added: sts
Heather Lanigan (hmlanigan) wrote :

This issue is caused by Juju storing image metadata without either a cloud name or a model UUID. There are a few
other bugs associated with this as well; for example, juju metadata add-image -m <model> isn't actually model-specific.
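
As an illustration of that diagnosis only, the Python sketch below compares a key without any cloud identifier against a hypothetical key prefixed with the cloud name. Both functions are assumptions made for this example, with the base layout inferred from the `_id` shown in the MongoDB output earlier in this report. This is not Juju's schema and not a proposed fix.

```python
def region_only_key(stream, region, series, arch, source,
                    virt_type="", root_storage_type=""):
    # Assumed layout, inferred from the "_id" quoted in this report.
    return ":".join([stream, region, series, arch,
                     virt_type, root_storage_type, source])


def cloud_scoped_key(cloud, *fields):
    # Hypothetical: prefix the cloud name so that two clouds sharing a
    # region name no longer map to the same record.
    return cloud + ":" + region_only_key(*fields)


# (cloud name, image ID) pairs taken from the CLI output in this report.
records = [
    ("site1-cloud", "fbc3a64c-bd93-48d5-8e6f-cf8d9f1f55bb"),
    ("site2-cloud", "dd63d4be-8a9e-4a64-a4db-bc7f307ece1f"),
]
fields = ("released", "RegionOne", "focal", "amd64", "custom")

by_region = {}
by_cloud = {}
for cloud, image_id in records:
    by_region[region_only_key(*fields)] = image_id
    by_cloud[cloud_scoped_key(cloud, *fields)] = image_id

print(len(by_region), len(by_cloud))
```

With the region-only key the two images collapse into one record, while the cloud-prefixed key keeps one record per cloud. A model UUID could be used as the prefix in the same way; which identifier is appropriate is left open here.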

Fixing this is non-trivial within Juju due to surrounding details, and will require a Juju upgrade once the fix is implemented.

I've asked for help with a POC to verify whether a potential workaround is possible in the meantime.

@nickolay, can you please verify the exact method being used in this config to supply cloud image metadata to Juju? The glance-simplestreams-sync charm? The image-agent-url model config? juju metadata add-image? A different one?

Changed in juju:
milestone: 2.9.44 → 2.9.45
Changed in juju:
milestone: 2.9.45 → 2.9.46