Releasing fails with latest cloud-init on image 20241113

Bug #2089185 reported by James Coleman
This bug affects 3 people
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| MAAS | Fix Committed | High | Jacopo Rota | |
| MAAS 3.4 | Triaged | High | Unassigned | |
| MAAS 3.5 | Fix Committed | High | Unassigned | |
| cloud-init | Fix Released | Unknown | | |
| cloud-init (Ubuntu) | Incomplete | Undecided | Unassigned | |

Bug Description

Hello,

It seems a bug was introduced in the `20241113` Jammy image. Now, when we release a server with disk erasing enabled, cloud-init fails after the wipe-disks script finishes successfully, and the server goes into a release failed state.

Searching the internet, this issue seems to align with what I'm seeing: https://github.com/canonical/cloud-init/issues/5849

See attached logs, where cloud-init fails with `failed stage modules-final`.

I reviewed the cloud-config the server was getting and noticed the following:

```
power_state:
  condition: test ! -e /tmp/block-poweroff
  delay: now
  mode: poweroff
  timeout: 3600
```

If my understanding is correct, cloud-init finishes running the user script and then triggers the power-off. After sending the power-off signal, it starts the modules-final stage, which is then cancelled by the shutdown.
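
To make the `condition` line in the `power_state` block above concrete, here is a minimal Python sketch of how such a guard could be evaluated. It is not cloud-init's own code: the function name `should_power_off` and the `block_file` parameter are hypothetical, and it assumes that an exit status of 0 from the shell command means "go ahead with the power-off".

```python
import shlex
import subprocess


def should_power_off(block_file: str = "/tmp/block-poweroff") -> bool:
    """Return True when the guard command exits 0 (the block file is absent)."""
    # Same shell test as the `condition` key in the cloud-config above.
    # Assumption: exit status 0 means the power state change may proceed.
    command = f"test ! -e {shlex.quote(block_file)}"
    result = subprocess.run(command, shell=True)
    return result.returncode == 0


if __name__ == "__main__":
    # Prints whether a power-off would be allowed on this machine right now.
    print(should_power_off())
```

Under that assumption, a script that needs the machine to stay up would create the guard file before it exits, and the power-off only goes ahead when the file is absent.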

Tags: bug-council

James Coleman (jcoleman-lw) wrote:

Jacopo Rota (r00ta) wrote:

Since this is a bug in cloud-init, I'm marking this as invalid.

Changed in maas:
status: New → Invalid

James Coleman (jcoleman-lw) wrote:

Instead of marking this invalid, can you guys work to roll back to a stable version of cloud-init? I tried reviewing the MAAS source and there isn't a way to version-lock back to `20240802`, which I know is stable. So the only way for me to roll back is to set up a mirror (or modify my existing one) and remove the broken version of the image.

Jacopo Rota (r00ta) wrote:

The bug is in cloud-init and in the Ubuntu image, not in MAAS. This is why you can't find any reference in the MAAS codebase.

Alberto Contreras (aciba) wrote:

Adding cloud-init as a target, for visibility and to let the team properly triage it.

Brett Holman (holmanb) wrote:

Please gather the logs as described in [1] and report back.

Which image did this last succeed in?

I just linked the upstream bug and set the downstream bug to "invalid".

[1] https://docs.cloud-init.io/en/latest/howto/bugs.html#collect-logs

Changed in cloud-init (Ubuntu):
status: New → Invalid
Changed in cloud-init:
status: Unknown → New

James Coleman (jcoleman-lw) wrote:

I attached logs before, but here are the full logs if you need them.

James Falcon (falcojr) wrote:

Is it possible to get the full cloud-init logs as specified in https://docs.cloud-init.io/en/latest/howto/bugs.html#collect-logs ? The result is a tarball that also includes the cloud-init logs. The posted 'Full logs' link does not link to the full logs.

James Falcon (falcojr) wrote:

While GH-5849 could have the same root cause, at first glance, the code paths appear to be different. I think the cloud-init part of this issue was closed prematurely. I'm setting this to incomplete until we can get the full logs.

Changed in cloud-init (Ubuntu):
status: Invalid → Incomplete

Brett Holman (holmanb) wrote:

> While GH-5849 could have the same root cause, at first glance, the code paths appear to be different.

Agreed.

> until we can get the full logs

A minimal reproducer is also required so that a fix can be developed and validated.

Brett Holman (holmanb) wrote:

I may have a fix for this issue. Please test the code in this PR: https://github.com/canonical/cloud-init/pull/5913/files

Changed in cloud-init:
status: New → Fix Released
Jacopo Rota (r00ta)
tags: added: bug-council
Changed in maas:
status: Invalid → Triaged
importance: Undecided → High
milestone: none → 3.6.0
Jacopo Rota (r00ta)
Changed in maas:
assignee: nobody → Jacopo Rota (r00ta)
status: Triaged → In Progress
Changed in maas:
status: In Progress → Fix Committed