Please remove lxd.snap from lxd images, as it fails to seed thus failing the first boot - snapd.seeded.service waits forever (?) to have snaps seeded in LXD on s390x and arm64

Bug #1878225 reported by Balint Reczey
This bug affects 3 people
Affects                 Status    Importance  Assigned to  Milestone
Auto Package Testing    Triaged   Undecided   Unassigned
cloud-images            Invalid   Undecided   Unassigned
snapd                   Triaged   Medium      Unassigned
autopkgtest (Ubuntu)    Triaged   Undecided   Unassigned
  Groovy                Triaged   Undecided   Unassigned
lxd (Ubuntu)            Invalid   Undecided   Unassigned
  Groovy                Invalid   Undecided   Unassigned
snapd (Ubuntu)          Invalid   Critical    Unassigned
  Groovy                Invalid   Critical    Unassigned

Bug Description

lxc launch ubuntu-daily:groovy gg-test
lxc shell gg-test
root@gg-test:~# service snapd.seeded status
● snapd.seeded.service - Wait until snapd is fully seeded
     Loaded: loaded (/lib/systemd/system/snapd.seeded.service; enabled; vendor preset: enabled)
     Active: activating (start) since Tue 2020-05-12 14:14:52 UTC; 30min ago
   Main PID: 249 (snap)
      Tasks: 10 (limit: 4704)
     Memory: 11.3M
     CGroup: /system.slice/snapd.seeded.service
             └─249 /usr/bin/snap wait system seed.loaded

May 12 14:14:52 gg-test systemd[1]: Starting Wait until snapd is fully seeded...

root@gg-test:~# systemctl list-jobs
JOB UNIT TYPE STATE
132 systemd-update-utmp-runlevel.service start waiting
119 cloud-config.service start waiting
122 snapd.seeded.service start running
2 multi-user.target start waiting
115 cloud-init.target start waiting
1 graphical.target start waiting
138 snapd.autoimport.service start waiting
121 cloud-final.service start waiting

8 jobs listed.

root@gg-test:~# journalctl -a | pastebinit
https://paste.ubuntu.com/p/PtdcvvdKCM/
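
For triage, the useful signal in the `systemctl list-jobs` listing above is the one job in the `running` state; in this listing every other job is `waiting`. A minimal Python sketch for pulling that unit out of such output follows. The function name `running_units` is made up for illustration, and the parser assumes the four-column `JOB UNIT TYPE STATE` layout shown in this report.

```python
def running_units(list_jobs_output: str) -> list:
    """Return units whose job state is 'running' in `systemctl list-jobs` output."""
    units = []
    for line in list_jobs_output.splitlines():
        fields = line.split()
        # Data rows look like "<job-id> <unit> <type> <state>";
        # the header and footer lines do not start with a number.
        if len(fields) == 4 and fields[0].isdigit() and fields[3] == "running":
            units.append(fields[1])
    return units


# A few rows copied from the listing in this report.
sample = """\
JOB UNIT TYPE STATE
132 systemd-update-utmp-runlevel.service start waiting
119 cloud-config.service start waiting
122 snapd.seeded.service start running
2 multi-user.target start waiting
"""
print(running_units(sample))
```

On the rows above this reports `snapd.seeded.service` as the only running job.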

Balint Reczey (rbalint) wrote :

This prevents creating LXD autopkgtest images on the affected architectures, see latest systemd autopkgtest logs.

tags: added: rls-gg-incoming
Zygmunt Krynicki (zyga) wrote :

This looks like a problem in the seed used to create this image. Can you please attach:

/var/lib/snapd/seed/seed.yaml as well as find /var/lib/snapd/seed please?

Balint Reczey (rbalint) wrote :

# cat /var/lib/snapd/seed/seed.yaml
snaps:
  -
    name: core18
    channel: stable
    file: core18_1756.snap
  -
    name: snapd
    channel: stable
    file: snapd_7262.snap
  -
    name: lxd
    channel: stable/ubuntu-20.10
    file: lxd_14953.snap

# find /var/lib/snapd/seed/
/var/lib/snapd/seed/
/var/lib/snapd/seed/assertions
/var/lib/snapd/seed/assertions/account
/var/lib/snapd/seed/assertions/account-key
/var/lib/snapd/seed/assertions/core18_1756.assert
/var/lib/snapd/seed/assertions/lxd_14953.assert
/var/lib/snapd/seed/assertions/model
/var/lib/snapd/seed/assertions/snapd_7262.assert
/var/lib/snapd/seed/seed.yaml
/var/lib/snapd/seed/snaps
/var/lib/snapd/seed/snaps/core18_1756.snap
/var/lib/snapd/seed/snaps/lxd_14953.snap
/var/lib/snapd/seed/snaps/snapd_7262.snap

Zygmunt Krynicki (zyga) wrote :

We've retrieved the OOPS from the ID mentioned in the log you've attached. It seems the error is

ERROR run hook "install": cannot perform operation: mount --rbind /snap /snap: Permission denied

Can you provide the output of "dmesg | grep DENIED" please? In addition, can you please add "ls -ld /snap" (mainly to check if it's a directory or something else).

Michael Vogt (mvo) wrote :

In the arm64 oops the error is:
"""
change "seed": "Initialize system state"
prerequisites: Undo
snap-setup: "snapd" (7267) ""
prepare-snap: Undoing
mount-snap: Undone
copy-snap-data: Undone
setup-profiles: Error
ERROR cannot reload udev rules: exit status 1
udev output:
Failed to send reload request: No such file or directory
"""

Zygmunt Krynicki (zyga) wrote :

The udev error is the well-known problem of snapd pulling udev into containers.

Johan Ehnberg (johan-ehnberg) wrote :

Have you tested this on amd64? I am seeing the same outputs on amd64 and trying to find out if it is a regression or something in my config. A similar report has suggestions it may be due to faulty network setup for the container:

https://bugs.launchpad.net/ubuntu/+source/snapd/+bug/1806070

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in snapd (Ubuntu):
status: New → Confirmed
Balint Reczey (rbalint) wrote :

# dmesg | grep DENIED | pastebinit
https://paste.ubuntu.com/p/5DXJb9RbWS/
root@clean-lacewing:~# ls -ld /snap
drwxr-xr-x 1 root root 46 May 12 13:37 /snap

It is a vanilla LXD container instance (on s390x), I have not done any changes to it to trigger the issue apart from starting it.

Zygmunt Krynicki (zyga) wrote :

I believe this is really the bug that snapd cannot install "core" or "core18" inside a container without failing on udev.

The well-known workaround is to do it twice.

I think we should sit down and discuss this but I think this can only be done after core20 beta is out as we simply have no time for another topic.

Johan Ehnberg (johan-ehnberg) wrote :

OK, so for anyone hitting this error until there may be a fix: For cases where snapd is not needed (and perhaps purged later), and orchestration relies on cloud-init to finish, the workaround in the container is:

systemctl stop snapd.seeded.service

or on the host:

lxc exec $CONTAINERNAME -- systemctl stop snapd.seeded.service
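
As a rough illustration of the workaround above, here is a small Python sketch that gives seeding a bounded amount of time and only then falls back to stopping the unit. The helper name `wait_for_seed` and the 300 second budget are assumptions for illustration; the two commands in the example call are the ones quoted in this thread. It is only suitable where snapd is genuinely not needed.

```python
import subprocess


def wait_for_seed(wait_cmd, stop_cmd, timeout_s):
    """Run wait_cmd; if it does not finish within timeout_s, run stop_cmd.

    Returns "seeded" if wait_cmd exited 0, "failed" if it exited non-zero,
    and "stopped" if it timed out and stop_cmd was run instead.
    """
    try:
        result = subprocess.run(wait_cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child on timeout before raising.
        subprocess.run(stop_cmd, check=False)
        return "stopped"
    return "seeded" if result.returncode == 0 else "failed"


# Example call inside the container (300 s is an arbitrary choice):
# wait_for_seed(["snap", "wait", "system", "seed.loaded"],
#               ["systemctl", "stop", "snapd.seeded.service"],
#               timeout_s=300)
```

Passing the commands in as arguments keeps the helper testable without snapd installed.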

tags: added: id-5ebd4319f41bed3faf85e184
Dimitri John Ledkov (xnox) wrote :

@jdstrand it sounds odd to have snapd seeding failing with apparmor denials on s390x/arm64, can you take a look?

Changed in snapd (Ubuntu):
importance: Undecided → High
Jamie Strandboge (jdstrand) wrote :

@xnox - I took a look at the paste from Balint and all the denials seem to be coming from lxd's policy. I don't know how the autopkgtest's lxd apparmor policy is set up, but it may need adjusting. Perhaps @stgraber can comment?

Balint Reczey (rbalint) wrote :

@jdstrand It can be reproduced without autopkgtest being used. Adding the lxd package per your comment because @stgraber may not monitor snapd bugs.

Stéphane Graber (stgraber) wrote :

Looks like a privileged container without nesting enabled. This gets some pretty strict apparmor rules to prevent trivial privilege escalation. I'm not sure that there's really much that can be done here especially considering the many issues with apparmor and its mount rules.

We allow a lot more in unprivileged containers because we don't really rely on apparmor there for security and so can relax rules quite a bit to make systemd and others happy. This relaxing makes bypass of mount rules trivial but the user namespace is the enforcement mechanism in that case and will prevent you from escaping.

Balint Reczey (rbalint) wrote :

@stgraber the simplest failing case is starting an unprivileged container, where snapd.seeded.service stays stuck in "running".
The plan is to fix both privileged and unprivileged containers to let systemd enter non-degraded mode:
https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/tree/debian/tests/tests-in-lxd?h=ubuntu-groovy

As of now, autopkgtest-build-lxd ubuntu-daily:groovy fails on multiple architectures in this test.

Stéphane Graber (stgraber) wrote :

What's the actual reproducer for this?

```
stgraber@castiana:~/data/code/lxc/lxd (stgraber/master)$ lxc launch ubuntu-daily:groovy gg-test
Creating gg-test
Starting gg-test
stgraber@castiana:~/data/code/lxc/lxd (stgraber/master)$ lxc shell gg-test
root@gg-test:~# service snapd.seeded status
● snapd.seeded.service - Wait until snapd is fully seeded
     Loaded: loaded (/lib/systemd/system/snapd.seeded.service; enabled; vendor preset: enabled)
     Active: activating (start) since Mon 2020-06-22 22:27:16 UTC; 4s ago
   Main PID: 277 (snap)
      Tasks: 6 (limit: 18806)
     Memory: 22.3M
     CGroup: /system.slice/snapd.seeded.service
             └─277 /usr/bin/snap wait system seed.loaded

Jun 22 22:27:16 gg-test systemd[1]: Starting Wait until snapd is fully seeded...
root@gg-test:~# systemctl list-jobs
No jobs running.
root@gg-test:~# service snapd.seeded status
● snapd.seeded.service - Wait until snapd is fully seeded
     Loaded: loaded (/lib/systemd/system/snapd.seeded.service; enabled; vendor preset: enabled)
     Active: active (exited) since Mon 2020-06-22 22:27:29 UTC; 6s ago
    Process: 277 ExecStart=/usr/bin/snap wait system seed.loaded (code=exited, status=0/SUCCESS)
   Main PID: 277 (code=exited, status=0/SUCCESS)

Jun 22 22:27:16 gg-test systemd[1]: Starting Wait until snapd is fully seeded...
Jun 22 22:27:29 gg-test systemd[1]: Finished Wait until snapd is fully seeded.
root@gg-test:~#
```

Changed in lxd (Ubuntu):
status: New → Incomplete
Balint Reczey (rbalint) wrote :

On s390x running 20.04 (upgraded from 18.04 and not rebooting after that) as host:

ubuntu@juju-d7a408-generic-21:~$ lxc launch ubuntu-daily:groovy gg2
Creating gg2
Starting gg2
ubuntu@juju-d7a408-generic-21:~$ lxc shell gg2
root@gg2:~# service snapd.seeded status
● snapd.seeded.service - Wait until snapd is fully seeded
     Loaded: loaded (/lib/systemd/system/snapd.seeded.service; enabled; vendor preset: enabled)
     Active: activating (start) since Tue 2020-06-23 12:14:41 UTC; 53s ago
   Main PID: 247 (snap)
      Tasks: 7 (limit: 4782)
     Memory: 19.7M
     CGroup: /system.slice/snapd.seeded.service
             └─247 /usr/bin/snap wait system seed.loaded

Jun 23 12:14:41 gg2 systemd[1]: Starting Wait until snapd is fully seeded...
root@gg2:~# systemctl list-jobs
JOB UNIT TYPE STATE
122 cloud-final.service start waiting
118 cloud-config.service start waiting
137 systemd-update-utmp-runlevel.service start waiting
115 cloud-init.target start waiting
123 snapd.seeded.service start running
1 graphical.target start waiting
112 snapd.autoimport.service start waiting
2 multi-user.target start waiting

8 jobs listed.
root@gg2:~# uname -a
Linux gg2 5.4.0-14-generic #17-Ubuntu SMP Thu Feb 6 22:46:43 UTC 2020 s390x s390x s390x GNU/Linux
root@gg2:~# logout
ubuntu@juju-d7a408-generic-21:~$ lxc --version
3.0.4

----

Latest systemd in groovy has tests-in-lxd autopkgtest which fails in autopkgtest-build-lxd (hence the SKIP) on the autopkgtest infra but passes locally in qemu.

I've added a bit of debugging when it fails to speed up triaging going forward in:
https://code.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/+git/systemd/+ref/ubuntu-groovy

The failure can be observed with this change at:
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-groovy-rbalint-scratch/groovy/amd64/s/systemd/20200624_210145_882ea@/log.gz
...
1 loaded units listed.
JOB UNIT TYPE STATE
144 snapd.autoimport.service start waiting
2 multi-user.target start waiting
110 systemd-update-utmp-runlevel.service start waiting
1 graphical.target start waiting
124 cloud-config.service start waiting
123 cloud-final.service start waiting
108 snapd.seeded.service start running
122 cloud-init.target start waiting
...

Changed in lxd (Ubuntu):
status: Incomplete → New
Stéphane Graber (stgraber) wrote :

"""
ubuntu@juju-d7a408-generic-21:~$ lxc --version
3.0.4
"""

Ubuntu 20.04 does not contain that version of LXD, so there's something wrong with your system.

Changed in lxd (Ubuntu):
status: New → Incomplete
Balint Reczey (rbalint)
Changed in lxd (Ubuntu):
status: Incomplete → New
Balint Reczey (rbalint) wrote :

Please ignore the juju reproducer then and focus on the failure on autopkgtest infra.
The system set up by juju was upgraded from 18.04, as I wrote, and that failure is not that interesting to me. I'm interested in getting the systemd autopkgtest to pass.

Stéphane Graber (stgraber) wrote :

```
root@bos02-s390x-01:~# uname -a
Linux bos02-s390x-01 5.4.0-37-generic #41~18.04.1-Ubuntu SMP Mon Jun 8 13:36:31 UTC 2020 s390x s390x s390x GNU/Linux
root@bos02-s390x-01:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.4 LTS
Release: 18.04
Codename: bionic
root@bos02-s390x-01:~# lxc launch ubuntu-daily:groovy gg2
Creating gg2
Error: Failed instance creation: Unable to fetch https://cloud-images.ubuntu.com/daily/server/groovy/20200619/groovy-server-cloudimg-s390x-lxd.tar.xz: 404 Not Found
root@bos02-s390x-01:~# systemctl reload snap.lxd.daemon
root@bos02-s390x-01:~# lxc launch ubuntu-daily:groovy gg2
Creating gg2
Starting gg2
root@bos02-s390x-01:~# lxc shell gg2
root@gg2:~# service snapd.seeded status
● snapd.seeded.service - Wait until snapd is fully seeded
     Loaded: loaded (/lib/systemd/system/snapd.seeded.service; enabled; vendor preset: enabled)
     Active: active (exited) since Fri 2020-06-26 03:11:52 UTC; 1min 11s ago
    Process: 395 ExecStart=/usr/bin/snap wait system seed.loaded (code=exited, status=0/SUCCESS)
   Main PID: 395 (code=exited, status=0/SUCCESS)

Jun 26 03:11:43 gg2 systemd[1]: Starting Wait until snapd is fully seeded...
Jun 26 03:11:52 gg2 systemd[1]: Finished Wait until snapd is fully seeded.
root@gg2:~# systemctl list-jobs
No jobs running.
root@gg2:~# logout
root@bos02-s390x-01:~#
```

That's your reproducer running on 18.04 s390x.

Can we please get clear, reliable instructions on how to reproduce this?

Changed in lxd (Ubuntu):
status: New → Incomplete
Stéphane Graber (stgraber) wrote :

AppArmor mount rules have had a lot of issues in the past (and still do) depending on the version of kernel, the parser and the exact rule. If you want an easy way out of this, setting `raw.apparmor=mount,` on your container will almost certainly get such issues to disappear.

LXD 4.0 has a number of tweaks in the rules to work around a bunch of those issues.
It's a trick we can do on unprivileged containers as we don't rely on apparmor for security there. For privileged containers, we don't get to do the same and so our policy is quite a bit more strict.

Dimitri John Ledkov (xnox) wrote :

@snapd team

These are automated, unattended deployments, which have no ability to "stop snapd.seeding, and run it again". That's not an acceptable solution. The unit is pulled into the initial transaction; failing or stopping it will make the transaction fail, so the boot will not complete correctly and cloud-init will not finish on first boot, even though cloud-init needs to install snaps and needs an operational snapd that has completed seeding.

It sounds like we must remove seeded snaps from our LXD images, and not install any seeded snaps inside our container. And like only install the lxd stub deb. Cause it looks like seeding snaps is not supported inside classic lxd containers.

Changed in snapd (Ubuntu):
importance: High → Critical
summary: - snapd.seeded.service waits forever (?) to have snaps seeded in LXD on
- s390x and arm64
+ Please remove lxd.snap from lxd images, as it fails to seed thus failing
+ the first boot - snapd.seeded.service waits forever (?) to have snaps
+ seeded in LXD on s390x and arm64
Changed in autopkgtest (Ubuntu):
status: New → Invalid
Changed in auto-package-testing:
status: New → Invalid
Robert C Jennings (rcj) wrote :

@xnox wrote:

> It sounds like we must remove seeded snaps from our LXD images, and not install
> any seeded snaps inside our container. And like only install the lxd stub deb.
> Cause it looks like seeding snaps is not supported inside classic lxd containers.

Let's remember that these aren't just LXD images, they are generic squashfs images from cloud-images.ubuntu.com. This raises the issue that if the generic cloud squashfs is run in any environment where snap seeding can not complete then boot is stuck and the instance is broken without cloud-init completing. This might mean no ssh keys and difficulties debugging. So I'll reiterate that this is not an acceptable solution and note that the issue is broader than even just unprivileged LXD containers. Removing all snaps from the squashfs feels like a significant lost opportunity.

Dimitri John Ledkov (xnox) wrote :

On classic, snapd.seeding must not block boot indefinitely, as it prevents normal operation of the classic system and potentially access to it.

Stéphane Graber (stgraber) wrote :

Hmm, also, so far we've not seen any reproducer of this being an issue with current LXD.
All my tests show the seeding working properly and we definitely have a bunch of users using the pre-seeded LXD in Ubuntu 20.04 images run inside containers.

Balint Reczey (rbalint) wrote :

@stgraber Are you saying that the default LXD in groovy in the autopkgtest CI does not qualify as current?

Stéphane Graber (stgraber) wrote :

Ok, I'm getting really tired of this one.

Can you please get me a reproducer for this on stock (not some weird adt environment) groovy or focal please?

As in, deploy the host using a clean image, install LXD normally from the snap store and then deploy an official Ubuntu image on top of that.

So far none of the instructions shown in this bug have reproduced this issue here, trying on 18.04, 20.04 or groovy and on both amd64 and s390x.

Balint Reczey (rbalint) wrote :

@stgraber The "weird adt environment" is the autopkgtest infra we use for gating packages to prevent regressions sneaking in to the release pocket. LXD is not a .deb package but a snap so we can't gate regressions as part of the standard Ubuntu classic process, but at the moment it is somehow part of the snaps installed on images by default.

The reason for adding the tests-in-lxd test to systemd is to ensure that new systemd versions work well both when running LXD and when installed in an LXD container. Upstream changes frequently introduce minor LXC-specific regressions, so I believe this test helps make the user experience better.

If you see something in the aforementioned new test that is not expected to work, please state that. Otherwise please triage the failure that we are observing consistently on the autopkgtest infra.

The easiest way of reproducing the problem is clicking on the "♻" links on this page:
http://autopkgtest.ubuntu.com/packages/systemd/groovy/amd64

I agree with @xnox that snapd.seeding must not block boot indefinitely, and I also believe that snap upstreams (LXD here) must fix problems affecting Ubuntu classic to have the privilege of being seeded on images.

Shipping LXD as a .deb would also resolve the problem it causes here as a snap that has to be seeded inside the LXC container, and the autopkgtest infra could gate LXD regressions again.

Stéphane Graber (stgraber) wrote :

I've managed to reproduce this issue by reproducing the network setup of autopkgtest.

All you have to do is set up your network to allow access only through an http proxy and drop all other traffic. Do that and you'll get:

stgraber@castiana:~$ autopkgtest-build-lxd ubuntu-daily:groovy
Creating autopkgtest-prepare-yzE
Starting autopkgtest-prepare-yzE
Created symlink /<email address hidden> → /dev/null.
Timed out waiting for container to boot
stgraber@castiana:~$

Note that the autopkgtest network drops traffic, it doesn't reject it, so you're not hitting normal connection failures that can be nicely handled. As a result snapd gets stuck in seeding.
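
The difference between a dropped and a rejected connection is easy to see from the client side. The sketch below is a generic Python probe, not anything snapd or autopkgtest does internally; the function name `probe` and the one second timeout are arbitrary choices for illustration. A rejected connection fails straight away, while a dropped one only ends when the caller's own timeout expires.

```python
import socket


def probe(host: str, port: int, timeout_s: float) -> str:
    """Classify a TCP connection attempt as 'open', 'refused', or 'timeout'."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return "open"
    except ConnectionRefusedError:
        # The peer, or a firewall that rejects, answered the attempt.
        return "refused"
    except socket.timeout:
        # No answer at all within timeout_s, which is what dropped
        # packets look like to the client.
        return "timeout"


# Local demonstration on the loopback interface.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))  # port 0 lets the kernel pick a free port
listener.listen(1)
port = listener.getsockname()[1]
open_result = probe("127.0.0.1", port, 1.0)
listener.close()
refused_result = probe("127.0.0.1", port, 1.0)
print(open_result, refused_result)
```

A client with no explicit timeout has nothing like `timeout_s` to fall back on when packets are dropped, so it sits waiting on the operating system's TCP timers instead.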

Manually fixing the autopkgtest-build-lxd script to configure snapd's proxy in the container gets you:

stgraber@castiana:~$ autopkgtest-build-lxd ubuntu-daily:groovy
Creating autopkgtest-prepare-ybH
Starting autopkgtest-prepare-ybH
Created symlink /<email address hidden> → /dev/null.
Container finished booting. Distribution Ubuntu, release groovy, architecture amd64
Running setup script /usr/share/autopkgtest/setup-commands/setup-testbed...
sh: Attempting to set up Debian/Ubuntu apt sources automatically
sh: Distribution appears to be Ubuntu
Get:1 http://us.archive.ubuntu.com/ubuntu groovy InRelease [267 kB]
Hit:2 http://us.archive.ubuntu.com/ubuntu groovy-updates InRelease
Hit:3 http://us.archive.ubuntu.com/ubuntu groovy-security InRelease
Get:4 http://us.archive.ubuntu.com/ubuntu groovy/main Sources [841 kB]
Get:5 http://us.archive.ubuntu.com/ubuntu groovy/multiverse Sources [177 kB]
Get:6 http://us.archive.ubuntu.com/ubuntu groovy/universe Sources [9906 kB]
Get:7 http://us.archive.ubuntu.com/ubuntu groovy/restricted Sources [6476 B]
Get:8 http://us.archive.ubuntu.com/ubuntu groovy/main amd64 Packages [975 kB]
Get:9 http://us.archive.ubuntu.com/ubuntu groovy/main amd64 c-n-f Metadata [29.5 kB]
Get:10 http://us.archive.ubuntu.com/ubuntu groovy/universe amd64 Packages [8734 kB]
Get:11 http://us.archive.ubuntu.com/ubuntu groovy/universe amd64 c-n-f Metadata [267 kB]
Get:12 http://us.archive.ubuntu.com/ubuntu groovy/multiverse amd64 Packages [154 kB]
Get:13 http://us.archive.ubuntu.com/ubuntu groovy/multiverse amd64 c-n-f Metadata [9320 B]
Get:14 http://us.archive.ubuntu.com/ubuntu groovy-updates/universe amd64 c-n-f Metadata [112 B]
Get:15 http://us.archive.ubuntu.com/ubuntu groovy-updates/multiverse amd64 c-n-f Metadata [116 B]
Get:16 http://us.archive.ubuntu.com/ubuntu groovy-security/universe amd64 c-n-f Metadata [116 B]
Get:17 http://us.archive.ubuntu.com/ubuntu groovy-security/multiverse amd64 c-n-f Metadata [116 B]
Fetched 21.4 MB in 4s (4957 kB/s)

Marking all tasks invalid and re-opening autopkgtest task as that's what's broken here.

Changed in lxd (Ubuntu):
status: Incomplete → Invalid
Changed in snapd (Ubuntu):
status: Confirmed → Invalid
Changed in snapd:
status: New → Invalid
Changed in cloud-images:
status: New → Invalid
Changed in autopkgtest (Ubuntu):
status: Invalid → Triaged
Changed in auto-package-testing:
status: Invalid → Triaged
Iain Lane (laney) wrote :

Can someone provide a merge request for lp:autopkgtest-cloud please?

Balint Reczey (rbalint) wrote :

@stgraber Thanks for triaging this

@juliank @laney @johan-ehnberg I think snapd should not block cloud-init forever, so I think neither autopkgtest nor the autopkgtest infra is at fault here and neither should need fixing.
I'm placing a workaround in systemd's autopkgtest and I hope snapd(.seeded) will be able to detect when packets are dropped and fail, unblocking cloud-init.

Changed in snapd:
status: Invalid → New
Stéphane Graber (stgraber) wrote :

Go's http stack will eventually notice, but a TCP timeout can take up to 5 minutes, so the timeout in autopkgtest is nowhere near long enough to detect that.

It's also by far not a snapd-specific thing.

apt will experience the exact same issue and would also hold up boot if used through cloud-init.

The only reason why apt doesn't hold up boot here is because autopkgtest has specific logic to configure its proxy. That same logic is missing for snapd causing this issue.

So I don't think that messing with http/tcp timeouts in snapd is the right solution here.
Instead the main actionable items would be:
 - Have the autopkgtest network reject packets rather than drop them (that would have avoided this issue)
 - Have autopkgtest's proxy config logic also configure the proxy for snapd (either through unit override or through /etc/environment)

Either of those is sufficient to avoid this problem.
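
For the second item, a unit override could look roughly like the drop-in generated below. This is a sketch under assumptions: the file name `proxy.conf`, the example proxy URL, and both helper names are made up for illustration, and it relies on snapd picking up `http_proxy`/`https_proxy` from its service environment as suggested above.

```python
import tempfile
from pathlib import Path


def render_proxy_dropin(proxy_url: str) -> str:
    """Return the text of a systemd drop-in exporting proxy variables."""
    return (
        "[Service]\n"
        f'Environment="http_proxy={proxy_url}"\n'
        f'Environment="https_proxy={proxy_url}"\n'
    )


def write_proxy_dropin(root: Path, proxy_url: str) -> Path:
    """Write the drop-in under <root>/etc/systemd/system/snapd.service.d/."""
    dropin_dir = root / "etc/systemd/system/snapd.service.d"
    dropin_dir.mkdir(parents=True, exist_ok=True)
    path = dropin_dir / "proxy.conf"  # hypothetical file name
    path.write_text(render_proxy_dropin(proxy_url))
    return path


# Demonstration against a throwaway directory standing in for the
# container's root filesystem; the proxy URL is a placeholder.
with tempfile.TemporaryDirectory() as tmp:
    written = write_proxy_dropin(Path(tmp), "http://proxy.example:3128")
    dropin_text = written.read_text()
print(dropin_text)
```

Writing the file into the container's root filesystem before first boot would mean snapd starts with the proxy already set; on a running system a `systemctl daemon-reload` and a snapd restart would be needed for it to take effect.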

Balint Reczey (rbalint) wrote :

@stgraber I'd fully agree if snapd.seeded timed out after a few minutes, but I have a container where snapd.seeded.service has been running for 49 days, keeping the system in a starting state.

While this problem may not be specific to snapd, the snapd package plays an important enough role in the software stack around Ubuntu that it should be fixed there.

I agree that autopkgtest network should reject rather than drop packets because the assumption in autopkgtests is having Internet access. From https://people.debian.org/~mpitt/autopkgtest/README.package-tests.html:
...
Network access

autopkgtest needs access to the network at least for downloading test dependencies and possibly dist-upgrading testbeds. In environments with restricted internet access you need to set up an apt proxy and configure the testbed to use it. (Note that the standard tools like autopkgtest-build-lxc or mk-sbuild automatically use the apt proxy from the host system.)

In general, tests are also allowed to access the internet. As this usually makes tests less reliable, this should be kept to a minimum; but for many packages their main purpose is to interact with remote web services and thus their testing should actually cover those too, to ensure that the distribution package keeps working with their corresponding web service.

Debian's production CI infrastructure allows unrestricted network access, in Ubuntu's infrastructure access to sites other than *.ubuntu.com and *.launchpad.net happens via a proxy (limited to DNS and http/https).
....

By having a stricter network setup in Ubuntu we are set up to face failures from time to time, and rejecting packets would at least speed up those failures.

Autopkgtest can also be fixed to set up proxy for snaps.

tags: removed: rls-gg-incoming
Changed in snapd:
assignee: nobody → Samuele Pedroni (pedronis)
Zygmunt Krynicki (zyga) wrote :

I'm marking this as triaged. We should look at the seeding code to ensure it can eventually actually fail and allow systems to continue booting.

Changed in snapd:
status: New → Triaged
importance: Undecided → Medium
Changed in snapd:
assignee: Samuele Pedroni (pedronis) → nobody
Sebastian (slovdahl) wrote :

Looks like we are hit by a very similar issue (on amd64). Me and a colleague are having problems launching both ubuntu:22.04 and ubuntu:20.04 LXC containers since last week. Seems like someone else had asked about this on askubuntu.com too recently: https://askubuntu.com/questions/1484049/lxd-ubuntu-image-startup-issue-snap-lxd-activate-service-not-found. This response is by my colleague: https://askubuntu.com/a/1489496/148223. Also asked about on https://discourse.ubuntu.com/t/ubuntu-22-04-and-20-04-containers-stuck-in-system-is-booting-up-state/39470/4.

We are yet to find a way of reproducing it outside our own machines though. I tried setting up LXD on an internal server and there it works perfectly fine, so it's definitely not broken everywhere.

I'm on Ubuntu 22.04 with LXD 5.18-da72b8b.

Sebastian (slovdahl) wrote :

Attaching the journalctl output from the host machine:

slovdahl@desk:~$ lxc launch ubuntu:22.04
Creating the instance
Instance name is: promoted-oyster
Starting promoted-oyster

Sebastian (slovdahl) wrote :

And inside the 'promoted-oyster' container:

Per Lundberg (perlun) wrote :

For reference, I am the colleague being referred to above. Seeing the exact same problem.

I am on Debian 12 (Bookworm) using the lxd 5.0.2-5 package provided with Debian (i.e. _not_ using it as a snap but as a normal .deb package): https://packages.debian.org/bookworm/lxd.

I even went as far as purging and reinstalling lxd, re-running `lxd init` with the default settings. This unfortunately didn't fix the problem.

This doesn't seem to have been mentioned here previously, so posting it for completeness. It seems to be specifically the snap seeding of the lxd snap which seems to break things for me/us. Here are some logs:

root@emerging-caiman:~# journalctl -u snapd.seeded.service
Oct 18 11:02:19 emerging-caiman systemd[1]: Starting Wait until snapd is fully seeded...

snap debug gives me this:

seeded: false
seed-error: |
  cannot perform the following tasks:
  - Start snap "lxd" (24322) services (systemctl command [start snap.lxd.activate.service] failed
  with exit status 1: Job for snap.lxd.activate.service failed because the control process exited
  with error code.
  See "systemctl status snap.lxd.activate.service" and "journalctl -xeu snap.lxd.activate.service"
  for details.
  )
preseeded: true
image-preseeding: 4.005s
seed-completion: –
preseed-system-key: {
  "apparmor-features": [
    "caps",
    "dbus",
    "domain",
    "file",
    "mount",
    "namespaces",
    "network",
    "network_v8",
    "policy",
    "ptrace",
    "query",
    "rlimit",
    "signal"
  ],
  "apparmor-parser-features": [
    "cap-audit-read",
    "cap-bpf",
    "include-if-exists",
    "mqueue",
    "qipcrtr-socket",
    "snapd-internal",
    "unsafe",
    "userns",
    "xdp"
  ],
  "apparmor-parser-mtime": 1692983915,
  "build-id": "55447a37514c4a317439786251326b5f762d31392f6f6b7835704d635279724779346e4d6e78414a6a2f6d647a5247354a536e6e38616e6c31636c5954612f38496435624e72466c744770475332794967704c",
  "cgroup-version": "2",
  "nfs-home": false,
  "overlay-root": "",
  "seccomp-compiler-version": "0a51bc642597bb018aeaaeea931b5cf033bb47d9 2.5.4 c3c9b282ef3c8dfcc3124b2aeaef62f56b813bfd21f8806b30a6c9dbc2e6e58d bpf-actlog",
  "seccomp-features": [
    "allow",
    "errno",
    "kill_process",
    "kill_thread",
    "log",
    "trace",
    "trap",
    "user_notif"
  ],
  "version": 10
}
seed-restart-system-key: {
  "apparmor-features": [
    "caps",
    "domain",
    "file",
    "mount",
    "namespaces",
    "network_v8",
    "policy",
    "ptrace",
    "query",
    "rlimit",
    "signal"
  ],
  "apparmor-parser-features": [
    "cap-audit-read",
    "cap-bpf",
    "include-if-exists",
    "mqueue",
    "qipcrtr-socket",
    "snapd-internal",
    "unsafe",
    "userns",
    "xdp"
  ],
  "apparmor-parser-mtime": 1692983915,
  "build-id": "55447a37514c4a317439786251326b5f762d31392f6f6b7835704d635279724779346e4d6e78414a6a2f6d647a5247354a536e6e38616e6c31636c5954612f38496435624e72466c744770475332794967704c",
  "cgroup-version": "2",
  "nfs-home": false,
  "overlay-root": "",
  "seccomp-compiler-version": "0a51bc642597bb018aeaaeea931b5cf033bb47d9 2.5.4 c3c9b282ef3c8dfcc3124b2aeaef62f56b813bfd21f8806b30a6c9dbc2e6e58d bpf-actlog",
  "seccomp-features": [
  ...


Sebastian (slovdahl) wrote :

Might have found the cause in our case: the kernel in use. I'm normally using Xanmod.

$ uname -a
Linux desk 6.5.7-x64v3-xanmod1 #0~20231010.gfdab4ec SMP PREEMPT_DYNAMIC Tue Oct 10 21:39:15 UTC x86_64 x86_64 x86_64 GNU/Linux

When I rebooted with the Ubuntu stock kernel everything just works.

$ uname -a
Linux desk 6.2.0-34-generic #34~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Sep 7 13:12:03 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

So it seems like non-Ubuntu kernels don't work at all with the Ubuntu upstream LXC containers that start the LXD snap by default.

John Chittum (jchittum) wrote :

if the core issue comes down to snap pre-seeding, then, yes, there is a reliance on a matching kernel between the build time of the squashfs and the system that runs it.

this is because we build the images in a chroot, and have to mount the apparmor features into the chroot. when launching a container on a "mismatched" kernel, with snapd running, it may not have the matching set of features. You can see this in the `apparmor-features` and `apparmor-parser-features` lists under the `preseed-system-key` and `seed-restart-system-key` keys.

unfortunately, this is a limitation in snap preseeding right now, where snapd has a hard requirement on knowing the kernel apparmor sets. here are some links:

1. the function that must be called during build to validate the seed

https://git.launchpad.net/livecd-rootfs/tree/live-build/functions?h=ubuntu/jammy#n759

2. the different kernel directories

https://git.launchpad.net/livecd-rootfs/tree/live-build/apparmor?h=ubuntu/jammy
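
To make the mismatch concrete, the Python sketch below compares the two `apparmor-features` lists copied from Per Lundberg's seeding output earlier in this thread. Treating `preseed-system-key` as the build-time view and `seed-restart-system-key` as the view on the running kernel is my reading of the explanation above; the helper name `feature_diff` is made up for illustration.

```python
# "apparmor-features" under preseed-system-key (recorded at image build).
preseed_features = [
    "caps", "dbus", "domain", "file", "mount", "namespaces", "network",
    "network_v8", "policy", "ptrace", "query", "rlimit", "signal",
]

# "apparmor-features" under seed-restart-system-key (seen at first boot).
seed_restart_features = [
    "caps", "domain", "file", "mount", "namespaces", "network_v8",
    "policy", "ptrace", "query", "rlimit", "signal",
]


def feature_diff(build, runtime):
    """Return (missing_at_runtime, extra_at_runtime) as sorted lists."""
    build_set, runtime_set = set(build), set(runtime)
    return sorted(build_set - runtime_set), sorted(runtime_set - build_set)


missing_at_runtime, extra_at_runtime = feature_diff(
    preseed_features, seed_restart_features
)
print("missing on the running kernel:", missing_at_runtime)
print("only on the running kernel:", extra_at_runtime)
```

For the output quoted in this thread, the running kernel lacks `dbus` and `network` compared with what the image was preseeded against, and has nothing extra.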
