Snap builds mysteriously killed on arm64 only

Bug #1991162 reported by Frode Nordahl
This bug affects 1 person
Affects: Launchpad itself
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Hello,

We have a snap named `charm` that builds fine on all supported architectures. However, when we build it using a snap recipe on the Launchpad build farm, the arm64 build is always stopped with the terse message: Killed.

What could be causing this, and how could we debug it?

Excerpt:
...
[29/Sep/2022:00:56:29 +0000] "GET http://ftpmaster.internal/ubuntu/pool/main/libe/liberror-perl/liberror-perl_0.17029-1_all.deb HTTP/1.1" 200 26460 "-" "Debian APT-HTTP/1.3 (2.0.9) non-interactive"
Starting Snapcraft 7.1.3
Logging execution to '/root/.cache/snapcraft/log/snapcraft-20220929-005530.196177.log'
Running on arm64 for arm64
Initializing parts lifecycle
Executing parts lifecycle...
Executing parts lifecycle: pull charm-tools
Executing action
Executed: pull charm-tools
Executing parts lifecycle: pull patchelf
Executing action
Executed: pull patchelf
Executed parts lifecycle
Running build phase...
Killed
Build failed
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/lpbuildd/target/build_snap.py", line 206, in run
    self.build()
  File "/usr/lib/python3/dist-packages/lpbuildd/target/build_snap.py", line 190, in build
    self.run_build_command(["snapcraft"], cwd=output_path, env=env)
  File "/usr/lib/python3/dist-packages/lpbuildd/target/operation.py", line 46, in run_build_command
    return self.backend.run(args, cwd=cwd, env=full_env, **kwargs)
  File "/usr/lib/python3/dist-packages/lpbuildd/target/lxd.py", line 538, in run
    subprocess.check_call(cmd, **kwargs)
  File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['lxc', 'exec', 'lp-jammy-arm64', '--env', 'LANG=C.UTF-8', '--env', 'SHELL=/bin/sh', '--env', 'http_proxy=http://10.10.10.1:8222/', '--env', 'https_proxy=http://10.10.10.1:8222/', '--env', 'GIT_PROXY_COMMAND=/usr/local/bin/lpbuildd-git-proxy', '--env', 'SNAPPY_STORE_NO_CDN=1', '--env', 'SNAPCRAFT_BUILD_INFO=1', '--env', 'SNAPCRAFT_IMAGE_INFO={"build-request-id": "lp-74362109", "build-request-timestamp": "2022-09-29T00:52:40Z", "build_url": "https://launchpad.net/~charm-toolers/charm-tools/+snap/charm-tools.master/+build/1895303"}', '--env', 'SNAPCRAFT_BUILD_ENVIRONMENT=host', '--env', 'SNAPCRAFT_BUILD_FOR=arm64', '--', '/bin/sh', '-c', 'cd /build/charm && linux64 snapcraft']' returned non-zero exit status 137.
Revoking proxy token...
RUN: /usr/share/launchpad-buildd/bin/in-target scan-for-processes --backend=lxd --series=jammy --arch=arm64 SNAPBUILD-1895303
Scanning for processes to kill in build SNAPBUILD-1895303

Full log: https://launchpadlibrarian.net/626170410/buildlog_snap_ubuntu_jammy_arm64_charm-tools.master_BUILDING.txt.gz
Recipe: https://launchpad.net/~charm-toolers/charm-tools/+snap/charm-tools.master

Frode Nordahl (fnordahl) wrote:

Interestingly, just as I filed this bug, the s390x build failed in the same way for the first time. At this point in the release schedule, I gather the build farms are quite busy?

The build is quiet, so I wonder whether some watchdog is in effect here that kills a job if it has not produced any output for a short amount of time (minutes)?
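One cheap way to test the silence hypothesis is to make the build chatty and see whether the outcome changes. A minimal sketch, assuming you control the command being run (for example from a part's `override-build` script); `run_with_heartbeat` is a hypothetical helper, not part of snapcraft or launchpad-buildd, and the interval is arbitrary:

```python
import subprocess
import sys
import time


def run_with_heartbeat(cmd, interval=60.0):
    """Run cmd to completion, printing a heartbeat to stderr every `interval` seconds.

    Returns the child's exit status as reported by subprocess: a negative
    value -N means the child was killed by signal N.
    """
    start = time.monotonic()
    proc = subprocess.Popen(cmd)
    while True:
        try:
            # Wait at most `interval` seconds for the child to exit.
            return proc.wait(timeout=interval)
        except subprocess.TimeoutExpired:
            elapsed = time.monotonic() - start
            print(f"[heartbeat] still running after {elapsed:.0f}s",
                  file=sys.stderr, flush=True)


# Example (hypothetical): run_with_heartbeat(["snapcraft"], interval=60)
```

If the wrapped command is still killed (a return value of -9 here) while heartbeats are being printed, an inactivity watchdog becomes an unlikely explanation.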

Jürgen Gmach (jugmac00) wrote:

Searching for "exit code 137" suggests that you ran into an OOM (out-of-memory) kill.
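The reasoning behind that search result: shells report a child that was terminated by signal N with exit status 128 + N, so 137 corresponds to signal 9, SIGKILL, which is what the kernel OOM killer sends. The bare "Killed" line in the log is consistent with this, since it is the shell's message for a child killed by SIGKILL. SIGKILL from any other source would look the same, so the status alone does not prove an OOM. A small sketch of the decoding (`describe_exit_status` is a hypothetical helper):

```python
import signal


def describe_exit_status(status):
    """Interpret a shell-style exit status such as the 137 in the traceback.

    Shells report a child terminated by signal N as 128 + N.
    """
    if status > 128:
        try:
            sig = signal.Signals(status - 128)
        except ValueError:
            # Not a valid signal number, so treat it as an ordinary exit code.
            return f"exited with status {status}"
        return f"killed by signal {sig.value} ({sig.name})"
    return f"exited with status {status}"


print(describe_exit_status(137))  # killed by signal 9 (SIGKILL)
```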

We talked about this topic in general (not your bug report specifically) at the infrastructure meeting today, and it looks like there won't be any change soon; see also https://portal.admin.canonical.com/C130104

Frode Nordahl (fnordahl) wrote:

Ah, thank you for pointing out the exit code; I had not noticed it.

Is there a difference in the flavor types used for building charms, snaps and debs, or are they all the same?

Jürgen Gmach (jugmac00) wrote:

I will probably have to ask a colleague anyway, but what do you mean by flavor type?

Frode Nordahl (fnordahl) wrote:

OK, how much memory is provided to the Launchpad instances used for snap builds on arm64?

It is very surprising that this build gets OOM-killed, given how simple it is compared to more complex builds such as Ceph or the Linux kernel.

So the fact that it consistently runs into this situation on arm64 only, and once on s390x, hints at a bug somewhere. To find it, I would like to know as much as possible about the Launchpad build environment so that I can set up a reproducer.
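Until the builder's specifications are known, the build itself can report what memory it sees. A minimal sketch, assuming a Linux build environment that lets you run a script before or during the build; the cgroup paths are the standard v1/v2 locations, but whether either is visible inside the Launchpad container is an assumption. Checking both sources is useful because `free` reads /proc/meminfo, which inside a container may reflect the host or VM rather than a cgroup limit placed on the container:

```python
from pathlib import Path

# Standard locations of a cgroup memory limit. Which one (if any) is visible
# inside the build container depends on the host's cgroup version (assumption).
CGROUP_LIMIT_FILES = [
    Path("/sys/fs/cgroup/memory.max"),                    # cgroup v2
    Path("/sys/fs/cgroup/memory/memory.limit_in_bytes"),  # cgroup v1
]


def parse_meminfo(text):
    """Parse /proc/meminfo content into {field: value}; sizes are in KiB."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def cgroup_memory_limit():
    """Return (path, raw value) of the first cgroup limit file found, or None."""
    for path in CGROUP_LIMIT_FILES:
        if path.exists():
            return str(path), path.read_text().strip()
    return None


meminfo_path = Path("/proc/meminfo")
if meminfo_path.exists():
    info = parse_meminfo(meminfo_path.read_text())
    kib_per_gib = 1024 * 1024
    print(f"MemTotal:  {info.get('MemTotal', 0) / kib_per_gib:.2f} GiB")
    print(f"SwapTotal: {info.get('SwapTotal', 0) / kib_per_gib:.2f} GiB")
    print(f"cgroup memory limit: {cgroup_memory_limit() or 'not found'}")
```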

Frode Nordahl (fnordahl) wrote:

PING?

Another build failure:
https://launchpad.net/~charm-toolers/charm-tools/+snap/charm-tools.master/+build/1904681/+files/buildlog_snap_ubuntu_jammy_arm64_charm-tools.master_BUILDING.txt.gz

The side effect of this is that we have to do the arm64 builds manually using instances on bos01, which is not a very effective use of time.

We would very much appreciate any information you can provide to help figure out the root of the issue.

Frode Nordahl (fnordahl) wrote:

Instrumenting the build and performing a `snapcraft remote-build` reveals this:
:: + cat /proc/cpuinfo
:: processor : 0
:: BogoMIPS : 80.00
:: Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
:: CPU implementer : 0x50
:: CPU architecture: 8
:: CPU variant : 0x3
:: CPU part : 0x000
:: CPU revision : 2
::
:: processor : 1
:: BogoMIPS : 80.00
:: Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
:: CPU implementer : 0x50
:: CPU architecture: 8
:: CPU variant : 0x3
:: CPU part : 0x000
:: CPU revision : 2
::
:: processor : 2
:: BogoMIPS : 80.00
:: Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
:: CPU implementer : 0x50
:: CPU architecture: 8
:: CPU variant : 0x3
:: CPU part : 0x000
:: CPU revision : 2
::
:: processor : 3
:: BogoMIPS : 80.00
:: Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
:: CPU implementer : 0x50
:: CPU architecture: 8
:: CPU variant : 0x3
:: CPU part : 0x000
:: CPU revision : 2
::
:: + free
::               total        used        free      shared  buff/cache   available
:: Mem:        8127308      246780     5841512         724     2039016     7880528
:: Swap:             0           0           0
:: + exit 1

However, building manually on a similar instance, or with the `snapcraft remote-build` command, consistently succeeds, while the Launchpad-initiated build still consistently fails.
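Since the manual build succeeds, it may help to measure how much memory it actually peaks at and compare that with the roughly 7.75 GiB (8127308 KiB, no swap) that `free` reported in the instrumented run. A minimal sketch, assuming Linux; `run_and_measure_peak_rss` is a hypothetical helper. Because of the RUSAGE_CHILDREN semantics described in the docstring, the figure is only a lower bound for a build that runs several memory-hungry processes in parallel:

```python
import resource
import subprocess


def run_and_measure_peak_rss(cmd):
    """Run cmd to completion; return (exit status, peak RSS in MiB).

    On Linux, ru_maxrss is in KiB. For RUSAGE_CHILDREN it is the RSS of the
    largest single descendant that has terminated and been waited for, across
    all children this process has waited for so far, not the sum over
    processes that ran concurrently.
    """
    status = subprocess.run(cmd).returncode
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return status, usage.ru_maxrss / 1024


# Example (hypothetical): status, peak_mib = run_and_measure_peak_rss(["snapcraft"])
```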
