The dump file parsing issue arises from structural changes in Linux kernel 6.2

Bug #2038249 reported by Chengen Du
This bug affects 2 people
Affects          Status       Importance  Assigned to  Milestone
crash (Ubuntu)   Incomplete   Medium      Chengen Du
  Jammy          In Progress  Medium      Chengen Du
  Lunar          In Progress  Medium      Chengen Du
  Mantic         In Progress  Medium      Chengen Du

Bug Description

[Impact]
Linux kernel 6.2 and later include patches with structural changes that can leave the crash utility unable to parse the dump file:
==========
d122019bf061 mm: Split slab into its own type
401fb12c68c2 mm: Differentiate struct slab fields by sl*b implementations
07f910f9b729 mm: Remove slab from struct page
0d9b1ffefabe arm64: mm: make vabits_actual a build time constant if possible
e36ce448a08d mm/slab: use kmalloc_node() for off slab freelist_idx_t array allocation
130d4df57390 mm/sl[au]b: rearrange struct slab fields to allow larger rcu_head
ac3b43283923 module: replace module_layout with module_memory
b69f0aeb0689 pid: Replace struct pid 1-element array with flex-array
==========
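As an illustration of why a change like b69f0aeb0689 matters to a dump analyzer: that commit replaced the one-element array `struct upid numbers[1]` in `struct pid` with a flexible array member. The minimal C sketch below is not crash's code; it uses simplified, hypothetical stand-ins (`upid_like`, `pid_old_layout`, `pid_new_layout`) to show that `sizeof` the containing struct shrinks by one element, so any size or element count a tool derives from the old declaration is off by one entry on newer kernels, even though the bytes needed for N entries are unchanged.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct upid. */
struct upid_like {
    int nr;
    void *ns;
};

/* Old style: one-element array, as before b69f0aeb0689. */
struct pid_old_layout {
    int level;
    struct upid_like numbers[1];
};

/* New style: flexible array member. */
struct pid_new_layout {
    int level;
    struct upid_like numbers[];
};

/* Bytes needed for `count` entries (count >= 1) with the old layout:
 * sizeof() already includes the first entry. */
size_t pid_old_bytes(size_t count)
{
    return sizeof(struct pid_old_layout) + (count - 1) * sizeof(struct upid_like);
}

/* Bytes needed for `count` entries with the new layout:
 * sizeof() includes none of the entries. */
size_t pid_new_bytes(size_t count)
{
    return sizeof(struct pid_new_layout) + count * sizeof(struct upid_like);
}

/* Entries covered by the declaration itself (trailing padding is always
 * smaller than one element, so integer division discards it). */
size_t pid_declared_entries_old(void)
{
    return (sizeof(struct pid_old_layout) - offsetof(struct pid_old_layout, numbers))
           / sizeof(struct upid_like);
}

size_t pid_declared_entries_new(void)
{
    return (sizeof(struct pid_new_layout) - offsetof(struct pid_new_layout, numbers))
           / sizeof(struct upid_like);
}
```

On LP64 targets such as x86_64 and arm64 (GCC or Clang), `sizeof(struct pid_old_layout)` is 24 bytes and `sizeof(struct pid_new_layout)` is 8, while `pid_old_bytes(n)` and `pid_new_bytes(n)` agree for every n.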

[Fix]
Backport the upstream crash commits that address these structural changes, grouped below by the crash release that contains them:
==========

In 8.0.1:
- 14f8c460473c memory: Handle struct slab changes on Linux 5.17-rc1 and later
- 5f390ed811b0 Fix for "kmem -s|-S" and "bt -F[F]" on Linux 5.17-rc1
- b89f9ccf511a Fix for "kmem -s|-S" on Linux 5.17+ with CONFIG_SLAB

In 8.0.2:
- f02c8e87fccb arm64: use TCR_EL1_T1SZ to get the correct info if vabits_actual is missing

In 8.0.3:
- d83df2fb66cd SLUB: Fix for offset change of struct slab members on Linux 6.2-rc1
- df1f0cba729f x86_64: Fix for move of per-cpu variables into struct pcpu_hot
- 120d6e89fc14 SLAB: Fix for "kmem -s|-S" options on Linux 6.1 and later
- ac96e17d1de5 SLAB: Fix for "kmem -s|-S" options on Linux 6.2-rc1 and later

After 8.0.3 (8.0.4 development):
- 7750e61fdb2a Support module memory layout change on Linux 6.4
- 88580068b7dd Fix failure of gathering task table on Linux 6.5-rc1 and later
- 4ee56105881d Fix compilation error due to new strlcpy function that glibc added

==========

[Test Plan]
1. Install the required packages, then reboot the machine.
# sudo apt install crash linux-crashdump -y
# reboot
2. Check the kdump status with `kdump-config show`:
# kdump-config show
DUMP_MODE: kdump
USE_KDUMP: 1
KDUMP_COREDIR: /var/crash
crashkernel addr: 0x64000000
   /var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinuz-6.2.0-33-generic
kdump initrd:
   /var/lib/kdump/initrd.img: symbolic link to /var/lib/kdump/initrd.img-6.2.0-33-generic
current state: ready to kdump

kexec command:
  /sbin/kexec -p --command-line="BOOT_IMAGE=/boot/vmlinuz-6.2.0-33-generic root=UUID=3e72f5d5-870b-4b8e-9a0d-8ba920391379 ro console=tty1 console=ttyS0 reset_devices systemd.unit=kdump-tools-dump.service nr_cpus=1 irqpoll usbcore.nousb" --initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz
3. Force a crash dump by running `echo c | sudo tee /proc/sysrq-trigger`.
4. Download and extract the kernel debug symbols (.ddeb), which are needed to analyze the dump file:
# sudo -i
# cd /var/crash
# pull-lp-ddebs linux-image-unsigned-$(uname -r)
# dpkg-deb -x linux-image-unsigned-$(uname -r)-*.ddeb dbgsym-$(uname -r)
5. Run the "crash" utility to parse and analyze the dump file. With crash 8.0.0, it cannot read the dump and exits with a mismatch error:
crash 8.0.0
Copyright (C) 2002-2021 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011, 2020-2021 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
Copyright (C) 2015, 2021 VMware, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

WARNING: VA_BITS: calculated: 46 vmcoreinfo: 48
GNU gdb (GDB) 10.2
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "aarch64-unknown-linux-gnu".
Type "show configuration" for configuration details.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...

crash: seek error: kernel virtual address: ffffd59a92d48ae8 type: "possible"
WARNING: cannot read cpu_possible_map
crash: seek error: kernel virtual address: ffffd59a92d48b68 type: "present"
WARNING: cannot read cpu_present_map
crash: seek error: kernel virtual address: ffffd59a92d48aa8 type: "online"
WARNING: cannot read cpu_online_map
crash: seek error: kernel virtual address: ffffd59a92d48bb0 type: "active"
WARNING: cannot read cpu_active_map
crash: seek error: kernel virtual address: ffffd59a93288928 type: "shadow_timekeeper xtime_sec"
crash: seek error: kernel virtual address: ffffd59a9317b8f0 type: "init_uts_ns"
crash: dbgsym-6 and 202309251539/dump.202309251539 do not match!

Usage:

  crash [OPTION]... NAMELIST MEMORY-IMAGE[@ADDRESS] (dumpfile form)
  crash [OPTION]... [NAMELIST] (live system form)

Enter "crash -h" for details.
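A plausible reading of the `WARNING: VA_BITS: calculated: 46 vmcoreinfo: 48` line above: on arm64, 0d9b1ffefabe can turn `vabits_actual` into a build-time constant, so crash 8.0.0 can no longer read it as a symbol, and f02c8e87fccb falls back to `TCR_EL1_T1SZ`. The arm64 TCR_EL1.T1SZ field sets the kernel (TTBR1) address range to 2^(64 - T1SZ) bytes, so the virtual address width is 64 - T1SZ bits (T1SZ = 16 gives 48). The C sketch below shows only that arithmetic, not crash's implementation; it assumes the vmcoreinfo note carries a decimal line of the form `NUMBER(TCR_EL1_T1SZ)=<value>`, and the helper names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed prefix of the vmcoreinfo entry carrying TCR_EL1.T1SZ. */
static const char T1SZ_PREFIX[] = "NUMBER(TCR_EL1_T1SZ)=";

/* The TTBR1 range spans 2^(64 - T1SZ) bytes, so the kernel VA width
 * is 64 - T1SZ bits. Returns -1 for out-of-range values. */
int va_bits_from_t1sz(unsigned long t1sz)
{
    if (t1sz == 0 || t1sz >= 64)
        return -1;
    return (int)(64 - t1sz);
}

/* Parses one vmcoreinfo line such as "NUMBER(TCR_EL1_T1SZ)=16".
 * Returns the VA width in bits, or -1 if the line is not a valid T1SZ entry. */
int va_bits_from_vmcoreinfo_line(const char *line)
{
    size_t plen = sizeof(T1SZ_PREFIX) - 1;
    char *end;
    unsigned long t1sz;

    if (strncmp(line, T1SZ_PREFIX, plen) != 0)
        return -1;
    t1sz = strtoul(line + plen, &end, 10);
    /* Require at least one digit and nothing but an optional newline after it. */
    if (end == line + plen || (*end != '\0' && *end != '\n'))
        return -1;
    return va_bits_from_t1sz(t1sz);
}
```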

[Where problems could occur]
Kernel data structures changed significantly between Linux 5.15 and 6.2.
We only pick the patches needed for the "crash" utility to work.
However, these patches change crash's parsing logic; in the worst case, a regression could leave the "crash" utility unable to parse a dump file.

Tags: patch
Chengen Du (chengendu)
Changed in crash (Ubuntu Lunar):
assignee: nobody → Chengen Du (chengendu)
Changed in crash (Ubuntu Mantic):
assignee: nobody → Chengen Du (chengendu)
Changed in crash (Ubuntu Lunar):
status: New → In Progress
Changed in crash (Ubuntu Mantic):
status: New → In Progress
Chengen Du (chengendu) wrote :

debdiff for Lunar

Chengen Du (chengendu) wrote :

debdiff for Mantic

Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "lp2038249-crash-lunar.debdiff" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-sponsors, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]

tags: added: patch
Mauricio Faria de Oliveira (mfo) wrote :

Marking Jammy as affected due to the 6.2+ HWE kernel from Lunar (and later).

Changed in crash (Ubuntu Jammy):
status: New → In Progress
assignee: nobody → Chengen Du (chengendu)
importance: Undecided → Medium
Changed in crash (Ubuntu Lunar):
importance: Undecided → Medium
Changed in crash (Ubuntu Mantic):
importance: Undecided → Medium
Heitor Alves de Siqueira (halves) wrote :

As this will need fixing in the development release, I'll re-subscribe the ubuntu-sponsors team. After this has been fixed there, we can take a look at the SRU for the other stable releases.

Thanks, Chengen!

Paride Legovini (paride) wrote :

Hi, so AIUI the plan here is the following:

- Wait for Noble to be open for development;
- You'll prepare a Noble debdiff to be sponsored;
- Once fixed in Noble, we'll move to review/sponsor the SRUs.

Given that Noble is still frozen, and no Noble debdiff is present, I take it there is nothing to sponsor here for now.

Mauricio Faria de Oliveira (mfo) wrote :

Updating the bug description with the crash versions/tags which _contain_ the commit IDs.

Verified with the provided commit list:

$ cat <<EOF | while read id subject; do echo; echo $id $subject; git describe --contains $id 2>/dev/null || { echo -n '(*) '; git describe --tags $id; }; done
14f8c460473c memory: Handle struct slab changes on Linux 5.17-rc1 and later
5f390ed811b0 Fix for "kmem -s|-S" and "bt -F[F]" on Linux 5.17-rc1
b89f9ccf511a Fix for "kmem -s|-S" on Linux 5.17+ with CONFIG_SLAB
f02c8e87fccb arm64: use TCR_EL1_T1SZ to get the correct info if vabits_actual is missing
d83df2fb66cd SLUB: Fix for offset change of struct slab members on Linux 6.2-rc1
df1f0cba729f x86_64: Fix for move of per-cpu variables into struct pcpu_hot
120d6e89fc14 SLAB: Fix for "kmem -s|-S" options on Linux 6.1 and later
ac96e17d1de5 SLAB: Fix for "kmem -s|-S" options on Linux 6.2-rc1 and later
7750e61fdb2a Support module memory layout change on Linux 6.4
88580068b7dd Fix failure of gathering task table on Linux 6.5-rc1 and later
4ee56105881d Fix compilation error due to new strlcpy function that glibc added
EOF

14f8c460473c memory: Handle struct slab changes on Linux 5.17-rc1 and later
8.0.1~34

5f390ed811b0 Fix for "kmem -s|-S" and "bt -F[F]" on Linux 5.17-rc1
8.0.1~28

b89f9ccf511a Fix for "kmem -s|-S" on Linux 5.17+ with CONFIG_SLAB
8.0.1~4

f02c8e87fccb arm64: use TCR_EL1_T1SZ to get the correct info if vabits_actual is missing
8.0.2~14

d83df2fb66cd SLUB: Fix for offset change of struct slab members on Linux 6.2-rc1
8.0.3~32

df1f0cba729f x86_64: Fix for move of per-cpu variables into struct pcpu_hot
8.0.3~44

120d6e89fc14 SLAB: Fix for "kmem -s|-S" options on Linux 6.1 and later
8.0.3~28

ac96e17d1de5 SLAB: Fix for "kmem -s|-S" options on Linux 6.2-rc1 and later
8.0.3~27

7750e61fdb2a Support module memory layout change on Linux 6.4
(*) 8.0.3-14-g7750e61fdb2a

88580068b7dd Fix failure of gathering task table on Linux 6.5-rc1 and later
(*) 8.0.3-15-g88580068b7dd

4ee56105881d Fix compilation error due to new strlcpy function that glibc added
(*) 8.0.3-16-g4ee56105881d
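The two output shapes above follow `git describe` conventions: `--contains` prints `<tag>~<n>`, naming the commit as the n-th first-parent ancestor of that tag, so the commit is part of that release; the fallback marked `(*)` comes from `--tags` and prints `<tag>-<n>-g<abbrev>`, meaning the commit sits n commits after the tag and is not in any release yet. The C sketch below classifies lines of exactly these two shapes; `describe_result` and `parse_describe_line` are hypothetical names, and more complex `--contains` results (for example with `^2` for a merge parent) are deliberately rejected.

```c
#include <stdlib.h>
#include <string.h>

/* Classification of one line of the verification loop's output. */
struct describe_result {
    char tag[32];           /* release tag, e.g. "8.0.1" */
    unsigned long distance; /* commit count between the tag and the commit */
    int in_release;         /* 1: "<tag>~<n>"; 0: "(*) <tag>-<n>-g<abbrev>" */
};

/* Parses "8.0.1~34" or "(*) 8.0.3-14-g7750e61fdb2a".
 * Returns 0 on success, -1 if the line has neither shape. */
int parse_describe_line(const char *line, struct describe_result *out)
{
    const char *start = line, *tag_end, *num, *stop;
    char *end;
    size_t len;

    if (strncmp(line, "(*) ", 4) == 0) {
        /* "--tags" form: the LAST "-g" precedes the abbreviated hash. */
        const char *p = line + 4, *g = NULL;
        start = line + 4;
        while ((p = strstr(p, "-g")) != NULL) {
            g = p;
            p += 2;
        }
        if (g == NULL)
            return -1;
        /* Walk back over the digits of <n> to the '-' that ends the tag. */
        num = g;
        while (num > start && num[-1] >= '0' && num[-1] <= '9')
            num--;
        if (num == g || num == start || num[-1] != '-')
            return -1;
        tag_end = num - 1;
        stop = g;
        out->in_release = 0;
    } else {
        /* "--contains" form: <tag>~<n>, optionally followed by a newline. */
        tag_end = strchr(line, '~');
        if (tag_end == NULL)
            return -1;
        num = tag_end + 1;
        stop = num + strcspn(num, "\n");
        out->in_release = 1;
    }

    len = (size_t)(tag_end - start);
    if (len == 0 || len >= sizeof(out->tag))
        return -1;
    out->distance = strtoul(num, &end, 10);
    if (end == num || end != stop)
        return -1;
    memcpy(out->tag, start, len);
    out->tag[len] = '\0';
    return 0;
}
```

Applied to the eleven results above, the first eight parse as contained in 8.0.1, 8.0.2 or 8.0.3, and the last three as 14, 15 and 16 commits after 8.0.3, which matches the grouping in the [Fix] section.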

description: updated
Mauricio Faria de Oliveira (mfo) wrote (last edit ):

Submitted bug [1] (wishlist) to the Debian maintainer to update crash from 8.0.2 to 8.0.3 (released April 2023) in sid/unstable.

With this we could merge a smaller set of changes later on.

[1] https://bugs.debian.org/1054805

Mauricio Faria de Oliveira (mfo) wrote (last edit ):

The version in Noble is still the same as in Mantic, and Debian does not have a newer version either to sync/merge [1], so it apparently will not change soon.

Let's proceed with the Mantic debdiff plus changes for Noble.

Changes:
- Modified d/changelog style to '* Description (LP: #)' entry with '- d/p/.patch' sub-entries (more common).
- Numbered the patches (1-7).

Great work on DEP-3 headers, thank you! (Bug-Ubuntu, Origin, and backport notes; all matching.)

I checked that these patches have not changed upstream since (i.e., there are no fix-up commits to add), except for patch 5, mentioned later, which needs a different strategy.

I'll attach the updated debdiff for now (keeping your name in signature line, if you don't mind, as I haven't done serious changes).

...

$ rmadison -a source crash
 crash | 7.0.3-3ubuntu2 | trusty | source
 crash | 7.0.3-3ubuntu4.5 | trusty-updates | source
 crash | 7.1.4-1ubuntu4 | xenial | source
 crash | 7.2.1-1 | bionic | source
 crash | 7.2.3+real-1~16.04.1 | xenial-updates | source
 crash | 7.2.8-1ubuntu0.18.04.2 | bionic-updates | source
 crash | 7.2.8-1ubuntu1 | focal | source
 crash | 7.2.8-1ubuntu1.20.04.1 | focal-updates | source
 crash | 8.0.0-1ubuntu1 | jammy | source
 crash | 8.0.0-1ubuntu1 | lunar | source
 crash | 8.0.2-1ubuntu1 | mantic | source
 crash | 8.0.2-1ubuntu1 | noble | source

$ date
Fri Oct 27 05:10:39 PM -03 2023

Mauricio Faria de Oliveira (mfo) wrote :
Mauricio Faria de Oliveira (mfo) wrote :

Hi Chengen,

Thanks for the great work on these SRUs.

I'd like to suggest an update/improvement to the Test Plan section.
Even though crash is so broken that it can't even open a file,
once it starts working at all, it would be important to check
that it is working _correctly_.

So, please, could you add verifications for basic correctness
of the commands being addressed per patch (for GA and HWE)?
e.g., kmem -s|-S, module memory layout, etc.

And I had questions on 2 patches:

Patch 3:
---

 + * commit e36ce448a08d removed kmem_cache.freelist_cache in 6.1,
 + * so use freelist_size instead.
    */
 - if (MEMBER_EXISTS("kmem_cache", "freelist_cache")) {
 + if (MEMBER_EXISTS("kmem_cache", "freelist_size")) {

This is an unconditional change (it applies to kernels both before and after 6.1), so it could impact Jammy, which ships both 5.15 and 6.2 kernels.

However, it seems the newly checked member, 'freelist_size', already exists, so this should be fine in Jammy _if_ 5.15 has it too.

Could you please confirm?

Patch 5:
---

Some of the context changes in this backport are due to the commit below not being
included, and that commit would also fit the 6.2 criteria (changes in 6.1). It seems
to only add code, though; I'm not sure how it changes any behavior (or whether more patches are needed).

Can you clarify if it's not really needed for 6.5?
(I haven't followed the maple tree closely, but the commit message suggests it's important.)

If it's not needed _right_ now, i.e., if this SRU is priority, and crash
at least _works_ (which is a good improvement), I think it would be fine
to add it later.

 commit 872cad2d63b3a07f65323fe80a7abb29ea276b44
 Author: Tao Liu <email address hidden>
 Date: Tue Jan 10 14:56:27 2023 +0800

     Port the maple tree data structures and functions

     There have been two ways to iterate vm_area_struct until Linux 6.0:
      1) by rbtree, aka vma.vm_rb;
      2) by linked list, aka vma.vm_{next,prev}.
     However with the maple tree patches[1][2] in Linux 6.1, vm_rb and
     vm_{next,prev} are removed from vm_area_struct. The vm_area_dump()
     in crash mainly uses the linked list for vma iteration, which will
     not work for this case. So the maple tree iteration needs to be
     ported to crash.

Patch 5 is also large and adds a lot of code, but it goes into gdb-10.2.patch,
which indeed only changed context lines for the backport.

A pure-code review against upstream is a big effort (~2000 lines); I got through
~1000 lines, and they seem to match upstream.

I guess this patch will be reasonably verified if the crash commands to show
module memory layout on kernel 6.4+ (6.5 in our case) run correctly.

Changed in crash (Ubuntu):
status: In Progress → Incomplete
Mauricio Faria de Oliveira (mfo) wrote :

(Comment #8)

> Submitted bug [1] (wishlist) to Debian maintainer to update crash from 8.0.2 to 8.0.3 (released April/2023) in sid/unstable.
> With this we could merge a smaller set of changes later on.
> [1] https://bugs.debian.org/1054805

Crash 8.0.3 should soon be available in Debian, and we can merge it into Noble
with a few more recent fixes for the 6.x kernels (Noble is 6.5+), which we'll
need to SRU in older releases too (Mantic is 6.5, Lunar is 6.2, Jammy HWE is 6.2 now, will be 6.5).

https://tracker.debian.org/news/1474801/accepted-crash-803-1-source-into-unstable/

Mauricio Faria de Oliveira (mfo) wrote :

Debian's crash 8.0.3-1 FTBFS, waiting on 8.0.3-2 [1].

[1] https://bugs.debian.org/1055117

Chengen Du (chengendu) wrote :

Hi, I'm focusing on resolving all issues caused by structural changes rather than solely ensuring the crash utility functions across different kernel versions. The patch set will be extensive and require additional time for backporting and testing. I'll provide the new patch as soon as possible.

Chengen Du (chengendu) wrote :

debdiff for Jammy HWE

Chengen Du (chengendu) wrote :

Test plan for Jammy HWE

Chengen Du (chengendu) wrote :

debdiff for Lunar

Chengen Du (chengendu) wrote :

Test plan for Lunar

Chengen Du (chengendu) wrote :

debdiff for Mantic

Chengen Du (chengendu) wrote :

Test plan for Mantic

Chengen Du (chengendu) wrote :

Regarding the swapped page read failure issue, I've sent a patch upstream for review. Visit the link if you're interested:
https://<email address hidden>/thread/76EA25KPIRKQY5JDFFNYQY2C4CDRLZP2/
