Unable to use nvme drive to install Ubuntu 23.10

Bug #2040157 reported by Ivo Jansky
This bug affects 6 people
Affects          Status         Importance  Assigned to  Milestone
linux (Ubuntu)   Invalid        High        Unassigned
Jammy            Fix Committed  High        Unassigned
Lunar            Fix Committed  High        Unassigned
Mantic           Fix Committed  High        Unassigned

Bug Description

The 6.5 kernel in the 23.10 installer ISO cannot work with the NVMe drive in this laptop, so it is not possible to install Ubuntu. It might be related to kernel bug https://bugzilla.kernel.org/show_bug.cgi?id=217802, fixed in 6.5.6.

dmesg:
[ 42.116742] nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff
[ 42.116764] nvme nvme0: Does your device have a faulty power saving mode enabled?
[ 42.116769] nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
[ 42.149334] nvme0n1: I/O Cmd(0x2) @ LBA 370339840, 8 blocks, I/O Error (sct 0x3 / sc 0x71)
[ 42.149357] I/O error, dev nvme0n1, sector 370339840 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
[ 42.164760] nvme 0000:04:00.0: Unable to change power state from D3cold to D0, device inaccessible
[ 42.165160] nvme nvme0: Disabling device after reset failure: -19
[ 42.180818] Buffer I/O error on dev nvme0n1p7, logical block 0, async page read
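
For anyone checking whether their own log shows the same failure, the signature above can be matched mechanically. The following is a minimal sketch, not part of the original report: the function names are hypothetical, and it assumes the "controller is down" line keeps exactly the wording shown in this log. Register values that read back as all ones are commonly a sign that the device no longer answers on the bus, which fits the later "device inaccessible" message.

```python
import re

# Pattern for the "controller is down" line as it appears in the log above.
# Assumption: the message wording is the same on the kernel being checked.
CONTROLLER_DOWN = re.compile(
    r"nvme (?P<ctrl>nvme\d+): controller is down; will reset: "
    r"CSTS=(?P<csts>0x[0-9a-fA-F]+), PCI_STATUS=(?P<pci>0x[0-9a-fA-F]+)"
)


def find_controller_down(dmesg_text):
    """Return one dict per 'controller is down' line found in dmesg_text."""
    hits = []
    for line in dmesg_text.splitlines():
        match = CONTROLLER_DOWN.search(line)
        if match:
            hits.append({
                "controller": match.group("ctrl"),
                "csts": int(match.group("csts"), 16),
                "pci_status": int(match.group("pci"), 16),
            })
    return hits


def looks_inaccessible(hit):
    """True when both registers read back as all ones, as in this report."""
    return hit["csts"] == 0xFFFFFFFF and hit["pci_status"] == 0xFFFF
```

Feeding it the output of `dmesg` as a string and checking `looks_inaccessible` on each hit gives a quick yes/no for this particular symptom; it says nothing about the cause.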

$ lsb_release -rd
No LSB modules are available.
Description: Ubuntu 23.10
Release: 23.10

ProblemType: Bug
DistroRelease: Ubuntu 23.10
Package: linux-image-6.5.0-9-generic 6.5.0-9.9
ProcVersionSignature: Ubuntu 6.5.0-9.9-generic 6.5.3
Uname: Linux 6.5.0-9-generic x86_64
NonfreeKernelModules: zfs
ApportVersion: 2.27.0-0ubuntu5
Architecture: amd64
CRDA: N/A
CasperMD5CheckResult: pass
CasperVersion: 1.486
CloudArchitecture: x86_64
CloudID: nocloud
CloudName: unknown
CloudPlatform: nocloud
CloudSubPlatform: seed-dir (/var/lib/cloud/seed/nocloud)
CurrentDesktop: ubuntu:GNOME
Date: Mon Oct 23 10:53:50 2023
LiveMediaBuild: Ubuntu 23.10.1 "Mantic Minotaur" - Release amd64 (20231016.1)
MachineType: Dell Inc. Precision 5520
ProcEnviron:
 LANG=C.UTF-8
 PATH=(custom, no user)
 SHELL=/bin/bash
 TERM=xterm-256color
 XDG_RUNTIME_DIR=<set>
ProcFB: 0 i915drmfb
ProcKernelCmdLine: BOOT_IMAGE=/casper/vmlinuz layerfs-path=minimal.standard.live.squashfs --- quiet splash
RelatedPackageVersions:
 linux-restricted-modules-6.5.0-9-generic N/A
 linux-backports-modules-6.5.0-9-generic N/A
 linux-firmware 20230919.git3672ccab-0ubuntu2.1
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 07/04/2023
dmi.bios.release: 1.33
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 1.33.0
dmi.board.name: 0R6JFH
dmi.board.vendor: Dell Inc.
dmi.board.version: A00
dmi.chassis.type: 10
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr1.33.0:bd07/04/2023:br1.33:svnDellInc.:pnPrecision5520:pvr:rvnDellInc.:rn0R6JFH:rvrA00:cvnDellInc.:ct10:cvr:sku07BF:
dmi.product.family: Precision
dmi.product.name: Precision 5520
dmi.product.sku: 07BF
dmi.sys.vendor: Dell Inc.

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Stefan Bader (smb) wrote :

The bisection in the upstream bug report ends at:

commit 69304c8d285b77c9a56d68f5ddb2558f27abf406
Author: Ricky WU <email address hidden>
Date: Tue Jul 25 09:10:54 2023 +0000

     misc: rtsx: judge ASPM Mode to set PETXCFG Reg

     commit 101bd907b4244a726980ee67f95ed9cafab6ff7a upstream.

     ASPM Mode is ASPM_MODE_CFG need to judge the value of clkreq_0
     to set HIGH or LOW, if the ASPM Mode is ASPM_MODE_REG
     always set to HIGH during the initialization.

     Cc: <email address hidden>
     Signed-off-by: Ricky Wu <email address hidden>
     Link: https://<email address hidden>
     Signed-off-by: Greg Kroah-Hartman <email address hidden>

  drivers/misc/cardreader/rts5227.c | 2 +-
  drivers/misc/cardreader/rts5228.c | 18 ------------------
  drivers/misc/cardreader/rts5249.c | 3 +--
  drivers/misc/cardreader/rts5260.c | 18 ------------------
  drivers/misc/cardreader/rts5261.c | 18 ------------------
  drivers/misc/cardreader/rtsx_pcr.c | 5 ++++-
  6 files changed, 6 insertions(+), 58 deletions(-)

None of these changes touch the NVMe code directly, but they seem to have some influence on it. One comment suggested blacklisting the affected module, "rtsx_pci". According to kernel-parameters.txt, a blacklist can be passed on the kernel command line via "module_blacklist=rtsx_pci".

Stefan Bader (smb) wrote :

Additional note: the comment about blacklisting in the bug report was somewhat mangled:

> I can confirm that blacklisting the drivers (rtsx_pci_and sdmmc and rtsx_pci) and rebuilding the initramfs...

I found there is a rtsx_pci_sdmmc module, so potentially the complete kernel argument is:

module_blacklist=rtsx_pci_sdmmc,rtsx_pci
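
As an illustration of how that argument combines with an existing command line, here is a minimal sketch. The helper name is hypothetical and the code only manipulates a string; how the result is handed to the bootloader, and whether the position of the parameter matters for a given boot setup, is outside its scope. It assumes the comma-separated form of module_blacklist shown above.

```python
def add_module_blacklist(cmdline, modules):
    """Return cmdline with modules merged into one module_blacklist= parameter."""
    prefix = "module_blacklist="
    kept = []
    blacklisted = []
    for param in cmdline.split():
        if param.startswith(prefix):
            # Collect modules from any blacklist parameter already present.
            blacklisted.extend(m for m in param[len(prefix):].split(",") if m)
        else:
            kept.append(param)
    for module in modules:
        if module not in blacklisted:
            blacklisted.append(module)
    if blacklisted:
        # Assumption: appending the parameter at the end is acceptable.
        kept.append(prefix + ",".join(blacklisted))
    return " ".join(kept)
```

Calling it with the two module names from this comment on a command line that has no blacklist yet simply appends the argument; calling it again on the result leaves the string unchanged.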

Stefan Bader (smb)
Changed in linux (Ubuntu Mantic):
importance: Undecided → High
Stefan Bader (smb) wrote :

The upstream fix for this is:

commit 0e4cac557531a4c93de108d9ff11329fcad482ff
[PATCH] misc: rtsx: Fix some platforms can not boot and move the l1ss
 judgment to probe

commit 101bd907b424 ("misc: rtsx: judge ASPM Mode to set PETXCFG Reg")
some readers no longer force #CLKREQ to low
when the system need to enter ASPM.
But some platform maybe not implement complete ASPM?
it causes some platforms can not boot

Like in the past only the platform support L1ss we release the #CLKREQ.
Move the judgment (L1ss) to probe,
we think read config space one time when the driver start is enough

Fixes: 101bd907b424 ("misc: rtsx: judge ASPM Mode to set PETXCFG Reg")
Cc: stable <email address hidden>
Reported-by: Paul Grandperrin <email address hidden>
Signed-off-by: Ricky Wu <email address hidden>
Tested-By: Jade Lovelace <email address hidden>
Link: https://<email address hidden>
Signed-off-by: Greg Kroah-Hartman <email address hidden>

I am also nominating Lunar and Jammy for this because we just picked up the problematic change via stable (for the next cycle, not the one currently in progress).

Changed in linux (Ubuntu Lunar):
importance: Undecided → High
Changed in linux (Ubuntu Jammy):
importance: Undecided → High
Changed in linux (Ubuntu Lunar):
status: New → Triaged
Changed in linux (Ubuntu Jammy):
status: New → Triaged
Stefan Bader (smb)
Changed in linux (Ubuntu Mantic):
status: Confirmed → Fix Committed
Changed in linux (Ubuntu Lunar):
status: Triaged → Fix Committed
Changed in linux (Ubuntu Jammy):
status: Triaged → Fix Committed
Changed in linux (Ubuntu):
status: Confirmed → Invalid
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/5.15.0-90.100 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux' to 'verification-done-jammy-linux'. If the problem still exists, change the tag 'verification-needed-jammy-linux' to 'verification-failed-jammy-linux'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-v2 verification-needed-jammy-linux
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/6.2.0-38.39 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-lunar-linux' to 'verification-done-lunar-linux'. If the problem still exists, change the tag 'verification-needed-lunar-linux' to 'verification-failed-lunar-linux'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-lunar-linux-v2 verification-needed-lunar-linux
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/6.5.0-12.12 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-mantic-linux' to 'verification-done-mantic-linux'. If the problem still exists, change the tag 'verification-needed-mantic-linux' to 'verification-failed-mantic-linux'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-mantic-linux-v2 verification-needed-mantic-linux
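
The three verification requests name the first -proposed builds that carry the fix. A minimal sketch of checking a kernel release string against them follows. The function and table names are hypothetical, the input is assumed to look like the output of `uname -r` for the generic `linux` package, and other kernel flavours may use different numbering. A False result only means the build predates the fix; for Jammy and Lunar it does not imply the regression is present.

```python
import re

# First builds announced in -proposed with the fix, taken from the bot comments.
# Keys are base versions, values are the Ubuntu ABI numbers.
FIRST_FIXED_ABI = {
    "5.15.0": 90,  # Jammy, linux/5.15.0-90.100
    "6.2.0": 38,   # Lunar, linux/6.2.0-38.39
    "6.5.0": 12,   # Mantic, linux/6.5.0-12.12
}


def includes_fix(release):
    """Return True or False for a listed series, None if it cannot be decided."""
    match = re.match(r"(\d+\.\d+\.\d+)-(\d+)", release)
    if match is None:
        return None
    base, abi = match.group(1), int(match.group(2))
    if base not in FIRST_FIXED_ABI:
        # Series not mentioned in this bug; make no claim about it.
        return None
    return abi >= FIRST_FIXED_ABI[base]
```

For the installer kernel from the report, `includes_fix("6.5.0-9-generic")` evaluates to False, while `includes_fix("6.5.0-12-generic")` evaluates to True.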
Luis Alberto Pabón (copong) wrote :

I have tested linux/6.5.0-12 and it's been a mixed bag:

 * The nvme fix seems to work, I can see my services loading and whatnot on the boot log 👍
 * The Plymouth theme displays at half the size it does with other kernels (I have a 4K display with Intel integrated graphics and an NVIDIA GTX 1050, which I never use) 👎
 * GDM fails to load and hangs forever. I can't switch to any TTY. The system isn't hard-locked though: it reacts to Ctrl+Alt+Del and reboots 👎

tags: added: verification-failed-mantic-linux
removed: verification-needed-mantic-linux
Eric Rouleau (xblitz) wrote :

@copong, this bug tracks the NVMe issue in the kernel. If you have other issues, please file a separate bug.

tags: added: verification-done-mantic-linux
removed: verification-failed-mantic-linux
Luis Alberto Pabón (copong) wrote :

@Eric, those issues are introduced with this kernel as well, either as a result of the patch or something else.

Also, Ubuntu Lunar's current kernel (6.2.0-36-generic) does not suffer from the NVMe issue. I'm writing this from there.

Luis Alberto Pabón (copong) wrote :

Just to clarify, on mantic and kernel 6.5.0-12 the nvme issue is there, but not the graphical issue with Plymouth or GDM hanging - I can boot up the NVME drive from a usb-c adaptor and the problems aren't there.

Roxana Nicolescu (roxanan) wrote :

@copong This bug does not represent the entire latest 6.5 update, therefore if you have a different issue, please create another bug report and we can take action from there.

Stefan Bader (smb) wrote :

The new 6.5 kernel contains other changes besides the fix for NVMe, so the new issues may or may not be related. Please file a new bug and try to provide a log of the failed boot: boot into 6.5, then reboot back into 6.2, where "sudo journalctl -b -1" might contain useful information to work with. Then mention the new bug number here. Thanks.
