Nvidia module crash: GPU has fallen off the bus.

Bug #1154153 reported by Dave Chiluk
This bug affects 7 people
Affects: nvidia-graphics-drivers (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

For the last two mornings, I have found that the nvidia module crashed in the middle of the night (4 a.m.). The displays are set to turn off and lock, but the machine is not set up to suspend.

Here's an excerpt from the earliest error in kern.log.

kernel: [144532.511018] hda-intel: spurious response 0x1:0x0, last cmd=0x4f0700
kernel: [144532.511021] hda-intel: spurious response 0x40:0x0, last cmd=0x4f0700
kernel: [144532.511024] hda-intel: spurious response 0x0:0x0, last cmd=0x4f0700
kernel: [144532.777203] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
kernel: [144532.777214] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
kernel: [144582.521101] show_signal_msg: 51 callbacks suppressed
kernel: [144582.521108] Xorg[2151]: segfault at 617461481d ip 00007fe6533bbd41 sp 00007fff94311438 error 4 in nvidia_drv.so[7fe652f47000+634000]
kernel: [144582.814351] init: lightdm main process (2129) terminated with status 1
kernel: [144597.810481] init: failsafe-x main process (28549) terminated with status 1
kernel: [144608.298745] BUG: soft lockup - CPU#1 stuck for 22s! [dconf worker:3334]
kernel: [144608.298883] Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp iptable_filter ip_tables x_tables kvm_intel kvm bnep rfcomm parport_pc ppdev binfmt_misc nfsd snd_hda_codec_hdmi snd_hda_codec_realtek joydev uvcvideo snd_usb_audio videodev snd_usbmidi_lib btusb v4l2_compat_ioctl32 hid_microsoft bluetooth bridge stp snd_hda_intel nvidia(P) snd_hda_codec snd_hwdep snd_pcm psmouse snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer i7core_edac snd_seq_device dm_multipath edac_core serio_raw snd soundcore snd_page_alloc mac_hid lp parport nfs lockd fscache auth_rpcgss nfs_acl sunrpc usbhid hid r8169 pata_jmicron
kernel: [144608.298925] CPU 1
kernel: [144608.298926] Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp iptable_filter ip_tables x_tables kvm_intel kvm bnep rfcomm parport_pc ppdev binfmt_misc nfsd snd_hda_codec_hdmi snd_hda_codec_realtek joydev uvcvideo snd_usb_audio videodev snd_usbmidi_lib btusb v4l2_compat_ioctl32 hid_microsoft bluetooth bridge stp snd_hda_intel nvidia(P) snd_hda_codec snd_hwdep snd_pcm psmouse snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer i7core_edac snd_seq_device dm_multipath edac_core serio_raw snd soundcore snd_page_alloc mac_hid lp parport nfs lockd fscache auth_rpcgss nfs_acl sunrpc usbhid hid r8169 pata_jmicron
kernel: [144608.298956]
kernel: [144608.298958] Pid: 3334, comm: dconf worker Tainted: P O 3.2.0-39-generic #62-Ubuntu BIOSTAR Group TP55/TP55
kernel: [144608.298961] RIP: 0010:[<ffffffffa02b54ed>] [<ffffffffa02b54ed>] _nv012125rm+0x49/0x51 [nvidia]
kernel: [144608.299043] RSP: 0018:ffff8807ed7e5828 EFLAGS: 00000246
kernel: [144608.299044] RAX: 0000000000000000 RBX: ffff8807ed7e57b8 RCX: 000000000000000d
kernel: [144608.299045] RDX: 0000000000002000 RSI: 000000000000548d RDI: ffff88080b57c034
kernel: [144608.299047] RBP: ffff8807fe03de80 R08: 0000000000070004 R09: ffff8807fe03dea8
kernel: [144608.299048] R10: 0000000000000000 R11: 0000000000000001 R12: ffffffff8103ec69
kernel: [144608.299049] R13: ffff8807ed7e5798 R14: ffffffff8103ec69 R15: ffff8807ed7e5788
kernel: [144608.299051] FS: 00007f084ad5b700(0000) GS:ffff88083fc40000(0000) knlGS:0000000000000000
kernel: [144608.299052] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
kernel: [144608.299054] CR2: 000012367832e4f0 CR3: 0000000001c05000 CR4: 00000000000006e0
kernel: [144608.299055] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: [144608.299057] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
kernel: [144608.299058] Process dconf worker (pid: 3334, threadinfo ffff8807ed7e4000, task ffff8807fcc64500)
kernel: [144608.299059] Stack:
kernel: [144608.299081] 00000000ffffffff ffffffffa02b4a6a ffff88080b57c008 ffff88080b57c008
kernel: [144608.299084] 0000000000070004 ffffffffa0584399 ffff88080b57c008 ffffffffa05a09e4
kernel: [144608.299086] ffff88080b57c008 0000000000000045 ffff88080b57c008 0000000000070004
kernel: [144608.299089] Call Trace:
kernel: [144608.299144] [<ffffffffa02b4a6a>] ? _nv011835rm+0xba/0x1c2 [nvidia]
kernel: [144608.299221] [<ffffffffa0584399>] ? _nv007992rm+0x26/0xb3 [nvidia]
kernel: [144608.299299] [<ffffffffa05a09e4>] ? _nv003210rm+0x48b3/0xaf83 [nvidia]
kernel: [144608.299372] [<ffffffffa0521450>] ? _nv005265rm+0x116/0x1b9 [nvidia]
kernel: [144608.299446] [<ffffffffa052163d>] ? _nv005084rm+0x14a/0x1ed [nvidia]
kernel: [144608.299508] [<ffffffffa0628592>] ? _nv010822rm+0x127/0x1cb [nvidia]
kernel: [144608.299569] [<ffffffffa0628722>] ? _nv010828rm+0xec/0x111 [nvidia]
kernel: [144608.299601] [<ffffffffa02a6515>] ? _nv000771rm+0x28ca/0x2b89 [nvidia]
kernel: [144608.299633] [<ffffffffa02a3b94>] ? _nv000738rm+0xe23/0xe59 [nvidia]
kernel: [144608.299664] [<ffffffffa02a3c33>] ? _nv013355rm+0xe/0x26 [nvidia]
kernel: [144608.299695] [<ffffffffa02a44a7>] ? _nv000771rm+0x85c/0x2b89 [nvidia]
kernel: [144608.299727] [<ffffffffa02a3b94>] ? _nv000738rm+0xe23/0xe59 [nvidia]
kernel: [144608.299758] [<ffffffffa02a3c33>] ? _nv013355rm+0xe/0x26 [nvidia]
kernel: [144608.299789] [<ffffffffa02a3ed3>] ? _nv000771rm+0x288/0x2b89 [nvidia]
kernel: [144608.299821] [<ffffffffa02a3b94>] ? _nv000738rm+0xe23/0xe59 [nvidia]
kernel: [144608.299852] [<ffffffffa02a3c07>] ? _nv013357rm+0x3d/0x5b [nvidia]
kernel: [144608.299888] [<ffffffffa0756a57>] ? _nv000780rm+0xdf/0x1c3 [nvidia]
kernel: [144608.299924] [<ffffffffa0758d26>] ? rm_free_unused_clients+0x60/0xdb [nvidia]
kernel: [144608.299928] [<ffffffff810914b2>] ? up+0x32/0x50
kernel: [144608.299962] [<ffffffffa077697f>] ? nv_kern_ctl_close+0x7f/0x130 [nvidia]
kernel: [144608.299996] [<ffffffffa07776db>] ? nv_kern_close+0x3bb/0x450 [nvidia]
kernel: [144608.299999] [<ffffffff8117aa3e>] ? __fput+0xbe/0x210
kernel: [144608.300001] [<ffffffff8117abb5>] ? fput+0x25/0x30
kernel: [144608.300003] [<ffffffff81177756>] ? filp_close+0x66/0x90
kernel: [144608.300006] [<ffffffff8106a7ba>] ? put_files_struct.part.10+0x7a/0xe0
kernel: [144608.300009] [<ffffffff8106c1d8>] ? put_files_struct+0x18/0x20
kernel: [144608.300011] [<ffffffff8106c2a4>] ? exit_files+0x54/0x70
kernel: [144608.300013] [<ffffffff8106c785>] ? do_exit+0x195/0x450
kernel: [144608.300016] [<ffffffff8107b04a>] ? __dequeue_signal+0x6a/0xb0
kernel: [144608.300018] [<ffffffff8106cbe4>] ? do_group_exit+0x44/0xa0
kernel: [144608.300020] [<ffffffff8107dc0c>] ? get_signal_to_deliver+0x21c/0x420
kernel: [144608.300023] [<ffffffff81014865>] ? do_signal+0x45/0x130
kernel: [144608.300025] [<ffffffff8117983d>] ? vfs_read+0x10d/0x180
kernel: [144608.300027] [<ffffffff81014b15>] ? do_notify_resume+0x65/0x80
kernel: [144608.300030] [<ffffffff81666250>] ? int_signal+0x12/0x17
kernel: [144608.300031] Code: e8 49 00 48 89 c2 be 01 00 00 00 bf 00 00 00 00 e8 64 ef 00 00 b8 00 00 00 00 eb 12 89 c0 ba 01 00 00 00 d3 e2 85 14 87 0f 95 c0 <0f> b6 c0 48 83 c4 08 c3 41 55 41 54 53 48 89 fb 49 89 f5 49 89
kernel: [144608.301447] Call Trace:
kernel: [144608.301487] [<ffffffffa02b4a6a>] ? _nv011835rm+0xba/0x1c2 [nvidia]
kernel: [144608.301573] [<ffffffffa0584399>] ? _nv007992rm+0x26/0xb3 [nvidia]
kernel: [144608.301654] [<ffffffffa05a09e4>] ? _nv003210rm+0x48b3/0xaf83 [nvidia]
kernel: [144608.301731] [<ffffffffa0521450>] ? _nv005265rm+0x116/0x1b9 [nvidia]
kernel: [144608.301807] [<ffffffffa052163d>] ? _nv005084rm+0x14a/0x1ed [nvidia]
kernel: [144608.301871] [<ffffffffa0628592>] ? _nv010822rm+0x127/0x1cb [nvidia]
kernel: [144608.301935] [<ffffffffa0628722>] ? _nv010828rm+0xec/0x111 [nvidia]
kernel: [144608.301967] [<ffffffffa02a6515>] ? _nv000771rm+0x28ca/0x2b89 [nvidia]
kernel: [144608.302000] [<ffffffffa02a3b94>] ? _nv000738rm+0xe23/0xe59 [nvidia]
kernel: [144608.302033] [<ffffffffa02a3c33>] ? _nv013355rm+0xe/0x26 [nvidia]
kernel: [144608.302065] [<ffffffffa02a44a7>] ? _nv000771rm+0x85c/0x2b89 [nvidia]
kernel: [144608.302098] [<ffffffffa02a3b94>] ? _nv000738rm+0xe23/0xe59 [nvidia]
kernel: [144608.302130] [<ffffffffa02a3c33>] ? _nv013355rm+0xe/0x26 [nvidia]
kernel: [144608.302163] [<ffffffffa02a3ed3>] ? _nv000771rm+0x288/0x2b89 [nvidia]
kernel: [144608.302196] [<ffffffffa02a3b94>] ? _nv000738rm+0xe23/0xe59 [nvidia]
kernel: [144608.302228] [<ffffffffa02a3c07>] ? _nv013357rm+0x3d/0x5b [nvidia]
kernel: [144608.302265] [<ffffffffa0756a57>] ? _nv000780rm+0xdf/0x1c3 [nvidia]
kernel: [144608.302302] [<ffffffffa0758d26>] ? rm_free_unused_clients+0x60/0xdb [nvidia]
kernel: [144608.302305] [<ffffffff810914b2>] ? up+0x32/0x50
kernel: [144608.302339] [<ffffffffa077697f>] ? nv_kern_ctl_close+0x7f/0x130 [nvidia]
kernel: [144608.302375] [<ffffffffa07776db>] ? nv_kern_close+0x3bb/0x450 [nvidia]
kernel: [144608.302377] [<ffffffff8117aa3e>] ? __fput+0xbe/0x210
kernel: [144608.302379] [<ffffffff8117abb5>] ? fput+0x25/0x30
kernel: [144608.302381] [<ffffffff81177756>] ? filp_close+0x66/0x90
kernel: [144608.302383] [<ffffffff8106a7ba>] ? put_files_struct.part.10+0x7a/0xe0
kernel: [144608.302385] [<ffffffff8106c1d8>] ? put_files_struct+0x18/0x20
kernel: [144608.302387] [<ffffffff8106c2a4>] ? exit_files+0x54/0x70
kernel: [144608.302389] [<ffffffff8106c785>] ? do_exit+0x195/0x450
kernel: [144608.302391] [<ffffffff8107b04a>] ? __dequeue_signal+0x6a/0xb0
kernel: [144608.302393] [<ffffffff8106cbe4>] ? do_group_exit+0x44/0xa0
kernel: [144608.302395] [<ffffffff8107dc0c>] ? get_signal_to_deliver+0x21c/0x420
kernel: [144608.302398] [<ffffffff81014865>] ? do_signal+0x45/0x130
kernel: [144608.302400] [<ffffffff8117983d>] ? vfs_read+0x10d/0x180
kernel: [144608.302401] [<ffffffff81014b15>] ? do_notify_resume+0x65/0x80
kernel: [144608.302404] [<ffffffff81666250>] ? int_signal+0x12/0x17
kernel: [144636.226880] BUG: soft lockup - CPU#1 stuck for 22s! [dconf worker:3334]
kernel: [144636.227012] Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp iptable_filter ip_tables x_tables kvm_intel kvm bnep rfcomm parport_pc ppdev binfmt_misc nfsd snd_hda_codec_hdmi snd_hda_codec_realtek joydev uvcvideo snd_usb_audio videodev snd_usbmidi_lib btusb v4l2_compat_ioctl32 hid_microsoft bluetooth bridge stp snd_hda_intel nvidia(P) snd_hda_codec snd_hwdep snd_pcm psmouse snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer i7core_edac snd_seq_device dm_multipath edac_core serio_raw snd soundcore snd_page_alloc mac_hid lp parport nfs lockd fscache auth_rpcgss nfs_acl sunrpc usbhid hid r8169 pata_jmicron
kernel: [144636.227055] CPU 1
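
For anyone triaging a similar kern.log, the following is a minimal Python sketch (not part of the original report) that counts the three failure signatures visible in the excerpt above and records the kernel timestamp of the first occurrence of each. The labels, the function names, and the default log path are assumptions chosen for illustration; the regular expressions are modeled only on the lines shown here and may need adjusting for other driver versions.

```python
import re
from collections import Counter

# Signatures modeled on the excerpt in this report; the labels are
# illustrative names, not anything the kernel or the driver defines.
PATTERNS = {
    "gpu_fell_off_bus": re.compile(r"NVRM: GPU at (\S+) has fallen off the bus"),
    "xorg_segfault": re.compile(r"Xorg\[\d+\]: segfault .* in nvidia_drv\.so"),
    "soft_lockup": re.compile(r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s"),
}

# Kernel uptime stamp, e.g. "[144532.777203]".
TIMESTAMP = re.compile(r"\[\s*(\d+\.\d+)\]")


def scan_kern_log(lines):
    """Return (counts, first_seen) for each signature found in `lines`.

    counts maps label -> number of matching lines.
    first_seen maps label -> kernel timestamp (float) of the first match,
    or None if that line carried no timestamp.
    """
    counts = Counter()
    first_seen = {}
    for line in lines:
        ts_match = TIMESTAMP.search(line)
        ts = float(ts_match.group(1)) if ts_match else None
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
                first_seen.setdefault(label, ts)
    return counts, first_seen


def scan_file(path="/var/log/kern.log"):
    """Scan a log file on disk; the default path is an assumption."""
    with open(path, errors="replace") as fh:
        return scan_kern_log(fh)
```

Calling `scan_file()` on the affected machine and comparing the first-seen timestamps should show whether the "fallen off the bus" message precedes the Xorg segfault and the soft lockups, as it does in the excerpt above.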

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: xorg 1:7.6+12ubuntu2
ProcVersionSignature: Ubuntu 3.2.0-39.62-generic 3.2.39
Uname: Linux 3.2.0-39-generic x86_64
NonfreeKernelModules: nvidia
.proc.driver.nvidia.gpus.0: Error: [Errno 21] Is a directory: '/proc/driver/nvidia/gpus/0'
.proc.driver.nvidia.registry: Binary: ""
.proc.driver.nvidia.version:
 NVRM version: NVIDIA UNIX x86_64 Kernel Module 310.14 Tue Oct 9 11:52:41 PDT 2012
 GCC version: gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)
.tmp.unity.support.test.0:

ApportVersion: 2.0.1-0ubuntu17.1
Architecture: amd64
CompizPlugins: No value set for `/apps/compiz-1/general/screen0/options/active_plugins'
CompositorRunning: compiz
Date: Tue Mar 12 10:16:18 2013
DistUpgraded: Fresh install
DistroCodename: precise
DistroVariant: ubuntu
ExtraDebuggingInterest: Yes, even including gdb or git bisection work if needed
GraphicsCard:
 NVIDIA Corporation Device [10de:1183] (rev a1) (prog-if 00 [VGA controller])
   Subsystem: Device [196e:1000]
InstallationMedia: Ubuntu 12.04 LTS "Precise Pangolin" - Release amd64 (20120425)
JockeyStatus:
 xorg:nvidia_current_updates - NVIDIA accelerated graphics driver (post-release updates) (Proprietary, Disabled, Not in use)
 xorg:nvidia_experimental_304 - NVIDIA accelerated graphics driver (**experimental** beta) (Proprietary, Disabled, Not in use)
 xorg:nvidia_experimental_310 - NVIDIA accelerated graphics driver (**experimental** beta) (Proprietary, Enabled, In use)
MachineType: BIOSTAR Group TP55
MarkForUpload: True
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.2.0-39-generic root=UUID=fd58abb9-56f2-4094-8191-d931dea02669 ro crashkernel=384M-2G:64M,2G-:128M quiet splash
SourcePackage: xorg
Symptom: display
Title: Xorg crash
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 06/02/2010
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 080015
dmi.board.asset.tag: None
dmi.board.name: TP55
dmi.board.vendor: BIOSTAR Group
dmi.chassis.asset.tag: None
dmi.chassis.type: 3
dmi.chassis.vendor: BIOSTAR Group
dmi.chassis.version: 6.0
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr080015:bd06/02/2010:svnBIOSTARGroup:pnTP55:pvr6.0:rvnBIOSTARGroup:rnTP55:rvr:cvnBIOSTARGroup:ct3:cvr6.0:
dmi.product.name: TP55
dmi.product.version: 6.0
dmi.sys.vendor: BIOSTAR Group
version.compiz: compiz 1:0.9.7.12-0ubuntu1
version.ia32-libs: ia32-libs 20090808ubuntu36
version.libdrm2: libdrm2 2.4.39-0ubuntu0.1
version.libgl1-mesa-dri: libgl1-mesa-dri 8.0.4-0ubuntu0.4
version.libgl1-mesa-dri-experimental: libgl1-mesa-dri-experimental N/A
version.libgl1-mesa-glx: libgl1-mesa-glx 8.0.4-0ubuntu0.4
version.nvidia-graphics-drivers: nvidia-graphics-drivers N/A
version.xserver-xorg-core: xserver-xorg-core 2:1.11.4-0ubuntu10.12
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev 1:2.7.0-0ubuntu1.2
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:6.14.99~git20111219.aacbd629-0ubuntu2
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.17.0-1ubuntu4.3
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:0.0.16+git20111201+b5534a1-1build3

Dave Chiluk (chiluk) wrote :

This article is interesting and possibly related reading: http://www.cyberciti.biz/faq/debian-ubuntu-rhel-fedora-linux-nvidia-nvrm-gpu-fallen-off-bus/

The suggested solution is to turn on persistence mode for the nvidia card using:
# /usr/bin/nvidia-smi -pm 1

This was never an issue with 3.2.0-38+310.14-0ubuntu0.1 (the same driver version I am currently running).

This could also simply be a real hardware issue.

bugbot (bugbot)
affects: xorg (Ubuntu) → nvidia-graphics-drivers (Ubuntu)
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nvidia-graphics-drivers (Ubuntu):
status: New → Confirmed
Jens Jorgensen (jorgensen) wrote :

I have the same behavior: when I leave my computer at night, the monitor eventually goes into suspend mode. When I come in the next morning and touch the keyboard, nothing happens. If I log in remotely, I can see that X has died, and I see the same "fallen off the bus" kernel messages. I put the suggested 'nvidia-smi -pm 1' into /etc/rc.local and verified that persistence mode is enabled, but I still have the same problem. One interesting note: I recently attached a second monitor to my quad-gpu card, and I never had this problem before I added the second monitor. It's possible I got an updated package at about the same time, but I'm not sure. I'm running up-to-date 13.04.
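
One way to verify that persistence mode actually took effect after boot is sketched below in Python. This is an illustration, not something from the thread: it assumes that `nvidia-smi -q` prints one "Persistence Mode : Enabled" or "Persistence Mode : Disabled" line per GPU, which may not hold for every driver version, and the function names are hypothetical.

```python
import re
import subprocess

# Assumed line format in `nvidia-smi -q` output, one line per GPU:
#     Persistence Mode                : Enabled
PERSISTENCE_LINE = re.compile(r"^\s*Persistence Mode\s*:\s*(\w+)", re.MULTILINE)


def parse_persistence_modes(query_output):
    """Return one status string per GPU found in the query text."""
    return PERSISTENCE_LINE.findall(query_output)


def query_persistence_modes():
    """Run `nvidia-smi -q` and parse its output (needs the NVIDIA driver)."""
    result = subprocess.run(
        ["nvidia-smi", "-q"], capture_output=True, text=True, check=True
    )
    return parse_persistence_modes(result.stdout)
```

If `query_persistence_modes()` returns "Enabled" for every GPU and the crash still occurs, as described above, persistence mode alone is probably not a sufficient workaround on that machine.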

CassieMoondust (cassie-lx) wrote :

Ubuntu 14.04.1: the same issue with nvidia 331.38, kernel 3.13-34, and a GeForce GTX 285.

Driver 304 runs flawlessly.
