[Latitude 2120] Memory test fails on Alpha 3

Bug #823419 reported by Jeff Lane 
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Checkbox
Fix Released
Low
Jeff Lane 
linux (Ubuntu)
Invalid
Medium
Ara Pulido

Bug Description

During the test certification run on Oneiric Alpha 3, the Memory test fails. This may be an issue with the test on Oneiric, or it may be an issue with Oneiric on this machine, so I'm marking it as a blocker and also opening this against the OS as well as Checkbox, we can mark invalid later for the correct package.

ProblemType: Bug
DistroRelease: Ubuntu 11.10
Package: linux-image-3.0.0-8-generic 3.0.0-8.10
ProcVersionSignature: Ubuntu 3.0.0-8.10-generic 3.0.1
Uname: Linux 3.0.0-8-generic i686
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.24.
AplayDevices:
 **** List of PLAYBACK Hardware Devices ****
 card 0: Intel [HDA Intel], device 0: ALC269VB Analog [ALC269VB Analog]
   Subdevices: 0/1
   Subdevice #0: subdevice #0
Architecture: i386
ArecordDevices:
 **** List of CAPTURE Hardware Devices ****
 card 0: Intel [HDA Intel], device 0: ALC269VB Analog [ALC269VB Analog]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: ubuntu 1594 F.... pulseaudio
 /dev/snd/pcmC0D0p: ubuntu 1594 F...m pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0xf6dfc000 irq 46'
   Mixer name : 'Realtek ALC269VB'
   Components : 'HDA:10ec0269,102804a5,00100100'
   Controls : 14
   Simple ctrls : 9
Date: Tue Aug 9 11:58:26 2011
HibernationDevice: RESUME=UUID=5f260aea-daa2-47cc-92b2-31114e3c8ef6
InstallationMedia: Ubuntu 11.10 "Oneiric Ocelot" - Alpha i386 (20110803.1)
MachineType: Dell Inc. Latitude 2120
ProcEnviron:
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.0.0-8-generic root=UUID=08fbc92a-95f3-46a5-b5ea-ea48d3df5663 ro quiet splash initcall_debug vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-3.0.0-8-generic N/A
 linux-backports-modules-3.0.0-8-generic N/A
 linux-firmware 1.56
SourcePackage: linux
StagingDrivers: brcmsmac rts_pstor brcmutil
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 10/06/2010
dmi.bios.vendor: Dell Inc.
dmi.bios.version: W06
dmi.board.name: DOE3C2
dmi.board.vendor: Dell Inc.
dmi.chassis.type: 8
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvrW06:bd10/06/2010:svnDellInc.:pnLatitude2120:pvr:rvnDellInc.:rnDOE3C2:rvr:cvnDellInc.:ct8:cvr:
dmi.product.name: Latitude 2120
dmi.sys.vendor: Dell Inc.

Revision history for this message
Jeff Lane  (bladernr) wrote :
tags: added: blocks-hwcert
Revision history for this message
Jeff Lane  (bladernr) wrote :

Update: this successfully completed on the Inspiron 1122, so it seems to not necessarily be a test issue. More likely a system issue with the 2120.

Revision history for this message
Brendan Donegan (brendan-donegan) wrote :

Can you attach the output of the memory test from this system? I want to see if it matches the failure on a different system in Natty.

Revision history for this message
Jeff Lane  (bladernr) wrote :

System Memory: 2003 MB
Free Memory: 1530 MB
Swap Memory: 2036 MB
i686 arch, Limiting Process Memory: 1024
Testing memory with 2 processes
Running threaded memory test:
Started: process 0 pid 23993: /usr/share/checkbox/scripts/threaded_memtest -qpv -m1024m -t60
Started: process 1 pid 23994: /usr/share/checkbox/scripts/threaded_memtest -qpv -m1024m -t60
Error: process 0 pid 23993 retuned 137 process 0 pid 23993 returned success
Error: process 1 pid 23994 retuned 137 process 1 pid 23994 returned success
Multi-process, threaded memory Test FAILED

^^^
That is from the checkbox run that had the failure... I didn't capture the messages when I ran memory test alone, but htey were similar to this.

That being said, I booted the 2120 just now and running memory_test alone has succeeded 3 of 3 times... :/

Revision history for this message
Brendan Donegan (brendan-donegan) wrote :

Heh, I'm also having trouble finding the place where I saw this fail before. It seems like this failure is random to an extent. Probably best to keep an eye on it.

Brad Figg (brad-figg)
Changed in linux (Ubuntu):
status: New → Confirmed
Ara Pulido (ara)
Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
assignee: nobody → Ara Pulido (apulido)
Revision history for this message
Chris Van Hoof (vanhoof) wrote :

Ayan -- Mind having a look at Oneiric on your 2120 to see if you also see this same failure?

Revision history for this message
Ayan George (ayan) wrote :

Test passed on my 2120 with 1Gb RAM.

Revision history for this message
Ayan George (ayan) wrote :

Does threaded_memtest give more information about how it failed?

Revision history for this message
Ayan George (ayan) wrote :

It would also be interesting to know if memcheck86 reports any errors.

Revision history for this message
Ayan George (ayan) wrote :

I don't think this points to a real memory error. If you look at the output of a failed memory_test run, you'll notice it claims threaded_memtest exited with error code 127:

System Memory: 2003 MB
Free Memory: 1791 MB
Swap Memory: 16658 MB
i686 arch, Limiting Process Memory: 1024
Testing memory with 2 processes
Running threaded memory test:
Started: process 0 pid 2187: /home/ubuntu/checkbox/scripts/threaded_memtest -qpv -m1024m -t60
Started: process 1 pid 2188: /home/ubuntu/checkbox/scripts/threaded_memtest -qpv -m1024m -t60
Error: process 0 pid 2187 retuned 127
process 0 pid 2187 returned success
Error: process 1 pid 2188 retuned 127
process 1 pid 2188 returned success
Multi-process, threaded memory Test FAILED

If you look at the source code (threaded_memset.c), you will not find a way for it to ever exit with that code (unless the rv variable in main() is being clobbered).

I think the error is in run_processes procedure in scripts/memory_test (a python script).

A way to verify this is to run threaded_memtest manually. I've attached a test script -- you may have to edit the TMT variable at the top of the script -- and I suspect that the program will never fail and never exit with a 127 error code.

You can find the results of the tests in /tmp/tmt.1 and /tmp/tmt.2

Revision history for this message
Jeff Lane  (bladernr) wrote :

So the big question is where is that 127 coming from...

As Ayan pointed out, threaded_memtest can only return a 0 or 1 explicitly via the rv variable.
However, check this bit out from memory_test in run_processes() which is only called when running in multi-thread mode:

if line and len(line) > 1:
    print "process %u pid %u: %s" % (i, pipe[i].pid, line)
    sys.stdout.flush()
if pipe[i].poll() == -1:
    waiting = True
else:
    return_value = pipe[i].poll()
    if return_value != 0:
        print "Error: process %u pid %u retuned %u" % (i, pipe[i].pid, return_value)
        passed = False
    print "process %u pid %u returned success" % (i, pipe[i].pid)
    pipe[i] = None

First, a minor detail... that second print statement is incorrect. it will ALWAYS be printed, regardless of outcome, which is why the logs show
Error: process 1 pid 2188 retuned 127
process 1 pid 2188 returned success

More importantly, on these failing systems the subprocess.poll() function is returning the 127. So this has to be generated somewhere between the memory_test script and threaded_memtest. As long as poll returns a -1, memory_test keeps waiting for something other than -1.

We also know that threaded_memtest can only return 0, or 1.

So, going on that, perhaps we should change the wait statement to this:

state = pipe[i].poll()
if state != 0 or state != 1:
    waiting = True

I wonder if that 127 is the kernel saying that the process is closing, but isn't quite done yet (perhaps the kernel is blocking the process exit while it recovers resources?)

I'm attaching a branch of checkbox to this (for oneiric) that incorporates this... Can someone test this on a failing system and see if it resolves the issue?

Coincidentally, from looking at memory_test, I don't think this would happen when running in a single-thread situation. I think this will only affect multi-threaded testing.

Finally, while this seems to be pretty frequent on the Latitude 2120, I don't think this is machine specific either. Perhaps this one is "just slow enough" that we're seeing this... I don't know, but my gut feeling is that the 2120 just happens to be the right combination of factors to highlight what's actually a flaw in the test itself.

Revision history for this message
Ayan George (ayan) wrote :

I agree -- this is a generic bug. Maybe something about the 2120 tickles it more often. I'll bow out as to avoid bikeshedding this but I wanted to note that threaded_memtest can also return 2.

Revision history for this message
Jeff Lane  (bladernr) wrote :

Huh... I completely overlooked that line... :/ thanks for pointing that out. I've pushed a change to the branch linked here that accounts for the 2 exit code as well.

Ara Pulido (ara)
tags: removed: blocks-hwcert
Changed in checkbox:
status: New → In Progress
importance: Undecided → Low
assignee: nobody → Jeff Lane (bladernr)
Revision history for this message
Brad Figg (brad-figg) wrote :

I don't believe this is a kernel bug.

Changed in linux (Ubuntu):
status: Incomplete → Invalid
Revision history for this message
Daniel Manrique (roadmr) wrote :
Download full text (3.3 KiB)

This is a test run result (PASSED) on the 2120 with 2003 MB RAM:

 System Memory: 2003 MB Free Memory: 1700 MB Swap Memory: 2036 MB i686 arch, Limiting Process Memory: 1024 Testing memory with 2 processes Running threaded memory test: Started: process 0 pid 16777: /usr/share/checkbox/scripts/threaded_memtest -qpv -m1024m -t60 Started: process 1 pid 16778: /usr/share/checkbox/scripts/threaded_memtest -qpv -m1024m -t60 process 0 pid 16777: Detected 4 processors. RAM: 56.9% free (1139M/2003M) Testing 1024M RAM for 60 seconds using 8 threads: thread 1: mapping 128M RAM thread 2: mapping 128M RAM thread 0: mapping 128M RAM thread 3: mapping 128M RAM thread 4: mapping 128M RAM thread 5: mapping 128M RAM thread 6: mapping 128M RAM thread 7: mapping 128M RAM thread 5: mapping complete thread 1: mapping complete thread 6: mapping complete thread 4: mapping complete thread 0: mapping complete thread 2: mapping complete thread 7: mapping complete thread 3: mapping complete thread 5: test start thread 1: test start thread 6: test start thread 4: test start thread 0: test start thread 2: test start thread 7: test start thread 3: test start thread 0 finished. thread 6 finished. thread 2 finished. thread 1 finished. thread 7 finished. thread 3 finished. thread 5 finished. thread 4 finished. thread 4 unmapping and exiting thread 0 unmapping and exiting thread 6 unmapping and exiting thread 2 unmapping and exiting thread 1 unmapping and exiting thread 7 unmapping and exiting thread 3 unmapping and exiting thread 5 unmapping and exiting Runtime was 60.74s thread 0: 2845 loops thread 1: 2705 loops thread 2: 2462 loops thread 3: 2889 loops thread 4: 3355 loops thread 5: 2716 loops thread 6: 2711 loops thread 7: 2912 loops Total loops per second: 372.01 Testing complete. process 0 pid 16777 returned success process 1 pid 16778: Detected 4 processors. RAM: 56.9% free (1139M/2003M) Testing 1024M RAM for 60 seconds using 8 threads: thread 0: mapping 128M RAM thread 1: mapping 128M RAM thread 2: mapping 128M RAM thread 3: mapping 128M RAM thread 4: mapping 128M RAM thread 5: mapping 128M RAM thread 6: mapping 128M RAM thread 7: mapping 128M RAM thread 1: mapping complete thread 5: mapping complete thread 2: mapping complete thread 4: mapping complete thread 6: mapping complete thread 7: mapping complete thread 3: mapping complete thread 0: mapping complete thread 1: test start thread 5: test start thread 2: test start thread 4: test start thread 6: test start thread 7: test start thread 3: test start thread 0: test start thread 5 finished. thread 7 finished. thread 6 finished. thread 0 finished. thread 2 finished. thread 1 finished. thread 3 finished. thread 4 finished. thread 4 unmapping and exiting thread 5 unmapping and exiting thread 7 unmapping and exiting thread 6 unmapping and exiting thread 0 unmapping and exiting thread 2 unmapping and exiting thread 1 unmapping and exiting thread 3 unmapping and exiting Runtime was 60.96s thread 0: 2656 loops thread 1: 2780 loops thread 2: 2964 loops thread 3: 2509 loops thread 4: 2128 loops thread 5: 2108 loops thread 6: 2306 loops thread 7: 3576 loops Total loops per second: 344.93 Testing complete. process 1 pid 16778 re...

Read more...

Changed in checkbox:
status: In Progress → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.