Activity log for bug #574910

Date Who What changed Old value New value Message
2010-05-04 04:21:20 Rod bug added bug
2010-05-04 15:08:01 Scott Moser bug task added linux-ec2 (Ubuntu)
2010-05-04 15:08:35 Scott Moser ubuntu-on-ec2: status New Invalid
2010-05-04 15:11:12 Scott Moser tags ec2-images lucid
2010-05-17 21:49:23 John Johansen linux-ec2 (Ubuntu): status New Confirmed
2010-05-17 21:49:56 John Johansen linux-ec2 (Ubuntu): status Confirmed In Progress
2010-05-17 21:50:04 John Johansen linux-ec2 (Ubuntu): assignee John Johansen (jjohansen)
2010-06-11 22:31:36 Josh Koenig bug task added pantheon
2010-06-11 22:33:48 Josh Koenig pantheon: importance Undecided Medium
2010-06-22 19:41:09 Greg Coit pantheon: status New Triaged
2010-07-06 10:50:37 Alex Howells bug task added linux-meta (Ubuntu)
2010-07-06 10:56:49 Alex Howells attachment added ubuntu-574910-loadavg.zip http://launchpadlibrarian.net/51455579/ubuntu-574910-loadavg.zip
2010-07-06 12:08:07 Alex Howells attachment added ubuntu-574910-loadavg-karmickernel.zip http://launchpadlibrarian.net/51458418/ubuntu-574910-loadavg-karmickernel.zip
2010-07-06 15:52:56 Alex Howells summary High load averages on Lucid EC2 while idling High load averages on Lucid while idling
2010-07-10 18:15:10 Jonathan Davies bug added subscriber Jonathan Davies
2010-07-14 12:21:26 vavrkok bug task added linux
2010-07-15 17:48:29 Dmitry Agafonov bug added subscriber Dmitry Agafonov
2010-07-17 09:07:42 Dirk Schaare bug added subscriber Dirk Schaare
2010-07-21 17:32:52 sterios prosiniklis bug added subscriber sterios prosiniklis
2010-07-22 06:50:28 Al Sutton bug added subscriber Al Sutton
2010-07-27 17:12:53 Alexandre Bourget bug added subscriber Alexandre Bourget
2010-07-28 16:29:55 Greg Coit pantheon: importance Medium Critical
2010-07-28 19:03:22 Greg Coit pantheon: status Triaged Confirmed
2010-07-30 00:36:30 zlj bug added subscriber Ilya Zakreuski
2010-08-03 10:28:36 Alvin bug added subscriber Alvin
2010-08-05 01:45:52 Josh Koenig pantheon: importance Critical Medium
2010-08-19 21:38:42 weswinham bug added subscriber weswinham
2010-08-20 17:44:01 John Johansen description ami-2d4aa444
Description: Ubuntu 10.04 LTS
Linux domU-XX-XX-XX-XX-XX-XX 2.6.32-305-ec2 #9-Ubuntu SMP Thu Apr 15 04:14:01 UTC 2010 i686 GNU/Linux
Description copied (and edited) from post at http://groups.google.com/group/ec2ubuntu/browse_thread/thread/4be26e81b7c597bc
Posted as a bug here as I'm not the only one experiencing these issues, see very similar post at http://groups.google.com/group/ec2ubuntu/browse_thread/thread/a7e9bc45cf923f8c
----------------------------------
I've been running a customised version of an Intrepid image by Eric Hammond for a long while now and decided it was time to upgrade so I've configured a fresh image based on the official Lucid 32-bit in us-east (ami-2d4aa444). And I'm having some strange issues.
I run on a c1.medium instance and normally expect a load average of between 0.2 and 0.6, roughly averaged throughout the day, with spikes usually no more than about 2.0. So it's fairly relaxed. When all my services are shut down the load averages go down to ~0.0.
Now I'm on to Lucid I'm getting load averages that are roughly 10 times higher than I expect it to be, hovering between around 1.8 and 2.5 and I can see no reason why it should be reported this high. There are no processes hogging CPU, just occasionally coming in and out, watching 'top' doesn't reveal anything obvious and it just looks like a major disconnect between the activity and the load averages. I can't catch any processes running uninterruptible [ ps auxw | awk '{if ($8 == "D") print }' ].
If I run my custom image without any of my services running, load averages hover between approximately 0.1 and 0.6, nothing like the ~0.0 I used to get with nothing happening; I can't see any reason for it moving but it goes up and down, apparently at random. I've tried the same thing on a fresh instance of ami-2d4aa444 and it does roughly the same thing so it doesn't seem to be anything I've done on top of the base image.
When I start my services it shoots up to the ~2.0 levels, even though they don't do much work, although they do take up a fair bit of memory. I've tried swapping to a new instance but it's the same.
The main applications run on this server are Apache, MySQL and a bunch of separate Tomcat (Java) instances. I have a number of EBS volumes mounted, a combination of ext3 and XFS.
Here's a [ top -bn1 | head -20 ] that's taken at random. 'java' and 'mysql' come in and out of the top of the list but never stay for very long.
top - 20:55:35 up 6:47, 3 users, load average: 2.33, 2.35, 2.31
Tasks: 137 total, 1 running, 134 sleeping, 2 stopped, 0 zombie
Cpu(s): 5.1%us, 0.5%sy, 0.2%ni, 93.5%id, 0.3%wa, 0.0%hi, 0.0%si, 0.5%st
Mem: 1781976k total, 1684628k used, 97348k free, 29108k buffers
Swap: 917496k total, 26628k used, 890868k free, 660448k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    1 root      20   0  2804 1476 1204 S    0  0.1   0:00.13 init
    2 root      20   0     0    0    0 S    0  0.0   0:00.00 kthreadd
    3 root      RT   0     0    0    0 S    0  0.0   0:00.01 migration/0
    4 root      20   0     0    0    0 S    0  0.0   0:00.00 ksoftirqd/0
    5 root      RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/0
    6 root      20   0     0    0    0 S    0  0.0   0:00.01 events/0
    7 root      20   0     0    0    0 S    0  0.0   0:00.00 cpuset
    8 root      20   0     0    0    0 S    0  0.0   0:00.00 khelper
    9 root      20   0     0    0    0 S    0  0.0   0:00.00 netns
   10 root      20   0     0    0    0 S    0  0.0   0:00.00 async/mgr
   11 root      20   0     0    0    0 S    0  0.0   0:00.00 xenwatch
   12 root      20   0     0    0    0 S    0  0.0   0:00.00 xenbus
   14 root      RT   0     0    0    0 S    0  0.0   0:00.03 migration/1
... looks like a system doing not much, except for those numbers at the top right.
Are these new kernels doing something different to calculate those averages now? The main thing I'd like to know is: are these numbers a true reflection of the load on my server or are they skewed or scaled somehow? I've got used to measuring the utilisation of my servers in the lower numbers, but now I have these large numbers I'm not sure what to make of it. The graphs of my load throughout the day look completely different to what they used to but the workload hasn't changed at all.
---------------------------
Having done a bit more playing my current suspicion is that this is related to the amount of memory being used by running applications. If I install mysql on a base system then the load averages go up and it's using ~140m, apparently the same thing happens if you install postgresql.
I've tested on c1.medium and m1.small, the other reporting user is having the same issues on a 64-bit machine (ami-4b4ba522).
See posts at Google groups for more information and data.
SRU Justification:
Impact: Fixes loadavg reporting on EC2.
Fix: This reverts commit 0d843425672f4d2dc99b9004409aae503ef4d39f which fixes a bug in load accounting when a tickless (no idle HZ) kernel is used. However the Xen patchset used on EC2 is not tickless but the accounting modifications are still being done, resulting in phantom load.
Testcase: Start any Ubuntu Lucid based instance on EC2, let it idle while logging the load average.
while true ; do cat /proc/loadavg >>load.log ; sleep 5 ; done
Alternately simply run top or htop and monitor the load average. Without the revert the reported load will vary from 0 up to about 0.5 for a clean image with no extra tasks launched. With the revert the load stays steady around 0 with only an occasional small bump when a background task is run.

ami-2d4aa444
Description: Ubuntu 10.04 LTS
Linux domU-XX-XX-XX-XX-XX-XX 2.6.32-305-ec2 #9-Ubuntu SMP Thu Apr 15 04:14:01 UTC 2010 i686 GNU/Linux
Description copied (and edited) from post at http://groups.google.com/group/ec2ubuntu/browse_thread/thread/4be26e81b7c597bc
Posted as a bug here as I'm not the only one experiencing these issues, see very similar post at http://groups.google.com/group/ec2ubuntu/browse_thread/thread/a7e9bc45cf923f8c
----------------------------------
I've been running a customised version of an Intrepid image by Eric Hammond for a long while now and decided it was time to upgrade so I've configured a fresh image based on the official Lucid 32-bit in us-east (ami-2d4aa444). And I'm having some strange issues.
I run on a c1.medium instance and normally expect a load average of between 0.2 and 0.6, roughly averaged throughout the day, with spikes usually no more than about 2.0. So it's fairly relaxed. When all my services are shut down the load averages go down to ~0.0.
Now I'm on to Lucid I'm getting load averages that are roughly 10 times higher than I expect it to be, hovering between around 1.8 and 2.5 and I can see no reason why it should be reported this high. There are no processes hogging CPU, just occasionally coming in and out, watching 'top' doesn't reveal anything obvious and it just looks like a major disconnect between the activity and the load averages. I can't catch any processes running uninterruptible [ ps auxw | awk '{if ($8 == "D") print }' ].
If I run my custom image without any of my services running, load averages hover between approximately 0.1 and 0.6, nothing like the ~0.0 I used to get with nothing happening; I can't see any reason for it moving but it goes up and down, apparently at random. I've tried the same thing on a fresh instance of ami-2d4aa444 and it does roughly the same thing so it doesn't seem to be anything I've done on top of the base image.
When I start my services it shoots up to the ~2.0 levels, even though they don't do much work, although they do take up a fair bit of memory. I've tried swapping to a new instance but it's the same.
The main applications run on this server are Apache, MySQL and a bunch of separate Tomcat (Java) instances. I have a number of EBS volumes mounted, a combination of ext3 and XFS.
Here's a [ top -bn1 | head -20 ] that's taken at random. 'java' and 'mysql' come in and out of the top of the list but never stay for very long.
top - 20:55:35 up 6:47, 3 users, load average: 2.33, 2.35, 2.31
Tasks: 137 total, 1 running, 134 sleeping, 2 stopped, 0 zombie
Cpu(s): 5.1%us, 0.5%sy, 0.2%ni, 93.5%id, 0.3%wa, 0.0%hi, 0.0%si, 0.5%st
Mem: 1781976k total, 1684628k used, 97348k free, 29108k buffers
Swap: 917496k total, 26628k used, 890868k free, 660448k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    1 root      20   0  2804 1476 1204 S    0  0.1   0:00.13 init
    2 root      20   0     0    0    0 S    0  0.0   0:00.00 kthreadd
    3 root      RT   0     0    0    0 S    0  0.0   0:00.01 migration/0
    4 root      20   0     0    0    0 S    0  0.0   0:00.00 ksoftirqd/0
    5 root      RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/0
    6 root      20   0     0    0    0 S    0  0.0   0:00.01 events/0
    7 root      20   0     0    0    0 S    0  0.0   0:00.00 cpuset
    8 root      20   0     0    0    0 S    0  0.0   0:00.00 khelper
    9 root      20   0     0    0    0 S    0  0.0   0:00.00 netns
   10 root      20   0     0    0    0 S    0  0.0   0:00.00 async/mgr
   11 root      20   0     0    0    0 S    0  0.0   0:00.00 xenwatch
   12 root      20   0     0    0    0 S    0  0.0   0:00.00 xenbus
   14 root      RT   0     0    0    0 S    0  0.0   0:00.03 migration/1
... looks like a system doing not much, except for those numbers at the top right.
Are these new kernels doing something different to calculate those averages now? The main thing I'd like to know is: are these numbers a true reflection of the load on my server or are they skewed or scaled somehow? I've got used to measuring the utilisation of my servers in the lower numbers, but now I have these large numbers I'm not sure what to make of it. The graphs of my load throughout the day look completely different to what they used to but the workload hasn't changed at all.
---------------------------
Having done a bit more playing my current suspicion is that this is related to the amount of memory being used by running applications. If I install mysql on a base system then the load averages go up and it's using ~140m, apparently the same thing happens if you install postgresql.
I've tested on c1.medium and m1.small, the other reporting user is having the same issues on a 64-bit machine (ami-4b4ba522).
See posts at Google groups for more information and data.
2010-08-23 16:11:00 Mark Goris bug added subscriber gorism
2010-09-02 01:05:01 Rod attachment added 20100827loadspike.png https://bugs.launchpad.net/ubuntu-on-ec2/+bug/574910/+attachment/1536390/+files/20100827loadspike.png
2010-09-02 12:25:22 Jan R bug added subscriber Jan R
2010-09-04 11:56:59 asasoft bug added subscriber asasoft
2010-09-16 21:52:45 Ivan Kanevski bug added subscriber Ivan Kanevski
2010-09-17 21:22:04 Mark Aiken bug added subscriber Mark Aiken
2010-09-18 11:29:49 zlj removed subscriber Ilya Zakreuski
2010-09-19 11:19:02 Maciej Pasternacki bug added subscriber Maciej Pasternacki
2010-09-20 15:31:06 Scott Moser linux-meta (Ubuntu): status New Invalid
2010-09-20 15:54:41 Joseph Salisbury bug added subscriber Joseph Salisbury
2010-10-05 14:31:47 Launchpad Janitor branch linked lp:ubuntu/lucid-proposed/linux-ec2
2010-10-12 09:17:09 Alvin bug added subscriber LapieTopie
2010-10-14 15:48:44 Mike Conigliaro bug added subscriber Michael Conigliaro
2010-10-16 09:01:22 Kostas Chatzikokolakis bug added subscriber Kostas Chatzikokolakis
2010-10-19 11:23:48 Murali Krishnan bug added subscriber Murali Krishnan
2010-10-28 09:38:07 Mikael Gueck bug added subscriber Mikael Gueck
2010-11-03 14:54:18 Scott Moser linux-ec2 (Ubuntu): status In Progress Fix Released
2010-12-01 14:06:16 Till Klampaeckel bug added subscriber Till Klampaeckel
2011-04-10 17:39:30 DLHDavidLH ubuntu-on-ec2: status Invalid New
2011-04-28 00:21:24 Martin removed subscriber Martin
2011-04-28 01:02:40 Alex Howells removed subscriber Alex Howells
2011-04-28 05:37:29 Mikael Gueck removed subscriber Mikael Gueck
2011-04-28 05:50:55 Al Sutton removed subscriber Al Sutton
2011-05-13 21:11:32 Chetan Sarva bug added subscriber Chetan Sarva
2011-06-04 14:54:55 Steffen Rusitschka removed subscriber Steffen Rusitschka
2011-08-24 17:25:46 Scott Moser ubuntu-on-ec2: status New Invalid
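
The test case in the SRU justification (2010-08-20 description change) logs /proc/loadavg every 5 seconds in an endless loop. A minimal sketch of a bounded variant, plus a summary of the logged 1-minute load, might look like the following. The function names log_loadavg and summarize_loadavg, the sample count, and the output format are assumptions made for illustration; they are not part of the bug report or of any Ubuntu tooling.

```shell
#!/bin/sh
# Sketch only: log_loadavg and summarize_loadavg are hypothetical helpers,
# not taken from the bug report.

# Append /proc/loadavg to a log file a fixed number of times.
#   $1 = log file, $2 = number of samples, $3 = seconds between samples
log_loadavg() {
    i=0
    while [ "$i" -lt "$2" ]; do
        cat /proc/loadavg >>"$1"
        i=$((i + 1))
        # Sleep between samples, but not after the last one.
        if [ "$i" -lt "$2" ]; then
            sleep "$3"
        fi
    done
}

# Summarise the 1-minute load average (first field of each logged line).
#   $1 = log file
summarize_loadavg() {
    awk '
        { v = $1 + 0 }                      # force a numeric comparison
        NR == 1 { min = v; max = v }
        { sum += v; if (v < min) min = v; if (v > max) max = v }
        END {
            if (NR > 0)
                printf "samples=%d min=%.2f max=%.2f mean=%.2f\n", NR, min, max, sum / NR
        }
    ' "$1"
}
```

For example, `log_loadavg load.log 120 5` followed by `summarize_loadavg load.log` would cover ten minutes of idling. Going by the description above, an affected kernel would be expected to show a maximum around 0.5 on a clean idle image, and a fixed kernel a value close to 0.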