RPM

rpm: selinux context initialization memory leak

Bug #651428 reported by Jeff Johnson
This bug affects 1 person
Affects   Status         Importance   Assigned to    Milestone
RPM       Confirmed      High         Jeff Johnson
Fedora    Fix Released   Medium

Bug Description

tracker

Tags: selinux
daryl (daryl-redhat-bugs) wrote:

Description of problem:
Running rhn_check while a number of scheduled actions are pending on RHN causes rhn_check to loop over each scheduled item. Each iteration leaks memory, and it takes very little time to reach the OOM killer.

Version-Release number of selected component (if applicable):
rhn-check 0.4.20-9.el5

How reproducible:
Always

Steps to Reproduce:
1. schedule some actions on RHN
2. run rhn_check
3. watch memory grow
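To put numbers on step 3, the resident set size of the rhn_check (or yum) process can be sampled between scheduled actions. This is a minimal sketch that assumes a Linux /proc filesystem. The helper names parse_vmrss_kb, read_rss_kb and growth_kb are hypothetical and are not part of rhn-client-tools.

```python
def parse_vmrss_kb(status_text):
    """Return VmRSS (resident set size, in kB) from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Typical line: "VmRSS:\t  123456 kB"
            return int(line.split()[1])
    raise ValueError("no VmRSS line found")


def read_rss_kb(pid="self"):
    """Read the current RSS of a process (Linux only; assumes /proc is mounted)."""
    with open("/proc/%s/status" % pid) as f:
        return parse_vmrss_kb(f.read())


def growth_kb(samples):
    """Per-iteration growth between consecutive RSS samples, in kB."""
    return [after - before for before, after in zip(samples, samples[1:])]
```

Call read_rss_kb(pid) once per processed erratum and pass the collected values to growth_kb(). A leak shows up as deltas that stay positive, whereas normal cache warm-up tends to level off.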

Additional info:

I didn't see an rhn-check component in Bugzilla.

thanks

daryl (daryl-redhat-bugs) wrote:

RHEL5.5 beta fails in the same manner.

Processing 44 of 71 scheduled errata was enough to hit the OOM killer on a machine with 650 MB of RAM and 1.2 GB of swap.

I also opened SR# 2000154 for this.

thank you.

daryl (daryl-redhat-bugs) wrote:

Hi,

This leak only appears to happen when errata are scheduled to be applied. Just scheduling packages for install does not trigger the leak.

thank you!

Milan (milan-redhat-bugs) wrote:

The memory consumption grows in /usr/share/rhn/actions/packages.py,
in the _run_yum_action() routine, when it calls yum_base.buildTransaction()
and yum_base.doTransaction().

Milan (milan-redhat-bugs) wrote:

This problem is also present in RHEL 5.3 and was not introduced
in RHEL 5.4 as some of the above comments (or currently attached
ITs) suggest.

Milan (milan-redhat-bugs) wrote:

This is essentially the same problem as the one described in bug #470838.

Milan (milan-redhat-bugs) wrote:

Greetings James, I'd very much appreciate your advice or hint on this
bug report.

Here's a link to yum-rhn-plugin code, that's used by rhn_check
when applying errata to a system:

http://git.fedorahosted.org/git/?p=spacewalk.git;a=blob_plain;f=client/rhel/yum-rhn-plugin/actions/packages.py;hb=HEAD

For every scheduled erratum, YumAction.doTransaction() is called at some point,
which calls YumBase's runTransaction() at the very end.

The memory consumption grows significantly when self.ts.run() is called
inside runTransaction().

The reason I'm asking you for advice is that I'm not sure whether we're
looking at some rpm-python bug or whether the way we're using yum libraries
is plain broken.

Thank you.

Milan (milan-redhat-bugs) wrote:

(In reply to comment #11)
> I think a lot of the RHN code that uses yum APIs is "non-optimal" at
> least, but then it's pretty old.
>
> So I'm not sure which bits you want me to look at in particular.
>
> I don't understand the old code in comment #4, p[0] should traceback with
> KeyError ... no? Looking at getInstalledPackageList closer, this is
> duplicating a bunch of objects in rpmdb, although it is throwing
> the headers away.

Comment #4 is a bit misleading. It shows some changes made in
rhn-client-tools code between RHEL-5.4 and RHEL-5.5, though
I don't believe those changes cause the discussed problem (the
big memory consumption was also present before RHEL-5.4).

> The doTransaction() in that file doesn't look like it is doing much that
> the yum side wouldn't do. In general I'd expect memory usage to grow in
> runTransaction() because the depsolver runs then, and (although I'm not sure)
> you might be hitting a bunch of caching stuff in yum that doesn't get hit
> before that in your call paths. It's really hard to say if this is "bad"
> or not.
>
> Just looking in that file:
>
> getInstalledPkgObject is slow, I guess you should be calling
> rpmdb.searchNevra(). Certainly never parsePackages.
>
> I'm unsure how runTransaction() can work, it's altering tuples ...
> which should give:
>
> TypeError: 'tuple' object does not support item assignment
>
> ...and add_transaction_data() doesn't do any checking. But neither
> of the last two should cause memory leaks.
>
> What do you do after the transaction runs ... do you del the YumBase
> object (does it all go away, if you do)? We've had a couple of circular
> reference bugs in YumBase, over time.
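The remark that runTransaction() alters tuples rests on basic Python semantics. Tuples are immutable, so item assignment raises TypeError. The following is a minimal illustration. The txn tuple and both helper names are hypothetical and do not reproduce the actual packages.py code.

```python
# Hypothetical transaction record stored as a tuple (not the real packages.py data).
txn = ("install", "package1")


def try_item_assignment(t):
    """Try to assign into t in place; return the error message, or None on success."""
    try:
        t[1] = "package2"
    except TypeError as exc:
        return str(exc)
    return None


def replace_item(t, index, value):
    """Tuples are immutable, so build and return a new tuple instead."""
    return t[:index] + (value,) + t[index + 1:]
```

Suppose the real code mutates something it received as a tuple. Converting it with list(...) first, or rebuilding it as replace_item() does, avoids the TypeError. As the reviewer notes, neither case would explain a leak.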

There's only one yum_base object (an instance of the YumAction(YumBase) class),
defined at the packages.py module level, and it is never deleted.

Nonetheless, the memory leak (or memory consumption) problem can
be reproduced without involving any RHN code whatsoever.

Install RHEL-5.5 (latest and greatest), set up a yum repo (for example
EPEL-5; no registration to RHN is required) and start the yum shell.

In the yum shell, install a couple of packages, one transaction per
package:

> install package1
> ts run
...
> install package2
> ts run
...
> etc ...

Never leave yum shell!

Watch the memory of the yum process grow every time you execute
the transaction. Sooner or later (depending on how much memory your
system has), the OOM killer steps in and kills yum.

James (james-redhat-bugs) wrote:

Ahh, cool, thanks ... I should be able to fix that, although $DEITY knows when it'll get into RHEL :).

I'll reassign to me for now.

James (james-redhat-bugs) wrote:

This is interesting: if I do a loop of "remove blah; install blah;" then on RHEL-5 I lose about 13 MB per operation (26 MB per pass of the loop).
On F-13 I lose maybe a couple of hundred kB.

Cc'ing David Malcolm.

David, I remember you mentioning a leak you'd found out about at PyCon ... could this be it?

FYI to the RHN guys: RHEL-5 doesn't leak if I do the "normal" YumBase() create/del test ... how hard would it be to create a new YumBase() for each install set?
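The suggestion of a fresh YumBase() per install set can be sketched generically. This sketch rests on two assumptions. First, make_base stands in for constructing yum.YumBase (or a subclass such as YumAction). Second, its run(pkgs) method is a hypothetical placeholder for the real install and transaction calls. Neither name is actual yum API.

```python
import gc


def run_install_sets(make_base, install_sets):
    """Run each install set with its own, freshly created base object.

    make_base: hypothetical zero-argument factory (a stand-in for YumBase()).
    Each object is expected to expose a hypothetical run(pkgs) method.
    """
    results = []
    for pkgs in install_sets:
        base = make_base()
        try:
            results.append(base.run(pkgs))
        finally:
            # Drop our only reference, then collect any reference cycles
            # that could otherwise keep the old object (and its caches) alive.
            del base
            gc.collect()
    return results
```

This only helps if the memory is held by the base object itself. Later in the thread, the growth persists even with a new YumBase() per transaction, because the leak turns out to be inside rpm's ts.run().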

James (james-redhat-bugs) wrote:

The python 2.4 bug is:

https://bugzilla.redhat.com/show_bug.cgi?id=569093

...and I'd hope that wouldn't be what is hitting us here, but I can't be sure (David ... I don't suppose you have a test python I can use?).

Dave (dave-redhat-bugs) wrote:

(In reply to comment #15)
> The python 2.4 bug is:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=569093
>
> ...and I'd hope that wouldn't be what is hitting us here, but I can't be sure
> (David ... I don't suppose you have a test python I can use?).
See https://bugzilla.redhat.com/show_bug.cgi?id=569093#c4

James (james-redhat-bugs) wrote:

I already tried that ... but it seems to have timed out or something. At least I can't see any rpms to download from the build. I was hoping you might have saved them somewhere.

James (james-redhat-bugs) wrote:

Ok, just checked a rebuild and that didn't fix it.

blackforest (blackforest-redhat-bugs) wrote:

I keep hitting this memory leak myself. Does anyone know a workaround?

blackforest (blackforest-redhat-bugs) wrote:

Even after updating rhn-check to rhn-check-0.4.20-33.el5_5.1,
the memory growth and eventual crash still occur.

James (james-redhat-bugs) wrote:

Ok, so after many hours of debugging the problem appears to be this line in runTransaction():

        errors = self.ts.run(cb.callback, '')

...my understanding is that this is all rpm. And this happens even if I start a new YumBase() for each transaction.
So Panu, any known leaks in ts.run?

Panu (panu-redhat-bugs) wrote:

I don't recall any known memory leaks in rpmtsRun() of 4.4.x, but that doesn't mean there aren't any... however, such leaks would've been there forever. Any idea when this problem started occurring? Comment #8 says it was already present in RHEL 5.3; what about older releases?

What I do remember, though, is a severe memory fragmentation issue when calling ts.run() several times (especially bad from Python, for whatever reason), see bug 472507. The first ts.run() call runs in "reasonable" memory, the second one already blows through the roof in some circumstances, and the more ts.run() calls you make, the worse it probably gets. The fragmentation issue was addressed in RHEL 5.4 by using a more reasonable reallocation scheme for the problematic case, but addressed != entirely fixed.

If somebody can reproduce this under valgrind (run those single-item transactions until memory starts ballooning, then exit before the OOM killer strikes), that'd make it easier to see whether it's actually leaking or whether it's something else.

Milan (milan-redhat-bugs) wrote:

Created attachment 409781
valgrind --tool=memcheck yum shell

Panu (panu-redhat-bugs) wrote:

Thanks, Milan. Does the problem go away if you boot with SELinux fully disabled, i.e. append 'selinux=0' to the kernel command line in grub? (Note that this will mess up SELinux context labeling; don't try it on production boxes.)

Milan (milan-redhat-bugs) wrote:

Interestingly, the problem does go away with selinux fully disabled.

The memory grows a little during the transaction execution, though
it drops back when the transaction finishes (which it did not with SELinux on).

You can run more transactions from inside yum shell, the memory
always drops back to the state before.

Panu (panu-redhat-bugs) wrote:

Good, thanks for confirming. Easy fix then.

This selinux context initialization leak is about as old as SELinux "support" in rpm: it calls matchpathcon_init() at beginning of every transaction but never calls matchpathcon_fini() which would free up the memory. In normal rpm/yum usage patterns this doesn't make much of a difference but with a big number of transactions within a process lifetime it starts adding up.
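The leak pattern described here, and the obvious remedy of pairing each init with a fini, can be modelled in a few lines of Python. This is a toy model, not rpm's C code. FakeLabelLib, init() and fini() are hypothetical stand-ins for libselinux's matchpathcon_init() and matchpathcon_fini(). The actual fix shipped in RHBA-2011-0124 may differ in detail.

```python
from contextlib import contextmanager


class FakeLabelLib(object):
    """Toy model of a library that keeps its state internally (no handle)."""

    def __init__(self):
        self.tables = 0  # number of file-context tables currently held

    def init(self):
        self.tables += 1  # each call loads another table

    def fini(self):
        self.tables = 0   # releases everything the library holds


def run_transaction_leaky(lib):
    lib.init()  # init at the start of every transaction, never fini


@contextmanager
def file_contexts(lib):
    """Pair every init() with a fini(), even if the transaction fails."""
    lib.init()
    try:
        yield lib
    finally:
        lib.fini()


def run_transaction_fixed(lib):
    with file_contexts(lib):
        pass  # package installs/erasures (and file labeling) would happen here
```

Calling run_transaction_leaky() N times leaves N tables allocated, which mirrors the growth seen in a long-lived yum shell or rhn_check process. The paired version returns to zero after every transaction, much like the selinux=0 runs reported above.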

(aside: it's also a somewhat dumb behavior from libselinux: matchpathcon_init() doesn't return a handle for the caller to free but does the bookkeeping internally, so it could just as well handle repeated matchpathcon_init() calls intelligently, but doesn't)
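The aside argues that a library doing its own bookkeeping could make repeated init calls harmless. Here is a hypothetical sketch of that idempotent bookkeeping. Per the comment, libselinux at the time did not behave this way. The class, its methods and the path strings are invented for illustration.

```python
class IdempotentLabelLib(object):
    """Toy model: repeated init() calls never stack up extra tables."""

    def __init__(self):
        self._path = None
        self._table = None
        self.loads = 0  # how many times a table was actually (re)built

    def init(self, path):
        if self._table is not None and path == self._path:
            return  # already initialized for this path: reuse the table
        self.fini()  # drop a table loaded for a different path
        self._table = {"source": path}  # stand-in for the parsed contexts
        self._path = path
        self.loads += 1

    def fini(self):
        self._table = None
        self._path = None

    def live_tables(self):
        return 0 if self._table is None else 1
```

With this design, a caller that forgets fini() between transactions still holds at most one table.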

daryl (daryl-redhat-bugs) wrote:

Hello Red Hat,

Please consider this for an async errata release. Waiting until RHEL 5.6 will only break more machines, since the fix won't be in place in time for when the bug hits. Yes, I asked on my GSS support ticket as well.

Regardless, thanks for fixing this issue.

blackforest (blackforest-redhat-bugs) wrote:

Disabling SELinux is not a fix. It's a work around. We need an official fix for this bug.

Panu (panu-redhat-bugs) wrote:

I didn't suggest disabling SELinux as a fix or a workaround, but as a way to confirm that the leak was indeed related to SELinux handling within rpm.

Milan (milan-redhat-bugs) wrote:

*** Bug 470838 has been marked as a duplicate of this bug. ***

daryl (daryl-redhat-bugs) wrote:

Red Hat, any comments on getting this out as an async errata? Again, waiting until RHEL 5.6 defeats the purpose of fixing this bug.

daryl

blackforest (blackforest-redhat-bugs) wrote:

Is there any progress in addressing this bug? It is creating skepticism at my shop towards Red Hat, as upper management comments on how derided MS is when they are slow in releasing bug fixes...and now Red Hat is following suit...?
I still got "Faith of Heart."
Red Hat..."Don't Let me Down!"

Jeff Johnson (n3npq)
tags: added: selinux
Jeff Johnson (n3npq) wrote:

There's a segfault on exit on CentOS 5 with SELinux enabled,
caused by a double free.

Changed in rpm:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → Jeff Johnson (n3npq)
milestone: none → 5.3.6
errata-xmlrpc (errata-xmlrpc-redhat-bugs) wrote:

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0124.html

Milan (milan-redhat-bugs) wrote:

*** Bug 651501 has been marked as a duplicate of this bug. ***

Changed in fedora:
importance: Unknown → Medium
status: Unknown → Fix Released