Bug #1676565 “apparmor_parser profile replace speed issues in 4....” : Bugs : AppArmor

Revision history for this message

Jason Short (shortj) wrote on 2017-03-27:

#1

apparmor_parser strace Edit (163.6 KiB, text/plain)

Tyler Hicks (tyhicks) on 2017-03-27

tags:

added: aa-kernel

Revision history for this message

Jamie Strandboge (jdstrand) wrote on 2017-03-27:

#2

Curious if https://launchpad.net/ubuntu/+source/linux/4.4.0-67.88 or https://launchpad.net/ubuntu/+source/linux/4.4.0-69.90 resolves the issue for you? IIRC, I was seeing some slowness that looked like hangs in the 4.10 kernel until the patches in 67 and 69 were in there. Note: https://launchpad.net/ubuntu/+source/linux/4.4.0-70.91 reverted these patches for unrelated reasons.

Revision history for this message

Jamie Strandboge (jdstrand) wrote on 2017-03-28:

#3

This may be a duplicate of https://bugs.launchpad.net/ubuntu/+source/apparmor/+bug/1645037. If the 67 kernel fixes this for you, I think we can mark this a dupe of 1645037.

Revision history for this message

John Johansen (jjohansen) wrote on 2017-04-03:

#4

I would guess the 67 kernel should fix this for them. Those patches are being replied but I am currently unsure if they will land in the 71 kernel, as they have not been committed to it yet.

Revision history for this message

Jason Short (shortj) wrote on 2017-04-18:

#5

We've been testing on 67 as well as 70, 71, 72, and have only seen one instance of this issue on 67 so far. The last kernel to work without issue was 4.4.0-57.

I apologize for the lack of actual evidence here, but there's really not much going on when the issue occurs - a single apparmor_parser thread might be running, or our code might have stacked up 10 or more of them trying to reload profiles. There's not even a consistent failure mode for aa-status - it will sometimes continue to return, will sometimes pause for a seconds to minutes before returning, and sometimes immediately enter a disk-sleep state and never respond.

Any suggestions for further data collection?

Revision history for this message

Jason Short (shortj) wrote on 2017-04-25:

#6

Update: 4.4.0-57 is also affected.

I inspected a latent profile replace operation on a 4.4.0-72 host and watched it build a 21 megabyte /etc/apparmor.d/cache/ file over a period of 24 minutes. While I'm prepared to admit that our profiles are a bit bulky, that's still ludicrously slow.

Revision history for this message

Jason Short (shortj) wrote on 2017-05-17:

#7

We've now seen this issue affect both 4.8 and 4.10 series as well.

My working theory at the moment is that changes to the kernel code from 3.13 -> 4.x introduced a memory leak that's eventually overflowing.

Based on the failures we're seeing in the field, there seems to be no reliable predictor of when sysfs goes unresponsive, but the rate of occurrence is higher on machines with higher numbers of apache2 hats.

I set up a test machine with 15 hats that compiled to an ~820K cache file:

File: '/etc/apparmor.d/cache/usr.sbin.apache2'
Size: 819297 Blocks: 1608 IO Block: 4096 regular file

and ran apparmor_parser -rB in a loop. so far, after a dozen passes at this, the magic number at which the kernel becomes unstable is around 3 gigabytes.

Running the same test against 3.13 now, will report results

Revision history for this message

Jason Short (shortj) wrote on 2017-05-17:

#8

As of this writing, I've been able to load a 341Kb cache file into a 3.13.0-116 kernel over 80,000 times, for a grand total of 27 gigabytes without issue.

FWIW, Both the 4.8.0-52 and 3.13.0-116 boxes are 4 core 4 gig KVM virtual private servers.

Revision history for this message

John Johansen (jjohansen) wrote on 2017-05-18:

#9

That is worse than ludicrously slow. Something is definitely broken

Revision history for this message

John Johansen (jjohansen) wrote on 2017-05-18:

#10

@Jason,

can you tar up your profiles directory and send it to me, so I can test with your exact profile set?

Revision history for this message

John Johansen (jjohansen) wrote on 2017-05-18:

#11

attaching to this bug would work as well, if you are comfortable doing that

Revision history for this message

Jason Short (shortj) wrote on 2017-05-18:

#12

apparmor configs Edit (35.7 KiB, application/x-tar)

Download full text (11.8 KiB)

I'm attaching a tarball of an example apparmor directory, sanitized a bit for environment-specific things.

since the last update, I've had time to test on additional server configurations, and it does seem that the cache size required to overflow the kernel is a function of total memory - the cache file generated by this usr.sbin.apache2 profile will crash a 4 gigabyte box after ~4000 loops of `apparmor_parser -rB`, but not a larger machine.

Unfortunately, that also means I don't yet have reliable steps to reproduce the slowdown condition, as the out of memory condition happened first on the smaller test environment. I am continuing to test on an 8-core 30-gigabyte machine with additional hats.

Here's a diff of meminfo after several hundred passes vs several thousand, you can see the kernel eating memory for slabs, but it seems like we outpace slab reclamation with new allocation:

MemTotal: 4045864 kB MemTotal: 4045864 kB 3683 mysql 20 0 1281M 171M 20008 S 0.0 4.3 0:06.51 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin
MemFree: 2104988 kB | MemFree: 407040 kB 3694 mysql 20 0 1281M 171M 20008 S 0.0 4.3 0:00.09 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin
MemAvailable: 2963900 kB | MemAvailable: 1270908 kB 3792 mysql 20 0 1281M 171M 20008 S 0.0 4.3 0:00.44 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin
Buffers: 118524 kB | Buffers: 118848 kB 3795 mysql 20 0 1281M 171M 20008 S 0.0 4.3 0:00.00 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin
Cached: 915136 kB | Cached: 924612 kB 3796 mysql 20 0 1281M 171M 20008 S 0.0 4.3 0:00.00 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin
SwapCached: 0 kB SwapCached: 0 kB 3797 mysql 20 0 1281M 171M 20008 S 0.0 4.3 0:00.00 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin
Active: 815716 kB | Active: 816332 kB 3798 mysql 20 0 1281M 171M 20008 S 0.0 4.3 0:00.00 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin
Inactive: 593944 kB | Inactive: 603488 kB ...

I'm attaching a tarball of an example apparmor directory, sanitized a bit for environment-specific things.

since the last update, I've had time to test on additional server configurations, and it does seem that the cache size required to overflow the kernel is a function of total memory - the cache file generated by this usr.sbin.apache2 profile will crash a 4 gigabyte box after ~4000 loops of `apparmor_parser -rB`, but not a larger machine.

Unfortunately, that also means I don't yet have reliable steps to reproduce the slowdown condition, as the out of memory condition happened first on the smaller test environment.  I am continuing to test on an 8-core 30-gigabyte machine with additional hats.

Here's a diff of meminfo after several hundred passes vs several thousand, you can see the kernel eating memory for slabs, but it seems like we outpace slab reclamation with new allocation:

MemTotal:        4045864 kB                                     MemTotal:        4045864 kB                                                                  3683 mysql      20   0 1281M  171M 20008 S  0.0  4.3  0:06.51 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin
MemFree:         2104988 kB                                   | MemFree:          407040 kB                                                                  3694 mysql      20   0 1281M  171M 20008 S  0.0  4.3  0:00.09 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin
MemAvailable:    2963900 kB                                   | MemAvailable:    1270908 kB                                                                  3792 mysql      20   0 1281M  171M 20008 S  0.0  4.3  0:00.44 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin
Buffers:          118524 kB                                   | Buffers:          118848 kB                                                                  3795 mysql      20   0 1281M  171M 20008 S  0.0  4.3  0:00.00 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin
Cached:           915136 kB                                   | Cached:           924612 kB                                                                  3796 mysql      20   0 1281M  171M 20008 S  0.0  4.3  0:00.00 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin
SwapCached:            0 kB                                     SwapCached:            0 kB                                                                  3797 mysql      20   0 1281M  171M 20008 S  0.0  4.3  0:00.00 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin
Active:           815716 kB                                   | Active:           816332 kB                                                                  3798 mysql      20   0 1281M  171M 20008 S  0.0  4.3  0:00.00 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin
Inactive:         593944 kB                                   | Inactive:         603488 kB                                                                  3799 mysql      20   0 1281M  171M 20008 S  0.0  4.3  0:00.01 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin
Active(anon):     379872 kB                                   | Active(anon):     380040 kB                                                                  3800 mysql      20   0 1281M  171M 20008 S  0.0  4.3  0:00.04 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin
Inactive(anon):    44580 kB                                   | Inactive(anon):    49832 kB                                                                 F1Help  F2Setup F3SearchF4FilterF5Tree  F6SortByF7Nice -F8Nice +F9Kill  F10Quit
Active(file):     435844 kB                                   | Active(file):     436292 kB                                                                    1 bash
Inactive(file):   549364 kB                                   | Inactive(file):   553656 kB                                                                  Active / Total Objects (% used)    : 850270 / 888877 (95.7%)
Unevictable:        3656 kB                                     Unevictable:        3656 kB                                                                  Active / Total Slabs (% used)      : 31518 / 31518 (100.0%)
Mlocked:            3656 kB                                     Mlocked:            3656 kB                                                                  Active / Total Caches (% used)     : 79 / 121 (65.3%)
SwapTotal:       2097148 kB                                     SwapTotal:       2097148 kB                                                                  Active / Total Size (% used)       : 349403.16K / 373125.54K (93.6%)
SwapFree:        2097148 kB                                     SwapFree:        2097148 kB                                                                  Minimum / Average / Maximum Object : 0.01K / 0.42K / 8.00K
Dirty:               412 kB                                   | Dirty:               224 kB
Writeback:             0 kB                                     Writeback:             0 kB                                                                   OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
AnonPages:        357152 kB                                   | AnonPages:        357448 kB                                                                 137970 127417  92%    0.19K   6570       21     26280K dentry
Mapped:           226764 kB                                   | Mapped:           227348 kB                                                                 136071 136071 100%    0.10K   3489       39     13956K buffer_head
Shmem:             46040 kB                                   | Shmem:             51100 kB                                                                  76534  76534 100%    0.12K   2251       34      9004K kernfs_node_cache
Slab:             234096 kB                                   | Slab:             376100 kB                                                                  67770  50089  73%    1.05K   2259       30     72288K ext4_inode_cache
SReclaimable:     146524 kB                                   | SReclaimable:     146960 kB                                                                  53760  53380  99%    0.06K    840       64      3360K anon_vma_chain
SUnreclaim:        87572 kB                                   | SUnreclaim:       229140 kB                                                                  50898  50655  99%    0.04K    499      102      1996K ext4_extent_status
KernelStack:       11056 kB                                     KernelStack:       11056 kB                                                                  39232  39006  99%    1.00K   2452       16     39232K kmalloc-1024
PageTables:        11740 kB                                   | PageTables:        11672 kB                                                                  32960  28248  85%    0.06K    515       64      2060K kmalloc-64
NFS_Unstable:          0 kB                                     NFS_Unstable:          0 kB                                                                  31872  31872 100%    0.03K    249      128       996K kmalloc-32
Bounce:                0 kB                                     Bounce:                0 kB                                                                  31314  31314 100%    0.08K    614       51      2456K anon_vma
WritebackTmp:          0 kB                                     WritebackTmp:          0 kB                                                                  27500  27170  98%    0.20K   1375       20      5500K vm_area_struct
CommitLimit:     4120080 kB                                     CommitLimit:     4120080 kB                                                                  22015  22015 100%    0.05K    259       85      1036K ftrace_event_field
Committed_AS:    2326844 kB                                   | Committed_AS:    2336444 kB                                                                  19992  19472  97%    0.09K    476       42      1904K kmalloc-96
VmallocTotal:   34359738367 kB                                  VmallocTotal:   34359738367 kB                                                               19521  19302  98%    0.58K    723       27     11568K inode_cache
VmallocUsed:           0 kB                                     VmallocUsed:           0 kB                                                                  15400  14241  92%    0.57K    550       28      8800K radix_tree_node
VmallocChunk:          0 kB                                     VmallocChunk:          0 kB                                                                  14700  13977  95%    0.63K    588       25      9408K proc_inode_cache
HardwareCorrupted:     0 kB                                     HardwareCorrupted:     0 kB                                                                  14400  14348  99%    0.50K    900       16      7200K kmalloc-512
AnonHugePages:    221184 kB                                     AnonHugePages:    221184 kB                                                                  12648  12595  99%    8.00K   3162        4    101184K kmalloc-8192
ShmemHugePages:        0 kB                                     ShmemHugePages:        0 kB                                                                  12117  12057  99%    0.19K    577       21      2308K cred_jar
ShmemPmdMapped:        0 kB                                     ShmemPmdMapped:        0 kB                                                                   7344   6360  86%    0.25K    459       16      1836K kmalloc-256
CmaTotal:              0 kB                                     CmaTotal:              0 kB                                                                   5376   5376 100%    0.02K     21      256        84K kmalloc-16
CmaFree:               0 kB                                     CmaFree:               0 kB                                                                   5313   5313 100%    0.19K    253       21      1012K kmalloc-192
HugePages_Total:       0                                        HugePages_Total:       0                                                                      4608   4608 100%    0.01K      9      512        36K kmalloc-8
HugePages_Free:        0                                        HugePages_Free:        0                                                                      4456   4433  99%    4.00K    557        8     17824K kmalloc-4096
HugePages_Rsvd:        0                                        HugePages_Rsvd:        0                                                                      4448   4386  98%    0.12K    139       32       556K pid
HugePages_Surp:        0                                        HugePages_Surp:        0                                                                      3300   3261  98%    0.62K    132       25      2112K sock_inode_cache
Hugepagesize:       2048 kB                                     Hugepagesize:       2048 kB                                                                   3216   3201  99%    2.00K    201       16      6432K kmalloc-2048
DirectMap4k:       67572 kB                                     DirectMap4k:       67572 kB                                                                   3150   3004  95%    1.06K    105       30      3360K signal_cache
DirectMap2M:     4126720 kB                                     DirectMap2M:     4126720 kB                                                                   2912   2808  96%    0.12K     91       32       364K kmalloc-128
DirectMap1G:     2097152 kB                                     DirectMap1G:     2097152 kB

Revision history for this message

John Johansen (jjohansen) wrote on 2017-05-18:

#13

thanks,

there is definitely a memleak refcount issue going on. The rawdata info is not getting freed which exacerbates the problem, and I think why you see this in 4.4. A single profile leak can result in a much larger data set being leaked.

I am chasing it, hopefully I will have a test kernel soon.

Revision history for this message

Jason Short (shortj) wrote on 2017-05-19:

#14

FWIW, the out of memory condition was actually tested on a 4.8.0-52 kernel from linux-virtual-hwe-16.04

Revision history for this message

John Johansen (jjohansen) wrote on 2017-05-19:

#15

ooops, yeah sorry. It doesn't make much of a difference here as the same version of apparmor was SRUed back to 4.4, but I should still get the versioning right in the bug.

Revision history for this message

Jason Short (shortj) wrote on 2018-02-23:

#16

any update on this? i can still crash any 4.x kernel by reloading app armor for half an hour. 3.13 remains unaffected.

Revision history for this message

John Johansen (jjohansen) wrote on 2018-02-24:

#17

Sadly some leaks remains in all 4.x kernels. I have fixed several reference count bugs and we still have a leak happening. I am actively working on this again so hopefully we can finially get this closed.

Revision history for this message

John Johansen (jjohansen) wrote on 2018-06-29:

#18

There have been several more bug fixes, since I commented on this last. I am going to work on updating the 4.4 kernel, and build a test kernel.

Revision history for this message

John Johansen (jjohansen) wrote on 2018-07-06:

#19

Note: this is not related to Bug 1750594 which was due to the profile dedup features that were added in the 4.13 kernels.

AppArmor

apparmor_parser profile replace speed issues in 4.4

Bug Description

Other bug subscribers

Bug attachments

Remote bug watches