AppArmor policy compile improvements

Bug #1350598 reported by Alan Pope 🍺🐧🐱 🦄
24
This bug affects 4 people
Affects Status Importance Assigned to Milestone
AppArmor
Triaged
Low
Unassigned
Canonical System Image
Confirmed
Low
Unassigned
apparmor (Ubuntu)
Triaged
High
Unassigned
click-apparmor (Ubuntu)
Triaged
Low
Unassigned

Bug Description

apparmor_parser can take a long time to compile policy especially when there is a lot of policy, so we want to utilize compiled cache profile as much as possible. Cache files will have to be regenerated in the following cases:
 * the kernel .features file is updated (eg, new features are added to
   apparmor in the new kernel)
 * apparmor itself is updated
 * on devices with click packages, apparmor-easyprof-ubuntu and/or
   click-apparmor is updated

As of 2014-10-02, what can be expected is:

- Systems with system-image updates (eg, Ubuntu Touch):
  - First boot will use the precompiled cache files in the rootfs or custom
    tarball and be fast
  - Reboots will use the cache files on the device and be fast
  - First boots after upgrades will use the cache files on the device if the
    above conditions are not met and be fast
  - Production devices will not meet any of those conditions except under
    exceptional and rare circumstances (eg, major OS upgrades like 14.10 to
    15.04) and be fast
  - First boots after upgrades that meet one of the conditions will need to
    regenerate the cache. This can happen on development releases where the
    kernel features file, apparmor, apparmor-easyprof-ubuntu or
    click-apparmor are still under development and getting updates
- Systems with apt updates (eg, current Ubuntu Desktop and Server):
  - First boot will compile cache files
  - Reboots will use the cache files on the machine and be fast
  - First boots after upgrades will use the cache files on the machine if the
    above conditions are not met and be fast
  - Stable releases of Ubuntu will not meet any of those conditions except
    under exceptional and rare circumstances (eg, major OS upgrades like
    14.10 to 15.04) and be fast
  - First boots after upgrades that meet one of the above conditions will
    need to regenerate the cache. This can happen on development releases
    where the kernel features file, apparmor, apparmor-easyprof-ubuntu or
    click-apparmor are still under development and getting updates

In addition to the above, updates to only apparmor-easyprof-ubuntu will regenerate the cache files for only the policy that is affected (eg, if there is a change to the location policy group in policy version 1.2, only apps using this policy version and this policy group will need to be recompiled).

Planned improvements (in order of most likely to be done first):
1. Finetuning the checks to invalidate the cache (eg, .md5sums could only
   be for /etc/apparmor.d/abstractions, ...): WONTFIX (will want an
   md5sum on apparmor_parser since it could change the cache and the md5sum
   will always change. Furthermore, apparmor-easyprof-ubuntu is all policy
   so there is no gain there. click-apparmor could possibly benefit, but
   it doesn't change often and when it does, it is typically for policy)
2. Investigate ways to utilize the custom tarball and rootfs precompiled
   cache files on upgrades when apparmor, apparmor-easyprof-ubuntu and
   click-apparmor are updated: DONE
3. Improve cache handling for app store apps (eg, having the app store
   server precompile them so that the device can download them when it
   needs to rather than having to regenerate them itself): WONTFIX
   (doesn't scale)
4. For systems with apt upgrades, compile the policy either during
   install or on kernel upgrade rather than on boot. For systems with
   read-only fs-style upgrades, compile the policy prior to reboot
   rather than on boot.
5. Support cache files per kernel .features file (or kernel version).
   This will allow people to boot into a previous kernel without having
   to recompile policy
6. Improve parser compile time
7. Investigate how to utilize profile composition and profile stacking to
   decrease compile and load times (eg, one idea is that the policy template
   is compiled once and each policy group once such that the parser need
   only stitch the already compiled bits together)
8. For Ubuntu Touch/Personal system-image based systems, investigate ways
   to utilize the update tarball and compile policy before rebooting (bug
   1385410)

Work is planned for the medium term for '1-3' with '4' and '5' coming as needed. '6' will of course help, but it is already very optimized and compile average ~2 seconds on armhf per profile (note: already faster than Android's 'optimizing apps' per app on a nexus 7) -- if we cut that in half a typical user with 300 apps would still have to wait 300 seconds so other techniques like '2' should be employed. '6' and '7' will be handled in the long term.

'8' can be implemented now to improve the user experience:
"
> Sorry for not being clear. The idea is that when the phone says that
> there is an update, the user has to tap 'Install and Reboot'. The idea > is that before reboot (and therefore still in the unity8 session), we
> look inside what is downloaded, see if there are any policy changes. If
> there are, we extract them and then compile policy with a progress
> meter. The question I posed to you is how hard is it to look inside (or
> provide a manifest of changed packages) and extract what is needed to
> compile policy?

Ok. The update is available as a set of tarballs, available in a fixed
directory. It should be straightforward to check whether any of those
tarballs contains files matching a particular path. If you want to know
whether particular packages have changed, that would be a matter of
extracting the dpkg database and comparing. (We don't otherwise track the
packagewise delta between the images.)

A partial extraction of the tarball based on particular filenames is a
simple matter of tar arguments."

Original description:
Just updated my Nexus 7 2013 from #160 to #161. It's been sat at the Google logo for 15 minutes now. It looks and feels like it's hung. As a user I'd be rebooting it thinking it had crashed by now. I shell in and find apparmor_parser using a lot of cpu for a long time.

top - 00:14:01 up 15 min, 2 users, load average: 5.12, 4.85, 3.21
Tasks: 202 total, 2 running, 200 sleeping, 0 stopped, 0 zombie
%Cpu(s): 50.5 us, 0.8 sy, 0.0 ni, 48.5 id, 0.2 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 1848024 total, 787400 used, 1060624 free, 54216 buffers
KiB Swap: 32764 total, 0 used, 32764 free. 579228 cached Mem

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
 1970 root 20 0 4976 3652 852 R 99.8 0.2 14:31.04 apparmor_parser
 2596 phablet 20 0 5996 1264 824 R 1.3 0.1 0:08.79 top
  914 root 0 -20 7572 552 396 S 0.7 0.0 0:05.02 mpdecision
   21 root 20 0 0 0 0 S 0.3 0.0 0:00.92 kworker/0:1
  229 root 20 0 0 0 0 S 0.3 0.0 0:00.10 jbd2/mmcblk0p30
  982 root 20 0 38856 1164 868 S 0.3 0.1 0:01.77 adbd
 2570 phablet 20 0 10540 1456 692 S 0.3 0.1 0:02.30 sshd
    1 root 20 0 3884 2648 1068 S 0.0 0.1 0:05.98 init
    2 root -2 0 0 0 0 S 0.0 0.0 0:00.01 kthreadd
    3 root 20 0 0 0 0 S 0.0 0.0 0:00.04 ksoftirqd/0

... it eventually finished after 18 minutes.

Changed in apparmor (Ubuntu):
status: New → Confirmed
Revision history for this message
Jamie Strandboge (jdstrand) wrote :

This is a known issue and most affects users who perform lots of system updates with certain kernel and/or policy changes and is exacerbated by a high number of installed packages. We employ caching in various ways to reduce the time to recompile all policy to only needing to do it for certain first boot situations. Recently, there was an update that required recompiling the policy. The next time you boot, the cache will be used again.

This will not normally affect consumers because neither the kernel nor the policy will change during the system image update process. There are plans to make some sort of progress bar when it does happen, and to improve policy compiles.

Revision history for this message
Jamie Strandboge (jdstrand) wrote :

Also, bug #1342858 will also aggravate the situation since more policy is in the device than is required.

summary: - apparmor_parser takes a long time
+ apparmor_parser compile times should be improved
description: updated
Changed in apparmor (Ubuntu):
importance: Undecided → High
summary: - apparmor_parser compile times should be improved
+ AppArmor policy compile improvements
Changed in apparmor (Ubuntu):
status: Confirmed → Triaged
tags: added: application-confinement
description: updated
description: updated
description: updated
description: updated
description: updated
description: updated
tags: added: aa-feature
Changed in click-apparmor (Ubuntu):
status: New → Confirmed
importance: Undecided → Critical
Revision history for this message
Jamie Strandboge (jdstrand) wrote :

AppArmor upstream improvements for this are diminishing returns, so marking as Low.

Changed in apparmor:
importance: Undecided → Low
status: New → Triaged
tags: added: aa-parser
description: updated
no longer affects: click-apparmor (Ubuntu)
description: updated
description: updated
description: updated
description: updated
description: updated
Revision history for this message
Jamie Strandboge (jdstrand) wrote :

Adding 'canonical-devices-system-image' for the UX improvement work: "For Ubuntu Touch/Personal system-image based systems, investigate ways to utilize the update tarball and compile policy before rebooting to improve the user experience"

Changed in canonical-devices-system-image:
assignee: nobody → Jamie Strandboge (jdstrand)
importance: Undecided → Low
status: New → Confirmed
Revision history for this message
Michi Henning (michihenning) wrote :

It would be ultra-cool to get a solution that has tolerable overhead on first boot. In effect, if we can't touch the apparmor profiles, that means that we can't evolve our code. Not being able to add a new dbus method to a service can be a show stopper for new features, for example.

Revision history for this message
John Johansen (jjohansen) wrote :

Sure we want a good user experience.

We need to land the 2.11 version of apparmor which provides several performance improvements. Its can be up to about 35% faster.

Another potential solution not discussed so far is kicking off a low priority background process. This has its own issues, it would take memory and processing resources from the foreground task potentially causing a laggy user experience while it runs. However with a low enough priority and a proper OOM score (kill it before the foreground task) I think its a potential interim solution.

Revision history for this message
Jamie Strandboge (jdstrand) wrote :

I mentioned this to Marcus via irc yesterday and I think it is useful context (leaving out parts that John already mentioned):

"07:48 <marcustomlinson> so question: when we have that slow boot, what screen is everyone left on to wait
07:48 <marcustomlinson> blank? spinning ubuntu icom
07:49 <jdstrand> blank
07:49 <jdstrand> it is after the image is flashed and is comin up. there is no feedback
...
07:54 <marcustomlinson> I think we can hold back on Michi's MP as worst case scenario
...
07:55 <jdstrand> basically, the way it is all framed is that these sorts of updates should typically only happen when jumping base releases, since a policy recompile has to happen anyway
07:55 <jdstrand> so, vivid to xenial
07:55 <jdstrand> I can push this change into xenial no problem
07:55 <jdstrand> then whenever that ota comes out, it is there
07:55 <marcustomlinson> ah ok
07:55 <jdstrand> but we try to reduce the pain of interim updates
07:56 <jdstrand> we don't want every ota to have a slow boot
07:56 <marcustomlinson> do you see these updates as that rarely required?
07:56 <jdstrand> but it is possible to batch them or make a decision that this is important enough to make everyone wait
07:56 <jdstrand> yes
07:56 <jdstrand> we've maybe only had to recompile all policy once since vivid base
...
07:57 <jdstrand> we do make other profile changes, but that might be in say, the calendar policy
07:57 <jdstrand> the calendar policy is only used by a few apps-- so no big deal
...
07:57 <jdstrand> but something that affects all scopes is a different thing
07:58 <marcustomlinson> ok, thanks for the info!
07:58 <jdstrand> np
07:58 <jdstrand> if you are interested in the nitty gritty details, do see the apparmor bug
07
:58 <marcustomlinson> will do
07:59 <jdstrand> it should be noted that since that bug has been filed, we have made many policy compile improvements
07:59 <jdstrand> but the problem can never go away
07:59 <jdstrand> it was at 2.5 seconds
07:59 <jdstrand> per profile
08:00 <jdstrand> I think we are under 1.5 now
08:00 <marcustomlinson> nice
08:00 <jdstrand> but even if we got it to 1, if you have 200 apps installed, that is almost 3.5 minutes of waiting
08:00 <marcustomlinson> like I said, I think we actually "make this go away" by showing people a progress bar or something
...
08:00 <jdstrand> I agree. the pieces are there on click. something different would have to be done for snappy
"

In a nutshell, John has done some amazing stuff already with policy compile times, and more improvements will come and in fact, we are already considerably faster than IOS (aiui) and Android's app update process after reboot even with full policy recompiles, but the problem is we don't give people any feedback what is happening (I discuss ways for unity/the platform to deal with this above). Do keep in mind that because of the sheer volume of apps that can be installed on a phone and because the compile time won't ever be '0', we will always want to be judicious about policy updates that affect all apps-- that's where visual feedback can help.

Revision history for this message
Michi Henning (michihenning) wrote :

As far as the profile change for the thumbnailer is concerned, it can wait. It's not the end of the world if we can't get at the parameters on the client side for the time being.

I don't want to belittle all the work that's already gone into the policy compilation, so please don't take this the wrong way! We are currently discussing some additional thumbnailer features. If they are to go ahead, it'll require adding a new DBus method, which we can't do because of the policy compilation cost.

In other words, we have a rather serious catch-22: the policy mechanism is so expensive to reconfigure that it prevents us from adding new features to services. Obviously, that's bad.

Could we maybe do the compilation at the end of the upgrade process, before the restart? At that point, we still can do progress indicators and, because the user expects upgrades to be slow, the experience would be better that way.

If this isn't possible, at least showing a progress indicator during the boot instead of a black screen would make a huge difference. I think the problem isn't necessarily that things are slow, but that there is absolutely no feedback.

Revision history for this message
John Johansen (jjohansen) wrote :

Yes kicking off a policy compile as part of an update should be possible. It certainly is for .debs, I am not sure of the exact details for click or snappy.

As mentioned above, this compile could even be done as a low priority background task so that the user update wouldn't pick up the cost. Policy compiles are rather cpu intensive so it could cause stuttering but if the priority was low enough I don't think it would be much worse than the delay in ramping up an idle down clocked cpu.

Memory wise compiles tend to be in the 10-50 MB range, so not great but not so bad that they should cause issue for anything but things like browsers. If the oom score was set so that the background compile was killed before the foreground task, the worst you might see would be the oom causing a pause while it frees up some memory.

The policy compiles are done in such a way that killing the compiler should not cause an issue, the updates are built in tmp and moved into place only once complete, so that the worst that would happen if the update was killed was the compile happening on next boot.

So yes, there are lots of options currently available to improve the current experience.

Revision history for this message
Jamie Strandboge (jdstrand) wrote :

"In other words, we have a rather serious catch-22: the policy mechanism is so expensive to reconfigure that it prevents us from adding new features to services. Obviously, that's bad."

Note, the current process isn't always 'no'! :) At a minimum, you'd see this when the image moves to a 16.04 base since we need a policy recompile anyway, but the process for normal OTAs is that the Touch release team makes the call on if it is worth it. If it isn't this time, perhaps your next feature combined with this one is. If not, perhaps batched with other policy updates. What I'm getting at is even now it is a conversation, not unyielding dictum.

"If this isn't possible, at least showing a progress indicator during the boot instead of a black screen would make a huge difference. I think the problem isn't necessarily that things are slow, but that there is absolutely no feedback."

This isn't possible due to when the compiles must happen relative to when the screen that would provide the feedback comes up.

"Could we maybe do the compilation at the end of the upgrade process, before the restart? At that point, we still can do progress indicators and, because the user expects upgrades to be slow, the experience would be better that way."

That is what I suggest in the description. The phone team needs to allocate resources to it (and the security team can help guide as needed).

Revision history for this message
Michi Henning (michihenning) wrote :

> Note, the current process isn't always 'no'! :)

Sure, I understand. But, even if the answer is "no" only some of the time, it still means that the apparmor profiles interfere with our normal development process. If I can't add a DBus method when I feel like it, that's a pain. And, depending on how features interact with each other, delaying this may, in turn delay something else, or otherwise make progress more complicated.

Basically, what I'm suggesting is that, if we have a tool that interferes with development, then it is the tool that should change, not the development process.

Anyway, I think we've thoroughly canvassed the issue by now :-) If it's possible to move compilation to the end of the upgrade process while the old image is still running, let's do it please!

Revision history for this message
Pat McGowan (pat-mcgowan) wrote :

@jamie what package needs to change to implement comment #4

Revision history for this message
Jamie Strandboge (jdstrand) wrote :

@Pat, I think we actually want '8' for the phone. I mentioned what needs to happen in the description. I'll mention it again here:

'8' can be implemented now to improve the user experience:
"
> Sorry for not being clear. The idea is that when the phone says that
> there is an update, the user has to tap 'Install and Reboot'. The idea > is that before reboot (and therefore still in the unity8 session), we
> look inside what is downloaded, see if there are any policy changes. If
> there are, we extract them and then compile policy with a progress
> meter. The question I posed to you is how hard is it to look inside (or
> provide a manifest of changed packages) and extract what is needed to
> compile policy?

Ok. The update is available as a set of tarballs, available in a fixed
directory. It should be straightforward to check whether any of those
tarballs contains files matching a particular path. If you want to know
whether particular packages have changed, that would be a matter of
extracting the dpkg database and comparing. (We don't otherwise track the
packagewise delta between the images.)

A partial extraction of the tarball based on particular filenames is a
simple matter of tar arguments."

Basically, when the user presses 'Install and reboot". Essentially wherever the upgrade code lives it should do something along the lines of:
* download everything
* before installing, peak inside the rootfs to see if things changed, unpack the bits that changed, regenerate and recompile click policy with progress meter

I don't know where the upgrade code lives and/or if something needs to be done with unity8 for the progress meter. aa-clickhook would need to gain --with-progress.

Revision history for this message
Jamie Strandboge (jdstrand) wrote :

A few things I neglected to mention that John reminded me of are

1. if the kernel changes its apparmor feature set or the apparmor parser itself changes how it generates policy, we'll still be running the old kernel and parser. This will happen on an upgrade from 15.04 to 16.04 for example. If the device tarball exported the apparmor feature set (it can be mocked as a text file-- we do that for custom tarballs already for example) then we can point the parser at that. Furthermore the parser could be extracted from the tarball as part of the recompile step.

Since most upgrades very-much-intentionally do not change the parser or the feature set, I think the first iteration could avoid dealing with the apparmor features file and the parser. The second iteration could incorporate it.

2. Due to how profile caching is currently implemented, the recompiling caches is a one way operation. In other words, if prior to reboot we recompile all the policy that invalidates the old caches. If on reboot the boot detects that a recompile is needed (eg, we got something wrong before reboot-- eg, the kernel in the device tarball was updated but someone forgot to update the mocked features file), then the recompile will still happen on reboot. If the processes surrounding updates are sufficient and QA is involved, this should be an avoidable situation.

3. The security team has worked on versioned policy caches in previous cycles but the work is incomplete and deprioritized. Depending on how things are ultimately implemented, its possible versioned policy caches could help with the implementation. I think performing point '8' on Touch and carefully handling the features file and parser changes, this is probably not needed (there isn't a concept of rolling back on Touch for example).

Revision history for this message
John Johansen (jjohansen) wrote :

Versioned policy is needed on touch if the compile is going to be done before reboot. You do not want to blow away currently enforcing policy and install the new version and then run into a situation where you fail, or don't reboot. So at the very least for the failure case we need to support versioned policy.

Another reason however is that policy compiles currently still take a long time and the user should still be able to use their phone while the compile is happening.

The good news is that the features set for a given phone kernel has not been updated, and so we won't hit this yet and can do a first pass without it.

Revision history for this message
Jamie Strandboge (jdstrand) wrote :

@John, I was thinking of using --skip-kernel-load so the policy is still in the running kernel. I agree that versioned policy is safer though.

Revision history for this message
Jamie Strandboge (jdstrand) wrote :

@Pat, this is assigned to me but the changes needed are probably for phonedations and possibly unity8. The click-apparmor patch for --with-progress would be fast and wouldn't need our help (but we could do that if needed). The security team would be happy to answer any questions and advise on the implementation of course. I'm going to remove the assignment from me for now.

Changed in canonical-devices-system-image:
assignee: Jamie Strandboge (jdstrand) → nobody
Changed in click-apparmor (Ubuntu):
status: New → Triaged
importance: Undecided → Low
Revision history for this message
Jamie Strandboge (jdstrand) wrote :

Adding click-apparmor task with same priority as the for canonical system image.

Revision history for this message
John Johansen (jjohansen) wrote :

@Jamie, I had assumed we would be using --skip-kernel-load. I was just bringing up that policy versioning is not just about having different versions of policy for different kernels but also about dealing with failure cases.

description: updated
Revision history for this message
Oliver Grawert (ogra) wrote :

hmm, does generating on shutdown really make sense ?
what if i skip a few upgrades (which is a pretty common case, i.e. my moms phone only gets updated when i visit her. and i know enough (non geeky) people that simply ignore upgrade notifications altogether on their phones), meanwhile apparmor changed profile handling in an incompatible way ... the new binary wont be available until after reboot to generate the profiles the right way, so i end up with broken profiles after reboot ...
while it is surely easy on deb based systems simply because you have the new app binary around, doing it before reboot on system-image or snappy installs means your existing binary needs to be forward compatible to all possible changes that may come with the new binary only after reboot (this could be a one version, a ten version or even a 100 version jump depending how long you didnt upgrade).

having it done after reboot and simply implementing some feedback UI seems to make a lot more sense as it has a lot less risks.

Revision history for this message
Jamie Strandboge (jdstrand) wrote :

@Oliver, that is what I was getting at with pre-reboot-- there are many things that could change that would impact policy and a full implementation would have to account for all of them. The most likely change is simple policy updates (ie, apparmor-easyprof-ubuntu) and that is easy enough to implement preboot.

However, the only reason why I detailed all of this is because people said it was impossible to provide feedback otherwise. If there is some way to do it post reboot, everything becomes a lot simpler and all that is needed is some sort of progress meter that the UI can hook up.

Revision history for this message
Jamie Strandboge (jdstrand) wrote :

Pat had the idea of implementing a variation of '8'. Essentially, look inside the tar file and see if apparmor, click-apparmor or apparmor-easyprof-ubuntu changed, then say something along the lines of "Security policy will be updated after the device is restarted. This process may take several minutes." The result of the detection could flag displaying a static string during the boot process if people wanted. The scope of the recompile is not communicated (eg, changing a rule in a little used policy group won't trigger a recompile of all policy), but this is handled by the phrasing of 'may take' and people can simply be pleasantly surprised when it is faster.

This technique has one minor flaw: it doesn't detect the kernel .features file changing but I don't think that should block implementing this improvement. In practice, the kernel won't be changing this file during a normal OTA, and for an OTA for an OS version update that may have the .features change in the kernel (eg, 15.04 to 16.04) the other components will be changing with it.

Revision history for this message
Michi Henning (michihenning) wrote :

I like this idea. The fact that it takes a long time is less problematic than the lack of user feedback. So, if there is any indication that the phone is doing a normal thing and isn't dead, that's reassuring to the user and also will stop people from resetting the phone, thinking it died (typically just as it's about to finish the compilation…)

I would probably not mention anything about security in the message because it is potentially alarming. ("What, you mean it wasn't secure before?")

Instead, I would just say something along the lines of "complete the installation process" or the like.

Revision history for this message
Matthew Paul Thomas (mpt) wrote :

Saying that the update “may take several minutes” is, I think, probably not an effective solution to the problem of people powering off the phone because it looks like it’s stuck. It doesn’t matter how long they expect it to take: if nothing on the screen is changing, it would still look like it’s stuck.

I don’t know why Plymouth can show changing text but the Ubuntu Touch startup screen can’t, but if that will continue to be the case, I guess fixing bug 1385410 is the only way to provide the necessary reassurance.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.