generate-ppa-htaccess is too slow
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Launchpad itself | Fix Released | High | Michael Nelson |
Bug Description
Looking at the logs, generate-
30 seconds of each run is setup/teardown (i.e. the time between creating/removing lockfiles and starting/finishing the script), but 1:30 is spent running the actual script.
Looking at the code, the script is doing three things for each private PPA:
1) Iterating all subscriptions for the PPA to check if they are expired, and expiring where appropriate
2) Iterating through all the tokens for the PPA to ensure they are all still valid, deactivating where appropriate
3) Checking and updating the htaccess file:
i) gets a publisher configuration for the PPA and checks that an htaccess exists
ii) gets a publisher configuration for the PPA and generates a new htpasswd file
iii) gets a publisher configuration for the PPA and compares the new htpasswd file with the old one, replacing it when appropriate.
Finally, the transaction is committed.
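Step 3 iii) above amounts to a compare-and-replace on a file. The following is only a minimal sketch of that idea using the Python standard library; `replace_if_changed` is a hypothetical helper, not the function used in generate_ppa_htaccess.py, and it assumes the new htpasswd contents are already available as a string.

```python
import filecmp
import os
import tempfile


def replace_if_changed(new_contents: str, target_path: str) -> bool:
    """Write new_contents to target_path only if it differs from the file there.

    Hypothetical helper. Returns True if the file was created or replaced,
    False if the existing file already had the same contents.
    """
    directory = os.path.dirname(target_path) or "."
    # Use a temporary file in the same directory so os.replace stays on one
    # filesystem and swaps the file in a single step.
    fd, temp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as temp_file:
            temp_file.write(new_contents)
        # shallow=False forces a byte-by-byte comparison of the two files.
        if os.path.exists(target_path) and filecmp.cmp(
            temp_path, target_path, shallow=False
        ):
            return False
        os.replace(temp_path, target_path)
        return True
    finally:
        # Remove the temporary file if it was not moved into place.
        if os.path.exists(temp_path):
            os.remove(temp_path)
```

Note that `tempfile.mkstemp` creates the file readable only by its owner, so a real script would probably also need to set permissions that let the web server read the result.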
First, getPubConfig doesn't look too expensive, but I still can't see a reason why we can't call it just once per PPA (or even just once for the complete set, as we know they are all private PPAs).
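As a rough illustration of the "call it just once" idea, the sketch below looks the configuration up a single time per PPA and hands the result to each of the three htaccess steps. Everything here is hypothetical: `PubConfig`, `CountingConfigSource`, the paths, and the step names are stand-ins, not Launchpad's real classes or layout.

```python
from dataclasses import dataclass


@dataclass
class PubConfig:
    """Hypothetical stand-in for a publisher configuration."""

    htaccess_path: str
    htpasswd_path: str


class CountingConfigSource:
    """Hypothetical config lookup that records how often it is called."""

    def __init__(self) -> None:
        self.calls = 0

    def get_pub_config(self, ppa_name: str) -> PubConfig:
        self.calls += 1
        root = f"/hypothetical/ppa-root/{ppa_name}"  # made-up layout
        return PubConfig(f"{root}/.htaccess", f"{root}/.htpasswd")


def process_ppa(source: CountingConfigSource, ppa_name: str) -> list:
    """Run the three htaccess steps against one shared configuration."""
    config = source.get_pub_config(ppa_name)  # single lookup per PPA
    # Each tuple stands in for one step; a real script would do the work here.
    return [
        ("ensure_htaccess", config.htaccess_path),
        ("generate_htpasswd", config.htpasswd_path),
        ("replace_htpasswd_if_changed", config.htpasswd_path),
    ]
```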
Second, if we were to split this job so that (1) and (2) above were in a separate cron that ran hourly (or at some interval other than 5 minutes), the remainder of (3) would be much faster - in fact, I don't see why it would even need to run in a transaction.
Related branches
- Graham Binns (community): Approve (release-critical)
- Brad Crittenden (community): Approve (code)
- Diff: 328 lines (+148/-54), 2 files modified
  lib/lp/archivepublisher/scripts/generate_ppa_htaccess.py (+111/-50)
  lib/lp/archivepublisher/tests/test_generate_ppa_htaccess.py (+37/-4)
Changed in soyuz:
status: Triaged → In Progress
assignee: nobody → Michael Nelson (michael.nelson)
tags: added: qa-ok, removed: qa-needstesting
Changed in soyuz:
status: Fix Committed → Fix Released
I think (1) could potentially be separate; expiring doesn't have to be *that* timely.
However, (2) *does* need to be timely since someone may be deactivating a token for a malicious user or for a security leak.
We might be able to speed this script up by having an intermediate table that holds a set of actions that must take place, which gets populated when someone a) deactivates a token, b) creates a token, or c) regenerates a token. This is similar to the publisher's "dirty pockets" concept.
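One way to picture the intermediate table is sketched below with an in-memory SQLite database. This is an assumption-laden illustration, not Launchpad's schema: the table name `pending_htaccess_action`, its columns, and both helper functions are invented here. Token events append rows, and the frequent job only touches archives that have rows waiting.

```python
import sqlite3


def make_queue() -> sqlite3.Connection:
    """Create the hypothetical pending-actions table in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pending_htaccess_action ("
        " id INTEGER PRIMARY KEY,"
        " archive_id INTEGER NOT NULL,"
        " action TEXT NOT NULL)"
    )
    return conn


def record_action(conn: sqlite3.Connection, archive_id: int, action: str) -> None:
    """Called when a token is created, deactivated or regenerated."""
    conn.execute(
        "INSERT INTO pending_htaccess_action (archive_id, action) VALUES (?, ?)",
        (archive_id, action),
    )
    conn.commit()


def take_dirty_archives(conn: sqlite3.Connection) -> list:
    """Return the archives with pending actions and clear those rows."""
    rows = conn.execute(
        "SELECT id, archive_id FROM pending_htaccess_action ORDER BY id"
    ).fetchall()
    if not rows:
        return []
    # Delete only the rows just read, so actions recorded later are kept.
    conn.execute(
        "DELETE FROM pending_htaccess_action WHERE id <= ?", (rows[-1][0],)
    )
    conn.commit()
    return sorted({archive_id for _, archive_id in rows})
```

With something like this, the 5-minute job would regenerate htpasswd files only for the archives returned by `take_dirty_archives`, so a deactivation would still take effect promptly without scanning every private PPA.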
Obviously, the other code optimisations you mention would also be useful.