apache-log-parser should parse one file at a time
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Launchpad itself | Fix Released | Undecided | Michael Nelson | | |
Bug Description
In 10.07 we fixed bug 588288, which lets us set the maximum number of lines that will be parsed from each log file.
Initially we'd thought this would help us solve the memory issue when running the parser against the backlog of ppa access logs, but after trialling with logparser_
Going back to the code, there are a lot of other improvements that could be made. One that stands out is that currently *all* log files with new lines to parse are opened during get_files_
Note: there is also a comment related to the librarian logfile parser in the docstring at:
cronscripts/
which, applying it to the PPA log file parser, implies that we could additionally update the script to clear the storm cache
(store.
Also, with this knowledge, we can QA such a change on dogfood or locally by simply making many copies of the log file we have before running the script.
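The one-file-at-a-time idea from the description can be sketched roughly as follows. This is only an illustration, not the Launchpad code: `parse_one_file_at_a_time` and its parameters are hypothetical names, and "parsing" is reduced to collecting stripped lines. The point is that a generator opens each file only when the caller asks for it, applies the per-file line cap, and closes the file before moving on.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_one_file_at_a_time(
    paths: Iterable[str], max_lines: int
) -> Iterator[Tuple[str, List[str]]]:
    """Yield (path, lines) pairs, keeping at most one log file open.

    Hypothetical helper: max_lines stands in for the per-file limit
    mentioned in the description.
    """
    for path in paths:
        lines = []
        # The file is opened only when the consumer requests this item.
        with open(path, "r", encoding="utf-8", errors="replace") as log_file:
            for line_number, line in enumerate(log_file):
                if line_number >= max_lines:
                    break
                lines.append(line.rstrip("\n"))
        # The with-block has closed the file before control returns
        # to the caller, so no descriptors pile up across files.
        yield path, lines
```

A caller would loop over the generator and process (and then discard) each file's lines before the next file is touched, which keeps memory bounded by one file's capped line count rather than by the whole backlog.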
Related branches
- Henning Eggers (community): Approve (code)
- Diff: 250 lines (+73/-24), 4 files modified
lib/canonical/launchpad/scripts/tests/test_librarian_apache_log_parser.py (+2/-2)
lib/lp/services/apachelogparser/base.py (+7/-3)
lib/lp/services/apachelogparser/script.py (+9/-1)
lib/lp/services/apachelogparser/tests/test_apachelogparser.py (+55/-18)
description: updated
tags: added: canonical-losa-lp
Changed in launchpad-foundations:
assignee: nobody → Benji York (benji)
status: New → In Progress
Changed in launchpad-foundations:
status: Fix Committed → Fix Released
I'm not sure that the given QA strategy is correct: the log parser infrastructure identifies files by their first line, not their filename.
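To make that concern concrete, here is a minimal sketch of what first-line identification implies. It assumes a simple dictionary that maps a file's first line to the number of bytes already parsed; `first_line_of`, `select_unparsed`, and `parsed_bytes` are hypothetical names for illustration, not the actual infrastructure.

```python
import os
from typing import Dict, Iterable, List, Tuple


def first_line_of(path: str) -> str:
    """Return the first line of a file, used here as its identity."""
    with open(path, "r", encoding="utf-8", errors="replace") as log_file:
        return log_file.readline()


def select_unparsed(
    paths: Iterable[str], parsed_bytes: Dict[str, int]
) -> List[Tuple[str, int]]:
    """Return (path, start_offset) for files with bytes not yet parsed.

    parsed_bytes maps a first line (not a filename) to the number of
    bytes already parsed for the log that starts with that line.
    """
    selected = []
    for path in paths:
        offset = parsed_bytes.get(first_line_of(path), 0)
        if os.path.getsize(path) > offset:
            selected.append((path, offset))
    return selected
```

Under this assumption, once the original file has been fully parsed and recorded, every byte-identical copy shares the same first line and the same size, so it would be skipped as already parsed regardless of its filename.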