Unable to delete messages from list archive

Bug #684668 reported by Tom Haddon
This bug affects 1 person
Affects: Launchpad itself
Status: Fix Released
Importance: High
Assigned to: Curtis Hovey

Bug Description

I've been trying to delete messages from a list archive using the instructions defined on https://wiki.canonical.com/InformationInfrastructure/OSA/LPHowTo/DeleteEmailFromArchive per https://answers.launchpad.net/launchpad-registry/+question/129393. An example message I'm trying to delete is https://lists.launchpad.net/cuneiform/msg00772.html. However, in each case, it's telling me the message isn't in the archive: https://pastebin.canonical.com/40460/.

Tom Haddon (mthaddon)
tags: added: canonical-losa-lp
affects: launchpad → launchpad-foundations
Gary Poster (gary)
affects: launchpad-foundations → launchpad-registry
Curtis Hovey (sinzui)
tags: added: ml-archive-sucks
Curtis Hovey (sinzui)
Changed in launchpad-registry:
assignee: nobody → Curtis Hovey (sinzui)
importance: Undecided → High
status: New → In Progress
tags: added: docs
Curtis Hovey (sinzui) wrote :

The messages are not in the thread index. As was reported many weeks ago, the messages were indeed removed from the archive, and the threads and messages were rewritten without the removed messages. The output does not say that it regenerated the messages, sans the deleted ones.

The pages for the messages *were not* updated, nor were the pages for the deleted messages removed. I suspect that KEEPONRMM is enabled in the archive db or that M2H_KEEPONRMM=1 is set in the env.

Also, the loop is very slow. This is faster and may do the right thing:
    mhonarc --nokeeponrmm -rmm 722 724 725 726 727 728 729 73 731 732 737 742 747 752 756 762 767 772 772

Michael Barnett (mbarnett) wrote :

I added the --nokeeponrmm flag to the previous removal run. It bitched again about the messages not being in the db:

https://pastebin.canonical.com/40615/

After running with that flag, I can still pull up messages that were "removed" on the web:

https://lists.launchpad.net/cuneiform/msg00772.html

Curtis Hovey (sinzui) wrote :

I can see from the last pastebin that it generated 727 messages. We should check the timestamps. I suspect that we may need to delete all the html files before calling `mhonarc --nokeeponrmm -rmm <ids>`.

Curtis Hovey (sinzui) wrote :

I updated the howto to include --nokeeponrmm and inlined the regenerate instructions. I think we should verify that there is not a cache issue involved here. The pastebins report that threads.html was written. We expect the last-modified to be "Tue, 07 Dec, 2010", but the response headers for the file state:
    Last-Modified: Sun, 07 Nov 2010 15:40:03 GMT
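
The staleness check described above can be sketched in a few lines of Python. This is only an illustration: the function name `is_stale` is hypothetical, and the expected day is taken from the "Tue, 07 Dec, 2010" date mentioned in the comment.

```python
from datetime import date
from email.utils import parsedate_to_datetime


def is_stale(last_modified_header: str, expected_day: date) -> bool:
    """Return True when the served file was last modified before expected_day."""
    served = parsedate_to_datetime(last_modified_header)
    return served.date() < expected_day


# Header value quoted in the comment; the expected day is an assumption
# based on the date the removal run was believed to have happened.
header = "Sun, 07 Nov 2010 15:40:03 GMT"
print(is_stale(header, date(2010, 12, 7)))
```

A stale header on its own does not distinguish a caching problem from a file that was never rewritten, which is why the on-disk timestamps still need to be checked.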

Someone should look at the contents of <root>/var/mailman/mhonarc/cuneiform. When was threads.html last modified? Does msg00772.html really exist, and if so, what is its modtime? What is the modtime for msg00771.html? Does it really have a link to msg00772.html?
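
Those questions could be answered with a small script run inside the archive directory. The following is a minimal sketch, not part of the howto; the helper names `describe`, `links_to`, and `check_archive` are hypothetical, and the file names are the ones mentioned in this report.

```python
import time
from pathlib import Path


def describe(path: Path) -> str:
    """Return a one-line summary: whether the file exists and its modtime."""
    if not path.exists():
        return f"{path.name}: missing"
    stamp = time.localtime(path.stat().st_mtime)
    return f"{path.name}: modified {time.strftime('%Y-%m-%d %H:%M', stamp)}"


def links_to(page: Path, target_name: str) -> bool:
    """Return True if the page contains an href pointing at target_name."""
    if not page.exists():
        return False
    text = page.read_text(encoding="utf-8", errors="replace")
    return f'href="{target_name}"' in text


def check_archive(archive_dir: str) -> None:
    """Print the modtimes asked about above and whether the link exists."""
    root = Path(archive_dir)
    for name in ("threads.html", "msg00772.html", "msg00771.html"):
        print(describe(root / name))
    linked = links_to(root / "msg00771.html", "msg00772.html")
    print("msg00771.html links to msg00772.html:", linked)
```

Calling `check_archive` with the list's mhonarc directory prints one line per file plus the link check, which is the same information as the `ls` and `grep` session in the next comment.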

Michael Barnett (mbarnett) wrote :

launchpad@forster:/srv/lists.launchpad.net/var/mailman/mhonarc/cuneiform$ ls -alF | grep msg00772
-rw-r--r-- 1 launchpad launchpad 5830 2010-10-11 10:32 msg00772.html

launchpad@forster:/srv/lists.launchpad.net/var/mailman/mhonarc/cuneiform$ ls -alF | grep 771
-rw-r--r-- 1 launchpad launchpad 5389 2010-10-11 10:31 msg00771.html

launchpad@forster:/srv/lists.launchpad.net/var/mailman/mhonarc/cuneiform$ grep 772 msg00771.html
[<a href="msg00770.html">Date Prev</a>][<a href="msg00772.html">Date Next</a>][<a href="msg00729.html">Thread Prev</a>][Thread Next][<a href="maillist.html#00771">Date Index
<strong><a href="msg00772.html">Re: [Cuneiform] [Bug 640051] Re: Type mismatch: va_list and char*</a></strong>

Curtis Hovey (sinzui) wrote :

I wonder if we need to delete the files before asking mhonarc to regenerate them. I suppose we could test this by tarring up the messages first, then attempting the delete again. If the messages are regenerated, we declare success; otherwise we untar the files.
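
The backup-then-retry idea could look roughly like the sketch below. It is an assumption-laden illustration: the function names are hypothetical, it assumes the message pages all match `msg*.html`, and the tarball should live outside the archive directory. The mhonarc run itself would happen between `remove_messages` and a possible `restore_messages`.

```python
import tarfile
from pathlib import Path


def backup_messages(archive_dir: str, tar_path: str) -> int:
    """Tar up every msg*.html page; return how many pages were saved."""
    pages = sorted(Path(archive_dir).glob("msg*.html"))
    with tarfile.open(tar_path, "w:gz") as tar:
        for page in pages:
            # Store bare file names so the tarball unpacks into any directory.
            tar.add(str(page), arcname=page.name)
    return len(pages)


def remove_messages(archive_dir: str) -> None:
    """Delete the message pages so mhonarc has to write them afresh."""
    for page in Path(archive_dir).glob("msg*.html"):
        page.unlink()


def restore_messages(archive_dir: str, tar_path: str) -> None:
    """Unpack the saved pages back into the archive directory."""
    with tarfile.open(tar_path, "r:gz") as tar:
        tar.extractall(archive_dir)
```

If the pages come back after the mhonarc run, the tarball can be discarded; if not, `restore_messages` puts the archive back in its previous state.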

Curtis Hovey (sinzui) wrote :

Michael confirmed that regenerating the entire web archive does indeed update all pages, but the problem messages are still there. So how can they be deleted from the archive/db, yet still be regenerated?

Curtis Hovey (sinzui) wrote :

I think I understand the nature of the problem. MHonArc silently falls back to bad defaults. We must always specify -dbfile and -outdir because most operations involve a comparison of the two. In this case we saw reports that the messages were not found in the archive (the html directory), and then it wrote out new html files. We did not see an update because we were looking at the expected -outdir that we never specified; I do not know where we generated the other archives ;).

I updated the howto with the explicit options that allowed me to perform a delete that updates the correct archive.
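
As an illustration of passing both options explicitly, here is a minimal sketch that only builds the command line; it does not run mhonarc. The exact options in the howto are not reproduced in this report, so the helper name `build_rmm_command`, the database file name `.mhonarc.db`, and its location inside the archive directory are all assumptions.

```python
import posixpath
import shlex


def build_rmm_command(archive_dir, message_numbers, db_name=".mhonarc.db"):
    """Build an mhonarc removal command with explicit -dbfile and -outdir.

    db_name and its placement inside archive_dir are assumptions; adjust
    them to wherever the list's database actually lives.
    """
    cmd = [
        "mhonarc",
        "-dbfile", posixpath.join(archive_dir, db_name),
        "-outdir", archive_dir,
        "--nokeeponrmm",  # flag spelled as it appears in this thread
        "-rmm",
    ]
    cmd.extend(str(number) for number in message_numbers)
    return cmd


# Archive path and message number taken from the comments above.
archive = "/srv/lists.launchpad.net/var/mailman/mhonarc/cuneiform"
print(shlex.join(build_rmm_command(archive, [772])))
```

Building the argument list in one place makes it harder to forget either option, which is the failure mode described above.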

Changed in launchpad-registry:
milestone: none → 10.12
Curtis Hovey (sinzui)
Changed in launchpad-registry:
status: In Progress → Fix Released