etree.strip_tags() does not remove all instances of a defined tag

Bug #485040 reported by phatfish
This bug affects 1 person
Affects   Status         Importance   Assigned to   Milestone
lxml      Fix Released   Medium       scoder

Bug Description

I have the following Python code, running on lxml 2.2.2 with libxml2 2.7.6 on FreeBSD 7.2:

from lxml import etree
html = """
<div>
  <div>
    I like <strong>beer</strong>.
    <br/>
    I like lots of <strong>beer</strong>.
    <br/>
    Click <a href="www.beer.com">here</a> for <a href="www.beer.com">this</a> beer.
    <br/>
  </div>
</div>
"""
element = etree.fromstring(html)
etree.strip_tags(element, 'a','br')
print etree.tostring(element)

which prints:

<div>
  <div>
    I like <strong>beer</strong>.

    I like lots of <strong>beer</strong>.

    Click here for <a href="www.beer.com">this</a> beer.
    <br/>
  </div>
</div>

I would expect *all* the "br" and "a" tags to be stripped.
As another example, calling "etree.strip_tags(element, 'strong','br')" produces this output:

<div>
  <div>
    I like beer.
    <br/>
    I like lots of <strong>beer</strong>.
    <br/>
    Click <a href="www.beer.com">here</a> for <a href="www.beer.com">this</a> beer.
    <br/>
    </div>
</div>

Again, I would expect all the defined tags to be stripped.
Thanks
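
For reference, the expected behaviour can be sketched with the standard library alone. This is only an illustration of the semantics described above (drop the tag, keep its text, children, and tail in place); it is not lxml's implementation, and the helper name strip_tags_sketch is made up for this example. The input is a shortened version of the markup from the report.

```python
import xml.etree.ElementTree as ET


def strip_tags_sketch(elem, *tags):
    """Hypothetical helper: remove every descendant of `elem` whose tag is
    in `tags`, merging its text, children, and tail into the parent."""
    # Handle nested elements first, so that the children of a stripped
    # element are already clean when they are moved up one level.
    for child in list(elem):
        strip_tags_sketch(child, *tags)

    # Detach all children, then re-attach them one by one.
    children = list(elem)
    for child in children:
        elem.remove(child)

    last = None  # last element re-attached under `elem`
    for child in children:
        if child.tag in tags:
            # Replace the element by its text, its children, and its tail.
            pieces = [child.text] + list(child) + [child.tail]
        else:
            pieces = [child]
        for piece in pieces:
            if isinstance(piece, str):
                # Text goes after the last kept element, or into the
                # parent's leading text if no element has been kept yet.
                if last is None:
                    elem.text = (elem.text or "") + piece
                else:
                    last.tail = (last.tail or "") + piece
            elif piece is not None:
                elem.append(piece)
                last = piece


html = "<div>Click <a href='x'>here</a> for <a href='x'>this</a> beer.<br/></div>"
root = ET.fromstring(html)
strip_tags_sketch(root, "a", "br")
result = ET.tostring(root, encoding="unicode")
print(result)  # <div>Click here for this beer.</div>
```

Both "a" elements and the "br" are gone, while the link text stays in place. That is the result I expected from etree.strip_tags().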

scoder (scoder) wrote :

Thanks for the report. This has been fixed in trunk rev 69607.

https://codespeak.net/viewvc/?view=rev&revision=69607

Changed in lxml:
assignee: nobody → Stefan Behnel (scoder)
importance: Undecided → Medium
status: New → Fix Committed
scoder (scoder) wrote :

Fix released in lxml 2.2.5.
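
To check whether an installed lxml includes the fix, something like the following should work. It is a rough check rather than an official test case: it reuses the markup from the report, and it assumes Python 3 and a reasonably recent lxml (the encoding="unicode" argument is not available in very old releases).

```python
from lxml import etree

html = """
<div>
  <div>
    I like <strong>beer</strong>.
    <br/>
    I like lots of <strong>beer</strong>.
    <br/>
    Click <a href="www.beer.com">here</a> for <a href="www.beer.com">this</a> beer.
    <br/>
  </div>
</div>
"""

element = etree.fromstring(html)
etree.strip_tags(element, 'a', 'br')

# Collect any 'a' or 'br' elements that survived the call.
leftover = [el.tag for el in element.iter() if el.tag in ('a', 'br')]

print(etree.LXML_VERSION)  # version tuple of the installed lxml
print(leftover)            # should be empty on a fixed version
print(etree.tostring(element, encoding="unicode"))
```

On an affected version the list is expected to still contain some "a" and "br" entries, matching the output in the description.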

Changed in lxml:
status: Fix Committed → Fix Released