etree.strip_tags() does not remove all instances of a defined tag
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
lxml | Fix Released | Medium | scoder | | |
Bug Description
I have the following Python code on lxml 2.2.2 with libxml2 2.7.6 on FreeBSD 7.2:
from lxml import etree
html = """
<div>
<div>
I like <strong>
<br/>
I like lots of <strong>
<br/>
Click <a href="www.
<br/>
</div>
</div>
"""
element = etree.fromstrin
etree.strip_
print etree.tostring(
which prints:
<div>
<div>
I like <strong>
I like lots of <strong>
Click here for <a href="www.
<br/>
</div>
</div>
I would expect *all* the "br" and "a" tags to be stripped.
Another example, use "etree.
<div>
<div>
I like beer.
<br/>
I like lots of <strong>
<br/>
Click <a href="www.
<br/>
</div>
</div>
Again, I would expect all the defined tags to be stripped.
Thanks
Thanks for the report, this has been fixed in trunk rev 69607.
https://codespeak.net/viewvc/?view=rev&revision=69607
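For anyone who wants to check whether a given lxml build shows the expected behaviour, here is a minimal sketch. The markup is a hypothetical stand-in, since the snippet in the report is truncated, and it assumes the tags being stripped are "br" and "a" as stated in the description. It is written for Python 3, so it uses print() as a function, unlike the original snippet.

```python
from lxml import etree

# Hypothetical sample markup; NOT the reporter's original input,
# which is truncated in the report above.
html = """
<div>
  <div>
    I like <strong>beer</strong>.
    <br/>
    I like lots of <strong>beer</strong>.
    <br/>
    Click <a href="http://example.com/">here</a> for more.
    <br/>
  </div>
</div>
"""

element = etree.fromstring(html)

# strip_tags() removes the named tags but keeps their text and children,
# merging them into the parent element.
etree.strip_tags(element, "br", "a")

# Collect any "br" or "a" elements that survived the call.
remaining = [el.tag for el in element.iter() if el.tag in ("br", "a")]

result = etree.tostring(element, encoding="unicode")
print(result)
print("leftover tags:", remaining)
```

On a build that behaves as the report expects, the leftover list should be empty, the link text "here" should still be present, and the "strong" elements should be untouched.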