There are a couple posts out there that discuss stripping tags with html5lib, but they seem intent on preserving the "acceptable elements" such as <span> and <code>.
This is fine unless you really want to friggin' strip out the tags, as I needed to for Emend. The following is my solution.
Source code for stripping tags with html5lib and unit test.
For example,
>>> from strip_tags import strip_tags
>>> strip_tags('<p>foo</p> <script>bar</script>')
u'foo bar'
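For readers who just want the gist, here is a minimal sketch of how a strip_tags function along these lines could be written. It is not the linked source: it assumes html5lib's parseFragment with its default etree tree builder, and simply joins every text node in the parsed fragment. The fallback to the standard library's HTMLParser when html5lib is not installed, and the _TextCollector helper, are my own additions for illustration. On Python 3 the result is a plain str rather than the u'...' shown above.

```python
try:
    import html5lib  # third-party; preferred parser
except ImportError:  # assumption: degrade to the stdlib parser if missing
    html5lib = None

from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Hypothetical fallback helper: keeps text nodes, ignores all tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags(html):
    """Return the text content of an HTML fragment with every tag removed."""
    if html5lib is not None:
        # parseFragment returns an ElementTree element by default;
        # itertext() walks all text nodes in document order.
        fragment = html5lib.parseFragment(html)
        return "".join(fragment.itertext())
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return "".join(collector.chunks)


if __name__ == "__main__":
    print(strip_tags("<p>foo</p> <script>bar</script>"))
```

Note that, as in the example above, the contents of <script> are kept as text; only the tags themselves are dropped.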
Thanks go to Edward O’Connor for pointing me towards html5lib in the first place. It's a huge improvement over HTMLParser.