Strips from HTML any tags and attributes that are not in the whitelist. The content of removed tags is kept.

Python HTML purifier
====================

About
-----

Strips from HTML any tags and attributes that are not in the whitelist.
The content of removed tags is kept. The whitelist has the following form:
```python
{
'enabled tag name' : ['list of enabled tag\'s attributes']
}
```
You can use the symbol ``*`` in place of a tag name and/or in an attribute list to allow all tags and/or all attributes.

Note that the ``script`` and ``style`` tags are removed together with their content.
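
To make the whitelist rules concrete, here is a minimal sketch of how such a lookup could work. The helpers `allowed_attrs` and `is_attr_allowed` are hypothetical names used only for illustration and are not part of this package. The sketch also assumes that ``*`` used as a key matches any tag that has no entry of its own.

```python
def allowed_attrs(whitelist, tag):
    """Return the attribute rules for `tag`, or None if the tag is not allowed."""
    if tag in whitelist:
        return whitelist[tag]
    # Assumption: '*' used as a key matches any tag without its own entry.
    return whitelist.get('*')


def is_attr_allowed(whitelist, tag, attr):
    """True if `attr` may stay on `tag` under the given whitelist."""
    rules = allowed_attrs(whitelist, tag)
    return rules is not None and ('*' in rules or attr in rules)


whitelist = {'div': ['*'], 'span': ['attr-2']}
print(is_attr_allowed(whitelist, 'div', 'class'))    # True
print(is_attr_allowed(whitelist, 'span', 'attr-1'))  # False
print(allowed_attrs(whitelist, 'b'))                 # None
```

A tag whose lookup yields `None` would be dropped while its text is kept, whereas an empty list would keep the tag but strip every attribute.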

The module is built on the
[HTMLParser](http://docs.python.org/2/library/htmlparser.html)
class from the Python standard library.
It therefore pulls in no third-party dependencies, which can be an advantage.
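
As a rough illustration of the approach, the following sketch builds a whitelist filter on the standard-library parser. It is not this package's source code: it targets Python 3 (`html.parser`), and the class name `SketchPurifier` and its `purify` method are hypothetical. It also assumes that ``*`` as a key matches any tag.

```python
from html import escape
from html.parser import HTMLParser


class SketchPurifier(HTMLParser):
    """Illustrative whitelist filter; not the html-purifier implementation."""

    DROP_WITH_CONTENT = {'script', 'style'}

    def __init__(self, whitelist):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.whitelist = whitelist
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def _rules(self, tag):
        # Assumption: '*' as a key matches any tag without its own entry.
        return self.whitelist.get(tag, self.whitelist.get('*'))

    def handle_starttag(self, tag, attrs):
        if tag in self.DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        rules = self._rules(tag)
        if self.skip_depth or rules is None:
            return  # tag is dropped; its text still reaches handle_data
        kept = [(k, v) for k, v in attrs if '*' in rules or k in rules]
        parts = [tag] + [k if v is None else '%s="%s"' % (k, escape(v))
                         for k, v in kept]
        self.out.append('<%s>' % ' '.join(parts))

    def handle_endtag(self, tag):
        if tag in self.DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and self._rules(tag) is not None:
            self.out.append('</%s>' % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append('&%s;' % name)

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append('&#%s;' % name)

    def purify(self, html):
        self.out = []
        self.skip_depth = 0
        self.reset()
        self.feed(html)
        self.close()
        return ''.join(self.out)


sketch = SketchPurifier({'div': ['*'], 'span': ['attr-2']})
print(sketch.purify('<div class="e1" id="e1">Some <b>HTML</b> for '
                    '<span attr-1="1" attr-2="2">purifying</span></div>'))
# <div class="e1" id="e1">Some HTML for <span attr-2="2">purifying</span></div>
```

The sketch rebuilds each allowed tag from the parsed name and attributes rather than copying the raw source text, so attribute values are re-escaped on output. Void elements such as `<br/>` are not special-cased here.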

More details are available [on my blog](http://pixxxxxel.blogspot.ru/2013/07/html-purifier-python.html).

Basic Use
---------
```python
>>> purifier = HTMLPurifier({
'div': ['*'], # allow all attributes on the div tag
'span': ['attr-2'], # allow only the attr-2 attribute on the span tag
# all other tags are removed, but their content is kept
})
>>> print purifier.feed('<div class="e1" id="e1">Some <b>HTML</b> for <span attr-1="1" attr-2="2">purifying</span></div>')
<div class="e1" id="e1">Some HTML for <span attr-2="2">purifying</span></div>
```
