FilterHTML: A whitelisting HTML filter

FilterHTML: A simple-to-use whitelisting HTML filter.
It cleans (purifies) untrusted HTML so that only a well-defined subset of HTML can pass through.
It supports class and style parsing, plus attribute filters for URLs, colors, measurements, regular expressions, and custom functions.
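
To make the whitelisting idea concrete, here is a minimal sketch built only on the Python standard library. It is not FilterHTML's implementation: the names `ALLOWED`, `WhitelistSketch`, and `sketch_filter` are hypothetical, and the sketch only checks tag and attribute names, with none of the URL, color, or measurement value filters described above.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist for this sketch: tag name -> allowed attribute names.
ALLOWED = {"strong": set(), "em": set(), "span": {"title"}}


class WhitelistSketch(HTMLParser):
    """Re-emits only whitelisted tags and attributes; everything else is dropped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED:
            return  # drop the tag; any text inside it still reaches handle_data
        kept = [(k, v) for k, v in attrs if k in ALLOWED[tag] and v is not None]
        rendered = "".join(f' {k}="{escape(v, quote=True)}"' for k, v in kept)
        self.out.append(f"<{tag}{rendered}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is always escaped, so it can never introduce new markup.
        self.out.append(escape(data, quote=False))


def sketch_filter(html_text):
    parser = WhitelistSketch()
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    return "".join(parser.out)
```

Anything not named in the whitelist is removed rather than repaired, which is the property that distinguishes whitelisting from blacklisting. The sketch does not balance unmatched closing tags; a production filter would need to.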


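import FilterHTML

# sample untrusted input (illustrative only)
unfiltered_html = '<a href="http://example.com" onclick="evil()">a sad link</a>'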

# only allow:
# <a> tags with valid href URLs
# <img> tags with valid src URLs and measurements
# <span> tags with valid color styles
whitelist = {
    'a': {
        'href': 'url',
        'target': [
            '_blank',
            '_self'
        ],
        'class': [
            'button'
        ]
    },

    'img': {
        'src': 'url',
        'width': 'measurement',
        'height': 'measurement'
    },

    'span': {
        'style': {
            'color': 'color',
            'background-color': 'color'
        }
    }
}

# perform replacements on text (between tags)
def replace_text(text, tags):
    return text.replace('sad', '<strong>happy</strong>')

# filter unfiltered_html with the whitelist above, an explicit tuple of allowed URL schemes, and a text replacement function
filtered_html = FilterHTML.filter_html(unfiltered_html, whitelist, ('http', 'https', 'mailto', 'ftp'), replace_text)

# simpler usage: default URL schemes (same as above) and no replacement function
filtered_html = FilterHTML.filter_html(unfiltered_html, whitelist)
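
The allowed URL schemes passed above decide which `href` and `src` values survive. As a rough illustration of what such a check involves, here is a standard-library sketch. It is an assumption about the general technique, not FilterHTML's actual code: `scheme_allowed` is a hypothetical helper, and accepting relative URLs is a choice made for this sketch only.

```python
from urllib.parse import urlsplit

# Same schemes as the tuple in the usage example above.
ALLOWED_SCHEMES = ("http", "https", "mailto", "ftp")


def scheme_allowed(url, allowed=ALLOWED_SCHEMES):
    """Return True if url has no scheme (relative) or a whitelisted scheme."""
    # Remove surrounding whitespace and embedded control characters, which
    # browsers may ignore and attackers use to disguise "javascript:".
    cleaned = "".join(ch for ch in url.strip() if ch >= " ")
    try:
        scheme = urlsplit(cleaned).scheme.lower()
    except ValueError:
        return False  # malformed URL: reject rather than guess
    # Assumption for this sketch: scheme-less (relative) URLs are accepted.
    return scheme == "" or scheme in allowed
```

Checking the parsed scheme against a short allow list, rather than searching for known-bad prefixes, means unfamiliar schemes such as `data:` are rejected by default.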

Source Distribution

FilterHTML-0.2.2.tar.gz (4.9 kB)