FilterHTML: A whitelisting HTML filter
FilterHTML: A simple-to-use whitelisting HTML filter.
Cleans (purifies) untrusted HTML so that only a well-defined subset of HTML passes through.
Supports class and style parsing, with value filters for URLs, colors, measurements, regular expressions, and custom functions.
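As a rough illustration of what value filters for URLs, colors, and measurements might check, here is a minimal standard-library sketch. The helper names (is_allowed_url, is_color, is_measurement) and the exact patterns are assumptions made for this example; they are not FilterHTML's functions and its real rules may differ.

```python
import re
from urllib.parse import urlsplit

# Hypothetical helpers for illustration only; not part of FilterHTML.
ALLOWED_SCHEMES = ('http', 'https', 'mailto', 'ftp')

def is_allowed_url(value, schemes=ALLOWED_SCHEMES):
    """Accept a URL only if it carries an explicitly allowed scheme."""
    scheme = urlsplit(value.strip()).scheme.lower()
    return scheme in schemes

def is_color(value):
    """Accept #rgb or #rrggbb hex colors, or a purely alphabetic color name."""
    pattern = r'#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})|[a-zA-Z]+'
    return re.fullmatch(pattern, value.strip()) is not None

def is_measurement(value):
    """Accept a non-negative number with an optional px, em, pt, or % unit."""
    pattern = r'\d+(?:\.\d+)?(?:px|em|pt|%)?'
    return re.fullmatch(pattern, value.strip()) is not None
```

Note that this sketch rejects relative URLs, because they have no scheme. Whether a real filter should accept them is a separate design decision.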
import FilterHTML

# Only allow:
#   <a> tags with valid href URLs
#   <img> tags with valid src URLs and measurements
#   <span> tags with valid color styles
whitelist = {
    'a': {
        'href': 'url',
        'target': [
            '_blank',
            '_self'
        ],
        'class': [
            'button'
        ]
    },
    'img': {
        'src': 'url',
        'width': 'measurement',
        'height': 'measurement'
    },
    'span': {
        'style': {
            'color': 'color',
            'background-color': 'color'
        }
    }
}

# Perform replacements on text (between tags)
def replace_text(text, tags):
    return text.replace('sad', '<strong>happy</strong>')

# Example untrusted input (illustrative placeholder; use your own HTML here)
unfiltered_html = '<a href="http://example.com" onclick="evil()">a sad link</a>'

# Filter unfiltered_html using the whitelist above, an explicit tuple of
# allowed URL schemes, and a text replacement function
filtered_html = FilterHTML.filter_html(unfiltered_html, whitelist, ('http', 'https', 'mailto', 'ftp'), replace_text)

# Simpler usage: filter with the default URL schemes (the same as above) and no replacement function
filtered_html = FilterHTML.filter_html(unfiltered_html, whitelist)
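To illustrate the whitelisting idea itself (only listed tags and attributes pass through), here is a minimal sketch built on the standard library's html.parser. It is not FilterHTML's implementation. The class WhitelistSketch and the function sketch_filter are hypothetical names, and the simplified whitelist shape (tag mapped to a set of attribute names) is an assumption for this example. The sketch checks attribute names only, does not validate attribute values, and does not treat void elements such as img specially.

```python
from html import escape
from html.parser import HTMLParser

class WhitelistSketch(HTMLParser):
    """Illustrative only: keeps whitelisted tags and attributes, escapes text."""

    def __init__(self, allowed):
        super().__init__(convert_charrefs=True)
        self.allowed = allowed  # {tag name: set of allowed attribute names}
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.allowed:
            return  # drop the tag itself; the text inside it is still kept
        kept = ''.join(
            ' %s="%s"' % (name, escape(value or '', quote=True))
            for name, value in attrs
            if name in self.allowed[tag]
        )
        self.out.append('<%s%s>' % (tag, kept))

    def handle_endtag(self, tag):
        if tag in self.allowed:
            self.out.append('</%s>' % tag)

    def handle_data(self, data):
        # Re-escape text so it cannot be interpreted as markup
        self.out.append(escape(data, quote=False))

def sketch_filter(html_text, allowed):
    """Return html_text with non-whitelisted tags and attributes removed."""
    parser = WhitelistSketch(allowed)
    parser.feed(html_text)
    parser.close()
    return ''.join(parser.out)
```

For example, sketch_filter('<a href="http://example.com" onclick="x()">hi</a> <b>bold</b>', {'a': {'href'}}) keeps the link and its href, drops the onclick attribute, and removes the b tags while keeping their text.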
Source distribution: FilterHTML-0.2.2.tar.gz (4.9 kB)