HTML cleaner from lxml project
lxml_html_clean
Motivation
This project was initially part of lxml. Because the HTML cleaner is designed around a blocklist, many reports of possible security vulnerabilities were filed against lxml, which made the project problematic for security-sensitive environments. Therefore, we decided to extract the problematic part into a separate project.
Important: the HTML Cleaner in lxml_html_clean is not considered appropriate for security-sensitive environments. See e.g. bleach for an alternative.
This project parses URLs with functions from Python's urllib.parse, which do not validate their inputs. For more information on potential security risks, refer to the URL parsing security documentation. A maliciously crafted URL could potentially bypass the allowed hosts check in Cleaner.
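To make the warning above concrete, here is a minimal sketch using only the standard library. It does not reproduce the check that Cleaner performs internally; the two helper functions and the example hosts are hypothetical and exist only to show how a URL that parses without error can still mislead a host comparison.

```python
from urllib.parse import urlsplit

# urlsplit() parses but does not validate: a syntactically invalid
# host name is accepted and returned as-is.
parts = urlsplit("http://not_a_valid..host/path")
print(parts.hostname)  # not_a_valid..host

# Everything before "@" in the network location is userinfo, not the host.
URL = "http://trusted.example@evil.example/path"
ALLOWED = {"trusted.example"}  # hypothetical allowlist for illustration

tricky = urlsplit(URL)
print(tricky.username)  # trusted.example
print(tricky.hostname)  # evil.example


def naive_host_allowed(url, allowed_hosts):
    """Hypothetical, deliberately flawed check: prefix match on the netloc."""
    netloc = urlsplit(url).netloc
    return any(netloc.startswith(host) for host in allowed_hosts)


def stricter_host_allowed(url, allowed_hosts):
    """Hypothetical check on the exact parsed host name.

    Stricter than the prefix match, but still not a security guarantee,
    because the parser itself does not validate the URL.
    """
    return urlsplit(url).hostname in allowed_hosts


print(naive_host_allowed(URL, ALLOWED))     # True: fooled by the userinfo part
print(stricter_host_allowed(URL, ALLOWED))  # False
```

Comparing the exact parsed host name is harder to fool than string matching on the raw URL, but neither approach replaces an allowlist-based sanitizer in security-sensitive code.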
Installation
You can install this project directly via pip install lxml_html_clean or as an extra of lxml via pip install lxml[html_clean]. Both ways install this project together with lxml itself.
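After installing, a quick smoke test can confirm that both packages are importable. The sketch below is only an illustration: the helper names is_installed and clean_fragment are hypothetical, and the snippet assumes the default Cleaner settings are acceptable for a quick check.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


def clean_fragment(html: str) -> str:
    """Clean an HTML string with a default Cleaner.

    Raises ImportError if lxml_html_clean is not installed.
    """
    from lxml_html_clean import Cleaner

    return Cleaner().clean_html(html)


# Report which of the two packages can be found in this environment.
for name in ("lxml", "lxml_html_clean"):
    print(name, "installed:", is_installed(name))

# Only call the cleaner when the package is actually available.
if is_installed("lxml_html_clean"):
    print(clean_fragment("<p>Hello<script>alert(1)</script></p>"))
```

The import of Cleaner is kept inside the function so the snippet still runs, and reports a missing package, in an environment where the installation has not succeeded.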
Security
For discussions regarding security-related issues or any sensitive reports, please contact us privately. You can reach out to lbalhar(at)redhat.com or frenzy.madness(at)gmail.com to ensure your concerns are addressed confidentially and securely.
Documentation
https://lxml-html-clean.readthedocs.io/
License
BSD-3-Clause
Hashes for lxml_html_clean-0.3.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | cc5e34412158040959c9d0b3681b3ad49276ac02eb8c576afd43e351b127b3ef
MD5 | 4c2bc324941259d73c0d8a0a25c2b584
BLAKE2b-256 | 157d52511b6d0f3e2bee4e62db69cc559ca9ffa6ec726b28dced9e786ae49c05