HTML cleaner from lxml project

Project description

lxml_html_clean

Motivation

This project was initially a part of lxml. Because the HTML cleaner is blocklist-based, many reports of possible security vulnerabilities were filed against lxml, which made the project problematic for security-sensitive environments. We therefore decided to extract the problematic part into a separate project.

Important: the HTML Cleaner in lxml_html_clean is not considered appropriate for security-sensitive environments. See e.g. bleach for an alternative.

This project uses functions from Python's urllib.parse for URL parsing, and these functions do not validate their inputs. For more information on potential security risks, refer to the URL parsing security documentation. A maliciously crafted URL could potentially bypass the allowed hosts check in Cleaner.
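
To make the risk concrete, here is a minimal sketch using only the standard library. It does not reproduce the actual check inside Cleaner; the allowlist, the two helper functions, and the example URLs are hypothetical and exist only to show how an unvalidated URL can mislead a careless host comparison.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist, for illustration only.
ALLOWED_HOSTS = {"trusted.example"}


def naive_is_allowed(url):
    """Flawed check: substring match against the raw netloc."""
    netloc = urlsplit(url).netloc
    return any(host in netloc for host in ALLOWED_HOSTS)


def stricter_is_allowed(url):
    """Compare the parsed hostname exactly and restrict the scheme."""
    parts = urlsplit(url)
    return parts.scheme in {"http", "https"} and parts.hostname in ALLOWED_HOSTS


# The part before "@" is userinfo, not the host. urlsplit accepts the URL
# without complaint; the real host is evil.example.
tricky = "https://trusted.example@evil.example/player"

print(urlsplit(tricky).netloc)      # trusted.example@evil.example
print(urlsplit(tricky).hostname)    # evil.example
print(naive_is_allowed(tricky))     # True: the substring check is fooled
print(stricter_is_allowed(tricky))  # False
```

Even the stricter variant is only a sketch: exact hostname comparison closes this particular gap, but it does not make a blocklist-based cleaner suitable for security-sensitive use.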

Installation

You can install this project directly via pip install lxml_html_clean or as an extra of lxml via pip install lxml[html_clean]. Both ways install this project together with lxml itself.
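
After either installation route, both packages should be importable. A quick way to confirm this, sketched with the standard library only (the helper name `installed` is hypothetical):

```python
import importlib.util


def installed(names):
    """Map each top-level package name to whether it can be found for import."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = installed(["lxml", "lxml_html_clean"])
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```

If either package is reported as missing, the install most likely ran in a different environment than the interpreter executing this check.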

Security

For discussions regarding security-related issues or any sensitive reports, please contact us privately. You can reach out to lbalhar(at)redhat.com or frenzy.madness(at)gmail.com to ensure your concerns are addressed confidentially and securely.

Documentation

https://lxml-html-clean.readthedocs.io/

License

BSD-3-Clause

Download files

Source Distribution

lxml_html_clean-0.4.1.tar.gz (21.4 kB)

Uploaded Source

Built Distribution

lxml_html_clean-0.4.1-py3-none-any.whl (14.1 kB)

Uploaded Python 3

File details

Details for the file lxml_html_clean-0.4.1.tar.gz.

File metadata

  • Download URL: lxml_html_clean-0.4.1.tar.gz
  • Upload date:
  • Size: 21.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.13.0

File hashes

Hashes for lxml_html_clean-0.4.1.tar.gz
Algorithm     Hash digest
SHA256        40c838bbcf1fc72ba4ce811fbb3135913017b27820d7c16e8bc412ae1d8bc00b
MD5           27a981135a8ee25ab96c9f7af49013f1
BLAKE2b-256   81f2fe319e3c5cb505a361b95d1e0d0d793fe28d4dcc2fc39d3cae9324dc4233
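
A downloaded file can be checked against the published SHA256 digest with the standard library. The following is a minimal sketch; the function names and the local file path in the usage comment are assumptions, while the digest is the one listed above for the source distribution.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for lxml_html_clean-0.4.1.tar.gz.
EXPECTED_SDIST_SHA256 = (
    "40c838bbcf1fc72ba4ce811fbb3135913017b27820d7c16e8bc412ae1d8bc00b"
)


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, expected_hex):
    """Compare a file's digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage, assuming the archive was downloaded to the current directory:
# matches_published("lxml_html_clean-0.4.1.tar.gz", EXPECTED_SDIST_SHA256)
```

A mismatch means the file differs from what was uploaded and should not be installed.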

File details

Details for the file lxml_html_clean-0.4.1-py3-none-any.whl.

File hashes

Hashes for lxml_html_clean-0.4.1-py3-none-any.whl
Algorithm     Hash digest
SHA256        b704f2757e61d793b1c08bf5ad69e4c0b68d6696f4c3c1429982caf90050bcaf
MD5           8666edcc2285b289eed1bc4f659d51e6
BLAKE2b-256   f7ba2af7a60b45bf21375e111c1e2d5d721108d06c80e3d9a3cc1d767afe1731
