HTML cleaner from lxml project

Project description

lxml_html_clean

Motivation

This project was initially a part of lxml. Because the HTML cleaner is blocklist-based by design, many reports of possible security vulnerabilities were filed against lxml, which made lxml problematic for security-sensitive environments. Therefore, we decided to extract the problematic part into a separate project.

Important: the HTML Cleaner in lxml_html_clean is not considered appropriate for security-sensitive environments. See, for example, bleach for an alternative.

This project uses functions from Python's urllib.parse for URL parsing, and these functions do not validate their inputs. For more information on potential security risks, refer to the URL parsing security documentation. A maliciously crafted URL could potentially bypass the allowed hosts check in Cleaner.
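To make the risk concrete, here is a minimal sketch of how a host check can go wrong when urllib.parse is handed a crafted URL. It does not reproduce Cleaner's internal logic; the names ALLOWED_HOSTS, naive_host_check and parsed_host_check, and the example URLs, are hypothetical and exist only for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list, for illustration only.
ALLOWED_HOSTS = {"www.youtube.com"}


def naive_host_check(url):
    """Fragile check: looks for an allowed host anywhere in the raw URL."""
    return any(host in url for host in ALLOWED_HOSTS)


def parsed_host_check(url):
    """Stricter check: compare the parsed hostname exactly.

    urlsplit() does not validate its input, so this is still a sketch,
    not a guarantee.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    return parts.hostname in ALLOWED_HOSTS


# The text before "@" is userinfo, not the host.
tricky = "https://www.youtube.com@evil.example/embed/x"

print(urlsplit(tricky).hostname)   # evil.example
print(naive_host_check(tricky))    # True  (fooled by the userinfo part)
print(parsed_host_check(tricky))   # False
```

Even the stricter variant relies on a parser that accepts malformed input without raising, which is why the cleaner should not be the only line of defence for untrusted HTML.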

Installation

You can install this project directly via pip install lxml_html_clean or as an extra of lxml via pip install lxml[html_clean]. Both ways install this project together with lxml itself.
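Once the package is installed, a basic call might look like the sketch below. The helper name clean_fragment and the sample markup are hypothetical. The helper returns None when lxml_html_clean is not installed, so the snippet runs either way; Cleaner and its scripts and javascript options come from the package itself.

```python
import importlib.util


def clean_fragment(html):
    """Return cleaned HTML, or None if lxml_html_clean is not installed."""
    if importlib.util.find_spec("lxml_html_clean") is None:
        return None
    from lxml_html_clean import Cleaner

    # Remove <script> elements and JavaScript-bearing attributes.
    cleaner = Cleaner(scripts=True, javascript=True)
    return cleaner.clean_html(html)


result = clean_fragment("<p>Hello<script>alert(1)</script></p>")
if result is None:
    print("lxml_html_clean is not installed")
else:
    print(result)
```

As noted above, the output of the cleaner should not be treated as safe in security-sensitive contexts.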

Security

For discussions regarding security-related issues or any sensitive reports, please contact us privately. You can reach out to lbalhar(at)redhat.com or frenzy.madness(at)gmail.com to ensure your concerns are addressed confidentially and securely.

Documentation

https://lxml-html-clean.readthedocs.io/

License

BSD-3-Clause

Download files

Source Distribution

lxml_html_clean-0.4.0.tar.gz (21.4 kB)

Uploaded Source

Built Distribution

lxml_html_clean-0.4.0-py3-none-any.whl (14.1 kB)

Uploaded Python 3

File details

Details for the file lxml_html_clean-0.4.0.tar.gz.

File metadata

  • Download URL: lxml_html_clean-0.4.0.tar.gz
  • Size: 21.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.13.0

File hashes

Hashes for lxml_html_clean-0.4.0.tar.gz
Algorithm Hash digest
SHA256 a8b517d3f46c19e9303eafb2a1b4b422fe724ad42ae53793637a8e5cc36ffbc1
MD5 a4aa7f2e96b593cb0965ade2ec806878
BLAKE2b-256 47093e767aa44302c8910df89e4dc862146b864edca030aeed01aeb0c1d4d80b
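
A downloaded file can be checked against the published SHA256 digest with Python's standard hashlib module. The following is a minimal sketch; the helper names sha256_hexdigest and matches_published_hash are hypothetical, and the file path is assumed to point at a local copy of the download.

```python
import hashlib


def sha256_hexdigest(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA256 digest equals the published one."""
    return sha256_hexdigest(path) == expected_hex.strip().lower()
```

For the source distribution listed here, the call would be matches_published_hash("lxml_html_clean-0.4.0.tar.gz", "a8b517d3f46c19e9303eafb2a1b4b422fe724ad42ae53793637a8e5cc36ffbc1"), assuming the archive sits in the current directory.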

File details

Details for the file lxml_html_clean-0.4.0-py3-none-any.whl.

File hashes

Hashes for lxml_html_clean-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3b5aedb6c2b4b684c0fbc8d4f1b901aae0a92c1ce525de84e71cc6dd1d9d4e3d
MD5 465e611b1118c618f54b75c9420dcc61
BLAKE2b-256 51e53d821fa25bc2afc54ba2b80f7b1e2b9c9a4665d91058fbc3b2b6ba44dfbd
