HTML cleaner from lxml project

Project description

lxml_html_clean

Motivation

This project was originally part of lxml. Because the HTML cleaner is blocklist-based by design, many reports of possible security vulnerabilities were filed against lxml, which made lxml problematic for security-sensitive environments. We therefore decided to extract the problematic part into a separate project.

Important: the HTML Cleaner in lxml_html_clean is not considered appropriate for security-sensitive environments. See, for example, bleach for an alternative.

This project parses URLs with functions from Python's urllib.parse, which do not validate their inputs. For more information on the potential security risks, refer to the URL parsing security documentation. A maliciously crafted URL could potentially bypass the allowed hosts check in Cleaner.
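
As a rough illustration of what "does not validate inputs" means, the sketch below uses only urllib.parse from the standard library. The helper host_is_allowed and the allow-list are hypothetical names made up for this example; they do not reproduce the check that Cleaner performs internally.

```python
from urllib.parse import urlsplit


def host_is_allowed(url, allowed_hosts):
    """Hypothetical helper: True if the parsed hostname is in allowed_hosts.

    This is an illustration only, not the check Cleaner performs.
    """
    # urlsplit() splits the string into components; it does not verify that
    # the result is a well-formed URL.
    hostname = urlsplit(url).hostname
    return hostname is not None and hostname in allowed_hosts


allowed = {"trusted.example"}  # assumed allow-list for this sketch

# The part before "@" is user info, so the host here is "evil.example".
print(host_is_allowed("https://trusted.example@evil.example/", allowed))

# A host containing a space is not a valid hostname, yet urlsplit() is
# expected to parse it without raising an error.
print(repr(urlsplit("http://trusted .example/").hostname))
```

Any host check built on these functions inherits this leniency, which is why the parsed result should not be treated as proof that a URL is well-formed.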

Installation

You can install this project directly via pip install lxml_html_clean or as an extra of lxml via pip install lxml[html_clean]. Both ways install this project together with lxml itself.
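
One possible way to confirm that the installation worked is sketched below. The helper html_clean_available is a hypothetical name introduced for this example. The import of Cleaner from lxml_html_clean and the sample markup are illustrative, and the exact cleaned output depends on the Cleaner defaults of the installed version.

```python
import importlib.util


def html_clean_available():
    """Hypothetical helper: True if lxml_html_clean can be imported."""
    return importlib.util.find_spec("lxml_html_clean") is not None


if html_clean_available():
    from lxml_html_clean import Cleaner

    # Default settings; the markup below is only sample input.
    cleaner = Cleaner()
    print(cleaner.clean_html("<p onclick='x()'>hi<script>alert(1)</script></p>"))
else:
    print("lxml_html_clean is not installed; run: pip install lxml_html_clean")
```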

Security

For discussions regarding security-related issues or any sensitive reports, please contact us privately. You can reach out to lbalhar(at)redhat.com or frenzy.madness(at)gmail.com to ensure your concerns are addressed confidentially and securely.

Documentation

https://lxml-html-clean.readthedocs.io/

License

BSD-3-Clause

Download files

Source Distribution

lxml_html_clean-0.4.3.tar.gz (21.5 kB)

Uploaded Source

Built Distribution

lxml_html_clean-0.4.3-py3-none-any.whl (14.2 kB)

Uploaded Python 3

File details

Details for the file lxml_html_clean-0.4.3.tar.gz.

File metadata

  • Download URL: lxml_html_clean-0.4.3.tar.gz
  • Size: 21.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for lxml_html_clean-0.4.3.tar.gz
Algorithm    Hash digest
SHA256       c9df91925b00f836c807beab127aac82575110eacff54d0a75187914f1bd9d8c
MD5          074612c15ebe88ec60e51c1784725bf7
BLAKE2b-256  d9cbc9c5bb2a9c47292e236a808dd233a03531f53b626f36259dcd32b49c76da

File details

Details for the file lxml_html_clean-0.4.3-py3-none-any.whl.

File hashes

Hashes for lxml_html_clean-0.4.3-py3-none-any.whl
Algorithm    Hash digest
SHA256       63fd7b0b9c3a2e4176611c2ca5d61c4c07ffca2de76c14059a81a2825833731e
MD5          1a7a729cd52c327b5925b59892de8601
BLAKE2b-256  104a63a9540e3ca73709f4200564a737d63a4c8c9c4dd032bab8535f507c190a
