HTML cleaner from lxml project

Project description

lxml_html_clean

Motivation

This project was initially a part of lxml. Because the HTML cleaner is blocklist-based by design, many reports of possible security vulnerabilities were filed against lxml, which made that project problematic for security-sensitive environments. Therefore, we decided to extract the problematic part into a separate project.

Important: the HTML Cleaner in lxml_html_clean is not considered appropriate for security-sensitive environments. See e.g. bleach for an alternative.

This project uses functions from Python's urllib.parse for URL parsing, which do not validate their inputs. For more information on potential security risks, refer to the URL parsing security documentation. A maliciously crafted URL could potentially bypass the allowed hosts check in Cleaner.
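
To make the caveat about unvalidated URL parsing more concrete, here is a minimal sketch that uses only the standard library. It does not reproduce the Cleaner's actual host check; the helper name describe_url and the example hostnames are hypothetical and chosen purely for illustration. It shows that urlsplit accepts these strings without raising an error, and that the host it reports may differ from what a quick glance at the URL suggests.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Hypothetical helper: return (scheme, hostname) as urllib.parse sees them."""
    parts = urlsplit(url)  # parses the string but does not validate it
    return parts.scheme, parts.hostname


# Text before "@" is treated as user info, so the host is the part after it.
print(describe_url("http://trusted.example@evil.example/x"))

# Without a scheme or a leading "//", the whole string is parsed as a path,
# so no host is reported at all.
print(describe_url("trusted.example/x"))
```

Any check that compares hosts therefore has to work with the parsed hostname and decide explicitly what to do when it is missing, rather than matching on the raw URL string.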

Installation

You can install this project directly via pip install lxml_html_clean or as an extra of lxml via pip install lxml[html_clean]. Both ways install this project together with lxml itself.
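
After installing, one quick way to confirm that both packages can be found by the interpreter is sketched below. The function name installed is made up for this example; it relies only on importlib from the standard library and does not import or exercise either package.

```python
from importlib.util import find_spec


def installed(names=("lxml", "lxml_html_clean")):
    """Map each top-level package name to True if the interpreter can locate it."""
    return {name: find_spec(name) is not None for name in names}


# Prints a dict with one True/False entry per package name.
print(installed())
```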

Security

For discussions regarding security-related issues or any sensitive reports, please contact us privately. You can reach out to lbalhar(at)redhat.com or frenzy.madness(at)gmail.com to ensure your concerns are addressed confidentially and securely.

Documentation

https://lxml-html-clean.readthedocs.io/

License

BSD-3-Clause

Download files

Source Distribution

lxml_html_clean-0.4.2.tar.gz (21.6 kB)

Built Distribution

lxml_html_clean-0.4.2-py3-none-any.whl (14.2 kB)

File details

Details for the file lxml_html_clean-0.4.2.tar.gz.

File metadata

  • Download URL: lxml_html_clean-0.4.2.tar.gz
  • Size: 21.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for lxml_html_clean-0.4.2.tar.gz
Algorithm Hash digest
SHA256 91291e7b5db95430abf461bc53440964d58e06cc468950f9e47db64976cebcb3
MD5 3d1cc621109e8d5b86e93aa1bedfefbf
BLAKE2b-256 79b6466e71db127950fb8d172026a8f0a9f0dc6f64c8e78e2ca79f252e5790b8

File details

Details for the file lxml_html_clean-0.4.2-py3-none-any.whl.

File hashes

Hashes for lxml_html_clean-0.4.2-py3-none-any.whl
Algorithm Hash digest
SHA256 74ccfba277adcfea87a1e9294f47dd86b05d65b4da7c5b07966e3d5f3be8a505
MD5 936964822602b5a6bb122a68c0a44291
BLAKE2b-256 4e0b942cb7278d6caad79343ad2ddd636ed204a47909b969d19114a3097f5aa3
