A simple HTML cleaner built with BeautifulSoup. It removes unwanted tags such as styles, scripts, and iframes from HTML, which is useful for web scraping, data extraction, or sanitizing HTML before processing.

html-scrubber

html-scrubber is a simple Python library built with BeautifulSoup that allows you to clean HTML content by removing unwanted tags such as styles, scripts, iframes, and more. It's ideal for web scraping, data extraction, or sanitizing HTML content before further processing.

Features

  • Clean HTML content by removing unnecessary elements
  • Supports removal of <style>, <script>, <iframe>, <svg>, <meta>, and <noscript> tags
  • Easily extendable with custom tags to remove
  • Option to return the cleaned HTML as a BeautifulSoup object or a plain string
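The features above can be illustrated with plain BeautifulSoup. The sketch below is not html-scrubber's implementation; it only shows one way tag removal, extra custom tags, and the string-or-object return option could work. The names `strip_tags`, `DEFAULT_TAGS`, `extra_tags`, and `as_string` are hypothetical and are not part of the library's API.

```python
from bs4 import BeautifulSoup

# Tag names taken from the Features list above.
DEFAULT_TAGS = ("style", "script", "iframe", "svg", "meta", "noscript")


def strip_tags(html, tags=DEFAULT_TAGS, extra_tags=(), as_string=True):
    """Hypothetical helper: remove every element whose tag name is listed."""
    soup = BeautifulSoup(html, "html.parser")
    names = list(tags) + list(extra_tags)
    if names:
        # Remove one match at a time so that an element nested inside an
        # already removed element is never touched twice.
        element = soup.find(names)
        while element is not None:
            element.decompose()  # drops the element and its children
            element = soup.find(names)
    # Return either the serialized HTML or the BeautifulSoup object itself.
    return str(soup) if as_string else soup


sample = "<div><script>alert(1)</script><p>Test content</p><form></form></div>"
cleaned = strip_tags(sample, extra_tags=("form",))
print(cleaned)
```

Passing `as_string=False` in this sketch returns the parsed tree, which is convenient when further BeautifulSoup queries follow the cleaning step.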

Installation

Install html-scrubber with pip:

pip install html-scrubber

Usage

The following example cleans an HTML string with html-scrubber:

from html_scrubber import clean_html

raw_html = """
<html>
    <head><style>body {color: red;}</style></head>
    <body>
        <script>alert("test");</script>
        <div>Test content</div>
        <iframe src="https://example.com"></iframe>
    </body>
</html>
"""

# Clean the HTML content by removing style, script, and iframe tags
cleaned_html = clean_html(
    html_content=raw_html,
    remove_styles=True,
    remove_scripts=True,
    remove_iframes=True,
    return_as_string=True,
)

# Show the cleaned HTML string
print(cleaned_html)
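After cleaning, it can be worth confirming that none of the targeted tags remain. The check below is a minimal sketch that uses BeautifulSoup directly; `leftover_tags` is a hypothetical helper, not part of html-scrubber, and the `cleaned_html` string here is a hand-written stand-in rather than actual library output.

```python
from bs4 import BeautifulSoup


def leftover_tags(html, tag_names):
    """Hypothetical check: return the sorted names of listed tags still present."""
    soup = BeautifulSoup(html, "html.parser")
    return sorted({element.name for element in soup.find_all(list(tag_names))})


# Hand-written stand-in for the string produced by the example above.
cleaned_html = "<html><head></head><body><div>Test content</div></body></html>"
remaining = leftover_tags(cleaned_html, ["style", "script", "iframe"])
print(remaining)
```

An empty list means none of the listed tags were found; any names in the list point to tags that still need removing.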

Download files

Source Distribution

html_scrubber-0.0.3.tar.gz (3.9 kB view details)

Uploaded Source

Built Distribution

html_scrubber-0.0.3-py3-none-any.whl (3.7 kB view details)

Uploaded Python 3

File details

Details for the file html_scrubber-0.0.3.tar.gz.

File metadata

  • Download URL: html_scrubber-0.0.3.tar.gz
  • Size: 3.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for html_scrubber-0.0.3.tar.gz
Algorithm    Hash digest
SHA256       46d86d8e097765efce557a6c8d16127c193e6546309467dd9504096036221c07
MD5          34957601bd7d2340da3360c0a57142c8
BLAKE2b-256  c310a2b98ee547739af32a241bcfcb5bb424a80d32b978db9525f80b26f61e57

File details

Details for the file html_scrubber-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: html_scrubber-0.0.3-py3-none-any.whl
  • Size: 3.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for html_scrubber-0.0.3-py3-none-any.whl
Algorithm    Hash digest
SHA256       10064da695f495a3945c700bddb6570329a3ffdcef55aefd698d49805bf7348f
MD5          f0b3820867a7639054307faa7a716c4c
BLAKE2b-256  cf211d0b1129014640382c1c83ea98446965932036b4078dfe576b255136ea1f
