
Democritus Html

The Democritus Project uses semver version 2.0.0 and uses ruff to format and lint code. License: LGPL v3

Democritus functions[1] for working with HTML.

[1] Democritus functions are simple, effective, modular, well-tested, and well-documented Python functions.

We use d8s (pronounced "dee-eights") as an abbreviation for democritus.

Installation

pip install d8s-html

Usage

Import the library like this:

from d8s_html import *

Once imported, you can use any of the functions listed below.

Functions

  • def html_text(html_content: StringOrBeautifulSoupObject) -> str:
        """."""
    
  • def html_unescape(html_content: StringOrBeautifulSoupObject) -> str:
        """."""
    
  • def html_escape(html_content: StringOrBeautifulSoupObject) -> str:
        """."""
    
  • def html_to_markdown(html_content: StringOrBeautifulSoupObject, **kwargs) -> str:
        """Convert the html string to markdown."""
    
  • def html_find_comments(html_content: StringOrBeautifulSoupObject) -> str:
        """Get a list of all of the comments in the html strings."""
    
  • def html_soupify(html_string: str, parser: str = 'html.parser') -> bs4.BeautifulSoup:
        """Return an instance of beautifulsoup with the html."""
    
  • def html_remove_tags(html_content: StringOrBeautifulSoupObject) -> bs4.BeautifulSoup:
        """."""
    
  • def html_remove_element(html_content: StringOrBeautifulSoupObject, element_tag: str) -> bs4.BeautifulSoup:
        """."""
    
  • def html_find_css_path(html_content: StringOrBeautifulSoupObject, css_path: str) -> ListOfBeautifulSoupTags:
        """Find the given css_path in the html_content."""
    
  • def html_elements_with_class(
        html_content: StringOrBeautifulSoupObject, html_element_class: str
    ) -> ListOfBeautifulSoupTags:
        """Find all elements with the given class from the html string."""
    
  • def html_elements_with_id(html_content: StringOrBeautifulSoupObject, html_element_id: str) -> ListOfBeautifulSoupTags:
        """Find all elements with the given html_element_id from the html_content."""
    
  • def html_elements_with_tag(html_content: StringOrBeautifulSoupObject, tag: str) -> ListOfBeautifulSoupTags:
        """."""
    
  • def html_headings_table_of_contents(html_content: StringOrBeautifulSoupObject) -> ListOfBeautifulSoupTags:
        """."""
    
  • def html_headings_table_of_contents_string(
        html_content: StringOrBeautifulSoupObject, *, indentation: str = '  '
    ) -> str:
        """."""
    
  • def html_headings(html_content: StringOrBeautifulSoupObject) -> ListOfBeautifulSoupTags:
        """."""
    
  • def html_to_json(html_content: StringOrBeautifulSoupObject, *, convert_only_tables: bool = False):
        """Convert the html to json using https://gitlab.com/fhightower/html-to-json."""
    
  • def html_soupify_first_arg_string(func):
        """Return a Beautiful Soup instance of the first argument (if it is a string)."""
    

Development

👋  If you want to get involved in this project, we have some short, helpful guides below:

If you have any questions or there is anything we did not cover, please raise an issue and we'll be happy to help.

Credits

This package was created with Cookiecutter and Floyd Hightower's Python project template.
