Democritus functions for working with HTML.

Democritus Html

License: LGPL v3. The Democritus Project uses Semantic Versioning 2.0.0 and formats its code with black.

Democritus functions[1] for working with HTML.

[1] Democritus functions are simple, effective, modular, well-tested, and well-documented Python functions.

We use d8s (pronounced "dee-eights") as an abbreviation for Democritus.

Installation

pip install d8s-html
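
To confirm the install worked, here is a minimal sketch that uses only the standard library's importlib.metadata (available since Python 3.8). The helper name installed_version is hypothetical and is not part of d8s_html:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "d8s-html") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None
```

Calling installed_version() returns the version string when d8s-html is installed and None otherwise.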

Usage

Import the library like this:

from d8s_html import *

Once imported, you can use any of the functions listed below.

Functions

  • def html_text(html_content: StringOrBeautifulSoupObject) -> str:
        """."""
    
  • def html_unescape(html_content: StringOrBeautifulSoupObject) -> str:
        """."""
    
  • def html_escape(html_content: StringOrBeautifulSoupObject) -> str:
        """."""
    
  • def html_to_markdown(html_content: StringOrBeautifulSoupObject, **kwargs) -> str:
        """Convert the html string to markdown."""
    
  • def html_find_comments(html_content: StringOrBeautifulSoupObject) -> str:
        """Get a list of all of the comments in the html strings."""
    
  • def html_soupify(html_string: str, parser: str = 'html.parser') -> bs4.BeautifulSoup:
        """Return an instance of beautifulsoup with the html."""
    
  • def html_remove_tags(html_content: StringOrBeautifulSoupObject) -> bs4.BeautifulSoup:
        """."""
    
  • def html_remove_element(html_content: StringOrBeautifulSoupObject, element_tag: str) -> bs4.BeautifulSoup:
        """."""
    
  • def html_find_css_path(html_content: StringOrBeautifulSoupObject, css_path: str) -> ListOfBeautifulSoupTags:
        """Find the given css_path in the html_content."""
    
  • def html_elements_with_class(
        html_content: StringOrBeautifulSoupObject, html_element_class: str
    ) -> ListOfBeautifulSoupTags:
        """Find all elements with the given class from the html string."""
    
  • def html_elements_with_id(html_content: StringOrBeautifulSoupObject, html_element_id: str) -> ListOfBeautifulSoupTags:
        """Find all elements with the given html_element_id from the html_content."""
    
  • def html_elements_with_tag(html_content: StringOrBeautifulSoupObject, tag: str) -> ListOfBeautifulSoupTags:
        """."""
    
  • def html_headings_table_of_contents(html_content: StringOrBeautifulSoupObject) -> ListOfBeautifulSoupTags:
        """."""
    
  • def html_headings_table_of_contents_string(
        html_content: StringOrBeautifulSoupObject, *, indentation: str = '  '
    ) -> str:
        """."""
    
  • def html_headings(html_content: StringOrBeautifulSoupObject) -> ListOfBeautifulSoupTags:
        """."""
    
  • def html_to_json(html_content: StringOrBeautifulSoupObject, *, convert_only_tables: bool = False):
        """Convert the html to json using https://gitlab.com/fhightower/html-to-json."""
    
  • def html_soupify_first_arg_string(func):
        """Return a Beautiful Soup instance of the first argument (if it is a string)."""
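
Most functions above accept a StringOrBeautifulSoupObject, and html_soupify_first_arg_string is described as returning a Beautiful Soup instance of the first argument when it is a string. The following is a minimal, dependency-free sketch of that decorator pattern, assuming it is used so that one function body can accept either a raw string or an already-parsed object. It is not the library's implementation: ParsedHTML, parse_html, soupify_first_arg_string, and text_of are hypothetical names, and ParsedHTML only stands in for bs4.BeautifulSoup.

```python
import functools
from html.parser import HTMLParser
from typing import Any, Callable, List, Union


class ParsedHTML:
    """Hypothetical stand-in for bs4.BeautifulSoup that keeps only text nodes."""

    def __init__(self, text_chunks: List[str]) -> None:
        self.text_chunks = text_chunks

    def get_text(self) -> str:
        return "".join(self.text_chunks)


class _TextCollector(HTMLParser):
    """Collects character data; entities such as &amp; are decoded by default."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def parse_html(html_string: str) -> ParsedHTML:
    collector = _TextCollector()
    collector.feed(html_string)
    collector.close()  # flush any buffered trailing text
    return ParsedHTML(collector.chunks)


def soupify_first_arg_string(func: Callable[..., Any]) -> Callable[..., Any]:
    """Parse the first positional argument if it is a str; pass anything else through."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        if args and isinstance(args[0], str):
            args = (parse_html(args[0]),) + args[1:]
        return func(*args, **kwargs)

    return wrapper


@soupify_first_arg_string
def text_of(html_content: Union[str, ParsedHTML]) -> str:
    # By the time this body runs, html_content is always a ParsedHTML.
    return html_content.get_text()
```

For example, text_of("<p>Fish &amp; <b>chips</b></p>") and text_of(parse_html("<p>Fish &amp; <b>chips</b></p>")) both return "Fish & chips". Putting the conversion in a decorator keeps each function body free of repeated isinstance checks.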
    
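html_headings_table_of_contents_string takes a keyword-only indentation argument that defaults to two spaces. This page does not document its exact output format. The sketch below assumes one line per heading, indented once for each level below h1, and uses the standard library's html.parser instead of Beautiful Soup. The names headings_toc_string and _HeadingCollector are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class _HeadingCollector(HTMLParser):
    """Records (level, text) for every <h1>..<h6> element in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.headings: List[Tuple[int, str]] = []
        self._level: Optional[int] = None
        self._buffer: List[str] = []

    def handle_starttag(self, tag: str, attrs: list) -> None:
        # html.parser passes tag names in lowercase.
        if tag in HEADING_TAGS:
            self._level = int(tag[1])
            self._buffer = []

    def handle_endtag(self, tag: str) -> None:
        if tag in HEADING_TAGS and self._level is not None:
            self.headings.append((self._level, "".join(self._buffer).strip()))
            self._level = None

    def handle_data(self, data: str) -> None:
        # Only keep text that appears inside a heading.
        if self._level is not None:
            self._buffer.append(data)


def headings_toc_string(html_string: str, *, indentation: str = "  ") -> str:
    """Hypothetical sketch: one line per heading, indented by (level - 1) units."""
    collector = _HeadingCollector()
    collector.feed(html_string)
    collector.close()
    return "\n".join(
        indentation * (level - 1) + text for level, text in collector.headings
    )
```

With this sketch, headings_toc_string("<h1>Intro</h1><p>x</p><h2>Setup</h2><h3>Install</h3>") returns "Intro\n  Setup\n    Install", and passing indentation="-" changes only the indent unit.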

Development

👋 If you want to get involved in this project, we have some short, helpful guides.

If you have any questions or there is anything we did not cover, please raise an issue and we'll be happy to help.

Credits

This package was created with Cookiecutter and Floyd Hightower's Python project template.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

d8s_html-0.6.0.tar.gz (27.8 kB)

Built Distribution

d8s_html-0.6.0-py2.py3-none-any.whl (23.2 kB, Python 2 and Python 3)

File details

Details for the file d8s_html-0.6.0.tar.gz.

File metadata

  • Download URL: d8s_html-0.6.0.tar.gz
  • Size: 27.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.5

File hashes

Hashes for d8s_html-0.6.0.tar.gz
  • SHA256: 18d1535404f008a1d88ef7a7bffa4f9eff85c541bc109a5903d38a640620ed89
  • MD5: 41d784e44b78468fe77132c1a40b9540
  • BLAKE2b-256: 1d547ffafdc9ada6db7a58b1110821b1eec4e400d93d27e7e4a1c90422945cd7

File details

Details for the file d8s_html-0.6.0-py2.py3-none-any.whl.

File metadata

  • Download URL: d8s_html-0.6.0-py2.py3-none-any.whl
  • Size: 23.2 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.5

File hashes

Hashes for d8s_html-0.6.0-py2.py3-none-any.whl
  • SHA256: ec0097174dc2cccc6bd15d16e05efdc6f8f58b24681304eec46664d2f6cd1104
  • MD5: f15f1435d9189e2e9453d21e3148afc6
  • BLAKE2b-256: d17f7d50171688f4fbcb7e402cf6cfa6a33f05cc1638d42775fd220119819ffd
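
To check a downloaded file against the SHA256 digests listed above, here is a minimal sketch using hashlib. The names sha256_of, verify_download, and EXPECTED_SHA256 are hypothetical. If you prefer to let pip enforce digests during installation, its hash-checking mode (--hash entries in a requirements file, together with --require-hashes) covers the same need.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the "File hashes" sections of this page.
EXPECTED_SHA256 = {
    "d8s_html-0.6.0.tar.gz": "18d1535404f008a1d88ef7a7bffa4f9eff85c541bc109a5903d38a640620ed89",
    "d8s_html-0.6.0-py2.py3-none-any.whl": "ec0097174dc2cccc6bd15d16e05efdc6f8f58b24681304eec46664d2f6cd1104",
}


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so large downloads are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path) -> bool:
    """Return True if the file's SHA256 matches the published digest for its name."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected
```

verify_download(Path("d8s_html-0.6.0.tar.gz")) returns True only if the local file is byte-for-byte identical to the published archive.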
