
Split Markup Language (HTML and Markdown) to Groups and Nodes

SplitML

Install

pip install --upgrade splitml

Usage

from pathlib import Path

from splitml import HTMLSplitter, NodesGrouper, stat_tokens

# Take the first two sample files matching *.md.html next to this script
html_root = Path(__file__).parent / "samples"
html_paths = sorted(html_root.glob("*.md.html"))[:2]

splitter = HTMLSplitter()
grouper = NodesGrouper()

for html_path in html_paths:
    print(f"> Processing: {html_path}")
    # Split the HTML document into nodes and report their token statistics
    nodes = splitter.split_html_file(html_path)
    print(f"  - {len(nodes)} doc nodes.")
    stat_tokens(nodes)
    # Group the nodes and report token statistics for the groups
    grouped_nodes = grouper.group_nodes(nodes)
    print(f"  - {len(grouped_nodes)} doc groups.")
    stat_tokens(grouped_nodes)

Classes

class HTMLSplitter:
    ...

class NodesGrouper:
    ...
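To make the idea of "nodes" and "groups" concrete, here is a minimal, standard-library-only sketch of one way to split HTML into text nodes and then pack those nodes into token-bounded groups. It is a hypothetical illustration, not the implementation of `HTMLSplitter` or `NodesGrouper`: the names `ToyHTMLSplitter`, `split_html`, `toy_count_tokens`, and `group_nodes_by_budget`, the set of block tags, the whitespace token count, and the greedy `max_tokens` budget are all assumptions.

```python
from html.parser import HTMLParser
from typing import List

# Hypothetical set of tags treated as node boundaries (an assumption, not splitml's rule)
BLOCK_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "li", "pre", "blockquote"}


class ToyHTMLSplitter(HTMLParser):
    """Collect the text of each block-level element as one node (illustrative only)."""

    def __init__(self) -> None:
        super().__init__()
        self.nodes: List[str] = []
        self._text_parts: List[str] = []

    def emit_node(self) -> None:
        # Emit buffered text as a node, collapsing runs of whitespace
        text = " ".join("".join(self._text_parts).split())
        if text:
            self.nodes.append(text)
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.emit_node()

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.emit_node()

    def handle_data(self, data):
        # Inline tags (e.g. <b>) are ignored, so their text joins the current node
        self._text_parts.append(data)


def split_html(html_str: str) -> List[str]:
    """Split an HTML string into a flat list of text nodes."""
    parser = ToyHTMLSplitter()
    parser.feed(html_str)
    parser.close()      # process any remaining buffered input
    parser.emit_node()  # emit trailing text not closed by a block tag
    return parser.nodes


def toy_count_tokens(text: str) -> int:
    # Whitespace tokens as a stand-in; splitml's count_tokens may use another tokenizer
    return len(text.split())


def group_nodes_by_budget(nodes: List[str], max_tokens: int = 50) -> List[List[str]]:
    """Greedily pack consecutive nodes into groups of at most max_tokens tokens.

    A single node larger than the budget becomes a group on its own.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    current_tokens = 0
    for node in nodes:
        n = toy_count_tokens(node)
        # Start a new group when adding this node would exceed the budget
        if current and current_tokens + n > max_tokens:
            groups.append(current)
            current, current_tokens = [], 0
        current.append(node)
        current_tokens += n
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    html = ("<h1>Intro</h1><p>SplitML splits markup.</p>"
            "<ul><li>One item</li><li>Two items here</li></ul>")
    nodes = split_html(html)
    print(nodes)  # ['Intro', 'SplitML splits markup.', 'One item', 'Two items here']
    print(group_nodes_by_budget(nodes, max_tokens=5))
    # [['Intro', 'SplitML splits markup.'], ['One item', 'Two items here']]
```

Greedy packing keeps nodes in document order and never splits a node, which is one simple policy among many; the real grouper may use different boundaries or limits.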

Functions

from pathlib import Path
from typing import Union

def split_html_str(html_str: str):
    ...

def split_html_file(html_path: Union[Path, str]):
    ...

def chunk_html_str(html_str: str):
    ...

def chunk_html_file(html_path: Union[Path, str]):
    ...

def count_tokens(text):
    ...

def stat_tokens(nodes, console_abnormal=False):
    ...
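`stat_tokens` reports token statistics for a list of nodes, but its exact output is not documented here. As a rough illustration of what such a summary can contain, the sketch below computes the count, total, minimum, maximum, and mean token counts over plain strings. `TokenStats` and `summarize_token_counts` are hypothetical names, the default whitespace counter is a stand-in tokenizer, and splitml's nodes may be richer objects than strings.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class TokenStats:
    count: int
    total: int
    minimum: int
    maximum: int
    average: float


def summarize_token_counts(
    texts: Sequence[str],
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> TokenStats:
    """Summarize per-text token counts (hypothetical helper, not splitml's stat_tokens).

    The default counter splits on whitespace; pass another callable to use
    a real tokenizer.
    """
    if not texts:
        return TokenStats(count=0, total=0, minimum=0, maximum=0, average=0.0)
    counts = [count_tokens(t) for t in texts]
    return TokenStats(
        count=len(counts),
        total=sum(counts),
        minimum=min(counts),
        maximum=max(counts),
        average=sum(counts) / len(counts),
    )
```

To get numbers comparable with the package's own, one could pass splitml's `count_tokens` as the counter, assuming it is importable and returns an integer for a string.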


Download files


Source Distribution

splitml-0.5.3.tar.gz (6.7 kB, Source)

Built Distribution

splitml-0.5.3-py3-none-any.whl (8.5 kB, Python 3)

File details

Details for the file splitml-0.5.3.tar.gz.

File metadata

  • Download URL: splitml-0.5.3.tar.gz
  • Upload date:
  • Size: 6.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.9.19

File hashes

Hashes for splitml-0.5.3.tar.gz
Algorithm     Hash digest
SHA256        dad9ec7d99988801657121478a16311a754e18fa0af5ec0af7225fdb548b4656
MD5           c8aa8ba10028b7670488f3850f4ac230
BLAKE2b-256   0c751412b1e407a40458a51ad03c77188b6ae57c1082133bf3cf0a222cfac53d
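A downloaded archive can be checked against the SHA256 digest listed above. The sketch below uses only `hashlib` and reads the file in chunks so large files do not need to fit in memory; the helper names `sha256_of_file` and `verify_sha256` are hypothetical, and the example assumes the archive was saved to the current directory.

```python
import hashlib
from pathlib import Path
from typing import Union

# SHA256 digest listed above for splitml-0.5.3.tar.gz
EXPECTED_SDIST_SHA256 = "dad9ec7d99988801657121478a16311a754e18fa0af5ec0af7225fdb548b4656"


def sha256_of_file(path: Union[Path, str], chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it chunk by chunk."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at end of file
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: Union[Path, str], expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published one (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Assumes the sdist was downloaded to the current directory
    sdist = Path("splitml-0.5.3.tar.gz")
    if sdist.exists():
        print("OK" if verify_sha256(sdist, EXPECTED_SDIST_SHA256) else "MISMATCH")
```

On Python 3.11 and later, `hashlib.file_digest(f, "sha256")` performs the chunked reading itself. pip can also enforce digests at install time through its hash-checking mode (`--require-hashes`).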


File details

Details for the file splitml-0.5.3-py3-none-any.whl.

File metadata

  • Download URL: splitml-0.5.3-py3-none-any.whl
  • Upload date:
  • Size: 8.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.9.19

File hashes

Hashes for splitml-0.5.3-py3-none-any.whl
Algorithm     Hash digest
SHA256        85e20896837a369857251f54f0df9e2fe40236ff0028e04449e4359eeeac8680
MD5           6c7485c0af5effdaa16c9ff48fad7510
BLAKE2b-256   989f8594893ca4856f8b6cd3c9a720abe959f196c6bec1623714915a4616fba5

