fast-html2md

Convert HTML to Markdown for LLM input extraction.

Installation

# use pip
pip install fast-html2md

# or use poetry
poetry add fast-html2md

# or use uv
uv add fast-html2md

Usage

from fast_html2md import HTMLToMarkdown

converter = HTMLToMarkdown()

html = """
<!DOCTYPE html>
<html>
<body>
  <h1 id="title" data-updated="20201101">Hi there</h1>
  <div class="post">
    Lorem Ipsum is simply dummy text of the printing and typesetting industry.
  </div>
  <div class="post">
    Lorem ipsum dolor sit amet, consectetur adipiscing elit.
  </div>
</body>
</html>
"""

markdown = converter.convert(html)

print(markdown)

# Count tokens
token_count = converter.count_tokens(markdown)
print(f"Token count: {token_count}")

# Compute cost
cost = converter.compute_cost(token_count)
print(f"Estimated cost: ${cost:.6f}")
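
How compute_cost prices tokens is not documented here. The following is a minimal sketch of the usual arithmetic, assuming a flat per-million-token rate. The function name estimate_cost and the 2.50 USD rate are hypothetical and not part of fast-html2md.

```python
def estimate_cost(token_count: int, usd_per_million_tokens: float) -> float:
    """Return an estimated input cost in USD for a given token count.

    Assumption: pricing is a flat rate per one million tokens.
    """
    if token_count < 0:
        raise ValueError("token_count must be non-negative")
    return token_count / 1_000_000 * usd_per_million_tokens


# Hypothetical rate: 2.50 USD per million input tokens.
print(f"Estimated cost: ${estimate_cost(1_200, 2.50):.6f}")
```

Check your provider's current price list before relying on any such estimate.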

Features

  • Fast HTML to Markdown conversion
  • Optimized for LLM input processing
  • Built-in token counting using tiktoken
  • Clean and minimal output
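
To illustrate why Markdown tends to be a leaner LLM input than raw HTML, here is a rough standard-library sketch that keeps headings and text and drops tags and attributes such as id, class, and data-*. It is not how fast-html2md is implemented. MinimalMarkdownParser and to_markdown are hypothetical names used only for this example.

```python
from html.parser import HTMLParser


class MinimalMarkdownParser(HTMLParser):
    """Illustrative only: maps <h1>..<h6> to '#' headings and keeps text."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks = []   # one entry per non-empty text run
        self.prefix = ""   # pending heading marker, e.g. "# "

    def handle_starttag(self, tag, attrs):
        # Attributes are ignored on purpose; only heading level is kept.
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.prefix = "#" * int(tag[1]) + " "

    def handle_endtag(self, tag):
        # Drop a pending marker if the heading turned out to be empty.
        self.prefix = ""

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text:
            self.blocks.append(self.prefix + text)
            self.prefix = ""


def to_markdown(html: str) -> str:
    """Convert a small HTML snippet to Markdown-like text."""
    parser = MinimalMarkdownParser()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks)


html = '<h1 id="title" data-updated="20201101">Hi there</h1><div class="post">Lorem ipsum.</div>'
print(to_markdown(html))
```

The sketch ignores links, lists, inline formatting, and script or style content, all of which a real converter has to handle.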

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
