Convert HTML to Markdown for LLM input extraction

Project description

fast-html2md

Installation

# use pip
pip install fast-html2md

# or use poetry
poetry add fast-html2md

# or use uv
uv add fast-html2md
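
To confirm that one of the commands above actually installed the package, a small check with the standard library can help. This is a minimal sketch: the helper name `installed_version` is hypothetical and not part of fast-html2md; only the distribution name `fast-html2md` comes from the commands above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; not provided by fast-html2md.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("fast-html2md")
    print(found if found else "fast-html2md is not installed")
```

The lookup uses the distribution name (with a hyphen), not the import name `fast_html2md`.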

Usage

from fast_html2md import HTMLToMarkdown

converter = HTMLToMarkdown()

html = """
<!DOCTYPE html>
<html>
<body>
  <h1 id="title" data-updated="20201101">Hi there</h1>
  <div class="post">
    Lorem Ipsum is simply dummy text of the printing and typesetting industry.
  </div>
  <div class="post">
    Lorem ipsum dolor sit amet, consectetur adipiscing elit.
  </div>
</body>
</html>
"""

# Convert the HTML document to Markdown
markdown = converter.convert(html)

print(markdown)

# Count tokens
token_count = converter.count_tokens(markdown)
print(f"Token count: {token_count}")

# Compute cost
cost = converter.compute_cost(token_count)
print(f"Estimated cost: ${cost:.6f}")
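
The README does not say which pricing `compute_cost` applies. As a rough illustration of the arithmetic behind a token-based estimate, the sketch below assumes a flat price per million input tokens. The function `estimate_cost` and the example rate are assumptions for illustration only, not the library's implementation or any provider's real pricing.

```python
def estimate_cost(token_count: int, usd_per_million_tokens: float) -> float:
    """Estimate the input cost in USD for a given number of tokens.

    Hypothetical helper: assumes a flat rate per one million tokens.
    """
    if token_count < 0:
        raise ValueError("token_count must be non-negative")
    return token_count / 1_000_000 * usd_per_million_tokens


if __name__ == "__main__":
    # 1,500 tokens at an assumed (placeholder) rate of 2.00 USD per million tokens
    cost = estimate_cost(1_500, 2.00)
    print(f"Estimated cost: ${cost:.6f}")
```

Since the estimate scales linearly with the token count, trimming markup before sending a page to a model reduces the cost proportionally.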

Features

  • Fast HTML to Markdown conversion
  • Optimized for LLM input processing
  • Built-in token counting using tiktoken
  • Clean and minimal output
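
To make "clean and minimal output" concrete, here is a toy converter built only on Python's standard `html.parser`. It is a sketch of the general idea (drop attributes, keep text, map headings to `#` markers), not how fast-html2md works internally; the names `TinyMarkdownSketch` and `html_to_markdown_sketch` are hypothetical.

```python
from html.parser import HTMLParser


class TinyMarkdownSketch(HTMLParser):
    """Toy converter: headings become '#' lines, other blocks become paragraphs.

    Tag attributes (id, class, data-*) are ignored, so they never reach the output.
    """

    HEADINGS = {"h1": "#", "h2": "##", "h3": "###"}
    BLOCKS = {"p", "div"}

    def __init__(self):
        super().__init__()
        self.blocks = []   # finished Markdown blocks
        self.buffer = []   # text fragments of the block being read
        self.prefix = ""   # heading marker for the current block, if any

    def _flush(self):
        # Collapse runs of whitespace and emit the block if it has any text.
        text = " ".join(" ".join(self.buffer).split())
        if text:
            self.blocks.append(f"{self.prefix} {text}" if self.prefix else text)
        self.buffer = []
        self.prefix = ""

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS or tag in self.BLOCKS:
            self._flush()
            self.prefix = self.HEADINGS.get(tag, "")

    def handle_endtag(self, tag):
        if tag in self.HEADINGS or tag in self.BLOCKS:
            self._flush()

    def handle_data(self, data):
        self.buffer.append(data)


def html_to_markdown_sketch(html):
    """Convert a small HTML snippet to Markdown-like text (illustration only)."""
    parser = TinyMarkdownSketch()
    parser.feed(html)
    parser.close()
    parser._flush()  # emit any trailing text outside a block element
    return "\n\n".join(parser.blocks)


if __name__ == "__main__":
    sample = '<h1 id="title" data-updated="20201101">Hi there</h1><div class="post">Lorem ipsum.</div>'
    print(html_to_markdown_sketch(sample))
```

A real converter also has to handle links, lists, tables, inline emphasis, and nested blocks, which this sketch deliberately leaves out.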

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Download files

Source Distribution

fast_html2md-0.1.3.tar.gz (34.5 kB view details)

Uploaded Source

Built Distribution

fast_html2md-0.1.3-py3-none-any.whl (6.3 kB view details)

Uploaded Python 3

File details

Details for the file fast_html2md-0.1.3.tar.gz.

File metadata

  • Download URL: fast_html2md-0.1.3.tar.gz
  • Size: 34.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.0

File hashes

Hashes for fast_html2md-0.1.3.tar.gz
Algorithm Hash digest
SHA256 b72ac7a75f2f4f3748ccf95799473cb155f3b852382f6885382e0365fa954134
MD5 389c7b077d4ced2ae7a92667fb1e1f1e
BLAKE2b-256 3d8e6107a44ac1c93e19d5485d4736d64f9fc18ee1cf0b6717340c1401fe44db

File details

Details for the file fast_html2md-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: fast_html2md-0.1.3-py3-none-any.whl
  • Size: 6.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.0

File hashes

Hashes for fast_html2md-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 695615e3a808696ca7c864e57a64f5e6983848349cd6de16c2c10665ae10e0cf
MD5 92636446f4b329f993398b9965dbee9a
BLAKE2b-256 b564c3d2de9b9c7b365e64cc5de608eee6eeedbaf24af6a3d002c0f7649d25fc
