Convert HTML to Markdown for LLM input extraction
fast-html2md

Installation
# use pip
pip install fast-html2md
# or use poetry
poetry add fast-html2md
# or use uv
uv add fast-html2md
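To confirm that the install succeeded, one option is to query the installed distribution's metadata from Python. This is a minimal sketch using only the standard library; the helper name `installed_version` is made up for illustration and is not part of fast-html2md.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the version string if fast-html2md is installed, otherwise None.
print(installed_version("fast-html2md"))
```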
Usage
from fast_html2md import HTMLToMarkdown
converter = HTMLToMarkdown()
html = """
<!DOCTYPE html>
<html>
<body>
<h1 id="title" data-updated="20201101">Hi there</h1>
<div class="post">
Lorem Ipsum is simply dummy text of the printing and typesetting industry.
</div>
<div class="post">
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
</div>
</body>
</html>
"""
markdown = converter.convert(html)
print(markdown)
# Count tokens
token_count = converter.count_tokens(markdown)
print(f"Token count: {token_count}")
# Compute cost
cost = converter.compute_cost(token_count)
print(f"Estimated cost: ${cost:.6f}")
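The pricing that `compute_cost` applies is not documented here. As a rough illustration of the arithmetic behind a token-based estimate, the sketch below multiplies a token count by a per-million-token rate. The function `estimate_cost` and the rate are hypothetical and only show the calculation; they are not the library's implementation.

```python
def estimate_cost(token_count: int, usd_per_million_tokens: float) -> float:
    """Return an estimated input cost in USD for a given token count."""
    if token_count < 0:
        raise ValueError("token_count must be non-negative")
    return token_count / 1_000_000 * usd_per_million_tokens


# Hypothetical rate for illustration only; check your provider's price list.
EXAMPLE_RATE_USD_PER_MILLION = 2.50

example_cost = estimate_cost(1_200, EXAMPLE_RATE_USD_PER_MILLION)
print(f"Estimated cost: ${example_cost:.6f}")
```

Keeping the rate as an explicit parameter makes it easy to compare models with different prices.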
Features
- Fast HTML to Markdown conversion
- Optimized for LLM input processing
- Built-in token counting using tiktoken
- Clean and minimal output
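To give a sense of what "clean and minimal output" can mean for LLM input, here is a toy converter built on the standard library's `html.parser`. It is a sketch under simplifying assumptions, not how fast-html2md works internally: it drops all attributes, skips `script`, `style`, and `head` content, turns `h1` to `h3` into `#` headings, and emits every other text run as its own paragraph. The names `MinimalMarkdownParser` and `to_markdown` are invented for this example.

```python
from html.parser import HTMLParser


class MinimalMarkdownParser(HTMLParser):
    """Toy converter: headings become '#' lines, other text becomes paragraphs."""

    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}
    SKIPPED = {"script", "style", "head"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []      # finished Markdown blocks
        self.prefix = ""      # Markdown prefix for the text being read
        self.skip_depth = 0   # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        # Attributes such as id, class, and data-* are ignored on purpose.
        if tag in self.SKIPPED:
            self.skip_depth += 1
        elif tag in self.HEADINGS:
            self.prefix = self.HEADINGS[tag]

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in self.HEADINGS:
            self.prefix = ""

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text and self.skip_depth == 0:
            self.blocks.append(self.prefix + text)


def to_markdown(html: str) -> str:
    """Convert a small HTML snippet to Markdown-like text."""
    parser = MinimalMarkdownParser()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks)


sample = (
    '<h1 id="title" data-updated="20201101">Hi there</h1>'
    '<div class="post">Lorem ipsum dolor sit amet.</div>'
)
print(to_markdown(sample))
```

Stripping attributes and markup this way is what reduces the token count of the resulting text. A real converter also has to handle links, lists, inline formatting, and nested elements, which this sketch does not attempt.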
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Download files
Source Distribution
fast_html2md-0.1.3.tar.gz (34.5 kB)
Built Distribution
fast_html2md-0.1.3-py3-none-any.whl (6.3 kB)
File details
Details for the file fast_html2md-0.1.3.tar.gz.
File metadata
- Download URL: fast_html2md-0.1.3.tar.gz
- Upload date:
- Size: 34.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b72ac7a75f2f4f3748ccf95799473cb155f3b852382f6885382e0365fa954134 |
| MD5 | 389c7b077d4ced2ae7a92667fb1e1f1e |
| BLAKE2b-256 | 3d8e6107a44ac1c93e19d5485d4736d64f9fc18ee1cf0b6717340c1401fe44db |
File details
Details for the file fast_html2md-0.1.3-py3-none-any.whl.
File metadata
- Download URL: fast_html2md-0.1.3-py3-none-any.whl
- Upload date:
- Size: 6.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 695615e3a808696ca7c864e57a64f5e6983848349cd6de16c2c10665ae10e0cf |
| MD5 | 92636446f4b329f993398b9965dbee9a |
| BLAKE2b-256 | b564c3d2de9b9c7b365e64cc5de608eee6eeedbaf24af6a3d002c0f7649d25fc |
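A downloaded file can be checked against the SHA256 digests listed above. The following is a minimal sketch using `hashlib` from the standard library; the helper names `sha256_of_file` and `verify_download` are illustrative, and the script assumes the file keeps its original name on disk.

```python
import hashlib
from pathlib import Path

# SHA256 digests as listed in the file hash tables above.
EXPECTED_SHA256 = {
    "fast_html2md-0.1.3.tar.gz":
        "b72ac7a75f2f4f3748ccf95799473cb155f3b852382f6885382e0365fa954134",
    "fast_html2md-0.1.3-py3-none-any.whl":
        "695615e3a808696ca7c864e57a64f5e6983848349cd6de16c2c10665ae10e0cf",
}


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Return True if the file's SHA256 matches the digest listed for its name."""
    path = Path(path)
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no known digest for {path.name}")
    return sha256_of_file(path) == expected
```

For example, `verify_download("fast_html2md-0.1.3.tar.gz")` returns `True` only when the local file matches the published digest.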