Convert HTML to Markdown for LLM input extraction
fast-html2md
Convert HTML to Markdown for LLM input extraction.
Installation
# use pip
pip install fast-html2md
# or use poetry
poetry add fast-html2md
# or use uv
uv add fast-html2md
Usage
from fast_html2md import HTMLToMarkdown
converter = HTMLToMarkdown()
html = """
<!DOCTYPE html>
<html>
<body>
<h1 id="title" data-updated="20201101">Hi there</h1>
<div class="post">
Lorem Ipsum is simply dummy text of the printing and typesetting industry.
</div>
<div class="post">
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
</div>
</body>
</html>
"""
markdown = converter.convert(html)
print(markdown)
# Count tokens
token_count = converter.count_tokens(markdown)
print(f"Token count: {token_count}")
# Compute cost
cost = converter.compute_cost(token_count)
print(f"Estimated cost: ${cost:.6f}")
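To make the conversion step above more concrete, here is a minimal sketch of what turning HTML into Markdown involves, written with only the standard library's `html.parser`. It is not how fast-html2md is implemented; `TinyMarkdownParser` and `tiny_html_to_markdown` are hypothetical names, and the sketch handles only headings and block text while discarding attributes such as `id`, `class`, and `data-*`.

```python
from html.parser import HTMLParser


class TinyMarkdownParser(HTMLParser):
    """Hypothetical, minimal converter: headings and block text only."""

    HEADINGS = {"h1": "#", "h2": "##", "h3": "###"}
    BLOCKS = {"p", "div"}

    def __init__(self):
        super().__init__()
        self.blocks = []   # finished Markdown blocks
        self.prefix = ""   # "# " and similar while inside a heading
        self.buffer = []   # raw text collected for the current block

    def _flush(self):
        # Collapse runs of whitespace, then emit the block if any text is left.
        text = " ".join("".join(self.buffer).split())
        if text:
            self.blocks.append(self.prefix + text)
        self.buffer = []
        self.prefix = ""

    def handle_starttag(self, tag, attrs):
        # Attributes are ignored on purpose: they add tokens but little meaning.
        if tag in self.HEADINGS or tag in self.BLOCKS:
            self._flush()
            if tag in self.HEADINGS:
                self.prefix = self.HEADINGS[tag] + " "

    def handle_endtag(self, tag):
        if tag in self.HEADINGS or tag in self.BLOCKS:
            self._flush()

    def handle_data(self, data):
        self.buffer.append(data)


def tiny_html_to_markdown(html: str) -> str:
    """Return a Markdown string with one blank line between blocks."""
    parser = TinyMarkdownParser()
    parser.feed(html)
    parser.close()
    parser._flush()  # emit any trailing text that was never closed by a tag
    return "\n\n".join(parser.blocks)


if __name__ == "__main__":
    sample = '<h1 id="title">Hi there</h1><div class="post">Lorem ipsum</div>'
    print(tiny_html_to_markdown(sample))
```

Dropping markup and attributes in this way is what shrinks the text handed to an LLM; the real library may treat links, lists, tables, and nested elements quite differently.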
Features
- Fast HTML to Markdown conversion
- Optimized for LLM input processing
- Built-in token counting using tiktoken
- Clean and minimal output
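The token-counting feature and the cost estimate from the Usage section can be approximated as follows. This is a sketch under stated assumptions, not the library's code: `count_tokens_sketch` and `estimate_cost_sketch` are hypothetical names, the `cl100k_base` encoding is an assumed default, and the flat price per million tokens is a placeholder you would replace with your provider's actual rate.

```python
from typing import Callable, Optional, Sequence


def count_tokens_sketch(
    text: str,
    encode: Optional[Callable[[str], Sequence]] = None,
) -> int:
    """Count tokens in text using the given encoder.

    When no encoder is passed, fall back to tiktoken's cl100k_base
    encoding (an assumption; the library may use a different one).
    """
    if encode is None:
        import tiktoken  # third-party: pip install tiktoken

        encode = tiktoken.get_encoding("cl100k_base").encode
    return len(encode(text))


def estimate_cost_sketch(token_count: int, usd_per_million_tokens: float) -> float:
    """Estimate input cost assuming a flat per-token price (hypothetical rate)."""
    if token_count < 0:
        raise ValueError("token_count must be non-negative")
    return token_count / 1_000_000 * usd_per_million_tokens


if __name__ == "__main__":
    # A whitespace split stands in for a real tokenizer so this runs offline.
    tokens = count_tokens_sketch("# Hi there\n\nLorem ipsum", encode=str.split)
    print(f"Token count: {tokens}")
    print(f"Estimated cost: ${estimate_cost_sketch(tokens, 2.50):.6f}")
```

Passing the encoder as a parameter keeps the counting logic testable without downloading tokenizer data, and makes it easy to swap in the encoding that matches your target model.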
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
File details
Details for the file fast_html2md-0.1.2.tar.gz.
File metadata
- Download URL: fast_html2md-0.1.2.tar.gz
- Upload date:
- Size: 43.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a2d55d58981fa367ab650deaa246a05bbf86d1b0b0fca6e5b1ada3c87e035bee |
| MD5 | 75517879620f15664218c55c74c99d24 |
| BLAKE2b-256 | d304b34a61ed101c1585fa5aa27314cbcce0a9b600cdc3ad931a9ac2b5f6296b |
File details
Details for the file fast_html2md-0.1.2-py3-none-any.whl.
File metadata
- Download URL: fast_html2md-0.1.2-py3-none-any.whl
- Upload date:
- Size: 6.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 506c578d7c59c07579ee10dc300f6502a75fb83a9c610a3b53ec18f9c10badd1 |
| MD5 | 425333afe4141a1505a25b56a1b59e09 |
| BLAKE2b-256 | c3340f4f9a011bc4cd60995bf3690d843e97e2be28948b89dc4b8806fcd12b92 |
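The SHA256 digests listed for each file can be used to check a download before installing it. The sketch below uses only the standard library's `hashlib`; the helper names `sha256_of` and `matches_published_digest` are hypothetical, and the local file path is an assumption about where the wheel was saved.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex: str) -> bool:
    """Compare a file's digest with a published one, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Assumed local path; the digest is the SHA256 published for the wheel.
    wheel = Path("fast_html2md-0.1.2-py3-none-any.whl")
    published = "506c578d7c59c07579ee10dc300f6502a75fb83a9c610a3b53ec18f9c10badd1"
    if wheel.exists():
        print("digest matches:", matches_published_digest(wheel, published))
    else:
        print(f"{wheel} not found; download it first")
```

Reading in chunks keeps memory use constant regardless of file size. pip can perform a comparable check itself when a requirements file pins hashes and is installed with `--require-hashes`.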