Lightweight HTML-to-Markdown tooling for agent workflows.
markmaton
markmaton is a lightweight HTML-to-Markdown parser core built for agent workflows.
It solves the last-mile parsing problem in a web pipeline: you already have page HTML,
but it is still too noisy and awkward for downstream agent use. Feed markmaton
HTML from a fetcher or browser layer and get back cleaner Markdown, metadata, links,
images, and quality signals.
[!NOTE]
markmaton is a general parser, not a crawler. Feed it HTML from Playwright, fetch, Firecrawl, or another upstream page-visit tool.
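As a rough sketch of that division of labor, the snippet below keeps fetching and parsing as two separate steps. The `fetch_html` and `html_to_markdown` helpers are hypothetical names introduced here for illustration; `fetch_html` uses the standard library's `urllib` as a stand-in for Playwright, Firecrawl, or any other upstream tool. The `convert_html`, `ConvertRequest`, and `ConvertOptions` names are the ones shown in the Quickstart.

```python
from urllib.request import Request, urlopen


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Hypothetical upstream step; any fetcher or browser layer could replace it."""
    request = Request(url, headers={"User-Agent": "example-fetcher/0.1"})
    with urlopen(request, timeout=timeout) as resp:
        # Fall back to UTF-8 when the response does not declare a charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def html_to_markdown(html: str, url: str) -> str:
    """Parsing step: hand already-fetched HTML to markmaton."""
    # Imported lazily so the fetch step carries no dependency on the parser.
    from markmaton import ConvertOptions, ConvertRequest, convert_html

    response = convert_html(
        ConvertRequest(
            html=html,
            url=url,
            options=ConvertOptions(only_main_content=True),
        )
    )
    return response.markdown
```

A caller would chain the two, for example `html_to_markdown(fetch_html(page_url), page_url)`, and could swap `fetch_html` for a browser-based fetcher without touching the parsing step.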
Why it exists
- Raw page HTML is usually not directly useful for downstream agent workflows.
- Modern pages often mix the real content with navigation, overlays, cards, and app shell chrome.
- markmaton keeps that cleanup and conversion step deterministic and separate from crawling.
- The project stays narrow by design: no crawling, browser control, network, or LLM features.
- The user-facing entrypoint is a Python CLI and API wrapped around a fast Go engine.
Install
pip
pip install markmaton
uv tool
uv tool install markmaton
[!TIP]
The installed package works through plain pip. Local development uses uv with Python 3.12.
Quickstart
CLI
markmaton convert \
  --html-file page.html \
  --url https://example.com/article \
  --output-format markdown
To get the full structured response:
markmaton convert \
  --html-file page.html \
  --url https://example.com/article \
  --output-format json
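If the CLI is driven from another program, a thin wrapper along these lines may be enough. This is a sketch, not part of the package: `build_convert_command` and `convert_file_to_json` are hypothetical helper names, only the flags shown above are used, and it assumes that JSON mode prints a single JSON object to stdout.

```python
import json
import subprocess
from typing import Any


def build_convert_command(html_file: str, url: str, output_format: str = "json") -> list[str]:
    """Assemble the CLI invocation using only the flags shown above."""
    return [
        "markmaton", "convert",
        "--html-file", html_file,
        "--url", url,
        "--output-format", output_format,
    ]


def convert_file_to_json(html_file: str, url: str) -> dict[str, Any]:
    """Run the CLI and parse its output.

    Assumption: JSON mode writes one JSON object to stdout.
    """
    completed = subprocess.run(
        build_convert_command(html_file, url),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)
```

Passing the arguments as a list avoids shell quoting issues with URLs that contain `&` or `?`.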
Python API
from markmaton import ConvertOptions, ConvertRequest, convert_html

html = "<article><h1>Hello</h1><p>World</p></article>"

# Pass the page URL alongside the HTML so the parser has context for it.
response = convert_html(
    ConvertRequest(
        html=html,
        url="https://example.com/article",
        options=ConvertOptions(only_main_content=True),
    )
)

# The response exposes the converted Markdown and the extracted metadata.
print(response.markdown)
print(response.metadata.title)
[!TIP]
Pass url whenever you can. markmaton uses it as parsing context for canonical metadata and absolute link resolution.
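To make "absolute link resolution" concrete, the sketch below resolves a few relative hrefs against a page URL with the standard library's `urljoin`. It illustrates the general idea only; it is not markmaton's implementation, and `resolve_links` is a hypothetical helper name.

```python
from urllib.parse import urljoin


def resolve_links(page_url: str, hrefs: list[str]) -> list[str]:
    """Resolve each href against the page URL, the way a browser would."""
    return [urljoin(page_url, href) for href in hrefs]


page_url = "https://example.com/blog/article"
hrefs = ["/about", "next-post", "../images/cover.png", "https://other.example/x"]

for link in resolve_links(page_url, hrefs):
    print(link)
# https://example.com/about
# https://example.com/blog/next-post
# https://example.com/images/cover.png
# https://other.example/x
```

Without a base URL, the first three hrefs cannot be turned into links that a downstream agent could follow.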
Output
JSON mode returns markdown, html_clean, metadata, links, images, and quality. See response shape for details.
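A consumer of the JSON output might start by checking that the documented top-level keys are present. The sketch below does only that; the sample payload is hand-written and hypothetical, its value types are placeholders rather than the real response shape, and `missing_keys` is a helper name introduced here.

```python
import json

# Top-level keys documented for JSON mode.
EXPECTED_KEYS = ("markdown", "html_clean", "metadata", "links", "images", "quality")


def missing_keys(payload: dict) -> list[str]:
    """Return the documented top-level keys that are absent from a parsed response."""
    return [key for key in EXPECTED_KEYS if key not in payload]


# Hypothetical payload for illustration; the values are placeholders.
sample = json.loads('{"markdown": "# Hello", "metadata": {}, "links": [], "images": []}')
print(missing_keys(sample))  # ['html_clean', 'quality']
```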
Project shape
- Go engine: cmd/markmaton-engine
- Python wrapper and CLI: markmaton/
- Parser fixtures and golden files: testdata/
- Research, benchmark, and release docs: docs/
Documentation
- Documentation index
- Usage guide
- Packaging layout
- PyPI release path
- Benchmark workflow
- Benchmark matrix
- AI agent skill: for using markmaton inside an agent workflow
Development
Set up the local development environment:
uv sync --group dev
Run the core test suites:
uv run python -m unittest discover -s tests -p 'test_*.py'
go test ./...
For a manual end-to-end smoke:
The repo is pinned to:
- Python 3.12 via .python-version
- a committed uv.lock
[!IMPORTANT]
Automated tests are unit-test-first. Live page visits and benchmarks are manual.
Release notes