Lightweight HTML-to-Markdown tooling for agent workflows.

markmaton

markmaton is a lightweight HTML-to-Markdown parser core built for agent workflows.

It solves the last-mile parsing problem in a web pipeline: you already have page HTML, but it is still too noisy and awkward for downstream agent use. Feed markmaton HTML from a fetcher or browser layer and get back cleaner Markdown, metadata, links, images, and quality signals.

Note: markmaton is a general parser, not a crawler. Feed it HTML from Playwright, fetch, Firecrawl, or another upstream page-visit tool.

Why it exists

  • Raw page HTML is usually not directly useful for downstream agent workflows.
  • Modern pages often mix the real content with navigation, overlays, cards, and app shell chrome.
  • markmaton keeps that cleanup and conversion step deterministic and separate from crawling.
  • The project stays narrow by design: no crawling, browser control, network access, or LLM features.
  • The user-facing entrypoint is a Python CLI and API wrapped around a fast Go engine.

Install

pip

pip install markmaton

uv tool

uv tool install markmaton

Tip: The installed package works through plain pip. Local development uses uv with Python 3.12.

Quickstart

CLI

markmaton convert \
  --html-file page.html \
  --url https://example.com/article \
  --output-format markdown

To get the full structured response:

markmaton convert \
  --html-file page.html \
  --url https://example.com/article \
  --output-format json
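
For callers that drive the CLI from Python, the invocation above can be assembled as an argument list. This is a minimal sketch, not part of markmaton: `build_convert_command` and `run_convert` are hypothetical helper names, and only the `convert` subcommand and the three flags shown above come from this README. It assumes the `markmaton` executable is on PATH, that `--url` may be omitted, and that the result is written to stdout.

```python
import json
import subprocess


def build_convert_command(html_file, url=None, output_format="markdown"):
    """Assemble the argv for `markmaton convert` using the flags shown above."""
    cmd = ["markmaton", "convert", "--html-file", str(html_file)]
    if url:
        # Assumption: --url is optional, so it is only added when provided.
        cmd += ["--url", url]
    cmd += ["--output-format", output_format]
    return cmd


def run_convert(html_file, url=None, output_format="json"):
    """Run the CLI and return its stdout, parsed when output_format is "json".

    Assumption: the CLI prints its result to stdout; check your installed version.
    """
    cmd = build_convert_command(html_file, url, output_format)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout) if output_format == "json" else result.stdout


# Building the command has no side effects, so it is safe to inspect first.
print(build_convert_command("page.html", "https://example.com/article", "json"))
```

Passing the command as a list avoids shell quoting issues when the URL contains characters such as `&` or `?`.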

Python API

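from markmaton import ConvertOptions, ConvertRequest, convert_html

# Raw HTML as returned by an upstream fetcher or browser layer.
html = "<article><h1>Hello</h1><p>World</p></article>"

response = convert_html(
    ConvertRequest(
        html=html,
        # The source URL gives the parser context for metadata and links.
        url="https://example.com/article",
        options=ConvertOptions(only_main_content=True),
    )
)

# Converted Markdown and the extracted page title.
print(response.markdown)
print(response.metadata.title)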

Tip: Pass url whenever you can. markmaton uses it as parsing context for canonical metadata and absolute link resolution.
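
To illustrate what absolute link resolution against a page URL means, here is a small standard-library sketch. It is not markmaton's implementation, and `resolve_links` is a hypothetical helper name; it only shows why a relative href cannot be turned into a usable link without the base URL.

```python
from urllib.parse import urljoin


def resolve_links(base_url, hrefs):
    """Resolve relative hrefs against the page URL; absolute hrefs pass through."""
    return [urljoin(base_url, href) for href in hrefs]


links = resolve_links(
    "https://example.com/article",
    ["/about", "images/cover.png", "https://other.example/x"],
)
for link in links:
    print(link)
```

Without a base URL, the first two hrefs would stay relative and be of little use to a downstream consumer.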

Output

JSON mode returns markdown, html_clean, metadata, links, images, and quality. See response shape for details.
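
A downstream consumer may want to confirm that a JSON response carries the fields listed above before using it. The sketch below assumes those six names are top-level keys of a JSON object, which this README does not state explicitly. `missing_keys` is a hypothetical helper and the sample payload is a placeholder, not real markmaton output.

```python
import json

# Top-level fields listed above for JSON mode.
EXPECTED_KEYS = ("markdown", "html_clean", "metadata", "links", "images", "quality")


def missing_keys(raw_json):
    """Return the expected top-level keys absent from a JSON response string."""
    payload = json.loads(raw_json)
    return [key for key in EXPECTED_KEYS if key not in payload]


# Placeholder payload for illustration only; real values come from the CLI.
sample = json.dumps({"markdown": "# Hello", "metadata": {}, "links": []})
print(missing_keys(sample))
```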

Project shape

  • Go engine: cmd/markmaton-engine
  • Python wrapper and CLI: markmaton/
  • Parser fixtures and golden files: testdata/
  • Research, benchmark, and release docs: docs/

Documentation

Development

Set up the local development environment:

uv sync --group dev

Run the core test suites:

uv run python -m unittest discover -s tests -p 'test_*.py'
go test ./...

For a manual end-to-end smoke test:

The repo is pinned to:

Important: Automated tests are unit-test-first. Live page visits and benchmarks are manual.

Release notes

Download files

Source Distribution

  • markmaton-0.1.7.tar.gz (359.0 kB): Source

Built Distributions

  • markmaton-0.1.7-py3-none-win_amd64.whl (3.9 MB): Python 3, Windows x86-64
  • markmaton-0.1.7-py3-none-manylinux2014_x86_64.whl (3.9 MB): Python 3
  • markmaton-0.1.7-py3-none-macosx_12_0_x86_64.whl (4.0 MB): Python 3, macOS 12.0+ x86-64
  • markmaton-0.1.7-py3-none-macosx_12_0_arm64.whl (3.8 MB): Python 3, macOS 12.0+ ARM64

File details

Details for the file markmaton-0.1.7.tar.gz.

File metadata

  • Download URL: markmaton-0.1.7.tar.gz
  • Upload date:
  • Size: 359.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for markmaton-0.1.7.tar.gz
Algorithm Hash digest
SHA256 9b31065ff22942425fa631a6eded1370a62ee5ad2c5416ce2c016de683b54c05
MD5 3bde69878acfc53ef78f076ee71d66f4
BLAKE2b-256 46d7e5cfdb98023ce3a90ce90edacbcfd224531964c52474e9d72a6ce4c77c20

See more details on using hashes here.
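
A downloaded file can be checked against the published SHA256 digest with the standard library. This is a minimal sketch; `sha256_of` and `matches_published_hash` are hypothetical helper names, and the file path in the usage note is an assumption about where the archive was saved.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare the file's digest with the published one, ignoring case and padding."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

For example, `matches_published_hash("markmaton-0.1.7.tar.gz", "9b31065ff22942425fa631a6eded1370a62ee5ad2c5416ce2c016de683b54c05")` should return True if the archive in the current directory matches the digest listed above.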

Provenance

The following attestation bundles were made for markmaton-0.1.7.tar.gz:

Publisher: workflow.yml on appautomaton/markmaton

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
