
Markdown processing utilities: strip, chunk, append, break, Word export, lint, and more.


markdown_hero


A Python library for processing Markdown — cleanup, chunking, Word (.docx) export, concatenation, splitting by delimiters, and validation.

Compatible with Python 3.10+. Target dialect: GitHub Flavored Markdown (GFM).

Installation

pip install markdown_hero
# with a real tokenizer (tiktoken) for accurate chunking:
pip install "markdown_hero[tokenizers]"

Main functions

| Function | What it does |
| --- | --- |
| strip(md, ...) | Reduces Markdown to normalized plain text (no diacritics, no punctuation, lowercase). |
| extract_chunks(md, purpose=...) | Splits the document respecting heading hierarchy, with rich metadata. |
| word_format(md, output) | Exports to .docx with a fixed set of professional styles. |
| markdown_append(*paths, output) | Concatenates files with heading shift and frontmatter merge. |
| markdown_break(path, delimiter, ...) | Splits a file into N+1 parts. Accepts string, regex, or list. |
| markdown_merge(*paths, output) | Smart append with section dedupe and TOC generation. |
| extract_* | Frontmatter, links, images, tables, code blocks, headings, TOC. |
| lint(md) | Detects skipped headings, duplicate anchors, unclosed fences. |
| CLI markdown-hero | Command-line access. |
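
To make the lint checks concrete, here is a minimal standard-library sketch of two of them: skipped heading levels and unclosed code fences. It is not markdown_hero's implementation; the name find_issues and the message format are hypothetical, and duplicate-anchor detection is left out.

```python
import re

# A fence line: three or more backticks or tildes, then an optional info string.
FENCE = re.compile(r"^(`{3,}|~{3,})(.*)$")
# An ATX heading: one to six '#' characters followed by whitespace and text.
HEADING = re.compile(r"^(#{1,6})\s+\S")


def find_issues(md: str) -> list[str]:
    """Report skipped heading levels and unclosed fences (hypothetical helper)."""
    issues: list[str] = []
    open_marker = None  # marker of the fence we are currently inside, if any
    open_line = 0
    last_level = 0

    for number, line in enumerate(md.splitlines(), start=1):
        # Simplification: any indentation is accepted before a fence marker.
        fence = FENCE.match(line.strip())
        if fence:
            marker, rest = fence.group(1), fence.group(2)
            if open_marker is None:
                open_marker, open_line = marker, number
                continue
            # A closing fence uses the same character, is at least as long as
            # the opening one, and carries no info string.
            if marker[0] == open_marker[0] and len(marker) >= len(open_marker) and not rest:
                open_marker = None
            continue
        if open_marker is not None:
            continue  # inside a code block: '#' lines are not headings

        heading = HEADING.match(line)
        if heading:
            level = len(heading.group(1))
            if last_level and level > last_level + 1:
                issues.append(f"line {number}: heading jumps from h{last_level} to h{level}")
            last_level = level

    if open_marker is not None:
        issues.append(f"line {open_line}: code fence is never closed")
    return issues
```

Tracking the open fence matters because a `#` comment inside a code block would otherwise be misread as a heading.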

See docs/reference.md for the full technical reference and docs/helpers.md for the index of internal utilities.
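
As an illustration of what "respecting heading hierarchy" can mean, the sketch below splits a document at headings and records the chain of parent headings for each chunk. It uses only the standard library and is an assumption about the general technique, not the behaviour of extract_chunks; Chunk and chunk_by_headings are hypothetical names, and token limits and fenced code blocks are ignored.

```python
import re
from dataclasses import dataclass

HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


@dataclass
class Chunk:
    heading_path: list[str]  # titles from the outermost heading down to this section
    text: str


def chunk_by_headings(md: str) -> list[Chunk]:
    """Split Markdown at ATX headings, keeping each chunk's heading path."""
    chunks: list[Chunk] = []
    stack: list[tuple[int, str]] = []  # (level, title) of the headings in scope
    body: list[str] = []

    def flush() -> None:
        # Emit the text collected so far under the current heading path.
        text = "\n".join(body).strip()
        if text:
            chunks.append(Chunk([title for _, title in stack], text))
        body.clear()

    for line in md.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # A new heading closes every heading at the same or a deeper level.
            while stack and stack[-1][0] >= level:
                stack.pop()
            stack.append((level, match.group(2)))
        else:
            body.append(line)
    flush()
    return chunks
```

Carrying the heading path as metadata lets a retrieval system show where a chunk came from without storing the whole document.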

Quick example

from pathlib import Path

from markdown_hero import strip, extract_chunks, word_format

text = "**Hello!** See [docs](https://x). p=2 captures both."
print(strip(text))
# "hello see docs p2 captures both"

# Read the document explicitly as UTF-8 so the file handle is closed promptly.
md = Path("doc.md").read_text(encoding="utf-8")
chunks = extract_chunks(md, purpose="rag", max_tokens=512)
word_format("doc.md", "doc.docx")
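
The strip output above follows from three normalization steps: removing diacritics, removing punctuation, and lowercasing. The sketch below reproduces those steps with the standard library. It is a rough approximation under stated assumptions, not the library's code: the function name normalize is hypothetical, and the link handling (keep the link text, drop the URL) is inferred from the example output.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Approximate a Markdown-to-plain-text normalization (hypothetical helper)."""
    # Decompose accented characters and drop the combining marks ("é" -> "e").
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Keep the visible text of inline links and discard the target URL.
    no_links = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", no_marks)
    # Remove punctuation; underscores count as word characters, so drop them too.
    no_punct = re.sub(r"[^\w\s]", "", no_links).replace("_", "")
    # Lowercase and collapse runs of whitespace into single spaces.
    return " ".join(no_punct.lower().split())


print(normalize("**Hello!** See [docs](https://x). p=2 captures both."))
# hello see docs p2 captures both
```

Removing punctuation without inserting a space is why "p=2" collapses to "p2" rather than "p 2".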

CLI

markdown-hero strip doc.md -o doc.txt
markdown-hero chunk doc.md --purpose rag --max-tokens 512 -o chunks.json
markdown-hero word doc.md -o doc.docx
markdown-hero append a.md b.md -o merged.md
markdown-hero break doc.md "---" --output-dir parts/
markdown-hero lint doc.md
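
The break command splits a file on a delimiter, so N delimiter occurrences yield N+1 parts. A minimal sketch of that idea follows. It assumes the delimiter must sit alone on its own line, which the README does not confirm, and split_on_delimiter with its regex flag is a hypothetical helper rather than part of markdown_hero.

```python
import re


def split_on_delimiter(md: str, delimiter: str, *, regex: bool = False) -> list[str]:
    """Split text on lines that consist only of the delimiter (hypothetical helper)."""
    pattern = delimiter if regex else re.escape(delimiter)
    # Assumption: only a line made up of the delimiter (plus optional spaces
    # or tabs) counts as a split point.
    line_pattern = re.compile(rf"^[ \t]*(?:{pattern})[ \t]*$", re.MULTILINE)

    parts: list[str] = []
    start = 0
    for match in line_pattern.finditer(md):
        parts.append(md[start:match.start()])
        start = match.end()
    parts.append(md[start:])  # the text after the last delimiter
    return [part.strip("\n") for part in parts]


parts = split_on_delimiter("intro\n---\nmiddle\n---\nend\n", "---")
print(len(parts))  # 3
```

Slicing between matches, rather than calling re.split, keeps any capturing groups in a user-supplied regex from leaking into the result.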

Documentation

Contact

Questions, suggestions, and reports:

  • Email: bernardo.leandro@gmail.com
  • Always include the prefix "Markdown Hero:" in the subject line so the message is routed correctly.

For security vulnerabilities, follow the instructions in SECURITY.md (same email, same subject prefix).

License

MIT © 2026 Bernardo Leandro.
