Restore heading hierarchy in markdown documents using a fine-tuned Qwen3-0.6B model

These details have been verified by PyPI

Project links

Repository

GitHub Statistics

Maintainers

joelbarmettlerUZH

These details have not been verified by PyPI

Project links

Project description

md-reheader

Restore heading hierarchy in markdown documents with a fine-tuned 0.6B LLM.

The problem

PDF-to-markdown tools like MinerU, Docling, and Marker do great text extraction — then collapse your document structure. Every heading becomes # or ##. TOCs break. RAG chunking breaks. Navigation breaks.

md-reheader fixes it. A 0.6B-parameter Qwen3 fine-tune reads the document and predicts the correct H1–H6 level for every heading in a single forward pass.

Source PDF → md-reheader → hierarchy restored vs. flat PDF-parser output

Quick start

CLI

pip install md-reheader
rehead --input flat.md --output fixed.md

Auto-detects CUDA. Use --cpu or --gpu to override. Omit --output to stream to stdout — pipe-friendly for integration with other CLIs.

rehead -i flat.md | tee fixed.md               # pipe
rehead -i flat.md --gpu -o out/fixed.md        # creates nested dirs
rehead -i flat.md --force -o existing.md       # overwrite
rehead --help                                   # all flags

Python API

from md_reheader.inference.predict import load_model, reheader_document

model, tokenizer = load_model("joelbarmettler/md-reheader")

flat = open("document.md").read()
fixed = reheader_document(flat, model, tokenizer)

The package handles preprocessing (flattening + body stripping) and postprocessing (applying predicted levels back to the original document) automatically.

Direct `transformers` usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("joelbarmettler/md-reheader")
model = AutoModelForCausalLM.from_pretrained(
    "joelbarmettler/md-reheader",
    dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a markdown document structure expert. Given a markdown document with incorrect or flattened heading levels, output each heading with its correct markdown prefix (# for level 1, ## for level 2, etc.), one per line."},
    {"role": "user", "content": "# Introduction\n\nSome text...\n\n# Background\n\nMore text...\n\n# Methods"},
]

input_text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False,
)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=False)

generated = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))

Important: pass enable_thinking=False to apply_chat_template. Without it, the model enters a repetition loop because training used the non-thinking chat format.

Self-host with vLLM

pip install vllm
vllm serve joelbarmettler/md-reheader --dtype bfloat16 --max-model-len 8192

Higher throughput than raw transformers and drop-in OpenAI-compatible clients. On <10 GB cards add --enforce-eager --gpu-memory-utilization 0.70 to skip CUDA-graph allocations.

Remote inference (any OpenAI-compatible endpoint)

Once a server is running, use md-reheader as a thin client — no local weights needed.

rehead -i flat.md -o fixed.md --endpoint http://localhost:8000/v1
rehead -i flat.md -o fixed.md --endpoint https://api.example.com/v1 --api-key sk-xxx
# or set MD_REHEADER_API_KEY in the environment

from md_reheader.inference.remote import reheader_document_remote

fixed = reheader_document_remote(
    open("flat.md").read(),
    endpoint="http://localhost:8000/v1",
    model="joelbarmettler/md-reheader",
)

Identical output to local inference. Preprocessing (flatten + strip) happens client-side; the server just runs the chat completion with chat_template_kwargs={"enable_thinking": false} to match training.

How it works

flat markdown  ──►  flatten headings to #  ──►  strip body to 128+128 tokens
                                                         │
                                                         ▼
        restored markdown  ◄──  apply predicted levels  ◄── Qwen3-0.6B (fine-tuned)

Extract headings with markdown-it-py — correctly skips code blocks.
Flatten every heading to # — the model ignores input levels.
Strip each section's body to its first 128 + last 128 tokens — preserves structural cues, kills context bloat.
Qwen3-0.6B predicts the correct # prefix per heading.
Levels get mapped back to the original document.

Evaluation

Benchmarked on 7,321 held-out documents from GitHub markdown and Wikipedia.

Metric	All-H1 baseline	Heuristic	md-reheader
Exact match	0.0%	14.5%	56.1%
Per-heading accuracy	13.1%	49.1%	80.6%
Hierarchy preservation	61.3%	68.6%	91.0%
Mean absolute error	1.38	0.62	0.22

Per-level accuracy

	H1	H2	H3	H4	H5	H6
Accuracy	77%	85%	78%	68%	45%	50%

H1–H3 land in the 77–85% band; H5/H6 drop but still beat baselines. Most deep-level errors are off-by-one — the relative structure survives.

By document depth

Max depth	Exact match	Per-heading accuracy	Hierarchy
Depth 2	83%	91%	95%
Depth 3	54%	82%	90%
Depth 4	32%	70%	88%
Depth 5-6	33%	65%	89%

By source

Source	Exact match	Per-heading accuracy
GitHub markdown	49.5%	74.0%
Wikipedia	71.3%	95.5%

Speed

Document size	RTX 4090 (BF16)	CPU (fp32)
< 1k tokens	0.4s	5s
1k–2k tokens	0.8s	10s
2k–4k tokens	1.4s	~20s
4k–8k tokens	3.4s	~60s

Documents longer than ~8k tokens (after stripping) are truncated from the tail.

Limitations

Deep nesting (H5/H6) — accuracy drops to 45–50%. Relative structure is preserved; absolute depth gets compressed by 1–2 levels.
Ambiguous structure — heading levels are subjective. The model learns common conventions; it can't resolve genuine ambiguity.
Long documents — >8k tokens (after stripping) get truncated. Headings past the cutoff retain their input levels.
English-centric — trained primarily on English content.

Reproducing training

git clone https://github.com/joelbarmettlerUZH/md-reheader.git
cd md-reheader

uv sync --extra train    # install training dependencies
make download            # download raw data (~150k documents)
make prepare             # strip, flatten, oversample, format
make train               # train on 2x GPU with Axolotl
make eval                # evaluate on test set

The model is a fine-tune of Qwen/Qwen3-0.6B trained on ~197k markdown documents:

codeparrot/github-code — ~105k markdown files from GitHub repositories
euirim/goodwiki — ~45k Wikipedia articles
Deep documents (depth 4+) oversampled 2–8× for class balance

Trained with Axolotl on 2× RTX 4090 using DDP, BF16, 8k sequence length with sample packing.

License

Code and model weights: Apache 2.0. Training data includes Wikipedia content (CC BY-SA 4.0) and GitHub repositories (various open-source licenses).

Citation

@software{barmettler2026mdreheader,
  author = {Barmettler, Joel},
  title  = {md-reheader: Restoring Heading Hierarchy in Markdown Documents},
  year   = {2026},
  url    = {https://github.com/joelbarmettlerUZH/md-reheader}
}

Project details

These details have been verified by PyPI

Project links

Repository

GitHub Statistics

Maintainers

joelbarmettlerUZH

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.2.2

Apr 17, 2026

0.2.1

Apr 17, 2026

0.1.3

Apr 5, 2026

0.1.2

Apr 5, 2026

0.1.1

Apr 5, 2026

0.1.0

Apr 5, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

md_reheader-0.2.2.tar.gz (440.9 kB view details)

Uploaded Apr 17, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

md_reheader-0.2.2-py3-none-any.whl (21.4 kB view details)

Uploaded Apr 17, 2026 Python 3

File details

Details for the file md_reheader-0.2.2.tar.gz.

File metadata

Download URL: md_reheader-0.2.2.tar.gz
Upload date: Apr 17, 2026
Size: 440.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for md_reheader-0.2.2.tar.gz
Algorithm	Hash digest
SHA256	`3a0585635158f692ae34a587f085af56fed5ce9c7c9c1809a8356834e7aee4dc`
MD5	`0f9a01fe7c2f2a483c7e853d297d8a66`
BLAKE2b-256	`466b7ab0c024e87429fc8f526181641384e36dbcbe69f729ea8cd613785573ba`

See more details on using hashes here.

Provenance

The following attestation bundles were made for md_reheader-0.2.2.tar.gz:

Publisher: release.yml on joelbarmettlerUZH/md-reheader

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: md_reheader-0.2.2.tar.gz
- Subject digest: 3a0585635158f692ae34a587f085af56fed5ce9c7c9c1809a8356834e7aee4dc
- Sigstore transparency entry: 1328012348
- Sigstore integration time: Apr 17, 2026
Source repository:
- Permalink: joelbarmettlerUZH/md-reheader@2d6ed46794c445869f928bcc305a71b2508da771
- Branch / Tag: refs/heads/main
- Owner: https://github.com/joelbarmettlerUZH
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@2d6ed46794c445869f928bcc305a71b2508da771
- Trigger Event: push

File details

Details for the file md_reheader-0.2.2-py3-none-any.whl.

File metadata

Download URL: md_reheader-0.2.2-py3-none-any.whl
Upload date: Apr 17, 2026
Size: 21.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for md_reheader-0.2.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7b7af9d334e9d1dbb0bd9915eec2b1735417dd5161f9c1040d01355e90a90055`
MD5	`8521855955937400c8ebb337ff0be331`
BLAKE2b-256	`da90add26bfcb7c9686150b32e5f03c2faebfe716362cb27b467f1c3d0fb36ce`

See more details on using hashes here.

Provenance

The following attestation bundles were made for md_reheader-0.2.2-py3-none-any.whl:

Publisher: release.yml on joelbarmettlerUZH/md-reheader

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: md_reheader-0.2.2-py3-none-any.whl
- Subject digest: 7b7af9d334e9d1dbb0bd9915eec2b1735417dd5161f9c1040d01355e90a90055
- Sigstore transparency entry: 1328012412
- Sigstore integration time: Apr 17, 2026
Source repository:
- Permalink: joelbarmettlerUZH/md-reheader@2d6ed46794c445869f928bcc305a71b2508da771
- Branch / Tag: refs/heads/main
- Owner: https://github.com/joelbarmettlerUZH
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@2d6ed46794c445869f928bcc305a71b2508da771
- Trigger Event: push

md-reheader 0.2.2

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

md-reheader

The problem

Quick start

CLI

Python API

Direct transformers usage

Self-host with vLLM

Remote inference (any OpenAI-compatible endpoint)

How it works

Evaluation

Per-level accuracy

By document depth

By source

Speed

Limitations

Reproducing training

License

Citation

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance

Direct `transformers` usage