Restore heading hierarchy in markdown documents using a fine-tuned Qwen3-0.6B model
Project description
The problem
PDF-to-markdown tools like MinerU, Docling, and Marker do great text extraction — then collapse your document structure. Every heading becomes # or ##. TOCs break. RAG chunking breaks. Navigation breaks.
md-reheader fixes it. A 0.6B-parameter Qwen3 fine-tune reads the document and predicts the correct H1–H6 level for every heading in a single forward pass.
Quick start
CLI
pip install md-reheader
rehead --input flat.md --output fixed.md
Auto-detects CUDA. Use --cpu or --gpu to override. Omit --output to stream to stdout — pipe-friendly for integration with other CLIs.
rehead -i flat.md | tee fixed.md # pipe
rehead -i flat.md --gpu -o out/fixed.md # creates nested dirs
rehead -i flat.md --force -o existing.md # overwrite
rehead --help # all flags
Python API
from md_reheader.inference.predict import load_model, reheader_document
model, tokenizer = load_model("joelbarmettler/md-reheader")
flat = open("document.md").read()
fixed = reheader_document(flat, model, tokenizer)
The package handles preprocessing (flattening + body stripping) and postprocessing (applying predicted levels back to the original document) automatically.
Direct transformers usage
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("joelbarmettler/md-reheader")
model = AutoModelForCausalLM.from_pretrained(
"joelbarmettler/md-reheader",
dtype=torch.bfloat16,
device_map="auto",
)
messages = [
{"role": "system", "content": "You are a markdown document structure expert. Given a markdown document with incorrect or flattened heading levels, output each heading with its correct markdown prefix (# for level 1, ## for level 2, etc.), one per line."},
{"role": "user", "content": "# Introduction\n\nSome text...\n\n# Background\n\nMore text...\n\n# Methods"},
]
input_text = tokenizer.apply_chat_template(
messages, tokenize=False, add_generation_prompt=True, enable_thinking=False,
)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
with torch.no_grad():
outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=False)
generated = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))
Important: pass
enable_thinking=Falsetoapply_chat_template. Without it, the model enters a repetition loop because training used the non-thinking chat format.
Self-host with vLLM
pip install vllm
vllm serve joelbarmettler/md-reheader --dtype bfloat16 --max-model-len 8192
Higher throughput than raw transformers and drop-in OpenAI-compatible clients. On <10 GB cards add --enforce-eager --gpu-memory-utilization 0.70 to skip CUDA-graph allocations.
Remote inference (any OpenAI-compatible endpoint)
Once a server is running, use md-reheader as a thin client — no local weights needed.
rehead -i flat.md -o fixed.md --endpoint http://localhost:8000/v1
rehead -i flat.md -o fixed.md --endpoint https://api.example.com/v1 --api-key sk-xxx
# or set MD_REHEADER_API_KEY in the environment
from md_reheader.inference.remote import reheader_document_remote
fixed = reheader_document_remote(
open("flat.md").read(),
endpoint="http://localhost:8000/v1",
model="joelbarmettler/md-reheader",
)
Identical output to local inference. Preprocessing (flatten + strip) happens client-side; the server just runs the chat completion with chat_template_kwargs={"enable_thinking": false} to match training.
How it works
flat markdown ──► flatten headings to # ──► strip body to 128+128 tokens
│
▼
restored markdown ◄── apply predicted levels ◄── Qwen3-0.6B (fine-tuned)
- Extract headings with markdown-it-py — correctly skips code blocks.
- Flatten every heading to
#— the model ignores input levels. - Strip each section's body to its first 128 + last 128 tokens — preserves structural cues, kills context bloat.
- Qwen3-0.6B predicts the correct
#prefix per heading. - Levels get mapped back to the original document.
Evaluation
Benchmarked on 7,321 held-out documents from GitHub markdown and Wikipedia.
| Metric | All-H1 baseline | Heuristic | md-reheader |
|---|---|---|---|
| Exact match | 0.0% | 14.5% | 56.1% |
| Per-heading accuracy | 13.1% | 49.1% | 80.6% |
| Hierarchy preservation | 61.3% | 68.6% | 91.0% |
| Mean absolute error | 1.38 | 0.62 | 0.22 |
Per-level accuracy
| H1 | H2 | H3 | H4 | H5 | H6 | |
|---|---|---|---|---|---|---|
| Accuracy | 77% | 85% | 78% | 68% | 45% | 50% |
H1–H3 land in the 77–85% band; H5/H6 drop but still beat baselines. Most deep-level errors are off-by-one — the relative structure survives.
By document depth
| Max depth | Exact match | Per-heading accuracy | Hierarchy |
|---|---|---|---|
| Depth 2 | 83% | 91% | 95% |
| Depth 3 | 54% | 82% | 90% |
| Depth 4 | 32% | 70% | 88% |
| Depth 5-6 | 33% | 65% | 89% |
By source
| Source | Exact match | Per-heading accuracy |
|---|---|---|
| GitHub markdown | 49.5% | 74.0% |
| Wikipedia | 71.3% | 95.5% |
Speed
| Document size | RTX 4090 (BF16) | CPU (fp32) |
|---|---|---|
| < 1k tokens | 0.4s | 5s |
| 1k–2k tokens | 0.8s | 10s |
| 2k–4k tokens | 1.4s | ~20s |
| 4k–8k tokens | 3.4s | ~60s |
Documents longer than ~8k tokens (after stripping) are truncated from the tail.
Limitations
- Deep nesting (H5/H6) — accuracy drops to 45–50%. Relative structure is preserved; absolute depth gets compressed by 1–2 levels.
- Ambiguous structure — heading levels are subjective. The model learns common conventions; it can't resolve genuine ambiguity.
- Long documents — >8k tokens (after stripping) get truncated. Headings past the cutoff retain their input levels.
- English-centric — trained primarily on English content.
Reproducing training
git clone https://github.com/joelbarmettlerUZH/md-reheader.git
cd md-reheader
uv sync --extra train # install training dependencies
make download # download raw data (~150k documents)
make prepare # strip, flatten, oversample, format
make train # train on 2x GPU with Axolotl
make eval # evaluate on test set
The model is a fine-tune of Qwen/Qwen3-0.6B trained on ~197k markdown documents:
- codeparrot/github-code — ~105k markdown files from GitHub repositories
- euirim/goodwiki — ~45k Wikipedia articles
- Deep documents (depth 4+) oversampled 2–8× for class balance
Trained with Axolotl on 2× RTX 4090 using DDP, BF16, 8k sequence length with sample packing.
License
Code and model weights: Apache 2.0. Training data includes Wikipedia content (CC BY-SA 4.0) and GitHub repositories (various open-source licenses).
Citation
@software{barmettler2026mdreheader,
author = {Barmettler, Joel},
title = {md-reheader: Restoring Heading Hierarchy in Markdown Documents},
year = {2026},
url = {https://github.com/joelbarmettlerUZH/md-reheader}
}
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file md_reheader-0.2.2.tar.gz.
File metadata
- Download URL: md_reheader-0.2.2.tar.gz
- Upload date:
- Size: 440.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
3a0585635158f692ae34a587f085af56fed5ce9c7c9c1809a8356834e7aee4dc
|
|
| MD5 |
0f9a01fe7c2f2a483c7e853d297d8a66
|
|
| BLAKE2b-256 |
466b7ab0c024e87429fc8f526181641384e36dbcbe69f729ea8cd613785573ba
|
Provenance
The following attestation bundles were made for md_reheader-0.2.2.tar.gz:
Publisher:
release.yml on joelbarmettlerUZH/md-reheader
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
md_reheader-0.2.2.tar.gz -
Subject digest:
3a0585635158f692ae34a587f085af56fed5ce9c7c9c1809a8356834e7aee4dc - Sigstore transparency entry: 1328012348
- Sigstore integration time:
-
Permalink:
joelbarmettlerUZH/md-reheader@2d6ed46794c445869f928bcc305a71b2508da771 -
Branch / Tag:
refs/heads/main - Owner: https://github.com/joelbarmettlerUZH
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@2d6ed46794c445869f928bcc305a71b2508da771 -
Trigger Event:
push
-
Statement type:
File details
Details for the file md_reheader-0.2.2-py3-none-any.whl.
File metadata
- Download URL: md_reheader-0.2.2-py3-none-any.whl
- Upload date:
- Size: 21.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
7b7af9d334e9d1dbb0bd9915eec2b1735417dd5161f9c1040d01355e90a90055
|
|
| MD5 |
8521855955937400c8ebb337ff0be331
|
|
| BLAKE2b-256 |
da90add26bfcb7c9686150b32e5f03c2faebfe716362cb27b467f1c3d0fb36ce
|
Provenance
The following attestation bundles were made for md_reheader-0.2.2-py3-none-any.whl:
Publisher:
release.yml on joelbarmettlerUZH/md-reheader
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
md_reheader-0.2.2-py3-none-any.whl -
Subject digest:
7b7af9d334e9d1dbb0bd9915eec2b1735417dd5161f9c1040d01355e90a90055 - Sigstore transparency entry: 1328012412
- Sigstore integration time:
-
Permalink:
joelbarmettlerUZH/md-reheader@2d6ed46794c445869f928bcc305a71b2508da771 -
Branch / Tag:
refs/heads/main - Owner: https://github.com/joelbarmettlerUZH
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@2d6ed46794c445869f928bcc305a71b2508da771 -
Trigger Event:
push
-
Statement type: