Restore heading hierarchy in markdown documents using a fine-tuned Qwen3-0.6B model

md-reheader

Restore heading hierarchy in markdown documents with a fine-tuned 0.6B LLM.

PyPI | Python 3.12+ | Apache 2.0 | HuggingFace Model | HuggingFace Dataset | GitHub

Exact Match 56% | Per-Heading Accuracy 81% | Hierarchy Preservation 91% | 0.6B parameters


The problem

PDF-to-markdown tools like MinerU, Docling, and Marker do great text extraction — then collapse your document structure. Every heading becomes # or ##. TOCs break. RAG chunking breaks. Navigation breaks.

md-reheader fixes it. A 0.6B-parameter Qwen3 fine-tune reads the document and predicts an H1–H6 level for every heading in a single generation pass.

Before (flat output from a PDF parser)

# API Reference
# Authentication
# Endpoints
# Users
# List Users
# Get User by ID
# Projects
# List Projects
# Error Handling

After (restored by md-reheader)

# API Reference
## Authentication
## Endpoints
### Users
#### List Users
#### Get User by ID
### Projects
#### List Projects
## Error Handling

Quick start

CLI

pip install md-reheader
rehead --input flat.md --output fixed.md

Auto-detects CUDA. Use --cpu or --gpu to override. Omit --output to stream to stdout — pipe-friendly for integration with other CLIs.

rehead -i flat.md | tee fixed.md               # pipe
rehead -i flat.md --gpu -o out/fixed.md        # creates nested dirs
rehead -i flat.md --force -o existing.md       # overwrite
rehead --help                                   # all flags

Python API

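from pathlib import Path

from md_reheader.inference.predict import load_model, reheader_document

# Load the fine-tuned model and its tokenizer.
model, tokenizer = load_model("joelbarmettler/md-reheader")

# Read the flattened document and restore its heading levels.
flat = Path("document.md").read_text(encoding="utf-8")
fixed = reheader_document(flat, model, tokenizer)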

The package handles preprocessing (flattening + body stripping) and postprocessing (applying predicted levels back to the original document) automatically.
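To process a whole folder of parser output, the Python API can be wrapped in a small loop. The sketch below is one way to do that; `reheader_tree` and its `fix` parameter are hypothetical names, not part of the package. `fix` is any function that maps flat markdown to fixed markdown, so the loop itself carries no model dependency.

```python
from pathlib import Path
from typing import Callable


def reheader_tree(src_dir: str, dst_dir: str, fix: Callable[[str], str]) -> list[Path]:
    """Apply `fix` to every .md file under src_dir, mirroring the layout in dst_dir."""
    src, dst = Path(src_dir), Path(dst_dir)
    written = []
    for path in sorted(src.rglob("*.md")):
        target = dst / path.relative_to(src)
        # Create nested output directories as needed.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(fix(path.read_text(encoding="utf-8")), encoding="utf-8")
        written.append(target)
    return written


# With the package installed, `fix` would wrap the documented API, for example:
#   model, tokenizer = load_model("joelbarmettler/md-reheader")
#   reheader_tree("parsed/", "fixed/",
#                 lambda text: reheader_document(text, model, tokenizer))
```

Loading the model once and reusing it across files avoids paying the load cost per document.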

Direct transformers usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("joelbarmettler/md-reheader")
model = AutoModelForCausalLM.from_pretrained(
    "joelbarmettler/md-reheader",
    dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a markdown document structure expert. Given a markdown document with incorrect or flattened heading levels, output each heading with its correct markdown prefix (# for level 1, ## for level 2, etc.), one per line."},
    {"role": "user", "content": "# Introduction\n\nSome text...\n\n# Background\n\nMore text...\n\n# Methods"},
]

input_text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False,
)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=False)

generated = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))

Important: pass enable_thinking=False to apply_chat_template. Without it, the model enters a repetition loop because training used the non-thinking chat format.
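When calling the model directly, the decoded text still has to be turned into heading levels. The system prompt asks for one heading per line with its markdown prefix, so a small parser is enough. This is a minimal sketch under that assumption; `parse_predicted_headings` is a hypothetical helper, not a function from the package.

```python
import re

# One to six '#' characters, whitespace, then a non-empty title.
HEADING_LINE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def parse_predicted_headings(generated: str) -> list[tuple[int, str]]:
    """Turn 'one heading per line' model output into (level, title) pairs."""
    headings = []
    for line in generated.splitlines():
        match = HEADING_LINE.match(line.strip())
        if match:  # blank or malformed lines are skipped
            headings.append((len(match.group(1)), match.group(2)))
    return headings
```

For the decoded output above, this would be called as `parse_predicted_headings(tokenizer.decode(generated, skip_special_tokens=True))`.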

Self-host with vLLM

pip install vllm
vllm serve joelbarmettler/md-reheader --dtype bfloat16 --max-model-len 8192

vLLM offers higher throughput than raw transformers and exposes an OpenAI-compatible API, so existing OpenAI clients work as drop-in replacements. On GPUs with less than 10 GB of memory, add --enforce-eager --gpu-memory-utilization 0.70 to skip CUDA-graph allocations.

Remote inference (any OpenAI-compatible endpoint)

Once a server is running, use md-reheader as a thin client — no local weights needed.

rehead -i flat.md -o fixed.md --endpoint http://localhost:8000/v1
rehead -i flat.md -o fixed.md --endpoint https://api.example.com/v1 --api-key sk-xxx
# or set MD_REHEADER_API_KEY in the environment

from md_reheader.inference.remote import reheader_document_remote

fixed = reheader_document_remote(
    open("flat.md").read(),
    endpoint="http://localhost:8000/v1",
    model="joelbarmettler/md-reheader",
)

Identical output to local inference. Preprocessing (flatten + strip) happens client-side; the server just runs the chat completion with chat_template_kwargs={"enable_thinking": false} to match training.


How it works

flat markdown  ──►  flatten headings to #  ──►  strip body to 128+128 tokens
                                                         │
                                                         ▼
        restored markdown  ◄──  apply predicted levels  ◄── Qwen3-0.6B (fine-tuned)

  1. Extract headings with markdown-it-py, which correctly skips code blocks.
  2. Flatten every heading to # — the model ignores input levels.
  3. Strip each section's body to its first 128 + last 128 tokens — preserves structural cues, kills context bloat.
  4. Qwen3-0.6B predicts the correct # prefix per heading.
  5. Levels get mapped back to the original document.
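The non-model steps above can be approximated in a few lines. The sketch below is a simplified stand-in, not the package's code: it uses a regex-based, fence-aware line scanner instead of markdown-it-py, and whitespace-split words instead of model tokens. All function names are hypothetical.

```python
import re

ATX = re.compile(r"^(#{1,6})[ \t]+(.*\S)[ \t]*$")
FENCE = re.compile(r"^\s*(```|~~~)")


def heading_lines(lines: list[str]) -> list[int]:
    """Step 1: indices of ATX heading lines, ignoring fenced code blocks."""
    found, in_fence = [], False
    for i, line in enumerate(lines):
        if FENCE.match(line):
            in_fence = not in_fence  # naive toggle; a real parser tracks fence type
        elif not in_fence and ATX.match(line):
            found.append(i)
    return found


def flatten(markdown: str) -> str:
    """Step 2: rewrite every heading to level 1."""
    lines = markdown.splitlines()
    for i in heading_lines(lines):
        lines[i] = "# " + ATX.match(lines[i]).group(2)
    return "\n".join(lines)


def strip_body(body: str, keep: int = 128) -> str:
    """Step 3: keep the first and last `keep` tokens (approximated by words)."""
    words = body.split()
    if len(words) <= 2 * keep:
        return body
    return " ".join(words[:keep]) + "\n" + " ".join(words[-keep:])


def apply_levels(markdown: str, levels: list[int]) -> str:
    """Step 5: write predicted levels back onto the original headings."""
    lines = markdown.splitlines()
    # zip stops at the shorter sequence, so headings without a prediction
    # keep whatever level they already had.
    for i, level in zip(heading_lines(lines), levels):
        lines[i] = "#" * level + " " + ATX.match(lines[i]).group(2)
    return "\n".join(lines)
```

Mapping levels back by heading position rather than by title text keeps the body untouched and tolerates duplicate heading titles.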

Evaluation

Benchmarked on 7,321 held-out documents from GitHub markdown and Wikipedia.

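| Metric                 | All-H1 baseline | Heuristic | md-reheader |
|------------------------|-----------------|-----------|-------------|
| Exact match            | 0.0%            | 14.5%     | 56.1%       |
| Per-heading accuracy   | 13.1%           | 49.1%     | 80.6%       |
| Hierarchy preservation | 61.3%           | 68.6%     | 91.0%       |
| Mean absolute error    | 1.38            | 0.62      | 0.22        |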
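To make the metric names concrete, here is one plausible per-document reading of them. The first three follow directly from their names. The definition of hierarchy preservation is an assumption (the share of adjacent heading pairs whose deeper/same/shallower relation matches the reference); the project's evaluation code may define it differently. The functions assume predicted and reference lists of equal, non-zero length.

```python
def exact_match(pred: list[int], gold: list[int]) -> bool:
    """True only if every heading level is correct."""
    return pred == gold


def per_heading_accuracy(pred: list[int], gold: list[int]) -> float:
    """Fraction of headings with the correct level."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)


def mean_absolute_error(pred: list[int], gold: list[int]) -> float:
    """Average distance, in levels, between prediction and reference."""
    return sum(abs(p - g) for p, g in zip(pred, gold)) / len(gold)


def _direction(a: int, b: int) -> int:
    """+1 if the next heading is deeper, -1 if shallower, 0 if same level."""
    return (b > a) - (b < a)


def hierarchy_preservation(pred: list[int], gold: list[int]) -> float:
    """ASSUMED definition: share of adjacent pairs with matching direction."""
    pairs = list(zip(zip(pred, pred[1:]), zip(gold, gold[1:])))
    if not pairs:
        return 1.0
    return sum(_direction(*p) == _direction(*g) for p, g in pairs) / len(pairs)


# Reference levels from the "After" example, scored against an all-H1 prediction.
gold = [1, 2, 2, 3, 4, 4, 3, 4, 2]
all_h1 = [1] * len(gold)
print(per_heading_accuracy(all_h1, gold), mean_absolute_error(all_h1, gold))
```

Under this reading, a prediction that shifts every heading one level deeper scores 0% exact match but 100% hierarchy preservation, which is why the two numbers can diverge so much in the table.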

Per-level accuracy

| Level    | H1  | H2  | H3  | H4  | H5  | H6  |
|----------|-----|-----|-----|-----|-----|-----|
| Accuracy | 77% | 85% | 78% | 68% | 45% | 50% |

H1–H3 land in the 77–85% band; H5/H6 drop but still beat baselines. Most deep-level errors are off-by-one — the relative structure survives.

By document depth

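| Max depth | Exact match | Per-heading accuracy | Hierarchy |
|-----------|-------------|----------------------|-----------|
| Depth 2   | 83%         | 91%                  | 95%       |
| Depth 3   | 54%         | 82%                  | 90%       |
| Depth 4   | 32%         | 70%                  | 88%       |
| Depth 5-6 | 33%         | 65%                  | 89%       |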

By source

| Source          | Exact match | Per-heading accuracy |
|-----------------|-------------|----------------------|
| GitHub markdown | 49.5%       | 74.0%                |
| Wikipedia       | 71.3%       | 95.5%                |

Speed

| Document size | RTX 4090 (BF16) | CPU (fp32) |
|---------------|-----------------|------------|
| < 1k tokens   | 0.4s            | 5s         |
| 1k–2k tokens  | 0.8s            | 10s        |
| 2k–4k tokens  | 1.4s            | ~20s       |
| 4k–8k tokens  | 3.4s            | ~60s       |

Documents longer than ~8k tokens (after stripping) are truncated from the tail.


Limitations

  • Deep nesting (H5/H6) — accuracy drops to 45–50%. Relative structure is preserved; absolute depth gets compressed by 1–2 levels.
  • Ambiguous structure — heading levels are subjective. The model learns common conventions; it can't resolve genuine ambiguity.
  • Long documents — >8k tokens (after stripping) get truncated. Headings past the cutoff retain their input levels.
  • English-centric — trained primarily on English content.

Reproducing training

git clone https://github.com/joelbarmettlerUZH/md-reheader.git
cd md-reheader

uv sync --extra train    # install training dependencies
make download            # download raw data (~150k documents)
make prepare             # strip, flatten, oversample, format
make train               # train on 2x GPU with Axolotl
make eval                # evaluate on test set

The model is a fine-tune of Qwen/Qwen3-0.6B trained on ~197k markdown documents, built from ~150k raw documents plus oversampling:

  • codeparrot/github-code — ~105k markdown files from GitHub repositories
  • euirim/goodwiki — ~45k Wikipedia articles
  • Deep documents (depth 4+) oversampled 2–8× for class balance

Trained with Axolotl on 2× RTX 4090 using DDP, BF16, 8k sequence length with sample packing.
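The oversampling of deep documents mentioned above can be pictured as repeating each document according to its maximum heading depth. The sketch below is illustrative only: the source gives a 2–8× range for depth 4+, and the exact depth-to-factor mapping used here (`OVERSAMPLE`) is a hypothetical choice, as are the function names.

```python
import random

# HYPOTHETICAL mapping from max heading depth to repeat factor.
# The source only states that depth 4+ documents are oversampled 2-8x.
OVERSAMPLE = {4: 2, 5: 4, 6: 8}


def max_depth(levels: list[int]) -> int:
    """Deepest heading level in a document (0 if it has no headings)."""
    return max(levels, default=0)


def oversample(docs: list[list[int]], seed: int = 0) -> list[list[int]]:
    """Repeat deep documents so rare levels appear more often in training."""
    out = []
    for levels in docs:
        out.extend([levels] * OVERSAMPLE.get(max_depth(levels), 1))
    random.Random(seed).shuffle(out)  # seeded for reproducibility
    return out
```

Each document is represented here by its list of heading levels; in a real pipeline the same factor would be applied to the full training example.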


License

Code and model weights: Apache 2.0. Training data includes Wikipedia content (CC BY-SA 4.0) and GitHub repositories (various open-source licenses).


Citation

@software{barmettler2026mdreheader,
  author = {Barmettler, Joel},
  title  = {md-reheader: Restoring Heading Hierarchy in Markdown Documents},
  year   = {2026},
  url    = {https://github.com/joelbarmettlerUZH/md-reheader}
}
