Convert PDF files to high-quality Markdown using LLM vision models
Project description
pdfmark-ai
PDF to Markdown, powered by LLM vision.
Drop a PDF, get clean Markdown — tables, formulas, code, figures, all handled.
Demo · Installation · Quick Start · Configuration · CLI Reference
English | 简体中文
pdfmark-ai doesn't parse PDFs the traditional way. Instead, it renders each page as an image and lets multimodal LLMs (Claude, Kimi, Qwen, etc.) "read" it — just like a human would. The result? Clean, structured Markdown that handles what other tools simply can't: complex tables with merged cells, inline math formulas, source code blocks, embedded diagrams, and even blurry scans.
Demo
Real conversion results on academic papers and technical documents — no post-editing.
Image Extraction, Tables & Code
| PDF original — mixed figures, tables & code | Converted Markdown — images extracted, tables formatted |
| PDF original — tables & code blocks | Converted Markdown — syntax-highlighted code |
| PDF original — charts & formulas | Converted Markdown — chart images referenced |
Math Formulas & Blurred Content
| PDF original — dense math formulas | Converted Markdown — LaTeX `$...$` and `$$...$$` wrapping |
| PDF original — blurred / low-quality scan | Converted Markdown — content correctly recognized |
Features
- 🖼️ Vision-based extraction — treats each page as an image, handles complex layouts that traditional parsers miss
- 🧮 Math formulas — LaTeX rendering with automatic
$...$and$$...$$wrapping - 📊 Complex tables — merged cells, multi-row headers, nested structures
- 💻 Code blocks — syntax-appropriate formatting for source code
- ✂️ Image extraction —
--crop-imagesto crop figures and diagrams as separate files - 🔍 Blur tolerance — handles low-quality and blurred scans with high recognition accuracy
- 🤖 Multi-provider — Claude, Kimi, Xiaomi, Qwen, and any OpenAI-compatible API
- ⚡ Incremental caching — SHA-256 progressive cache avoids re-processing unchanged pages
Installation
pip install pdfmark-ai
Requirements
- Python >= 3.10
- An LLM API key (Anthropic, Kimi, Qwen, or OpenAI-compatible)
Quick Start
# Step 1: Generate config templates in your current directory
pdfmark --init
# Step 2: Edit .env — uncomment ONE provider and fill in your API key
# e.g. LLM_API_KEY=your-kimi-api-key
# Step 3: Run
pdfmark input.pdf -o output.md
Default provider is Kimi (kimi-for-coding). To use a different provider, either edit .env to set LLM_MODEL / LLM_BASE_URL, or change active_provider in pdfmark.toml.
💡 Tip: Configuration files (
.envandpdfmark.toml) are always read from your current working directory — not from the package installation directory. Place them alongside your PDF files or in your project root.
Configuration
pdfmark-ai uses a 4-layer priority chain: CLI args > env vars > TOML config > defaults.
Config files live in your working directory (where you run pdfmark):
| File | Purpose | Contains |
|---|---|---|
.env |
API keys & overrides | LLM_API_KEY, LLM_MODEL, LLM_BASE_URL |
pdfmark.toml |
Provider presets & settings | providers, DPI, chunking, caching |
You can generate both files with pdfmark --init, or create them manually.
.env (API keys)
# Uncomment ONE provider and add your key:
LLM_API_KEY=your-kimi-api-key
# LLM_API_KEY=your-anthropic-api-key
# LLM_API_KEY=your-qwen-api-key
# Optional: override model or base URL
# LLM_MODEL=claude-sonnet-4-20250514
# LLM_BASE_URL=https://api.your-provider.com/v1
pdfmark.toml (settings)
active_provider = "anthropic"
[providers.anthropic]
base_url = "https://api.anthropic.com"
model = "claude-sonnet-4-20250514"
[render]
dpi = 150
[cache]
enabled = true
dir = "~/.cache/pdfmark"
Supported Providers
| Provider | active_provider |
Notes |
|---|---|---|
| Anthropic Claude | anthropic |
Supports Opus 4.6, Sonnet 4.6 and other Claude models. Uses Anthropic Messages API natively. |
| Kimi (Moonshot) | kimi |
Anthropic-compatible API |
| Xiaomi (MiMo) | xiaomi |
Auth token required |
| Qwen (Alibaba) | qwen |
OpenAI-compatible SDK |
| Any OpenAI-compatible | set LLM_BASE_URL |
Set LLM_SDK_TYPE=openai |
CLI Reference
Usage: pdfmark [OPTIONS] [INPUT]
Arguments:
INPUT Path to the PDF file to convert
Options:
--init Generate .env and pdfmark.toml config templates
-f, --force Overwrite existing config files (use with --init)
-o, --output Output markdown file path
--lang Document language (e.g. 'en', 'zh', 'auto')
--crop-images Extract visual regions from pages as images
--refine Run optional LLM global refinement pass
--no-cache Disable caching of rendered pages and chunks
--no-frontmatter Omit YAML frontmatter from output
--detect-only Detect document structure and print sections
--config Path to a TOML configuration file
--dpi Rendering DPI for PDF pages
--model LLM model identifier
--api-key LLM API key (or set LLM_API_KEY env var)
--base-url LLM API base URL
--max-concurrent Maximum concurrent LLM requests
Image Extraction
Use --crop-images to extract figures and diagrams from the PDF as separate image files:
pdfmark input.pdf -o output.md --crop-images
Cropped images are saved alongside the output file (e.g., images/page_003_fig_001.png). Crop mode and plain mode use separate caches, so you can switch freely without needing --no-cache.
License
MIT
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file pdfmark_ai-0.4.0.tar.gz.
File metadata
- Download URL: pdfmark_ai-0.4.0.tar.gz
- Upload date:
- Size: 43.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a93233f67329bb59caa7627f81a5cfbd88d667b1b3369822b29c1358cc72c34c
|
|
| MD5 |
23cc6de3ea48831ecce0b9ccaaeb9ed0
|
|
| BLAKE2b-256 |
dbd1bdb558342168d735ead1b8cad5665e19bd45c9e2ae6eecf63f016dac288e
|
File details
Details for the file pdfmark_ai-0.4.0-py3-none-any.whl.
File metadata
- Download URL: pdfmark_ai-0.4.0-py3-none-any.whl
- Upload date:
- Size: 33.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
999ee8e07032df8258bc0712a52e55676aec2cf30405442fc0e68b12deee604b
|
|
| MD5 |
2560bbd7b926e627a95756b665ab30b2
|
|
| BLAKE2b-256 |
b5e351a64cc3422bbb2c7bde2fc56539081244f43f93706a2fd43778cb662765
|