
Convert PDF files to high-quality Markdown using LLM vision models

Project description

pdfmark-ai

PDF to Markdown, powered by LLM vision.

Drop a PDF, get clean Markdown — tables, formulas, code, figures, all handled.


Demo · Installation · Quick Start · Configuration · CLI Reference

English | 简体中文

pdfmark-ai doesn't parse PDFs the traditional way. Instead, it renders each page as an image and lets multimodal LLMs (Claude, Kimi, Qwen, etc.) "read" it — just like a human would. The result? Clean, structured Markdown that handles what other tools simply can't: complex tables with merged cells, inline math formulas, source code blocks, embedded diagrams, and even blurry scans.
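The page-as-image flow can be sketched roughly as follows. This is an illustrative stdlib-only sketch, not pdfmark-ai's internals; the helper name and the OpenAI-style multimodal payload shape are assumptions, and the real request format varies per provider:

```python
import base64

def build_vision_message(page_png: bytes, lang: str = "auto") -> dict:
    """Wrap one rendered PDF page as a single vision-model user message.

    The payload follows the common OpenAI-style multimodal shape:
    one text instruction plus one base64-encoded page image.
    """
    data_url = "data:image/png;base64," + base64.b64encode(page_png).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text",
             "text": f"Transcribe this page to Markdown (language: {lang}). "
                     "Preserve tables, code blocks, and LaTeX math."},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }

# Placeholder bytes stand in for a real rendered page image.
msg = build_vision_message(b"\x89PNG...")
```

Because the model sees pixels rather than a parsed content stream, layout quirks that break structural PDF parsers (rotated tables, scanned text, decorative column layouts) degrade gracefully instead of failing outright.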

Demo

Real conversion results on academic papers and technical documents — no post-editing.

Image Extraction, Tables & Code

  • PDF original — mixed figures, tables & code → Converted Markdown — images extracted, tables formatted
  • PDF original — tables & code blocks → Converted Markdown — syntax-highlighted code
  • PDF original — charts & formulas → Converted Markdown — chart images referenced

Math Formulas & Blurred Content

  • PDF original — dense math formulas → Converted Markdown — LaTeX `$...$` and `$$...$$` wrapping
  • PDF original — blurred / low-quality scan → Converted Markdown — content correctly recognized

Features

  • 🖼️ Vision-based extraction — treats each page as an image, handles complex layouts that traditional parsers miss
  • 🧮 Math formulas — LaTeX rendering with automatic $...$ and $$...$$ wrapping
  • 📊 Complex tables — merged cells, multi-row headers, nested structures
  • 💻 Code blocks — syntax-appropriate formatting for source code
  • ✂️ Image extraction — --crop-images crops figures and diagrams into separate image files
  • 🔍 Blur tolerance — handles low-quality and blurred scans with high recognition accuracy
  • 🤖 Multi-provider — Claude, Kimi, Xiaomi, Qwen, and any OpenAI-compatible API
  • ♻️ Incremental caching — SHA-256 progressive cache avoids re-processing unchanged pages
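The incremental cache could work roughly like this sketch (the function name and key layout are illustrative assumptions, not pdfmark-ai's actual cache format): each page's key hashes the page content together with the settings that affect its output, so changing one page or one setting invalidates only the affected entries.

```python
import hashlib

def page_cache_key(page_bytes: bytes, dpi: int, model: str) -> str:
    """Derive a stable SHA-256 cache key for one rendered page.

    A change to the page content, the render DPI, or the model
    invalidates only that page's entry, not the whole document.
    """
    h = hashlib.sha256()
    h.update(page_bytes)
    h.update(f"|dpi={dpi}|model={model}".encode())
    return h.hexdigest()

k1 = page_cache_key(b"page-1-bytes", 150, "claude-sonnet-4-20250514")
k2 = page_cache_key(b"page-1-bytes", 150, "claude-sonnet-4-20250514")
k3 = page_cache_key(b"page-1-bytes", 300, "claude-sonnet-4-20250514")
```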

Installation

pip install pdfmark-ai

Requirements

  • Python >= 3.10
  • An LLM API key (Anthropic, Kimi, Qwen, or OpenAI-compatible)

Quick Start

# Step 1: Generate config templates in your current directory
pdfmark --init

# Step 2: Edit .env — uncomment ONE provider and fill in your API key
#   e.g. LLM_API_KEY=your-kimi-api-key

# Step 3: Run
pdfmark input.pdf -o output.md

The default provider is Kimi (kimi-for-coding). To use a different provider, either edit .env to set LLM_MODEL / LLM_BASE_URL, or change active_provider in pdfmark.toml.

💡 Tip: Configuration files (.env and pdfmark.toml) are always read from your current working directory — not from the package installation directory. Place them alongside your PDF files or in your project root.

Configuration

pdfmark-ai uses a 4-layer priority chain: CLI args > env vars > TOML config > defaults.
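A minimal sketch of how such a priority chain resolves a single setting (illustrative only; pdfmark-ai's internal resolver is not public API): each layer is consulted in order, and the first layer that defines the key wins.

```python
def resolve(cli: dict, env: dict, toml_cfg: dict, defaults: dict, key: str):
    """Return the first value found, scanning CLI args > env vars > TOML > defaults."""
    for layer in (cli, env, toml_cfg, defaults):
        if layer.get(key) is not None:
            return layer[key]
    return None

# TOML wins here because no CLI flag or env var sets dpi.
dpi = resolve(cli={"dpi": None}, env={}, toml_cfg={"dpi": 150},
              defaults={"dpi": 200}, key="dpi")
```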

Config files live in your working directory (where you run pdfmark):

| File | Purpose | Contains |
|------|---------|----------|
| .env | API keys & overrides | LLM_API_KEY, LLM_MODEL, LLM_BASE_URL |
| pdfmark.toml | Provider presets & settings | providers, DPI, chunking, caching |

You can generate both files with pdfmark --init, or create them manually.

.env (API keys)

# Uncomment ONE provider and add your key:
LLM_API_KEY=your-kimi-api-key
# LLM_API_KEY=your-anthropic-api-key
# LLM_API_KEY=your-qwen-api-key

# Optional: override model or base URL
# LLM_MODEL=claude-sonnet-4-20250514
# LLM_BASE_URL=https://api.your-provider.com/v1
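To see what a given .env resolves to without running the tool, a tiny stdlib-only parser like this (a sketch, not pdfmark-ai code) mirrors the basic KEY=VALUE semantics shown above, skipping blanks and commented-out lines:

```python
def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values

cfg = parse_env("""
# LLM_API_KEY=your-anthropic-api-key
LLM_API_KEY=your-kimi-api-key
""")
```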

pdfmark.toml (settings)

active_provider = "anthropic"

[providers.anthropic]
base_url = "https://api.anthropic.com"
model = "claude-sonnet-4-20250514"

[render]
dpi = 150

[cache]
enabled = true
dir = "~/.cache/pdfmark"

Supported Providers

| Provider | active_provider | Notes |
|----------|-----------------|-------|
| Anthropic Claude | anthropic | Supports Opus 4.6, Sonnet 4.6, and other Claude models. Uses the Anthropic Messages API natively. |
| Kimi (Moonshot) | kimi | Anthropic-compatible API |
| Xiaomi (MiMo) | xiaomi | Auth token required |
| Qwen (Alibaba) | qwen | OpenAI-compatible SDK |
| Any OpenAI-compatible | set LLM_BASE_URL | Set LLM_SDK_TYPE=openai |
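For the last row, pointing pdfmark at any OpenAI-compatible endpoint is a matter of environment variables; the URL, key, and model below are placeholders you would replace with your provider's values:

```shell
# Route requests through an OpenAI-compatible API instead of a preset provider
export LLM_SDK_TYPE=openai
export LLM_BASE_URL=https://api.your-provider.com/v1
export LLM_API_KEY=your-api-key
export LLM_MODEL=your-model-name
```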

CLI Reference

Usage: pdfmark [OPTIONS] [INPUT]

Arguments:
  INPUT                   Path to the PDF file to convert

Options:
  --init                  Generate .env and pdfmark.toml config templates
  -f, --force             Overwrite existing config files (use with --init)
  -o, --output            Output markdown file path
  --lang                  Document language (e.g. 'en', 'zh', 'auto')
  --crop-images           Extract visual regions from pages as images
  --refine                Run optional LLM global refinement pass
  --no-cache              Disable caching of rendered pages and chunks
  --no-frontmatter        Omit YAML frontmatter from output
  --detect-only           Detect document structure and print sections
  --config                Path to a TOML configuration file
  --dpi                   Rendering DPI for PDF pages
  --model                 LLM model identifier
  --api-key               LLM API key (or set LLM_API_KEY env var)
  --base-url              LLM API base URL
  --max-concurrent        Maximum concurrent LLM requests

Image Extraction

Use --crop-images to extract figures and diagrams from the PDF as separate image files:

pdfmark input.pdf -o output.md --crop-images

Cropped images are saved alongside the output file (e.g., images/page_003_fig_001.png). Crop mode and plain mode use separate caches, so you can switch freely without needing --no-cache.

License

MIT

Download files

Download the file for your platform.

Source Distribution

pdfmark_ai-0.5.0.tar.gz (45.6 kB)


Built Distribution


pdfmark_ai-0.5.0-py3-none-any.whl (36.4 kB)


File details

Details for the file pdfmark_ai-0.5.0.tar.gz.

File metadata

  • Download URL: pdfmark_ai-0.5.0.tar.gz
  • Size: 45.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for pdfmark_ai-0.5.0.tar.gz
| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | b21770c6bd967be6f0972ecae1d1dca92dd304cf4ba91c71e8667c7193053d1f |
| MD5 | 13fca365c79866cedfeb5b48db1356fc |
| BLAKE2b-256 | 894019c2c161052baf4aa9a3197a2d523ac883ce9ce363e14639702608d49bba |


File details

Details for the file pdfmark_ai-0.5.0-py3-none-any.whl.

File metadata

  • Download URL: pdfmark_ai-0.5.0-py3-none-any.whl
  • Size: 36.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for pdfmark_ai-0.5.0-py3-none-any.whl
| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | 4d893b60846de4a303c93a447b4e69e8b2a36e760e508c57e3089a22eebc0c7e |
| MD5 | 5dc623e0fd90a3c32c2a2397deb62416 |
| BLAKE2b-256 | 9c599345880cf5a47f409694eef87d8ff6af345b8cdfe4976d44935e99c06964 |

