A comprehensive PDF processing toolkit that converts PDFs to markdown with advanced AI-powered features for image and table analysis. Supports local files and URLs, preserves document structure, extracts high-quality images, detects tables using advanced ML models, and generates detailed content descriptions using multiple LLM providers including OpenAI GPT-4o, Google Gemini, Anthropic Claude, Groq, OpenRouter, and LiteLLM.
Markdrop
A Python package for converting PDFs to structured Markdown and interactive HTML, with AI-powered image and table descriptions across six major LLM providers. Available on PyPI.
Features
- PDF → Markdown conversion with formatting preservation (via Docling)
- Automatic image extraction using XRef IDs
- Table detection using Microsoft's Table Transformer
- PDF URL support
- AI-powered image and table descriptions — 6 providers: Gemini, OpenAI, Anthropic Claude, Groq, OpenRouter, LiteLLM
- Interactive HTML output with downloadable Excel tables
- Customisable image resolution and UI elements
- Structured logging (never pollutes your app's root logger)
- Support for DOCX / PPTX input
Installation
Core install (PDF conversion + Gemini/OpenAI):
pip install markdrop
With Anthropic Claude:
pip install "markdrop[anthropic]"
With Groq:
pip install "markdrop[groq]"
With LiteLLM (routes to 100+ providers):
pip install "markdrop[litellm]"
Everything (including local HuggingFace models):
pip install "markdrop[all]"
OpenRouter is accessed through the openai package (already included in the core install), so no extra install is needed.
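To see which optional providers are usable in the current environment, one can probe for their client packages. The following is a minimal sketch, assuming each extra installs a package importable under the same name as the extra (anthropic, groq, litellm) and that OpenRouter relies on openai. The helper name available_providers is hypothetical and not part of markdrop. Gemini is left out because the import name of its client is not stated here.

```python
from importlib.util import find_spec

# Assumed mapping from provider name to the import name of its client package.
PROVIDER_PACKAGES = {
    "openai": "openai",
    "openrouter": "openai",  # OpenRouter goes through the openai package
    "anthropic": "anthropic",
    "groq": "groq",
    "litellm": "litellm",
}


def available_providers(packages=PROVIDER_PACKAGES):
    """Return {provider: bool} telling whether each client package is importable."""
    return {name: find_spec(module) is not None for name, module in packages.items()}


if __name__ == "__main__":
    for provider, installed in available_providers().items():
        print(f"{provider:<11} {'installed' if installed else 'missing'}")
```

A provider reported as missing can then be added with the matching extra from the list above.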
Supported AI Providers
| Provider | --ai_provider | Default model | Vision |
|---|---|---|---|
| Google Gemini | gemini | gemini-3.1-flash-lite | ✅ |
| OpenAI | openai | gpt-5.4 | ✅ |
| Anthropic Claude | anthropic | claude-opus-4-6 | ✅ |
| Groq | groq | meta-llama/llama-4-maverick-17b-128e-instruct | ✅ |
| OpenRouter | openrouter | google/gemini-3.1-flash-lite (any model) | ✅ |
| LiteLLM | litellm | openai/gpt-5.4 (configurable) | ✅ |
All models are configurable: use --model to override the default for any provider, or set model_name_override in ProcessorConfig.
Quick Start
CLI Usage
1. Convert PDF → Markdown + HTML
markdrop convert <input_path> --output_dir <dir> [--add_tables]
# Example
markdrop convert report.pdf --output_dir out --add_tables
# Also works with URLs:
markdrop convert https://arxiv.org/pdf/1706.03762 --output_dir out
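When several documents need converting, the command above can be driven from Python. This is a rough sketch that only uses the flags documented here; the function names are hypothetical, and treating a non-zero exit code as failure is an assumption about how the CLI reports errors.

```python
import subprocess


def build_convert_command(input_path, output_dir, add_tables=False):
    """Assemble the argv list for `markdrop convert` from the documented flags."""
    command = ["markdrop", "convert", str(input_path), "--output_dir", str(output_dir)]
    if add_tables:
        command.append("--add_tables")
    return command


def convert_many(inputs, output_dir, add_tables=False, runner=subprocess.run):
    """Run one conversion per input (local path or URL); return the inputs that failed."""
    failed = []
    for item in inputs:
        result = runner(build_convert_command(item, output_dir, add_tables))
        # Assumption: the CLI signals failure through a non-zero exit code.
        if result.returncode != 0:
            failed.append(item)
    return failed
```

Passing runner as a parameter keeps the loop testable without invoking the real CLI.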
2. Generate AI Descriptions for Images & Tables
markdrop describe <markdown_file> --ai_provider <provider> [--output_dir <dir>] [--remove_images] [--remove_tables]
| Provider | --ai_provider |
|---|---|
| Google Gemini 2.0 Flash | gemini |
| OpenAI GPT-4o | openai |
| Anthropic Claude Opus | anthropic |
| Groq Llama-4 Scout | groq |
| OpenRouter | openrouter |
| LiteLLM | litellm |
# Gemini (default)
markdrop describe doc.md --ai_provider gemini
# Anthropic Claude
markdrop describe doc.md --ai_provider anthropic --remove_images
# Groq (fastest inference)
markdrop describe doc.md --ai_provider groq
# OpenRouter (any model)
markdrop describe doc.md --ai_provider openrouter
# LiteLLM (unified gateway)
markdrop describe doc.md --ai_provider litellm
3. Set Up API Keys
markdrop setup <provider>
Keys are stored in <package-root>/.env with 0o600 permissions on POSIX systems.
markdrop setup gemini # → GEMINI_API_KEY
markdrop setup openai # → OPENAI_API_KEY
markdrop setup anthropic # → ANTHROPIC_API_KEY
markdrop setup groq # → GROQ_API_KEY
markdrop setup openrouter # → OPENROUTER_API_KEY
markdrop setup litellm # → LITELLM_API_KEY
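For readers curious what storing a key with 0o600 permissions involves, here is a small standard-library sketch. It is not markdrop's actual implementation: write_private_env is a hypothetical helper, and it overwrites the target file with a single KEY=value line rather than merging with existing entries.

```python
import os
import stat
from pathlib import Path


def write_private_env(path, key, value):
    """Write KEY=value to `path`, readable and writable by the owner only on POSIX."""
    path = Path(path)
    # Create the file with mode 0o600 so it is never briefly world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(f"{key}={value}\n")
    if os.name == "posix":
        # A pre-existing file keeps its old mode, so tighten it explicitly.
        os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return path
```

On Windows the mode argument has little effect, which is consistent with the POSIX-only wording above.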
4. Analyze Images in a PDF
markdrop analyze report.pdf --output_dir pdf_analysis --save_images
5. Batch Image Description Generation
markdrop generate images/ --output_dir descriptions/ --prompt "Describe in detail." \
--llm_client gemini openai
Available --llm_client values: qwen, gemini, openai, llama-vision, molmo, pixtral
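Because --llm_client accepts several names at once, it can help to validate them before starting a long batch run. A minimal sketch, using only the values listed above; check_llm_clients is a hypothetical helper, not a markdrop function.

```python
# Values listed for --llm_client in this README.
KNOWN_LLM_CLIENTS = ("qwen", "gemini", "openai", "llama-vision", "molmo", "pixtral")


def check_llm_clients(requested, known=KNOWN_LLM_CLIENTS):
    """Return the requested clients in order, raising ValueError on unknown names."""
    unknown = [name for name in requested if name not in known]
    if unknown:
        raise ValueError(f"unknown llm_client value(s): {', '.join(unknown)}")
    return list(requested)
```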
Python API
PDF Conversion
from markdrop import markdrop, MarkDropConfig, add_downloadable_tables
from pathlib import Path
import logging
config = MarkDropConfig(
    image_resolution_scale=2.0,
    download_button_color='#444444',
    log_level=logging.INFO,
    log_dir='logs',
    excel_dir='markdrop-excel-tables',
)
html_path = markdrop("path/to/input.pdf", "output", config)
downloadable_html = add_downloadable_tables(html_path, config)
AI Descriptions
from markdrop import process_markdown, ProcessorConfig, AIProvider, setup_keys
# One-time key setup (writes to .env)
setup_keys('anthropic')
config = ProcessorConfig(
    input_path="doc.md",
    output_dir="output",
    ai_provider=AIProvider.ANTHROPIC,  # GEMINI | OPENAI | ANTHROPIC | GROQ | OPENROUTER | LITELLM
    remove_images=False,
    remove_tables=False,
    table_descriptions=True,
    image_descriptions=True,
    max_retries=3,
    retry_delay=2,
    # Override default models (all providers have matching config fields):
    anthropic_model_name="claude-sonnet-4-5",  # faster / cheaper
    anthropic_text_model_name="claude-sonnet-4-5",
)
output_path = process_markdown(config)
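The max_retries and retry_delay fields suggest a fixed-delay retry around each provider call. The sketch below illustrates that pattern in isolation; it is an assumption about the behaviour, not markdrop's code. In particular, whether max_retries counts total attempts (as here) or retries after the first attempt is not stated in this README, and call_with_retries is a hypothetical name.

```python
import time


def call_with_retries(func, max_retries=3, retry_delay=2, sleep=time.sleep):
    """Call func(); on an exception wait retry_delay seconds and try again.

    Makes at most max_retries attempts and re-raises the last error.
    """
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            return func()
        except Exception as error:  # a real client would catch narrower errors
            last_error = error
            if attempt < max_retries:
                sleep(retry_delay)
    raise last_error
```

The sleep parameter exists only so the delay can be replaced in tests.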
Using OpenRouter to access any model
from markdrop import ProcessorConfig, AIProvider

config = ProcessorConfig(
    input_path="doc.md",
    output_dir="output",
    ai_provider=AIProvider.OPENROUTER,
    openrouter_model_name="meta-llama/llama-4-scout",  # any model on openrouter.ai/models
    openrouter_text_model_name="anthropic/claude-sonnet-4-5",
    openrouter_site_url="https://yoursite.com",
    openrouter_site_name="My App",
)
Using LiteLLM to reach any of its 100+ providers
import os

from markdrop import ProcessorConfig, AIProvider

os.environ["ANTHROPIC_API_KEY"] = "..."  # set any provider's key

config = ProcessorConfig(
    input_path="doc.md",
    output_dir="output",
    ai_provider=AIProvider.LITELLM,
    litellm_model_name="anthropic/claude-opus-4-6",
    litellm_text_model_name="groq/llama-3.3-70b-versatile",
)
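Since LiteLLM model strings use the provider/model format, the prefix indicates which provider key has to be present. The following sketch checks for missing keys before a run. The mapping of prefixes to environment variable names is an assumption based on common LiteLLM conventions, and missing_keys is a hypothetical helper.

```python
import os

# Assumed environment variable per provider prefix.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "groq": "GROQ_API_KEY",
}


def missing_keys(model_names, env=None):
    """Return env var names needed by the provider/model strings but not set."""
    env = os.environ if env is None else env
    missing = []
    for model in model_names:
        provider = model.split("/", 1)[0]
        variable = PROVIDER_ENV_VARS.get(provider)
        # Unknown prefixes are skipped rather than guessed at.
        if variable and not env.get(variable) and variable not in missing:
            missing.append(variable)
    return missing
```

With the configuration above, both the vision model (anthropic) and the text model (groq) would be checked, so a missing Groq key surfaces before any request is sent.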
Batch Image Description Generation
from markdrop import generate_descriptions
generate_descriptions(
    input_path='images/',
    output_dir='output/',
    prompt='Give a highly detailed description of this image.',
    llm_client=['gemini', 'llama-vision'],
)
API Reference
ProcessorConfig – AI Provider Fields
| Field | Default | Notes |
|---|---|---|
| gemini_model_name | gemini-2.0-flash | Vision model |
| gemini_text_model_name | gemini-2.0-flash | Text model |
| openai_model_name | gpt-4o | Vision + text |
| openai_text_model_name | gpt-4o | |
| anthropic_model_name | claude-opus-4-6 | Vision |
| anthropic_text_model_name | claude-sonnet-4-5 | Text (cheaper) |
| groq_model_name | meta-llama/llama-4-scout-17b-16e-instruct | Vision |
| groq_text_model_name | llama-3.3-70b-versatile | Text |
| openrouter_model_name | google/gemini-2.0-flash-001 | Any model string from openrouter.ai/models |
| openrouter_text_model_name | anthropic/claude-sonnet-4-5 | |
| litellm_model_name | openai/gpt-4o | provider/model format |
| litellm_text_model_name | openai/gpt-4o | |
MarkDropConfig
| Field | Default | Notes |
|---|---|---|
| image_resolution_scale | 2.0 | Scale factor for extracted images |
| download_button_color | '#444444' | HTML button colour |
| log_level | logging.INFO | |
| log_dir | 'logs' | |
| excel_dir | 'markdrop_excel_tables' | |
Contributing
We welcome contributions! See CONTRIBUTING.md.
git clone https://github.com/shoryasethia/markdrop.git
cd markdrop
python -m venv venv && source venv/bin/activate # Windows: venv\Scripts\activate
pip install -e ".[all]"
Project Structure
markdrop/
├── setup.py
├── requirements.txt
├── README.md
└── markdrop/
    ├── __init__.py
    ├── main.py             ← CLI entry-point
    ├── process.py          ← PDF conversion
    ├── parse.py            ← AI description engine (all 6 providers)
    ├── helper.py           ← PDF image analysis
    ├── utils.py            ← PDF download helpers
    ├── setup_keys.py       ← Interactive API key manager
    ├── ignore_warnings.py
    ├── src/
    │   └── markdrop-logo.png
    └── models/
        ├── img_descriptions.py
        ├── model_loader.py ← Local HF model loader
        ├── responder.py
        └── logger.py
License
GPL-3.0 — see LICENSE.
Changelog
See CHANGELOG.md.
File details
Details for the file markdrop-4.0.2.tar.gz.
File metadata
- Download URL: markdrop-4.0.2.tar.gz
- Upload date:
- Size: 43.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ccaa48cfaf70a26c7848bfd59540852a94580d0fdf024df6736bd355c0e94c4c |
| MD5 | 7ee0c32fd77dca82747796f02b94c86d |
| BLAKE2b-256 | ef964105b869d1ba3c477a9c1447fa6ee2675cbf85aa10aaf62f0a442487dd45 |
File details
Details for the file markdrop-4.0.2-py3-none-any.whl.
File metadata
- Download URL: markdrop-4.0.2-py3-none-any.whl
- Upload date:
- Size: 44.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | abb62d881496eeacb1a5635ecb9090b60bae22af4bffba4f7241d9daea72b3ec |
| MD5 | 86047b1abce69b75b342c68c2242b1b5 |
| BLAKE2b-256 | f1ee25f4d98792aed6cd9cc072754a84d5844dbc17a0aaf87b3929c86b67bc5d |
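A downloaded file can be checked against the SHA256 digests listed for each distribution. This is a minimal standard-library sketch; the function names are hypothetical, and the file path is whatever location the archive was saved to.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 with a published digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, matches_published_hash("markdrop-4.0.2.tar.gz", "ccaa48cf…") with the full SHA256 value from the source distribution's table should return True for an intact download.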