A comprehensive PDF processing toolkit that converts PDFs to markdown with advanced AI-powered features for image and table analysis. Supports local files and URLs, preserves document structure, extracts high-quality images, detects tables using advanced ML models, and generates detailed content descriptions using multiple LLM providers including OpenAI GPT-4o, Google Gemini, Anthropic Claude, Groq, OpenRouter, and LiteLLM.

Markdrop

A Python package for converting PDFs to structured Markdown and interactive HTML, with AI-powered image and table descriptions across six major LLM providers. Available on PyPI.


Features

  • PDF → Markdown conversion with formatting preservation (via Docling)
  • Automatic image extraction using XRef IDs
  • Table detection using Microsoft's Table Transformer
  • PDF URL support
  • AI-powered image and table descriptions — 6 providers: Gemini, OpenAI, Anthropic Claude, Groq, OpenRouter, LiteLLM
  • Interactive HTML output with downloadable Excel tables
  • Customisable image resolution and UI elements
  • Structured logging (never pollutes your app's root logger)
  • Support for DOCX / PPTX input

Installation

Core install (PDF conversion + Gemini/OpenAI):

pip install markdrop

With Anthropic Claude:

pip install "markdrop[anthropic]"

With Groq:

pip install "markdrop[groq]"

With LiteLLM (routes to 100+ providers):

pip install "markdrop[litellm]"

Everything (including local HuggingFace models):

pip install "markdrop[all]"

OpenRouter is accessed through the openai package (already included in core), so no extra install is needed.
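
To see which optional provider SDKs are importable in the current environment, a minimal sketch follows. It assumes the extras install packages with their usual import names (`anthropic`, `groq`, `litellm`, `openai`). The helper `installed_providers` is hypothetical and not part of markdrop, and Gemini is left out because its SDK import name is not stated here.

```python
from importlib.util import find_spec

# Assumed mapping from markdrop provider name to the importable SDK module.
# OpenRouter reuses the openai package, as noted above.
PROVIDER_MODULES = {
    "openai": "openai",
    "openrouter": "openai",
    "anthropic": "anthropic",
    "groq": "groq",
    "litellm": "litellm",
}


def installed_providers(modules=PROVIDER_MODULES):
    """Return {provider: bool} telling whether each provider's SDK can be imported."""
    return {name: find_spec(mod) is not None for name, mod in modules.items()}
```

Calling `installed_providers()` before choosing `--ai_provider` makes a missing extra visible up front rather than at the first API call.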


Supported AI Providers

| Provider | --ai_provider | Default model | Vision |
|---|---|---|---|
| Google Gemini | gemini | gemini-3.1-flash-lite | |
| OpenAI | openai | gpt-5.4 | |
| Anthropic Claude | anthropic | claude-opus-4-6 | |
| Groq | groq | meta-llama/llama-4-maverick-17b-128e-instruct | |
| OpenRouter | openrouter | google/gemini-3.1-flash-lite | (any model) |
| LiteLLM | litellm | openai/gpt-5.4 | (configurable) |

All models are configurable — use --model to override for any provider, or set model_name_override in ProcessorConfig.
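
As an illustration of the `--model` override, the sketch below assembles a `markdrop describe` command line from Python. It assumes `--model` is accepted by the `describe` subcommand, and the helper `describe_command` is hypothetical. Only flags mentioned in this README are used.

```python
import shlex


def describe_command(markdown_file, provider, model=None, output_dir=None):
    """Build the argv list for `markdrop describe`, adding --model only when given."""
    argv = ["markdrop", "describe", markdown_file, "--ai_provider", provider]
    if model:
        argv += ["--model", model]
    if output_dir:
        argv += ["--output_dir", output_dir]
    return argv


# shlex.join turns the argv list into a copy-pasteable shell line.
print(shlex.join(describe_command("doc.md", "anthropic", model="claude-sonnet-4-5")))
```

The returned list can be passed directly to `subprocess.run(argv, check=True)`.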


Quick Start


CLI Usage

1. Convert PDF → Markdown + HTML

markdrop convert <input_path> --output_dir <dir> [--add_tables]
# Example
markdrop convert report.pdf --output_dir out --add_tables
# Also works with URLs:
markdrop convert https://arxiv.org/pdf/1706.03762 --output_dir out
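
For a folder of PDFs, the same command can be generated once per file. This is a sketch only: the helper `convert_commands` and the one-output-folder-per-PDF layout are assumptions, not markdrop behaviour.

```python
from pathlib import Path


def convert_commands(pdf_dir, out_root, add_tables=True):
    """Return one `markdrop convert` argv list per PDF, each with its own output folder."""
    commands = []
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        argv = ["markdrop", "convert", str(pdf),
                "--output_dir", str(Path(out_root) / pdf.stem)]
        if add_tables:
            argv.append("--add_tables")
        commands.append(argv)
    return commands


# To execute them:
#   import subprocess
#   for argv in convert_commands("pdfs", "out"):
#       subprocess.run(argv, check=True)
```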

2. Generate AI Descriptions for Images & Tables

markdrop describe <markdown_file> --ai_provider <provider> [--output_dir <dir>] [--remove_images] [--remove_tables]

| Provider | --ai_provider |
|---|---|
| Google Gemini 2.0 Flash | gemini |
| OpenAI GPT-4o | openai |
| Anthropic Claude Opus | anthropic |
| Groq Llama-4 Scout | groq |
| OpenRouter | openrouter |
| LiteLLM | litellm |

# Gemini (default)
markdrop describe doc.md --ai_provider gemini

# Anthropic Claude
markdrop describe doc.md --ai_provider anthropic --remove_images

# Groq (fastest inference)
markdrop describe doc.md --ai_provider groq

# OpenRouter (any model)
markdrop describe doc.md --ai_provider openrouter

# LiteLLM (unified gateway)
markdrop describe doc.md --ai_provider litellm

3. Set Up API Keys

markdrop setup <provider>

Keys are stored in <package-root>/.env with 0o600 permissions on POSIX systems.

markdrop setup gemini       # → GEMINI_API_KEY
markdrop setup openai       # → OPENAI_API_KEY
markdrop setup anthropic    # → ANTHROPIC_API_KEY
markdrop setup groq         # → GROQ_API_KEY
markdrop setup openrouter   # → OPENROUTER_API_KEY
markdrop setup litellm      # → LITELLM_API_KEY
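
For readers who manage the `.env` file themselves, the sketch below shows one way to store a key with owner-only (`0o600`) permissions. It is not markdrop's implementation. The helper `write_env_key` and the plain `NAME=value` line format are assumptions.

```python
import os
from pathlib import Path


def write_env_key(env_path, name, value):
    """Add or replace NAME=value in a .env file and restrict the file to its owner."""
    path = Path(env_path)
    lines = path.read_text().splitlines() if path.exists() else []
    # Drop any previous assignment of the same variable.
    lines = [ln for ln in lines if not ln.startswith(name + "=")]
    lines.append(f"{name}={value}")
    # The mode passed to os.open only applies when the file is created.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write("\n".join(lines) + "\n")
    if os.name == "posix":
        # Tighten permissions on a file that already existed with a wider mode.
        os.chmod(path, 0o600)
```

On non-POSIX systems the permission step is skipped, which mirrors the POSIX-only note above.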

4. Analyze Images in a PDF

markdrop analyze report.pdf --output_dir pdf_analysis --save_images

5. Batch Image Description Generation

markdrop generate images/ --output_dir descriptions/ --prompt "Describe in detail." \
  --llm_client gemini openai

Available --llm_client values: qwen, gemini, openai, llama-vision, molmo, pixtral


Python API

PDF Conversion

from markdrop import markdrop, MarkDropConfig, add_downloadable_tables
from pathlib import Path
import logging

config = MarkDropConfig(
    image_resolution_scale=2.0,
    download_button_color='#444444',
    log_level=logging.INFO,
    log_dir='logs',
    excel_dir='markdrop-excel-tables',
)

html_path = markdrop("path/to/input.pdf", "output", config)
downloadable_html = add_downloadable_tables(html_path, config)

AI Descriptions

from markdrop import process_markdown, ProcessorConfig, AIProvider, setup_keys

# One-time key setup (writes to .env)
setup_keys('anthropic')

config = ProcessorConfig(
    input_path="doc.md",
    output_dir="output",
    ai_provider=AIProvider.ANTHROPIC,       # GEMINI | OPENAI | ANTHROPIC | GROQ | OPENROUTER | LITELLM
    remove_images=False,
    remove_tables=False,
    table_descriptions=True,
    image_descriptions=True,
    max_retries=3,
    retry_delay=2,
    # Override default models (all providers have matching config fields):
    anthropic_model_name="claude-sonnet-4-5",    # faster / cheaper
    anthropic_text_model_name="claude-sonnet-4-5",
)

output_path = process_markdown(config)
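
The `max_retries` and `retry_delay` fields suggest a fixed-delay retry loop around each provider call. The sketch below illustrates that pattern in isolation. It assumes `max_retries` counts total attempts and that the delay is in seconds, which markdrop may define differently. The function `call_with_retries` is hypothetical.

```python
import time


def call_with_retries(func, max_retries=3, retry_delay=2, sleep=time.sleep):
    """Call func(); after a failure wait retry_delay seconds, up to max_retries attempts."""
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            return func()
        except Exception as exc:  # a real client would catch narrower error types
            last_error = exc
            if attempt < max_retries:
                sleep(retry_delay)
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

The `sleep` parameter exists only so the delay can be replaced in tests.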

Using OpenRouter to access any model

config = ProcessorConfig(
    input_path="doc.md",
    output_dir="output",
    ai_provider=AIProvider.OPENROUTER,
    openrouter_model_name="meta-llama/llama-4-scout",   # any model on openrouter.ai/models
    openrouter_text_model_name="anthropic/claude-sonnet-4-5",
    openrouter_site_url="https://yoursite.com",
    openrouter_site_name="My App",
)
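
OpenRouter exposes an OpenAI-compatible endpoint and reads two optional attribution headers, `HTTP-Referer` and `X-Title`. The sketch below shows how a site URL and site name could be turned into arguments for the `openai` client. That markdrop maps `openrouter_site_url` and `openrouter_site_name` to these headers is an assumption, and `openrouter_client_kwargs` is a hypothetical helper.

```python
import os

OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"


def openrouter_client_kwargs(site_url=None, site_name=None, api_key=None):
    """Build keyword arguments for openai.OpenAI(...) that target OpenRouter."""
    headers = {}
    if site_url:
        headers["HTTP-Referer"] = site_url  # optional attribution header
    if site_name:
        headers["X-Title"] = site_name      # optional app title header
    return {
        "base_url": OPENROUTER_BASE_URL,
        "api_key": api_key or os.environ.get("OPENROUTER_API_KEY", ""),
        "default_headers": headers,
    }


# Usage (requires the openai package):
#   from openai import OpenAI
#   client = OpenAI(**openrouter_client_kwargs("https://yoursite.com", "My App"))
```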

Using LiteLLM to reach any of its 100+ providers

import os
os.environ["ANTHROPIC_API_KEY"] = "..."   # set any provider's key

config = ProcessorConfig(
    input_path="doc.md",
    output_dir="output",
    ai_provider=AIProvider.LITELLM,
    litellm_model_name="anthropic/claude-opus-4-6",
    litellm_text_model_name="groq/llama-3.3-70b-versatile",
)
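
LiteLLM model strings use the `provider/model` format, so the provider prefix determines which API key must be present. The sketch below splits such a string and guesses the key name. The `PROVIDER_API_KEY` naming is an assumption that matches the variables listed in this README but does not hold for every LiteLLM provider. Both helpers are hypothetical.

```python
def split_litellm_model(model):
    """Split 'provider/model' into (provider, model_name); the model part may contain '/'."""
    provider, sep, name = model.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected 'provider/model', got {model!r}")
    return provider, name


def required_key(model):
    """Guess the environment variable a model string needs (assumed PROVIDER_API_KEY)."""
    provider, _ = split_litellm_model(model)
    return f"{provider.upper()}_API_KEY"
```

For the configuration above, `required_key` yields `ANTHROPIC_API_KEY` for the vision model and `GROQ_API_KEY` for the text model, so both would need to be set.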

Batch Image Description Generation

from markdrop import generate_descriptions

generate_descriptions(
    input_path='images/',
    output_dir='output/',
    prompt='Give a highly detailed description of this image.',
    llm_client=['gemini', 'llama-vision'],
)
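
A small guard can catch a misspelled client name before any model is loaded. The sketch below checks a list against the `--llm_client` values listed in the CLI section. It assumes the Python API accepts the same names, and `check_llm_clients` is a hypothetical helper.

```python
# Values listed for --llm_client in the CLI section of this README.
ALLOWED_CLIENTS = {"qwen", "gemini", "openai", "llama-vision", "molmo", "pixtral"}


def check_llm_clients(clients):
    """Return the client list unchanged, or raise ValueError naming unsupported entries."""
    unknown = sorted(set(clients) - ALLOWED_CLIENTS)
    if unknown:
        raise ValueError(f"unsupported llm_client value(s): {', '.join(unknown)}")
    return list(clients)
```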

API Reference

ProcessorConfig – AI Provider Fields

| Field | Default | Notes |
|---|---|---|
| gemini_model_name | gemini-2.0-flash | Vision model |
| gemini_text_model_name | gemini-2.0-flash | Text model |
| openai_model_name | gpt-4o | Vision + text |
| openai_text_model_name | gpt-4o | |
| anthropic_model_name | claude-opus-4-6 | Vision |
| anthropic_text_model_name | claude-sonnet-4-5 | Text (cheaper) |
| groq_model_name | meta-llama/llama-4-scout-17b-16e-instruct | Vision |
| groq_text_model_name | llama-3.3-70b-versatile | Text |
| openrouter_model_name | google/gemini-2.0-flash-001 | Any model string from openrouter.ai/models |
| openrouter_text_model_name | anthropic/claude-sonnet-4-5 | |
| litellm_model_name | openai/gpt-4o | provider/model format |
| litellm_text_model_name | openai/gpt-4o | |
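
Because every provider follows the same `<provider>_model_name` / `<provider>_text_model_name` naming, override arguments can be generated rather than typed out. A minimal sketch, where `model_overrides` is a hypothetical helper and not part of markdrop:

```python
PROVIDERS = ("gemini", "openai", "anthropic", "groq", "openrouter", "litellm")


def model_overrides(provider, vision_model=None, text_model=None):
    """Return ProcessorConfig keyword arguments that override one provider's models."""
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")
    overrides = {}
    if vision_model:
        overrides[f"{provider}_model_name"] = vision_model
    if text_model:
        overrides[f"{provider}_text_model_name"] = text_model
    return overrides


# Usage (requires markdrop):
#   config = ProcessorConfig(input_path="doc.md", output_dir="output",
#                            ai_provider=AIProvider.GROQ,
#                            **model_overrides("groq", text_model="llama-3.3-70b-versatile"))
```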

MarkDropConfig

| Field | Default | Notes |
|---|---|---|
| image_resolution_scale | 2.0 | Scale factor for extracted images |
| download_button_color | '#444444' | HTML button colour |
| log_level | logging.INFO | |
| log_dir | 'logs' | |
| excel_dir | 'markdrop_excel_tables' | |

Contributing

We welcome contributions! See CONTRIBUTING.md.

git clone https://github.com/shoryasethia/markdrop.git
cd markdrop
python -m venv venv && source venv/bin/activate   # Windows: venv\Scripts\activate
pip install -e ".[all]"

Project Structure

markdrop/
├── setup.py
├── requirements.txt
├── README.md
└── markdrop/
    ├── __init__.py
    ├── main.py          ← CLI entry-point
    ├── process.py       ← PDF conversion
    ├── parse.py         ← AI description engine (all 6 providers)
    ├── helper.py        ← PDF image analysis
    ├── utils.py         ← PDF download helpers
    ├── setup_keys.py    ← Interactive API key manager
    ├── ignore_warnings.py
    ├── src/
    │   └── markdrop-logo.png
    └── models/
        ├── img_descriptions.py
        ├── model_loader.py  ← Local HF model loader
        ├── responder.py
        └── logger.py

License

GPL-3.0 — see LICENSE.

Changelog

See CHANGELOG.md.
