MarkThat

A Python library for converting images and PDFs to Markdown or generating rich image descriptions using state-of-the-art multimodal LLMs.

🚀 Features

Multiple Provider Support: OpenAI, Anthropic, Google Gemini, Mistral, and OpenRouter
Dual Mode Operation: Convert to Markdown or generate detailed descriptions
Advanced Figure Extraction: Automatically detect, extract, and process figures from PDFs
Robust Retry Logic: Intelligent retry with fallback models and failure feedback
Async Support: Concurrent processing for improved performance
Clean Architecture: Type-safe, well-documented, and thoroughly tested
Easy Integration: Simple API with comprehensive configuration options

📦 Option 1: Install from PyPI

pip install markthat

Option 2: Development Installation

git clone https://github.com/Flopsky/markthat.git
cd markthat
pip install -e .
pre-commit install

🏃 Quick Start

Basic Usage

from markthat import MarkThat

# Initialize with your preferred model
converter = MarkThat(
    model="gemini-2.0-flash-001",
    provider="gemini",
    api_key="YOUR_API_KEY"
)

# Convert image to markdown
result = converter.convert("path/to/image.jpg")
print(result[0])

# Generate image description
description = converter.convert(
    "path/to/image.jpg", 
    description_mode=True
)
print(description[0])
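Both calls return a list of strings (the API reference further down types `convert` as returning `List[str]`), and the async example suggests one entry per page. A minimal sketch for saving such a result to disk follows; `save_pages` and the page separator are illustrative choices, not part of MarkThat:

```python
from pathlib import Path
from typing import List


def save_pages(pages: List[str], out_path: str, separator: str = "\n\n---\n\n") -> int:
    """Join per-page Markdown strings and write them to one file.

    Hypothetical helper (not part of MarkThat). Returns the number of
    characters written.
    """
    text = separator.join(page.strip() for page in pages)
    Path(out_path).write_text(text, encoding="utf-8")
    return len(text)
```

With a real converter this would be called as `save_pages(converter.convert("path/to/image.jpg"), "output.md")`.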

Updated Examples from `examples/basic_usage.py`

from markthat import MarkThat
from dotenv import load_dotenv
import os
import asyncio

load_dotenv()

def test_markthat_with_figure_extraction():
    """Test MarkThat with advanced figure extraction capabilities."""
    try:
        client = MarkThat(
            provider="gemini",
            model="gemini-2.0-flash-001",
            api_key=os.getenv("GEMINI_API_KEY"),
            api_key_figure_detector=os.getenv("GEMINI_API_KEY"),
            api_key_figure_extractor=os.getenv("GEMINI_API_KEY"),
            api_key_figure_parser=os.getenv("GEMINI_API_KEY"),
        )

        result = asyncio.run(
            client.async_convert(
                "path/to/document.pdf",
                extract_figure=True,
                coordinate_model="gemini-2.0-flash-001",
                parsing_model="gemini-2.5-flash-lite",
            )
        )
        return result
    except Exception as e:
        print("Figure extraction failed:", e)
        return None

def test_markthat_without_figure_extraction():
    """Test standard MarkThat conversion without figure extraction."""
    try:
        client = MarkThat(
            provider="gemini",
            model="gemini-2.0-flash-001",
            api_key=os.getenv("GEMINI_API_KEY"),
        )

        result = asyncio.run(
            client.async_convert(
                "path/to/document.pdf",
                extract_figure=False,
            )
        )
        return result
    except Exception as e:
        print("Standard conversion failed:", e)
        return None

if __name__ == "__main__":
    # Test both approaches
    with_figures = test_markthat_with_figure_extraction()
    without_figures = test_markthat_without_figure_extraction()
    
    print("With figure extraction:", with_figures)
    print("Without figure extraction:", without_figures)

🖥️ Gradio UI (Visual App)

Quickly try MarkThat in your browser.

pip install -r requirements.txt  # ensures gradio is installed
python gradio_ui.py

Then open http://localhost:7861 in your browser.

Supports multiple providers with per-step model overrides
Lets you pass provider-specific API keys (auto-fills from env when available)
Exports results as Markdown or JSON with detected figure paths
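The JSON layout used by the UI is not documented here. As a rough sketch of what "JSON with detected figure paths" could mean, the snippet below assumes figures appear in the Markdown output as standard image links (`![alt](path)`) and collects those paths; the `export_json` name and the `pages`/`figures` schema are assumptions:

```python
import json
import re
from typing import List

# Matches Markdown image syntax: ![alt text](path)
_IMAGE_RE = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")


def export_json(pages: List[str]) -> str:
    """Serialize pages plus any image paths found in them (assumed schema)."""
    payload = {
        "pages": pages,
        "figures": [path for page in pages for path in _IMAGE_RE.findall(page)],
    }
    return json.dumps(payload, indent=2)
```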

🔧 Advanced Configuration

Provider-Specific Setup

from markthat import MarkThat, RetryPolicy

# Custom retry policy
retry_policy = RetryPolicy(
    max_attempts=5,
    timeout_seconds=30,
    backoff_factor=1.5
)

# Multi-provider setup with fallbacks
converter = MarkThat(
    model="gpt-4o",
    provider="openai",
    fallback_models=["claude-3-5-sonnet-20241022", "gemini-2.0-flash-001"],
    retry_policy=retry_policy,
    api_key="YOUR_OPENAI_KEY"
)

OpenRouter Integration

# Access 300+ models through OpenRouter
converter = MarkThat(
    model="anthropic/claude-3.5-sonnet",
    provider="openrouter",
    api_key="YOUR_OPENROUTER_KEY"
)

# Or use model path auto-detection
converter = MarkThat(
    model="openai/gpt-4o",  # Automatically uses OpenRouter
    api_key="YOUR_OPENROUTER_KEY"
)
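The second example relies on the provider being inferred from the model name. The library's actual detection rule is not shown here; a plausible sketch, assuming that a slash in the model name means OpenRouter and that other names are matched by prefix, could look like this (`guess_provider` and the prefix table are hypothetical):

```python
from typing import Optional

# Assumed prefix -> provider mapping, for illustration only.
_PREFIXES = {
    "gpt-": "openai",
    "claude-": "anthropic",
    "gemini-": "gemini",
    "mistral-": "mistral",
}


def guess_provider(model: str, provider: Optional[str] = None) -> str:
    """Return an explicit provider if given, else infer one from the model name."""
    if provider:
        return provider
    if "/" in model:  # "vendor/model" paths follow the OpenRouter naming scheme
        return "openrouter"
    for prefix, name in _PREFIXES.items():
        if model.startswith(prefix):
            return name
    raise ValueError(f"Cannot infer provider for model {model!r}")
```

Passing `provider=` explicitly, as in the first example, sidesteps any guessing.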

🎯 Figure Extraction Pipeline

MarkThat can detect, crop, and embed figures from PDFs. Each pipeline step can use its own API key and model:

import asyncio
from markthat import MarkThat

converter = MarkThat(
    model="gemini-2.0-flash-001",
    api_key_figure_detector="DETECTOR_KEY",
    api_key_figure_extractor="EXTRACTOR_KEY",
    api_key_figure_parser="PARSER_KEY"
)

async def main():
    # `await` is only valid inside a coroutine, so wrap the call
    return await converter.async_convert(
        "research_paper.pdf",
        extract_figure=True,
        figure_detector_model="gemini-2.0-flash",
        coordinate_model="gemini-2.0-flash-001",
        parsing_model="gemini-2.5-flash-lite"
    )

results = asyncio.run(main())

How Figure Extraction Works

Detection: Analyzes document content to identify pages with figures
Coordinate Mapping: Overlays coordinate grids and identifies figure boundaries
Extraction: Crops figures using precise coordinate mapping
Integration: Embeds figure paths into the final markdown output
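The coordinate-mapping and extraction steps can be pictured as converting a grid box reported by the model into a pixel crop box. The sketch below assumes a uniform grid overlaid on the page and an inclusive `(col0, row0, col1, row1)` cell box; the grid size, box format, and `grid_box_to_pixels` are assumptions, not MarkThat's actual implementation:

```python
from typing import Tuple

Box = Tuple[int, int, int, int]


def grid_box_to_pixels(
    grid_box: Box,
    grid_size: Tuple[int, int],
    image_size: Tuple[int, int],
    padding: int = 0,
) -> Box:
    """Map an inclusive grid-cell box to a (left, top, right, bottom) pixel box."""
    col0, row0, col1, row1 = grid_box
    cols, rows = grid_size
    width, height = image_size
    if not (0 <= col0 <= col1 < cols and 0 <= row0 <= row1 < rows):
        raise ValueError(f"grid box {grid_box} outside a {cols}x{rows} grid")
    cell_w, cell_h = width / cols, height / rows
    # Expand by the padding, then clamp to the image bounds.
    left = max(0, int(col0 * cell_w) - padding)
    top = max(0, int(row0 * cell_h) - padding)
    right = min(width, int((col1 + 1) * cell_w) + padding)
    bottom = min(height, int((row1 + 1) * cell_h) + padding)
    return left, top, right, bottom
```

The returned tuple uses the (left, upper, right, lower) order that Pillow's `Image.crop` accepts, so it could be passed to a crop call directly.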

⚡ Async Processing

For optimal performance with multi-page documents:

import asyncio
from markthat import MarkThat

async def process_document():
    converter = MarkThat(model="gemini-2.0-flash-001")
    
    # Process pages concurrently
    results = await converter.async_convert("large_document.pdf")
    
    for i, page_content in enumerate(results):
        print(f"Page {i+1}: {len(page_content)} characters")

asyncio.run(process_document())
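How `async_convert` schedules pages internally is not documented here. The general pattern of processing pages concurrently while capping the number of in-flight requests can be sketched with `asyncio.gather` and a semaphore; `fake_convert_page` is a stand-in for a real model call and `convert_pages` is a hypothetical name:

```python
import asyncio
from typing import Awaitable, Callable, List


async def fake_convert_page(page_number: int) -> str:
    """Stand-in for a per-page LLM request."""
    await asyncio.sleep(0.01)
    return f"# Page {page_number}"


async def convert_pages(
    page_numbers: List[int],
    convert_page: Callable[[int], Awaitable[str]] = fake_convert_page,
    max_concurrency: int = 4,
) -> List[str]:
    """Run convert_page for every page with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(page_number: int) -> str:
        async with semaphore:
            return await convert_page(page_number)

    # gather preserves input order, so results line up with page_numbers.
    return await asyncio.gather(*(bounded(n) for n in page_numbers))
```

Capping concurrency is a common way to stay under provider rate limits while still overlapping network waits.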

🔑 Environment Variables

# Primary providers (used automatically if constructor api_key is not provided)
export OPENAI_API_KEY="your_openai_key"
export ANTHROPIC_API_KEY="your_anthropic_key"
export GEMINI_API_KEY="your_google_key"
export MISTRAL_API_KEY="your_mistral_key"

# Unified access via OpenRouter
export OPENROUTER_API_KEY="your_openrouter_key"

Note: For figure extraction you can pass separate keys via the constructor parameters api_key_figure_detector, api_key_figure_extractor, and api_key_figure_parser. If omitted, they default to the main api_key.
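The lookup order described here can be sketched as follows. The environment variable names come from the list above; `resolve_api_key`, `resolve_figure_key`, and the exact precedence are assumptions about the behavior, not the library's code:

```python
import os
from typing import Mapping, Optional

ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "mistral": "MISTRAL_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}


def resolve_api_key(
    provider: str,
    api_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Prefer an explicit key, then fall back to the provider's environment variable."""
    if api_key:
        return api_key
    env = os.environ if env is None else env
    var_name = ENV_VARS[provider]
    key = env.get(var_name)
    if not key:
        raise RuntimeError(f"No API key given and {var_name} is not set")
    return key


def resolve_figure_key(step_key: Optional[str], main_key: str) -> str:
    """Figure-step keys default to the main key when omitted."""
    return step_key or main_key
```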

🧪 Testing

# Run the test suite
pytest

# Run with coverage
pytest --cov=markthat

# Run a specific test file
pytest tests/test_validation.py

📁 Project Structure

markthat/
├── markthat/
│   ├── __init__.py          # Public API
│   ├── client.py            # Main MarkThat class
│   ├── providers.py         # LLM provider abstractions
│   ├── file_processor.py    # PDF/image loading
│   ├── image_processing.py  # Image manipulation
│   ├── figure_extraction.py # Figure detection & extraction
│   ├── prompts/             # Prompt templates & utilities
│   ├── utils/               # Validation & helpers
│   ├── exceptions.py        # Custom exceptions
│   └── logging_config.py    # Logging setup
├── gradio_ui.py             # Visual demo app
├── tests/                   # Test suite
├── examples/                # Usage examples
├── pyproject.toml           # Project metadata
└── README.md                # This file

🛠️ Development

Code Quality

This project uses modern Python development practices:

Type Hints: Full type annotations with mypy validation
Code Formatting: Black for consistent code style
Linting: Ruff for fast, comprehensive linting
Import Sorting: isort for organized imports
Pre-commit Hooks: Automated quality checks

Contributing

Fork the repository
Create a feature branch: git checkout -b feature-name
Make your changes with proper tests
Run quality checks: pre-commit run --all-files
Submit a pull request

Development Setup

# Install development dependencies
pip install -e ".[dev]"  # quotes keep shells such as zsh from expanding the brackets

# Set up pre-commit hooks
pre-commit install

# Run quality checks
black .
ruff check .
isort .
mypy markthat

📄 API Reference

MarkThat Class

from typing import Any, Dict, List, Optional, Sequence

class MarkThat:
    def __init__(
        self,
        model: str,
        *,
        provider: Optional[str] = None,
        fallback_models: Optional[Sequence[str]] = None,
        retry_policy: Optional[RetryPolicy] = None,
        api_key: Optional[str] = None,
        api_key_figure_detector: Optional[str] = None,
        api_key_figure_extractor: Optional[str] = None,
        api_key_figure_parser: Optional[str] = None,
        max_retry: int = 3,
    ) -> None: ...

    def convert(
        self,
        file_path: str,
        *,
        format_options: Optional[Dict[str, Any]] = None,
        additional_instructions: Optional[str] = None,
        description_mode: bool = False,
        extract_figure: bool = False,
        figure_detector_model: str = "gemini-2.0-flash",
        coordinate_model: str = "gemini-2.0-flash",
        parsing_model: str = "gemini-2.5-flash-lite",
        max_retry: Optional[int] = None,
        clean_output: bool = True,
    ) -> List[str]: ...

    async def async_convert(
        self,
        file_path: str,
        *,
        format_options: Optional[Dict[str, Any]] = None,
        additional_instructions: Optional[str] = None,
        description_mode: bool = False,
        extract_figure: bool = False,
        figure_detector_model: str = "gemini-2.0-flash",
        coordinate_model: str = "gemini-2.0-flash",
        parsing_model: str = "gemini-2.5-flash-lite",
        max_retry: Optional[int] = None,
        clean_output: bool = True,
    ) -> List[str]: ...

RetryPolicy Configuration

from dataclasses import dataclass

@dataclass
class RetryPolicy:
    max_attempts: int = 3
    timeout_seconds: int = 30
    backoff_factor: float = 1.0
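How the library applies these fields is not spelled out here. One common interpretation, shown as a sketch and not as MarkThat's implementation, is to try each model up to `max_attempts` times, wait `backoff_factor ** n` seconds after the n-th failed attempt, and then move on to the next fallback. `run_with_fallbacks`, `backoff_delays`, and the `call` parameter are hypothetical; the dataclass is redeclared locally so the snippet runs without the library, and `timeout_seconds` is left unused:

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class RetryPolicy:  # local copy of the fields listed above
    max_attempts: int = 3
    timeout_seconds: int = 30
    backoff_factor: float = 1.0


def backoff_delays(policy: RetryPolicy) -> List[float]:
    """Delays between consecutive attempts: factor**1, factor**2, ..."""
    return [policy.backoff_factor ** n for n in range(1, policy.max_attempts)]


def run_with_fallbacks(
    call: Callable[[str], str],
    models: Sequence[str],
    policy: RetryPolicy,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each model in order, retrying per the policy; re-raise the last error."""
    last_error: Exception = RuntimeError("no models given")
    for model in models:
        delays = backoff_delays(policy)
        for attempt in range(policy.max_attempts):
            try:
                return call(model)
            except Exception as exc:  # a real client would catch narrower errors
                last_error = exc
                if attempt < len(delays):
                    sleep(delays[attempt])
    raise last_error
```

Under this reading, `RetryPolicy(max_attempts=5, backoff_factor=1.5)` would wait 1.5, 2.25, 3.375, and 5.0625 seconds between its five attempts.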

🏆 Supported Models

Direct Provider Access

OpenAI: gpt-4o, gpt-4-turbo, gpt-4o-mini
Anthropic: claude-3-5-sonnet-20241022, claude-3-opus, claude-3-haiku
Google: gemini-2.0-flash-001, gemini-1.5-pro, gemini-1.5-flash
Mistral: mistral-large-latest, mistral-medium, mistral-small

OpenRouter Models (300+)

Meta: meta-llama/llama-3.2-90b-vision
Qwen: qwen/qwen-2-vl-72b-instruct
Many more: Access the full catalog at OpenRouter

🐛 Error Handling

MarkThat raises dedicated exception types, importable from markthat.exceptions:

from markthat import MarkThat
from markthat.exceptions import ProviderInitializationError, ConversionError

try:
    converter = MarkThat(model="invalid-model")
except ProviderInitializationError as e:
    print(f"Provider setup failed: {e}")
else:
    # Only convert when the client was created; otherwise `converter` is undefined
    try:
        result = converter.convert("image.jpg")
    except ConversionError as e:
        print(f"Conversion failed: {e}")

📊 Performance Tips

Use Async for Multiple Pages: async_convert() processes pages concurrently
Configure Appropriate Timeouts: Balance speed vs. reliability
Choose the Right Model: Faster models for simple tasks, powerful models for complex content
Leverage Fallbacks: Set up model hierarchies for reliability
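For the timeout tip, a per-call deadline can also be enforced on the caller's side with `asyncio.wait_for`, independent of any library-level setting. A sketch with a stand-in coroutine (`slow_call` and `with_timeout` are hypothetical names):

```python
import asyncio
from typing import Awaitable, Optional


async def slow_call(delay: float) -> str:
    """Stand-in for a model request that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return "done"


async def with_timeout(awaitable: Awaitable[str], seconds: float) -> Optional[str]:
    """Return the result, or None if the deadline passes first."""
    try:
        return await asyncio.wait_for(awaitable, timeout=seconds)
    except asyncio.TimeoutError:
        return None
```

With a real client, the awaitable would be a call such as `converter.async_convert("large_document.pdf")`.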

📈 Roadmap

✅ Multi-provider LLM support
✅ PDF processing with figure extraction
✅ Async processing capabilities
✅ Comprehensive retry logic
✅ Type-safe, clean architecture
🔄 Additional file format support (TIFF, WEBP)
🔄 Cost tracking and optimization
🔄 Batch processing API
🔄 Custom prompt template system

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Built with modern Python best practices
Leverages state-of-the-art multimodal LLMs
Inspired by the need for robust document processing tools

💬 Support

Issues: GitHub Issues
Discussions: GitHub Discussions
Documentation: See docs/ for Sphinx sources

MarkThat - Transform visual content into structured text with the power of AI 🚀

Release history

1.2.14 (this version): Aug 8, 2025
1.2.13: Aug 8, 2025
1.2.11: Aug 7, 2025
1.2.10: Aug 6, 2025
1.2.9: Aug 4, 2025
1.2.6: Aug 4, 2025
1.2.3: Aug 4, 2025
1.2.1: Aug 4, 2025
0.1.0: Aug 4, 2025

Download files

Source Distribution

markthat-1.2.14.tar.gz (30.4 kB)

Uploaded Aug 8, 2025 Source

Built Distribution

markthat-1.2.14-py3-none-any.whl (32.4 kB)

Uploaded Aug 8, 2025 Python 3

File details

Details for the file markthat-1.2.14.tar.gz.

File metadata

Download URL: markthat-1.2.14.tar.gz
Upload date: Aug 8, 2025
Size: 30.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for markthat-1.2.14.tar.gz
Algorithm	Hash digest
SHA256	`c8f4de3faff18bb576a0500d4f17bc30c79f098d269bc205428aac58abb177a2`
MD5	`ca76c878b021a4da56eed1970182053e`
BLAKE2b-256	`0f8cb41725e811e3934c29ad0b7a6ddc679ff154ffe194f51d99f31695e54033`

Provenance

The following attestation bundles were made for markthat-1.2.14.tar.gz:

Publisher: release.yml on Flopsky/markthat

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: markthat-1.2.14.tar.gz
- Subject digest: c8f4de3faff18bb576a0500d4f17bc30c79f098d269bc205428aac58abb177a2
- Sigstore transparency entry: 369776005
- Sigstore integration time: Aug 8, 2025
Source repository:
- Permalink: Flopsky/markthat@c0fd87e9b74ba5bd0253f7dc25d75012f9b00e15
- Branch / Tag: refs/tags/v1.2.14
- Owner: https://github.com/Flopsky
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@c0fd87e9b74ba5bd0253f7dc25d75012f9b00e15
- Trigger Event: release

File details

Details for the file markthat-1.2.14-py3-none-any.whl.

File metadata

Download URL: markthat-1.2.14-py3-none-any.whl
Upload date: Aug 8, 2025
Size: 32.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for markthat-1.2.14-py3-none-any.whl
Algorithm	Hash digest
SHA256	`92acaa0adbe1568791d6a05c19def5ead95be5814f01f79be8fb56cae80d5dc8`
MD5	`6debd66c58e93e12cfc21dd4a9752ea0`
BLAKE2b-256	`67b5b3257905f0e460ff70c5b937fed1514ba592052663d5da01771fcaf7fde8`

Provenance

The following attestation bundles were made for markthat-1.2.14-py3-none-any.whl:

Publisher: release.yml on Flopsky/markthat

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: markthat-1.2.14-py3-none-any.whl
- Subject digest: 92acaa0adbe1568791d6a05c19def5ead95be5814f01f79be8fb56cae80d5dc8
- Sigstore transparency entry: 369776048
- Sigstore integration time: Aug 8, 2025
Source repository:
- Permalink: Flopsky/markthat@c0fd87e9b74ba5bd0253f7dc25d75012f9b00e15
- Branch / Tag: refs/tags/v1.2.14
- Owner: https://github.com/Flopsky
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@c0fd87e9b74ba5bd0253f7dc25d75012f9b00e15
- Trigger Event: release
