MarkThat

License: MIT | Python 3.10+ | Code style: black

A Python library for converting images and PDFs to Markdown or generating rich image descriptions using state-of-the-art multimodal LLMs.

🚀 Features

  • Multiple Provider Support: OpenAI, Anthropic, Google Gemini, Mistral, and OpenRouter
  • Dual Mode Operation: Convert to Markdown or generate detailed descriptions
  • Advanced Figure Extraction: Automatically detect, extract, and process figures from PDFs
  • Robust Retry Logic: Intelligent retry with fallback models and failure feedback
  • Async Support: Concurrent processing for improved performance
  • Architecture: Type-safe, well-documented, and thoroughly tested
  • Easy Integration: Simple API with comprehensive configuration options

📦 Installation

Option 1: Install from PyPI

pip install markthat

Option 2: Development Installation

git clone https://github.com/Flopsky/markthat.git
cd markthat
pip install -e .
pre-commit install

๐Ÿƒ Quick Start

Basic Usage

from markthat import MarkThat

# Initialize with your preferred model
converter = MarkThat(
    model="gemini-2.0-flash-001",
    provider="google",
    api_key="YOUR_API_KEY"
)

# Convert image to markdown
result = converter.convert("path/to/image.jpg")
print(result[0])

# Generate image description
description = converter.convert(
    "path/to/image.jpg", 
    description_mode=True
)
print(description[0])
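
Both calls return a list of strings (the API reference further down gives the return type as `List[str]`). A minimal sketch for writing that list to a single Markdown file follows. `save_markdown` and its page separator are illustrative helpers, not part of MarkThat:

```python
from pathlib import Path
from typing import Sequence


def save_markdown(
    pages: Sequence[str],
    out_path: str,
    separator: str = "\n\n---\n\n",
) -> Path:
    """Join per-page Markdown strings and write them to one file.

    Hypothetical helper: MarkThat returns the list; saving it is up to you.
    """
    path = Path(out_path)
    # Create the target directory if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    text = separator.join(page.strip() for page in pages) + "\n"
    path.write_text(text, encoding="utf-8")
    return path
```

With the converter above, this would be called as `save_markdown(converter.convert("path/to/image.jpg"), "output/image.md")`.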

Updated Examples from examples/basic_usage.py

from markthat import MarkThat
from dotenv import load_dotenv
import os
import asyncio

load_dotenv()

def test_markthat_with_figure_extraction():
    """Test MarkThat with advanced figure extraction capabilities."""
    try:
        client = MarkThat(
            provider="gemini",
            model="gemini-2.0-flash-001",
            api_key=os.getenv("GEMINI_API_KEY"),
            api_key_figure_detector=os.getenv("GEMINI_API_KEY"),
            api_key_figure_extractor=os.getenv("GEMINI_API_KEY"),
            api_key_figure_parser=os.getenv("GEMINI_API_KEY"),
        )

        result = asyncio.run(
            client.async_convert(
                "path/to/document.pdf",
                extract_figure=True,
                coordinate_model="gemini-2.0-flash-001",
                parsing_model="gemini-2.5-flash-lite",
            )
        )
        return result
    except Exception as e:
        print("Figure extraction failed:", e)
        return None

def test_markthat_without_figure_extraction():
    """Test standard MarkThat conversion without figure extraction."""
    try:
        client = MarkThat(
            provider="gemini",
            model="gemini-2.0-flash-001",
            api_key=os.getenv("GEMINI_API_KEY"),
        )

        result = asyncio.run(
            client.async_convert(
                "path/to/document.pdf",
                extract_figure=False,
            )
        )
        return result
    except Exception as e:
        print("Standard conversion failed:", e)
        return None

if __name__ == "__main__":
    # Test both approaches
    with_figures = test_markthat_with_figure_extraction()
    without_figures = test_markthat_without_figure_extraction()
    
    print("With figure extraction:", with_figures)
    print("Without figure extraction:", without_figures)

🔧 Advanced Configuration

Provider-Specific Setup

from markthat import MarkThat, RetryPolicy

# Custom retry policy
retry_policy = RetryPolicy(
    max_attempts=5,
    timeout_seconds=30,
    backoff_factor=1.5
)

# Multi-provider setup with fallbacks
converter = MarkThat(
    model="gpt-4o",
    provider="openai",
    fallback_models=["claude-3-5-sonnet-20241022", "gemini-2.0-flash-001"],
    retry_policy=retry_policy,
    api_key="YOUR_OPENAI_KEY"
)
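
To make the idea of fallbacks concrete, here is a generic retry-with-fallback loop. It is a sketch of the general pattern only and assumes nothing about MarkThat's internals; `run_with_fallbacks` and the `call` callback are hypothetical names:

```python
from typing import Callable, Sequence


def run_with_fallbacks(
    call: Callable[[str], str],
    models: Sequence[str],
    max_attempts: int = 3,
) -> str:
    """Try each model in order, up to `max_attempts` times each.

    Illustrative only: not MarkThat's actual retry implementation.
    """
    errors: list[str] = []
    for model in models:
        for attempt in range(1, max_attempts + 1):
            try:
                return call(model)
            except Exception as exc:  # a real client would catch narrower errors
                errors.append(f"{model} attempt {attempt}: {exc}")
    # Every model exhausted its attempts: surface the collected failures.
    raise RuntimeError("all models failed:\n" + "\n".join(errors))
```

The primary model goes first in `models`, followed by the fallbacks in order of preference.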

OpenRouter Integration

# Access 300+ models through OpenRouter
converter = MarkThat(
    model="anthropic/claude-3.5-sonnet",
    provider="openrouter",
    api_key="YOUR_OPENROUTER_KEY"
)

# Or use model path auto-detection
converter = MarkThat(
    model="openai/gpt-4o",  # Automatically uses OpenRouter
    api_key="YOUR_OPENROUTER_KEY"
)
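
The auto-detection rule is not spelled out here, so the following is only a guess at how such a rule could work: a `vendor/model` path selects OpenRouter, and a bare model name is matched by prefix. `infer_provider` and the prefix table are hypothetical, and the provider strings are assumptions:

```python
from typing import Optional

# Hypothetical prefix table, for illustration only.
_PREFIX_TO_PROVIDER = {
    "gpt-": "openai",
    "claude-": "anthropic",
    "gemini-": "google",
    "mistral-": "mistral",
}


def infer_provider(model: str, provider: Optional[str] = None) -> str:
    """Guess a provider name from a model identifier (assumed rules)."""
    if provider:
        # An explicit provider always wins.
        return provider
    if "/" in model:
        # "vendor/model" paths are the OpenRouter naming style.
        return "openrouter"
    for prefix, name in _PREFIX_TO_PROVIDER.items():
        if model.startswith(prefix):
            return name
    raise ValueError(f"cannot infer a provider for model {model!r}")
```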

🎯 Figure Extraction Pipeline

MarkThat includes a sophisticated figure extraction system for PDFs:

import asyncio
from markthat import MarkThat

converter = MarkThat(
    model="gemini-2.0-flash-001",
    api_key_figure_detector="DETECTOR_KEY",
    api_key_figure_extractor="EXTRACTOR_KEY",
    api_key_figure_parser="PARSER_KEY"
)

async def main():
    # `await` is only valid inside a coroutine, so wrap the call
    return await converter.async_convert(
        "research_paper.pdf",
        extract_figure=True,
        figure_detector_model="gemini-2.0-flash",
        coordinate_model="gemini-2.0-flash-001",
        parsing_model="gemini-2.5-flash-lite"
    )

results = asyncio.run(main())

How Figure Extraction Works

  1. Detection: Analyzes document content to identify pages with figures
  2. Coordinate Mapping: Overlays coordinate grids and identifies figure boundaries
  3. Extraction: Crops figures using precise coordinate mapping
  4. Integration: Embeds figure paths into the final markdown output
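
Steps 2 and 3 can be sketched as a coordinate conversion. The sketch assumes the model reports each figure as a normalized `(x0, y0, x1, y1)` box with values between 0 and 1, which is an assumption and not the documented MarkThat format; `to_pixel_box` is a hypothetical helper:

```python
import math
from typing import Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def to_pixel_box(
    norm_box: Tuple[float, float, float, float],
    width: int,
    height: int,
    padding: int = 0,
) -> Box:
    """Convert a normalized box to a pixel crop box clamped to the page."""
    x0, y0, x1, y1 = norm_box
    if not (0.0 <= x0 < x1 <= 1.0 and 0.0 <= y0 < y1 <= 1.0):
        raise ValueError(f"invalid normalized box: {norm_box}")
    # Round outward so the crop never cuts into the figure, then clamp.
    left = max(0, math.floor(x0 * width) - padding)
    top = max(0, math.floor(y0 * height) - padding)
    right = min(width, math.ceil(x1 * width) + padding)
    bottom = min(height, math.ceil(y1 * height) + padding)
    return left, top, right, bottom
```

The returned tuple uses the same (left, upper, right, lower) ordering that Pillow's `Image.crop` expects, so it could be passed straight to a crop call on the rendered page image.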

⚡ Async Processing

For optimal performance with multi-page documents:

import asyncio
from markthat import MarkThat

async def process_document():
    converter = MarkThat(model="gemini-2.0-flash-001")
    
    # Process pages concurrently
    results = await converter.async_convert("large_document.pdf")
    
    for i, page_content in enumerate(results):
        print(f"Page {i+1}: {len(page_content)} characters")

asyncio.run(process_document())
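
The example above handles one document. For several documents, one possible approach is to run the conversions with `asyncio.gather` and cap the number in flight with a semaphore, which helps stay within provider rate limits. `convert_many` is a hypothetical helper that accepts any coroutine function, so it does not depend on MarkThat itself:

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


async def convert_many(
    convert: Callable[[str], Awaitable[List[str]]],
    paths: Sequence[str],
    max_concurrent: int = 3,
) -> List[List[str]]:
    """Run `convert` over many paths, at most `max_concurrent` at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(path: str) -> List[str]:
        # The semaphore blocks here once the concurrency limit is reached.
        async with semaphore:
            return await convert(path)

    # gather() returns results in the same order as `paths`.
    return await asyncio.gather(*(bounded(p) for p in paths))
```

Assuming the `converter` from the example above, a call would look like `asyncio.run(convert_many(converter.async_convert, ["a.pdf", "b.pdf"]))`.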

🔑 Environment Variables

# Primary providers
export OPENAI_API_KEY="your_openai_key"
export ANTHROPIC_API_KEY="your_anthropic_key" 
export GEMINI_API_KEY="your_google_key"
export MISTRAL_API_KEY="your_mistral_key"

# Unified access
export OPENROUTER_API_KEY="your_openrouter_key"

# Figure extraction (can use different keys for different models)
export FIGURE_DETECTOR_KEY="detector_api_key"
export FIGURE_EXTRACTOR_KEY="extractor_api_key"
export FIGURE_PARSER_KEY="parser_api_key"
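
Whether MarkThat reads the figure-extraction variables on its own is not stated here, so the sketch below passes them explicitly. It maps the three variables onto the `api_key_figure_*` constructor arguments used in the examples and falls back to a shared key when one is unset. `figure_key_kwargs` and the fallback behaviour are assumptions for illustration:

```python
import os
from typing import Dict, Mapping, Optional

# Environment variable -> MarkThat constructor keyword (names from this README).
_ENV_TO_KWARG = {
    "FIGURE_DETECTOR_KEY": "api_key_figure_detector",
    "FIGURE_EXTRACTOR_KEY": "api_key_figure_extractor",
    "FIGURE_PARSER_KEY": "api_key_figure_parser",
}


def figure_key_kwargs(
    env: Optional[Mapping[str, str]] = None,
    fallback_var: str = "GEMINI_API_KEY",
) -> Dict[str, str]:
    """Collect figure-extraction keys, falling back to one shared key."""
    env = os.environ if env is None else env
    fallback = env.get(fallback_var)
    kwargs: Dict[str, str] = {}
    for var, kwarg in _ENV_TO_KWARG.items():
        value = env.get(var) or fallback
        if value:  # skip keys that are missing everywhere
            kwargs[kwarg] = value
    return kwargs
```

The result can be unpacked into the constructor, for example `MarkThat(model="gemini-2.0-flash-001", **figure_key_kwargs())`.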

🧪 Testing

# Run the test suite
pytest

# Run with coverage
pytest --cov=markthat

# Run specific test categories
pytest tests/test_validation.py
pytest tests/test_providers.py

📁 Project Structure

markthat/
├── markthat/
│   ├── __init__.py          # Public API
│   ├── client.py            # Main MarkThat class
│   ├── providers.py         # LLM provider abstractions
│   ├── file_processor.py    # PDF/image loading
│   ├── image_processing.py  # Image manipulation
│   ├── figure_extraction.py # Figure detection & extraction
│   ├── prompts/             # Prompt templates & utilities
│   ├── utils/               # Validation & helpers
│   ├── exceptions.py        # Custom exceptions
│   └── logging_config.py    # Logging setup
├── tests/                   # Test suite
├── examples/                # Usage examples
├── pyproject.toml           # Project metadata
└── README.md                # This file

🛠️ Development

Code Quality

This project uses modern Python development practices:

  • Type Hints: Full type annotations with mypy validation
  • Code Formatting: Black for consistent code style
  • Linting: Ruff for fast, comprehensive linting
  • Import Sorting: isort for organized imports
  • Pre-commit Hooks: Automated quality checks

Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature-name
  3. Make your changes with proper tests
  4. Run quality checks: pre-commit run --all-files
  5. Submit a pull request

Development Setup

# Install development dependencies
pip install -e ".[dev]"  # quoted so shells such as zsh do not expand the brackets

# Set up pre-commit hooks
pre-commit install

# Run quality checks
black .
ruff check .
isort .
mypy markthat

📄 API Reference

MarkThat Class

from typing import Any, Dict, List, Optional, Sequence

class MarkThat:
    def __init__(
        self,
        *,
        model: str,
        provider: Optional[str] = None,
        fallback_models: Optional[Sequence[str]] = None,
        retry_policy: Optional[RetryPolicy] = None,
        api_key: Optional[str] = None,
    ) -> None: ...

    def convert(
        self,
        file_path: str,
        *,
        format_options: Optional[Dict[str, Any]] = None,
        additional_instructions: Optional[str] = None,
        description_mode: bool = False,
    ) -> List[str]: ...

    async def async_convert(
        self,
        file_path: str,
        *,
        format_options: Optional[Dict[str, Any]] = None,
        additional_instructions: Optional[str] = None,
        description_mode: bool = False,
    ) -> List[str]: ...

RetryPolicy Configuration

from dataclasses import dataclass

@dataclass
class RetryPolicy:
    max_attempts: int = 3
    timeout_seconds: int = 30
    backoff_factor: float = 1.0
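
How `backoff_factor` is applied is not documented here. A common reading is exponential backoff, where each wait is the previous one multiplied by the factor. The sketch below shows that interpretation only; `PolicySketch` is a local stand-in with the same fields, and `base_delay` is an assumed parameter that MarkThat may not have:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PolicySketch:
    """Local stand-in mirroring the RetryPolicy fields listed above."""
    max_attempts: int = 3
    timeout_seconds: int = 30
    backoff_factor: float = 1.0


def backoff_delays(policy: PolicySketch, base_delay: float = 1.0) -> List[float]:
    """Waits (in seconds) between attempts under exponential backoff.

    There is one wait fewer than there are attempts, because nothing
    follows the final attempt.
    """
    return [
        base_delay * policy.backoff_factor ** i
        for i in range(policy.max_attempts - 1)
    ]
```

Under this reading, the default factor of 1.0 gives a constant delay, and the `max_attempts=5, backoff_factor=1.5` policy from the configuration example would wait 1.0, 1.5, 2.25, and 3.375 seconds.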

๐Ÿ† Supported Models

Direct Provider Access

  • OpenAI: gpt-4o, gpt-4-turbo, gpt-4o-mini
  • Anthropic: claude-3-5-sonnet-20241022, claude-3-opus, claude-3-haiku
  • Google: gemini-2.0-flash-001, gemini-1.5-pro, gemini-1.5-flash
  • Mistral: mistral-large-latest, mistral-medium, mistral-small

OpenRouter Models (300+)

  • Meta: meta-llama/llama-3.2-90b-vision
  • Qwen: qwen/qwen-2-vl-72b-instruct
  • Many more: Access the full catalog at OpenRouter

๐Ÿ› Error Handling

MarkThat provides comprehensive error handling:

from markthat import MarkThat
from markthat.exceptions import ProviderInitializationError, ConversionError

try:
    converter = MarkThat(model="invalid-model")
except ProviderInitializationError as e:
    print(f"Provider setup failed: {e}")
else:
    # Only attempt a conversion once the client exists;
    # otherwise `converter` would be undefined here.
    try:
        result = converter.convert("image.jpg")
    except ConversionError as e:
        print(f"Conversion failed: {e}")

📊 Performance Tips

  1. Use Async for Multiple Pages: async_convert() processes pages concurrently
  2. Configure Appropriate Timeouts: Balance speed vs. reliability
  3. Choose the Right Model: Faster models for simple tasks, powerful models for complex content
  4. Leverage Fallbacks: Set up model hierarchies for reliability
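
For tip 2, a caller-side timeout can be layered on top of whatever the library does internally. This is a minimal sketch using the standard library's `asyncio.wait_for`; `with_timeout` is a hypothetical helper, and returning `None` on timeout is one design choice among several:

```python
import asyncio
from typing import Awaitable, Optional, TypeVar

T = TypeVar("T")


async def with_timeout(awaitable: Awaitable[T], seconds: float) -> Optional[T]:
    """Await `awaitable`, giving up and returning None after `seconds`."""
    try:
        return await asyncio.wait_for(awaitable, timeout=seconds)
    except asyncio.TimeoutError:
        # wait_for cancels the underlying task before raising.
        return None
```

A call such as `await with_timeout(converter.async_convert("large_document.pdf"), 120)` would then yield either the page list or `None`, which the caller can treat as a signal to retry or switch models.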

📈 Roadmap

  • ✅ Multi-provider LLM support
  • ✅ PDF processing with figure extraction
  • ✅ Async processing capabilities
  • ✅ Comprehensive retry logic
  • ✅ Type-safe architecture
  • 🔄 Additional file format support (TIFF, WEBP)
  • 🔄 Cost tracking and optimization
  • 🔄 Batch processing API
  • 🔄 Custom prompt template system

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Built with modern Python best practices
  • Leverages state-of-the-art multimodal LLMs
  • Inspired by the need for robust document processing tools

💬 Support


MarkThat - Transform visual content into structured text with the power of AI 🚀
