A Python module to extract content from PDF documents using Vision Language Models (VLMs)

AI Vision Capture

AI Vision Capture is a Python library for extracting and analyzing content from PDF documents using Vision Language Models (VLMs). It supports multiple VLM providers, including OpenAI, Anthropic Claude, Google Gemini, and Azure OpenAI.

Features

  • 🔍 Multi-Provider Support: Compatible with major VLM providers (OpenAI, Claude, Gemini, Azure) and open-source models
  • 📄 PDF Processing: Efficient PDF to image conversion with configurable DPI
  • 🚀 Async Processing: Asynchronous processing with configurable concurrency
  • 💾 Two-Layer Caching: Local file system and cloud caching for improved performance
  • 🔄 Batch Processing: Process multiple PDFs in parallel
  • 📝 Text Extraction: Enhanced accuracy through combined OCR and VLM processing
  • 🎨 Image Quality Control: Configurable image quality settings
  • 📊 Structured Output: Well-organized JSON and Markdown output
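
The "configurable concurrency" feature can be pictured with a generic bounded-concurrency pattern. The sketch below is not the library's implementation. It is a minimal illustration using only the standard library, and `process_all`, `fake_page_call`, and `max_concurrent` are hypothetical names chosen for this example.

```python
import asyncio


async def process_all(items, worker, max_concurrent=4):
    """Run worker(item) for every item with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(item):
        # Each task waits here until one of the max_concurrent slots is free.
        async with semaphore:
            return await worker(item)

    # gather returns results in the same order as the input items.
    return await asyncio.gather(*(guarded(item) for item in items))


async def fake_page_call(page_number):
    # Hypothetical stand-in for a per-page VLM request.
    await asyncio.sleep(0.01)
    return f"page {page_number} done"


if __name__ == "__main__":
    print(asyncio.run(process_all([1, 2, 3], fake_page_call, max_concurrent=2)))
```

Lowering `max_concurrent` trades throughput for fewer simultaneous requests, which is usually what matters when a provider enforces rate limits.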

Quick Start

Install the library:

pip install aicapture

Then import the parser and process a document:

from vision_capture import VisionParser

# Initialize parser
parser = VisionParser()

# Process a single PDF
result = parser.process_pdf("path/to/your/document.pdf")

# Process a folder of PDFs asynchronously
async def process_folder():
    results = await parser.process_folder_async("path/to/folder")
    return results
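
The snippet above defines `process_folder` but does not run it. A coroutine only executes once an event loop drives it, for example through `asyncio.run`. The sketch below shows that step in isolation. `StubParser` and `run_folder` are hypothetical names, and the stub replaces `VisionParser` only so the example runs without API keys; its return value says nothing about what the real method returns.

```python
import asyncio


class StubParser:
    """Hypothetical stand-in for VisionParser, used only to keep this sketch runnable."""

    async def process_folder_async(self, folder):
        await asyncio.sleep(0)  # yield to the event loop, as a network call would
        return [{"folder": folder, "status": "ok"}]


async def process_folder(parser, folder):
    # Same shape as the coroutine above, with the parser passed in explicitly.
    return await parser.process_folder_async(folder)


def run_folder(parser, folder):
    # asyncio.run starts an event loop, runs the coroutine to completion,
    # and closes the loop afterwards.
    return asyncio.run(process_folder(parser, folder))


if __name__ == "__main__":
    print(run_folder(StubParser(), "path/to/folder"))
```

Call `asyncio.run` once from synchronous code, such as a script entry point. Inside code that is already running in an event loop, `await` the coroutine instead.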

Configuration

The library is configured through environment variables that can be set in your shell or via a .env file.

  1. Copy .env.template to .env
  2. Choose ONE vision provider:
     USE_VISION=claude  # Options: openai, claude, gemini, azure-openai
  3. Configure your chosen provider's API key and settings
  4. Adjust common settings if needed (DPI, concurrency, etc.)

See .env.template for detailed configuration options and examples.
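
Because the provider is selected through an environment variable, a typo in `USE_VISION` is easy to miss. The sketch below is an optional pre-flight check, not part of the library. It assumes only the four provider names listed in the comment above, and `resolve_provider` is a hypothetical helper. How the library itself treats a missing or unknown value is not stated here, so this check simply raises.

```python
import os

# Provider names copied from the USE_VISION options listed above.
SUPPORTED_PROVIDERS = ("openai", "claude", "gemini", "azure-openai")


def resolve_provider(env=None):
    """Return the validated provider name from USE_VISION, or raise ValueError."""
    env = os.environ if env is None else env
    value = env.get("USE_VISION", "").strip().lower()
    if value not in SUPPORTED_PROVIDERS:
        raise ValueError(
            f"USE_VISION must be one of {', '.join(SUPPORTED_PROVIDERS)}; got {value!r}"
        )
    return value


if __name__ == "__main__":
    # Pass a plain dict to check a value without touching the real environment.
    print(resolve_provider({"USE_VISION": "claude"}))
```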

Output Format

The library produces structured output in both JSON and Markdown formats. The JSON output has the following shape:

{
  "file_object": {
    "file_name": "example.pdf",
    "file_hash": "sha256_hash",
    "total_pages": 10,
    "total_words": 5000,
    "pages": [
      {
        "page_number": 1,
        "page_content": "extracted content",
        "page_hash": "sha256_hash"
      }
    ]
  }
}
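
To illustrate how this structure might be consumed, the sketch below loads a small hand-written sample with the same keys and joins the page text in page order. The sample values and the `summarize` helper are made up for this example. Only the key names come from the format shown above.

```python
import json

# Hand-written sample that mirrors the key names of the format above.
SAMPLE = """
{
  "file_object": {
    "file_name": "example.pdf",
    "file_hash": "sha256_hash",
    "total_pages": 2,
    "total_words": 4,
    "pages": [
      {"page_number": 1, "page_content": "first page", "page_hash": "h1"},
      {"page_number": 2, "page_content": "second page", "page_hash": "h2"}
    ]
  }
}
"""


def summarize(result):
    """Collect the file name, a completeness flag, and the full text in page order."""
    file_obj = result["file_object"]
    pages = sorted(file_obj["pages"], key=lambda page: page["page_number"])
    return {
        "file_name": file_obj["file_name"],
        "pages_present": len(pages),
        # True when every declared page is present in the output.
        "complete": len(pages) == file_obj["total_pages"],
        "text": "\n\n".join(page["page_content"] for page in pages),
    }


if __name__ == "__main__":
    print(summarize(json.loads(SAMPLE)))
```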

Advanced Usage

from vision_capture import VisionParser, GeminiVisionModel

# Configure Gemini vision model with custom settings
vision_model = GeminiVisionModel(
    model="gemini-2.0-flash",
    api_key="your_gemini_api_key"
)

# Initialize parser with custom configuration
parser = VisionParser(
    vision_model=vision_model,
    dpi=400,
    image_quality="high",
    prompt="""
    Please analyze this technical document and extract:
    1. Equipment specifications and model numbers
    2. Operating parameters and limits
    3. Maintenance requirements
    4. Safety protocols
    5. Quality control metrics
    """
)

# Process PDF with custom settings
result = parser.process_pdf(
    pdf_path="path/to/document.pdf",
    cache_enabled=True
)
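
The `cache_enabled` flag and the `file_hash` field in the output suggest that results can be keyed by file content. The library's actual cache layout is not described here, so the following is only a generic sketch of a content-hash cache on the local file system. `file_sha256`, `cached_process`, and the one-JSON-file-per-hash layout are assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large PDFs are never loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cached_process(pdf_path, cache_dir, process):
    """Return the cached result for pdf_path, calling process(pdf_path) only on a miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Assumed layout: one JSON file per content hash.
    cache_file = cache_dir / f"{file_sha256(pdf_path)}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    result = process(pdf_path)
    cache_file.write_text(json.dumps(result), encoding="utf-8")
    return result
```

Keying on the content hash rather than the file name means a renamed copy of the same PDF is a cache hit, while an edited file with the same name is a miss.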

Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'feat: add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

For detailed guidelines, see our Contributing Guide.

License

Copyright 2024 Aitomatic, Inc.

Licensed under the Apache License, Version 2.0. See LICENSE for details.
