
A Python module to extract content from PDF documents using Vision Language Models (VLMs)


AI Vision Capture

A Python library for extracting and analyzing content from PDF documents using Vision Language Models (VLMs). It supports multiple VLM providers, including OpenAI, Anthropic Claude, Google Gemini, and Azure OpenAI.

Features

  • 🔍 Multi-Provider Support: Compatible with major VLM providers (OpenAI, Claude, Gemini, Azure, and open-source models)
  • 📄 PDF Processing: Efficient PDF to image conversion with configurable DPI
  • 🚀 Async Processing: Asynchronous processing with configurable concurrency
  • 💾 Two-Layer Caching: Local file system and cloud caching for improved performance
  • 🔄 Batch Processing: Process multiple PDFs in parallel
  • 📝 Text Extraction: Enhanced accuracy through combined OCR and VLM processing
  • 🎨 Image Quality Control: Configurable image quality settings
  • 📊 Structured Output: Well-organized JSON and Markdown output
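To make the two-layer caching idea concrete, here is a minimal sketch of a lookup that checks a local layer first and falls back to a remote one. It is not the library's implementation: `TwoLayerCache`, `content_key`, and `cached_extract` are hypothetical names, and plain dictionaries stand in for the file system and the cloud store.

```python
import hashlib
from typing import Callable, Dict, Optional


def content_key(data: bytes) -> str:
    """Return a SHA-256 hex digest to use as a cache key."""
    return hashlib.sha256(data).hexdigest()


class TwoLayerCache:
    """Hypothetical cache: a fast local layer in front of a slower remote layer."""

    def __init__(self) -> None:
        # Plain dicts stand in for the file system and the cloud store.
        self.local: Dict[str, str] = {}
        self.remote: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        if key in self.local:
            return self.local[key]
        if key in self.remote:
            # Promote remote hits so the next lookup stays local.
            self.local[key] = self.remote[key]
            return self.local[key]
        return None

    def put(self, key: str, value: str) -> None:
        # Write through to both layers.
        self.local[key] = value
        self.remote[key] = value


def cached_extract(
    data: bytes, cache: TwoLayerCache, extract: Callable[[bytes], str]
) -> str:
    """Call `extract` only when neither layer already holds the result."""
    key = content_key(data)
    hit = cache.get(key)
    if hit is not None:
        return hit
    result = extract(data)
    cache.put(key, result)
    return result
```

Keying on a content hash rather than a file name means a renamed but unchanged document still hits the cache.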

Coming Soon

  • 🔗 Cross-Document Knowledge Capture: Capture structured knowledge across multiple documents
  • 🎥 Video Knowledge Capture: Capture structured knowledge from video

Quick Start

Installation

pip install aicapture

Basic Setup

  1. Set your chosen provider and API key (example using OpenAI):
export USE_VISION=openai
export OPENAI_API_KEY=your_openai_key
  2. Use in your code:
import asyncio

from vision_capture import VisionParser

# Initialize parser
parser = VisionParser()

# Process a single PDF synchronously
result = parser.process_pdf("path/to/your/document.pdf")

# Process every PDF in a folder asynchronously
async def process_folder():
    results = await parser.process_folder_async("path/to/folder")
    return results

# Run the coroutine from synchronous code
results = asyncio.run(process_folder())

For detailed configuration options and examples, see the Configuration and Advanced Usage sections below.

Configuration

Production Environment

In production, configure the library using environment variables in your shell or deployment environment.

Common settings you may want to adjust:

# Optional performance settings
export MAX_CONCURRENT_TASKS=5      # Number of concurrent processing tasks
export VISION_PARSER_DPI=333      # Image DPI for PDF processing
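As an illustration of what a setting like MAX_CONCURRENT_TASKS typically controls, the sketch below caps the number of in-flight tasks with an asyncio.Semaphore. This is an assumption about the general technique, not the library's internals: `process_page` and `process_all` are hypothetical, and the fallback of 5 simply mirrors the example value above.

```python
import asyncio
import os
from typing import List


async def process_page(page_number: int, limit: asyncio.Semaphore) -> str:
    """Stand-in for a per-page VLM call; the semaphore caps parallelism."""
    async with limit:
        await asyncio.sleep(0)  # placeholder for real network I/O
        return f"page {page_number} done"


async def process_all(page_count: int) -> List[str]:
    # Read the limit from the environment; 5 mirrors the example value above.
    max_tasks = int(os.environ.get("MAX_CONCURRENT_TASKS", "5"))
    limit = asyncio.Semaphore(max_tasks)
    tasks = [process_page(n, limit) for n in range(1, page_count + 1)]
    # gather returns results in the same order as the input tasks.
    return list(await asyncio.gather(*tasks))
```

Calling `asyncio.run(process_all(3))` runs three placeholder tasks with at most MAX_CONCURRENT_TASKS active at once.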

Development Environment

For local development:

  1. Clone the repository
  2. Copy .env.template to .env
  3. Edit .env with your settings
  4. Install development dependencies: pip install -e ".[dev]"

See .env.template for all available configuration options.

Output Format

The library produces structured output in both JSON and Markdown formats. The JSON output has the following structure:

{
  "file_object": {
    "file_name": "example.pdf",
    "file_hash": "sha256_hash",
    "total_pages": 10,
    "total_words": 5000,
    "pages": [
      {
        "page_number": 1,
        "page_content": "extracted content",
        "page_hash": "sha256_hash"
      }
    ]
  }
}
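A short sketch of how JSON shaped like the example above could be consumed with the standard library. The `page_texts` helper is hypothetical, and the sample string just repeats the example's fields rather than real library output.

```python
import json
from typing import Dict

# Sample document copied from the structure shown above.
sample = """
{
  "file_object": {
    "file_name": "example.pdf",
    "file_hash": "sha256_hash",
    "total_pages": 10,
    "total_words": 5000,
    "pages": [
      {
        "page_number": 1,
        "page_content": "extracted content",
        "page_hash": "sha256_hash"
      }
    ]
  }
}
"""


def page_texts(raw: str) -> Dict[int, str]:
    """Map page_number to page_content for JSON shaped like the sample."""
    file_object = json.loads(raw)["file_object"]
    return {
        page["page_number"]: page["page_content"]
        for page in file_object["pages"]
    }


texts = page_texts(sample)
```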

Advanced Usage

from vision_capture import VisionParser, GeminiVisionModel

# Configure Gemini vision model with custom settings
vision_model = GeminiVisionModel(
    model="gemini-2.0-flash",
    api_key="your_gemini_api_key"
)

# Initialize parser with custom configuration
parser = VisionParser(
    vision_model=vision_model,
    dpi=400,
    prompt="""
    Please analyze this technical document and extract:
    1. Equipment specifications and model numbers
    2. Operating parameters and limits
    3. Maintenance requirements
    4. Safety protocols
    5. Quality control metrics
    """
)

# Process PDF with custom settings
result = parser.process_pdf(
    pdf_path="path/to/document.pdf",
)
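To get a feel for what a dpi value such as 400 means for image size, the sketch below applies the usual relation pixels = inches × DPI. It assumes a US Letter page (8.5 × 11 inches) and linear scaling with DPI; `pixel_size` is a hypothetical helper, not part of the library.

```python
from typing import Tuple


def pixel_size(width_in: float, height_in: float, dpi: int) -> Tuple[int, int]:
    """Return the rendered (width, height) in pixels for a page size in inches."""
    return round(width_in * dpi), round(height_in * dpi)


# US Letter at the dpi used in the example above.
letter_at_400 = pixel_size(8.5, 11.0, 400)  # (3400, 4400)
```

Because the pixel count grows with the square of the DPI, higher settings increase image size, and likely processing cost, faster than the number alone suggests.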

Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/tiny-but-mighty)
  3. Commit your changes (git commit -m 'feat: add small but delightful improvement')
  4. Push to the branch (git push origin feature/tiny-but-mighty)
  5. Open a Pull Request

For detailed guidelines, see our Contributing Guide.

License

Copyright 2024 Aitomatic, Inc.

Licensed under the Apache License, Version 2.0. See LICENSE for details.


Download files


Source Distribution

aicapture-0.1.4.tar.gz (21.6 kB)

Uploaded Source

Built Distribution


aicapture-0.1.4-py3-none-any.whl (22.7 kB)

Uploaded Python 3

File details

Details for the file aicapture-0.1.4.tar.gz.

File metadata

  • Download URL: aicapture-0.1.4.tar.gz
  • Upload date:
  • Size: 21.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.4.2 CPython/3.10.4 Darwin/24.2.0

File hashes

Hashes for aicapture-0.1.4.tar.gz
Algorithm    Hash digest
SHA256       f479db94e7e34b370aa6115be151d5295ebc34ed644278eb5b6893975d99f150
MD5          a66e9d65bbc97332db19d89d6cccb0bc
BLAKE2b-256  e730622c507e57920f3ec0f8b00ee6682caf363c1363c96fa0de302672d1c25c
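One way to check a downloaded archive against the SHA256 digest listed above, sketched with the standard library only. The helper names are hypothetical, and the local file path is whatever you saved the download as.

```python
import hashlib
import hmac

# SHA256 digest listed above for aicapture-0.1.4.tar.gz.
EXPECTED_SHA256 = "f479db94e7e34b370aa6115be151d5295ebc34ed644278eb5b6893975d99f150"


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Compare the file's digest with the expected hex string."""
    return hmac.compare_digest(sha256_of_file(path), expected.lower())
```

For example, `matches_expected("aicapture-0.1.4.tar.gz")` returns True only if the local file's digest equals the listed value.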


File details

Details for the file aicapture-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: aicapture-0.1.4-py3-none-any.whl
  • Upload date:
  • Size: 22.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.4.2 CPython/3.10.4 Darwin/24.2.0

File hashes

Hashes for aicapture-0.1.4-py3-none-any.whl
Algorithm    Hash digest
SHA256       2863d3f725be5548f6dab7b25cebe59d47b9b5c2380e25d9bfddc81378e85923
MD5          4ffe470fdc2bcb962ae983e2f5e5bebe
BLAKE2b-256  0e02984d68c1dc5284be9492774efa132461370b8f2f8f972d9a4fb1a23fc236

