A Python module to extract content from PDF documents using Vision Language Models (VLMs)

AI Vision Capture

A Python library for extracting and analyzing content from PDF documents and images using Vision Language Models (VLMs). It supports multiple VLM providers, including OpenAI, Anthropic Claude, Google Gemini, and Azure OpenAI, and can process documents asynchronously with configurable concurrency.

Features

🔍 Multi-Provider Support: Compatible with major VLM providers (OpenAI, Claude, Gemini, Azure) and open-source models
📄 Document Processing: Process PDFs and images (JPG, PNG, TIFF, WebP, BMP)
🚀 Async Processing: Asynchronous processing with configurable concurrency
💾 Two-Layer Caching: Local file system and cloud caching for improved performance
🔄 Batch Processing: Process multiple documents in parallel
📝 Text Extraction: Enhanced accuracy through combined OCR and VLM processing
🎨 Image Quality Control: Configurable image quality settings
📊 Structured Output: Well-organized JSON and Markdown output
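
The caching feature implies that a document is identified by its content rather than by its path. The sketch below illustrates that idea for the local layer only. It assumes a SHA-256 digest of the file bytes serves as the cache key (the `file_hash` field in the output format hints at this, but the library's actual cache layout is not documented here). `file_sha256`, `LocalCache`, and the one-JSON-file-per-hash layout are hypothetical names chosen for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class LocalCache:
    """Hypothetical local cache layer: one JSON file per document hash."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def get(self, key: str):
        """Return the cached value for key, or None on a cache miss."""
        entry = self.root / f"{key}.json"
        if entry.exists():
            return json.loads(entry.read_text(encoding="utf-8"))
        return None

    def set(self, key: str, value: dict) -> None:
        """Store value under key, overwriting any previous entry."""
        entry = self.root / f"{key}.json"
        entry.write_text(json.dumps(value), encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        doc = Path(tmp) / "example.pdf"
        doc.write_bytes(b"%PDF-1.4 placeholder bytes")
        cache = LocalCache(Path(tmp) / "cache")
        key = file_sha256(doc)
        if cache.get(key) is None:
            # Cache miss: this is where the expensive VLM parsing would run.
            cache.set(key, {"file_name": doc.name, "file_hash": key})
        print(cache.get(key)["file_name"])
```

A cloud layer would follow the same get/set shape and be consulted only after a local miss.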

Coming Soon Features

🔗 Cross-Document Knowledge Capture: Capture structured knowledge across multiple documents
🎥 Video Knowledge Capture: Capture structured knowledge from video

Quick Start

Installation

pip install aicapture

Basic Setup

Set your chosen provider and API key (example using OpenAI):

export USE_VISION=openai
export OPENAI_API_KEY=your_openai_key
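
Because the parser is configured through environment variables, a missing value may only surface when the first request is made. A small pre-flight check can catch this earlier. The sketch below relies only on the two variables shown above; `check_vision_env` is a hypothetical helper, not part of the library, and since the key names for other providers are not stated here, the caller passes the expected key name explicitly.

```python
import os


def check_vision_env(required_key: str = "OPENAI_API_KEY", env=None) -> str:
    """Return the selected provider, or raise if the setup is incomplete.

    `env` defaults to os.environ; passing a dict makes the check testable.
    """
    env = os.environ if env is None else env
    provider = env.get("USE_VISION", "").strip()
    if not provider:
        raise RuntimeError("USE_VISION is not set")
    if not env.get(required_key, "").strip():
        raise RuntimeError(f"{required_key} is not set for provider {provider!r}")
    return provider


if __name__ == "__main__":
    # A stand-in environment, so the example runs without real credentials.
    fake_env = {"USE_VISION": "openai", "OPENAI_API_KEY": "your_openai_key"}
    print(check_vision_env(env=fake_env))
```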

Use in your code:

import asyncio

from vision_capture import VisionParser

# Initialize parser (reads USE_VISION and the provider API key from the environment)
parser = VisionParser()

# Process a PDF
result = parser.process_pdf("path/to/your/document.pdf")

# Process an image
result = parser.process_image("path/to/your/image.jpg")

# Process multiple documents asynchronously
async def process_folder():
    # Processes both PDFs and images found in the folder
    results = await parser.process_folder_async("path/to/folder")
    return results

# Run the coroutine from synchronous code
results = asyncio.run(process_folder())

Common settings you may want to adjust (.env.template lists all available configuration options):

# Optional performance settings
export MAX_CONCURRENT_TASKS=5      # Number of concurrent processing tasks
export VISION_PARSER_DPI=333      # Image DPI for PDF processing
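
MAX_CONCURRENT_TASKS caps how many processing tasks are in flight at once. How the library enforces the cap internally is not described here. A common pattern for this kind of limit is an `asyncio.Semaphore`, sketched below with a stand-in coroutine in place of a real VLM call. Both `run_bounded` and `fake_process` are hypothetical names, and the default of 5 simply mirrors the example value above.

```python
import asyncio
import os


async def run_bounded(items, worker, limit: int):
    """Run worker(item) for every item with at most `limit` running at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item):
        # Each task waits here until one of the `limit` slots is free.
        async with semaphore:
            return await worker(item)

    # gather returns results in the same order as the input items.
    return await asyncio.gather(*(guarded(item) for item in items))


async def fake_process(path: str) -> str:
    """Stand-in for a real document-processing call."""
    await asyncio.sleep(0.01)
    return f"processed {path}"


if __name__ == "__main__":
    limit = int(os.environ.get("MAX_CONCURRENT_TASKS", "5"))
    paths = [f"doc_{i}.pdf" for i in range(8)]
    results = asyncio.run(run_bounded(paths, fake_process, limit))
    print(len(results))
```

Raising the limit increases throughput until the provider's rate limits or local memory become the bottleneck, so it is worth tuning against the provider in use.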

Development Environment

For local development:

1. Clone the repository
2. Copy .env.template to .env
3. Edit .env with your settings
4. Install development dependencies: pip install -e ".[dev]"

See .env.template for all available configuration options.

Output Format

The library produces structured output in both JSON and Markdown formats. The JSON structure looks like this:

{
  "file_object": {
    "file_name": "example.pdf",
    "file_hash": "sha256_hash",
    "total_pages": 10,
    "total_words": 5000,
    "pages": [
      {
        "page_number": 1,
        "page_content": "extracted content",
        "page_hash": "sha256_hash"
      }
    ]
  }
}
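
A short sketch of consuming this structure: it parses the JSON shown above and walks the pages. Only field names from the example are used. `summarize` is a hypothetical helper, and the `SAMPLE` string stands in for output the library would produce.

```python
import json

# The example output from above, embedded as a string for a self-contained demo.
SAMPLE = """
{
  "file_object": {
    "file_name": "example.pdf",
    "file_hash": "sha256_hash",
    "total_pages": 10,
    "total_words": 5000,
    "pages": [
      {
        "page_number": 1,
        "page_content": "extracted content",
        "page_hash": "sha256_hash"
      }
    ]
  }
}
"""


def summarize(raw: str) -> dict:
    """Return a small summary of a parsed document."""
    doc = json.loads(raw)["file_object"]
    pages = doc.get("pages", [])
    return {
        "file_name": doc["file_name"],
        "total_pages": doc["total_pages"],
        # Number of page entries actually present in this payload.
        "pages_present": len(pages),
        # Whitespace-separated word count per page number.
        "words_per_page": {
            page["page_number"]: len(page["page_content"].split())
            for page in pages
        },
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```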

Advanced Usage

from vision_capture import VisionParser, GeminiVisionModel

# Configure Gemini vision model with custom settings
vision_model = GeminiVisionModel(
    model="gemini-2.0-flash",
    api_key="your_gemini_api_key"
)

# Initialize parser with custom configuration
parser = VisionParser(
    vision_model=vision_model,
    dpi=400,
    prompt="""
    Please analyze this technical document and extract:
    1. Equipment specifications and model numbers
    2. Operating parameters and limits
    3. Maintenance requirements
    4. Safety protocols
    5. Quality control metrics
    """
)

# Process PDF with custom settings
result = parser.process_pdf(
    pdf_path="path/to/document.pdf",
)
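
Custom prompts like the one above tend to grow as extraction requirements change. One way to keep them maintainable is to generate the numbered list from data. This is a plain-Python sketch: `build_prompt` is a hypothetical helper and not part of the library's API. The resulting string would be passed as the `prompt` argument shown above.

```python
from typing import Sequence


def build_prompt(intro: str, items: Sequence[str]) -> str:
    """Join an intro line and a numbered list of extraction targets."""
    lines = [intro.rstrip()]
    lines += [f"{number}. {item}" for number, item in enumerate(items, start=1)]
    return "\n".join(lines)


if __name__ == "__main__":
    prompt = build_prompt(
        "Please analyze this technical document and extract:",
        [
            "Equipment specifications and model numbers",
            "Operating parameters and limits",
            "Maintenance requirements",
            "Safety protocols",
            "Quality control metrics",
        ],
    )
    print(prompt)
```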

Contributing

1. Fork the repository
2. Create your feature branch (git checkout -b feature/tiny-but-mighty)
3. Commit your changes (git commit -m 'feat: add small but delightful improvement')
4. Push to the branch (git push origin feature/tiny-but-mighty)
5. Open a Pull Request

For detailed guidelines, see our Contributing Guide.

License

Licensed under the Apache License, Version 2.0. See LICENSE for details.

Release history

Version	Released
0.6.4	Mar 5, 2026
0.6.3	Mar 2, 2026
0.6.2	Feb 23, 2026
0.6.1	Feb 23, 2026
0.6.0	Feb 23, 2026
0.5.4	Feb 15, 2026
0.5.3	Feb 13, 2026
0.5.1	Jan 27, 2026
0.5.0	Jan 24, 2026
0.4.0	Jan 24, 2026
0.3.7	Jan 5, 2026
0.3.6	Nov 27, 2025
0.3.5	Oct 3, 2025
0.3.4	Oct 3, 2025
0.3.3	Aug 14, 2025
0.3.2	Aug 7, 2025
0.3.1	Jul 5, 2025
0.3.0	Jun 3, 2025
0.2.11	May 2, 2025
0.2.10	Apr 29, 2025
0.2.9	Apr 20, 2025
0.2.8	Apr 15, 2025
0.2.7	Apr 12, 2025
0.2.6	Apr 11, 2025
0.2.5	Apr 11, 2025
0.2.4	Apr 3, 2025
0.2.3	Apr 3, 2025
0.2.2	Apr 3, 2025
0.2.1	Apr 3, 2025
0.2.0	Apr 2, 2025
0.1.10	Apr 2, 2025
0.1.9	Apr 2, 2025
0.1.8	Mar 19, 2025
0.1.7	Mar 15, 2025
0.1.5	Mar 15, 2025 (this version)
0.1.4	Mar 10, 2025
0.1.3	Mar 6, 2025
0.1.2	Mar 5, 2025
0.1.1	Mar 4, 2025
0.1.0	Mar 4, 2025

Download files

Source Distribution

aicapture-0.1.5.tar.gz (22.6 kB)

Uploaded Mar 15, 2025 Source

Built Distribution

aicapture-0.1.5-py3-none-any.whl (23.6 kB)

Uploaded Mar 15, 2025 Python 3

File details

Details for the file aicapture-0.1.5.tar.gz.

File metadata

Download URL: aicapture-0.1.5.tar.gz
Upload date: Mar 15, 2025
Size: 22.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.4.2 CPython/3.10.4 Darwin/24.2.0

File hashes

Hashes for aicapture-0.1.5.tar.gz
Algorithm	Hash digest
SHA256	`79e621aaeda793704baf2fea5826b77f52c3a672e93af7b62f2b3b765207726e`
MD5	`afd37605c97a85208e17a74b2d3b64ba`
BLAKE2b-256	`ba8374dc5c2f3bd06330a17f02dbd96ce26d64ce497fa914d37e123220b9b756`


File details

Details for the file aicapture-0.1.5-py3-none-any.whl.

File metadata

Download URL: aicapture-0.1.5-py3-none-any.whl
Upload date: Mar 15, 2025
Size: 23.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.4.2 CPython/3.10.4 Darwin/24.2.0

File hashes

Hashes for aicapture-0.1.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6a191f6b01da8bb6658576b73a678b06b0ab340d8b30177d33c3a58f7029d254`
MD5	`f10a8d093652a0571f73d3825b42b521`
BLAKE2b-256	`0390913cbcf083bff5ce6f7c85a0ef1212174d656f58458d9e5525e378ccfac8`

