A Python module to extract content from PDF documents using Vision Language Models (VLMs)
AI Vision Capture
A Python library for extracting and analyzing content from PDF documents using Vision Language Models (VLMs). It supports multiple VLM providers, including OpenAI, Anthropic Claude, Google Gemini, and Azure OpenAI.
Features
- 🔍 Multi-Provider Support: Compatible with major VLM providers (OpenAI, Claude, Gemini, Azure OpenAI, and open-source models)
- 📄 PDF Processing: Efficient PDF to image conversion with configurable DPI
- 🚀 Async Processing: Asynchronous processing with configurable concurrency
- 💾 Two-Layer Caching: Local file system and cloud caching for improved performance
- 🔄 Batch Processing: Process multiple PDFs in parallel
- 📝 Text Extraction: Enhanced accuracy through combined OCR and VLM processing
- 🎨 Image Quality Control: Configurable image quality settings
- 📊 Structured Output: Well-organized JSON and Markdown output
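To illustrate what "configurable concurrency" means in practice, here is a minimal sketch of bounded asynchronous processing built on `asyncio.Semaphore`. It is not the library's implementation: `run_bounded` and `fake_process_pdf` are hypothetical names introduced only for this example, and the worker simulates a VLM call with a short sleep.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List


async def run_bounded(
    items: Iterable[str],
    worker: Callable[[str], Awaitable[str]],
    max_concurrent: int = 5,
) -> List[str]:
    """Run worker(item) for every item, with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(item: str) -> str:
        # Each task waits here until one of the max_concurrent slots is free.
        async with semaphore:
            return await worker(item)

    # gather returns results in the same order as the input items.
    return await asyncio.gather(*(guarded(item) for item in items))


async def fake_process_pdf(path: str) -> str:
    # Hypothetical stand-in for a real per-document VLM call.
    await asyncio.sleep(0.01)
    return f"processed {path}"


results = asyncio.run(
    run_bounded(["a.pdf", "b.pdf", "c.pdf"], fake_process_pdf, max_concurrent=2)
)
```

A semaphore keeps the number of simultaneous provider requests under a fixed ceiling, which is one common way to stay within API rate limits while still overlapping network waits.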
Quick Start
import asyncio
from vision_capture import VisionParser
# Initialize the parser with default settings
parser = VisionParser()
# Process a single PDF
result = parser.process_pdf("path/to/your/document.pdf")
# Process a folder of PDFs asynchronously
async def process_folder():
    results = await parser.process_folder_async("path/to/folder")
    return results
# Run the coroutine from synchronous code
results = asyncio.run(process_folder())
Configuration
The library can be configured through environment variables:
# Vision Model Selection
USE_VISION=openai # Options: openai, claude, gemini, azure-openai
# API Keys
OPENAI_API_KEY=your_key
ANTHROPIC_API_KEY=your_key
GEMINI_API_KEY=your_key
AZURE_OPENAI_API_KEY=your_key
# Cache Settings
DXA_DATA_BUCKET=your_s3_bucket_name
# Performance Settings
MAX_CONCURRENT_TASKS=5
VISION_PARSER_DPI=333
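For readers who want to validate these variables before starting a run, the following is a minimal sketch of reading them into a typed settings object. The variable names and provider options come from the listing above; `CaptureSettings` and `load_settings` are hypothetical helpers, and the fallback values are assumptions copied from the example, not the library's documented defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Provider options as listed in the configuration example above.
SUPPORTED_PROVIDERS = ("openai", "claude", "gemini", "azure-openai")


@dataclass(frozen=True)
class CaptureSettings:
    provider: str
    max_concurrent_tasks: int
    dpi: int
    data_bucket: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> CaptureSettings:
    """Read settings from a mapping of environment variables."""
    provider = env.get("USE_VISION", "openai").strip().lower()
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(
            f"USE_VISION must be one of {SUPPORTED_PROVIDERS}, got {provider!r}"
        )
    return CaptureSettings(
        provider=provider,
        # Fallbacks below are assumed values taken from the example listing.
        max_concurrent_tasks=int(env.get("MAX_CONCURRENT_TASKS", "5")),
        dpi=int(env.get("VISION_PARSER_DPI", "333")),
        # An unset or empty bucket name is treated as "no cloud cache".
        data_bucket=env.get("DXA_DATA_BUCKET") or None,
    )


settings = load_settings({"USE_VISION": "gemini", "VISION_PARSER_DPI": "400"})
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to test without touching the real process environment.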
Output Format
The library produces structured output in both JSON and Markdown formats. The JSON output has the following shape:
{
  "file_object": {
    "file_name": "example.pdf",
    "file_hash": "sha256_hash",
    "total_pages": 10,
    "total_words": 5000,
    "pages": [
      {
        "page_number": 1,
        "page_content": "extracted content",
        "page_hash": "sha256_hash"
      }
    ]
  }
}
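As an illustration of how the JSON structure can be consumed, the sketch below walks the `pages` array and renders a simple Markdown document. The field names match the example above; `render_markdown` is a hypothetical helper, and the Markdown layout it produces is an assumption, not necessarily the layout the library itself writes.

```python
import json
from typing import Any, Dict


def render_markdown(document: Dict[str, Any]) -> str:
    """Turn a parsed output document into a simple Markdown string."""
    file_object = document["file_object"]
    lines = [f"# {file_object['file_name']}", ""]
    # Sort defensively so pages appear in order even if the list is not.
    for page in sorted(file_object["pages"], key=lambda p: p["page_number"]):
        lines.append(f"## Page {page['page_number']}")
        lines.append("")
        lines.append(page["page_content"])
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


sample = json.loads(
    """
    {
      "file_object": {
        "file_name": "example.pdf",
        "file_hash": "sha256_hash",
        "total_pages": 1,
        "total_words": 2,
        "pages": [
          {
            "page_number": 1,
            "page_content": "extracted content",
            "page_hash": "sha256_hash"
          }
        ]
      }
    }
    """
)
markdown = render_markdown(sample)
```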
Advanced Usage
from vision_capture import VisionParser, GeminiVisionModel
# Configure Gemini vision model with custom settings
vision_model = GeminiVisionModel(
    model="gemini-pro-vision",
    api_key="your_gemini_api_key"
)
# Initialize parser with custom configuration
parser = VisionParser(
    vision_model=vision_model,
    dpi=400,
    image_quality="high",
    prompt="""
    Please analyze this technical document and extract:
    1. Equipment specifications and model numbers
    2. Operating parameters and limits
    3. Maintenance requirements
    4. Safety protocols
    5. Quality control metrics
    """
)
# Process PDF with custom settings
result = parser.process_pdf(
    pdf_path="path/to/document.pdf",
    cache_enabled=True
)
Contributing
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'feat: add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
For detailed guidelines, see our Contributing Guide.
License
Copyright 2024 Aitomatic, Inc.
Licensed under the Apache License, Version 2.0. See LICENSE for details.
File details
Details for the file aicapture-0.1.0.tar.gz.
File metadata
- Download URL: aicapture-0.1.0.tar.gz
- Upload date:
- Size: 21.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.4.2 CPython/3.10.4 Darwin/24.3.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9a677d157040d88320b261557e1bb74ee6f3ab17b54384f4341c1691a4f61ae3 |
| MD5 | d9b26cf795e0743aadeba93eddfb8e04 |
| BLAKE2b-256 | 764ab34dc7b13816cc72979ee0604e83a61c4fdc9f7962be932d196a8a719802 |
File details
Details for the file aicapture-0.1.0-py3-none-any.whl.
File metadata
- Download URL: aicapture-0.1.0-py3-none-any.whl
- Upload date:
- Size: 22.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.4.2 CPython/3.10.4 Darwin/24.3.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ab5baf150626ba71ec6b05998f7caec7adce142ffbeb5af1ce3fa7b76a4eee3b |
| MD5 | abc3cc1389ae3f67ac4f16263e7df391 |
| BLAKE2b-256 | fda1a5414cff57d68da25367d6dc1313776d06f9da4b2348dce64b2262e583ef |