Plug-and-play document readability and extraction toolkit for VLMs.

PatchFinder

Based on the paper "PatchFinder: Leveraging Visual Language Models for Accurate Information Retrieval using Model Uncertainty".

PatchFinder is a Python library for accurate document text extraction using Vision Language Models (VLMs). It works by splitting images into overlapping patches, processing each patch independently, and combining results based on model confidence.
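The patch-splitting step described above can be sketched in a few lines. This is a minimal illustration, not PatchFinder's actual implementation: the function names `patch_boxes` and `_starts` are hypothetical, and it assumes that `overlap` is the fraction of a patch shared with its neighbour, so the stride is `patch_size * (1 - overlap)`.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def _starts(length: int, patch_size: int, stride: int) -> List[int]:
    """Start offsets along one axis so that the patches cover the full length."""
    if length <= patch_size:
        return [0]
    starts = list(range(0, length - patch_size + 1, stride))
    # Add a final patch flush with the edge if the stride left a gap.
    if starts[-1] + patch_size < length:
        starts.append(length - patch_size)
    return starts


def patch_boxes(width: int, height: int,
                patch_size: int = 256, overlap: float = 0.25) -> List[Box]:
    """Hypothetical helper: crop boxes for overlapping square patches."""
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    # Assumption: neighbouring patches share `overlap` of their side length.
    stride = max(1, int(patch_size * (1 - overlap)))
    return [
        (x, y, min(x + patch_size, width), min(y + patch_size, height))
        for y in _starts(height, patch_size, stride)
        for x in _starts(width, patch_size, stride)
    ]


boxes = patch_boxes(640, 256)
print(boxes)
```

Each box uses the (left, upper, right, lower) order that Pillow's `Image.crop` accepts, so the boxes could be passed to it directly to produce the patch images.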

Features

Efficient patch-based document processing
Support for custom VLM models and prompts
Confidence-based result aggregation
GPU acceleration support
Batch processing capabilities
Command-line interface (CLI)
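To make confidence-based aggregation concrete, here is a rough sketch under stated assumptions. It scores each patch by the mean of the per-token maximum softmax probability and keeps the answer from the highest-scoring patch. The scoring rule, the helper names (`max_softmax`, `patch_confidence`, `aggregate`), and the meaning of `processed_patches` as a count are assumptions for illustration; the library's own aggregation may differ.

```python
import math
from typing import Dict, List, Sequence


def max_softmax(logits: Sequence[float]) -> float:
    """Largest softmax probability for one token's logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]  # subtract max for stability
    return max(exps) / sum(exps)


def patch_confidence(token_logits: Sequence[Sequence[float]]) -> float:
    """Assumed score: mean of the per-token maximum softmax probability."""
    if not token_logits:
        return 0.0
    return sum(max_softmax(t) for t in token_logits) / len(token_logits)


def aggregate(patch_results: List[Dict]) -> Dict:
    """Keep the text from the patch the model was most confident about."""
    if not patch_results:
        raise ValueError("no patch results to aggregate")
    best = max(patch_results, key=lambda r: r["confidence"])
    return {
        "text": best["text"],
        "confidence": best["confidence"],
        "processed_patches": len(patch_results),  # assumed to be a count
    }


# Two made-up patches: flat logits (uncertain) vs. peaked logits (confident).
patches = [
    {"text": "Invoice 1O4",
     "confidence": patch_confidence([[2.0, 1.0], [0.5, 0.4]])},
    {"text": "Invoice 104",
     "confidence": patch_confidence([[5.0, 0.0], [4.0, 0.0]])},
]
result = aggregate(patches)
print(result["text"], round(result["confidence"], 3))
```

Choosing a single best patch is the simplest policy; other schemes, such as voting among patches above a confidence threshold, fit the same interface.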

Installation

pip install patchfinder

Quick Start

Process a single document:

from patchfinder import PatchFinder
from transformers import AutoProcessor, AutoModelForCausalLM

# Initialize model and processor
model_name = "microsoft/phi-3-vision-128k-instruct"
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

# Create PatchFinder instance
finder = PatchFinder(model=model, processor=processor)

# Process image
result = finder.extract(
    image_path="document.jpg",
    prompt="Extract all text from this document"
)

print(f"Extracted text: {result['text']}")
print(f"Confidence: {result['confidence']}")

Integration with Existing Models

If you already load a model and processor with transformers, PatchFinder can wrap them without changes to your existing pipeline:

from typing import Optional

import torch
from transformers import AutoProcessor, AutoModelForCausalLM
from patchfinder import PatchFinder

class DocumentProcessor:
    def __init__(self, model_name="microsoft/phi-3-vision-128k-instruct"):
        # Initialize your existing model pipeline
        self.processor = AutoProcessor.from_pretrained(
            model_name,
            trust_remote_code=True
        )

        # Load in half precision; place on GPU automatically when available
        self.model = AutoModelForCausalLM.from_pretrained(
            model_name,
            trust_remote_code=True,
            torch_dtype=torch.float16,
            device_map="auto" if torch.cuda.is_available() else None
        )

        # Add PatchFinder on top
        self.patchfinder = PatchFinder(
            model=self.model,
            processor=self.processor,
            patch_size=256,
            overlap=0.25
        )

    def process_document(self, image_path: str, custom_prompt: Optional[str] = None) -> dict:
        # Use your existing preprocessing if needed
        prompt = custom_prompt or "Extract all text from this document"
        
        # Let PatchFinder handle the patch-based processing
        result = self.patchfinder.extract(
            image_path=image_path,
            prompt=prompt,
            timeout=30
        )
        
        # Post-process or format results as needed
        return {
            "text": result["text"],
            "confidence": result["confidence"],
            "processed_patches": result["processed_patches"]
        }

# Usage example
processor = DocumentProcessor()
result = processor.process_document(
    "document.jpg",
    custom_prompt="Extract and structure all text from this document"
)

Command Line Interface

PatchFinder provides a CLI for both single-file and batch processing.

Process a single image:

python -m patchfinder.cli process image.jpg --prompt="Extract text" --verbose

Process all images in a directory:

python -m patchfinder.cli batch_process ./images/ --output_file=results.json

CLI Options:

model_name: Vision language model to use (default: microsoft/phi-3-vision-128k-instruct)
patch_size: Size of image patches (default: 256)
overlap: Overlap ratio between patches (default: 0.25)
device: Processing device (default: auto-detect GPU/CPU)
timeout: Processing timeout in seconds (default: 30)
verbose: Enable detailed logging
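When the CLI is driven from a script, the command can be assembled as an argument list. The sketch below assumes that the options listed above are passed in the same `--name=value` form as `--prompt` and `--output_file` in the examples; that flag syntax and the helper name `build_process_command` are assumptions, so check the CLI's own help output before relying on them.

```python
import sys
from typing import List, Optional


def build_process_command(
    image: str,
    prompt: str,
    patch_size: Optional[int] = None,
    overlap: Optional[float] = None,
    timeout: Optional[int] = None,
    verbose: bool = False,
) -> List[str]:
    """Hypothetical helper: argv for `python -m patchfinder.cli process`."""
    cmd = [sys.executable, "-m", "patchfinder.cli", "process",
           image, f"--prompt={prompt}"]
    # Assumption: each option is accepted as --name=value.
    options = {"patch_size": patch_size, "overlap": overlap, "timeout": timeout}
    for name, value in options.items():
        if value is not None:
            cmd.append(f"--{name}={value}")
    if verbose:
        cmd.append("--verbose")
    return cmd


cmd = build_process_command("image.jpg", "Extract text",
                            patch_size=512, overlap=0.5, verbose=True)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; passing a list rather than a shell string avoids quoting problems with prompts that contain spaces.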

Advanced Usage

Custom model configuration:

finder = PatchFinder(
    model=model,
    processor=processor,
    patch_size=512,  # Larger patches
    overlap=0.5,     # More overlap
    max_workers=2    # Parallel processing
)

Batch processing with custom settings:

from pathlib import Path

image_dir = Path("./documents")
for image_path in image_dir.glob("*.jpg"):
    result = finder.extract(
        image_path=str(image_path),
        prompt="Extract and format all text",
        timeout=60
    )
    print(f"Processed {image_path.name}: {result['confidence']:.2f} confidence")
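For longer runs it can help to keep going when one image fails and to save everything to a file. The following sketch wraps the loop above in a hypothetical helper, `batch_extract`, that takes the extraction function as an argument. The record layout written to JSON is an assumption for illustration and is not necessarily the format that the CLI's batch_process command writes.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List


def batch_extract(
    extract: Callable[..., Dict],
    image_dir: Path,
    output_file: Path,
    prompt: str,
    pattern: str = "*.jpg",
) -> List[Dict]:
    """Run `extract` on every matching image and save the results as JSON."""
    records: List[Dict] = []
    for image_path in sorted(image_dir.glob(pattern)):
        try:
            result = extract(image_path=str(image_path), prompt=prompt)
            records.append({
                "file": image_path.name,
                "text": result["text"],
                "confidence": result["confidence"],
            })
        except Exception as exc:
            # Record the failure and continue with the remaining images.
            records.append({"file": image_path.name, "error": str(exc)})
    output_file.write_text(json.dumps(records, indent=2), encoding="utf-8")
    return records
```

With the `finder` instance from the earlier examples, a call would look like `batch_extract(finder.extract, Path("./documents"), Path("results.json"), "Extract and format all text")`.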

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Project details

Release history

0.2.0, Jan 26, 2025 (this version)
0.1.0, Jan 26, 2025

Download files

Source Distribution

patchfinder-0.2.0.tar.gz (13.7 kB)

Uploaded Jan 26, 2025 Source

Built Distribution

patchfinder-0.2.0-py3-none-any.whl (9.5 kB)

Uploaded Jan 26, 2025 Python 3

File details

Details for the file patchfinder-0.2.0.tar.gz.

File metadata

Download URL: patchfinder-0.2.0.tar.gz
Upload date: Jan 26, 2025
Size: 13.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.5.18

File hashes

Hashes for patchfinder-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`93bb193338861214ad73e4be58f8c110f355365a76936534b09d3ee523e867b3`
MD5	`0d03a1f7c450fc98ac5e810d755b2632`
BLAKE2b-256	`84e2d1340f5aaedefc95a228527ede55373bad93a794886cf98cecd309427487`

File details

Details for the file patchfinder-0.2.0-py3-none-any.whl.

File metadata

Download URL: patchfinder-0.2.0-py3-none-any.whl
Upload date: Jan 26, 2025
Size: 9.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.5.18

File hashes

Hashes for patchfinder-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3bc79b2b130db9098c3ceeb816c4d22762031c09feec95a4bfc94c0f5db76eb5`
MD5	`a9a215bc21aabd5fe17fee25d4ac1cb6`
BLAKE2b-256	`761f8f42723c0800706c4d27897beab3d95a7cbc3120d6169c9490fa2af91811`
