Reusable Python toolkit for vision-based document analysis using IBM Watsonx AI or Ollama

These details have not been verified by PyPI

Project links

Project description

Watsonx Vision Toolkit

A reusable Python toolkit for vision-based document analysis using IBM Watsonx AI or Ollama. Includes fraud detection, document classification, information extraction, cross-validation, and multi-criteria decision engines.

Extracted from the IBM Watsonx Loan Preprocessing Agents project with proven production results.

Features

Vision LLM Interface - Unified API for IBM Watsonx AI and Ollama vision models
Document Classification - Automatically classify documents (passport, license, tax returns, etc.)
Information Extraction - Extract structured data from document images (PII, financial data)
Fraud Detection - Detect document forgery, manipulation, and authenticity issues
Cross-Validation - Compare data across multiple documents for consistency
Decision Engine - Multi-criteria weighted scoring for automated decisions

Installation

# Basic installation (no provider dependencies)
pip install watsonx-vision-toolkit

# With IBM Watsonx AI support
pip install watsonx-vision-toolkit[watsonx]

# With Ollama support
pip install watsonx-vision-toolkit[ollama]

# With all providers
pip install watsonx-vision-toolkit[all]

# Development installation
pip install watsonx-vision-toolkit[dev]

From Source

git clone https://github.com/qvidal01/watsonx-vision-toolkit.git
cd watsonx-vision-toolkit
pip install -e ".[all,dev]"

Quick Start

1. Vision LLM - Document Classification

from watsonx_vision import VisionLLM, VisionLLMConfig, LLMProvider

# Configure for IBM Watsonx AI
config = VisionLLMConfig(
    provider=LLMProvider.WATSONX,
    model_id="meta-llama/llama-4-maverick-17b-128e-instruct-fp8",
    api_key="your-ibm-cloud-api-key",
    url="https://us-south.ml.cloud.ibm.com",
    project_id="your-project-id"
)

# Or configure for Ollama (local)
config = VisionLLMConfig(
    provider=LLMProvider.OLLAMA,
    model_id="llava:13b",
    url="http://localhost:11434"
)

# Initialize and use
llm = VisionLLM(config)

# Encode image
image_data = VisionLLM.encode_image_to_base64("path/to/document.png")

# Classify document
result = llm.classify_document(image_data)
print(f"Document type: {result['doc_type']}")
# Output: Document type: Passport

# Extract information
info = llm.extract_information(image_data)
print(f"Name: {info.get('name')}")
print(f"DOB: {info.get('dob')}")
# Output: Name: John Doe
# Output: DOB: 1990-05-15

2. Fraud Detection

from watsonx_vision import VisionLLM, VisionLLMConfig, FraudDetector

# Initialize vision LLM (see above)
llm = VisionLLM(config)

# Create fraud detector
detector = FraudDetector(
    vision_llm=llm,
    layout_threshold=70,  # Minimum layout score
    field_threshold=70,   # Minimum field score
    min_confidence=60     # Minimum overall confidence
)

# Validate single document
image_data = VisionLLM.encode_image_to_base64("passport.png")
result = detector.validate_document(image_data, filename="passport.png")

if result.valid:
    print(f"Document is authentic (confidence: {result.confidence}%)")
else:
    print(f"Fraud detected: {result.reason}")
    print(f"Severity: {result.severity.value}")
    print(f"Issues: {result.forgery_signs}")

# Validate batch of documents
documents = [
    {"image_data": img1, "filename": "passport.png"},
    {"image_data": img2, "filename": "license.png"},
    {"image_data": img3, "filename": "bank_statement.png"}
]
results = detector.validate_batch(documents)

# Generate report
report = detector.generate_report(results)
print(f"Fraud rate: {report['fraud_rate']}%")

3. Cross-Validation

from watsonx_vision import CrossValidator

# Initialize validator
validator = CrossValidator(
    api_key="your-ibm-cloud-api-key",
    url="https://us-south.ml.cloud.ibm.com",
    project_id="your-project-id"
)

# Application data from form
application_data = {
    "name": "John Doe",
    "dob": "1990-05-15",
    "ssn": "123-45-6789",
    "address": "123 Main St, Springfield, IL 62701"
}

# Extracted data from documents
document_data = [
    {
        "doc_type": "Passport",
        "name": "John Doe",
        "dob": "1990-05-15",
        "passport_number": "X12345678"
    },
    {
        "doc_type": "Tax Return",
        "name": "John D. Doe",  # Slight variation
        "ssn": "123-45-6789",
        "annual_income": 75000
    },
    {
        "doc_type": "Bank Statement",
        "name": "John Doe",
        "account_number": "****1234"
    }
]

# Validate
result = validator.validate(application_data, document_data)

if result.passed:
    print("All data is consistent!")
else:
    print(f"Found {result.total_inconsistencies} inconsistencies:")
    for issue in result.inconsistencies:
        print(f"  - {issue.field}: {issue.explanation} [{issue.severity.value}]")

# Generate human-readable report
print(validator.generate_report(result))

4. Decision Engine

from watsonx_vision import DecisionEngine, LoanDecisionEngine

# Basic decision engine
engine = DecisionEngine(
    approval_threshold=75.0,
    rejection_threshold=40.0,
    fraud_weight=0.4,
    cross_validation_weight=0.3,
    custom_criteria_weight=0.3
)

# Add custom criteria
engine.add_criterion(
    "minimum_age",
    weight=0.5,
    evaluator=lambda data: data.get("age", 0) >= 18
)

engine.add_criterion(
    "income_requirement",
    weight=0.5,
    evaluator=lambda data: data.get("annual_income", 0) >= 50000
)

# Make decision
decision = engine.decide(
    fraud_results=fraud_detector_results,    # List[FraudResult]
    validation_result=cross_validation_result,  # ValidationResult
    custom_data={"age": 25, "annual_income": 75000}
)

print(decision.summary())
# Output: ✅ APPROVED (Score: 85.5/100)

# Get detailed results
print(f"Status: {decision.status.value}")
print(f"Reasons: {decision.reasons}")
print(f"Recommendations: {decision.recommendations}")

# Or use the pre-configured loan decision engine
loan_engine = LoanDecisionEngine(
    min_age=18,
    min_income=30000,
    max_dti=0.43
)

decision = loan_engine.decide(
    fraud_results=fraud_results,
    validation_result=validation_result,
    custom_data={
        "age": 35,
        "annual_income": 85000,
        "monthly_debt": 1500,
        "monthly_income": 7000
    }
)

Complete Example: Loan Application Processing

from watsonx_vision import (
    VisionLLM, VisionLLMConfig, LLMProvider,
    FraudDetector, CrossValidator, LoanDecisionEngine
)

# 1. Setup
config = VisionLLMConfig(
    provider=LLMProvider.WATSONX,
    model_id="meta-llama/llama-4-maverick-17b-128e-instruct-fp8",
    api_key="your-api-key",
    url="https://us-south.ml.cloud.ibm.com",
    project_id="your-project-id"
)

vision_llm = VisionLLM(config)
fraud_detector = FraudDetector(vision_llm)
cross_validator = CrossValidator(
    api_key="your-api-key",
    url="https://us-south.ml.cloud.ibm.com",
    project_id="your-project-id"
)
decision_engine = LoanDecisionEngine(min_age=18, min_income=30000)

# 2. Process documents
documents = ["passport.png", "tax_return.png", "bank_statement.png"]
document_data = []
fraud_results = []

for doc_path in documents:
    # Encode image
    image_data = VisionLLM.encode_image_to_base64(doc_path)

    # Classify
    doc_type = vision_llm.classify_document(image_data)

    # Extract info
    extracted = vision_llm.extract_information(image_data)
    extracted["doc_type"] = doc_type["doc_type"]
    extracted["filename"] = doc_path
    document_data.append(extracted)

    # Fraud check
    fraud_result = fraud_detector.validate_document(image_data, doc_path)
    fraud_results.append(fraud_result)

# 3. Cross-validate
application_data = {
    "name": "John Doe",
    "dob": "1990-05-15",
    "annual_income": 75000
}
validation_result = cross_validator.validate(application_data, document_data)

# 4. Make decision
decision = decision_engine.decide(
    fraud_results=fraud_results,
    validation_result=validation_result,
    custom_data=application_data
)

# 5. Output results
print(decision.summary())
print(decision.to_dict())

Environment Variables

The toolkit can be configured via environment variables:

# IBM Watsonx AI
export WATSONX_APIKEY="your-api-key"
export WATSONX_URL="https://us-south.ml.cloud.ibm.com"
export WATSONX_PROJECT_ID="your-project-id"

# Ollama (optional)
export OLLAMA_URL="http://localhost:11434"

Supported Models

IBM Watsonx AI

Model	Type	Best For
`meta-llama/llama-4-maverick-17b-128e-instruct-fp8`	Vision	Document analysis, classification
`mistralai/mistral-medium-2505`	Text	Cross-validation, decision logic
`ibm/granite-3-8b-instruct`	Text	General processing

Ollama (Local)

Model	Type	Best For
`llava:13b`	Vision	Document analysis
`llava:34b`	Vision	High-accuracy analysis
`mistral:7b`	Text	Cross-validation

API Reference

Full API Documentation: See docs/API_REFERENCE.md for complete API documentation including all parameters, return types, exceptions, and examples.

VisionLLM

class VisionLLM:
    def __init__(self, config: VisionLLMConfig): ...
    def classify_document(self, image_data: str, document_types: List[str] = None) -> Dict: ...
    def extract_information(self, image_data: str, fields: List[str] = None) -> Dict: ...
    def validate_authenticity(self, image_data: str) -> Dict: ...
    def analyze_image(self, image_data: str, prompt: str, system_prompt: str = None) -> Dict: ...

    @staticmethod
    def encode_image_to_base64(image_path: str, mime_type: str = None) -> str: ...

FraudDetector

class FraudDetector:
    def __init__(self, vision_llm: VisionLLM, layout_threshold: int = 70, ...): ...
    def validate_document(self, image_data: str, filename: str = None) -> FraudResult: ...
    def validate_batch(self, documents: List[Dict]) -> List[FraudResult]: ...
    def generate_report(self, results: List[FraudResult]) -> Dict: ...

CrossValidator

class CrossValidator:
    def __init__(self, api_key: str, url: str, project_id: str, ...): ...
    def validate(self, application_data: Dict, document_data: List[Dict]) -> ValidationResult: ...
    def validate_batch(self, packages: List[Dict]) -> List[ValidationResult]: ...
    def generate_report(self, result: ValidationResult) -> str: ...

DecisionEngine

class DecisionEngine:
    def __init__(self, approval_threshold: float = 75.0, ...): ...
    def add_criterion(self, name: str, weight: float, evaluator: Callable): ...
    def remove_criterion(self, name: str): ...
    def decide(self, fraud_results: List, validation_result: ValidationResult, ...) -> Decision: ...

Testing

# Run all tests
pytest

# Run with coverage
pytest --cov=watsonx_vision --cov-report=html

# Run specific test file
pytest tests/test_vision_llm.py

Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Make your changes
Run tests (pytest)
Commit (git commit -m 'Add amazing feature')
Push (git push origin feature/amazing-feature)
Open a Pull Request

License

MIT License - see LICENSE file.

Credits

Author: AIQSO - Quinn Vidal (quinn@aiqso.io)
Extracted from: IBM Watsonx Loan Preprocessing Agents project
Powered by: IBM Watsonx AI, Ollama, LangChain

Related Projects

IBM dsce-sample-apps - Original IBM sample applications
LangChain - LLM application framework
Ollama - Local LLM runtime

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.2.0

Jan 28, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

watsonx_vision_toolkit-0.2.0.tar.gz (77.6 kB view details)

Uploaded Jan 28, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

watsonx_vision_toolkit-0.2.0-py3-none-any.whl (41.2 kB view details)

Uploaded Jan 28, 2026 Python 3

File details

Details for the file watsonx_vision_toolkit-0.2.0.tar.gz.

File metadata

Download URL: watsonx_vision_toolkit-0.2.0.tar.gz
Upload date: Jan 28, 2026
Size: 77.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for watsonx_vision_toolkit-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`492f2ffa5813405c2a64273d1b6b217434209450663490a37c113388e0e68bfa`
MD5	`5901b4d792f6e47f03844fcbf4aef9e9`
BLAKE2b-256	`9582816bbb89f5d81431afc73435b0cc255c1f41cafe01afee73c02866076e0c`

See more details on using hashes here.

File details

Details for the file watsonx_vision_toolkit-0.2.0-py3-none-any.whl.

File metadata

Download URL: watsonx_vision_toolkit-0.2.0-py3-none-any.whl
Upload date: Jan 28, 2026
Size: 41.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for watsonx_vision_toolkit-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`cd90a3e47af79f9c76f2f28e5634fcc8ff240ad7e449ed2c229a7d7f1ccbc1b4`
MD5	`671140c67cb7c64a5a0dd3bc9c7e418a`
BLAKE2b-256	`9b48591a10a8b55052a952da98aaeaa3ee815036781d07d7fb481ebaaf23611f`

See more details on using hashes here.

watsonx-vision-toolkit 0.2.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Watsonx Vision Toolkit

Features

Installation

From Source

Quick Start

1. Vision LLM - Document Classification

2. Fraud Detection

3. Cross-Validation

4. Decision Engine

Complete Example: Loan Application Processing

Environment Variables

Supported Models

IBM Watsonx AI

Ollama (Local)

API Reference

VisionLLM

FraudDetector

CrossValidator

DecisionEngine

Testing

Contributing

License

Credits

Related Projects

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes