Proxy-based confidence scoring framework for extraction quality assessment

document-confidence

A proxy-based confidence scoring framework for evaluating extraction quality without requiring ground truth labels. This library provides a production-ready architecture for assessing document extraction results and routing them to appropriate recovery workflows.

Overview

document-confidence implements a multi-metric scoring system that evaluates extraction quality along five built-in dimensions: text coverage, table completeness, schema fill, consistency, and density. The library operates deterministically and needs no external APIs, LLM calls, or cloud dependencies, which makes it suitable for high-throughput document processing pipelines.

Architecture

The library follows a modular architecture with:

  • Metric-based Scoring - Independent metrics for different quality dimensions
  • Weighted Aggregation - Configurable weights for metric prioritization
  • Deficiency Classification - Automatic detection of specific failure types
  • Recommendation Engine - Routing decisions for accept/recover/human review
  • Protocol-based Interfaces - Type-safe contracts for extensibility
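As an illustration of what a protocol-based interface can look like, here is a minimal sketch built on the standard library's typing.Protocol. The names ConfidenceMetricLike and AlwaysHalf are hypothetical and are not taken from the library; the real contracts may differ.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class ConfidenceMetricLike(Protocol):
    """Hypothetical structural contract; not the library's actual protocol."""

    name: str
    weight: float

    def compute(self, extraction: Any, page_metadata: Any, parse_results: Any) -> float:
        ...


class AlwaysHalf:
    """Toy metric that satisfies the contract without inheriting from it."""

    name = "always_half"
    weight = 0.10

    def compute(self, extraction: Any, page_metadata: Any, parse_results: Any) -> float:
        return 0.5


metric = AlwaysHalf()
# isinstance only checks that the attributes and methods exist, not their types.
is_metric = isinstance(metric, ConfidenceMetricLike)
```

A structural contract like this lets third-party metrics plug in without importing a base class, while a static type checker can still verify the signatures.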

Installation

pip install document-confidence

Optional Dependencies

# For development (quotes keep shells such as zsh from expanding the brackets)
pip install "document-confidence[dev]"

Quick Start

from document_confidence import (
    ConfidenceConfig,
    ConfidenceScorer,
    RecommendationType,
)
from document_confidence.metrics import (
    TextCoverageMetric,
    TableCompletenessMetric,
    SchemaFillMetric,
    ConsistencyMetric,
    DensityMetric,
)

# Configure confidence scoring
config = ConfidenceConfig(
    acceptance_threshold=90.0,
    human_review_threshold=80.0,
    weights={
        "text_coverage": 0.25,
        "table_completeness": 0.20,
        "schema_fill": 0.25,
        "consistency": 0.15,
        "density": 0.15,
    },
)

# Create metrics
metrics = [
    TextCoverageMetric(),
    TableCompletenessMetric(),
    SchemaFillMetric(),
    ConsistencyMetric(),
    DensityMetric(),
]

# Initialize scorer
scorer = ConfidenceScorer(config, metrics)

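# Score extraction. extracted_data, page_metadata, and parse_results are
# the outputs of your own extraction pipeline; they are not defined here.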
report = scorer.score(
    extraction=extracted_data,
    page_metadata=page_metadata,
    parse_results=parse_results,
)

# Get recommendation
if report.recommendation == RecommendationType.ACCEPT:
    print("Extraction accepted")
elif report.recommendation == RecommendationType.HUMAN_REVIEW:
    print("Requires human review")
else:
    print("Recovery needed")

Configuration

Confidence Configuration

from document_confidence import ConfidenceConfig

config = ConfidenceConfig(
    acceptance_threshold=90.0,
    human_review_threshold=80.0,
    weights={
        "text_coverage": 0.25,
        "table_completeness": 0.20,
        "schema_fill": 0.25,
        "consistency": 0.15,
        "density": 0.15,
    },
    enable_metric_explanations=True,
    normalize_weights=True,
    strict_schema_validation=True,
    minimum_metric_score=0.50,
)
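The weights and thresholds above suggest a weighted average of per-metric scores (0.0 to 1.0) scaled to 0-100. The sketch below shows one plausible way to do that, including weight normalization. The functions normalize and overall_score are hypothetical, and the library's actual aggregation may differ.

```python
def normalize(weights: dict[str, float]) -> dict[str, float]:
    """Rescale weights so they sum to 1.0 (assumed meaning of normalize_weights)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {name: w / total for name, w in weights.items()}


def overall_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of metric scores, scaled to the 0-100 range."""
    norm = normalize(weights)
    return 100.0 * sum(norm[name] * scores[name] for name in norm)


weights = {
    "text_coverage": 0.25,
    "table_completeness": 0.20,
    "schema_fill": 0.25,
    "consistency": 0.15,
    "density": 0.15,
}
# Illustrative metric scores, not real measurements.
scores = {
    "text_coverage": 0.95,
    "table_completeness": 0.80,
    "schema_fill": 0.90,
    "consistency": 1.00,
    "density": 0.70,
}
overall = overall_score(scores, weights)  # about 87.75
```

With the thresholds configured above, a score of about 87.75 would fall between human_review_threshold and acceptance_threshold.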

Metrics

Text Coverage Metric

Detects OCR failures by measuring text coverage.

from document_confidence.metrics import TextCoverageMetric

metric = TextCoverageMetric(weight=0.25)
score = metric.compute(extraction, page_metadata, parse_results)

Formula: extracted_text_length / expected_text_length

Threshold: < 0.70 → OCR_GAP deficiency
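A minimal standalone sketch of the formula and threshold above. The function name text_coverage, the clamp to 1.0, and the handling of a zero expected length are assumptions made for illustration, not the library's documented behavior.

```python
OCR_GAP_THRESHOLD = 0.70  # threshold taken from the text above


def text_coverage(extracted_text_length: int, expected_text_length: int) -> float:
    """Ratio of extracted to expected text length, clamped to [0.0, 1.0]."""
    if expected_text_length <= 0:
        # Assumption: nothing was expected, so nothing can be missing.
        return 1.0
    return min(max(extracted_text_length, 0) / expected_text_length, 1.0)


score = text_coverage(extracted_text_length=1300, expected_text_length=2000)
has_ocr_gap = score < OCR_GAP_THRESHOLD
```

Here 1300 of 2000 expected characters gives 0.65, which is below 0.70 and would be flagged as OCR_GAP.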

Table Completeness Metric

Detects missing tables by measuring table completeness.

from document_confidence.metrics import TableCompletenessMetric

metric = TableCompletenessMetric(weight=0.20)
score = metric.compute(extraction, page_metadata, parse_results)

Formula: recovered_cells / expected_cells

Threshold: < 0.80 → TABLE_MISSING deficiency
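The ratio above can be aggregated across several tables. The sketch below assumes each table is described by a (recovered_cells, expected_cells) pair; that input shape and the function name table_completeness are hypothetical.

```python
TABLE_MISSING_THRESHOLD = 0.80  # threshold taken from the text above


def table_completeness(tables: list[tuple[int, int]]) -> float:
    """Recovered cells over expected cells, summed over all tables."""
    expected = sum(exp for _, exp in tables)
    if expected == 0:
        # Assumption: a document with no expected tables is fully complete.
        return 1.0
    # Cap each table at its expected size so surplus cells in one table
    # cannot mask missing cells in another.
    recovered = sum(min(rec, exp) for rec, exp in tables)
    return recovered / expected


# One complete table, one partial table, one table missed entirely.
tables = [(12, 12), (6, 20), (0, 8)]
score = table_completeness(tables)
table_missing = score < TABLE_MISSING_THRESHOLD
```

In this example 18 of 40 expected cells were recovered, so the score is 0.45 and the TABLE_MISSING threshold is crossed.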

Schema Fill Metric

Measures extraction completeness by checking schema fill.

from document_confidence.metrics import SchemaFillMetric

metric = SchemaFillMetric(weight=0.25)
score = metric.compute(extraction, page_metadata, parse_results)

Formula: required_fields_populated / required_fields_total
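To make the formula concrete, the sketch below counts populated required fields in a plain dictionary. What counts as "populated" is an assumption here (None, blank strings, and empty containers do not count, while 0 does), and the names is_populated and schema_fill are hypothetical.

```python
from typing import Any


def is_populated(value: Any) -> bool:
    """Assumed definition of a populated field."""
    if value is None:
        return False
    if isinstance(value, str):
        return bool(value.strip())
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) > 0
    return True  # numbers (including 0) and booleans count as populated


def schema_fill(record: dict[str, Any], required_fields: list[str]) -> float:
    """required_fields_populated / required_fields_total."""
    if not required_fields:
        return 1.0  # assumption: no required fields means nothing is missing
    populated = sum(1 for name in required_fields if is_populated(record.get(name)))
    return populated / len(required_fields)


record = {"upc": "012000001291", "name": "   ", "shelf": 0, "facings": None}
required = ["upc", "name", "shelf", "facings"]
score = schema_fill(record, required)
```

Two of the four required fields are populated ("upc" and "shelf"), so the score is 0.5.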

Consistency Metric

Detects internal contradictions in extraction.

from document_confidence.metrics import ConsistencyMetric

metric = ConsistencyMetric(weight=0.15)
score = metric.compute(extraction, page_metadata, parse_results)

Checks:

  • Duplicate shelf numbers
  • Same UPC with different names
  • Missing cross-reference mappings
  • Orphan products

Formula: 1 - error_rate

Threshold: < 0.70 → CROSSREF_BROKEN deficiency
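The sketch below shows how three of the checks above could feed the 1 - error_rate formula. How the library counts errors and checks is not stated, so the counting scheme, the input shapes, and the function name consistency_score are all assumptions.

```python
from collections import Counter, defaultdict

CROSSREF_BROKEN_THRESHOLD = 0.70  # threshold taken from the text above


def consistency_score(shelf_numbers: list[int], products: list[dict]) -> float:
    """1 - error_rate, where error_rate = errors / checks (assumed definition)."""
    errors = 0
    checks = 0

    # Duplicate shelf numbers: every repeat beyond the first is one error.
    checks += len(shelf_numbers)
    errors += sum(n - 1 for n in Counter(shelf_numbers).values() if n > 1)

    # Same UPC with different names: each conflicting UPC is one error.
    names_by_upc: dict[str, set[str]] = defaultdict(set)
    for product in products:
        names_by_upc[product["upc"]].add(product["name"])
    checks += len(names_by_upc)
    errors += sum(1 for names in names_by_upc.values() if len(names) > 1)

    # Orphan products: the product points at a shelf that was never extracted.
    known_shelves = set(shelf_numbers)
    checks += len(products)
    errors += sum(1 for product in products if product["shelf"] not in known_shelves)

    return 1.0 if checks == 0 else 1.0 - errors / checks


shelf_numbers = [1, 2, 2, 3]
products = [
    {"upc": "A", "name": "Cola", "shelf": 1},
    {"upc": "A", "name": "Cola Zero", "shelf": 2},
    {"upc": "B", "name": "Chips", "shelf": 9},
]
score = consistency_score(shelf_numbers, products)
crossref_broken = score < CROSSREF_BROKEN_THRESHOLD
```

The example has one duplicate shelf, one UPC conflict, and one orphan product across nine checks, giving roughly 0.67, which is below the 0.70 threshold.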

Density Metric

Detects implausible shelf layouts by measuring density.

from document_confidence.metrics import DensityMetric

metric = DensityMetric(weight=0.15)
score = metric.compute(extraction, page_metadata, parse_results)

Checks:

  • Products per shelf (too sparse or too dense)
  • Facings distribution
  • Section density

Threshold: < 0.60 → SPATIAL_FAILURE deficiency
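No formula is given for this metric. One way to express the products-per-shelf check is the share of shelves whose product count falls inside a plausible range, as sketched below. The bounds min_per_shelf and max_per_shelf, the input shapes, and the function name density_score are hypothetical.

```python
from collections import Counter

SPATIAL_FAILURE_THRESHOLD = 0.60  # threshold taken from the text above


def density_score(
    shelf_numbers: list[int],
    product_shelves: list[int],
    min_per_shelf: int = 1,
    max_per_shelf: int = 30,
) -> float:
    """Fraction of shelves with a plausible number of products."""
    if not shelf_numbers:
        # Assumption: a layout with no shelves at all is treated as implausible.
        return 0.0
    per_shelf = Counter(product_shelves)  # missing shelves count as 0 products
    plausible = sum(
        1 for shelf in shelf_numbers
        if min_per_shelf <= per_shelf[shelf] <= max_per_shelf
    )
    return plausible / len(shelf_numbers)


shelf_numbers = [1, 2, 3, 4]
# Shelf 3 is too dense (50 products) and shelf 4 is empty.
product_shelves = [1] * 5 + [2] * 4 + [3] * 50
score = density_score(shelf_numbers, product_shelves)
spatial_failure = score < SPATIAL_FAILURE_THRESHOLD
```

Two of the four shelves are plausible, so the score is 0.5 and the SPATIAL_FAILURE threshold is crossed.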

Deficiency Classification

The library automatically classifies deficiencies based on metric scores:

from document_confidence.models import DeficiencyType

deficiency_types = [
    DeficiencyType.OCR_GAP,        # Text coverage < 0.70
    DeficiencyType.TABLE_MISSING,   # Table completeness < 0.80
    DeficiencyType.SPATIAL_FAILURE, # Density < 0.60
    DeficiencyType.CROSSREF_BROKEN, # Consistency < 0.70
]

Each deficiency includes:

  • Type
  • Severity (0.0 - 1.0)
  • Affected pages
  • Human-readable description
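The following sketch shows how threshold-based classification could produce records with the four fields listed above. It uses stand-in types (DeficiencyKind, Deficiency) rather than the library's own models, and the severity formula, the relative shortfall below the threshold, is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class DeficiencyKind(Enum):  # hypothetical stand-in for DeficiencyType
    OCR_GAP = "ocr_gap"
    TABLE_MISSING = "table_missing"
    SPATIAL_FAILURE = "spatial_failure"
    CROSSREF_BROKEN = "crossref_broken"


# Metric name -> (threshold, deficiency raised when the score is below it).
THRESHOLDS = {
    "text_coverage": (0.70, DeficiencyKind.OCR_GAP),
    "table_completeness": (0.80, DeficiencyKind.TABLE_MISSING),
    "density": (0.60, DeficiencyKind.SPATIAL_FAILURE),
    "consistency": (0.70, DeficiencyKind.CROSSREF_BROKEN),
}


@dataclass
class Deficiency:
    kind: DeficiencyKind
    severity: float            # 0.0 - 1.0
    affected_pages: list[int]
    description: str


def classify(scores: dict[str, float], pages: dict[str, list[int]]) -> list[Deficiency]:
    found = []
    for metric, (threshold, kind) in THRESHOLDS.items():
        score = scores.get(metric)
        if score is None or score >= threshold:
            continue
        # Assumed severity: how far below the threshold the score fell.
        severity = round((threshold - score) / threshold, 3)
        text = f"{metric} scored {score:.2f}, below the {threshold:.2f} threshold"
        found.append(Deficiency(kind, severity, pages.get(metric, []), text))
    return found


scores = {"text_coverage": 0.35, "table_completeness": 0.90,
          "density": 0.60, "consistency": 0.95}
deficiencies = classify(scores, pages={"text_coverage": [3, 4]})
```

Only text coverage is below its threshold here; a density score of exactly 0.60 is not flagged because the comparison is strict.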

Confidence Bands

Overall scores are mapped to confidence bands:

95-100   EXCELLENT
90-95    GOOD
80-90    FAIR
60-80    POOR
0-60     CRITICAL
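The ranges above share their endpoints, so the band for a score of exactly 95, 90, 80, or 60 depends on which side is inclusive. The sketch below assumes the lower bound is inclusive, in line with the score >= 90 rule used for recommendations. The function name confidence_band is hypothetical.

```python
# (lower bound, band), checked from highest to lowest.
BANDS = [
    (95.0, "EXCELLENT"),
    (90.0, "GOOD"),
    (80.0, "FAIR"),
    (60.0, "POOR"),
    (0.0, "CRITICAL"),
]


def confidence_band(score: float) -> str:
    """Map a 0-100 overall score to a band, lower bound inclusive (assumed)."""
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"score must be within 0-100, got {score}")
    for lower, band in BANDS:
        if score >= lower:
            return band
    raise AssertionError("unreachable: the last band starts at 0.0")


band = confidence_band(87.75)
```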

Recommendations

Based on overall score and deficiencies:

score >= 90               ACCEPT
80 <= score < 90          HUMAN_REVIEW
score < 80                RECOVER
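The routing rules above translate directly into code. The sketch below covers only the score-based rules; how deficiencies influence the recommendation is not specified in this document, so that part is left out. The Recommendation enum and the recommend function are stand-ins, not the library's RecommendationType.

```python
from enum import Enum


class Recommendation(Enum):  # hypothetical stand-in for RecommendationType
    ACCEPT = "accept"
    HUMAN_REVIEW = "human_review"
    RECOVER = "recover"


def recommend(
    score: float,
    acceptance_threshold: float = 90.0,
    human_review_threshold: float = 80.0,
) -> Recommendation:
    """Route an overall 0-100 score using the two thresholds."""
    if human_review_threshold > acceptance_threshold:
        raise ValueError("human_review_threshold must not exceed acceptance_threshold")
    if score >= acceptance_threshold:
        return Recommendation.ACCEPT
    if score >= human_review_threshold:
        return Recommendation.HUMAN_REVIEW
    return Recommendation.RECOVER


decision = recommend(87.75)
```

Keeping the thresholds as parameters mirrors acceptance_threshold and human_review_threshold in ConfidenceConfig, so the same function works for stricter or looser pipelines.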

Custom Metrics

Create custom metrics by extending BaseConfidenceMetric:

from document_confidence.metrics import BaseConfidenceMetric

class CustomMetric(BaseConfidenceMetric):
    def __init__(self, weight: float = 0.10):
        super().__init__(name="custom", weight=weight)

    def _compute(self, extraction, page_metadata, parse_results):
        # Custom computation logic
        return 0.9  # Return score between 0.0 and 1.0

Development

Running Tests

# Install development dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run with coverage
pytest --cov=document_confidence

Code Style

# Format code
black document_confidence

# Lint code
ruff check document_confidence

# Type check
mypy document_confidence

Design Principles

  1. Deterministic - No external APIs or LLM calls
  2. Computationally Lightweight - O(n) complexity where possible
  3. Extensible - Plugin architecture for custom metrics
  4. Type-safe - Full type hints with Pydantic validation
  5. Production-ready - Enterprise-scale performance

Dependencies

  • plano-core>=0.1.0 - Shared interfaces and models
  • pydantic>=2.0 - Data validation
  • typing_extensions>=4.0 - Type extensions

Performance

The library is designed for:

  • 1000+ page documents
  • O(n) metric calculations
  • Minimal memory usage
  • No repeated traversals

License

MIT

Support

For issues, questions, or contributions, please visit the project repository.
