Proxy-based confidence scoring framework for extraction quality assessment

document-confidence

A proxy-based confidence scoring framework for evaluating extraction quality without requiring ground truth labels. This library provides a production-ready architecture for assessing document extraction results and routing them to appropriate recovery workflows.

Overview

document-confidence scores extraction quality with a set of independent built-in metrics (text coverage, table completeness, schema fill, consistency, and density) and combines them into a weighted overall score. Scoring is deterministic and needs no external APIs, LLM calls, or cloud services, which makes it suitable for high-throughput document processing pipelines.

Architecture

The library follows a modular architecture with:

  • Metric-based Scoring - Independent metrics for different quality dimensions
  • Weighted Aggregation - Configurable weights for metric prioritization
  • Deficiency Classification - Automatic detection of specific failure types
  • Recommendation Engine - Routing decisions for accept/recover/human review
  • Protocol-based Interfaces - Type-safe contracts for extensibility

Installation

pip install document-confidence

Optional Dependencies

# For development (the quotes keep shells such as zsh from expanding the brackets)
pip install "document-confidence[dev]"

Quick Start

from document_confidence import (
    ConfidenceConfig,
    ConfidenceScorer,
    RecommendationType,
)
from document_confidence.metrics import (
    TextCoverageMetric,
    TableCompletenessMetric,
    SchemaFillMetric,
    ConsistencyMetric,
    DensityMetric,
)

# Configure confidence scoring
config = ConfidenceConfig(
    acceptance_threshold=90.0,
    human_review_threshold=80.0,
    weights={
        "text_coverage": 0.25,
        "table_completeness": 0.20,
        "schema_fill": 0.25,
        "consistency": 0.15,
        "density": 0.15,
    },
)

# Create metrics
metrics = [
    TextCoverageMetric(),
    TableCompletenessMetric(),
    SchemaFillMetric(),
    ConsistencyMetric(),
    DensityMetric(),
]

# Initialize scorer
scorer = ConfidenceScorer(config, metrics)

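# Score extraction. extracted_data, page_metadata, and parse_results are
# placeholders for the outputs of your own extraction pipeline.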
report = scorer.score(
    extraction=extracted_data,
    page_metadata=page_metadata,
    parse_results=parse_results,
)

# Get recommendation
if report.recommendation == RecommendationType.ACCEPT:
    print("Extraction accepted")
elif report.recommendation == RecommendationType.HUMAN_REVIEW:
    print("Requires human review")
else:
    print("Recovery needed")

Configuration

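Confidence Configuration

acceptance_threshold and human_review_threshold are on the 0-100 overall-score scale; individual metric scores range from 0.0 to 1.0.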

from document_confidence import ConfidenceConfig

config = ConfidenceConfig(
    acceptance_threshold=90.0,
    human_review_threshold=80.0,
    weights={
        "text_coverage": 0.25,
        "table_completeness": 0.20,
        "schema_fill": 0.25,
        "consistency": 0.15,
        "density": 0.15,
    },
    enable_metric_explanations=True,
    normalize_weights=True,
    strict_schema_validation=True,
    minimum_metric_score=0.50,
)
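To illustrate how weighted aggregation with normalize_weights could behave, here is a minimal sketch that combines per-metric scores (0.0 to 1.0) into an overall score (0 to 100). The function name aggregate_scores, the handling of unknown metrics, and the scaling to 0-100 are assumptions made for illustration; the library's own aggregation may differ.

```python
def aggregate_scores(
    scores: dict[str, float],
    weights: dict[str, float],
    normalize: bool = True,
) -> float:
    """Combine per-metric scores (0.0-1.0) into an overall score (0-100).

    Hypothetical helper, not part of document-confidence.
    """
    # Only metrics that have both a score and a weight take part.
    used = {name: weights[name] for name in scores if name in weights}
    total = sum(used.values())
    if total <= 0:
        raise ValueError("no positive weights for the supplied metrics")
    if normalize:
        # Rescale so the weights in use sum to 1.0.
        used = {name: w / total for name, w in used.items()}
    return 100.0 * sum(scores[name] * w for name, w in used.items())


weights = {
    "text_coverage": 0.25,
    "table_completeness": 0.20,
    "schema_fill": 0.25,
    "consistency": 0.15,
    "density": 0.15,
}
scores = {
    "text_coverage": 0.95,
    "table_completeness": 0.80,
    "schema_fill": 0.90,
    "consistency": 1.00,
    "density": 0.70,
}
overall = aggregate_scores(scores, weights)  # roughly 87.75
```

With normalization on, a run that only produced some of the metrics is still scored on a 0-100 scale, because the remaining weights are rescaled to sum to 1.0.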

Metrics

Text Coverage Metric

Detects OCR failures by comparing the amount of extracted text with the amount expected.

from document_confidence.metrics import TextCoverageMetric

metric = TextCoverageMetric(weight=0.25)
score = metric.compute(extraction, page_metadata, parse_results)

Formula: extracted_text_length / expected_text_length

Threshold: < 0.70 → OCR_GAP deficiency
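The formula and threshold can be worked through with a small standalone sketch. The function text_coverage, the clamp to 1.0, and the treatment of a zero expected length are assumptions for illustration, not the library's implementation.

```python
OCR_GAP_THRESHOLD = 0.70  # threshold stated above


def text_coverage(extracted_text_length: int, expected_text_length: int) -> float:
    """Ratio of extracted to expected text length, clamped to 1.0."""
    if expected_text_length <= 0:
        # Assumption: if no text is expected, nothing can be missing.
        return 1.0
    return min(extracted_text_length / expected_text_length, 1.0)


# 1,300 characters extracted where about 2,000 were expected.
score = text_coverage(1_300, 2_000)
has_ocr_gap = score < OCR_GAP_THRESHOLD  # coverage of 0.65 falls below 0.70
```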

Table Completeness Metric

Detects missing tables by comparing the number of recovered cells with the number expected.

from document_confidence.metrics import TableCompletenessMetric

metric = TableCompletenessMetric(weight=0.20)
score = metric.compute(extraction, page_metadata, parse_results)

Formula: recovered_cells / expected_cells

Threshold: < 0.80 → TABLE_MISSING deficiency
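As a sketch of the cell-count ratio, the example below treats a table as a list of rows and counts non-empty cells as recovered. The function table_completeness and the rule that blank or None cells count as unrecovered are assumptions; the library may count cells differently.

```python
from typing import Optional


def table_completeness(
    rows: list[list[Optional[str]]],
    expected_rows: int,
    expected_cols: int,
) -> float:
    """recovered_cells / expected_cells for one table, clamped to 1.0."""
    expected_cells = expected_rows * expected_cols
    if expected_cells <= 0:
        return 1.0  # assumption: no table expected, so nothing is missing
    recovered_cells = sum(
        1 for row in rows for cell in row if cell is not None and cell.strip()
    )
    return min(recovered_cells / expected_cells, 1.0)


# A 3x3 table in which two cells came back empty: 7 of 9 cells recovered.
table = [
    ["Shelf", "Product", "Facings"],
    ["1", "", None],
    ["2", "Chips", "4"],
]
score = table_completeness(table, expected_rows=3, expected_cols=3)
table_missing = score < 0.80  # 7/9 is below the 0.80 threshold
```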

Schema Fill Metric

Measures extraction completeness as the share of required schema fields that are populated.

from document_confidence.metrics import SchemaFillMetric

metric = SchemaFillMetric(weight=0.25)
score = metric.compute(extraction, page_metadata, parse_results)

Formula: required_fields_populated / required_fields_total
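A minimal sketch of the fill ratio for a single record follows. The function schema_fill, the field names, and the definition of "populated" (anything other than None, an empty string, or an empty container) are assumptions made for illustration.

```python
def schema_fill(record: dict, required_fields: list[str]) -> float:
    """required_fields_populated / required_fields_total for one record."""
    if not required_fields:
        return 1.0  # assumption: an empty schema is trivially filled
    empty_values = (None, "", [], {})
    populated = sum(
        1 for name in required_fields if record.get(name) not in empty_values
    )
    return populated / len(required_fields)


# Hypothetical record: two of the four required fields carry a value.
record = {"upc": "000000000000", "name": "", "shelf": 3, "facings": None}
score = schema_fill(record, ["upc", "name", "shelf", "facings"])  # 2 of 4 populated
```

Note that a value of 0 counts as populated under this definition, which matters for numeric fields such as a shelf index.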

Consistency Metric

Detects internal contradictions in extraction.

from document_confidence.metrics import ConsistencyMetric

metric = ConsistencyMetric(weight=0.15)
score = metric.compute(extraction, page_metadata, parse_results)

Checks:

  • Duplicate shelf numbers
  • Same UPC with different names
  • Missing cross-reference mappings
  • Orphan products

Formula: 1 - error_rate

Threshold: < 0.70 → CROSSREF_BROKEN deficiency
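The sketch below shows how 1 - error_rate could be computed for two of the checks listed above (duplicate shelf numbers and one UPC carrying different names). The function consistency_score, the input shapes, and the definition of error_rate as violations divided by distinct items checked are assumptions; the source does not define error_rate.

```python
from collections import Counter, defaultdict


def consistency_score(shelf_numbers: list[int], products: list[dict]) -> float:
    """1 - error_rate over two illustrative consistency checks."""
    # Check 1: shelf numbers that appear more than once.
    shelf_counts = Counter(shelf_numbers)
    duplicate_shelves = sum(1 for n in shelf_counts.values() if n > 1)

    # Check 2: UPCs that map to more than one distinct product name.
    names_by_upc: dict[str, set[str]] = defaultdict(set)
    for product in products:
        names_by_upc[product["upc"]].add(product["name"])
    conflicting_upcs = sum(1 for names in names_by_upc.values() if len(names) > 1)

    checked = len(shelf_counts) + len(names_by_upc)
    if checked == 0:
        return 1.0  # nothing to contradict
    return 1.0 - (duplicate_shelves + conflicting_upcs) / checked


shelves = [1, 2, 2, 3]  # shelf 2 is duplicated
products = [
    {"upc": "A", "name": "Cola"},
    {"upc": "A", "name": "Cola Zero"},  # same UPC, different name
    {"upc": "B", "name": "Chips"},
]
score = consistency_score(shelves, products)  # 2 violations over 5 items checked
crossref_broken = score < 0.70
```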

Density Metric

Detects implausible shelf layouts by measuring density.

from document_confidence.metrics import DensityMetric

metric = DensityMetric(weight=0.15)
score = metric.compute(extraction, page_metadata, parse_results)

Checks:

  • Products per shelf (too sparse or too dense)
  • Facings distribution
  • Section density

Threshold: < 0.60 → SPATIAL_FAILURE deficiency
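The source gives no formula for density, so the following is only one plausible reading of the "products per shelf" check: the share of shelves whose product count falls inside a plausible range. The function density_score and the bounds min_products and max_products are hypothetical.

```python
def density_score(
    products_per_shelf: list[int],
    min_products: int = 2,
    max_products: int = 40,
) -> float:
    """Share of shelves that are neither too sparse nor too dense."""
    if not products_per_shelf:
        return 0.0  # assumption: no shelves at all is itself implausible
    plausible = sum(
        1 for count in products_per_shelf if min_products <= count <= max_products
    )
    return plausible / len(products_per_shelf)


# One empty shelf and one shelf with 95 products out of four shelves.
score = density_score([12, 0, 15, 95])  # 2 of 4 shelves are plausible
spatial_failure = score < 0.60
```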

Deficiency Classification

The library automatically classifies deficiencies based on metric scores:

from document_confidence.models import DeficiencyType

deficiency_types = [
    DeficiencyType.OCR_GAP,        # Text coverage < 0.70
    DeficiencyType.TABLE_MISSING,   # Table completeness < 0.80
    DeficiencyType.SPATIAL_FAILURE, # Density < 0.60
    DeficiencyType.CROSSREF_BROKEN, # Consistency < 0.70
]

Each deficiency includes:

  • Type
  • Severity (0.0 - 1.0)
  • Affected pages
  • Human-readable description
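The threshold rules and the four fields above can be sketched as a small standalone classifier. DeficiencyKind, Deficiency, RULES, and classify are hypothetical stand-ins rather than the library's models, and the severity formula (relative shortfall below the threshold) is an assumption, since the source only states that severity lies between 0.0 and 1.0.

```python
from dataclasses import dataclass
from enum import Enum


class DeficiencyKind(Enum):
    """Local stand-in for the library's DeficiencyType."""
    OCR_GAP = "ocr_gap"
    TABLE_MISSING = "table_missing"
    SPATIAL_FAILURE = "spatial_failure"
    CROSSREF_BROKEN = "crossref_broken"


# metric name -> (threshold, deficiency), using the thresholds listed above
RULES = {
    "text_coverage": (0.70, DeficiencyKind.OCR_GAP),
    "table_completeness": (0.80, DeficiencyKind.TABLE_MISSING),
    "density": (0.60, DeficiencyKind.SPATIAL_FAILURE),
    "consistency": (0.70, DeficiencyKind.CROSSREF_BROKEN),
}


@dataclass(frozen=True)
class Deficiency:
    kind: DeficiencyKind
    severity: float  # 0.0 - 1.0
    affected_pages: tuple[int, ...]
    description: str


def classify(scores: dict[str, float], pages: tuple[int, ...] = ()) -> list[Deficiency]:
    """Return one Deficiency per metric that scored below its threshold."""
    found = []
    for metric, (threshold, kind) in RULES.items():
        score = scores.get(metric)
        if score is None or score >= threshold:
            continue
        # Assumed severity: how far below the threshold the score fell.
        severity = min(max((threshold - score) / threshold, 0.0), 1.0)
        description = f"{metric} scored {score:.2f}, below {threshold:.2f}"
        found.append(Deficiency(kind, severity, pages, description))
    return found


deficiencies = classify({"text_coverage": 0.35, "table_completeness": 0.90}, pages=(3, 4))
```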

Confidence Bands

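Overall scores are mapped to confidence bands. The boundaries below are read as lower-inclusive, matching the recommendation thresholds (a score of exactly 90 is GOOD):

95 <= score <= 100   EXCELLENT
90 <= score < 95     GOOD
80 <= score < 90     FAIR
60 <= score < 80     POOR
0 <= score < 60      CRITICAL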
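A lookup for these bands might look like the sketch below. The function confidence_band is hypothetical, and it assumes each band includes its lower bound, so a score of exactly 90 is GOOD and exactly 95 is EXCELLENT; the source does not say which side the shared boundaries belong to.

```python
# (lower bound, band), checked from highest to lowest
BANDS = [
    (95.0, "EXCELLENT"),
    (90.0, "GOOD"),
    (80.0, "FAIR"),
    (60.0, "POOR"),
    (0.0, "CRITICAL"),
]


def confidence_band(score: float) -> str:
    """Map an overall score (0-100) to a band name."""
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"score must be within 0-100, got {score}")
    for lower_bound, band in BANDS:
        if score >= lower_bound:
            return band
    raise AssertionError("unreachable: the last band starts at 0")


band = confidence_band(87.75)  # falls in the 80-90 range
```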

Recommendations

Based on overall score and deficiencies:

score >= 90               ACCEPT
80 <= score < 90          HUMAN_REVIEW
score < 80                RECOVER
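The score-based part of this routing can be sketched as a small function. The name recommend is hypothetical, its defaults mirror the acceptance_threshold and human_review_threshold values used in this document, and it deliberately ignores deficiencies because the source does not describe how they change the decision.

```python
def recommend(
    score: float,
    acceptance_threshold: float = 90.0,
    human_review_threshold: float = 80.0,
) -> str:
    """Route an overall score (0-100) to ACCEPT, HUMAN_REVIEW, or RECOVER."""
    if human_review_threshold > acceptance_threshold:
        raise ValueError("human_review_threshold must not exceed acceptance_threshold")
    if score >= acceptance_threshold:
        return "ACCEPT"
    if score >= human_review_threshold:
        return "HUMAN_REVIEW"
    return "RECOVER"


decision = recommend(87.75)  # between the two thresholds
```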

Custom Metrics

Create custom metrics by extending BaseConfidenceMetric:

from document_confidence.metrics import BaseConfidenceMetric

class CustomMetric(BaseConfidenceMetric):
    def __init__(self, weight: float = 0.10):
        super().__init__(name="custom", weight=weight)
    
    def _compute(self, extraction, page_metadata, parse_results):
        # Custom computation logic
        return 0.9  # Return score between 0.0 and 1.0
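The example above overrides _compute while callers use compute. One common way to arrange that is the template-method pattern, sketched here without importing the library. SketchBaseMetric and NonEmptyMetric are hypothetical, and the clamping done in compute is an assumption about what a base class could enforce, not a description of BaseConfidenceMetric.

```python
from abc import ABC, abstractmethod


class SketchBaseMetric(ABC):
    """Illustrative base class: public compute() wraps the subclass hook."""

    def __init__(self, name: str, weight: float):
        if weight < 0:
            raise ValueError("weight must be non-negative")
        self.name = name
        self.weight = weight

    def compute(self, extraction, page_metadata, parse_results) -> float:
        # Call the subclass hook, then clamp the result into 0.0-1.0.
        raw = self._compute(extraction, page_metadata, parse_results)
        return min(max(float(raw), 0.0), 1.0)

    @abstractmethod
    def _compute(self, extraction, page_metadata, parse_results) -> float:
        """Return a raw score; subclasses implement this."""


class NonEmptyMetric(SketchBaseMetric):
    """Toy metric: 1.0 if anything was extracted, otherwise 0.0."""

    def __init__(self, weight: float = 0.10):
        super().__init__(name="non_empty", weight=weight)

    def _compute(self, extraction, page_metadata, parse_results) -> float:
        return 1.0 if extraction else 0.0


metric = NonEmptyMetric()
score = metric.compute({"products": ["Cola"]}, page_metadata=None, parse_results=None)
```

Keeping validation in the public method means a custom metric that overshoots the range cannot push the aggregate outside its expected bounds.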

Development

Running Tests

# Install development dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run with coverage
pytest --cov=document_confidence

Code Style

# Format code
black document_confidence

# Lint code
ruff check document_confidence

# Type check
mypy document_confidence

Design Principles

  1. Deterministic - No external APIs or LLM calls
  2. Computationally Lightweight - O(n) complexity where possible
  3. Extensible - Plugin architecture for custom metrics
  4. Type-safe - Full type hints with Pydantic validation
  5. Production-ready - Enterprise-scale performance

Dependencies

  • document-core>=0.1.0 - Shared interfaces and models
  • pydantic>=2.0 - Data validation
  • typing_extensions>=4.0 - Type extensions

Performance

The library is designed for:

  • 1000+ page documents
  • O(n) metric calculations
  • Minimal memory usage
  • No repeated traversals

License

MIT

Support

For issues, questions, or contributions, please visit the project repository.
