Proxy-based confidence scoring framework for extraction quality assessment
document-confidence
A proxy-based confidence scoring framework for evaluating extraction quality without requiring ground truth labels. This library provides a production-ready architecture for assessing document extraction results and routing them to appropriate recovery workflows.
Overview
document-confidence implements a multi-metric scoring system that evaluates extraction quality along five dimensions: text coverage, table completeness, schema fill, internal consistency, and layout density. Scoring is deterministic and requires no external APIs, LLM calls, or cloud dependencies, which makes it suitable for high-throughput document processing pipelines.
Architecture
The library follows a modular architecture with:
- Metric-based Scoring - Independent metrics for different quality dimensions
- Weighted Aggregation - Configurable weights for metric prioritization
- Deficiency Classification - Automatic detection of specific failure types
- Recommendation Engine - Routing decisions for accept/recover/human review
- Protocol-based Interfaces - Type-safe contracts for extensibility
Installation
pip install document-confidence
Optional Dependencies
# For development (quotes keep shells such as zsh from expanding the brackets)
pip install "document-confidence[dev]"
Quick Start
from document_confidence import (
    ConfidenceConfig,
    ConfidenceScorer,
    RecommendationType,
)
from document_confidence.metrics import (
    TextCoverageMetric,
    TableCompletenessMetric,
    SchemaFillMetric,
    ConsistencyMetric,
    DensityMetric,
)

# Configure confidence scoring
config = ConfidenceConfig(
    acceptance_threshold=90.0,
    human_review_threshold=80.0,
    weights={
        "text_coverage": 0.25,
        "table_completeness": 0.20,
        "schema_fill": 0.25,
        "consistency": 0.15,
        "density": 0.15,
    },
)

# Create metrics
metrics = [
    TextCoverageMetric(),
    TableCompletenessMetric(),
    SchemaFillMetric(),
    ConsistencyMetric(),
    DensityMetric(),
]

# Initialize scorer
scorer = ConfidenceScorer(config, metrics)

# Score extraction
# extracted_data, page_metadata, and parse_results come from your own extraction pipeline
report = scorer.score(
    extraction=extracted_data,
    page_metadata=page_metadata,
    parse_results=parse_results,
)

# Get recommendation
if report.recommendation == RecommendationType.ACCEPT:
    print("Extraction accepted")
elif report.recommendation == RecommendationType.HUMAN_REVIEW:
    print("Requires human review")
else:
    print("Recovery needed")
Configuration
Confidence Configuration
from document_confidence import ConfidenceConfig

config = ConfidenceConfig(
    acceptance_threshold=90.0,
    human_review_threshold=80.0,
    weights={
        "text_coverage": 0.25,
        "table_completeness": 0.20,
        "schema_fill": 0.25,
        "consistency": 0.15,
        "density": 0.15,
    },
    enable_metric_explanations=True,
    normalize_weights=True,
    strict_schema_validation=True,
    minimum_metric_score=0.50,
)
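The weights and the normalize_weights option suggest a weighted average of per-metric scores. The sketch below is an illustration only, not the library's implementation: the function name aggregate is hypothetical, and it assumes metric scores lie in 0.0 to 1.0 and that the overall score is scaled to 0 to 100 to match the thresholds above.

```python
def aggregate(scores: dict[str, float], weights: dict[str, float],
              normalize: bool = True) -> float:
    """Combine per-metric scores (0.0-1.0) into an overall score (0-100).

    Hypothetical helper: assumes a plain weighted sum, optionally with
    weights rescaled so that they add up to 1.
    """
    # Only use weights for metrics that actually produced a score.
    used = {name: weights[name] for name in scores if name in weights}
    total = sum(used.values())
    if total <= 0:
        raise ValueError("no positive weights for the supplied metrics")
    if normalize:
        used = {name: w / total for name, w in used.items()}
    return 100.0 * sum(scores[name] * w for name, w in used.items())


weights = {
    "text_coverage": 0.25,
    "table_completeness": 0.20,
    "schema_fill": 0.25,
    "consistency": 0.15,
    "density": 0.15,
}
scores = {
    "text_coverage": 0.9,
    "table_completeness": 0.8,
    "schema_fill": 1.0,
    "consistency": 0.7,
    "density": 0.6,
}
overall = aggregate(scores, weights)  # roughly 83, below acceptance_threshold=90.0
```

Normalizing means that weights such as 3 and 1 behave the same as 0.75 and 0.25, so callers do not have to make the weights sum to exactly 1.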
Metrics
Text Coverage Metric
Detects OCR failures by measuring text coverage.
from document_confidence.metrics import TextCoverageMetric
metric = TextCoverageMetric(weight=0.25)
score = metric.compute(extraction, page_metadata, parse_results)
Formula: extracted_text_length / expected_text_length
Threshold: < 0.70 → OCR_GAP deficiency
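As a rough illustration of the formula and threshold above, the following sketch computes coverage per page and lists the pages that fall below 0.70. The function names and the per-page dictionaries of character counts are assumptions made for this example; the library's actual inputs may look different.

```python
def text_coverage(extracted_text_length: int, expected_text_length: int) -> float:
    """extracted_text_length / expected_text_length, capped at 1.0."""
    if expected_text_length <= 0:
        # Assumption: a page with no expected text cannot have an OCR gap.
        return 1.0
    return min(1.0, extracted_text_length / expected_text_length)


def pages_below_threshold(extracted_by_page: dict[int, int],
                          expected_by_page: dict[int, int],
                          threshold: float = 0.70) -> list[int]:
    """Return page numbers whose coverage is below the OCR_GAP threshold."""
    return sorted(
        page
        for page, expected in expected_by_page.items()
        if text_coverage(extracted_by_page.get(page, 0), expected) < threshold
    )


extracted = {1: 900, 2: 300, 3: 0}
expected = {1: 1000, 2: 1000, 3: 0}
gap_pages = pages_below_threshold(extracted, expected)  # only page 2 is below 0.70
```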
Table Completeness Metric
Detects missing tables by measuring table completeness.
from document_confidence.metrics import TableCompletenessMetric
metric = TableCompletenessMetric(weight=0.20)
score = metric.compute(extraction, page_metadata, parse_results)
Formula: recovered_cells / expected_cells
Threshold: < 0.80 → TABLE_MISSING deficiency
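One way to read recovered_cells / expected_cells is to count non-empty cells against the cell count implied by the table's expected shape. This is a minimal sketch under that assumption; the function name and the expected_rows and expected_cols parameters are hypothetical.

```python
from typing import Optional


def table_completeness(table: list[list[Optional[str]]],
                       expected_rows: int, expected_cols: int) -> float:
    """recovered_cells / expected_cells for one table, capped at 1.0."""
    expected_cells = expected_rows * expected_cols
    if expected_cells <= 0:
        # Assumption: no expected table means nothing can be missing.
        return 1.0
    # A cell counts as recovered when it holds non-empty content.
    recovered_cells = sum(
        1 for row in table for cell in row if cell not in (None, "")
    )
    return min(1.0, recovered_cells / expected_cells)


# Three of four expected rows were recovered, and two cells are empty.
table = [["A", "1"], ["B", ""], ["C", None]]
completeness = table_completeness(table, expected_rows=4, expected_cols=2)
# 4 recovered cells out of 8 expected, which is below the 0.80 threshold
```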
Schema Fill Metric
Measures extraction completeness by checking schema fill.
from document_confidence.metrics import SchemaFillMetric
metric = SchemaFillMetric(weight=0.25)
score = metric.compute(extraction, page_metadata, parse_results)
Formula: required_fields_populated / required_fields_total
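The schema fill ratio can be sketched for a single record as follows. Which values count as "populated" is an assumption here: None and empty strings, lists, and dicts are treated as missing, while 0 and False count as real values. The function name is hypothetical.

```python
def schema_fill(record: dict, required_fields: list[str]) -> float:
    """required_fields_populated / required_fields_total for one record."""
    if not required_fields:
        # Assumption: a schema without required fields is trivially complete.
        return 1.0
    empty_values = (None, "", [], {})
    populated = sum(
        1 for name in required_fields if record.get(name) not in empty_values
    )
    return populated / len(required_fields)


record = {"upc": "012345678905", "name": "", "shelf": 3, "facings": None}
fill = schema_fill(record, ["upc", "name", "shelf", "facings"])
# "name" and "facings" are missing, so 2 of 4 required fields are populated
```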
Consistency Metric
Detects internal contradictions in extraction.
from document_confidence.metrics import ConsistencyMetric
metric = ConsistencyMetric(weight=0.15)
score = metric.compute(extraction, page_metadata, parse_results)
Checks:
- Duplicate shelf numbers
- Same UPC with different names
- Missing cross-reference mappings
- Orphan products
Formula: 1 - error_rate
Threshold: < 0.70 → CROSSREF_BROKEN deficiency
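To make 1 - error_rate concrete, here is a sketch of just one of the listed checks, "same UPC with different names". It assumes products are dicts with "upc" and "name" keys and defines the error rate as conflicting UPCs divided by distinct UPCs; how the library combines all four checks is not specified in this document.

```python
from collections import defaultdict


def upc_name_consistency(products: list[dict]) -> float:
    """Return 1 - error_rate for the 'same UPC, different names' check."""
    names_by_upc: dict[str, set[str]] = defaultdict(set)
    for product in products:
        names_by_upc[product["upc"]].add(product["name"])
    if not names_by_upc:
        # Assumption: nothing to check means nothing is inconsistent.
        return 1.0
    conflicts = sum(1 for names in names_by_upc.values() if len(names) > 1)
    return 1.0 - conflicts / len(names_by_upc)


products = [
    {"upc": "A", "name": "Cola"},
    {"upc": "A", "name": "Cola Zero"},  # conflicts with the entry above
    {"upc": "B", "name": "Chips"},
    {"upc": "C", "name": "Water"},
    {"upc": "D", "name": "Juice"},
]
consistency = upc_name_consistency(products)  # 1 conflict among 4 distinct UPCs
```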
Density Metric
Detects implausible shelf layouts by measuring density.
from document_confidence.metrics import DensityMetric
metric = DensityMetric(weight=0.15)
score = metric.compute(extraction, page_metadata, parse_results)
Checks:
- Products per shelf (too sparse or too dense)
- Facings distribution
- Section density
Threshold: < 0.60 → SPATIAL_FAILURE deficiency
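No formula is given for density, so the following is only one plausible reading of the "products per shelf" check: the share of shelves whose product count falls inside a plausible range. The bounds of 2 and 40 products per shelf and the function name are invented for illustration and are not taken from the library.

```python
from collections import Counter


def shelf_density_score(shelf_ids: list[int],
                        min_per_shelf: int = 2,
                        max_per_shelf: int = 40) -> float:
    """Fraction of shelves whose product count is neither too sparse nor too dense.

    shelf_ids holds one entry per extracted product (the shelf it sits on).
    The default bounds are hypothetical placeholders.
    """
    counts = Counter(shelf_ids)
    if not counts:
        # Assumption: a layout with no shelves at all is implausible.
        return 0.0
    plausible = sum(
        1 for n in counts.values() if min_per_shelf <= n <= max_per_shelf
    )
    return plausible / len(counts)


# Shelf 2 is too sparse (1 product) and shelf 4 is too dense (50 products).
shelf_ids = [1] * 5 + [2] * 1 + [3] * 10 + [4] * 50
density = shelf_density_score(shelf_ids)  # 2 of 4 shelves are plausible
```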
Deficiency Classification
The library automatically classifies deficiencies based on metric scores:
from document_confidence.models import DeficiencyType

deficiency_types = [
    DeficiencyType.OCR_GAP,          # Text coverage < 0.70
    DeficiencyType.TABLE_MISSING,    # Table completeness < 0.80
    DeficiencyType.SPATIAL_FAILURE,  # Density < 0.60
    DeficiencyType.CROSSREF_BROKEN,  # Consistency < 0.70
]
Each deficiency includes:
- Type
- Severity (0.0 - 1.0)
- Affected pages
- Human-readable description
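The mapping from metric scores to deficiencies can be sketched with the thresholds listed above. The Deficiency dataclass below is a stand-in for illustration, not the library's model, and the severity formula (relative shortfall below the threshold) is an assumption.

```python
from dataclasses import dataclass, field

# Thresholds as documented: metric name -> (threshold, deficiency type)
THRESHOLDS = {
    "text_coverage": (0.70, "OCR_GAP"),
    "table_completeness": (0.80, "TABLE_MISSING"),
    "density": (0.60, "SPATIAL_FAILURE"),
    "consistency": (0.70, "CROSSREF_BROKEN"),
}


@dataclass
class Deficiency:
    """Stand-in record carrying the four documented attributes."""
    type: str
    severity: float  # 0.0 - 1.0
    affected_pages: list[int] = field(default_factory=list)
    description: str = ""


def classify(scores: dict[str, float]) -> list[Deficiency]:
    """Emit one deficiency for every metric score below its threshold."""
    found = []
    for metric, (threshold, kind) in THRESHOLDS.items():
        score = scores.get(metric)
        if score is not None and score < threshold:
            # Assumption: severity grows linearly with the shortfall.
            severity = round((threshold - score) / threshold, 3)
            found.append(Deficiency(
                type=kind,
                severity=severity,
                description=f"{metric} {score:.2f} is below {threshold:.2f}",
            ))
    return found


deficiencies = classify({
    "text_coverage": 0.35,
    "table_completeness": 0.90,
    "density": 0.60,       # exactly at the threshold, so not flagged
    "consistency": 0.95,
})
```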
Confidence Bands
Overall scores are mapped to confidence bands:
95-100 → EXCELLENT
90-95 → GOOD
80-90 → FAIR
60-80 → POOR
0-60 → CRITICAL
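The listed ranges share their boundary values (90 appears in both GOOD and FAIR, for example), so the sketch below has to pick a convention. It assumes each band includes its lower bound, which agrees with the score >= 90 rule used for ACCEPT; the function name is hypothetical.

```python
def confidence_band(score: float) -> str:
    """Map an overall score (0-100) to a band, lower bounds inclusive."""
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"score out of range: {score}")
    # Checked from the highest band down; the first match wins.
    bands = [
        (95.0, "EXCELLENT"),
        (90.0, "GOOD"),
        (80.0, "FAIR"),
        (60.0, "POOR"),
    ]
    for lower_bound, name in bands:
        if score >= lower_bound:
            return name
    return "CRITICAL"


band = confidence_band(83.0)  # "FAIR"
```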
Recommendations
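Recommendations are based on the overall score and the detected deficiencies. With acceptance_threshold=90.0 and human_review_threshold=80.0, the score maps as follows: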
score >= 90 → ACCEPT
80 <= score < 90 → HUMAN_REVIEW
score < 80 → RECOVER
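The score-based part of this routing can be sketched with configurable thresholds. This illustration ignores deficiencies, since the document does not say how they influence the decision, and it returns plain strings rather than the library's RecommendationType.

```python
def recommend(score: float,
              acceptance_threshold: float = 90.0,
              human_review_threshold: float = 80.0) -> str:
    """Route an overall score to ACCEPT, HUMAN_REVIEW, or RECOVER."""
    if human_review_threshold > acceptance_threshold:
        raise ValueError("human_review_threshold must not exceed acceptance_threshold")
    if score >= acceptance_threshold:
        return "ACCEPT"
    if score >= human_review_threshold:
        return "HUMAN_REVIEW"
    return "RECOVER"


decision = recommend(83.0)  # "HUMAN_REVIEW"
```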
Custom Metrics
Create custom metrics by extending BaseConfidenceMetric:
from document_confidence.metrics import BaseConfidenceMetric

class CustomMetric(BaseConfidenceMetric):
    def __init__(self, weight: float = 0.10):
        super().__init__(name="custom", weight=weight)

    def _compute(self, extraction, page_metadata, parse_results):
        # Custom computation logic
        return 0.9  # Return score between 0.0 and 1.0
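Subclasses override _compute while callers use compute, which suggests a template-method pattern. The self-contained sketch below imitates that pattern with a stand-in base class so it can run without the library installed. SketchMetric is not BaseConfidenceMetric; the clamping in compute and the shape of the extraction dict are assumptions.

```python
from abc import ABC, abstractmethod


class SketchMetric(ABC):
    """Stand-in base class for illustration only."""

    def __init__(self, name: str, weight: float) -> None:
        self.name = name
        self.weight = weight

    def compute(self, extraction, page_metadata, parse_results) -> float:
        # Assumption: the public method clamps the raw score to 0.0-1.0.
        raw = self._compute(extraction, page_metadata, parse_results)
        return max(0.0, min(1.0, float(raw)))

    @abstractmethod
    def _compute(self, extraction, page_metadata, parse_results) -> float:
        """Return the raw score for one extraction."""


class NamedProductsMetric(SketchMetric):
    """Hypothetical custom metric: share of products with a non-empty name."""

    def __init__(self, weight: float = 0.10) -> None:
        super().__init__(name="named_products", weight=weight)

    def _compute(self, extraction, page_metadata, parse_results) -> float:
        products = extraction.get("products", [])
        if not products:
            return 0.0
        named = sum(1 for product in products if product.get("name"))
        return named / len(products)


metric = NamedProductsMetric()
score = metric.compute({"products": [{"name": "Cola"}, {"name": ""}]}, None, None)
```

With the real library, the same _compute body would go into a subclass of BaseConfidenceMetric as shown in the example above.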
Development
Running Tests
# Install development dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Run with coverage
pytest --cov=document_confidence
Code Style
# Format code
black document_confidence
# Lint code
ruff check document_confidence
# Type check
mypy document_confidence
Design Principles
- Deterministic - No external APIs or LLM calls
- Computationally Lightweight - O(n) complexity where possible
- Extensible - Plugin architecture for custom metrics
- Type-safe - Full type hints with Pydantic validation
- Production-ready - Enterprise-scale performance
Dependencies
- document-core>=0.1.0 - Shared interfaces and models
- pydantic>=2.0 - Data validation
- typing_extensions>=4.0 - Type extensions
Performance
The library is designed for:
- 1000+ page documents
- O(n) metric calculations
- Minimal memory usage
- No repeated traversals
License
MIT
Support
For issues, questions, or contributions, please visit the project repository.