Deterministic migration-risk analysis for OCR-based invoice extraction workloads.

Invoice Migration Analyzer

Local CLI for deterministic migration-risk analysis of OCR-based invoice extraction workloads.

What it does / what it does not do

Answers one question: "Can this workload safely tolerate a cheaper LLM under operational risk constraints?"

Not an eval harness, benchmarking suite, model quality scorer, or generic document analyzer. Validation is source-supported only — OCR text is the supporting evidence, never ground truth.

Classification ladder

A row is labeled by the worst status across the three critical fields: total_amount, currency, and invoice_date. vendor_name is warning-only.

| Label | Trigger |
| --- | --- |
| RISK | Any critical field has no source support, or the extraction is unparseable. |
| REVIEW_AMBIGUOUS | Any critical field has multiple competing source candidates. |
| REVIEW_INFERRED | Any critical field matches by tolerance/normalization, not literally. |
| SAFE | Every critical field has direct literal source support and zero ambiguity. |

SAFE invariant: both direct source support and zero unresolved ambiguity must hold on all critical fields. Any deviation demotes the row. REVIEW_INFERRED (tolerance/normalization match) never maps to SAFE — enforced by policy and by a runtime guard.

Guarantee on the shipped corpus: zero false-SAFE rows across all 200 rows.
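
The "worst status wins" rule and the SAFE guard can be sketched as follows. This is a minimal illustration, not the package's actual code: the Status enum, label_row, and the choice to treat a missing critical field as RISK are assumptions, and the severity order is simply read off the table above.

```python
from enum import IntEnum


class Status(IntEnum):
    # Severity order assumed from the ladder: higher value = worse.
    SAFE = 0
    REVIEW_INFERRED = 1
    REVIEW_AMBIGUOUS = 2
    RISK = 3


# Only these fields drive the row label; vendor_name is warning-only.
CRITICAL_FIELDS = ("total_amount", "currency", "invoice_date")


def label_row(field_status: dict[str, Status]) -> Status:
    """Return the worst status across the critical fields (hypothetical helper)."""
    # Assumption: a critical field with no recorded status counts as RISK.
    statuses = [field_status.get(name, Status.RISK) for name in CRITICAL_FIELDS]
    worst = max(statuses)

    # Runtime guard: SAFE may only be emitted when every critical field is SAFE.
    if worst == Status.SAFE and any(s != Status.SAFE for s in statuses):
        raise RuntimeError("SAFE invariant violated")
    return worst
```

With this ordering a single REVIEW_INFERRED field is enough to demote an otherwise clean row, while a non-SAFE vendor_name leaves the label unchanged.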

Installation

pip install -e ".[fuzzy]"
python corpus\generate_corpus.py --full

Usage

invoice-analyzer run ^
  --sample          PATH    (default: corpus\sample.jsonl) ^
  --keys            PATH    (default: corpus\keys.json) ^
  --output          DIR     (default: .\output) ^
  --cache           DIR     (optional, enables replay cache) ^
  --baseline-cost   FLOAT   (default: 0.015) ^
  --candidate-cost  FLOAT   (default: 0.002) ^
  --volume          INT     (default: 50000) ^
  --max-rows        INT     (default: 1000, hard cap: 1000) ^
  --no-detail ^
  --baseline-model  STR     (default: baseline) ^
  --candidate-model STR     (default: candidate)

invoice-analyzer version

Output files

  • output\report.md — human-readable report with decision summary, label table, conservative vs optimistic cost projection, and per-row evidence.
  • output\report.html — same content rendered as a single self-contained HTML page.
  • output\raw_results.jsonl — one JSON object per row with label, per-field statuses, evidence strings, reasons, cache flag, and any error.
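
Because raw_results.jsonl holds one JSON object per line, it is easy to post-process. The sketch below tallies rows per label; the key name "label" is an assumption about the schema (the description above only says each object carries a label), and count_labels is a hypothetical helper, not part of the CLI.

```python
import json
from collections import Counter
from typing import Iterable


def count_labels(lines: Iterable[str]) -> Counter:
    """Count rows per label from JSONL lines (hypothetical helper)."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        row = json.loads(line)
        # "label" is an assumed key name; rows without it are counted as UNKNOWN.
        counts[row.get("label", "UNKNOWN")] += 1
    return counts
```

An open file handle on output\raw_results.jsonl can be passed directly, since iterating a text file yields its lines.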

Cost model

Two scenarios. Conservative: only SAFE rows migrate to the cheaper model; the rest stay on the baseline. Optimistic: REVIEW_INFERRED and REVIEW_AMBIGUOUS rows are also routed to the cheaper model. Conservative is the planning figure. Optimistic is an upper bound on savings, contingent on a human-review pipeline absorbing the review classes.
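
The two scenarios reduce to a blended per-row cost. The sketch below assumes that --baseline-cost and --candidate-cost are per-row costs, that --volume is a row count, and that label shares measured on the sample carry over to the full volume; none of this is confirmed by the tool's documentation. project_costs and CostProjection are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostProjection:
    baseline_only: float  # everything stays on the baseline model
    conservative: float   # only SAFE rows migrate
    optimistic: float     # SAFE plus both REVIEW classes migrate


def project_costs(
    label_counts: dict[str, int],
    baseline_cost: float = 0.015,
    candidate_cost: float = 0.002,
    volume: int = 50_000,
) -> CostProjection:
    """Project total cost under both scenarios (illustrative sketch)."""
    total = sum(label_counts.values())
    if total == 0:
        raise ValueError("label_counts is empty")

    safe_share = label_counts.get("SAFE", 0) / total
    review_share = (
        label_counts.get("REVIEW_INFERRED", 0)
        + label_counts.get("REVIEW_AMBIGUOUS", 0)
    ) / total

    def blended(migrated_share: float) -> float:
        # Migrated rows pay the candidate cost; the rest pay the baseline cost.
        per_row = migrated_share * candidate_cost + (1 - migrated_share) * baseline_cost
        return volume * per_row

    return CostProjection(
        baseline_only=volume * baseline_cost,
        conservative=blended(safe_share),
        optimistic=blended(safe_share + review_share),
    )
```

RISK rows stay on the baseline in both scenarios, so the optimistic figure can never fall below the cost of running every non-RISK row on the candidate model.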

Running tests

pytest tests\ -v --tb=short --basetemp=.\pytest-tmp

The --basetemp=.\pytest-tmp flag works around AppData\Temp permission constraints that affect some Windows environments.

Corpus

200 adversarial rows. Base 100 (seed 42) + expansion 100 (seed 142). The 8 base attack vectors:

  • multi_occurrence — same total appears in multiple labeled positions.
  • ocr_near_miss — total garbled by O/0, l/1, spacing artifacts.
  • inferred_equivalence — extracted matches by tolerance only (e.g. 300.00 vs 300).
  • multi_currency — two currencies present in source.
  • date_collision — invoice/due/PO dates all parseable and distinct.
  • low_ocr_quality — heavily corrupted OCR throughout.
  • correct_clean — well-formed invoice with single supporting evidence.
  • repeated_total — total repeats but supporting amounts disagree.

Expansion adds 11 further vectors (European amount format, ambiguous date format, symbol-only currency, single-line OCR, vendor edge cases, amount-in-words, amount rounded in source, relative dates, currency implied not stated, amount/date/currency collisions, plus a clean control). Every adversarial example exists to attack SAFE credibility.
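
To make the difference between direct and inferred support concrete, here is a small sketch for amounts only. It is not the analyzer's matching logic: amount_support, the token regex, and the three return values are assumptions chosen to mirror the inferred_equivalence and ocr_near_miss vectors.

```python
import re
from decimal import Decimal, InvalidOperation

# Plain decimal numbers only; thousands separators and European formats
# are deliberately out of scope for this sketch.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def amount_support(extracted: str, ocr_text: str) -> str:
    """Classify how the OCR text supports an extracted amount (hypothetical helper)."""
    tokens = NUMBER.findall(ocr_text)

    # Direct support: the extracted string appears literally in the source.
    if extracted in tokens:
        return "direct"

    try:
        target = Decimal(extracted)
    except InvalidOperation:
        return "none"  # unparseable extraction

    # Inferred support: numerically equal after normalization (300.00 vs 300).
    if any(Decimal(token) == target for token in tokens):
        return "inferred"
    return "none"
```

An extracted "300.00" checked against a source reading "Total due: 300 USD" is numerically equal but not literally present, which is why such a row can reach REVIEW_INFERRED at best and never SAFE.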

Hard limits / scope

  • Invoices only.
  • Single-turn JSON extraction.
  • Local CLI. No SaaS. No Docker. No UI. No telemetry.
  • One baseline vs one candidate model.
  • 1000 row hard cap.
