STOP-first, evidence-grounded document extraction with audit-grade negative proof

AJT Grounded Extract

Extract structured data only when it can be proven; otherwise stop—and prove that you stopped.

Status: Production-ready (v1.0) | Constitution: Frozen | Attack Tests: 10/10 blocked

Installation

pip install ajt-grounded-extract

Zero dependencies. Pure Python stdlib.

Philosophy: STOP-first

This project does not aim to extract everything.
Extraction occurs only when evidence is sufficient.
When evidence is insufficient, the system stops and proves why.
Evidence Integrity > Recall: Only extract values with verifiable document evidence
Default: STOP: When evidence is insufficient, conflicting, or missing → stop extraction
Negative Proof: Every STOP includes explicit reason + preserved artifacts
No Fine-tuning: Rule-based + LLM extraction without training pipelines
Local Execution: Runs entirely on local machine

What This Is NOT

This system is blocked-by-design, not secure-by-claim.

❌ Multi-domain rule engine
❌ Enterprise extraction with thresholds
❌ Training/fine-tuning pipeline
❌ High-recall extraction system
❌ "Secure" or "safe" (we demonstrate how specific attacks are blocked; we do not claim safety)

What we guarantee:

✅ Stoppability (DEFAULT: STOP)
✅ Traceability (decision_maker required)
✅ Audit trail (write-once logs)

Architecture

Document → Ingest → Extract → Ground → Judge → Archive
           ↓        ↓         ↓        ↓        ↓
           Hash     Candidates Evidence STOP?   Artifacts

Pipeline Stages

Ingest: Load document, compute hash, build line index
Extract: Find candidate values (rule-based or LLM)
Ground: Map each value to exact document span (quote + offsets)
Judge: STOP-first decision: ACCEPT | STOP | NEED_REVIEW
Archive: Write-once artifacts with timestamps + integrity hashes
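
The Ingest and Ground stages can be illustrated with a minimal sketch. The function names `ingest` and `ground` and the dictionary layout are assumptions made for illustration, not the package's actual API. The sketch also uses Python string (character) offsets, which equal byte offsets only for ASCII text.

```python
import hashlib
from bisect import bisect_right


def ingest(text: str) -> dict:
    """Hash the document and record the offset at which each line starts."""
    line_starts = [0]
    for i, ch in enumerate(text):
        if ch == "\n":
            line_starts.append(i + 1)
    return {
        "text": text,
        "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        "line_starts": line_starts,
    }


def ground(doc: dict, value: str) -> list[dict]:
    """Return every verbatim occurrence of value as quote + offsets + line."""
    if not value:
        return []
    spans = []
    start = doc["text"].find(value)
    while start != -1:
        spans.append({
            "quote": value,
            "start": start,
            "end": start + len(value),  # end offset is exclusive
            "line": bisect_right(doc["line_starts"], start),  # 1-based
        })
        start = doc["text"].find(value, start + 1)
    return spans


doc = ingest("Title\nEffective Date: 01/15/2025\n")
print(ground(doc, "01/15/2025"))
```

A value that cannot be found verbatim yields an empty list, which gives the Judge stage nothing to accept.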

Decision Taxonomy

ACCEPT: Evidence found, confidence sufficient, integrity verified
STOP: No candidates, conflict, low confidence, or integrity failure
NEED_REVIEW: Edge cases requiring human judgment
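
A STOP-first judge might look like the sketch below. This is an illustration under stated assumptions, not the logic in engine/judge.py: the candidate layout, the 0.8 threshold, and every stop reason other than "no_candidates_found" are hypothetical. NEED_REVIEW is left out because the edge cases that trigger it are not specified here.

```python
def judge(candidates: list[dict], min_confidence: float = 0.8) -> dict:
    """Return STOP on every path unless all checks pass."""

    def stop(reason: str, **proof) -> dict:
        return {"decision": "STOP", "value": None,
                "stop_reason": reason, "stop_proof": proof}

    # No candidates: nothing to ground, so stop and say so.
    if not candidates:
        return stop("no_candidates_found", searched=True, candidates_found=0)

    # More than one distinct value is a conflict (hypothetical reason name).
    distinct = sorted({c["value"] for c in candidates})
    if len(distinct) > 1:
        return stop("conflicting_values",
                    candidates_found=len(candidates), distinct_values=distinct)

    best = max(candidates, key=lambda c: c["confidence"])

    # A value without a grounded span is never accepted.
    if best.get("evidence") is None:
        return stop("missing_evidence", candidates_found=len(candidates))

    if best["confidence"] < min_confidence:
        return stop("low_confidence",
                    confidence=best["confidence"], min_confidence=min_confidence)

    return {"decision": "ACCEPT", "value": best["value"],
            "evidence": best["evidence"], "confidence": best["confidence"]}
```

ACCEPT is the last statement in the function, so any check that is added later defaults to stopping rather than accepting.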

Quick Start

Run Extraction

# ACCEPT case (has clear "Effective Date: 01/15/2025")
python run.py examples/accept_example.txt

# STOP case (no explicit effective date)
python run.py examples/stop_example.txt

View Results

Open the generated HTML viewer (the open command is macOS-specific; on Linux, xdg-open is the usual equivalent):

open viewer/accept_example_viewer.html
open viewer/stop_example_viewer.html

Output Format

JSON Result

{
  "field_name": "effective_date",
  "decision": "ACCEPT",
  "value": "01/15/2025",
  "evidence": {
    "quote": "01/15/2025",
    "start": 245,
    "end": 255,
    "line": 12,
    "context": "...Effective Date: 01/15/2025..."
  },
  "confidence": 0.9
}
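
A result in this shape can be re-checked against the source document by anyone who holds both. The helper below is a hypothetical sketch, not part of the package. It assumes that start and end are offsets into the document text with end exclusive, which matches the 10-character span in the example above.

```python
def evidence_matches(document_text: str, result: dict) -> bool:
    """Check that the recorded span really contains the quote and the value."""
    evidence = result.get("evidence")
    if not evidence:
        return False  # no evidence means nothing can be confirmed
    span = document_text[evidence["start"]:evidence["end"]]
    return span == evidence["quote"] == result["value"]
```

If the document or the result has been altered after extraction, the slice no longer equals the quote and the check fails.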

STOP Event

{
  "field_name": "effective_date",
  "decision": "STOP",
  "value": null,
  "stop_reason": "no_candidates_found",
  "stop_proof": {
    "searched": true,
    "candidates_found": 0
  }
}
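
One way to preserve such an event as a write-once artifact with an integrity hash is sketched below. The function names, the record layout, and the one-file-per-event scheme are assumptions for illustration; the package's own archive format (JSONL plus manifests, with timestamps) may differ. Opening a file in "x" mode only prevents overwriting through this code path, it does not make the file immutable on disk.

```python
import hashlib
import json
from pathlib import Path


def _digest(event: dict) -> str:
    """SHA-256 over a canonical (sorted-key) JSON encoding of the event."""
    payload = json.dumps(event, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def archive_event(event: dict, directory: Path, name: str) -> Path:
    """Write the event once; raise FileExistsError if it was already written."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{name}.json"
    record = {"event": event, "sha256": _digest(event)}
    with open(path, "x", encoding="utf-8") as fh:  # "x" = exclusive creation
        json.dump(record, fh, sort_keys=True)
        fh.write("\n")
    return path


def verify_record(record: dict) -> bool:
    """Recompute the hash to detect edits made after archiving."""
    return record.get("sha256") == _digest(record["event"])
```

A second call with the same name fails instead of silently replacing the earlier proof, which is the behavior an audit trail needs.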

HTML Viewer Features

Evidence Highlighting: Green (ACCEPT) / Red (STOP)
Navigation Sidebar: Jump to extracted fields
"Why Stopped" Panel: Explicit reasons with proof artifacts
Offset Mapping: Click evidence span → see exact document location

Directory Structure

ajt-grounded-extract/
├── schema/              # Field definitions
├── engine/              # Core extraction modules
│   ├── ingest.py
│   ├── extract.py
│   ├── ground.py
│   ├── judge.py
│   └── archive.py
├── viewer/              # HTML viewer generator
├── evidence/            # Write-once artifacts (JSONL + manifests)
├── examples/            # Demo documents
└── run.py               # CLI entry point

Evidence Requirements

All extractions must satisfy:

✅ require_exact_quote: Value must appear verbatim in document
✅ require_offset_mapping: Quote mapped to byte offsets
✅ stop_on_conflict: Multiple conflicting values → STOP
✅ min_confidence: Below threshold → STOP

Acceptance Criteria

Demo shows at least one ACCEPT and one STOP
STOP includes explicit reason and preserved artifacts
Viewer navigates evidence spans correctly
Non-goals stated explicitly

Regulatory Mapping & Review

This system includes industry-specific regulatory risk mappings for:

Financial Services — Authorization scope, customer isolation, advisory vs execution separation
Healthcare — Patient data isolation, complete clinical evidence requirements, clinician traceability
Legal Practice — Attorney responsibility, client-matter isolation, conflict-of-interest prevention

Navigation: See REGULATORY_REVIEW_GUIDE.md for audience-specific entry points.

Key documents:

REGULATORY_META_MAP.md — Cross-industry risk-control mappings
docs/REG_MAP_FINANCE.md — Financial services mapping
docs/REG_MAP_HEALTHCARE.md — Healthcare mapping
docs/REG_MAP_LEGAL.md — Legal practice mapping
COMPLIANCE_GUIDE.md — Audit artifact generation
ATTACK_TEST.md — Adversarial verification results

Principle: This project demonstrates how specified risks are blocked. It does not claim regulatory compliance.

Reference

Motivated by ajt-negative-proof-sim (sealed reference).

Core principle: Prove extraction succeeded OR prove why you stopped.

License

MIT

Project details

Release history

2.1.0 (Jan 6, 2026)

1.0.0 (Jan 6, 2026, this version)

Download files

Source Distribution

ajt_grounded_extract-1.0.0.tar.gz (13.0 kB)

Uploaded Jan 6, 2026 Source

Built Distribution

ajt_grounded_extract-1.0.0-py3-none-any.whl (15.0 kB)

Uploaded Jan 6, 2026 Python 3

File details

Details for the file ajt_grounded_extract-1.0.0.tar.gz.

File metadata

Download URL: ajt_grounded_extract-1.0.0.tar.gz
Upload date: Jan 6, 2026
Size: 13.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for ajt_grounded_extract-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`496cec37e7488790f00088b546d33c149d746028d7f548c9ee52077abb4c1a2c`
MD5	`e23497bbc2a3ca142b69f1a7a500dd64`
BLAKE2b-256	`fd9d701e18168aef2d7c44e9695fb044ce57975f8208a0d07231c8b8958f6a53`

File details

Details for the file ajt_grounded_extract-1.0.0-py3-none-any.whl.

File metadata

Download URL: ajt_grounded_extract-1.0.0-py3-none-any.whl
Upload date: Jan 6, 2026
Size: 15.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for ajt_grounded_extract-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`871300fd01c5c1026e607f01f550062df5efc8e987b3a13532197ff63202156e`
MD5	`2b88e3849f2bdece781d910b7e0a623f`
BLAKE2b-256	`b1d1c384f0e2cfe914f9bc42cdcd9efcec61fc682762850431f60912d31c97c5`
