Paxman
Contract-driven deterministic normalization engine for Python.
Paxman transforms arbitrary input (PDFs, scans, emails, spreadsheets, APIs, free text) into evidence-backed, replayable normalized artifacts conforming to caller-supplied contracts (Pydantic, JSON Schema, OpenAPI, or a built-in Dict DSL).
from decimal import Decimal
from pydantic import BaseModel
import paxman
# IMPORTANT: import the adapter(s) you need so they self-register.
# Pydantic is an optional extra; the core package ships the registry
# but not the adapters themselves.
import paxman.contract.adapters.pydantic # noqa: F401 (triggers self-registration)
import paxman.contract.adapters.dict_dsl # noqa: F401
# Caller-owned contract (Pydantic example)
class LineItem(BaseModel):
    description: str
    quantity: int
    unit_price: float

class Invoice(BaseModel):
    supplier_name: str
    total_amount: float
    currency_code: str
    line_items: list[LineItem]

# Normalize raw input against the contract.
# raw_invoice_bytes is a placeholder for the document bytes you have loaded.
result = paxman.normalize(
    input_data=raw_invoice_bytes,
    contract=Invoice,
    budget=paxman.Budget(max_total_cost_usd=Decimal("0.10")),  # Decimal per ADR-0004
    policy=paxman.Policy(allow_remote_inference=True),
)
# Inspect or consume
print(result.normalized_data) # matches the Invoice shape
print(result.unresolved_fields) # any fields Paxman could not resolve
print(result.replay_hash) # deterministic signature for replay
# Replay later from the artifact alone
rehydrated = paxman.replay(result, contract=Invoice)
assert rehydrated == result # byte-equal
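The budget in the example above is passed as a Decimal rather than a float. The sketch below only illustrates why exact decimal arithmetic matters when accumulating costs against a limit; the step costs and variable names are hypothetical and are not values or names from Paxman.

```python
from decimal import Decimal

# Hypothetical per-step costs in USD (illustration only, not Paxman output).
step_costs = ["0.1", "0.2"]

# Accumulating as float picks up binary rounding error.
float_total = 0.0
for cost in step_costs:
    float_total += float(cost)

# Accumulating as Decimal is exact for decimal literals.
decimal_total = sum((Decimal(cost) for cost in step_costs), Decimal("0"))

print(float_total == 0.3)               # False
print(decimal_total == Decimal("0.3"))  # True
```

With floats, a run that spends exactly its budget can appear to be over it, so a cost limit is easier to enforce predictably with Decimal.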
Why Paxman?
- Contract-driven. You bring the contract. Paxman doesn't own your schema.
- Field-centric, deterministic planning. Each required field gets its own plan.
- Evidence-backed. Every resolved value carries provenance and confidence.
- Replayable. Rehydrate the artifact without recomputation.
- Honest. Unresolved fields are explicit, never silent.
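To make "evidence-backed" and "explicit unresolved fields" concrete, here is a minimal sketch in plain Python. It is not Paxman's data model: `ResolvedValue`, `resolve`, and the field names are hypothetical, chosen only to show the shape of the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResolvedValue:
    """Hypothetical record: a value plus the evidence it came from."""
    value: str
    source_span: str   # the raw text the value was read from
    confidence: float  # 0.0 to 1.0


def resolve(required_fields, candidates):
    """Split required fields into resolved values and an explicit unresolved list."""
    resolved, unresolved = {}, []
    for name in required_fields:
        found = candidates.get(name)
        if found is None:
            unresolved.append(name)  # never silently dropped or defaulted
        else:
            resolved[name] = found
    return resolved, unresolved


candidates = {
    "supplier_name": ResolvedValue("ACME Corp", "ACME Corp", 0.95),
    "total_amount": ResolvedValue("1234.56", "Total: $1,234.56 USD", 0.90),
}
resolved, unresolved = resolve(
    ["supplier_name", "total_amount", "currency_code"], candidates
)
print(sorted(resolved))  # ['supplier_name', 'total_amount']
print(unresolved)        # ['currency_code']
```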
What Paxman is NOT
- Not a workflow engine.
- Not a general-purpose agent framework.
- Not a RAG framework.
- Not a persistence layer.
- Not a schema registry.
- Not a standard library.
- Not a domain ontology.
If you need any of these, wrap Paxman from the outside (see §When to use Paxman vs When to wrap Paxman below).
When to use Paxman vs When to wrap Paxman
Paxman is a library that produces an evidence-backed, replayable normalized artifact. Use Paxman directly when your problem is one of the following:
- You have arbitrary input (text, PDF, JSON, HTML) that needs to be normalized against a caller-owned contract (Pydantic / JSON Schema / OpenAPI / Dict DSL).
- You need evidence-backed normalization — every resolved value carries provenance, and every step is auditable.
- You need replay — the ability to rehydrate a stored artifact without re-running the pipeline.
- You need field-centric confidence — different fields can have different confidence, and the Reconciler grades the candidates with a single, fixed rubric.
- You are integrating into a service (or a SaaS) that needs auditable normalization without owning a normalization engine.
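A deterministic replay signature generally depends on serializing the same data to the same bytes every time. The sketch below shows one common way to do that with the standard library (sorted keys, fixed separators, SHA-256). It is an assumption-laden illustration, not Paxman's actual hashing scheme, and `canonical_json` and `content_hash` are hypothetical names.

```python
import hashlib
import json


def canonical_json(payload: dict) -> bytes:
    """Serialize with sorted keys and fixed separators so key order does not matter."""
    text = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def content_hash(payload: dict) -> str:
    """Hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_json(payload)).hexdigest()


a = {"supplier_name": "ACME Corp", "total_amount": "1234.56"}
b = {"total_amount": "1234.56", "supplier_name": "ACME Corp"}  # same data, different key order

print(content_hash(a) == content_hash(b))  # True
```

This simple encoder is not a full implementation of RFC 8785 (for example, it does not apply that standard's number formatting rules), so treat it as a teaching aid only.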
Wrap Paxman from the outside when your problem is one of the following:
- You need a workflow engine (DAG of long-running tasks, retries, human-in-the-loop, …). Wrap Paxman in a workflow engine.
- You need a general-purpose agent framework (multi-turn reasoning, tool use, planning across many turns). Wrap Paxman behind an agent's tool call.
- You need a RAG framework (vector search, retrieval, ranking). Wrap Paxman behind a RAG pipeline; the contract becomes the structured extraction step.
- You need a persistence layer (database, ORM, migration tooling). Wrap Paxman in a service that stores the artifact.
- You need a schema registry (catalog of contracts, versioning of contracts, governance). Wrap Paxman in a registry.
- You need a standard library (general-purpose data transformation). Paxman is opinionated about evidence, replay, and confidence; it is not a general-purpose library.
- You need a domain ontology (taxonomy, classification, knowledge graph). Wrap Paxman behind an ontology lookup.
In short: Paxman is the normalization step in a larger system. It is not the larger system. If you find yourself wanting to add workflow, persistence, or agentic features to Paxman itself, that is a signal to wrap Paxman from the outside.
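As an illustration of wrapping from the outside, the sketch below puts persistence in a thin caller-owned function around a normalization callable. `fake_normalize` is a stand-in so the example runs without Paxman installed, and `normalize_and_store` is a hypothetical helper, not part of any library.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def fake_normalize(raw: str) -> dict:
    """Stand-in for a normalization call; returns a JSON-serializable artifact."""
    return {"normalized_data": {"text": raw.strip()}, "unresolved_fields": []}


def normalize_and_store(
    normalize: Callable[[str], dict], raw: str, out_dir: Path, name: str
) -> Path:
    """Persistence lives in the wrapper; the normalizer knows nothing about storage."""
    artifact = normalize(raw)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{name}.json"
    path.write_text(json.dumps(artifact, sort_keys=True), encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    path = normalize_and_store(fake_normalize, "  ACME Corp  ", Path(tmp), "invoice-1234")
    stored = json.loads(path.read_text(encoding="utf-8"))

print(stored["normalized_data"])  # {'text': 'ACME Corp'}
```

The same shape applies to the other cases in the list: the workflow engine, agent, or registry calls the normalizer, and the normalizer stays unaware of them.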
Install
pip install paxman                # core (no adapters)
pip install "paxman[pydantic]"    # + Pydantic adapter
pip install "paxman[all]"         # + all V1 adapters
Paxman is in pre-release (v0.x). Public API may change between minor versions until 1.0.
Documentation
| Doc | Purpose |
|---|---|
| PRD.md | Product vision, philosophy, V1 success metrics and acceptance criteria. |
| ARCHITECTURE.md | Subsystem design, sequence diagram, error model, versioning, observability. |
| PACKAGE_STRUCTURE.md | Module layout, dependency DAG, public/private API split, packaging. |
| GLOSSARY.md | Single source of truth for Paxman vocabulary. |
| V1_ACCEPTANCE_CRITERIA.md | Definition of done for the 1.0 release. |
| REPLAY_AND_DETERMINISM.md | Deep dive on replay and determinism. |
| SECURITY.md | Threat model, PII handling, provider secrets, vulnerability reporting. |
| TESTING_STRATEGY.md | Test seams, property tests, replay tests, fixtures. |
| docs/TEST_DATA.md | Test data policy, dataset catalog, licensing rules. |
| DEVELOPMENT.md | Local dev setup, common tasks, release process. |
| EXTENDING.md | How to add a new contract adapter, capability, or inference provider. |
| DEPENDENCIES.md | Core vs optional dependencies, packaging policy. |
| docs/adr/ | Architecture Decision Records. |
| docs/concepts/ | Conceptual docs (contracts, capabilities, planning, reconciliation, replay, MIGRATION_GUIDE). |
| docs/howto/ | Quick-start how-tos (add adapter, add capability, add inference provider, replay artifact). |
| CONTRIBUTING.md | Contribution workflow + ADR-driven process. |
| CODE_OF_CONDUCT.md | Community standards (Contributor Covenant v2.1). |
| CHANGELOG.md | Release notes. |
Quickstart (5 minutes)
Note: Paxman V1 is in pre-release. The quickstart below is verified end-to-end in CI (see .github/workflows/ci.yml). For a full migration walkthrough (e.g. from LlamaIndex, LangChain, or a hand-rolled pipeline), see docs/concepts/MIGRATION_GUIDE.md.
1. Install
pip install "paxman[pydantic]"
2. Define a contract (Pydantic)
from pydantic import BaseModel, Field
class LineItem(BaseModel):
    description: str
    quantity: int
    unit_price: float

class Invoice(BaseModel):
    supplier_name: str = Field(..., description="The supplier's name.")
    total_amount: float = Field(..., description="Total invoice amount.")
    currency_code: str = Field(..., description="ISO-4217 currency code.")
    line_items: list[LineItem] = Field(default_factory=list)
3. Normalize raw input
import paxman
# IMPORTANT: import the adapter(s) you need so they self-register.
# Pydantic is an optional extra; the core package ships the registry
# but not the adapters themselves.
import paxman.contract.adapters.pydantic # noqa: F401
import paxman.contract.adapters.dict_dsl # noqa: F401
raw_invoice = """
ACME Corp
Invoice #1234
Total: $1,234.56 USD
- Widget: 2 @ $500.00
- Gadget: 1 @ $234.56
"""
artifact = paxman.normalize(
    input_data=raw_invoice,
    contract=Invoice,
)
print(artifact.status) # Status.SUCCESS or Status.PARTIAL_SUCCESS
print(artifact.normalized_data) # {"supplier_name": "ACME Corp", ...}
print(artifact.unresolved_fields) # [] (or list of fields Paxman could not resolve)
print(artifact.replay_hash) # "a3f8..."
4. Replay
# Later, with just the artifact and the contract
rehydrated = paxman.replay(artifact, contract=Invoice)
assert rehydrated == artifact # byte-equal
Examples
Paxman ships with 3 reference examples covering the 3 target personas.
Each is a standalone mini-package. Clone the repo, cd into the
example, and run it.
Backend service (Persona A: backend developer)
A minimal FastAPI service exposing POST /normalize for contract-driven
normalization. Accepts raw text input, returns structured
evidence-backed JSON with a deterministic replay hash.
- Path: examples/backend_service/
- What it demonstrates: Pydantic contract, REST endpoint, replay hash, unresolved fields
cd examples/backend_service
uv pip install -e "../../[pydantic]" -e ".[dev]"
uvicorn backend_service.app:app --reload --port 8000
AI agent ingest (Persona B: AI engineer)
A stdlib-only agent tool-calling loop that invokes paxman.normalize()
as a tool. Zero framework dependencies. Port the NormalizeTool to
LangChain, LlamaIndex, or any custom agent.
- Path: examples/ai_agent_ingest/
- What it demonstrates: Agent tool loop, framework-agnostic design, evidence-backed extraction
cd examples/ai_agent_ingest
uv pip install -e ".[dev]"
uv run python -m ai_agent_ingest
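The general shape of a framework-free tool-calling loop can be sketched with the standard library alone. This is not the code in examples/ai_agent_ingest/ and not its NormalizeTool: `fake_normalize`, `TOOLS`, and `run_agent` are hypothetical, and the "agent" is a scripted list of tool calls rather than a model.

```python
def fake_normalize(text: str) -> dict:
    """Stand-in for a normalization call so the sketch runs without Paxman."""
    lines = text.strip().splitlines()
    first_line = lines[0] if lines else ""
    return {"normalized_data": {"first_line": first_line}, "unresolved_fields": []}


# Tool registry: tool name -> plain callable.
TOOLS = {"normalize": fake_normalize}


def run_agent(steps):
    """Dispatch each scripted tool call and collect the results in order."""
    results = []
    for step in steps:
        tool = TOOLS.get(step["tool"])
        if tool is None:
            results.append({"error": f"unknown tool: {step['tool']}"})
            continue
        results.append(tool(**step["arguments"]))
    return results


results = run_agent([
    {"tool": "normalize", "arguments": {"text": "ACME Corp\nInvoice #1234"}},
    {"tool": "search", "arguments": {}},
])
print(results[0]["normalized_data"])  # {'first_line': 'ACME Corp'}
print(results[1])                     # {'error': 'unknown tool: search'}
```

Because the tool is just a callable behind a name, the same function can be registered with whichever agent framework you use.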
SaaS procurement pipeline (Persona C: SaaS team)
A CSV-batch invoice/quotation pipeline. Reads a manifest of raw input files, normalizes each against a Pydantic contract, writes artifacts to disk, and verifies cross-run replay-hash reproducibility.
- Path: examples/saas_procurement/
- What it demonstrates: Batch normalization, on-disk artifact storage, replay-hash determinism (D10.7 fixture)
cd examples/saas_procurement
uv pip install -e ".[dev]"
uv run python -m saas_procurement data/manifest.csv output/
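The cross-run reproducibility check can be pictured as: run the batch twice and compare the per-item hashes. The sketch below assumes an in-memory manifest with hypothetical `name` and `text` columns and a stand-in normalizer; the real example reads files listed in data/manifest.csv, and its manifest layout is not shown here.

```python
import csv
import hashlib
import io
import json

# Hypothetical manifest contents (illustration only).
MANIFEST = "name,text\ninv-1,ACME Corp Total 10\ninv-2,Globex Total 20\n"


def fake_normalize(text: str) -> dict:
    """Stand-in for a deterministic normalization call."""
    return {"tokens": text.split()}


def run_batch(manifest_csv: str) -> dict:
    """Normalize every manifest row and return {name: sha256 of the artifact}."""
    hashes = {}
    for row in csv.DictReader(io.StringIO(manifest_csv)):
        artifact = fake_normalize(row["text"])
        blob = json.dumps(artifact, sort_keys=True, separators=(",", ":")).encode("utf-8")
        hashes[row["name"]] = hashlib.sha256(blob).hexdigest()
    return hashes


first = run_batch(MANIFEST)
second = run_batch(MANIFEST)
print(first == second)  # True
print(sorted(first))    # ['inv-1', 'inv-2']
```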
Use cases
Paxman is designed for:
- Invoice/quotation/procurement normalization — compare offers across suppliers and currencies.
- Agentic ingestion flows — auditable, evidence-backed extraction for RAG or agent pipelines.
- Document understanding services — wrap Paxman inside a SaaS without giving up replay or evidence.
- Multi-source data pipelines — normalize email, OCR, CSV, and API inputs into one canonical schema.
See PRD.md §7 Primary Use Cases for detailed examples.
Status
- v0.0.0 (Sprint 6), shipped: full pipeline of contract adaptation, planning, execution, reconciliation, artifact, and public API (paxman.normalize(), paxman.replay()).
- v0.0.0 + Sprint 7, shipped: paxman.testing (Hypothesis strategies), golden artifacts, end-to-end integration tests, per-subsystem coverage thresholds.
- v0.0.0 + Sprint 8, in progress: documentation site (docs/concepts/, docs/howto/), community files (CONTRIBUTING.md, CODE_OF_CONDUCT.md), CI hardening (pyright, interrogate, bandit, pip-audit), 9-check make ci.
- v0.1.0 (initial preview): planner + one adapter + one capability work end-to-end. (Pending.)
- v0.5.0 (feature-complete beta): 80% of V1 features. (Pending.)
- 1.0.0: All V1 acceptance criteria met. (Pending.)
Install (developer setup, Sprint 1)
Paxman uses uv for package management. The first preview is not published to PyPI yet; developers install the project from a working tree.
# Clone the repository
git clone https://github.com/nexusnv/paxman.git
cd paxman
# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Install the package + all dev dependencies (editable)
uv sync --all-extras --dev
# Verify the install
uv run python -c "import paxman; print(f'paxman {paxman.__version__}')"
Expected output: paxman 0.0.0.
Local CI
Run the full local-CI pipeline (the same checks run on GitHub Actions):
make ci
This runs, in order: install-frozen → lint → format-check → typecheck → typecheck-pyright → imports → docs-check → security → test-cov. All 9 checks must pass before opening a PR. Each check is also runnable individually (e.g. make lint, make typecheck, make docs-check, make security).
Project structure
paxman/
├── src/paxman/ # the package (src-layout)
│ ├── __init__.py # exposes __version__ + public API
│ ├── py.typed # PEP 561 marker
│ ├── errors.py # PaxmanError hierarchy
│ ├── types.py # Status, ConfidenceBand, FieldType enums
│ ├── protocols.py # internal Protocol definitions
│ ├── versioning.py # version constants and helpers
│ ├── logging.py # structlog factory (no timestamps in replay)
│ ├── budget.py # Budget, Policy, CurrencyPolicy
│ ├── clock.py # injectable Clock + FakeClock
│ ├── ids.py # prefixed ID helpers
│ ├── serialization.py # stable JSON encoder (RFC 8785-style)
│ ├── contract/ # adapter + validation (4 formats → CanonicalContract)
│ ├── planner/ # rule-based field-centric planning
│ ├── capabilities/ # 5 V1 capabilities (text/regex/lookup/inference/validation)
│ ├── executor/ # sequential execution + budget tracking
│ ├── reconciler/ # truth resolution + confidence + MONEY
│ ├── artifact/ # ExecutionArtifact + replay hash + diagnostics
│ ├── api/ # public API (normalize, replay, register_*)
│ └── testing/ # public Hypothesis strategies (paxman.testing)
├── tests/ # pytest test suite (unit / property / integration / public_api)
├── docs/ # design specs, ADRs, sprint plan, concepts, howtos
├── pyproject.toml # PEP 621 metadata + tooling config
├── Makefile # `make ci`, `make test`, `make build`, …
├── .pre-commit-config.yaml
├── .github/ # workflows + issue/PR templates
├── LICENSE # MIT (per ADR-0008)
├── CONTRIBUTING.md # contribution workflow + ADR-driven process
├── CODE_OF_CONDUCT.md # Contributor Covenant v2.1
└── CHANGELOG.md # release notes
See V1_ACCEPTANCE_CRITERIA.md for the full definition of done.
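The tree above lists clock.py as an "injectable Clock + FakeClock". The general pattern behind such a module can be sketched as follows. Only the names Clock and FakeClock come from the tree; the method name `now`, the `SystemClock` class, and the `stamp` helper are assumptions made for illustration and may not match Paxman's code.

```python
from datetime import datetime, timezone
from typing import Protocol


class Clock(Protocol):
    """Anything that can report the current time."""
    def now(self) -> datetime: ...


class SystemClock:
    """Real wall-clock time, used in production code paths."""
    def now(self) -> datetime:
        return datetime.now(timezone.utc)


class FakeClock:
    """Fixed time, so tests and replays see the same value on every run."""
    def __init__(self, fixed: datetime) -> None:
        self._fixed = fixed

    def now(self) -> datetime:
        return self._fixed


def stamp(event: str, clock: Clock) -> str:
    """Code under test receives the clock instead of calling datetime.now() itself."""
    return f"{clock.now().isoformat()} {event}"


fixed = datetime(2024, 1, 1, tzinfo=timezone.utc)
stamped = stamp("normalized", FakeClock(fixed))
print(stamped)  # 2024-01-01T00:00:00+00:00 normalized
```

Passing the clock in as a parameter keeps time out of the deterministic path unless the caller explicitly supplies it.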
Contributing
We welcome contributions of all sizes — from typo fixes to new subsystems. See CONTRIBUTING.md for the contribution workflow and the ADR-driven process.
For local development setup, see DEVELOPMENT.md. For extension guides (adding a new contract adapter, capability, or inference provider), see EXTENDING.md.
Significant architectural changes require an ADR; see docs/adr/README.md. Community standards are in CODE_OF_CONDUCT.md.
License
MIT. See LICENSE. Per ADR-0008, MIT is the chosen license for V1. Apache-2.0 is the documented alternative if patent concerns emerge (see docs/specs/license-decision.md for the full trade-off analysis).
See also
- PRD.md — start here for the product vision
- GLOSSARY.md — vocabulary
- REPLAY_AND_DETERMINISM.md — replay model
- SECURITY.md — threat model
- Paxman Website — the official project site
- NexusNV Website — the people behind Paxman