Reversible PII tokenization for LLM pipelines — send documents to cloud AI without exposing real data

sovereign-vault

Reversible PII tokenization for LLM pipelines.

Send documents containing real names, SSNs, emails, and account numbers to any cloud AI — Claude, Gemini, GPT — without exposing the actual values. The AI reasons about relationships and patterns on placeholder tokens. You reconstruct the real values locally after the response comes back.

pip install sovereign-vault

The problem

You have documents with names, SSNs, emails, and account numbers. You need a cloud AI to analyze patterns, identify anomalies, or summarize findings. But you can't send the raw PII — compliance, legal, or common sense says no.

Standard redaction destroys the data permanently. The AI then can't reason about cross-entity relationships — "the same person appears in both transactions" becomes impossible once everything is [REDACTED].

The solution

Sovereign Vault replaces each PII value with a stable, HMAC-bound token. Within a session, the same value always maps to the same token, so the AI can track relationships across a document. You reconstruct the real values locally after the cloud call returns.

from sovereign_vault import VaultSession

with VaultSession() as vault:
    abstract = vault.tokenize(
        "John Doe (SSN: 123-45-6789) transferred funds to "
        "Jane Smith (SSN: 987-65-4321) via john@firm.com on 2024-01-15."
    )
    # abstract:
    # "[[PERSON_A1B2C3D4_e5f6a7]] (SSN: [[SSN_B8C9D0E1_f2a3b4]]) transferred
    #  funds to [[PERSON_F5G6H7I8_j9k0l1]] (SSN: [[SSN_J2K3L4M5_n6o7p8]])
    #  via [[EMAIL_N9O0P1Q2_r3s4t5]] on 2024-01-15."

    response = your_llm_client.complete(abstract)  # cloud sees only tokens

    result = vault.reconstruct(response)  # real values restored locally
    # VaultSession.destroy() called automatically on context exit

No disk writes. No persistence between sessions. The mapping lives in RAM and is wiped on destroy().
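
To make the round trip concrete, here is a minimal sketch of session-scoped reversible tokenization using only the standard library. It is not sovereign-vault's implementation: the `ToySession` class, its token format, and the single email regex are assumptions chosen for illustration, and the sketch omits the HMAC tag entirely.

```python
import re
import secrets


class ToySession:
    """Hypothetical stand-in for a vault session; not the sovereign-vault API."""

    # Deliberately simple email pattern, for illustration only.
    EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

    def __init__(self):
        self._forward = {}  # real value -> token
        self._reverse = {}  # token -> real value

    def _token_for(self, value):
        # Reuse the existing token if this value was already seen in this session.
        if value not in self._forward:
            token = f"[[EMAIL_{secrets.token_hex(4).upper()}]]"
            self._forward[value] = token
            self._reverse[token] = value
        return self._forward[value]

    def tokenize(self, text):
        return self.EMAIL.sub(lambda m: self._token_for(m.group()), text)

    def reconstruct(self, text):
        for token, value in self._reverse.items():
            text = text.replace(token, value)
        return text

    def destroy(self):
        # Drops the mapping; a real wipe would also overwrite the stored values.
        self._forward.clear()
        self._reverse.clear()


original = "a@x.com wrote to b@y.org; a@x.com replied."
session = ToySession()
abstract = session.tokenize(original)
restored = session.reconstruct(abstract)
```

Because the mapping is keyed on the real value, both occurrences of `a@x.com` receive the same token, which is what lets a model reason about repeated entities without seeing them.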

Detection layers

Three layers run in sequence. Layer 1 is always active; Layers 2 and 3 are optional, so detection never falls below Layer 1 reliability.

Layer	Method	Confidence	Requires
1 — Regex	Deterministic structural patterns	1.0	Nothing (always active)
2 — GLiNER	Probabilistic NLP NER	0.85× model score	`pip install sovereign-vault[ner]`
3 — Ollama	Contextual LLM sweep	0.65	Local Ollama + `pip install sovereign-vault[llm]`

Layer 3 runs only when GLiNER finds fewer than 3 entities. It handles implicit identifiers and role references that regex and NER miss.

Regex catches: SSN, phone, email, IP address, credit card, passport, Michigan DL, court case numbers

GLiNER catches: person names, organizations, locations, addresses, DOB, financial accounts, government IDs, medical record numbers

Ollama catches: contextual identifiers — "the defendant", "Account #XYZ", implicit role-based references
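
The layering rules above can be sketched as plain functions. This is an illustration only: the two regexes, the `Detection` tuple, and the function names are assumptions, and the NER and LLM layers are represented by their scoring and trigger rules rather than by real GLiNER or Ollama calls. Only the confidence values (1.0, 0.85 × model score) and the fewer-than-3 trigger come from the description above.

```python
import re
from typing import NamedTuple


class Detection(NamedTuple):
    label: str
    text: str
    source_layer: str
    confidence: float


# Illustrative patterns only; the package's own REGEX_PATTERNS may differ.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def regex_layer(text):
    """Layer 1: deterministic matches, always reported at confidence 1.0."""
    return [
        Detection(label, match.group(), "regex", 1.0)
        for label, pattern in PATTERNS.items()
        for match in pattern.finditer(text)
    ]


def ner_confidence(model_score):
    """Layer 2: the model's own score is scaled by 0.85."""
    return 0.85 * model_score


def should_run_llm_sweep(ner_detections):
    """Layer 3 runs only when the NER layer found fewer than 3 entities."""
    return len(ner_detections) < 3


hits = regex_layer("Alice (alice@corp.com, SSN 123-45-6789) authorized the transfer.")
```

Note that the regex layer finds the email and SSN but not the name "Alice", which is the kind of gap the optional NER layer is meant to close.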

Installation

# Core (regex only, no dependencies)
pip install sovereign-vault

# With NLP entity recognition (quotes keep shells such as zsh from expanding the brackets)
pip install "sovereign-vault[ner]"

# With local LLM sweep (requires Ollama running locally)
pip install "sovereign-vault[llm]"

# Everything
pip install "sovereign-vault[all]"
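
A quick way to see which optional layers are likely usable in the current environment is to probe for their modules. The sketch below assumes that the `[ner]` extra provides an importable `gliner` module and the `[llm]` extra an importable `ollama` module; those module names are assumptions, not something this README states.

```python
from importlib.util import find_spec

# Assumed module names: "gliner" for the [ner] extra, "ollama" for the [llm] extra.
OPTIONAL_MODULES = {"ner": "gliner", "llm": "ollama"}


def available_extras():
    """Return which optional modules can be imported, without importing them."""
    return {
        extra: find_spec(module) is not None
        for extra, module in OPTIONAL_MODULES.items()
    }


extras = available_extras()
print(extras)
```

The resulting booleans could be passed to the `use_gliner` and `use_ollama` arguments of `VaultSession`.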

Usage

Basic round-trip

from sovereign_vault import VaultSession

raw = "Alice (alice@corp.com, SSN 123-45-6789) authorized the transfer."

with VaultSession(use_gliner=False, use_ollama=False) as vault:
    abstract = vault.tokenize(raw)
    # Send `abstract` to cloud AI
    cloud_response = call_your_cloud_ai(abstract)
    restored = vault.reconstruct(cloud_response)

LENIENT mode — cloud paraphrased some tokens

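from sovereign_vault import VaultSession, ReconMode  # assumes ReconMode is exported at the top level

with VaultSession(recon_mode=ReconMode.LENIENT) as vault: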
    abstract = vault.tokenize(raw)
    cloud_response = call_cloud(abstract)
    # Won't raise even if cloud dropped or paraphrased some tokens
    restored = vault.reconstruct(cloud_response)

SEALED mode — abstract output only, no reconstruction

from sovereign_vault import VaultSession, SealMode  # assumes SealMode is exported at the top level

with VaultSession(seal_mode=SealMode.SEALED) as vault:
    abstract = vault.tokenize(raw)
    # Reconstruction is intentionally disabled
    # Use when the abstract output IS the final product

Audit log — chain of custody, no real values

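vault = VaultSession()
try:
    vault.tokenize(raw)
    for entry in vault.audit_log():
        # Entries describe each vaulted entity without its real value
        print(entry["label"], entry["source_layer"], entry["confidence"])
finally:
    vault.destroy()  # wipe the mapping even if an exception is raised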

Multi-session / server use

from sovereign_vault import new_session, get_session, drop_session

sid, vault = new_session()
abstract = vault.tokenize(raw)
# ... pass sid to the next step in your pipeline ...
vault2 = get_session(sid)
restored = vault2.reconstruct(cloud_output)
drop_session(sid)  # destroys and deregisters

Security model

RAM-only, session-scoped — no disk writes, no persistence between sessions
HMAC-bound tokens — each token carries an HMAC tag derived from a 32-byte session secret; tampered or injected tokens raise VaultSealBreach
Injection prevention — input containing pre-existing [[...]] vault token format is rejected immediately
Entropy leak detection — reconstruct() flags high-entropy tokens in cloud output that may be inferred identifiers
Best-effort memory wipe — destroy() overwrites real values with random bytes before clearing
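
The HMAC binding and injection check can be illustrated with the standard library. This is a sketch under stated assumptions, not the package's code: the token layout mirrors the `[[LABEL_ID_tag]]` shape shown in the earlier example, while the truncated SHA-256 tag, the helper names, and the injection regex are all hypothetical.

```python
import hashlib
import hmac
import re
import secrets

SESSION_SECRET = secrets.token_bytes(32)  # 32-byte per-session secret, held in RAM only
TOKEN_RE = re.compile(r"\[\[([A-Z]+)_([0-9A-F]{8})_([0-9a-f]{6})\]\]")


def _tag(label, ident):
    """Derive a short tag from the session secret (truncation length is an assumption)."""
    mac = hmac.new(SESSION_SECRET, f"{label}_{ident}".encode(), hashlib.sha256)
    return mac.hexdigest()[:6]


def mint(label):
    """Create a new token whose tag binds it to this session's secret."""
    ident = secrets.token_hex(4).upper()
    return f"[[{label}_{ident}_{_tag(label, ident)}]]"


def is_authentic(token):
    """True only if the token is well formed and its tag matches this session."""
    match = TOKEN_RE.fullmatch(token)
    if match is None:
        return False
    label, ident, tag = match.groups()
    return hmac.compare_digest(tag, _tag(label, ident))


def contains_token_syntax(text):
    """Injection check: reject input that already contains [[...]] sequences."""
    return re.search(r"\[\[.*?\]\]", text) is not None


token = mint("SSN")
```

A six-character hex tag carries only 24 bits, so a sketch like this deters casual tampering rather than a determined forger; a longer tag trades token length for stronger integrity.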

Reconstruction modes

Mode	Behavior
`ReconMode.STRICT` (default)	Raises `VaultReconstructionDegraded` if the cloud response dropped any vault token
`ReconMode.LENIENT`	Allows partial reconstruction and logs missing tokens as warnings
`SealMode.SEALED`	Disables reconstruction entirely — raises `VaultSealBreach` if attempted
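
The difference between strict and lenient reconstruction comes down to what happens when a token issued during tokenization is absent from the response. A minimal sketch, assuming a plain token-to-value dict and a hypothetical `ReconstructionDegraded` exception standing in for `VaultReconstructionDegraded`:

```python
import logging


class ReconstructionDegraded(Exception):
    """Hypothetical stand-in for VaultReconstructionDegraded."""


def reconstruct(response, mapping, strict=True):
    """Substitute real values back into response.

    mapping is token -> real value. In strict mode, any token missing from the
    response raises; in lenient mode it is logged and reconstruction continues.
    """
    missing = [token for token in mapping if token not in response]
    if missing and strict:
        raise ReconstructionDegraded(f"{len(missing)} token(s) missing from response")
    for token in missing:
        logging.warning("token missing from cloud output: %s", token)
    for token, value in mapping.items():
        response = response.replace(token, value)
    return response


mapping = {"[[PERSON_A]]": "Alice", "[[PERSON_B]]": "Bob"}
full = reconstruct("[[PERSON_A]] paid [[PERSON_B]]", mapping)
partial = reconstruct("[[PERSON_A]] paid someone", mapping, strict=False)
```

Strict mode suits pipelines where a dropped token means the answer can no longer be trusted; lenient mode suits summaries where the model is expected to omit some entities.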

Use cases

Forensic e-discovery — send document patterns to cloud AI without exposing real names or case numbers
HIPAA pipelines — analyze medical records cross-entity without raw patient identifiers leaving your perimeter
Financial fraud detection — transaction pattern analysis without raw account numbers
Gov/defense document processing — reason about relationships in sensitive case files
Cross-agent PII passing — sanitize data moving between local and cloud agents in an agentic pipeline

Part of the LexiPro Sovereign OS

Sovereign Vault is a component of LexiPro — a local-first agentic OS running 15 MCP servers, 228 tools, and 20 agent personas on sovereign hardware. In the full OS, it powers Workflow O (Privacy Bridge): tokenize before any cloud call, reconstruct locally after, audit trail preserved.

Anthropic Claude — Tier 5 reasoning backbone for multi-file analysis
Google Gemini — OSINT, research, and long-context processing
Ollama — Layer 3 local LLM sweep (Gemma, Llama) for contextual entity detection
GLiNER — Layer 2 NLP NER for named entity recognition

Contributing

Issues and PRs are welcome. The detection layer system is designed for extension: add new regex patterns to REGEX_PATTERNS, add new GLiNER entity types to _GLINER_TYPES, or swap the Ollama model via the ollama_model parameter.

Known Limitations

Limitation	Impact	Mitigation
RAM-only storage	Vault lost if process crashes mid-pipeline	Call `vault.destroy()` in a `finally` block; checkpoint vault keys externally if needed
Probabilistic NER (GLiNER/Ollama)	Novel PII formats may not be detected	Use `coverage_report()` after tokenize to assess detection quality
Regex layer only on plain text	HTML entities, encoded chars may slip through	Pre-normalize input with `html.unescape()` before tokenizing
Session-scoped tokens	The same real value gets a different token in each session	Design your pipeline to tokenize once per document, not per chunk
Not a legal compliance layer	Sovereign Vault assists compliance; it cannot replace legal review	Combine with your organization's data classification policy
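
The pre-normalization mitigation in the table can be done with the standard library before text reaches the tokenizer. `html.unescape()` is the call named above; the additional NFKC Unicode normalization and the helper name `normalize_for_tokenizing` are assumptions added here for illustration.

```python
import html
import unicodedata


def normalize_for_tokenizing(text):
    """Decode HTML entities, then fold compatibility characters (e.g. fullwidth digits)."""
    text = html.unescape(text)
    return unicodedata.normalize("NFKC", text)


cleaned = normalize_for_tokenizing("alice&#64;corp.com, SSN 123&#45;45&#45;6789")
```

After this step the email and SSN are back in the plain form a structural regex expects. NFKC also rewrites some characters, such as ligatures, so reconstructed output will reflect the normalized text rather than the exact original bytes.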

Comparison: Sovereign Vault vs. alternatives

Feature	sovereign-vault	Microsoft Presidio	AWS Comprehend PII	Simple regex redaction
Reversible tokenization	Yes	Partial (only via its encrypt/decrypt operators)	No	No
HMAC integrity on tokens	Yes	No	No	No
Offline capable	Yes (regex layer)	Partial	No (API)	Yes
Named entity detection	Yes (GLiNER + Ollama)	Yes (spaCy)	Yes (cloud)	No
STRICT mode audit trail	Yes	No	No	No
Cloud cost	$0 (local)	$0 (local)	Per-call	$0
Setup complexity	pip install	pip + models + server	AWS credentials	None

Compliance Disclaimer

Sovereign Vault is a technical tool that assists with PII handling in LLM pipelines. It is not a legal compliance product and does not constitute legal advice.

GDPR / HIPAA / CCPA: Tokenizing PII before sending it to a cloud model reduces exposure but does not by itself satisfy the requirements of any data protection regulation. Your compliance obligations depend on your specific use case, data classification, and organizational policies. Consult qualified legal counsel before deploying in regulated environments.

What sovereign-vault does:

Replaces PII with HMAC-bound tokens so cloud AI never receives raw values
Provides an audit trail of all vaulted entities (no real values in log)
Wipes vault from RAM on destroy() call

What sovereign-vault does NOT do:

Guarantee detection of all PII in all languages and formats
Provide legal indemnification or certification
Replace a data classification policy or DPO review
Encrypt data at rest (vault is RAM-only by design)

License

MIT — see LICENSE.

Built by Broken Arrow Entertainment LLC · Sovereign Intelligence Systems Group

Release history

1.0.1 (this version): May 6, 2026
1.0.0: May 6, 2026

Download files

Source Distribution

sovereign_vault-1.0.1.tar.gz (18.5 kB view details)

Uploaded May 6, 2026 Source

Built Distribution

sovereign_vault-1.0.1-py3-none-any.whl (13.1 kB view details)

Uploaded May 6, 2026 Python 3

File details

Details for the file sovereign_vault-1.0.1.tar.gz.

File metadata

Download URL: sovereign_vault-1.0.1.tar.gz
Upload date: May 6, 2026
Size: 18.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for sovereign_vault-1.0.1.tar.gz
Algorithm	Hash digest
SHA256	`70b19f2e6a0e5091978901f1ad71fb7d49538cedf1e166c18dd33227334c0476`
MD5	`e6d5ebc06c76d37e9ad8f01cca16dbab`
BLAKE2b-256	`4dceee398944768d6e8b0da016c714b968f92b87026272e5ed38eb82c2e71cdd`

File details

Details for the file sovereign_vault-1.0.1-py3-none-any.whl.

File metadata

Download URL: sovereign_vault-1.0.1-py3-none-any.whl
Upload date: May 6, 2026
Size: 13.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for sovereign_vault-1.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`dc5e3921bcebf8d469e2c80b5128ed95311deaac26a1478ca5eb2c7fe2c8913d`
MD5	`9732089bcf3a70b07ae589e15edc9983`
BLAKE2b-256	`6025b505c861b351866e8ac86935140beb02758d8718b80541683c389dcfea97`
