Reversible PII tokenization for LLM pipelines — send documents to cloud AI without exposing real data

sovereign-vault

Reversible PII tokenization for LLM pipelines.

Send documents containing real names, SSNs, emails, and account numbers to any cloud AI — Claude, Gemini, GPT — without exposing the actual values. The AI reasons about relationships and patterns on placeholder tokens. You reconstruct the real values locally after the response comes back.

pip install sovereign-vault

The problem

You have documents with names, SSNs, emails, and account numbers. You need a cloud AI to analyze patterns, identify anomalies, or summarize findings. But you can't send the raw PII — compliance, legal, or common sense says no.

Standard redaction destroys the data permanently. The AI then can't reason about cross-entity relationships — "the same person appears in both transactions" becomes impossible once everything is [REDACTED].

The solution

Sovereign Vault replaces PII with stable, per-session, HMAC-bound tokens. Within a session the same value always maps to the same token, so the AI can track relationships across a document. You reconstruct the real values locally after the cloud call.

from sovereign_vault import VaultSession

with VaultSession() as vault:
    abstract = vault.tokenize(
        "John Doe (SSN: 123-45-6789) transferred funds to "
        "Jane Smith (SSN: 987-65-4321) via john@firm.com on 2024-01-15."
    )
    # abstract (token values are illustrative and differ per session):
    # "[[PERSON_A1B2C3D4_e5f6a7]] (SSN: [[SSN_B8C9D0E1_f2a3b4]]) transferred
    #  funds to [[PERSON_F5G6H7I8_j9k0l1]] (SSN: [[SSN_J2K3L4M5_n6o7p8]])
    #  via [[EMAIL_N9O0P1Q2_r3s4t5]] on 2024-01-15."

    response = your_llm_client.complete(abstract)  # cloud sees only tokens

    result = vault.reconstruct(response)  # real values restored locally
    # VaultSession.destroy() called automatically on context exit

No disk writes. No persistence between sessions. The mapping lives in RAM and is wiped on destroy().
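To make the "stable, HMAC-bound token" idea concrete, here is a minimal stand-alone sketch of how such a per-session map could work. It is not Sovereign Vault's implementation: the class `SketchVault`, its method names, and the token layout (an 8-hex-digit HMAC-derived ID plus a 6-hex-digit HMAC tag, modelled loosely on the README's example tokens) are assumptions. It only demonstrates the properties described above: the same value yields the same token within a session, the mapping lives only in process memory, and a token whose tag does not verify against the session secret is rejected.

```python
import hashlib
import hmac
import re
import secrets
from typing import Dict, Tuple

# Hypothetical token layout for this sketch: [[LABEL_<8 hex id>_<6 hex tag>]]
TOKEN_RE = re.compile(r"\[\[([A-Z_]+?)_([0-9a-f]{8})_([0-9a-f]{6})\]\]")


class SketchVault:
    """Illustrative per-session token map; not the sovereign_vault implementation."""

    def __init__(self) -> None:
        self._secret = secrets.token_bytes(32)  # fresh per session, never written to disk
        self._by_token: Dict[str, str] = {}  # token -> real value (RAM only)
        self._by_value: Dict[Tuple[str, str], str] = {}  # (label, value) -> token

    def _mac(self, message: str, length: int) -> str:
        digest = hmac.new(self._secret, message.encode(), hashlib.sha256).hexdigest()
        return digest[:length]

    def token_for(self, label: str, value: str) -> str:
        key = (label, value)
        if key not in self._by_value:
            ident = self._mac(f"id:{label}:{value}", 8)  # same value -> same id
            tag = self._mac(f"tag:{label}:{ident}", 6)  # binds the token to this session
            token = f"[[{label}_{ident}_{tag}]]"
            self._by_value[key] = token
            self._by_token[token] = value
        return self._by_value[key]

    def reconstruct(self, text: str) -> str:
        def swap(match: re.Match) -> str:
            label, ident, tag = match.groups()
            expected = self._mac(f"tag:{label}:{ident}", 6)
            if not hmac.compare_digest(tag, expected):
                raise ValueError(f"tampered or foreign token: {match.group(0)}")
            return self._by_token.get(match.group(0), match.group(0))

        return TOKEN_RE.sub(swap, text)

    def destroy(self) -> None:
        # Drops references only; Python strings are immutable, so this is
        # not a guaranteed memory wipe.
        self._by_token.clear()
        self._by_value.clear()
        self._secret = b""


vault = SketchVault()
john = vault.token_for("PERSON", "John Doe")
assert john == vault.token_for("PERSON", "John Doe")  # stable within the session
ssn = vault.token_for("SSN", "123-45-6789")
jane = vault.token_for("PERSON", "Jane Smith")
abstract = f"{john} (SSN: {ssn}) paid {jane}"
print(abstract)  # only tokens; these would go to the cloud
print(vault.reconstruct(abstract))  # John Doe (SSN: 123-45-6789) paid Jane Smith
vault.destroy()
```

Because both the ID and the tag are keyed by a fresh 32-byte secret, a new session produces unrelated tokens for the same value. The 6-hex-digit tag is only 24 bits, so a production design would likely use a longer tag; the sketch keeps it short to mirror the README's token shape.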

Detection layers

The three layers run in sequence. Layer 1 (regex) is always active; Layers 2 and 3 are optional, so detection never falls below Layer 1 reliability.

Layer	Method	Confidence	Requires
1 — Regex	Deterministic structural patterns	1.0	Nothing (always active)
2 — GLiNER	Probabilistic NLP NER	0.85× model score	`pip install sovereign-vault[ner]`
3 — Ollama	Contextual LLM sweep	0.65	Local Ollama + `pip install sovereign-vault[llm]`

Layer 3 triggers only when GLiNER finds fewer than 3 entities. It handles implicit identifiers and role references that regex and NER miss.
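The cascade and its trigger rule can be sketched in plain Python with the three layers passed in as callables. This is an illustration under assumptions, not the library's code: `Span`, `detect`, the detector signature, the de-duplication rule (keep the highest-confidence hit for the same text), and running Layer 3 when GLiNER is disabled (treated as zero NER hits) are all assumptions. The confidence values (1.0, 0.85 × model score, 0.65) and the fewer-than-3 trigger come from the table and sentence above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# A detector returns (label, matched_text, raw_score) triples.
Detector = Callable[[str], List[Tuple[str, str, float]]]


@dataclass
class Span:
    label: str
    text: str
    confidence: float
    source_layer: str


def detect(
    text: str,
    regex_layer: Detector,
    ner_layer: Optional[Detector] = None,
    llm_layer: Optional[Detector] = None,
    llm_trigger: int = 3,
) -> List[Span]:
    # Layer 1: deterministic patterns, always run, full confidence.
    spans = [Span(lbl, hit, 1.0, "regex") for lbl, hit, _ in regex_layer(text)]

    # Layer 2: NER, confidence scaled from the model's own score.
    ner_hits = ner_layer(text) if ner_layer is not None else []
    spans += [Span(lbl, hit, 0.85 * score, "gliner") for lbl, hit, score in ner_hits]

    # Layer 3: contextual LLM sweep, only when NER found few entities.
    if llm_layer is not None and len(ner_hits) < llm_trigger:
        spans += [Span(lbl, hit, 0.65, "ollama") for lbl, hit, _ in llm_layer(text)]

    # Keep one span per matched text, preferring the most confident layer.
    best: dict = {}
    for span in spans:
        if span.text not in best or span.confidence > best[span.text].confidence:
            best[span.text] = span
    return list(best.values())


# Stub layers standing in for real regex, GLiNER, and Ollama calls.
def regex_stub(text: str) -> List[Tuple[str, str, float]]:
    return [("SSN", "123-45-6789", 1.0)]


def ner_stub(text: str) -> List[Tuple[str, str, float]]:
    return [("PERSON", "John Doe", 0.9), ("SSN", "123-45-6789", 0.95)]


def llm_stub(text: str) -> List[Tuple[str, str, float]]:
    return [("ROLE", "the defendant", 1.0)]


for s in detect("John Doe (SSN 123-45-6789), the defendant", regex_stub, ner_stub, llm_stub):
    print(s.source_layer, s.label, s.text, round(s.confidence, 3))
```

With only two NER hits the LLM sweep runs, and the SSN found by both regex and NER keeps the regex layer's 1.0 confidence.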

Regex catches: SSNs, phone numbers, email addresses, IP addresses, credit card numbers, passport numbers, Michigan driver's license numbers, court case numbers

GLiNER catches: person names, organizations, locations, addresses, DOB, financial accounts, government IDs, medical record numbers

Ollama catches: contextual identifiers — "the defendant", "Account #XYZ", implicit role-based references
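For Layer 1, here is a minimal sketch of structural matching for four of the listed categories. The patterns in `SKETCH_PATTERNS` are hypothetical, simplified, US-centric examples written for this illustration; they are not the library's `REGEX_PATTERNS`, and the credit card, passport, Michigan driver's license, and court case formats are omitted.

```python
import re
from typing import List, Tuple

# Hypothetical, simplified patterns for illustration only.
SKETCH_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "PHONE": re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b"),
    "IPV4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
}


def regex_layer(text: str) -> List[Tuple[str, str, int, int]]:
    """Return (label, matched_text, start, end) for every structural match, in text order."""
    hits = []
    for label, pattern in SKETCH_PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append((label, m.group(0), m.start(), m.end()))
    return sorted(hits, key=lambda hit: hit[2])


sample = "John (SSN: 123-45-6789) via john@firm.com from 10.0.0.12, call 555-123-4567."
for label, value, start, end in regex_layer(sample):
    print(label, value, start, end)
```

Note that "John" is not matched: names carry no fixed structure, which is why they are left to the NER and LLM layers. Offsets are returned so a tokenizer can replace matches without searching the text again.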

Installation

# Core (regex only, no dependencies)
pip install sovereign-vault

# With NLP entity recognition (quote extras so shells such as zsh don't expand the brackets)
pip install "sovereign-vault[ner]"

# With local LLM sweep (requires Ollama running locally)
pip install "sovereign-vault[llm]"

# Everything
pip install "sovereign-vault[all]"

Usage

Basic round-trip

from sovereign_vault import VaultSession

raw = "Alice (alice@corp.com, SSN 123-45-6789) authorized the transfer."

with VaultSession(use_gliner=False, use_ollama=False) as vault:
    abstract = vault.tokenize(raw)
    # Send `abstract` to cloud AI
    cloud_response = call_your_cloud_ai(abstract)
    restored = vault.reconstruct(cloud_response)

LENIENT mode — cloud paraphrased some tokens

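from sovereign_vault import VaultSession, ReconMode  # assumed top-level export

with VaultSession(recon_mode=ReconMode.LENIENT) as vault:
    abstract = vault.tokenize(raw)
    cloud_response = call_your_cloud_ai(abstract)
    # Won't raise even if the cloud dropped or paraphrased some tokens;
    # missing tokens are logged as warnings instead
    restored = vault.reconstruct(cloud_response)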

SEALED mode — abstract output only, no reconstruction

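from sovereign_vault import VaultSession, SealMode  # assumed top-level export

with VaultSession(seal_mode=SealMode.SEALED) as vault:
    abstract = vault.tokenize(raw)
    # Reconstruction is intentionally disabled: calling vault.reconstruct()
    # here raises VaultSealBreach. Use this mode when the abstract output
    # IS the final product.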

Audit log — chain of custody, no real values

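vault = VaultSession()
try:
    vault.tokenize(raw)
    for entry in vault.audit_log():
        # Each entry records what was detected and by which layer, never the real value
        print(entry["label"], entry["source_layer"], entry["confidence"])
finally:
    vault.destroy()  # wipe the mapping even if tokenization raises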

Multi-session / server use

from sovereign_vault import new_session, get_session, drop_session

sid, vault = new_session()
abstract = vault.tokenize(raw)
# ... pass sid to the next step in your pipeline ...
vault2 = get_session(sid)
restored = vault2.reconstruct(cloud_output)  # cloud_output is the cloud AI's response to abstract
drop_session(sid)  # destroys and deregisters
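Server-style use needs a process-wide registry keyed by session ID. The sketch below shows one plausible shape for it; `SessionRegistry`, `SketchSession`, and the use of `secrets.token_urlsafe` for IDs are assumptions for illustration, not how `new_session`, `get_session`, and `drop_session` are actually implemented.

```python
import secrets
import threading
from typing import Dict, Tuple


class SketchSession:
    """Stand-in for a vault session: holds an in-memory token map."""

    def __init__(self) -> None:
        self.mapping: Dict[str, str] = {}
        self.destroyed = False

    def destroy(self) -> None:
        self.mapping.clear()
        self.destroyed = True


class SessionRegistry:
    """Thread-safe, in-memory registry of live sessions keyed by random IDs."""

    def __init__(self) -> None:
        self._sessions: Dict[str, SketchSession] = {}
        self._lock = threading.Lock()

    def new(self) -> Tuple[str, SketchSession]:
        sid = secrets.token_urlsafe(16)  # unguessable session ID
        session = SketchSession()
        with self._lock:
            self._sessions[sid] = session
        return sid, session

    def get(self, sid: str) -> SketchSession:
        with self._lock:
            session = self._sessions.get(sid)
        if session is None:
            raise KeyError(f"unknown or already dropped session: {sid}")
        return session

    def drop(self, sid: str) -> None:
        with self._lock:
            session = self._sessions.pop(sid, None)
        if session is not None:
            session.destroy()  # wipe, then forget


registry = SessionRegistry()
sid, session = registry.new()
session.mapping["[[PERSON_x_y]]"] = "Alice"
assert registry.get(sid) is session  # a later pipeline step finds the same session
registry.drop(sid)
```

Unguessable IDs matter because anyone holding a session ID could ask that session to reconstruct real values; the lock keeps registration and removal safe when request handlers run on multiple threads.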

Security model

RAM-only, session-scoped — no disk writes, no persistence between sessions
HMAC-bound tokens — each token carries an HMAC tag derived from a 32-byte session secret; tampered or injected tokens raise VaultSealBreach
Injection prevention — input containing pre-existing [[...]] vault token format is rejected immediately
Entropy leak detection — reconstruct() flags high-entropy tokens in cloud output that may be inferred identifiers
Best-effort memory wipe — destroy() overwrites real values with random bytes before clearing
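Two of the checks listed above are easy to illustrate with the standard library. The sketch below rejects input that already contains `[[...]]` token syntax and flags long alphanumeric runs in cloud output whose Shannon entropy looks random. It is an assumption-laden illustration, not the library's logic: the function names, the 12-character minimum, and the 3.5 bits-per-character threshold are invented for this example.

```python
import math
import re
from collections import Counter
from typing import List

TOKEN_SHAPE = re.compile(r"\[\[[^\[\]]*\]\]")  # anything that looks like [[...]]
CANDIDATE = re.compile(r"\b[A-Za-z0-9]{12,}\b")  # long alphanumeric runs


def reject_injection(text: str) -> str:
    """Refuse input that already carries vault-style token syntax."""
    if TOKEN_SHAPE.search(text):
        raise ValueError("input already contains [[...]] token syntax")
    return text


def shannon_entropy(s: str) -> float:
    """Bits per character of the empirical character distribution of s."""
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def flag_high_entropy(text: str, threshold: float = 3.5) -> List[str]:
    """Return long runs in cloud output that look random enough to be identifiers."""
    return [w for w in CANDIDATE.findall(text) if shannon_entropy(w) >= threshold]


reject_injection("Alice paid Bob.")  # passes through unchanged
print(flag_high_entropy("The transactions reference ZK4Q9X2M7PLA8 twice."))  # ['ZK4Q9X2M7PLA8']
```

A string of length n scores at most log2(n) bits per character, so only fairly long runs with mostly distinct characters can cross 3.5; ordinary English words repeat letters and usually score lower ("transactions" scores about 2.9). Vault tokens themselves are not flagged because the underscores prevent a word boundary inside them. Both numbers would need tuning on real data.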

Reconstruction modes

Mode	Behavior
`ReconMode.STRICT` (default)	Raises `VaultReconstructionDegraded` if cloud dropped any vault token
`ReconMode.LENIENT`	Allows partial reconstruction — logs missing tokens as warnings
`SealMode.SEALED`	Disables reconstruction entirely — raises `VaultSealBreach` if attempted
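The table above can be read as a small decision procedure. In the sketch below, which is not the library's code, a single hypothetical `Mode` enum folds `ReconMode` and `SealMode` together for brevity, and `ReconstructionDegraded` and `SealBreach` are local stand-ins for the README's `VaultReconstructionDegraded` and `VaultSealBreach`. A token counts as dropped if it appeared in the abstract sent to the cloud but not in the response.

```python
import logging
import re
from enum import Enum
from typing import Dict

log = logging.getLogger("recon_sketch")
TOKEN = re.compile(r"\[\[[^\[\]]+\]\]")


class Mode(Enum):
    STRICT = "strict"
    LENIENT = "lenient"
    SEALED = "sealed"


class ReconstructionDegraded(Exception):
    """Local stand-in for VaultReconstructionDegraded."""


class SealBreach(Exception):
    """Local stand-in for VaultSealBreach."""


def reconstruct(abstract: str, response: str, mapping: Dict[str, str],
                mode: Mode = Mode.STRICT) -> str:
    if mode is Mode.SEALED:
        raise SealBreach("reconstruction is disabled for sealed sessions")
    # Tokens the cloud was given but did not echo back.
    missing = sorted(set(TOKEN.findall(abstract)) - set(TOKEN.findall(response)))
    if missing and mode is Mode.STRICT:
        raise ReconstructionDegraded(f"cloud dropped {len(missing)} token(s): {missing}")
    for token in missing:
        log.warning("token missing from cloud output: %s", token)
    # Replace every known token; leave unknown [[...]] text untouched.
    return TOKEN.sub(lambda m: mapping.get(m.group(0), m.group(0)), response)


mapping = {"[[PERSON_a1]]": "Alice", "[[SSN_b2]]": "123-45-6789"}
abstract = "[[PERSON_a1]] filed with SSN [[SSN_b2]]."
print(reconstruct(abstract, "[[PERSON_a1]] is the filer.", mapping, Mode.LENIENT))  # Alice is the filer.
```

In LENIENT mode the dropped `[[SSN_b2]]` only produces a logged warning; the same call in STRICT mode raises instead.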

Use cases

Forensic e-discovery: send document patterns to cloud AI without exposing real names or case numbers
HIPAA pipelines: analyze cross-entity relationships in medical records without raw patient identifiers leaving your perimeter
Financial fraud detection: analyze transaction patterns without exposing raw account numbers
Gov/defense document processing: reason about relationships in sensitive case files
Cross-agent PII passing: sanitize data moving between local and cloud agents in an agentic pipeline

Part of the LexiPro Sovereign OS

Sovereign Vault is a component of LexiPro — a local-first agentic OS running 15 MCP servers, 228 tools, and 20 agent personas on sovereign hardware. In the full OS, it powers Workflow O (Privacy Bridge): tokenize before any cloud call, reconstruct locally after, audit trail preserved.

Anthropic Claude — Tier 5 reasoning backbone for multi-file analysis
Google Gemini — OSINT, research, and long-context processing
Ollama — Layer 3 local LLM sweep (Gemma, Llama) for contextual entity detection
GLiNER — Layer 2 NLP NER for named entity recognition

Contributing

Issues and PRs welcome. The detection layer system is designed for extension: add new regex patterns to REGEX_PATTERNS, add new GLiNER entity types to _GLINER_TYPES, or swap the Ollama model via the ollama_model parameter.

License

MIT — see LICENSE.

Built by Broken Arrow Entertainment LLC · Sovereign Intelligence Systems Group

Release history

1.0.1 (May 6, 2026)

1.0.0 (May 6, 2026; this version)

Download files

Source Distribution

sovereign_vault-1.0.0.tar.gz (15.9 kB, uploaded May 6, 2026)

Built Distribution

sovereign_vault-1.0.0-py3-none-any.whl (11.5 kB, Python 3, uploaded May 6, 2026)

File details

Details for the file sovereign_vault-1.0.0.tar.gz.

File metadata

Download URL: sovereign_vault-1.0.0.tar.gz
Upload date: May 6, 2026
Size: 15.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for sovereign_vault-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`ceaa211c39e168e6e8365a236a393de0de38494d86d7938473e1891e1e6872d2`
MD5	`ff44bd22d7bf54651f11d08229812641`
BLAKE2b-256	`dd43bb28f210afeec70e09fd5088e7de8ecff8c3421cf775d0101625133900e1`

File details

Details for the file sovereign_vault-1.0.0-py3-none-any.whl.

File metadata

Download URL: sovereign_vault-1.0.0-py3-none-any.whl
Upload date: May 6, 2026
Size: 11.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for sovereign_vault-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8080bea201502331890e94ec2fbaaa1ffae115f1ec66d372549d98576e733bc1`
MD5	`de161a2184099eb5fbaa4b9013ae3e3f`
BLAKE2b-256	`a542f15b64944a0a0eebbc0a953036fc551c0401eeb013240ea9ee6f8245b161`

sovereign-vault 1.0.0
