Reversible PII anonymization framework for LLM data pipelines

carnaval

The art of masking: concealing identity, preserving the essentials.

carnaval is a Python framework for reversible anonymization of text documents. It masks sensitive entities (people, organizations, emails, phone numbers, bank identifiers, etc.) before the text is sent to a cloud LLM, and restores the original values in the structured response (JSON or XML) on the way back.

Status: Stable (Beta) - v0.2.3

License: Apache 2.0
Stack: Python 3.11 / 3.12 / 3.13, GLiNER (zero-shot NER), regex, AES-256-GCM, PyMuPDF
No external PII framework (no Presidio, no spaCy NER)
184 tests passing, ~95% coverage, mypy-checked, CI on every push
Used internally in production at one enterprise (anonymization of supplier acknowledgments before LLM extraction). Public API may evolve until v1.0.

Installation

Standard Installation (from PyPI)

pip install carnaval

Development / Local Source Installation

# 1. Clone the repository
git clone <repo>
cd carnaval

# 2. Set up virtual environment
python -m venv .venv
source .venv/bin/activate       # Linux/macOS
# or: .\.venv\Scripts\activate  # Windows

# 3. Install in editable mode
pip install -e .

Quick Start

1. Configuration

Create and edit your .env file to set your vault encryption password:

cp .env.example .env
# Edit .env and set CARNAVAL_VAULT_PASSWORD=<32+ characters>
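
If you need a value for CARNAVAL_VAULT_PASSWORD, the sketch below shows one way to generate and validate it with the Python standard library. The function names are hypothetical and are not part of carnaval. The key-derivation step is only an assumption about how a password could be turned into a 256-bit key for AES-256-GCM; the README does not state which derivation carnaval actually uses.

```python
import hashlib
import os
import secrets

MIN_PASSWORD_LENGTH = 32  # the README asks for 32+ characters


def generate_vault_password(n_bytes: int = 32) -> str:
    """Return a random URL-safe password (32 random bytes give 43 characters)."""
    return secrets.token_urlsafe(n_bytes)


def check_vault_password(password: str) -> None:
    """Raise ValueError if the password is shorter than the documented minimum."""
    if len(password) < MIN_PASSWORD_LENGTH:
        raise ValueError(
            f"vault password must be at least {MIN_PASSWORD_LENGTH} characters, "
            f"got {len(password)}"
        )


def derive_key(password: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Derive a 32-byte key with PBKDF2-HMAC-SHA256.

    Assumption for illustration only: carnaval's real derivation may differ.
    """
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )


if __name__ == "__main__":
    password = generate_vault_password()
    check_vault_password(password)
    key = derive_key(password, salt=os.urandom(16))
    print(f"CARNAVAL_VAULT_PASSWORD={password}")
    print(f"derived key length: {len(key)} bytes")
```

Paste the printed line into .env. Keep the password somewhere safe: without it, a vault encrypted under that password cannot be decrypted.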

2. Anonymization

Anonymize a document using one of the pre-configured business profiles:

python anonymize.py inbox/my_document.txt --profile acknowledge

3. Reinjection

Restore the original sensitive values in the LLM's structured response (JSON or XML):

python reinject.py response_llm.json --vault outbox/vault/my_document_vault.enc
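
Conceptually, reinjection walks the structured response and swaps each placeholder for its original value. The sketch below illustrates that idea on parsed JSON. It does not use carnaval's API: the `reinject` function, the `[LABEL_N]` placeholder syntax, and the plain-dict mapping (standing in for a decrypted vault) are all assumptions made for the example.

```python
import json
import re
from typing import Any

# Hypothetical placeholder syntax such as [PERSON_1]; carnaval's real format may differ.
PLACEHOLDER = re.compile(r"\[[A-Z]+_\d+\]")


def reinject(node: Any, mapping: dict[str, str]) -> Any:
    """Recursively replace known placeholders in every string value of a JSON tree."""
    if isinstance(node, str):
        # Unknown placeholders are left untouched rather than dropped.
        return PLACEHOLDER.sub(lambda m: mapping.get(m.group(0), m.group(0)), node)
    if isinstance(node, list):
        return [reinject(item, mapping) for item in node]
    if isinstance(node, dict):
        return {key: reinject(value, mapping) for key, value in node.items()}
    return node  # numbers, booleans and None pass through unchanged


# Stand-in for the decrypted vault: placeholder -> original value.
mapping = {"[PERSON_1]": "Jane Doe", "[EMAIL_1]": "jane.doe@acme.example"}

llm_response = json.loads(
    '{"contact": "[PERSON_1]", "emails": ["[EMAIL_1]"],'
    ' "note": "Ask [PERSON_1] about [ORG_9]", "lines": 3}'
)
restored = reinject(llm_response, mapping)

if __name__ == "__main__":
    print(json.dumps(restored, indent=2))
```

Leaving unknown placeholders in place (here `[ORG_9]`) makes a mismatched vault visible in the output instead of silently losing data.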

7-Stage Architecture

Raw TXT --> S1 Intake
        --> S2 Preprocess (language, normalization)
        --> S3 Detect (regex + denylist + GLiNER)
        --> S4 Resolve (dedup, arbitration)
        --> S5 Mask (placeholders + encrypted vault)
        --> S6 Output (6 formats: txt/json/jsonl/xml/conll/html)

JSON/XML --> S7 Reinject --> JSON/XML with original values
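
To make stages S3 to S5 concrete, here is a minimal, self-contained sketch of detect, resolve, and mask using only regular expressions. It is not carnaval's implementation: the `Span` class, the two toy patterns, the longest-span arbitration rule, and the `[LABEL_N]` placeholder format are assumptions chosen for illustration. The vault is kept as an in-memory dict here, whereas the framework stores it encrypted.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int
    end: int
    label: str
    text: str


# Deliberately simple illustrative patterns, not the project's recognizers.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "DOMAIN": re.compile(r"\b[\w-]+\.(?:example|com|org)\b"),
}


def detect(text: str) -> list[Span]:
    """S3 (sketch): collect every candidate span from every pattern."""
    return [
        Span(m.start(), m.end(), label, m.group(0))
        for label, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]


def resolve(spans: list[Span]) -> list[Span]:
    """S4 (sketch): drop overlapping candidates, keeping the longest span."""
    kept: list[Span] = []
    for span in sorted(spans, key=lambda s: (-(s.end - s.start), s.start)):
        if all(span.end <= k.start or span.start >= k.end for k in kept):
            kept.append(span)
    return sorted(kept, key=lambda s: s.start)


def mask(text: str, spans: list[Span]) -> tuple[str, dict[str, str]]:
    """S5 (sketch): replace spans with placeholders; equal values share one."""
    vault: dict[str, str] = {}                  # placeholder -> original value
    by_value: dict[tuple[str, str], str] = {}   # (label, value) -> placeholder
    counters: dict[str, int] = {}
    parts: list[str] = []
    cursor = 0
    for span in spans:  # spans must be sorted and non-overlapping
        key = (span.label, span.text)
        if key not in by_value:
            counters[span.label] = counters.get(span.label, 0) + 1
            by_value[key] = f"[{span.label}_{counters[span.label]}]"
            vault[by_value[key]] = span.text
        parts.append(text[cursor:span.start])
        parts.append(by_value[key])
        cursor = span.end
    parts.append(text[cursor:])
    return "".join(parts), vault


if __name__ == "__main__":
    source = "Write to jane.doe@acme.example or visit globex.example."
    print(mask(source, resolve(detect(source))))
```

In this sketch the domain pattern also matches `acme.example` inside the email address; the resolve step discards it because the longer email span wins. Reusing one placeholder per distinct value keeps references consistent for the LLM.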

Out-of-the-box Business Profiles

Profile	Document Type
`acknowledge`	Supplier order acknowledgment
`invoice`	Invoice / professional fee note
`email`	B2B professional email

Private profiles that contain real client data live in profiles_private/, which is git-ignored.

Documentation

Doc	Topic
docs/00_overview.md	Overview, principles
docs/01_architecture_etages.md	The 7 stages in detail
docs/02_install.md	Installation
docs/03_deploiement_production.md	Production
docs/04_configuration.md	YAML config + profiles
docs/05_extension_listes.md	Adding entities to mask
docs/06_extension_recognizers.md	Coding a new recognizer
docs/07_securite.md	Vault, password, audit
docs/08_format_entree_sortie.md	Supported formats
docs/09_troubleshooting.md	Common errors
docs/10_api_reference.md	Python API

Tests

pytest                          # all (except slow)
pytest -m slow                  # real AI tests (downloads GLiNER ~500 MB)
pytest --cov=src/carnaval       # coverage

Examples

You can find programmatic library usage examples in the examples/ directory:

examples/quickstart_api.py: a short, commented Python script that uses the library programmatically to anonymize data and then reinject the original values into simulated LLM output.

Contributing

Contributions are welcome! Please read CONTRIBUTING.md and our CODE_OF_CONDUCT.md before getting started.

Issues and PRs: Welcome! Please ensure no personal or client data is included in public fixtures (use fictitious entities like Acme Corp, Globex, Initech, etc.).
Security Policy: To report a security vulnerability, follow SECURITY.md, which describes how to disclose it responsibly by email.

License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

Release history

Version	Released	Notes
0.2.3	Jun 2, 2026	This version
0.2.2	Jun 1, 2026	
0.2.1	May 31, 2026	
0.2.0	May 31, 2026	
0.1.2	May 16, 2026	
0.1.1	May 16, 2026	
0.1.0	May 16, 2026	Yanked: broken packaging, pip install non-functional; use 0.1.1

Download files

Source Distribution

carnaval-0.2.3.tar.gz (1.0 MB)

Uploaded Jun 2, 2026 Source

Built Distribution

carnaval-0.2.3-py3-none-any.whl (1.0 MB)

Uploaded Jun 2, 2026 Python 3

File details

Details for the file carnaval-0.2.3.tar.gz.

File metadata

Download URL: carnaval-0.2.3.tar.gz
Upload date: Jun 2, 2026
Size: 1.0 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.6

File hashes

Hashes for carnaval-0.2.3.tar.gz
Algorithm	Hash digest
SHA256	`0ec56aaebb79ba7b5024f22763e39144535b2592bff832f3f04fe5f0cc1bbc53`
MD5	`77c806732b9c9cc20fdf8b4fb9607111`
BLAKE2b-256	`34f74b56013f7725df8eab818d4fe768e54b1eb3b33c08103ab598699a03b835`

File details

Details for the file carnaval-0.2.3-py3-none-any.whl.

File metadata

Download URL: carnaval-0.2.3-py3-none-any.whl
Upload date: Jun 2, 2026
Size: 1.0 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.6

File hashes

Hashes for carnaval-0.2.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b9eb1253cbcfbff108c638cbc09b55de260106beb2ebf3d3afb2bcfaf55a8b06`
MD5	`44a2c491f5215f3c8c0311d8627ea1dd`
BLAKE2b-256	`99476b5f1e56d2ab85542a9c3fb0509da7643aaeeef88b9d1cce8a975bff00f9`
