Give AI a blindfold before it sees your data. HMAC-SHA256 PII sanitization for DataFrames.
venus-pii
Every developer calling an LLM API is sending user data to someone else's server. This library lets you blindfold the AI with one line of code — before the data ever leaves your machine.
from venus_pii import sanitize
safe_df = sanitize(df).sanitized_df
# "张三" → "PERSON_a3f8c21e" (HMAC token, recoverable only through your local token map)
# "110101200001011234" → [REMOVED] (ID card, blocked entirely)
# "85" → "85" (score, passed through)
Your data. Your key. Your rules.
Why
| What happens today | What venus-pii does |
|---|---|
| You send raw names, phone numbers, IDs to ChatGPT/Claude/Gemini | HMAC-SHA256 tokenization — AI sees PERSON_a3f8c21e, never "张三" |
| You trust the provider's privacy policy | You trust your own code — data is masked before it leaves your machine |
| No way to prove what the AI saw | Every masking operation is logged in a SHA-256 hash chain |
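To make the hash-chain idea in the last row concrete, here is a minimal sketch of the general technique using only the standard library. It is not venus-pii's logging code: `append_entry`, `verify_chain`, and the record fields are hypothetical names chosen for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON form of the event."""
    payload = prev_hash + json.dumps(event, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log: list, event: dict) -> None:
    """Append an event whose hash commits to every earlier entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)})


def verify_chain(log: list) -> bool:
    """Recompute every hash; an edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


audit_log = []
append_entry(audit_log, {"column": "name", "action": "mask", "rows": 2})
append_entry(audit_log, {"column": "ssn", "action": "block", "rows": 2})
print(verify_chain(audit_log))  # True

audit_log[0]["event"]["rows"] = 200  # tamper with an old entry
print(verify_chain(audit_log))  # False
```

Because each hash covers the previous one, changing any earlier record invalidates it and everything after it, which is what makes such a log usable as evidence of what was masked.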
Install
pip install venus-pii
Three Protection Levels
| Level | What happens | Example |
|---|---|---|
| BLOCK | Column removed entirely. AI never sees it. | ID cards, bank accounts |
| MASK | Values replaced with HMAC tokens. Reversible only with the token map kept on your machine. | Names, phones, emails |
| PASS | No change. Non-sensitive data passes through. | Scores, dates, categories |
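The three levels amount to a per-column policy. The sketch below shows one way such a policy could be applied to a plain dict of columns. The `Level` enum, `apply_levels`, and the choice to default unlisted columns to PASS are assumptions for illustration, not the library's internals.

```python
from enum import Enum


class Level(Enum):
    BLOCK = "block"
    MASK = "mask"
    PASS = "pass"


def apply_levels(table: dict, levels: dict, mask_fn) -> dict:
    """Drop BLOCK columns, transform MASK columns with mask_fn, copy the rest."""
    out = {}
    for column, values in table.items():
        level = levels.get(column, Level.PASS)  # assumption: unlisted columns pass through
        if level is Level.BLOCK:
            continue  # the column never reaches the output
        if level is Level.MASK:
            out[column] = [mask_fn(v) for v in values]
        else:
            out[column] = list(values)
    return out


table = {"name": ["Alice", "Bob"], "ssn": ["123-45-6789", "987-65-4321"], "score": [95, 88]}
levels = {"name": Level.MASK, "ssn": Level.BLOCK}
# The lambda is a stand-in for a real HMAC tokenizer.
safe = apply_levels(table, levels, mask_fn=lambda v: "PERSON_" + "x" * 8)

print(list(safe))     # ['name', 'score']
print(safe["score"])  # [95, 88]
```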
Usage
Basic — One line
import polars as pl
from venus_pii import sanitize
df = pl.DataFrame({
    "name": ["Alice", "Bob"],
    "ssn": ["123-45-6789", "987-65-4321"],
    "score": [95, 88],
})
result = sanitize(df)
print(result.sanitized_df)
# name: PERSON_7a3f..., PERSON_b2e9...
# ssn: [column removed]
# score: 95, 88
print(result.blocked_columns) # ["ssn"]
print(list(result.token_maps.keys())) # ["name"]
Restore after processing
from venus_pii import sanitize, restore
result = sanitize(df)
safe_df = result.sanitized_df
# ... send safe_df to LLM, get results back ...
original_df = restore(safe_df, result.token_maps)
# "PERSON_7a3f..." → "Alice"
Custom HMAC key
export VENUS_PII_KEY="my-secret-enterprise-key"
Same name + same key = same token (deterministic across sessions). Different key = different token (multi-tenant isolation).
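Both properties follow directly from HMAC. The sketch below demonstrates them with the standard `hmac` module. `token_for` is a hypothetical helper, and how venus-pii actually reads `VENUS_PII_KEY` is an assumption here. The `PERSON_` prefix and 8-character truncation mirror the examples in this README.

```python
import hashlib
import hmac
import os


def token_for(value: str, key: bytes, prefix: str = "PERSON") -> str:
    """Hypothetical helper: HMAC-SHA256 the value, keep the first 8 hex characters."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}_{digest[:8]}"


# Assumption: the key comes from VENUS_PII_KEY; the fallback is for this demo only.
key_a = os.environ.get("VENUS_PII_KEY", "my-secret-enterprise-key").encode("utf-8")
key_b = b"another-tenant-key"

# Same value and same key always give the same token.
print(token_for("Alice", key_a) == token_for("Alice", key_a))  # True

# A different key gives an unrelated token, so tenants cannot correlate each other's data.
print(token_for("Alice", key_a) == token_for("Alice", key_b))
```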
Detect without masking
from venus_pii import detect
reports = detect(df)
for r in reports:
    print(f"{r.column_name}: {r.category} ({r.level})")
# name: name (mask)
# ssn: id_card (block)
# score: none (pass)
Supported PII Categories
| Category | Detection | Level | Contribute? |
|---|---|---|---|
| Names (Chinese/English) | Column name pattern | MASK | #1 |
| Phone numbers | Regex `1[3-9]\d{9}` | MASK | #2 |
| ID cards (China) | Regex 18-digit | BLOCK | #3 |
| Emails | Regex `*@*.*` | MASK | |
| Addresses | Column name pattern | MASK | |
| Salary/Income | Column name + band mapping | MASK | |
| Bank accounts | Column name pattern | BLOCK | |
| Japanese names | — | — | Wanted! #10 |
| Korean names | — | — | Wanted! #11 |
| Medical records | — | — | Wanted! #12 |
| GDPR categories | — | — | Wanted! #13 |
| US SSN | — | — | Wanted! #14 |
Every new PII detector is one function + one PR. See CONTRIBUTING.md.
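As an illustration of the "one function" idea, here is a sketch of what a US SSN detector (one of the wanted categories above) might look like. The detector interface is defined in CONTRIBUTING.md and is not shown here, so the signature, return type, and `threshold` parameter below are hypothetical.

```python
import re

# Assumption: SSNs are written as AAA-GG-SSSS; real data may omit the dashes.
SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")
SSN_COLUMN_NAME = re.compile(r"(^|_)(ssn|social_security)(_|$)")


def detect_us_ssn(column_name: str, values: list, threshold: float = 0.8) -> bool:
    """Hypothetical detector: flag a column when its name or most of its values look like SSNs."""
    if SSN_COLUMN_NAME.search(column_name.lower()):
        return True
    non_null = [str(v) for v in values if v is not None]
    if not non_null:
        return False
    hits = sum(1 for v in non_null if SSN_PATTERN.fullmatch(v))
    return hits / len(non_null) >= threshold


print(detect_us_ssn("ssn", []))                             # True
print(detect_us_ssn("id", ["123-45-6789", "987-65-4321"]))  # True
print(detect_us_ssn("score", [95, 88]))                     # False
```

Checking both the column name and a share of the values keeps the detector useful when columns are poorly named, at the cost of scanning the data.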
How HMAC tokenization works
"张三" + secret_key
→ HMAC-SHA256 → a3f8c21e78b4...
→ "PERSON_a3f8c21e"
- Deterministic: same input + same key = same token (join tables still work)
- Irreversible: can't recover "张三" from "PERSON_a3f8c21e" without the reverse map
- Isolated: different key = completely different tokens (multi-tenant safe)
The reverse map (token_maps) stays on your machine. The LLM never sees it.
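The flow above, including the local reverse map, can be sketched with the standard library alone. This is an illustration under stated assumptions, not the library's implementation: `tokenize_column` and `restore_column` are hypothetical names, and the demo key is a placeholder.

```python
import hashlib
import hmac


def tokenize_column(values, key: bytes, prefix: str = "PERSON"):
    """Return (masked_values, reverse_map) for one column of strings."""
    reverse_map = {}
    masked = []
    for value in values:
        digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
        token = f"{prefix}_{digest[:8]}"
        reverse_map[token] = value  # stays local; never sent to the LLM
        masked.append(token)
    return masked, reverse_map


def restore_column(masked, reverse_map):
    """Swap tokens back to originals; unknown values pass through unchanged."""
    return [reverse_map.get(v, v) for v in masked]


key = b"demo-key-do-not-use-in-production"
names = ["张三", "Alice", "张三"]
masked, reverse_map = tokenize_column(names, key)

print(masked[0] == masked[2])                        # True: repeated values share a token
print(restore_column(masked, reverse_map) == names)  # True
```

Keeping only 8 hex characters leaves 32 bits of the digest, so two different values in a very large column could end up with the same token. A production implementation would need to detect or avoid such collisions.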
Part of the Venus Protocol
venus-pii is the privacy layer of Venus — a white-box AI data processing engine.
The Venus philosophy: AI is a beautiful goddess, but she must have severed arms. Humans control what AI can touch, what it can see, and what it can do — through auditable, reversible, white-box constraints.
License
MIT
File details
Details for the file venus_pii-0.1.0.tar.gz.
File metadata
- Download URL: venus_pii-0.1.0.tar.gz
- Upload date:
- Size: 7.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3dc58ae86c958b0a4e35b72cf40d40688f46407c8b0f2e3c7c1b02959a8fdc34 |
| MD5 | 6fd2e9ecfe72e5d7ddd5f574f7e6d357 |
| BLAKE2b-256 | aeca0949c234ef445e37e20f3a05df6abe78e79b927fc542d97505b59098fcd2 |
File details
Details for the file venus_pii-0.1.0-py3-none-any.whl.
File metadata
- Download URL: venus_pii-0.1.0-py3-none-any.whl
- Upload date:
- Size: 7.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 810b9f96f073227829e93303aadbe936185e6f9596faa0ef483cfb5ea563c9b0 |
| MD5 | d9164da44e345348d00b963c6741dd6a |
| BLAKE2b-256 | f3621cf489556e446845e2405d03ad8aef50f6a6f1be7f5f3c79b255b9479642 |