Give AI a blindfold before it sees your data. HMAC-SHA256 PII sanitization for DataFrames.

venus-pii

Every developer calling an LLM API is sending user data to someone else's server. This library lets you blindfold the AI with one line of code — before the data ever leaves your machine.

from venus_pii import sanitize

result = sanitize(df)
safe_df = result.sanitized_df
# "张三" → "PERSON_a3f8c21e"    (HMAC token, not recoverable without your local token map)
# "110101200001011234" → [REMOVED]  (ID card, blocked entirely)
# "85" → "85"                      (score, passed through)

Your data. Your key. Your rules.


Why

What happens today | What venus-pii does
------------------ | -------------------
You send raw names, phone numbers, and IDs to ChatGPT/Claude/Gemini | HMAC-SHA256 tokenization: the AI sees PERSON_a3f8c21e, never "张三"
You trust the provider's privacy policy | You trust your own code: data is masked before it leaves your machine
There is no way to prove what the AI saw | Every masking operation is logged in a SHA-256 hash chain
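
To make the hash-chain idea concrete, here is a minimal sketch using only the standard library. It is not venus-pii's actual log format: the entry fields, `append_entry`, `verify_chain`, and the all-zero genesis value are assumptions made for illustration.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder "previous hash" for the first entry


def append_entry(log, event):
    """Append an event whose hash also covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    payload = json.dumps(event, sort_keys=True, ensure_ascii=False)
    entry_hash = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
    log.append({"event": event, "prev_hash": prev_hash, "hash": entry_hash})


def verify_chain(log):
    """Recompute every hash; editing any earlier entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in log:
        payload = json.dumps(entry["event"], sort_keys=True, ensure_ascii=False)
        expected = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
        if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True


audit_log = []
append_entry(audit_log, {"column": "name", "action": "mask", "rows": 2})
append_entry(audit_log, {"column": "ssn", "action": "block", "rows": 2})
print(verify_chain(audit_log))  # True

audit_log[0]["event"]["rows"] = 999  # tamper with history
print(verify_chain(audit_log))  # False
```

Because each hash covers the previous one, changing an old entry invalidates every entry after it, which is what makes such a log usable as evidence of what was masked.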

Install

pip install venus-pii

Three Protection Levels

Level | What happens | Example
----- | ------------ | -------
BLOCK | Column removed entirely. The AI never sees it. | ID cards, bank accounts
MASK | Values replaced with HMAC tokens. Reversible only with the token map kept on your machine. | Names, phones, emails
PASS | No change. Non-sensitive data passes through. | Scores, dates, categories
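
The three levels can be pictured as a per-column dispatch. The sketch below works on a plain dict of columns rather than a DataFrame, and `Level`, `apply_levels`, and the PASS default for unlisted columns are hypothetical names and choices, not the library's internals.

```python
import hashlib
import hmac
from enum import Enum


class Level(Enum):
    BLOCK = "block"
    MASK = "mask"
    PASS = "pass"


def apply_levels(columns, levels, mask_value):
    """Return a new column dict with each column handled per its level.

    Columns without an explicit level default to PASS in this sketch.
    """
    out = {}
    for name, values in columns.items():
        level = levels.get(name, Level.PASS)
        if level is Level.BLOCK:
            continue  # drop the column entirely
        if level is Level.MASK:
            out[name] = [mask_value(v) for v in values]
        else:
            out[name] = list(values)
    return out


def demo_mask(value, key=b"demo-key"):
    """Illustrative masker: prefix plus the first 8 hex chars of an HMAC-SHA256."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "PERSON_" + digest[:8]


columns = {
    "name": ["Alice", "Bob"],
    "ssn": ["123-45-6789", "987-65-4321"],
    "score": [95, 88],
}
levels = {"name": Level.MASK, "ssn": Level.BLOCK}

safe = apply_levels(columns, levels, mask_value=demo_mask)
print(sorted(safe))   # ['name', 'score']
print(safe["score"])  # [95, 88]
```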

Usage

Basic — One line

import polars as pl
from venus_pii import sanitize

df = pl.DataFrame({
    "name": ["Alice", "Bob"],
    "ssn": ["123-45-6789", "987-65-4321"],
    "score": [95, 88],
})

result = sanitize(df)
print(result.sanitized_df)
# name: PERSON_7a3f..., PERSON_b2e9...
# ssn: [column removed]
# score: 95, 88

print(result.blocked_columns)   # ["ssn"]
print(list(result.token_maps.keys()))  # ["name"]

Restore after processing

from venus_pii import sanitize, restore

result = sanitize(df)
safe_df = result.sanitized_df

# ... send safe_df to LLM, get results back ...

original_df = restore(safe_df, result.token_maps)
# "PERSON_7a3f..." → "Alice"

Custom HMAC key

export VENUS_PII_KEY="my-secret-enterprise-key"

Same name + same key = same token (deterministic across sessions). Different key = different token (multi-tenant isolation).
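
A rough illustration of both properties, using Python's standard `hmac` module. The `load_key` and `token_for` helpers, the fallback key, and the 8-character truncation are assumptions for this sketch; how venus-pii itself behaves when `VENUS_PII_KEY` is unset is not specified here.

```python
import hashlib
import hmac
import os


def load_key(env_var="VENUS_PII_KEY", default="dev-only-key"):
    """Read the HMAC key from the environment as bytes.

    The fallback default is an assumption of this sketch, not library behavior.
    """
    return os.environ.get(env_var, default).encode("utf-8")


def token_for(value, key, prefix="PERSON"):
    """Build a token from the first 8 hex chars of HMAC-SHA256(key, value)."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}_{digest[:8]}"


key = load_key()
# Same value + same key gives the same token on every run.
print(token_for("Alice", key) == token_for("Alice", key))  # True

# A second tenant with its own key gets unrelated tokens for the same value.
tenant_a = token_for("Alice", b"tenant-a-key")
tenant_b = token_for("Alice", b"tenant-b-key")
print(tenant_a != tenant_b)  # True
```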

Detect without masking

from venus_pii import detect

reports = detect(df)
for r in reports:
    print(f"{r.column_name}: {r.category} ({r.level})")
# name: name (mask)
# ssn: id_card (block)
# score: none (pass)

Supported PII Categories

Category | Detection | Level | Contribute?
-------- | --------- | ----- | -----------
Names (Chinese/English) | Column name pattern | MASK | #1
Phone numbers | Regex 1[3-9]\d{9} | MASK | #2
ID cards (China) | Regex 18-digit | BLOCK | #3
Email | Regex *@*.* | MASK |
Addresses | Column name pattern | MASK |
Salary/Income | Column name + band mapping | MASK |
Bank accounts | Column name pattern | BLOCK |
Japanese names | Wanted! | | #10
Korean names | Wanted! | | #11
Medical records | Wanted! | | #12
GDPR categories | Wanted! | | #13
US SSN | Wanted! | | #14

Every new PII detector is one function + one PR. See CONTRIBUTING.md.
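
As a rough idea of what a regex-based detector function could look like, here is a standalone sketch. The function name, signature, and `min_ratio` threshold are hypothetical and not taken from CONTRIBUTING.md; the phone pattern comes from the table above, while the ID pattern is a loose assumption that also accepts a trailing X check character.

```python
import re

PHONE_RE = re.compile(r"1[3-9]\d{9}")   # pattern from the table above
ID_CARD_RE = re.compile(r"\d{17}[\dXx]")  # assumed shape of an 18-character ID


def detect_by_regex(values, pattern, min_ratio=0.8):
    """Return True when at least min_ratio of non-empty values fully match."""
    present = [str(v) for v in values if v not in (None, "")]
    if not present:
        return False
    hits = sum(1 for v in present if pattern.fullmatch(v))
    return hits / len(present) >= min_ratio


print(detect_by_regex(["13812345678", "15900001111"], PHONE_RE))  # True
print(detect_by_regex(["110101200001011234"], ID_CARD_RE))        # True
print(detect_by_regex(["95", "88"], PHONE_RE))                    # False
```

Matching on a ratio rather than on every value tolerates a few dirty cells in an otherwise sensitive column; the right threshold is a judgment call.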

How HMAC tokenization works

"张三" + secret_key
    → HMAC-SHA256 → a3f8c21e78b4...
    → "PERSON_a3f8c21e"
  • Deterministic: same input + same key = same token (join tables still work)
  • Irreversible: can't recover "张三" from "PERSON_a3f8c21e" without the reverse map
  • Isolated: different key = completely different tokens (multi-tenant safe)

The reverse map (token_maps) stays on your machine. The LLM never sees it.
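
The flow above, including the reverse map, can be sketched with the standard library alone. This is an illustration under assumptions, not the library's code: `tokenize_column`, `restore_column`, and the 8-character digest prefix are names and choices made for the example.

```python
import hashlib
import hmac


def tokenize_column(values, key, prefix="PERSON", digest_chars=8):
    """Replace each value with an HMAC-SHA256 token and build the reverse map."""
    token_map = {}  # token -> original value; this stays on your machine
    tokens = []
    for value in values:
        digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
        token = f"{prefix}_{digest[:digest_chars]}"
        token_map[token] = value
        tokens.append(token)
    return tokens, token_map


def restore_column(tokens, token_map):
    """Map tokens back to originals; unknown tokens are left unchanged."""
    return [token_map.get(t, t) for t in tokens]


key = b"my-secret-enterprise-key"
tokens, token_map = tokenize_column(["张三", "李四", "张三"], key)

print(tokens[0] == tokens[2])             # True: repeated values share a token
print(restore_column(tokens, token_map))  # ['张三', '李四', '张三']
```

Keeping only 8 hex characters (32 bits) of the digest makes tokens short but raises the chance of two values colliding in very large columns; a longer prefix trades readability for safety.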

Part of the Venus Protocol

venus-pii is the privacy layer of Venus — a white-box AI data processing engine.

The Venus philosophy: AI is a beautiful goddess, but she must have severed arms. Humans control what AI can touch, what it can see, and what it can do — through auditable, reversible, white-box constraints.

License

MIT
