
CLI for creating PII-safe Excel test fixtures

Project description

sheetmask

Turn a real Excel file into a safe test fixture — fake names, fake numbers, real structure.

Install

pip install git+https://github.com/daniel-butler/sheetmask.git
uv add git+https://github.com/daniel-butler/sheetmask.git

Quickstart

  1. Run analyze on your file. It prints a prompt describing the columns and sample data; copy it.
sheetmask analyze "Q4 Expense Report.xlsx"
  2. Paste the prompt into Claude or ChatGPT, then save the config it returns:
# q4_expense_config.py
from sheetmask import PercentageVarianceRule, PreserveRelationshipRule

config = {
    "version": "1.0.0",
    # Sheets to keep in the output file.
    "sheets_to_keep": ["Expenses"],
    # Text columns replaced with consistent fake values of the given entity type.
    "entity_columns": {
        "Employee Name": "PERSON",
        "Department": "ORGANIZATION",
        "Manager": "PERSON",
    },
    # Numeric columns and the rule used to anonymize each one.
    "numeric_rules": {
        # Independent figure: randomized within ±20% of the original.
        "Reimbursement": PercentageVarianceRule(variance_pct=0.2),
        # Derived figure: recomputed from the anonymized input columns.
        "Net Amount": PreserveRelationshipRule(
            formula="context['Reimbursement'] - context['Deduction']",
            dependent_columns=["Reimbursement", "Deduction"],
        ),
    },
    # Columns left as they are.
    "preserve_columns": ["Date", "Category"],
}
  3. Run process. The output file is written next to the original.
sheetmask process "Q4 Expense Report.xlsx" --config q4_expense_config.py
# Output: Q4 Expense Report_SYNTHETIC.xlsx

Reference

Entity types

Each unique value maps to the same fake value throughout the file, so relationships between rows stay intact.

| Type | Generates |
|---|---|
| PERSON | Full name |
| PERSON_FIRST_NAME | First name only |
| PERSON_LAST_NAME | Last name only |
| ORGANIZATION | Company name |
| EMAIL_ADDRESS | Email address |
| PHONE_NUMBER | Phone number |
| PROJECT_NAME | Project name |
| LOCATION | City, State |
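The consistent mapping described above can be sketched with a cache keyed on the original value. This is a minimal illustration, not sheetmask's implementation: the `ConsistentFaker` class and its tiny name pools are hypothetical, and a real tool would draw fakes from a much larger source. The sketch also keeps fakes distinct, so two different people never collapse into one fake name.

```python
import random

# Hypothetical name pools; a real tool would use a far larger fake-data source.
FIRST_NAMES = ["Alex", "Jordan", "Morgan", "Riley", "Casey", "Taylor"]
LAST_NAMES = ["Reed", "Hale", "Quinn", "Marsh", "Pryor", "Voss"]


class ConsistentFaker:
    """Map each distinct original value to one fake value, reused on every repeat."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self._mapping: dict[str, str] = {}
        self._used: set[str] = set()

    def person(self, original: str) -> str:
        # Generate a fake only the first time a value is seen; afterwards reuse it.
        if original not in self._mapping:
            base = f"{self._rng.choice(FIRST_NAMES)} {self._rng.choice(LAST_NAMES)}"
            candidate, suffix = base, 2
            # Keep fakes distinct so two different people never merge into one.
            while candidate in self._used:
                candidate = f"{base} {suffix}"
                suffix += 1
            self._used.add(candidate)
            self._mapping[original] = candidate
        return self._mapping[original]


faker = ConsistentFaker(seed=42)
rows = [
    {"Employee Name": "Jane Smith", "Manager": "Raj Patel"},
    {"Employee Name": "Raj Patel", "Manager": "Ann Lee"},
]
for row in rows:
    for column in ("Employee Name", "Manager"):
        row[column] = faker.person(row[column])
```

Because "Raj Patel" is a manager in the first row and an employee in the second, both cells receive the same fake name, so the reporting line survives anonymization.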

Numeric rules

PercentageVarianceRule replaces each value with a random number within ±variance_pct of the original (0.15 means ±15%). Use it for independent figures.

"Headcount": PercentageVarianceRule(variance_pct=0.15)
# 100 becomes a random number between 85 and 115.
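A minimal sketch of that band, assuming a uniform draw between `value * (1 - variance_pct)` and `value * (1 + variance_pct)`. The function name `percentage_variance` is hypothetical, and whether sheetmask draws uniformly or rounds integer columns such as headcounts is not stated here.

```python
import random


def percentage_variance(value: float, variance_pct: float, rng: random.Random) -> float:
    """Return a random value within +/- variance_pct of the original (assumed uniform)."""
    low = value * (1 - variance_pct)
    high = value * (1 + variance_pct)
    # random.uniform accepts its bounds in either order, so negative values also work.
    return rng.uniform(low, high)


rng = random.Random(42)
samples = [percentage_variance(100, 0.15, rng) for _ in range(1000)]
```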

PreserveRelationshipRule derives a value from other already-anonymized columns. Use it wherever one column is computed from others, so the arithmetic stays consistent.

"Gross Margin": PreserveRelationshipRule(
    formula="context['Revenue'] - context['Cost']",
    dependent_columns=["Revenue", "Cost"],
)
# Gross Margin will always equal anonymized Revenue minus anonymized Cost.
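The ordering is what keeps the arithmetic consistent: the input columns are anonymized first, and the derived column is then recomputed from them. A hypothetical sketch, assuming the formula string is evaluated against a `context` mapping of the current row's anonymized values (sheetmask's actual evaluation mechanism may differ, and `preserve_relationship` is an illustrative name):

```python
def preserve_relationship(formula: str, context: dict[str, float]) -> float:
    """Evaluate a formula string against already-anonymized values in `context`."""
    # Expose only `context` and disable builtins. eval() must still never be
    # given formulas from an untrusted source.
    return eval(formula, {"__builtins__": {}}, {"context": context})


# Illustrative anonymized inputs for one row.
row = {"Revenue": 1180.0, "Cost": 730.0}
row["Gross Margin"] = preserve_relationship("context['Revenue'] - context['Cost']", row)
```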

All commands

| Command | Description |
|---|---|
| sheetmask analyze <file> | Analyze a file and print the LLM prompt |
| sheetmask analyze <file> -o prompt.txt | Save the LLM prompt to a file |
| sheetmask analyze-multi f1 f2 f3 | Analyze several files for shared schema patterns |
| sheetmask process <file> --config config.py | Anonymize a file using a config |
| sheetmask process <file> out.xlsx --config config.py --seed 42 | Write to a named output file using a fixed random seed |
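A fixed seed is the usual way to get the same synthetic output from repeated runs. A sketch of the idea, assuming seeded draws from Python's `random` module; the helper `anonymize_column` is hypothetical and not part of sheetmask.

```python
import random


def anonymize_column(values: list[float], variance_pct: float, seed: int) -> list[float]:
    # One seeded generator per run: the same seed yields the same sequence of draws.
    rng = random.Random(seed)
    return [v * (1 + rng.uniform(-variance_pct, variance_pct)) for v in values]


headcount = [100.0, 250.0, 40.0]
run_a = anonymize_column(headcount, 0.15, seed=42)
run_b = anonymize_column(headcount, 0.15, seed=42)
run_c = anonymize_column(headcount, 0.15, seed=7)
```

Here `run_a` and `run_b` match value for value, while `run_c` differs.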


Download files


Source Distribution

sheetmask-0.1.0.tar.gz (77.9 kB)

Uploaded Source

Built Distribution


sheetmask-0.1.0-py3-none-any.whl (19.8 kB)

Uploaded Python 3

File details

Details for the file sheetmask-0.1.0.tar.gz.

File metadata

  • Download URL: sheetmask-0.1.0.tar.gz
  • Size: 77.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.2

File hashes

Hashes for sheetmask-0.1.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3cf2bb3a7b045847addfd57abd73206e144a92dc4892116becb7210f7552a19c |
| MD5 | 5d3eae5f68c9cf4c89d2bbb39f273fb5 |
| BLAKE2b-256 | eab92b12e618b3156b20d3816e8d07ce41fbf226ac49e1e2c766312e408cd3d3 |


File details

Details for the file sheetmask-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: sheetmask-0.1.0-py3-none-any.whl
  • Size: 19.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.2

File hashes

Hashes for sheetmask-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 08b6f97b414e87a72bbc7201ffe1dce28c8cf16219e604083e79d10612dfe511 |
| MD5 | 9d94dcf9c2095891922f45e67017a0a0 |
| BLAKE2b-256 | 9009546783c722757cdf23abff1fd5021e9c565b7abca58ca42d87f879488af0 |
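To check a downloaded file against the SHA256 digests listed above, hash it locally and compare. A minimal sketch using Python's standard `hashlib`; the helper names `sha256_of` and `verify` are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    # hexdigest() is lowercase, so normalize the expected value before comparing.
    return sha256_of(path) == expected_sha256.lower()
```

For example, verify(Path("sheetmask-0.1.0-py3-none-any.whl"), "08b6f97b414e87a72bbc7201ffe1dce28c8cf16219e604083e79d10612dfe511") should return True for an intact download of the wheel.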

