Static anonymizer for plain SQL dumps (PostgreSQL, SQLite, SQL Server).

These details have not been verified by PyPI

Project description

Dumpling

Dumpling is a static anonymizer for plain SQL dumps. It supports PostgreSQL (pg_dump plain format), SQLite (.dump), and SQL Server / MSSQL (SSMS / mssql-scripter output). It lets you safely share, test with, or store database snapshots by replacing sensitive column data according to configurable rules — without ever touching a live database.

Why Dumpling?

No live database required. Works entirely on dump files; nothing connects to your database.
Streaming and memory-efficient. Processes dumps line by line, so even multi-gigabyte files stay manageable.
Fail-safe by default. If no configuration is found, Dumpling exits non-zero and tells you exactly where it looked. Silence is never mistaken for success.
Deterministic anonymization. Domain mappings ensure the same source value always produces the same pseudonym, keeping foreign-key relationships intact across tables.
CI/CD ready. --check mode, strict-coverage enforcement, JSON reports, and residual-PII scan gates plug cleanly into any pipeline.
Flexible configuration. Rules live in a .dumplingconf file or directly in pyproject.toml — no extra tooling needed.

Install

Rust (from source)

cargo build --release
./target/release/dumpling --help

Python / pip (`dumpling-cli`)

Dumpling is also published as a pip-installable CLI package:

pip install dumpling-cli

Or install from local source (requires maturin as PEP 517 backend):

pip install .

After install the CLI command is the same:

dumpling --help

Usage

dumpling -i dump.sql -o sanitized.sql           # read from file, write to file
dumpling -i dump.sql --in-place                 # overwrite the input file (atomic swap)
cat dump.sql | dumpling > sanitized.sql         # stream from stdin to stdout
dumpling -i dump.sql -c .dumplingconf           # use explicit config path
dumpling --check -i dump.sql                    # exit 1 if changes would occur, no output
dumpling --stats -i dump.sql -o out.sql         # print summary to stderr
dumpling --report report.json -i dump.sql       # write detailed JSON report of changes/drops
dumpling --strict-coverage --report report.json -i dump.sql --check  # fail on uncovered sensitive columns
dumpling --scan-output --report report.json -i dump.sql               # scan transformed output for residual PII-like patterns
dumpling --scan-output --fail-on-findings --report report.json -i dump.sql --check  # fail if scan thresholds are exceeded
dumpling --include-table '^public\\.' -i dump.sql -o out.sql
dumpling --exclude-table '^audit\\.' -i dump.sql -o out.sql
dumpling --allow-ext dmp -i data.dmp            # restrict processing to specific extensions
dumpling --allow-noop -i dump.sql -o out.sql    # explicitly allow no-op when config is missing
dumpling --format sqlite -i data.db.sql -o out.sql  # process a SQLite .dump file
dumpling --format mssql  -i backup.sql -o out.sql   # process a SQL Server plain-SQL dump
dumpling --security-profile hardened -i dump.sql -o sanitized.sql  # hardened CSPRNG + HMAC mode
dumpling lint-policy                          # lint the anonymization policy config
dumpling lint-policy --config .dumplingconf   # lint with explicit config path

Configuration is loaded in this order:

--config <path> if provided
.dumplingconf in the current directory
pyproject.toml [tool.dumpling] section

If no configuration is found, Dumpling fails closed by default and exits non-zero. The error output lists every checked location. Use --allow-noop to explicitly permit no-op behavior.

Configuration (TOML)

Both .dumplingconf and [tool.dumpling] inside pyproject.toml use the same schema:

# Optional global salt for strategies that support it (e.g. hash)
# Prefer env-backed secret references over plaintext.
salt = "${DUMPLING_GLOBAL_SALT}"

# Rules are keyed by either "table" or "schema.table"
[rules."public.users"]
email = { strategy = "faker", faker = "internet::SafeEmail", domain = "customer_identity", unique_within_domain = true }
name  = { strategy = "faker", faker = "name::Name", locale = "de_de" }   # German-locale name
ssn   = { strategy = "hash", salt = "${env:DUMPLING_USERS_SSN_SALT}", as_string = true }   # SHA-256 of original (salted)
age   = { strategy = "int_range", min = 18, max = 90 }

[rules."orders"]
credit_card = { strategy = "redact", as_string = true }

# Optional explicit sensitive columns policy list (for strict coverage)
[sensitive_columns]
"public.users" = ["employee_number", "tax_id"]

[output_scan]
# optional allowlist; if omitted, all built-in categories are enabled
enabled_categories = ["email", "ssn", "pan", "token"]
default_threshold = 0
default_severity = "high"
fail_on_severity = "low"
sample_limit_per_category = 5

[output_scan.thresholds]
email = 0
ssn = 0
pan = 0
token = 0

[output_scan.severities]
email = "medium"
ssn = "high"
pan = "critical"
token = "high"

Anonymization strategies

Strategy	Description
`null`	Set field to SQL `NULL`
`redact`	Replace with `REDACTED` (string)
`uuid`	Random UUIDv4-like string
`hash`	SHA-256 hex of original value; supports per-column `salt` and global `salt`
`faker`	Values from the Rust `fake` crate (docs.rs, `faker` modules), chosen by a string identifier only (`faker = "module::Type"`, e.g. `internet::SafeEmail`). Config is data only: nothing from TOML is compiled or executed as Rust at runtime. Use `locale` for locale-aware generators; optional `min`/`max`, `length`, `format` as documented. Unsupported targets fail at config load. New generators require a new Dumpling release (or your own fork), not config-side code.
`phone`	Locale-aware fake phone number (configurable via `locale`); defaults to English format
`int_range`	Random integer in `[min, max]`
`string`	Random alphanumeric string (`length = 12` by default)
`date_fuzz`	Shifts a date by a random number of days in `[min_days, max_days]` (defaults: `-30..30`)
`time_fuzz`	Shifts a time-of-day by a random number of seconds in `[min_seconds, max_seconds]` with 24h wraparound (defaults: `-300..300`)
`datetime_fuzz`	Shifts a timestamp/timestamptz by a random number of seconds in `[min_seconds, max_seconds]` (defaults: `-86400..86400`)

faker reference (upstream fake crate): Dumpling’s faker = "module::Type" strings mirror the Rust fake crate’s faker module layout. Use these when picking or extending generators:

docs.rs — fake crate root (overview, Fake / Dummy traits, locales)
docs.rs — fake::faker module index (per-domain submodules: address, internet, name, …)
GitHub — cksac/fake-rs (source, README with the CLI’s generator name list)

Secret references

Dumpling resolves secret references in string config fields so plaintext salts/keys never need to be committed to version control.

Syntax	Description
`${ENV_VAR}`	Value of environment variable `ENV_VAR`
`${env:ENV_VAR}`	Value of environment variable `ENV_VAR` (explicit provider prefix)
`${file:/path/to/secret}`	Contents of a file (trailing newlines stripped); works with Docker Swarm secrets, Kubernetes mounted secrets, and Vault Agent injected files

Missing env references and unreadable/empty files fail fast with a non-zero startup error that includes the config path.
Plaintext salt values still work for backwards compatibility, but Dumpling prints a startup warning because plaintext secrets are insecure.

# .dumplingconf — keep salts out of source control
salt = "${DUMPLING_GLOBAL_SALT}"

[rules."public.users"]
ssn   = { strategy = "hash", salt = "${env:DUMPLING_USERS_SSN_SALT}" }
email = { strategy = "hash", salt = "${file:/run/secrets/dumpling_email_salt}" }

# Local dev
export DUMPLING_GLOBAL_SALT='local-dev-salt'
export DUMPLING_USERS_SSN_SALT='users-ssn-salt'
dumpling --input dump.sql --check

# CI (injected from your secret store)
export DUMPLING_GLOBAL_SALT="$CI_DUMPLING_GLOBAL_SALT"
export DUMPLING_USERS_SSN_SALT="$CI_DUMPLING_USERS_SSN_SALT"
dumpling --input dump.sql --check --strict-coverage --report coverage.json

# Docker / Kubernetes (file-mounted secrets)
# salt = "${file:/run/secrets/dumpling_hmac_key}" in .dumplingconf
# secret mounted at /run/secrets/dumpling_hmac_key by the orchestrator
dumpling --security-profile hardened --input dump.sql --check

Common column options

as_string: if true, forces the anonymized value to be rendered as a quoted SQL string literal. By default Dumpling preserves the original quoting where possible.
domain: deterministic mapping domain. When set, the same source value always maps to the same pseudonym inside that domain (across tables/columns). SQL NULL inputs are always preserved as NULL — a null FK reference has no source value to map, so no pseudonym is fabricated.
unique_within_domain: when true, different source values are assigned unique pseudonyms within the configured domain. NULL values are unaffected and always remain NULL.
min_days / max_days: used by date_fuzz.
min_seconds / max_seconds: used by time_fuzz and datetime_fuzz.
locale: selects the language/regional format for the faker and phone strategies. Supported values: en, fr_fr, de_de, it_it, pt_br, pt_pt, ar_sa, zh_cn, zh_tw, ja_jp, cy_gb. Defaults to en when not specified.
faker: required when strategy = "faker". A plain string "module::Type" (case-insensitive) that maps to a built-in generator compiled into Dumpling—not arbitrary Rust or expressions. Names follow fake::faker (e.g. internet::SafeEmail → faker::internet::SafeEmail in the crate).
format: used with faker = "number::NumberWithFormat"; pattern uses # (0–9) and ^ (1–9) per the fake crate docs.

Note: table_options are no longer supported; use explicit rules and optional column_cases.

Strict coverage

--strict-coverage enforces that all detected sensitive columns have an explicit anonymization rule.

Sensitive columns are detected via:

Built-in column-name heuristics (the same patterns used by auto-detection).
Explicit lists under [sensitive_columns].

A column is considered covered only when it has an explicit rules entry or at least one column_cases entry. When strict coverage fails, Dumpling exits non-zero and reports the uncovered columns.

Coverage reporting

When --report <file> is used, the JSON output includes:

sensitive_columns_detected
sensitive_columns_covered
sensitive_columns_uncovered
deterministic_mapping_domains (columns configured with deterministic domain mapping)
output_scan (when --scan-output is enabled), including category counts and sample locations

CI gate pattern

dumpling --input dump.sql --check --strict-coverage --report coverage.json

This command exits non-zero if:

Data changes/drops are detected (--check semantics), or
Strict coverage finds uncovered sensitive columns.

Residual PII scan

dumpling \
  --input dump.sql \
  --check \
  --scan-output \
  --fail-on-findings \
  --report scan-report.json

--scan-output scans the transformed output for built-in detector categories:

email: email-address-like strings
ssn: U.S. SSN-like values
pan: payment-card-like numbers (Luhn validated)
token: common secret/token formats (JWT, AWS access key IDs, GitHub PAT prefixes, etc.)

When --fail-on-findings is set, Dumpling exits non-zero if any configured category exceeds its threshold and meets the configured severity gate.

Input format

Dumpling processes plain-text SQL dump files from multiple sources. Use --format to select the dialect (default: postgres).

PostgreSQL (`--format postgres`)

Produced by pg_dump --format=plain. Handles:

INSERT INTO schema.table (col1, col2, ...) VALUES (...), (...), ...;
COPY schema.table (col1, col2, ...) FROM stdin; ... \. (tab-delimited with \N as NULL)
"double-quoted" identifiers
''-escaped string literals

Binary, custom, and directory formats from pg_dump are not supported — use --format=plain when running pg_dump.

SQLite (`--format sqlite`)

Produced by the SQLite CLI .dump command or equivalent. Handles:

Standard INSERT INTO table (col1, ...) VALUES (...);
INSERT OR REPLACE INTO table (...) VALUES (...);
INSERT OR IGNORE INTO table (...) VALUES (...);
"double-quoted" identifiers
''-escaped string literals

The OR REPLACE / OR IGNORE variant keyword is preserved verbatim in the output.

SQL Server / MSSQL (`--format mssql`)

Produced by SSMS "Script Table as → INSERT To", mssql-scripter, or similar tools. Handles:

INSERT INTO [schema].[table] ([col1], [col2], ...) VALUES (...), ...;
[bracket]-quoted identifiers (stripped to unquoted names in output)
N'...' Unicode string literals (the N prefix is transparently discarded; value is preserved)
nvarchar(n) and nchar(n) column-length declarations (used to truncate generated values)
''-escaped string literals

Row filtering

You can retain or delete rows for specific tables using explicit predicate lists.

If retain is non-empty, a row is kept only if it matches at least one predicate.
Regardless of retain, a row is dropped if it matches any predicate in delete.

Supported predicate operators:

Operator	Description
`eq` / `neq`	String compare (case-insensitive if `case_insensitive = true`)
`in` / `not_in`	List of values (string compare)
`like` / `ilike`	SQL-like patterns (`%` and `_`)
`regex` / `iregex`	Rust regex (`iregex` is case-insensitive)
`lt` / `lte` / `gt` / `gte`	Numeric compare (values parsed as numbers)
`is_null` / `not_null`	No value needed

Predicates can target nested JSON values using dot notation (payload.profile.tier) or Django-style notation (payload__profile__tier). For JSON arrays, path segments are evaluated against each element, so list-of-dicts structures can be matched naturally.

[row_filters."public.users"]
retain = [
  { column = "country", op = "eq",  value = "US" },
  { column = "email",   op = "ilike", value = "%@myco.com" },
  { column = "profile.flags.plan", op = "eq", value = "gold" }
]
delete = [
  { column = "is_admin", op = "eq", value = "true" },
  { column = "email",    op = "ilike", value = "%@example.com" },
  { column = "devices__platform", op = "eq", value = "android" }
]

Row filtering works for both INSERT ... VALUES (...) and COPY ... FROM stdin rows.

Conditional per-column cases

Define default strategies in rules."<table>" and add ordered per-column cases in column_cases."<table>"."<column>". For each row and column, Dumpling applies the first matching case; if none match, it falls back to the default from rules.

[rules."public.users"]
email = { strategy = "hash", as_string = true }   # default
name  = { strategy = "faker", faker = "name::Name" }

[[column_cases."public.users".email]]
when.any = [{ column = "is_admin", op = "eq", value = "true" }]
strategy = { strategy = "redact", as_string = true }

[[column_cases."public.users".email]]
when.any = [{ column = "country", op = "in", values = ["DE","FR","GB"] }]
strategy = { strategy = "hash", salt = "eu-salt", as_string = true }

when.any is OR, when.all is AND; you can use either or both. If both are empty, the case matches unconditionally.
First-match-wins per column; there is no merge or fallthrough.
Row filtering (row_filters) is evaluated before cases; deleted rows are not transformed.

Hardened security profile

For adversarial risk environments — where an internal or external actor may have partial auxiliary data — use --security-profile hardened:

dumpling --security-profile hardened -i dump.sql -o sanitized.sql

What changes in hardened mode

Aspect	Standard	Hardened
Random generation	xorshift64* seeded from system time	OS CSPRNG (`getrandom`) — non-predictable
`hash` strategy	SHA-256(salt \|\| input)	HMAC-SHA-256(key=salt, data=input)
Deterministic domain byte stream	SHA-256 CTR-mode	HMAC-SHA-256 CTR-mode
Report `security_profile` field	`"standard"`	`"hardened"`
`--seed` / `DUMPLING_SEED`	Seeds the PRNG	Ignored (warning emitted)

Why this matters

Non-predictable output: xorshift64* is seeded from system time, which is guessable. The OS CSPRNG cannot be predicted from timing alone.
Proper keyed hashing: SHA-256(key || data) is vulnerable to length-extension attacks and weak as a MAC. HMAC-SHA-256 uses the salt as a genuine cryptographic key, providing provable PRF security.
Domain separation: HMAC construction ensures outputs from one salt/key cannot be confused with another.

Key management guidance

Configure a per-environment secret via an env-backed reference to prevent key leakage:

# .dumplingconf
salt = "${DUMPLING_HMAC_KEY}"

[rules."public.users"]
ssn = { strategy = "hash", as_string = true }
email = { strategy = "faker", faker = "internet::SafeEmail", domain = "users" }

export DUMPLING_HMAC_KEY="$(openssl rand -base64 32)"
dumpling --security-profile hardened -i dump.sql -o sanitized.sql

Key rotation: Changing DUMPLING_HMAC_KEY will produce entirely different pseudonyms for all salted/domain-mapped columns. If you rely on referential consistency across separately-processed dumps (e.g., snapshots over time), keep the same key or re-anonymize all related dumps together. Rotate keys when:

A key may have been compromised.
You intentionally want to break prior referential linkability.

Report metadata

The JSON report always includes the active security profile:

{
  "security_profile": "hardened",
  "total_rows_processed": 1000,
  ...
}

Policy linting

The lint-policy subcommand statically analyses your configuration and flags common issues before they affect a production pipeline.

dumpling lint-policy                          # auto-discover config
dumpling lint-policy --config .dumplingconf   # explicit config path

Check	Severity	Description
`empty-rules-table`	warning	A `[rules]` entry has no column rules
`empty-column-cases-table`	warning	A `[column_cases]` entry has no column cases
`unsalted-hash`	warning	`hash` strategy used without any salt — reversible for low-entropy inputs
`inconsistent-domain-strategy`	error	Same domain name used with different strategies — breaks referential integrity
`uncovered-sensitive-column`	error	A column in `[sensitive_columns]` has no matching rule or case

Exits 0 if no violations are found, 1 if any violations exist. Plug it into CI as a pre-merge gate:

- run: ./target/release/dumpling lint-policy

See the CI guardrails documentation for full pipeline recipes including strict-coverage enforcement, residual PII scan gating, and report diffing.

Notes

This is a streaming transformer; memory usage stays small even for large dumps.
For CI/CD and production-like workflows, prefer the default fail-closed mode and avoid --allow-noop unless a no-op run is intentional.
For best results, configure strategies compatible with column data types. If you hash an integer column, Dumpling will render a string; most databases can coerce this, but explicit as_string = false may help in some cases.
For length-restricted text columns (varchar(n), character varying(n), char(n), character(n)), Dumpling reads CREATE TABLE definitions and truncates generated text values to fit within the declared limit.
Deterministic anonymization for tests: pass --seed <u64> or set env DUMPLING_SEED to make fuzz strategies reproducible across runs. Note: --seed has no effect in --security-profile hardened.
Domain mappings (domain = "...") are deterministic by source value + domain (+ optional salt), so referential joins stay stable across tables within the same dump.

Full documentation

Detailed docs, including the configuration reference and release process, are available at the project's GitHub Pages site (built from docs/src/).

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

0.7.0a0 pre-release

May 4, 2026

0.6.0

May 3, 2026

0.5.0

May 3, 2026

0.4.3

May 3, 2026

0.4.2

May 3, 2026

0.4.1

May 3, 2026

0.4.0

May 2, 2026

This version

0.3.0

May 2, 2026

0.2.0

May 2, 2026

0.1.0

Mar 9, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dumpling_cli-0.3.0.tar.gz (102.8 kB view details)

Uploaded May 2, 2026 Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

dumpling_cli-0.3.0-py3-none-win_amd64.whl (2.1 MB view details)

Uploaded May 2, 2026 Python 3Windows x86-64

dumpling_cli-0.3.0-py3-none-manylinux_2_34_x86_64.whl (2.6 MB view details)

Uploaded May 2, 2026 Python 3manylinux: glibc 2.34+ x86-64

dumpling_cli-0.3.0-py3-none-macosx_11_0_arm64.whl (2.4 MB view details)

Uploaded May 2, 2026 Python 3macOS 11.0+ ARM64

File details

Details for the file dumpling_cli-0.3.0.tar.gz.

File metadata

Download URL: dumpling_cli-0.3.0.tar.gz
Upload date: May 2, 2026
Size: 102.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for dumpling_cli-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`17c3cd54566f8c1a4eb219fea05743f27da06b0587c14667979a3c36c82e4ecd`
MD5	`2cf982fff72cff86568b835d51ded4bc`
BLAKE2b-256	`b9aa2e7451c7c2b3b9b216df06d9d659a2753280ae6ad3843c0293cd5e254c5f`

See more details on using hashes here.

File details

Details for the file dumpling_cli-0.3.0-py3-none-win_amd64.whl.

File metadata

Download URL: dumpling_cli-0.3.0-py3-none-win_amd64.whl
Upload date: May 2, 2026
Size: 2.1 MB
Tags: Python 3, Windows x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for dumpling_cli-0.3.0-py3-none-win_amd64.whl
Algorithm	Hash digest
SHA256	`6bfa8dc4d14a066b56d12715a571e0ca89e19c549a0ba71797ee8b8b733fce43`
MD5	`3a7e67568360a9b71fb17382223d7014`
BLAKE2b-256	`c8a4627956238b101de1031733f43549d983404d7179bc549b0cc15dc593da14`

See more details on using hashes here.

File details

Details for the file dumpling_cli-0.3.0-py3-none-manylinux_2_34_x86_64.whl.

File metadata

Download URL: dumpling_cli-0.3.0-py3-none-manylinux_2_34_x86_64.whl
Upload date: May 2, 2026
Size: 2.6 MB
Tags: Python 3, manylinux: glibc 2.34+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for dumpling_cli-0.3.0-py3-none-manylinux_2_34_x86_64.whl
Algorithm	Hash digest
SHA256	`d78bdccf5336c9324ac2f0f698c45f274ab265f2ed046b7d25af534dab69e0ca`
MD5	`2b69276b0c140ef3bb7aa27a21099cf3`
BLAKE2b-256	`ce8e2ccfb256316243bd6810ea2c8c8825b76749088871ef84195c7dc562e4a6`

See more details on using hashes here.

File details

Details for the file dumpling_cli-0.3.0-py3-none-macosx_11_0_arm64.whl.

File metadata

Download URL: dumpling_cli-0.3.0-py3-none-macosx_11_0_arm64.whl
Upload date: May 2, 2026
Size: 2.4 MB
Tags: Python 3, macOS 11.0+ ARM64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for dumpling_cli-0.3.0-py3-none-macosx_11_0_arm64.whl
Algorithm	Hash digest
SHA256	`cb7ec77084e55e6d51d975164b83b23ee05310c66b2671d8633b3dcc6790db80`
MD5	`b5f7374a74df389f9702adea4b432f41`
BLAKE2b-256	`da0b3dd8f25be06a84b113f0885226295eb68c13979a197651e4c92aff2a75a0`

See more details on using hashes here.

dumpling-cli 0.3.0

Navigation

Verified details

Maintainers

Unverified details

Meta

Classifiers

Project description

Dumpling

Why Dumpling?

Install

Rust (from source)

Python / pip (dumpling-cli)

Usage

Configuration (TOML)

Anonymization strategies

Secret references

Common column options

Strict coverage

Coverage reporting

CI gate pattern

Residual PII scan

Input format

PostgreSQL (--format postgres)

SQLite (--format sqlite)

SQL Server / MSSQL (--format mssql)

Row filtering

Conditional per-column cases

Hardened security profile

What changes in hardened mode

Why this matters

Key management guidance

Report metadata

Policy linting

Notes

Full documentation

Project details

Verified details

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distributions

File details

File metadata

File hashes

File details

File metadata

File hashes

File details

File metadata

File hashes

File details

File metadata

File hashes

Python / pip (`dumpling-cli`)

PostgreSQL (`--format postgres`)

SQLite (`--format sqlite`)

SQL Server / MSSQL (`--format mssql`)