dumpling-cli

Static anonymizer for Postgres plain SQL dumps produced by pg_dump.

These details have not been verified by PyPI

Development Status
- 4 - Beta
Environment
- Console
Intended Audience
- Developers
Operating System
Programming Language
Topic

Project description

Dumpling

Static anonymizer for Postgres plain SQL dumps produced by pg_dump. It scans INSERT and COPY FROM stdin statements and replaces sensitive row data based on configurable rules.

Install / Build

cargo build --release
./target/release/dumpling --help

Python package build (maturin)

This repository now includes Python distribution metadata so Dumpling can be published as a pip-installable CLI package (distribution name: dumpling-cli).

# Build wheel/sdist locally
maturin build --release

# Install from local source (requires maturin as PEP 517 backend)
pip install .

After install, the CLI command remains:

dumpling --help

Project automation

Lint: .github/workflows/ci.yml runs cargo fmt and cargo clippy only (fast signal).
Test: .github/workflows/tests.yml runs cargo test --all-targets --all-features.
Platform compatibility (latest): .github/workflows/platform-compat-latest.yml runs cross-platform build checks on latest runner images.
Platform compatibility (matrix): .github/workflows/platform-compat-matrix.yml is a manual, explicit-version matrix for legacy compatibility checks over time.
Docs: .github/workflows/docs.yml builds this repo's mdBook docs and deploys them from main to GitHub Pages.
Publish: .github/workflows/publish.yml builds wheels/sdist via maturin, publishes to PyPI from tags, and supports manual TestPyPI publication.
Release: .github/workflows/release.yml publishes tagged releases (v*.*.*) with checksummed Linux artifacts.

Docs

mdbook build

Primary docs live under docs/src/, including the release process.

Usage

dumpling -i dump.sql -o sanitized.sql           # read from file, write to file
dumpling -i dump.sql --in-place                 # overwrite the input file (atomic swap)
cat dump.sql | dumpling > sanitized.sql         # stream from stdin to stdout
dumpling -i dump.sql -c .dumplingconf           # use explicit config path
dumpling --check -i dump.sql                    # exit 1 if changes would occur, no output
dumpling --stats -i dump.sql -o out.sql         # print summary to stderr
dumpling --report report.json -i dump.sql       # write detailed JSON report of changes/drops
dumpling --include-table '^public\\.' -i dump.sql -o out.sql
dumpling --exclude-table '^audit\\.' -i dump.sql -o out.sql
dumpling --allow-ext dmp -i data.dmp            # restrict processing to specific extensions

Configuration is loaded in this order:

--config <path> if provided
.dumplingconf in the current directory
pyproject.toml [tool.dumpling] section

If no configuration is found, Dumpling performs a no-op transformation.

Configuration (TOML)

Both .dumplingconf and [tool.dumpling] inside pyproject.toml use the same schema:

# Optional global salt for strategies that support it (e.g. hash)
salt = "mysalt"

# Rules are keyed by either "table" or "schema.table"
[rules."public.users"]
email = { strategy = "email" }
name  = { strategy = "name" }
ssn   = { strategy = "hash", as_string = true }   # SHA-256 of original (salted)
age   = { strategy = "int_range", min = 18, max = 90 }

[rules."orders"]
credit_card = { strategy = "redact", as_string = true }

Supported strategies:

null: set field to SQL NULL
redact: replace with REDACTED (string)
uuid: random UUIDv4-like string
hash: SHA-256 hex of original value; supports per-column salt and global salt
email: random-looking email at example.com
name, first_name, last_name: simple placeholder names
phone: simple US-like phone number (xxx) xxx-xxxx
int_range: random integer in [min, max]
string: random alphanumeric string, length = 12 by default
date_fuzz: shifts a date by a random number of days in [min_days, max_days] (defaults: -30..30)
time_fuzz: shifts a time-of-day by a random number of seconds in [min_seconds, max_seconds] with 24h wraparound (defaults: -300..300)
datetime_fuzz: shifts a timestamp/timestamptz by a random number of seconds in [min_seconds, max_seconds] (defaults: -86400..86400)

Common option:

as_string: if true, forces the anonymized value to be rendered as a quoted SQL string literal. By default Dumpling preserves the original quoting where possible.
min_days/max_days: used by date_fuzz
min_seconds/max_seconds: used by time_fuzz and datetime_fuzz

Input format

This tool targets the plain-text SQL format from pg_dump, handling:

INSERT INTO schema.table (col1, col2, ...) VALUES (...), (...), ...;
COPY schema.table (col1, col2, ...) FROM stdin; ... \. (tab-delimited with \N as NULL)

Other pg_dump formats (custom/binary/directory) are not supported.

Row filtering (retain/delete)

You can retain or delete rows for specific tables using explicit predicate lists. Semantics:

If retain is non-empty, a row is kept only if it matches at least one of its predicates.
Regardless of retain, a row is dropped if it matches any predicate in delete.

Predicates support these operators on a column:

eq, neq (string compare; case-insensitive if case_insensitive = true)
in, not_in (list of values, string compare)
like, ilike (SQL-like: % and _)
regex, iregex (Rust regex; iregex is case-insensitive)
lt, lte, gt, gte (numeric compare; values parsed as numbers)
is_null, not_null (no value needed)

Example:

[row_filters."public.users"]
retain = [
  { column = "country", op = "eq",  value = "US" },
  { column = "email",   op = "ilike", value = "%@myco.com" }
]
delete = [
  { column = "is_admin", op = "eq", value = "true" },
  { column = "email",    op = "ilike", value = "%@example.com" }
]

Row filtering works for both INSERT ... VALUES (...) and COPY ... FROM stdin rows.

Conditional per-column cases (first-match-wins)

Define default strategies in rules."<table>" and add ordered per-column cases in column_cases."<table>"."<column>". For each row, for each column, Dumpling applies the first matching case; if none match, it uses the default from rules.

Example:

[rules."public.users"]
email = { strategy = "hash", as_string = true }   # default
name  = { strategy = "name" }

[[column_cases."public.users".email]]
when.any = [{ column = "is_admin", op = "eq", value = "true" }]
strategy = { strategy = "redact", as_string = true }

[[column_cases."public.users".email]]
when.any = [{ column = "country", op = "in", values = ["DE","FR","GB"] }]
strategy = { strategy = "hash", salt = "eu-salt", as_string = true }

Notes:

when.any is OR, when.all is AND; you can use either or both. If both are empty, the case matches unconditionally.
First-match-wins per column; there is no merge/replace or fallthrough flag.
Row filtering (row_filters) is evaluated before cases; deleted rows are not transformed.

Notes

This is a streaming transformer; memory usage stays small even for big dumps.
For best results, configure strategies compatible with column data types. If you hash an integer column, Dumpling will render a string which Postgres can usually coerce, but explicit as_string = false may help in some cases.
If you switch runtimes/branches frequently and see test DB migration issues in your project, remember you can run tests with pytest --create-db (project convention).
Deterministic anonymization for tests: pass --seed <u64> or set env DUMPLING_SEED to make fuzz strategies reproducible across runs.

Project details

These details have not been verified by PyPI

Development Status
- 4 - Beta
Environment
- Console
Intended Audience
- Developers
Operating System
Programming Language
Topic

Release history Release notifications | RSS feed

0.7.0a0 pre-release

May 4, 2026

0.6.0

May 3, 2026

0.5.0

May 3, 2026

0.4.3

May 3, 2026

0.4.2

May 3, 2026

0.4.1

May 3, 2026

0.4.0

May 2, 2026

0.3.0

May 2, 2026

0.2.0

May 2, 2026

This version

0.1.0

Mar 9, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dumpling_cli-0.1.0.tar.gz (33.3 kB view details)

Uploaded Mar 9, 2026 Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

dumpling_cli-0.1.0-py3-none-win_amd64.whl (1.3 MB view details)

Uploaded Mar 9, 2026 Python 3Windows x86-64

dumpling_cli-0.1.0-py3-none-manylinux_2_34_x86_64.whl (1.6 MB view details)

Uploaded Mar 9, 2026 Python 3manylinux: glibc 2.34+ x86-64

dumpling_cli-0.1.0-py3-none-macosx_11_0_arm64.whl (1.4 MB view details)

Uploaded Mar 9, 2026 Python 3macOS 11.0+ ARM64

File details

Details for the file dumpling_cli-0.1.0.tar.gz.

File metadata

Download URL: dumpling_cli-0.1.0.tar.gz
Upload date: Mar 9, 2026
Size: 33.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for dumpling_cli-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`da29739d99ed896f65230ed6332119465fcef24351466c3391546ef70d98b7d6`
MD5	`006f95daf5b64a67404303fda45b4e23`
BLAKE2b-256	`4cadf1c13c463a408ddb6196e90aa79c8701c31d3a8cf9948f82683968a6fc97`

See more details on using hashes here.

File details

Details for the file dumpling_cli-0.1.0-py3-none-win_amd64.whl.

File metadata

Download URL: dumpling_cli-0.1.0-py3-none-win_amd64.whl
Upload date: Mar 9, 2026
Size: 1.3 MB
Tags: Python 3, Windows x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for dumpling_cli-0.1.0-py3-none-win_amd64.whl
Algorithm	Hash digest
SHA256	`b302bc7c509beb69e0e10c013db0190c0d5c243ea011b6ad24f82ff14deffa40`
MD5	`6e219606224ca38e36a2141d921bd38e`
BLAKE2b-256	`20f12961a1a7d6001abc8b62184fc26e707701ec006aa7457c41ef8bb6b7f0ce`

See more details on using hashes here.

File details

Details for the file dumpling_cli-0.1.0-py3-none-manylinux_2_34_x86_64.whl.

File metadata

Download URL: dumpling_cli-0.1.0-py3-none-manylinux_2_34_x86_64.whl
Upload date: Mar 9, 2026
Size: 1.6 MB
Tags: Python 3, manylinux: glibc 2.34+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for dumpling_cli-0.1.0-py3-none-manylinux_2_34_x86_64.whl
Algorithm	Hash digest
SHA256	`5858d825276ba4486ccb5d6f4cdb773075f6c14fc7935ab45671697862cbe2c3`
MD5	`0ca3c1153041260dfb4319839ec6ed9c`
BLAKE2b-256	`11a8a6d45a587db01b356fe0aeaf7dcac442c686830a1a50b4eced9c3cf69f93`

See more details on using hashes here.

File details

Details for the file dumpling_cli-0.1.0-py3-none-macosx_11_0_arm64.whl.

File metadata

Download URL: dumpling_cli-0.1.0-py3-none-macosx_11_0_arm64.whl
Upload date: Mar 9, 2026
Size: 1.4 MB
Tags: Python 3, macOS 11.0+ ARM64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for dumpling_cli-0.1.0-py3-none-macosx_11_0_arm64.whl
Algorithm	Hash digest
SHA256	`8368ac23afb8bd558b290a7269591cf3532fb32b853192949443b9b99f14bec6`
MD5	`84cd99e3032860a4e06acc12d2d70316`
BLAKE2b-256	`3a76068c0bba2ddad3880659b8ea72b590052545a091d81ff5a320482e752b51`

See more details on using hashes here.

dumpling-cli 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Meta

Classifiers

Project description

Dumpling

Install / Build

Python package build (maturin)

Project automation

Docs

Usage

Configuration (TOML)

Input format

Row filtering (retain/delete)

Conditional per-column cases (first-match-wins)

Notes

Project details

Verified details

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distributions

File details

File metadata

File hashes

File details

File metadata

File hashes

File details

File metadata

File hashes

File details

File metadata

File hashes