Dumpling
Static anonymizer for Postgres plain SQL dumps produced by pg_dump. It scans INSERT and COPY FROM stdin statements and replaces sensitive row data based on configurable rules.
Install / Build
cargo build --release
./target/release/dumpling --help
Python package build (maturin)
This repository includes Python distribution metadata so Dumpling can be published as a pip-installable CLI package (distribution name: `dumpling-cli`).
# Build wheel/sdist locally
maturin build --release
# Install from local source (requires maturin as PEP 517 backend)
pip install .
After install, the CLI command remains:
dumpling --help
Project automation
- Lint: `.github/workflows/ci.yml` runs `cargo fmt` and `cargo clippy` only (fast signal).
- Test: `.github/workflows/tests.yml` runs `cargo test --all-targets --all-features`.
- Platform compatibility (latest): `.github/workflows/platform-compat-latest.yml` runs cross-platform build checks on the latest runner images.
- Platform compatibility (matrix): `.github/workflows/platform-compat-matrix.yml` is a manual, explicit-version matrix for legacy compatibility checks over time.
- Docs: `.github/workflows/docs.yml` builds this repo's mdBook docs and deploys them from `main` to GitHub Pages.
- Publish: `.github/workflows/publish.yml` builds wheels/sdist via `maturin`, publishes to PyPI from tags, and supports manual TestPyPI publication.
- Release: `.github/workflows/release.yml` publishes tagged releases (`v*.*.*`) with checksummed Linux artifacts.
Docs
mdbook build
Primary docs live under docs/src/, including the release process.
Usage
dumpling -i dump.sql -o sanitized.sql # read from file, write to file
dumpling -i dump.sql --in-place # overwrite the input file (atomic swap)
cat dump.sql | dumpling > sanitized.sql # stream from stdin to stdout
dumpling -i dump.sql -c .dumplingconf # use explicit config path
dumpling --check -i dump.sql # exit 1 if changes would occur, no output
dumpling --stats -i dump.sql -o out.sql # print summary to stderr
dumpling --report report.json -i dump.sql # write detailed JSON report of changes/drops
dumpling --include-table '^public\.' -i dump.sql -o out.sql
dumpling --exclude-table '^audit\.' -i dump.sql -o out.sql
dumpling --allow-ext dmp -i data.dmp # restrict processing to specific extensions
Configuration is loaded in this order:
1. `--config <path>`, if provided
2. `.dumplingconf` in the current directory
3. the `[tool.dumpling]` section of `pyproject.toml`
If no configuration is found, Dumpling performs a no-op transformation.
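For example, the same rule schema can live in `pyproject.toml` instead of a `.dumplingconf` file. This sketch assumes rules nest directly under the `[tool.dumpling]` table; the table and column names are hypothetical:

```toml
[tool.dumpling]
salt = "mysalt"

[tool.dumpling.rules."public.users"]
email = { strategy = "email" }
name = { strategy = "name" }
```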
Configuration (TOML)
Both .dumplingconf and [tool.dumpling] inside pyproject.toml use the same schema:
# Optional global salt for strategies that support it (e.g. hash)
salt = "mysalt"
# Rules are keyed by either "table" or "schema.table"
[rules."public.users"]
email = { strategy = "email" }
name = { strategy = "name" }
ssn = { strategy = "hash", as_string = true } # SHA-256 of original (salted)
age = { strategy = "int_range", min = 18, max = 90 }
[rules."orders"]
credit_card = { strategy = "redact", as_string = true }
Supported strategies:
- `null`: set field to SQL NULL
- `redact`: replace with `REDACTED` (string)
- `uuid`: random UUIDv4-like string
- `hash`: SHA-256 hex of original value; supports per-column `salt` and global `salt`
- `email`: random-looking email at `example.com`
- `name`, `first_name`, `last_name`: simple placeholder names
- `phone`: simple US-like phone number `(xxx) xxx-xxxx`
- `int_range`: random integer in `[min, max]`
- `string`: random alphanumeric string, `length = 12` by default
- `date_fuzz`: shifts a date by a random number of days in `[min_days, max_days]` (defaults: `-30..30`)
- `time_fuzz`: shifts a time-of-day by a random number of seconds in `[min_seconds, max_seconds]` with 24h wraparound (defaults: `-300..300`)
- `datetime_fuzz`: shifts a timestamp/timestamptz by a random number of seconds in `[min_seconds, max_seconds]` (defaults: `-86400..86400`)
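As an illustration, the fuzz strategies above can be combined in a single rule block; the table and column names here are hypothetical:

```toml
[rules."public.events"]
occurred_on = { strategy = "date_fuzz", min_days = -7, max_days = 7 }
occurred_at = { strategy = "datetime_fuzz", min_seconds = -3600, max_seconds = 3600 }
checkin_time = { strategy = "time_fuzz" }   # defaults: -300..300 seconds, 24h wraparound
```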
Common options:
- `as_string`: if true, forces the anonymized value to be rendered as a quoted SQL string literal. By default Dumpling preserves the original quoting where possible.
- `min_days`/`max_days`: used by `date_fuzz`
- `min_seconds`/`max_seconds`: used by `time_fuzz` and `datetime_fuzz`
Input format
This tool targets the plain-text SQL format from pg_dump, handling:
- `INSERT INTO schema.table (col1, col2, ...) VALUES (...), (...), ...;`
- `COPY schema.table (col1, col2, ...) FROM stdin; ... \.` (tab-delimited with `\N` as NULL)
Other pg_dump formats (custom/binary/directory) are not supported.
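For reference, here is a minimal sketch of a COPY block of the kind described above (the table and rows are hypothetical; fields are tab-separated):

```sql
COPY public.users (id, email, name) FROM stdin;
1	alice@acme.test	Alice Smith
2	\N	Bob Jones
\.
```

Dumpling rewrites the data lines between `FROM stdin;` and the terminating `\.`, treating `\N` as NULL.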
Row filtering (retain/delete)
You can retain or delete rows for specific tables using explicit predicate lists. Semantics:
- If `retain` is non-empty, a row is kept only if it matches at least one of its predicates.
- Regardless of `retain`, a row is dropped if it matches any predicate in `delete`.
Predicates support these operators on a column:
- `eq`, `neq` (string compare; case-insensitive if `case_insensitive = true`)
- `in`, `not_in` (list of values, string compare)
- `like`, `ilike` (SQL-like: `%` and `_`)
- `regex`, `iregex` (Rust regex; `iregex` is case-insensitive)
- `lt`, `lte`, `gt`, `gte` (numeric compare; values parsed as numbers)
- `is_null`, `not_null` (no value needed)
Example:
[row_filters."public.users"]
retain = [
{ column = "country", op = "eq", value = "US" },
{ column = "email", op = "ilike", value = "%@myco.com" }
]
delete = [
{ column = "is_admin", op = "eq", value = "true" },
{ column = "email", op = "ilike", value = "%@example.com" }
]
Row filtering works for both INSERT ... VALUES (...) and COPY ... FROM stdin rows.
Conditional per-column cases (first-match-wins)
Define default strategies in rules."<table>" and add ordered per-column cases in column_cases."<table>"."<column>". For each row, for each column, Dumpling applies the first matching case; if none match, it uses the default from rules.
Example:
[rules."public.users"]
email = { strategy = "hash", as_string = true } # default
name = { strategy = "name" }
[[column_cases."public.users".email]]
when.any = [{ column = "is_admin", op = "eq", value = "true" }]
strategy = { strategy = "redact", as_string = true }
[[column_cases."public.users".email]]
when.any = [{ column = "country", op = "in", values = ["DE","FR","GB"] }]
strategy = { strategy = "hash", salt = "eu-salt", as_string = true }
Notes:
- `when.any` is OR, `when.all` is AND; you can use either or both. If both are empty, the case matches unconditionally.
- First-match-wins per column; there is no merge/replace or fallthrough flag.
- Row filtering (`row_filters`) is evaluated before cases; deleted rows are not transformed.
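Putting the pieces together, a complete minimal `.dumplingconf` combining a default rule, a row filter, and a conditional case could look like this (table and column names are hypothetical):

```toml
salt = "mysalt"

# Default: hash emails for everyone.
[rules."public.users"]
email = { strategy = "hash", as_string = true }

# Evaluated first: dropped rows are never transformed.
[row_filters."public.users"]
delete = [
  { column = "email", op = "ilike", value = "%@example.com" }
]

# Conditional override: admins get a hard redact instead of a hash.
[[column_cases."public.users".email]]
when.all = [{ column = "is_admin", op = "eq", value = "true" }]
strategy = { strategy = "redact", as_string = true }
```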
Notes
- This is a streaming transformer; memory usage stays small even for big dumps.
- For best results, configure strategies compatible with column data types. If you hash an integer column, Dumpling will render a string which Postgres can usually coerce, but explicit `as_string = false` may help in some cases.
- If you switch runtimes/branches frequently and see test DB migration issues in your project, remember you can run tests with `pytest --create-db` (project convention).
- Deterministic anonymization for tests: pass `--seed <u64>` or set the env var `DUMPLING_SEED` to make fuzz strategies reproducible across runs.
Project details
Download files
Download the file for your platform.
File details
Details for the file dumpling_cli-0.1.0.tar.gz.
File metadata
- Download URL: dumpling_cli-0.1.0.tar.gz
- Upload date:
- Size: 33.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | da29739d99ed896f65230ed6332119465fcef24351466c3391546ef70d98b7d6 |
| MD5 | 006f95daf5b64a67404303fda45b4e23 |
| BLAKE2b-256 | 4cadf1c13c463a408ddb6196e90aa79c8701c31d3a8cf9948f82683968a6fc97 |
File details
Details for the file dumpling_cli-0.1.0-py3-none-win_amd64.whl.
File metadata
- Download URL: dumpling_cli-0.1.0-py3-none-win_amd64.whl
- Upload date:
- Size: 1.3 MB
- Tags: Python 3, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b302bc7c509beb69e0e10c013db0190c0d5c243ea011b6ad24f82ff14deffa40 |
| MD5 | 6e219606224ca38e36a2141d921bd38e |
| BLAKE2b-256 | 20f12961a1a7d6001abc8b62184fc26e707701ec006aa7457c41ef8bb6b7f0ce |
File details
Details for the file dumpling_cli-0.1.0-py3-none-manylinux_2_34_x86_64.whl.
File metadata
- Download URL: dumpling_cli-0.1.0-py3-none-manylinux_2_34_x86_64.whl
- Upload date:
- Size: 1.6 MB
- Tags: Python 3, manylinux: glibc 2.34+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5858d825276ba4486ccb5d6f4cdb773075f6c14fc7935ab45671697862cbe2c3 |
| MD5 | 0ca3c1153041260dfb4319839ec6ed9c |
| BLAKE2b-256 | 11a8a6d45a587db01b356fe0aeaf7dcac442c686830a1a50b4eced9c3cf69f93 |
File details
Details for the file dumpling_cli-0.1.0-py3-none-macosx_11_0_arm64.whl.
File metadata
- Download URL: dumpling_cli-0.1.0-py3-none-macosx_11_0_arm64.whl
- Upload date:
- Size: 1.4 MB
- Tags: Python 3, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8368ac23afb8bd558b290a7269591cf3532fb32b853192949443b9b99f14bec6 |
| MD5 | 84cd99e3032860a4e06acc12d2d70316 |
| BLAKE2b-256 | 3a76068c0bba2ddad3880659b8ea72b590052545a091d81ff5a320482e752b51 |