Translate between MLLP and HTTP
Project description
SliceDB
Overview
SliceDB is a tool for capturing and restoring a subset of a PostgreSQL database. It also supports scrubbing sensive data.
Install
Pip
pip3 install psycopg2-binary slice_db
Docker
docker pull rivethealth/slicedb
Usage
For all commands and options, see Usage.
Basic example
First, query a database to create a schema file.
slicedb schema > schema.yml
Second, dump a slice:
slicedb dump --root public.example 'WHERE id IN (7, 56, 234)' --schema schema.yml > slice.zip
Third, restore that slice into another database:
slicedb restore < slice.zip
For a complete working example, see Example.
Connection
Use the libpq environment variables to configure the connection.
PGHOST=myhost slicedb schema > slice.yml
Dump
Output types
SliceDB can produce multiple formats:
- slice - ZIP archive. This can be restored into an existing database with
slicedb restore
. - sql - SQL file. This can be restored into an existing database with
psql
or another client. If restoring into existing schema, foreign keys must first be disabled, e.g.SET session_replication_role = replica
.
Both formats can optionally include schema. Restoring with schema requires an empty database.
Schema
See formats/schema.yml for the JSONSchema of the schema file.
The schema
command uses foreign keys to infer relationships between tables. It
is a suggested starting point.
You may want to prune the slice by removing relationships, or expand the slice by adding relationships that don't have explicit foreign keys.
slicedb schema-filter
can help modify the schema, or generic JSON tools like
jq
.
Algorithm
The slicing process works as follows:
-
Starting with the root table, query the physical IDs (ctid) of rows.
-
Add the row IDs to the existing list.
-
For new IDs, process each of the adjacent tables, using them as the current root.
Do this in parallel, using pg_export_snapshot()
to guarantee a consistent
snapshot across workers.
Performance
Hundreds of thousands of rows can be exported in only a few minutes and several dozen MBs of memory.
Transformation
TODO
Replacements are deterministic for a given pepper. By default, the pepper is
randomly geneated for a slice. You may specify it as --pepper
. Note that
possession of the pepper makes the data guessable.
Transformation may operate an existing slice, or happen during the dump.
Replacments
alphanumeric
- Replace alphanumeric characters, preserve the type and case of characters.date_year
- Change date by up to one year.geozip
- Replace zip code, preserving the first three digits.given_name
- Replace given name.person_name
- Replace name.surname
- Replace surname.composite
- Parse as a PostgreSQL composite, with suboptions.
Replacement data
- Given names: https://www.ssa.gov/cgi-bin/popularnames.cgi
- Surnames: https://raw.githubusercontent.com/fivethirtyeight/data/master/most-common-name/surnames.csv
- Zip codes: https://simplemaps.com/data/us-zips
Restore
SliceDB can restore slices into existing databases. In practice, this should normally be an empty existing database.
Cycles
Foreign keys may form a cycle only if at least one foreign key in the cycle is deferrable.
That foreign key will be deferred during restore.
A restore may happen in a single transaction or not. Parallelism requires multiple transactions.
Not supported
- Multiple databases
- Databases other than PostgreSQL
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.