SliceDB

Overview

SliceDB is a tool for capturing and restoring a subset of a PostgreSQL database. It also supports scrubbing sensitive data.

Install

Pip

pip3 install psycopg2-binary slice_db

Docker

docker pull rivethealth/slicedb

Usage

For all commands and options, see Usage.

Basic example

First, query a database to create a schema file.

slicedb schema > schema.yml

Second, dump a slice:

slicedb dump --root public.example 'WHERE id IN (7, 56, 234)' --schema schema.yml > slice.zip

Third, restore that slice into another database:

slicedb restore < slice.zip

For a complete working example, see Example.

Connection

Use the libpq environment variables to configure the connection.

PGHOST=myhost slicedb schema > schema.yml
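
When SliceDB is driven from a script, the same libpq variables can be set on the child process. The following is a minimal sketch under stated assumptions: the helper names `libpq_env` and `write_schema` are hypothetical and not part of SliceDB, and only the standard libpq variables PGHOST, PGPORT, PGDATABASE and PGUSER are handled.

```python
import os
import subprocess


def libpq_env(host=None, port=None, dbname=None, user=None):
    """Return a copy of the current environment with libpq variables set."""
    env = dict(os.environ)
    settings = {
        "PGHOST": host,
        "PGPORT": port,
        "PGDATABASE": dbname,
        "PGUSER": user,
    }
    for name, value in settings.items():
        if value is not None:
            env[name] = str(value)
    return env


def write_schema(path, **connection):
    """Run `slicedb schema` and write its standard output to `path`."""
    with open(path, "wb") as out:
        subprocess.run(
            ["slicedb", "schema"],
            env=libpq_env(**connection),
            stdout=out,
            check=True,
        )


# Build the environment only; calling write_schema() needs a reachable database.
example_env = libpq_env(host="myhost", port=5433)
```

Passwords are better supplied through a libpq password file than through the environment of a script.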

Dump

Output types

SliceDB can produce multiple formats:

slice - ZIP archive. This can be restored into an existing database with slicedb restore.
sql - SQL file. This can be restored into an existing database with psql or another client. If restoring into an existing schema, foreign keys must first be disabled, e.g. SET session_replication_role = replica.

Both formats can optionally include schema. Restoring with schema requires an empty database.

Schema

See formats/schema.yml for the JSONSchema of the schema file.

The schema command uses foreign keys to infer relationships between tables. It is a suggested starting point.

You may want to prune the slice by removing relationships, or expand the slice by adding relationships that don't have explicit foreign keys.

slicedb schema-filter can help modify the schema, as can generic JSON tools like jq.

Algorithm

The slicing process works as follows:

Starting with the root table, query the physical IDs (ctid) of rows.
Add the row IDs to the existing list.
For new IDs, process each of the adjacent tables, using them as the current root.

Do this in parallel, using pg_export_snapshot() to guarantee a consistent snapshot across workers.

Performance

Hundreds of thousands of rows can be exported in only a few minutes, using several dozen MB of memory.

Transformation

TODO

Replacements are deterministic for a given pepper. By default, the pepper is randomly generated for a slice. You may specify it with --pepper. Note that possession of the pepper makes the data guessable.

Transformation may operate on an existing slice, or happen during the dump.
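
One common way to make replacements deterministic for a given pepper is a keyed hash. The sketch below assumes that approach for illustration; it is not confirmed to be how SliceDB derives its replacements. The name pool and the function `replace_given_name` are hypothetical.

```python
import hashlib
import hmac
import secrets

# Hypothetical replacement pool, for illustration only.
GIVEN_NAMES = ["Alex", "Jordan", "Morgan", "Riley", "Taylor"]


def replace_given_name(value: str, pepper: bytes) -> str:
    """Pick a replacement that depends only on the pepper and the input."""
    digest = hmac.new(pepper, value.encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(GIVEN_NAMES)
    return GIVEN_NAMES[index]


# A random pepper, as would be generated by default for each slice.
pepper = secrets.token_bytes(16)
first = replace_given_name("Alice", pepper)
second = replace_given_name("Alice", pepper)
print(first == second)
```

The same input maps to the same output for a given pepper, so values that matched before replacement still match afterwards. It also shows why the pepper must be kept secret: anyone holding it can recompute the mapping for guessed inputs.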

Replacements

alphanumeric - Replace alphanumeric characters, preserving the type and case of characters.
date_year - Change date by up to one year.
geozip - Replace zip code, preserving the first three digits.
given_name - Replace given name.
person_name - Replace name.
surname - Replace surname.
composite - Parse as a PostgreSQL composite, with suboptions.
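
To make the descriptions of alphanumeric and geozip concrete, here is a rough sketch of replacements with those properties. Everything in it is an assumption for illustration: the function names are hypothetical, only ASCII letters and digits are replaced, the zip code is filled with random digits where a real implementation might draw from a zip code list, and Python's `random` module is used for brevity although it is not cryptographically secure.

```python
import random
import string


def _rng(value: str, pepper: bytes) -> random.Random:
    # Seed from the pepper and the input so the output is repeatable.
    return random.Random(pepper + value.encode("utf-8"))


def replace_alphanumeric(value: str, pepper: bytes) -> str:
    """Replace ASCII letters and digits, keeping character type and case."""
    rng = _rng(value, pepper)
    out = []
    for ch in value:
        if ch.isascii() and ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isascii() and ch.islower():
            out.append(rng.choice(string.ascii_lowercase))
        elif ch.isascii() and ch.isupper():
            out.append(rng.choice(string.ascii_uppercase))
        else:
            out.append(ch)  # punctuation, spaces, and non-ASCII stay as-is
    return "".join(out)


def replace_geozip(value: str, pepper: bytes) -> str:
    """Keep the first three digits of a zip code and randomize the rest."""
    rng = _rng(value, pepper)
    tail = "".join(rng.choice(string.digits) for _ in range(len(value) - 3))
    return value[:3] + tail


print(replace_alphanumeric("Ab-12 cD", b"example-pepper"))
print(replace_geozip("90210", b"example-pepper"))
```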

Replacement data

Given names: https://www.ssa.gov/cgi-bin/popularnames.cgi
Surnames: https://raw.githubusercontent.com/fivethirtyeight/data/master/most-common-name/surnames.csv
Zip codes: https://simplemaps.com/data/us-zips

Restore

SliceDB can restore slices into existing databases. In practice, this should normally be an empty existing database.

Cycles

Foreign keys may form a cycle only if at least one foreign key in the cycle is deferrable.

That foreign key will be deferred during restore.

A restore may run in a single transaction or in several. Parallelism requires multiple transactions.
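
The rule on cycles is equivalent to saying that the foreign keys that are not deferrable must form an acyclic graph, so that tables can be restored in dependency order. The sketch below checks this with Python's standard `graphlib` module (Python 3.9+). It is an illustration only: `restore_order` and the tuple layout of `foreign_keys` are hypothetical, not SliceDB's API.

```python
from graphlib import CycleError, TopologicalSorter


def restore_order(tables, foreign_keys):
    """Return a table order for restore, or raise ValueError.

    foreign_keys: iterable of (referencing_table, referenced_table, deferrable).
    Deferrable keys are ignored for ordering, since they can be deferred
    until the end of the transaction.
    """
    sorter = TopologicalSorter()
    for table in tables:
        sorter.add(table)
    for referencing, referenced, deferrable in foreign_keys:
        if not deferrable:
            # The referenced table must be restored first.
            sorter.add(referencing, referenced)
    try:
        return list(sorter.static_order())
    except CycleError as error:
        raise ValueError(
            f"cycle without a deferrable foreign key: {error.args[1]}"
        ) from error


# a -> b is not deferrable, b -> a is deferrable: the cycle can be restored.
order = restore_order(["a", "b"], [("a", "b", False), ("b", "a", True)])
print(order)
```

If both keys in the example were not deferrable, `restore_order` would raise `ValueError`, which mirrors the restriction stated above.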

Not supported

Multiple databases
Databases other than PostgreSQL

Download files

Source Distribution

slice-db-0.1.3.tar.gz (25.6 kB), uploaded Apr 1, 2021

Algorithm	Hash digest
SHA256	`e427b87ae3f5907affd185137a3d5e2bfb301e191c6c4d34223a64ddb28e9e7a`
MD5	`7183211026b588bf275bcfc5919c816e`
BLAKE2b-256	`53ff9b605f9c7add3089f47c2dac152e580bc0b5e00acd577d0e84ed89ac3d98`

Built Distribution

slice_db-0.1.3-py3-none-any.whl (29.8 kB, Python 3), uploaded Apr 1, 2021

Algorithm	Hash digest
SHA256	`c89a7304fc70f373c4bd7b6372c22ef391e1b93759a1ba7e1957fdcca2880844`
MD5	`1a1a17570cca71802c10e6c9fe4dc7ec`
BLAKE2b-256	`d82ff51b504ce70515b7bb9dc21a54abf5fcb0f5f8b64bb0330d69b74df85062`