Skip to main content

Capture, scrub, and restore subsets of PostgreSQL databases.

Project description

Slice DB

PyPI

Overview

SliceDB is a tool for capturing and restoring a subset of a PostgreSQL database. It also supports scrubbing sensive data.

Install

Pip

pip3 install slice-db

Docker

docker pull rivethealth/slicedb

Usage

For all commands and options, see Usage.

Basic example

First, query a database to create a schema file.

slicedb schema > schema.yml

Second, dump a slice:

slicedb dump --root public.example 'WHERE id IN (7, 56, 234)' --schema schema.yml > slice.zip

Third, restore that slice into another database:

slicedb restore < slice.zip

For a complete working example, see Example.

Connection

Use the libpq environment variables to configure the connection.

PGHOST=myhost slicedb schema > slice.yml

Dump

See dump.yml for the JSONSchema.

Output formats

SliceDB can produce multiple formats:

  • slice - ZIP archive. This can be restored with slicedb restore.
  • sql - SQL file. This can be restored with psql or another client. If restoring into existing schema, foreign keys must first be disabled, e.g. SET session_replication_role = replica.

Output content

Schema can optionally be included. Restoring with schema requires an existing empty database.

Schema

The schema command uses foreign keys to infer relationships between tables. It is a suggested starting point.

You may want to prune the slice by removing relationships, or expand the slice by adding relationships that don't have explicit foreign keys.

slicedb schema-filter can help modify the schema, or generic JSON tools like jq.

Algorithm

The slicing process works as follows:

  1. Starting with the root table, query the physical IDs (ctid) of rows.

  2. Add the row IDs to the existing list.

  3. For new IDs, process each of the adjacent tables, using them as the current root.

Do this in parallel, using pg_export_snapshot() to guarantee a consistent snapshot across workers.

Performance

Hundreds of thousands of rows can be exported in only a few minutes and several dozen MBs of memory.

Transformation

See transform.yml for the JSONSchema.

Replacements are deterministic for a given pepper. By default, the pepper is randomly generated each run. You may specify it as --pepper. Note that possession of the pepper makes the data guessable.

Transformation may operate on an existing slice (TODO), or happen during the dump.

Configuration

Transforms are specified by:

  • class, the Python class
  • config, transform-specific options
  • module, defaults to slice_db.transforms

The name given to the transform is appended to the global pepper.

Custom transforms

To create custom transforms, implement slice_db.transform.Transform, expose the class on a module, and install the module so that is accessible by slicedb.

Built-in transforms

The slice_db.transforms package has many common transforms.

See transforms.md for the full list.

Restore

SliceDB can restore slices into existing databases. In practice, this should normally be an empty existing database.

Cycles

Foreign keys may form a cycle only if at least one foreign key in the cycle is deferrable.

That foreign key will be deferred during restore.

A restore may happen in a single transaction or not. Parallelism requires multiple transactions.

Not supported

  • Multiple databases
  • Databases other than PostgreSQL

Developing

Install: make install

Format: make format

Publishing

  1. Update slice_db/version.py
  2. Commit git add slice_db/version.py && git commit -m "Version <version>"
  3. Tag git tag v<version>
  4. Push git push origin master --tags
  5. Publish to PyPI: make upload

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

slice-db-5.1.16.tar.gz (406.8 kB view details)

Uploaded Source

Built Distribution

slice_db-5.1.16-py3-none-any.whl (412.4 kB view details)

Uploaded Python 3

File details

Details for the file slice-db-5.1.16.tar.gz.

File metadata

  • Download URL: slice-db-5.1.16.tar.gz
  • Upload date:
  • Size: 406.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.12

File hashes

Hashes for slice-db-5.1.16.tar.gz
Algorithm Hash digest
SHA256 3183081d07ab39e2583e61585244d8307df4990368234a3ade8d6030cd63bd10
MD5 92d0d47bc0e9ada4ad7228712a6e7c70
BLAKE2b-256 426aa88cee39f997cde6e4fbaa3cca74b76303df915129b984164639c10c223f

See more details on using hashes here.

File details

Details for the file slice_db-5.1.16-py3-none-any.whl.

File metadata

  • Download URL: slice_db-5.1.16-py3-none-any.whl
  • Upload date:
  • Size: 412.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.12

File hashes

Hashes for slice_db-5.1.16-py3-none-any.whl
Algorithm Hash digest
SHA256 d06cfafbc8bdc8566209cc32477ae8837a2e3a077587a41f2902e62d59551d2e
MD5 1da69ceadec722595b1aa0663acff55f
BLAKE2b-256 ad89dc33d4659d07cbdf75438b42d34cd186ef8dc95905663285f8eb10bb0bd6

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page