Skip to main content

Extract minimal, referentially-intact database subsets for local development

Project description

dbslice logo

dbslice

PyPI version License: MIT Python 3.10+

Extract minimal, referentially-intact database subsets for local development and debugging.

The Problem

Copying an entire production database to your machine is infeasible. But reproducing a bug often requires having the exact data that caused it. dbslice solves this by extracting only the records you need, following foreign key relationships to ensure referential integrity.

dbslice — seed to subset

Quick Start

uv add dbslice

# Extract an order and all related records
dbslice extract postgres://localhost/myapp --seed "orders.id=12345" > subset.sql

# Import into local database
psql -d localdb < subset.sql

Features

  • Zero-config start -- Introspects schema automatically, no data model file required
  • Single command -- Extract complete data subsets with one CLI invocation
  • Safe by default -- Auto-detects and anonymizes sensitive fields (emails, phones, SSNs, etc.)
  • Multiple output formats -- SQL, JSON, and CSV
  • Streaming -- Memory-efficient extraction for large datasets (100K+ rows)
  • Virtual foreign keys -- Support for Django GenericForeignKeys and implicit relationships via config
  • Config files -- YAML-based configuration for repeatable extractions
  • Validation -- Checks referential integrity of extracted data

Database Support

Database Status
PostgreSQL Fully supported
MySQL Planned (not yet implemented)
SQLite Planned (not yet implemented)

Installation

# Install with uv (recommended)
uv add dbslice

# Try without installing
uvx dbslice --help

# Or with pip
pip install dbslice

Usage

Basic Extraction

# Extract by primary key
dbslice extract postgres://user:pass@host:5432/db --seed "orders.id=12345"

# Extract with WHERE clause
dbslice extract postgres://localhost/db --seed "orders:status='failed' AND created_at > '2024-01-01'"

# Multiple seeds
dbslice extract postgres://localhost/db \
  --seed "orders.id=100" \
  --seed "orders.id=101"

Control Traversal

# Limit depth (default: 3)
dbslice extract postgres://... --seed "orders.id=1" --depth 2

# Direction: up (parents only), down (children only), both (default)
dbslice extract postgres://... --seed "orders.id=1" --direction up

Anonymization

# Auto-anonymize detected sensitive fields
dbslice extract postgres://... --seed "users.id=1" --anonymize

# Redact additional fields
dbslice extract postgres://... --seed "users.id=1" --anonymize --redact "audit_logs.ip_address"

Output Formats

# SQL (default)
dbslice extract postgres://... --seed "orders.id=1" --output sql

# JSON fixtures
dbslice extract postgres://... --seed "orders.id=1" --output json --out-file fixtures/

# CSV
dbslice extract postgres://... --seed "orders.id=1" --output csv --out-file data/

Virtual Foreign Keys

For relationships not defined in the database schema (Django GenericForeignKeys, implicit relationships):

# dbslice.yaml
database:
  url: postgres://localhost:5432/myapp

virtual_foreign_keys:
  - source_table: notifications
    source_columns: [object_id]
    target_table: orders
    description: "Generic FK to orders via ContentType"

  - source_table: audit_log
    source_columns: [user_id]
    target_table: users
    description: "Implicit FK without DB constraint"
dbslice extract --config dbslice.yaml --seed "users.id=1"

Inspect Schema

dbslice inspect postgres://localhost/myapp

Configuration File

# Generate config from database
dbslice init postgres://localhost/myapp --out-file dbslice.yaml

# Use config
dbslice extract --config dbslice.yaml --seed "orders.id=12345"

How It Works

  1. Introspect -- Reads database schema to discover tables and foreign key relationships
  2. Traverse -- Starting from seed record(s), follows FK relationships via BFS
  3. Extract -- Fetches all identified records
  4. Sort -- Topologically sorts tables for correct INSERT order
  5. Output -- Generates SQL/JSON/CSV with proper escaping

Comparison

Feature dbslice Jailer Greenmask slice-db
Language Python Java Go Ruby
Configuration Zero-config Requires model file Config required Manual YAML
Setup time Seconds Hours Medium Medium
Anonymization Built-in (Faker) Plugin-based Advanced transformers Not available
Subsetting FK traversal FK traversal Limited FK traversal
Output formats SQL, JSON, CSV SQL, XML, CSV SQL SQL only
Cycle handling Automatic Manual config N/A Manual
Streaming Built-in Configurable Built-in Not available
Maintenance Active Active Active Unmaintained

dbslice is the lightweight, zero-config Python option: install and extract in under a minute.

Development

git clone https://github.com/nabroleonx/dbslice.git
cd dbslice
uv sync --dev
uv run pytest

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dbslice-0.1.2.tar.gz (126.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

dbslice-0.1.2-py3-none-any.whl (79.0 kB view details)

Uploaded Python 3

File details

Details for the file dbslice-0.1.2.tar.gz.

File metadata

  • Download URL: dbslice-0.1.2.tar.gz
  • Upload date:
  • Size: 126.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for dbslice-0.1.2.tar.gz
Algorithm Hash digest
SHA256 279a595bc2174a437552f37301c0d80ef33c0803157f1cc7d6cf450f41ce9735
MD5 62418bdc8b41ff32e45afdc135b30510
BLAKE2b-256 f22c2a2aa5c05ef57b98199a4e62ab0eecc571efa55b440afbf0d00c86fd82ef

See more details on using hashes here.

File details

Details for the file dbslice-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: dbslice-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 79.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for dbslice-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 253d7319546e539c047f07cc91e36a1ec875cf027fc2ea4e9b4b4e17e2d8b524
MD5 5597cfac4e8768d4a572ee4e374e4b4b
BLAKE2b-256 3b4204ff6d50456c4474f9d124d64179d753b194edd1d20a412843618010080d

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page