Skip to main content

Extract minimal, referentially-intact database subsets for local development

Project description

dbslice logo

dbslice

PyPI version License: MIT Python 3.10+

Extract minimal, referentially-intact database subsets for local development and debugging.

The Problem

Copying an entire production database to your machine is infeasible. But reproducing a bug often requires having the exact data that caused it. dbslice solves this by extracting only the records you need, following foreign key relationships to ensure referential integrity.

dbslice — seed to subset

Quick Start

# Install globally
uv tool install dbslice   # or: pip install dbslice

# Extract an order and all related records
dbslice extract postgres://localhost/myapp --seed "orders.id=12345" > subset.sql

# Import into local database
psql -d localdb < subset.sql

Features

  • Zero-config start -- Introspects schema automatically, no data model file required
  • Single command -- Extract complete data subsets with one CLI invocation
  • Safe by default -- Auto-detects and anonymizes sensitive fields (emails, phones, SSNs, etc.)
  • Multiple output formats -- SQL, JSON, and CSV
  • Streaming -- Memory-efficient extraction for large datasets (100K+ rows)
  • Virtual foreign keys -- Support for Django GenericForeignKeys and implicit relationships via config
  • Config files -- YAML-based configuration for repeatable extractions
  • Validation -- Checks referential integrity of extracted data

Database Support

Database Status
PostgreSQL Fully supported
MySQL Planned (not yet implemented)
SQLite Planned (not yet implemented)

Installation

# Install with uv (recommended)
uv add dbslice

# Try without installing
uvx dbslice --help

# Or with pip
pip install dbslice

Usage

Basic Extraction

# Extract by primary key
dbslice extract postgres://user:pass@host:5432/db --seed "orders.id=12345"

# Extract with WHERE clause
dbslice extract postgres://localhost/db --seed "orders:status='failed' AND created_at > '2024-01-01'"

# Multiple seeds
dbslice extract postgres://localhost/db \
  --seed "orders.id=100" \
  --seed "orders.id=101"

Control Traversal

# Limit depth (default: 3)
dbslice extract postgres://... --seed "orders.id=1" --depth 2

# Direction: up (parents only), down (children only), both (default)
dbslice extract postgres://... --seed "orders.id=1" --direction up

Anonymization

# Auto-anonymize detected sensitive fields
dbslice extract postgres://... --seed "users.id=1" --anonymize

# Redact additional fields
dbslice extract postgres://... --seed "users.id=1" --anonymize --redact "audit_logs.ip_address"

Output Formats

# SQL (default)
dbslice extract postgres://... --seed "orders.id=1" --output sql

# JSON fixtures
dbslice extract postgres://... --seed "orders.id=1" --output json --out-file fixtures/

# CSV
dbslice extract postgres://... --seed "orders.id=1" --output csv --out-file data/

Virtual Foreign Keys

For relationships not defined in the database schema (Django GenericForeignKeys, implicit relationships):

# dbslice.yaml
database:
  url: postgres://localhost:5432/myapp

virtual_foreign_keys:
  - source_table: notifications
    source_columns: [object_id]
    target_table: orders
    description: "Generic FK to orders via ContentType"

  - source_table: audit_log
    source_columns: [user_id]
    target_table: users
    description: "Implicit FK without DB constraint"
dbslice extract --config dbslice.yaml --seed "users.id=1"

Inspect Schema

dbslice inspect postgres://localhost/myapp

Configuration File

# Generate config from database
dbslice init postgres://localhost/myapp --out-file dbslice.yaml

# Use config
dbslice extract --config dbslice.yaml --seed "orders.id=12345"

How It Works

  1. Introspect -- Reads database schema to discover tables and foreign key relationships
  2. Traverse -- Starting from seed record(s), follows FK relationships via BFS
  3. Extract -- Fetches all identified records
  4. Sort -- Topologically sorts tables for correct INSERT order
  5. Output -- Generates SQL/JSON/CSV with proper escaping

Comparison

Feature dbslice Jailer Greenmask slice-db
Language Python Java Go Ruby
Configuration Zero-config Requires model file Config required Manual YAML
Setup time Seconds Hours Medium Medium
Anonymization Built-in (Faker) Plugin-based Advanced transformers Not available
Subsetting FK traversal FK traversal Limited FK traversal
Output formats SQL, JSON, CSV SQL, XML, CSV SQL SQL only
Cycle handling Automatic Manual config N/A Manual
Streaming Built-in Configurable Built-in Not available
Maintenance Active Active Active Unmaintained

dbslice is the lightweight, zero-config Python option: install and extract in under a minute.

Development

git clone https://github.com/nabroleonx/dbslice.git
cd dbslice
uv sync --dev
uv run pytest

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dbslice-0.4.0.tar.gz (149.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

dbslice-0.4.0-py3-none-any.whl (93.8 kB view details)

Uploaded Python 3

File details

Details for the file dbslice-0.4.0.tar.gz.

File metadata

  • Download URL: dbslice-0.4.0.tar.gz
  • Upload date:
  • Size: 149.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for dbslice-0.4.0.tar.gz
Algorithm Hash digest
SHA256 268c67115c8b416b8dbf92fc38bff11a2755849ef89111927b8f21397bd47c24
MD5 3574b48437b11c772bc639ed1cafe0c5
BLAKE2b-256 5f7632d2b4b25290286807c65e88a80b981d84ae316eb79759140c4f617a509f

See more details on using hashes here.

File details

Details for the file dbslice-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: dbslice-0.4.0-py3-none-any.whl
  • Upload date:
  • Size: 93.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for dbslice-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 eb0026157f430b5a12c0581c59e7ad1608d522cb5a6802d5914361e05239cb22
MD5 68b89ceb81e911c482587fdbba00666f
BLAKE2b-256 735d5994c6a60313420e44deec03d6cca6c80742d5fdb50501f92bd04439b069

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page