dbslice
Extract minimal, referentially-intact database subsets for local development and debugging.
The Problem
Copying an entire production database to your machine is rarely feasible, yet reproducing a bug often requires the exact data that caused it. dbslice extracts only the records you need and follows foreign key relationships so that the subset stays referentially intact.
Quick Start
# Install globally
uv tool install dbslice # or: pip install dbslice
# Extract an order and all related records
dbslice extract postgres://localhost/myapp --seed "orders.id=12345" > subset.sql
# Import into local database
psql -d localdb < subset.sql
Features
- Zero-config start -- Introspects schema automatically, no data model file required
- Single command -- Extract complete data subsets with one CLI invocation
- Safe by default -- Auto-detects and anonymizes sensitive fields (emails, phones, SSNs, etc.)
- Multiple output formats -- SQL, JSON, and CSV
- Streaming -- Memory-efficient extraction for large datasets (100K+ rows)
- Virtual foreign keys -- Support for Django GenericForeignKeys and implicit relationships via config
- Config files -- YAML-based configuration for repeatable extractions
- Validation -- Checks referential integrity of extracted data
Database Support
| Database | Status |
|---|---|
| PostgreSQL | Fully supported |
| MySQL | Planned (not yet implemented) |
| SQLite | Planned (not yet implemented) |
Installation
# Add to a uv-managed project (recommended)
uv add dbslice
# Try without installing
uvx dbslice --help
# Or with pip
pip install dbslice
Usage
Basic Extraction
# Extract by primary key
dbslice extract postgres://user:pass@host:5432/db --seed "orders.id=12345"
# Extract with WHERE clause
dbslice extract postgres://localhost/db --seed "orders:status='failed' AND created_at > '2024-01-01'"
# Multiple seeds
dbslice extract postgres://localhost/db \
--seed "orders.id=100" \
--seed "orders.id=101"
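The examples above use two seed forms: `table.column=value` and `table:<WHERE clause>`. The following is a minimal sketch of how such a spec could be split into a table and a condition. `Seed` and `parse_seed` are hypothetical names for illustration, not dbslice's actual parser, and a real implementation would bind values as query parameters rather than splice them into SQL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Seed:
    table: str
    where: str  # SQL condition selecting the seed rows


def parse_seed(spec: str) -> Seed:
    """Parse 'table.column=value' or 'table:<where clause>' (hypothetical helper)."""
    # Inspect only the text before the first '=' so that dots or colons
    # inside a value (for example a timestamp) do not confuse the split.
    head = spec.split("=", 1)[0]
    if ":" in head:
        table, where = spec.split(":", 1)
        return Seed(table.strip(), where.strip())
    if "." in head and "=" in spec:
        table, rest = spec.split(".", 1)
        column, value = rest.split("=", 1)
        return Seed(table.strip(), f"{column.strip()} = {value.strip()}")
    raise ValueError(f"unrecognized seed: {spec!r}")
```

With this sketch, `parse_seed("orders.id=12345")` yields the table `orders` with the condition `id = 12345`, while the colon form passes the WHERE clause through unchanged.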
Control Traversal
# Limit depth (default: 3)
dbslice extract postgres://... --seed "orders.id=1" --depth 2
# Direction: up (parents only), down (children only), both (default)
dbslice extract postgres://... --seed "orders.id=1" --direction up
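To illustrate what `--depth` and `--direction` control, here is a simplified sketch of a breadth-first walk over foreign key edges. It works at table level with a made-up schema (`FKS`) and a hypothetical function `reachable_tables`; dbslice itself traverses individual records, and its exact semantics for `both` may differ from this approximation.

```python
from collections import deque

# Each edge is (child_table, parent_table): the child holds the FK column.
# This schema is invented for the example.
FKS = [
    ("order_items", "orders"),
    ("orders", "users"),
    ("order_items", "products"),
    ("reviews", "products"),
]


def reachable_tables(seed, fks, depth=3, direction="both"):
    """Breadth-first walk over FK edges, returning {table: hops from seed}."""
    neighbours = {}
    for child, parent in fks:
        if direction in ("up", "both"):
            neighbours.setdefault(child, set()).add(parent)
        if direction in ("down", "both"):
            neighbours.setdefault(parent, set()).add(child)

    seen = {seed: 0}
    queue = deque([seed])
    while queue:
        table = queue.popleft()
        if seen[table] == depth:
            continue  # depth budget is spent on this branch
        for nxt in sorted(neighbours.get(table, ())):
            if nxt not in seen:
                seen[nxt] = seen[table] + 1
                queue.append(nxt)
    return seen
```

In this toy schema, seeding from `orders` with `direction="up"` reaches only `users`, while `direction="both"` with the default depth also pulls in `order_items`, `products`, and `reviews`.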
Anonymization
# Auto-anonymize detected sensitive fields
dbslice extract postgres://... --seed "users.id=1" --anonymize
# Redact additional fields
dbslice extract postgres://... --seed "users.id=1" --anonymize --redact "audit_logs.ip_address"
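As a rough illustration of the idea behind automatic detection, the sketch below flags columns by name and replaces their values with stable pseudonyms. Everything here is an assumption made for the example: the pattern list, the function names, and the hash-based masking. The comparison table in this README says dbslice uses Faker; this sketch uses only the standard library to stay self-contained.

```python
import hashlib
import re

# Hypothetical column-name patterns; dbslice's real detection rules may differ.
SENSITIVE = re.compile(r"(email|phone|ssn|password|ip_address)", re.IGNORECASE)


def is_sensitive(column: str) -> bool:
    return bool(SENSITIVE.search(column))


def mask(value: str, column: str) -> str:
    """Replace a value with a stable pseudonym derived from its hash."""
    digest = hashlib.sha256(value.encode()).hexdigest()[:12]
    if "email" in column.lower():
        return f"user_{digest}@example.com"
    return f"anon_{digest}"


def anonymize_row(row: dict) -> dict:
    """Mask sensitive, non-NULL columns and leave everything else untouched."""
    return {
        col: mask(str(val), col) if val is not None and is_sensitive(col) else val
        for col, val in row.items()
    }
```

Hashing the value makes the mapping deterministic, so the same email appearing in two tables is replaced by the same pseudonym and joins on that value still line up.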
Output Formats
# SQL (default)
dbslice extract postgres://... --seed "orders.id=1" --output sql
# JSON fixtures
dbslice extract postgres://... --seed "orders.id=1" --output json --out-file fixtures/
# CSV
dbslice extract postgres://... --seed "orders.id=1" --output csv --out-file data/
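For the SQL format, each extracted row ultimately becomes an `INSERT` statement with safely quoted values. The sketch below shows one way to do that quoting. The helper names are hypothetical and this is not dbslice's generator; it assumes PostgreSQL's default `standard_conforming_strings = on`, so only single quotes need doubling inside string literals.

```python
from datetime import date, datetime
from decimal import Decimal


def sql_identifier(name: str) -> str:
    """Quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def sql_literal(value) -> str:
    """Render a Python value as a PostgreSQL-style literal."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check before int: bool is an int subclass
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float, Decimal)):
        return str(value)
    if isinstance(value, (date, datetime)):
        return f"'{value.isoformat()}'"
    return "'" + str(value).replace("'", "''") + "'"


def insert_statement(table: str, row: dict) -> str:
    columns = ", ".join(sql_identifier(c) for c in row)
    values = ", ".join(sql_literal(v) for v in row.values())
    return f"INSERT INTO {sql_identifier(table)} ({columns}) VALUES ({values});"
```

The sketch ignores special floats such as NaN and infinity, as well as binary and array columns, which a real generator would need to handle.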
Virtual Foreign Keys
For relationships not defined in the database schema (Django GenericForeignKeys, implicit relationships):
# dbslice.yaml
database:
  url: postgres://localhost:5432/myapp
virtual_foreign_keys:
  - source_table: notifications
    source_columns: [object_id]
    target_table: orders
    description: "Generic FK to orders via ContentType"
  - source_table: audit_log
    source_columns: [user_id]
    target_table: users
    description: "Implicit FK without DB constraint"
dbslice extract --config dbslice.yaml --seed "users.id=1"
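Conceptually, virtual foreign keys extend the set of relationships discovered from the schema. The sketch below shows one way the two sources could be combined. The `ForeignKey` class and `merge_foreign_keys` function are hypothetical, and the config is passed as an already-parsed mapping with the same keys as the YAML above; dbslice's internal model may look different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForeignKey:
    source_table: str
    source_columns: tuple
    target_table: str
    virtual: bool = False  # True when declared in config, not in the schema


def merge_foreign_keys(introspected, config):
    """Combine schema FKs with virtual ones declared in the config mapping."""
    merged = list(introspected)
    known = {(fk.source_table, fk.source_columns, fk.target_table) for fk in merged}
    for entry in config.get("virtual_foreign_keys", []):
        key = (
            entry["source_table"],
            tuple(entry["source_columns"]),
            entry["target_table"],
        )
        if key not in known:  # skip entries the schema already declares
            merged.append(ForeignKey(*key, virtual=True))
            known.add(key)
    return merged
```

Once merged, traversal can treat both kinds of edge identically, which is why a virtual relationship such as `audit_log.user_id -> users` can be followed even though the database has no constraint for it.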
Inspect Schema
dbslice inspect postgres://localhost/myapp
Configuration File
# Generate config from database
dbslice init postgres://localhost/myapp --out-file dbslice.yaml
# Use config
dbslice extract --config dbslice.yaml --seed "orders.id=12345"
How It Works
- Introspect -- Reads database schema to discover tables and foreign key relationships
- Traverse -- Starting from seed record(s), follows FK relationships via BFS
- Extract -- Fetches all identified records
- Sort -- Topologically sorts tables for correct INSERT order
- Output -- Generates SQL/JSON/CSV with proper escaping
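The Sort step above can be sketched with Python's standard `graphlib` module (Python 3.9+): parents must be inserted before the children that reference them. The function names and the `(child, parent)` edge format are assumptions for this example, and the sketch only detects cycles; how dbslice resolves them is not described here.

```python
from graphlib import CycleError, TopologicalSorter


def insert_order(tables, fks):
    """Order tables so every parent is inserted before its children."""
    sorter = TopologicalSorter({t: set() for t in tables})
    for child, parent in fks:
        if child != parent:  # a self-reference does not constrain table order
            sorter.add(child, parent)  # child depends on parent
    return list(sorter.static_order())


def has_cycle(tables, fks):
    """Report whether the FK graph has a cycle between distinct tables."""
    try:
        insert_order(tables, fks)
    except CycleError:
        return True
    return False
```

For a schema where `order_items` references `orders` and `orders` references `users`, `insert_order` places `users` first and `order_items` last. A cycle between distinct tables raises `CycleError`, at which point a tool has to break the loop some other way, for example by deferring one constraint.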
Comparison
| Feature | dbslice | Jailer | Greenmask | slice-db |
|---|---|---|---|---|
| Language | Python | Java | Go | Ruby |
| Configuration | Zero-config | Requires model file | Config required | Manual YAML |
| Setup time | Seconds | Hours | Medium | Medium |
| Anonymization | Built-in (Faker) | Plugin-based | Advanced transformers | Not available |
| Subsetting | FK traversal | FK traversal | Limited | FK traversal |
| Output formats | SQL, JSON, CSV | SQL, XML, CSV | SQL | SQL only |
| Cycle handling | Automatic | Manual config | N/A | Manual |
| Streaming | Built-in | Configurable | Built-in | Not available |
| Maintenance | Active | Active | Active | Unmaintained |
dbslice is the lightweight, zero-config Python option: install and extract in under a minute.
Development
git clone https://github.com/nabroleonx/dbslice.git
cd dbslice
uv sync --dev
uv run pytest
License
MIT
File details
Details for the file dbslice-0.4.0.tar.gz.
File metadata
- Download URL: dbslice-0.4.0.tar.gz
- Upload date:
- Size: 149.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 268c67115c8b416b8dbf92fc38bff11a2755849ef89111927b8f21397bd47c24 |
| MD5 | 3574b48437b11c772bc639ed1cafe0c5 |
| BLAKE2b-256 | 5f7632d2b4b25290286807c65e88a80b981d84ae316eb79759140c4f617a509f |
File details
Details for the file dbslice-0.4.0-py3-none-any.whl.
File metadata
- Download URL: dbslice-0.4.0-py3-none-any.whl
- Upload date:
- Size: 93.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | eb0026157f430b5a12c0581c59e7ad1608d522cb5a6802d5914361e05239cb22 |
| MD5 | 68b89ceb81e911c482587fdbba00666f |
| BLAKE2b-256 | 735d5994c6a60313420e44deec03d6cca6c80742d5fdb50501f92bd04439b069 |