SeedKit
Generate realistic, constraint-safe seed data for any database.
SeedKit connects to your PostgreSQL, MySQL, or SQLite database, reads the schema, and generates seed data that respects foreign keys, unique constraints, check constraints, and enum types, all without copying production data.
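To make "respects foreign keys" concrete: one common way to keep inserts constraint-safe is to fill referenced tables before the tables that point at them. The sketch below is illustrative only and is not taken from SeedKit's source; the table names and the `insertion_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical schema: each table maps to the tables its foreign keys reference.
fk_dependencies = {
    "users": set(),
    "products": set(),
    "orders": {"users"},
    "order_items": {"orders", "products"},
}


def insertion_order(dependencies):
    """Return table names so every referenced table precedes its dependents."""
    return list(TopologicalSorter(dependencies).static_order())


order = insertion_order(fk_dependencies)
print(order)  # parents such as "users" appear before "orders"
```

Circular references raise `graphlib.CycleError` in this sketch; a real generator needs a strategy for those, such as deferring a nullable foreign key.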
Install
pip install seedkit
Or with pipx for an isolated installation:
pipx install seedkit
Quick Start
# Generate 1000 rows per table as SQL
seedkit generate --db postgres://localhost/myapp --rows 1000 --output seed.sql
# Insert directly into the database
seedkit generate --db postgres://localhost/myapp --rows 1000
# JSON or CSV output
seedkit generate --db postgres://localhost/myapp --rows 100 --output data.json
# Deterministic output with seed
seedkit generate --db postgres://localhost/myapp --rows 100 --seed 42 --output seed.sql
Database Connection
SeedKit automatically finds your database URL by checking (in order):
1. --db CLI flag
2. DATABASE_URL environment variable
3. .env file in the current directory
4. seedkit.toml config file
Supported URL formats:
# PostgreSQL
seedkit generate --db postgres://user:pass@localhost:5432/mydb
# MySQL
seedkit generate --db mysql://user:pass@localhost:3306/mydb
# SQLite
seedkit generate --db sqlite://path/to/db.sqlite
AI-Enhanced Classification
SeedKit can use an LLM to improve column classification beyond its 50+ built-in regex rules. This helps with ambiguous column names that the rule engine classifies as Unknown.
# Set one of these environment variables:
export ANTHROPIC_API_KEY=sk-ant-... # Uses Claude Sonnet by default
export OPENAI_API_KEY=sk-... # Uses GPT-4o by default
# Run with --ai flag
seedkit generate --db postgres://localhost/myapp --rows 1000 --ai --output seed.sql
# Override the model
seedkit generate --db postgres://localhost/myapp --rows 1000 --ai --model claude-opus-4-20250514
The AI classification is cached locally so subsequent runs with the same schema don't re-query the LLM. Results are also stored in the lock file for team reproducibility.
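A cache that is valid "for the same schema" is typically keyed on a fingerprint of the schema. The sketch below shows that idea under stated assumptions: `schema_fingerprint`, `classify_with_cache`, the dict-based cache, and the schema shape are all hypothetical, and SeedKit's actual cache and lock file formats may differ.

```python
import hashlib
import json


def schema_fingerprint(schema):
    """Hash a canonical JSON form so key order does not change the result."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def classify_with_cache(schema, cache, classify):
    """Call the (expensive) classifier only when the schema is unseen."""
    key = schema_fingerprint(schema)
    if key not in cache:
        cache[key] = classify(schema)
    return cache[key]


# Stand-in for an LLM call; it records how often it is invoked.
calls = []


def fake_classifier(schema):
    calls.append(1)
    return {"users.nickname": "Username"}


cache = {}
schema = {"users": {"id": "integer", "nickname": "text"}}
first = classify_with_cache(schema, cache, fake_classifier)
second = classify_with_cache(schema, cache, fake_classifier)
print(len(calls))  # the classifier ran once; the second call hit the cache
```

Any change to a table or column produces a different fingerprint, so a modified schema triggers a fresh classification.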
Smart Sampling
Extract statistical distributions from a production database to generate data that mirrors real patterns:
# Sample distributions (read-only, PII auto-masked)
seedkit sample --db postgres://readonly-replica:5432/myapp
# Generate using sampled distributions
seedkit generate --db postgres://localhost/myapp --rows 1000 --subset seedkit.distributions.json
All Commands
| Command | Description |
|---|---|
| seedkit generate | Generate seed data (SQL, JSON, CSV, or direct insert) |
| seedkit sample | Extract production distributions with PII masking |
| seedkit introspect | Analyze schema and show classification results |
| seedkit preview | Preview sample rows without full generation |
| seedkit check | Detect schema drift against lock file (CI-friendly) |
| seedkit graph | Visualize table dependencies (Mermaid or Graphviz) |
Configuration
Create a seedkit.toml in your project root:
[database]
url = "postgres://localhost/myapp"
[generate]
rows = 500
seed = 42
[tables.users]
rows = 1000
[tables.orders]
rows = 5000
# Custom value lists with optional weights
[columns."products.color"]
values = ["red", "blue", "green", "black", "white"]
weights = [0.25, 0.20, 0.20, 0.20, 0.15]
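To show what a weighted value list combined with a fixed seed implies, here is a small sketch using Python's standard library. It illustrates the general technique only; how SeedKit samples internally is not described here, and the row count of 10,000 is arbitrary.

```python
import random
from collections import Counter

# Values and weights copied from the products.color example above.
values = ["red", "blue", "green", "black", "white"]
weights = [0.25, 0.20, 0.20, 0.20, 0.15]

# A dedicated, seeded generator makes the draw repeatable across runs.
rng = random.Random(42)
sample = rng.choices(values, weights=weights, k=10_000)

counts = Counter(sample)
for color in values:
    # Observed share should land close to the configured weight.
    print(f"{color}: {counts[color] / len(sample):.3f}")
```

Re-running with the same seed yields the same sequence, which is the property that makes seeded output reproducible.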
Performance
| Operation | Throughput |
|---|---|
| Generation (10 cols, semantic providers) | ~480K rows/sec |
| Generation (FK references only) | ~3.7M rows/sec |
| Classification (100 tables x 20 cols) | ~2.1M cols/sec |
| SQL output formatting | ~1.5M rows/sec |
Documentation
Full documentation, architecture details, and benchmarks: github.com/kclaka/seedkit
License
MIT