SeedKit

Generate realistic, constraint-safe seed data for any database.

SeedKit connects to your PostgreSQL, MySQL, or SQLite database, reads the schema, and generates seed data that respects foreign keys, unique constraints, check constraints, and enum types, all without copying production data.
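To make the foreign-key guarantee concrete: seed rows can only be inserted safely if every referenced table is populated before the tables that point at it. The sketch below shows one common way to derive such an order, a topological sort over the foreign-key graph. It is an illustration under assumptions, not SeedKit's confirmed implementation; the table names and the `insertion_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical schema: each table maps to the set of tables its foreign keys reference.
fk_dependencies = {
    "users": set(),
    "products": set(),
    "orders": {"users"},
    "order_items": {"orders", "products"},
}

def insertion_order(dependencies):
    """Return table names so that every referenced table comes before its dependents."""
    # TopologicalSorter treats the mapped values as predecessors, so they are emitted first.
    return list(TopologicalSorter(dependencies).static_order())

order = insertion_order(fk_dependencies)
print(order)
```

A cyclic schema (two tables referencing each other) makes `static_order()` raise `graphlib.CycleError`, so a real tool needs an extra strategy for cycles, such as deferring one of the constraints.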

Install

pip install seedkit

Or with pipx for isolated installation:

pipx install seedkit

Quick Start

# Generate 1000 rows per table as SQL
seedkit generate --db postgres://localhost/myapp --rows 1000 --output seed.sql

# Insert directly into database
seedkit generate --db postgres://localhost/myapp --rows 1000

# JSON or CSV output
seedkit generate --db postgres://localhost/myapp --rows 100 --output data.json

# Deterministic output with seed
seedkit generate --db postgres://localhost/myapp --rows 100 --seed 42 --output seed.sql
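The idea behind `--seed` is that a fixed seed makes pseudo-random generation repeatable. The Python sketch below illustrates the principle only; `generate_rows` and its columns are hypothetical and say nothing about how SeedKit generates values internally.

```python
import random

def generate_rows(n, seed):
    """Generate n fake rows from a private generator initialized with `seed`."""
    rng = random.Random(seed)  # independent of the global random state
    return [{"id": i + 1, "age": rng.randint(18, 90)} for i in range(n)]

first = generate_rows(5, seed=42)
second = generate_rows(5, seed=42)

# The same seed and the same row count give identical rows on every run.
print(first == second)
```

Using a dedicated `random.Random` instance rather than the module-level functions keeps the output stable even if other code draws random numbers in between.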

Database Connection

SeedKit automatically finds your database URL by checking (in order):

  1. --db CLI flag
  2. DATABASE_URL environment variable
  3. .env file in the current directory
  4. seedkit.toml config file

Supported URL formats:

# PostgreSQL
seedkit generate --db postgres://user:pass@localhost:5432/mydb

# MySQL
seedkit generate --db mysql://user:pass@localhost:3306/mydb

# SQLite
seedkit generate --db sqlite://path/to/db.sqlite
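To check a URL before passing it to `--db`, it can help to see how it breaks down into parts. The sketch below uses Python's standard `urllib.parse`; the `describe_url` helper is hypothetical, and the treatment of the SQLite form (everything after `sqlite://` is a file path) is an assumption based on the example above, not a statement about SeedKit's parser.

```python
from urllib.parse import urlsplit

def describe_url(url):
    """Split a database URL into its components, leaving out the password."""
    parts = urlsplit(url)
    if parts.scheme == "sqlite":
        # Assumed convention: no host or credentials, the remainder is a file path.
        return {"backend": "sqlite", "path": url[len("sqlite://"):]}
    return {
        "backend": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }

print(describe_url("postgres://user:pass@localhost:5432/mydb"))
print(describe_url("sqlite://path/to/db.sqlite"))
```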

AI-Enhanced Classification

SeedKit can use an LLM to improve column classification beyond the 50+ built-in regex rules. This helps with ambiguous column names that the rule engine would otherwise classify as Unknown.

# Set one of these environment variables:
export ANTHROPIC_API_KEY=sk-ant-...    # Uses Claude Sonnet (default)
export OPENAI_API_KEY=sk-...           # Uses GPT-4o (default)

# Run with --ai flag
seedkit generate --db postgres://localhost/myapp --rows 1000 --ai --output seed.sql

# Override the model
seedkit generate --db postgres://localhost/myapp --rows 1000 --ai --model claude-opus-4-20250514

AI classification results are cached locally, so subsequent runs against the same schema do not re-query the LLM. Results are also stored in the lock file so that the whole team gets the same classifications.
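One way to picture "same schema, no second query" is a cache keyed by a fingerprint of the schema. The sketch below is a guess at the general technique, not SeedKit's cache format: `schema_fingerprint`, `ClassificationCache`, and the stand-in classifier are all hypothetical, and the cache lives in memory rather than on disk.

```python
import hashlib
import json

def schema_fingerprint(schema):
    """Hash a canonical JSON form so that key order does not change the result."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

class ClassificationCache:
    """Memoize classifier results per schema fingerprint."""

    def __init__(self):
        self._entries = {}
        self.misses = 0  # number of times the classifier actually ran

    def classify(self, schema, classifier):
        key = schema_fingerprint(schema)
        if key not in self._entries:
            self.misses += 1
            self._entries[key] = classifier(schema)
        return self._entries[key]

def fake_classifier(schema):
    # Stand-in for an LLM call; labels every column "unknown".
    return {table: {col: "unknown" for col in cols} for table, cols in schema.items()}

cache = ClassificationCache()
schema = {"users": {"nickname": "text", "dob": "date"}}
cache.classify(schema, fake_classifier)
cache.classify(schema, fake_classifier)  # served from the cache
print(cache.misses)
```

Any change to a table or column alters the fingerprint, which forces a fresh classification for the new schema.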

Smart Sampling

Extract statistical distributions from a production database to generate data that mirrors real patterns:

# Sample distributions (read-only, PII auto-masked)
seedkit sample --db postgres://readonly-replica:5432/myapp

# Generate using sampled distributions
seedkit generate --db postgres://localhost/myapp --rows 1000 --subset seedkit.distributions.json
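Conceptually, sampling reduces a column to aggregate statistics and generation draws new values from them, so no production row is copied. The sketch below shows this for a categorical column. It is an illustration under assumptions: the helper names and the example values are hypothetical, the layout of seedkit.distributions.json is not documented here, and PII masking is not modeled.

```python
import random
from collections import Counter

def sample_distribution(values):
    """Reduce observed values to relative frequencies (no rows are kept)."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}

def generate_from_distribution(distribution, n, seed=None):
    """Draw n synthetic values that follow the sampled frequencies."""
    rng = random.Random(seed)
    categories = list(distribution)
    weights = [distribution[c] for c in categories]
    return rng.choices(categories, weights=weights, k=n)

# Hypothetical production column: order status.
observed = ["paid", "paid", "paid", "refunded"]
distribution = sample_distribution(observed)
synthetic = generate_from_distribution(distribution, n=1000, seed=42)
print(distribution)
```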

All Commands

Command               Description
seedkit generate      Generate seed data (SQL, JSON, CSV, or direct insert)
seedkit sample        Extract production distributions with PII masking
seedkit introspect    Analyze schema and show classification results
seedkit preview       Preview sample rows without full generation
seedkit check         Detect schema drift against lock file (CI-friendly)
seedkit graph         Visualize table dependencies (Mermaid or Graphviz)

Configuration

Create a seedkit.toml in your project root:

[database]
url = "postgres://localhost/myapp"

[generate]
rows = 500
seed = 42

[tables.users]
rows = 1000

[tables.orders]
rows = 5000

# Custom value lists with optional weights
[columns."products.color"]
values = ["red", "blue", "green", "black", "white"]
weights = [0.25, 0.20, 0.20, 0.20, 0.15]

Performance

Operation                                   Throughput
Generation (10 cols, semantic providers)    ~480K rows/sec
Generation (FK references only)             ~3.7M rows/sec
Classification (100 tables x 20 cols)       ~2.1M cols/sec
SQL output formatting                       ~1.5M rows/sec

Documentation

Full documentation, architecture details, and benchmarks: github.com/kclaka/seedkit

License

MIT
