AI-powered test data generator that reads your database schema and fills it with realistic, FK-valid data in seconds
SeedForge
One command to fill your database with realistic test data.
SeedForge connects to your database, reads the schema (tables, columns, foreign keys, constraints), and generates realistic, FK-valid data — no code, no config, no seed scripts.
pip install seedforge
seedforge connect postgresql://user:pass@localhost/mydb
seedforge generate --rows 1000
# Done. 40 tables filled in 3 seconds.
The Problem
Every developer has been there: you set up a new project, run migrations, open the app — and everything is empty. Dashboards show zeros, lists return nothing, features that depend on data can't be tested.
So you write a seed script. Manually. For every table. And then the schema changes and the script breaks. Or you copy production data into staging — and now you have GDPR problems.
SeedForge solves this. It reads your actual database schema, understands the relationships between tables, and generates realistic data that respects all constraints — automatically.
Features
- Zero-config: reads your DB schema automatically, no setup needed
- FK integrity: resolves foreign keys via topological sort, inserts in correct order
- Smart heuristics: 80+ column name patterns for realistic data (`email` → real email, `price` → decimal, `role` → admin/user/editor)
- Multi-database: PostgreSQL, MySQL/MariaDB, SQLite
- Deterministic: use `--seed` to get the same data every time, across machines
- AI-powered: optional AI integration (Anthropic, OpenAI, Gemini, Groq, Ollama) for maximum realism
- Export: SQL or JSON file output
- Privacy-first: runs entirely locally, your data never leaves your machine
Why Not Just Ask AI to Generate Data?
You can absolutely paste your schema into ChatGPT and ask it to generate INSERT statements. For a quick one-off with 5 tables, that works fine.
But in practice:
Scale. AI generates data token by token. 1,000 rows across 40 tables? That's a 10-minute wait and $2-5 in API costs. SeedForge does it in a few seconds, free, offline.
Repeatability. Every time you ask AI, you get different data. With seedforge generate --seed 42, every developer on your team gets identical data, every time. Deterministic. Committable. Reviewable.
Automation. You can't put a ChatGPT conversation into your CI/CD pipeline. But you can put seedforge generate --rows 5000 into a GitHub Action and have fresh test data on every PR.
Correctness. AI sometimes forgets a foreign key, generates a duplicate for a UNIQUE column, or invents an ENUM value that doesn't exist. SeedForge reads the actual constraints from your database — it physically can't violate them.
Cost at scale. A team of 5 developers, each resetting their local DB 3 times a day:
| | AI API | SeedForge |
|---|---|---|
| Per run | ~$0.50 | $0 |
| Per day (team) | $7.50 | $0 |
| Per month | $225 | $0 |
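The figures in the table follow from simple arithmetic. A minimal sketch, assuming the ~$0.50 per-run estimate from the table and a 30-day month (the 30-day figure is inferred from the $225 total, not stated in the text); `ai_cost` is a hypothetical helper:

```python
def ai_cost(developers: int, resets_per_day: int,
            cost_per_run: float, days: int = 30) -> tuple[float, float]:
    """Return (cost per day, cost per month) for a whole team."""
    runs_per_day = developers * resets_per_day   # 5 * 3 = 15 runs
    per_day = runs_per_day * cost_per_run        # 15 * $0.50
    return per_day, per_day * days               # month assumed to be 30 days


per_day, per_month = ai_cost(developers=5, resets_per_day=3, cost_per_run=0.50)
print(f"${per_day:.2f} per day, ${per_month:.2f} per month")
```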
SeedForge uses AI as an optional enhancement for complex data (product descriptions, realistic bios) — not as the engine for every INSERT.
Installation
pip install seedforge
# With MySQL support
pip install "seedforge[mysql]"
# With AI support
pip install "seedforge[ai]"
# Everything
pip install "seedforge[all]"
Quick Start
1. Connect
seedforge connect postgresql://user:pass@localhost:5432/mydb
# MySQL
seedforge connect mysql://user:pass@localhost:3306/mydb
# SQLite
seedforge connect sqlite:///path/to/database.db
Saves the connection to .seedforge.yaml so you don't have to type it again.
2. Inspect
seedforge inspect
Shows all tables, columns, types, foreign keys, and insertion order:
Found 18 tables (insertion order):
1. users
┏━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┓
┃ Column     ┃ Type      ┃ Nullable ┃ FK →  ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━┩
│ id         │ serial    │ NO       │       │
│ email      │ varchar   │ NO       │       │
│ name       │ varchar   │ YES      │       │
└────────────┴───────────┴──────────┴───────┘
2. orders
┏━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Column     ┃ Type      ┃ Nullable ┃ FK →       ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━┩
│ id         │ serial    │ NO       │            │
│ user_id    │ integer   │ NO       │ users.id   │
│ total      │ numeric   │ NO       │            │
└────────────┴───────────┴──────────┴────────────┘
3. Generate
# Generate and insert 100 rows per table
seedforge generate --rows 100
# Preview without inserting
seedforge generate --rows 10 --dry-run
# Export to SQL file
seedforge generate --rows 1000 --export sql
# Export to JSON
seedforge generate --rows 1000 --export json
# Deterministic (same data every time)
seedforge generate --rows 100 --seed 42
# Only specific tables (auto-includes FK parents)
seedforge generate --tables orders,payments --rows 50
# Clean tables before generating
seedforge generate --rows 100 --clean
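The idea behind `--seed` can be illustrated with Python's standard `random` module. This is a minimal sketch of seeded generation in general, not SeedForge's internals; `make_users` and its fields are hypothetical.

```python
import random


def make_users(n: int, seed: int) -> list[dict]:
    # A dedicated Random instance keeps the sequence independent of any
    # other code that touches the global random state.
    rng = random.Random(seed)
    first_names = ["Ada", "Grace", "Linus", "Margaret"]
    return [
        {"id": i + 1,
         "first_name": rng.choice(first_names),
         "age": rng.randint(18, 80)}
        for i in range(n)
    ]


# Same seed, same rows: two independent calls produce identical data.
assert make_users(5, seed=42) == make_users(5, seed=42)
```

Because every value is drawn from one seeded generator, the output depends only on the seed and the order of draws, which is what makes seeded data reviewable and repeatable.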
4. AI Generate (optional)
For maximum realism, SeedForge can use AI to generate context-aware data. Bring your own API key from any supported provider:
# Auto-detects provider by key prefix
seedforge ai-generate --api-key sk-ant-... # Anthropic
seedforge ai-generate --api-key sk-... # OpenAI
seedforge ai-generate --api-key AIza... # Gemini
seedforge ai-generate --api-key gsk_... # Groq
# Or set an environment variable
export ANTHROPIC_API_KEY=sk-ant-...
export OPENAI_API_KEY=sk-...
export GEMINI_API_KEY=AIza...
export GROQ_API_KEY=gsk_...
export OLLAMA_MODEL=llama3.2 # Local, free
seedforge ai-generate --rows 20
# Explicit provider
seedforge ai-generate --provider ollama --rows 20
| Provider | Models | Speed | Cost |
|---|---|---|---|
| Anthropic | Claude Haiku, Sonnet, Opus | Fast | $$ |
| OpenAI | GPT-4o-mini, GPT-4o | Fast | $$ |
| Gemini | Gemini 2.0 Flash | Fast | $ |
| Groq | Llama 3.3 70B, Mixtral | Very fast | $ |
| Ollama | Any local model | Varies | Free |
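Auto-detection by key prefix can be sketched as an ordered prefix match. The prefixes are taken from the examples above; `detect_provider`, `KEY_PREFIXES`, and the `None` fallback are hypothetical and may not match SeedForge's actual code. The one subtlety worth showing is ordering: `sk-ant-` has to be tested before the shorter `sk-`.

```python
from typing import Optional

# Checked in order; "sk-ant-..." also starts with "sk-", so the longer
# Anthropic prefix must come before the OpenAI one.
KEY_PREFIXES = [
    ("sk-ant-", "anthropic"),
    ("sk-", "openai"),
    ("AIza", "gemini"),
    ("gsk_", "groq"),
]


def detect_provider(api_key: str) -> Optional[str]:
    """Return the provider name for a key, or None if no prefix matches."""
    for prefix, provider in KEY_PREFIXES:
        if api_key.startswith(prefix):
            return provider
    return None


print(detect_provider("sk-ant-example"))
print(detect_provider("sk-example"))
```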
How It Works
┌─────────────┐     ┌──────────────────┐     ┌───────────────┐
│   Your DB   │────>│  Schema Reader   │────>│  Dependency   │
│  (PG/MySQL/ │     │  (introspection) │     │  Graph        │
│   SQLite)   │     │                  │     │  (topo sort)  │
└─────────────┘     └──────────────────┘     └──────┬────────┘
                                                    │
                    ┌──────────────────┐            │
                    │  Data Generator  │<───────────┘
                    │  (heuristics +   │
                    │   optional AI)   │
                    └────────┬─────────┘
                             │
                    ┌────────▼─────────┐
                    │  Batch Inserter  │────> Your DB (filled!)
                    │  (FK-valid data) │
                    └──────────────────┘
- Schema introspection: connects to your database, reads `information_schema` (or `PRAGMA` for SQLite) to get tables, columns, types, FK relationships, constraints, ENUMs
- Dependency graph: builds a directed graph from FK relationships, runs topological sort to determine insertion order (parents first, children after)
- Smart heuristics: maps column names to appropriate generators using 80+ patterns
- FK resolution: child rows automatically reference real IDs from already-generated parent rows
- Batch insert: fast bulk insertion with proper transaction handling
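The steps above can be sketched end to end with the standard library alone. This is an illustration under stated assumptions, not SeedForge's implementation: it uses an in-memory SQLite database, a made-up two-table schema, `graphlib.TopologicalSorter` (Python 3.9+) for the ordering, and trivial placeholder value generators.

```python
import random
import sqlite3
from graphlib import TopologicalSorter

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         user_id INTEGER NOT NULL REFERENCES users(id),
                         total NUMERIC NOT NULL);
""")
conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce FKs

# 1. Introspection: list tables, then read each table's foreign keys.
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
# foreign_key_list rows: (id, seq, parent_table, from_col, to_col, ...)
fks = {t: [(r[3], r[2]) for r in conn.execute(f"PRAGMA foreign_key_list({t})")]
       for t in tables}

# 2. Dependency graph: each table maps to the parents it depends on.
graph = {t: {parent for _, parent in fks[t]} for t in tables}
order = list(TopologicalSorter(graph).static_order())  # parents first

# 3-5. Generate rows in that order, resolve FKs, bulk insert.
rng = random.Random(42)
ids = {}  # table -> primary keys already generated
with conn:  # one transaction: commit on success, roll back on error
    for table in order:
        cols = [r[1] for r in conn.execute(f"PRAGMA table_info({table})")]
        fk_parent = dict(fks[table])  # FK column -> parent table
        rows = []
        for i in range(1, 6):
            row = []
            for col in cols:
                if col == "id":
                    row.append(i)
                elif col in fk_parent:  # reference an existing parent id
                    row.append(rng.choice(ids[fk_parent[col]]))
                elif col == "email":  # placeholder name heuristic
                    row.append(f"user{i}@example.com")
                else:  # placeholder for numeric columns
                    row.append(round(rng.uniform(5, 500), 2))
            rows.append(tuple(row))
        placeholders = ", ".join("?" * len(cols))
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
        ids[table] = [r[0] for r in rows]

orphans = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE user_id NOT IN (SELECT id FROM users)"
).fetchone()[0]
print(order, "orphaned orders:", orphans)
```

Because parents are inserted before children and each FK value is drawn from IDs that already exist, the inserts pass even with foreign key enforcement switched on. A schema with circular FK references would make `static_order()` raise `graphlib.CycleError`; how SeedForge handles that case is not described here.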
Column Name Heuristics
SeedForge automatically detects what kind of data to generate based on column names:
| Column name | Generated data |
|---|---|
| email | john.smith@example.com |
| phone, mobile | +1-555-0123 |
| first_name | John |
| last_name | Smith |
| username | jsmith42 |
| address, street | 123 Main St, Apt 4 |
| city | San Francisco |
| country | United States |
| price, amount, total | 49.99 |
| url, website | https://example.com |
| avatar, image_url | https://picsum.photos/seed/123/400/300 |
| role | admin, user, moderator |
| status | active, pending, completed |
| plan | free, pro, enterprise |
| created_at, updated_at | Recent datetime |
| is_active, verified | true (85% bias) |
| is_deleted, archived | false (90% bias) |
| password | SHA-256 hash |
| token, api_key | Random hex string |
| uuid, guid | Valid UUID v4 |
| ...and 60+ more patterns | |
Context-aware: a `name` column generates person names in a users table, company names in organizations, and product names in products.
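A name-based heuristic of this kind can be sketched as an ordered list of patterns plus a table-aware special case for `name`. Everything here is hypothetical (`PATTERNS`, `NAME_BY_TABLE`, `generate`, and the sample values); it covers only a handful of the rows from the table and is not SeedForge's actual pattern set.

```python
import random
import re
import uuid

rng = random.Random(0)

# (pattern, generator) pairs, checked in order; the first match wins.
PATTERNS = [
    (re.compile(r"email"), lambda: f"user{rng.randint(1, 9999)}@example.com"),
    (re.compile(r"^(price|amount|total)$"), lambda: round(rng.uniform(1, 500), 2)),
    # Boolean biased towards True about 85% of the time.
    (re.compile(r"^(is_active|verified)$"), lambda: rng.random() < 0.85),
    # version=4 sets the version and variant bits on the random 128-bit value.
    (re.compile(r"^(uuid|guid)$"),
     lambda: str(uuid.UUID(int=rng.getrandbits(128), version=4))),
]

# Context-aware values for a generic "name" column, keyed by table.
NAME_BY_TABLE = {
    "users": ["John Smith", "Ada Lovelace"],
    "organizations": ["Acme Corp", "Globex"],
    "products": ["Wireless Mouse", "Desk Lamp"],
}


def generate(table: str, column: str):
    col = column.lower()
    if col == "name" and table in NAME_BY_TABLE:
        return rng.choice(NAME_BY_TABLE[table])
    for pattern, make in PATTERNS:
        if pattern.search(col):
            return make()
    return None  # no heuristic matched; a real tool would fall back to the column type


print(generate("users", "name"), "|", generate("organizations", "name"))
print(generate("orders", "total"), generate("users", "contact_email"))
```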
Configuration
.seedforge.yaml (auto-created by seedforge connect):
db_url: postgresql://user:pass@localhost:5432/mydb
default_rows: 100
default_schema: public
seed: 42 # optional, for deterministic generation
exclude_tables:
- _prisma_migrations
- django_migrations
Supported Databases
- PostgreSQL (psycopg2)
- MySQL / MariaDB (PyMySQL)
- SQLite (built-in)
Supported AI Providers
- Anthropic (Claude)
- OpenAI (GPT-4o)
- Google Gemini
- Groq (Llama, Mixtral)
- Ollama (local, free)
Data Privacy
Your data never leaves your machine. SeedForge runs entirely locally — it connects directly to your database, generates data in memory, and inserts it. No cloud, no telemetry, no data collection.
When using AI mode, only schema metadata (table and column names) is sent to the AI provider — never your actual data.
License
MIT
Contributing
Issues and PRs welcome at github.com/silkhorizonstudios/seedforge.