SeedForge

One command to fill your database with realistic test data.

SeedForge connects to your database, reads the schema (tables, columns, foreign keys, constraints), and generates realistic, FK-valid data — no code, no config, no seed scripts.

pip install seedforge
seedforge connect postgresql://user:pass@localhost/mydb
seedforge generate --rows 1000
# Done. 40 tables filled in 3 seconds.

The Problem

Every developer has been there: you set up a new project, run migrations, open the app — and everything is empty. Dashboards show zeros, lists return nothing, features that depend on data can't be tested.

So you write a seed script. Manually. For every table. And then the schema changes and the script breaks. Or you copy production data into staging — and now you have GDPR problems.

SeedForge solves this. It reads your actual database schema, understands the relationships between tables, and generates realistic data that respects all constraints — automatically.

Features

  • Zero-config: reads your DB schema automatically, no setup needed
  • FK integrity: resolves foreign keys via topological sort and inserts rows in the correct order
  • Smart heuristics: 80+ column name patterns for realistic data (email → realistic email address, price → decimal, role → admin/user/editor)
  • Multi-database: PostgreSQL, MySQL/MariaDB, SQLite
  • Deterministic: use --seed to get the same data every time, across machines
  • AI (optional): plug in Anthropic, OpenAI, Gemini, Groq, or Ollama for extra realism
  • Export: SQL or JSON file output
  • Privacy-first: runs entirely locally by default, so your data never leaves your machine

Why Not Just Ask AI to Generate Data?

You can absolutely paste your schema into ChatGPT and ask it to generate INSERT statements. For a quick one-off with 5 tables, that works fine.

But in practice:

Scale. AI generates data token by token. 1,000 rows across 40 tables? That's a 10-minute wait and $2-5 in API costs. SeedForge does it in seconds, free and offline.

Repeatability. Every time you ask AI, you get different data. With seedforge generate --seed 42, every developer on your team gets identical data, every time. Deterministic. Committable. Reviewable.

Automation. You can't put a ChatGPT conversation into your CI/CD pipeline. But you can put seedforge generate --rows 5000 into a GitHub Action and have fresh test data on every PR.

Correctness. AI sometimes forgets a foreign key, generates a duplicate for a UNIQUE column, or invents an ENUM value that doesn't exist. SeedForge reads the actual constraints from your database and generates values against them, rather than guessing them from a pasted schema.
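
The repeatability point rests on seeding a pseudo-random generator. A minimal Python sketch of the idea, assuming one private `random.Random` instance per run; `make_rows` and the value pools are hypothetical illustrations, not SeedForge's internals:

```python
import random

# Hypothetical value pools, for illustration only
FIRST_NAMES = ["John", "Ana", "Wei", "Fatima"]
STATUSES = ["active", "pending", "completed"]

def make_rows(n, seed):
    """Generate n fake rows from a private, seeded RNG."""
    rng = random.Random(seed)  # private instance: independent of global random state
    return [
        {
            "id": i + 1,
            "first_name": rng.choice(FIRST_NAMES),
            "status": rng.choice(STATUSES),
            "total": round(rng.uniform(1, 500), 2),
        }
        for i in range(n)
    ]

run_a = make_rows(5, seed=42)
run_b = make_rows(5, seed=42)
print(run_a == run_b)  # True: same seed, same rows
```

In this sketch the output is reproducible as long as the code and the Python version stay the same, because the sequence a seeded generator produces can change between Python releases.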

Installation

pip install seedforge

# With MySQL support
pip install seedforge[mysql]

# With AI support
pip install seedforge[ai]

# Everything
pip install seedforge[all]

Quick Start

1. Connect

seedforge connect postgresql://user:pass@localhost:5432/mydb

# MySQL
seedforge connect mysql://user:pass@localhost:3306/mydb

# SQLite
seedforge connect sqlite:///path/to/database.db

This saves the connection URL to .seedforge.yaml, so you don't have to type it again.
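
The three URL forms above share one structure, which the Python standard library can take apart. This is a sketch only: `describe_connection` and the `BACKENDS` mapping are hypothetical names, and reading `sqlite:///path` as a relative file path follows the common SQLAlchemy-style convention, which is an assumption about SeedForge.

```python
from urllib.parse import urlparse

# Hypothetical mapping from URL scheme to backend name
BACKENDS = {"postgresql": "PostgreSQL", "mysql": "MySQL/MariaDB", "sqlite": "SQLite"}

def describe_connection(url):
    """Split a database URL into its parts, leaving the password out."""
    parts = urlparse(url)
    if parts.scheme not in BACKENDS:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    info = {"backend": BACKENDS[parts.scheme]}
    if parts.scheme == "sqlite":
        # sqlite:///path/to/database.db has an empty host; the file path is in .path
        info["database"] = parts.path[1:]
    else:
        info.update(
            host=parts.hostname,
            port=parts.port,  # None when the URL has no explicit port
            user=parts.username,
            database=parts.path.lstrip("/"),
        )
    return info

print(describe_connection("postgresql://user:pass@localhost:5432/mydb"))
print(describe_connection("sqlite:///path/to/database.db"))
```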

2. Inspect

seedforge inspect

Shows all tables, columns, types, foreign keys, and insertion order:

Found 18 tables (insertion order):

         1. users
┏━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┓
┃ Column     ┃ Type      ┃ Nullable ┃ FK →  ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━┩
│ id         │ serial    │ NO       │       │
│ email      │ varchar   │ NO       │       │
│ name       │ varchar   │ YES      │       │
└────────────┴───────────┴──────────┴───────┘

             2. orders
┏━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Column     ┃ Type      ┃ Nullable ┃ FK →       ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━┩
│ id         │ serial    │ NO       │            │
│ user_id    │ integer   │ NO       │ users.id   │
│ total      │ numeric   │ NO       │            │
└────────────┴───────────┴──────────┴────────────┘
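
For SQLite, the columns and foreign keys listed above can be read with two PRAGMA statements. The sketch below rebuilds the same two example tables in memory. `read_schema` is a hypothetical helper, not SeedForge's API, and treating primary keys as non-nullable is a simplification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        email VARCHAR NOT NULL,
        name VARCHAR
    );
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        total NUMERIC NOT NULL
    );
""")

def read_schema(conn):
    """Return {table: {"columns": [...], "fks": {column: "parent.column"}}}."""
    schema = {}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    for table in tables:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [
            {"name": name, "type": ctype, "nullable": not notnull and not pk}
            for _cid, name, ctype, notnull, _default, pk
            in conn.execute(f'PRAGMA table_info("{table}")')
        ]
        # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        fks = {row[3]: f"{row[2]}.{row[4]}"
               for row in conn.execute(f'PRAGMA foreign_key_list("{table}")')}
        schema[table] = {"columns": columns, "fks": fks}
    return schema

schema = read_schema(conn)
print(schema["orders"]["fks"])  # {'user_id': 'users.id'}
```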

3. Generate

# Generate and insert 100 rows per table
seedforge generate --rows 100

# Preview without inserting
seedforge generate --rows 10 --dry-run

# Export to SQL file
seedforge generate --rows 1000 --export sql

# Export to JSON
seedforge generate --rows 1000 --export json

# Deterministic (same data every time)
seedforge generate --rows 100 --seed 42

# Only specific tables (auto-includes FK parents)
seedforge generate --tables orders,payments --rows 50

# Clean tables before generating
seedforge generate --rows 100 --clean
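
Auto-including FK parents for --tables amounts to walking the foreign-key graph upward from the requested tables. A minimal sketch of that walk, where `with_fk_parents` and the `PARENTS` mapping are hypothetical and say nothing about how SeedForge implements it:

```python
def with_fk_parents(requested, parents):
    """Return the requested tables plus every table they depend on, directly or transitively."""
    selected, stack = set(), list(requested)
    while stack:
        table = stack.pop()
        if table in selected:
            continue  # already handled; also keeps cyclic schemas from looping forever
        selected.add(table)
        stack.extend(parents.get(table, ()))
    return selected

# Hypothetical schema: child table -> tables its foreign keys point to
PARENTS = {
    "orders": ["users"],
    "payments": ["orders"],
    "reviews": ["users", "products"],
}

print(sorted(with_fk_parents(["orders", "payments"], PARENTS)))
# ['orders', 'payments', 'users']
```

Here `users` is pulled in because `orders` references it, while `reviews` and `products` are left out because nothing requested depends on them.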

4. AI Generate (optional)

For maximum realism, SeedForge can use AI to generate context-aware data. Bring your own API key from any supported provider:

# Auto-detects provider by key prefix
seedforge ai-generate --api-key sk-ant-...   # Anthropic
seedforge ai-generate --api-key sk-...       # OpenAI
seedforge ai-generate --api-key AIza...      # Gemini
seedforge ai-generate --api-key gsk_...      # Groq

# Or set environment variable
export ANTHROPIC_API_KEY=sk-ant-...
export OPENAI_API_KEY=sk-...
export GEMINI_API_KEY=AIza...
export GROQ_API_KEY=gsk_...

seedforge ai-generate --rows 20

| Provider      | Speed     | Cost |
|---------------|-----------|------|
| Anthropic     | Fast      | $$   |
| OpenAI        | Fast      | $$   |
| Google Gemini | Fast      | $    |
| Groq          | Very fast | $    |
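
Detecting the provider from the key prefix could look roughly like the sketch below. The prefixes come from the examples above; `detect_provider` and the lowercase provider labels are hypothetical.

```python
# Order matters: "sk-ant-" also starts with "sk-", so the more specific
# prefix has to be checked first.
KEY_PREFIXES = [
    ("sk-ant-", "anthropic"),
    ("sk-", "openai"),
    ("AIza", "gemini"),
    ("gsk_", "groq"),
]

def detect_provider(api_key):
    """Return the provider label for a key, or None if no prefix matches."""
    for prefix, provider in KEY_PREFIXES:
        if api_key.startswith(prefix):
            return provider
    return None

print(detect_provider("sk-ant-example"))  # anthropic
print(detect_provider("sk-example"))      # openai
```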

How It Works

┌─────────────┐     ┌──────────────────┐     ┌───────────────┐
│  Your DB    │────>│  Schema Reader   │────>│  Dependency   │
│  (PG/MySQL/ │     │  (introspection) │     │  Graph        │
│   SQLite)   │     │                  │     │  (topo sort)  │
└─────────────┘     └──────────────────┘     └──────┬────────┘
                                                     │
                    ┌──────────────────┐              │
                    │  Data Generator  │<─────────────┘
                    │  (heuristics +   │
                    │   optional AI)   │
                    └────────┬─────────┘
                             │
                    ┌────────▼─────────┐
                    │  Batch Inserter  │────> Your DB (filled!)
                    │  (FK-valid data) │
                    └──────────────────┘

  1. Schema introspection: connects to your database and reads information_schema (or PRAGMA for SQLite) to get tables, columns, types, FK relationships, constraints, and ENUMs
  2. Dependency graph: builds a directed graph from FK relationships and runs a topological sort to determine insertion order (parents first, children after)
  3. Smart heuristics: maps column names to appropriate generators using 80+ patterns
  4. FK resolution: child rows automatically reference real IDs from already-generated parent rows
  5. Batch insert: fast bulk insertion with proper transaction handling
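
Steps 2, 4, and 5 can be sketched with the standard library alone. This is an illustration under assumptions, not SeedForge's code: the `FOREIGN_KEYS` mapping and the three-table schema are hypothetical, SQLite stands in for any backend, and `graphlib` requires Python 3.9 or later.

```python
import random
import sqlite3
from graphlib import TopologicalSorter

# Hypothetical schema: table -> {FK column: parent table}
FOREIGN_KEYS = {
    "users": {},
    "orders": {"user_id": "users"},
    "payments": {"order_id": "orders"},
}

# Step 2: TopologicalSorter takes node -> predecessors, so parents come out first.
graph = {table: set(fks.values()) for table, fks in FOREIGN_KEYS.items()}
insertion_order = list(TopologicalSorter(graph).static_order())
print(insertion_order)  # ['users', 'orders', 'payments']

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce the FK constraints
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         user_id INTEGER NOT NULL REFERENCES users(id));
    CREATE TABLE payments (id INTEGER PRIMARY KEY,
                           order_id INTEGER NOT NULL REFERENCES orders(id));
""")

rng = random.Random(42)
generated_ids = {}  # table -> IDs already generated, used to resolve child FKs
rows_per_table = 5

with conn:  # one transaction: commit on success, roll back on error
    for table in insertion_order:
        fks = FOREIGN_KEYS[table]
        rows = []
        for row_id in range(1, rows_per_table + 1):
            # Step 4: every FK value is drawn from IDs that exist in the parent table
            row = [row_id] + [rng.choice(generated_ids[parent]) for parent in fks.values()]
            rows.append(row)
        columns = ", ".join(["id", *fks])
        placeholders = ", ".join("?" * (1 + len(fks)))
        # Step 5: executemany sends the whole batch in one call
        conn.executemany(f"INSERT INTO {table} ({columns}) VALUES ({placeholders})", rows)
        generated_ids[table] = [row[0] for row in rows]

print(conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0])  # 5
```

`TopologicalSorter` raises `CycleError` when tables reference each other in a loop, so a real tool needs a separate strategy for cyclic schemas.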

Column Name Heuristics

SeedForge automatically detects what kind of data to generate based on column names:

| Column name            | Generated data                         |
|------------------------|----------------------------------------|
| email                  | john.smith@example.com                 |
| phone, mobile          | +1-555-0123                            |
| first_name             | John                                   |
| last_name              | Smith                                  |
| username               | jsmith42                               |
| address, street        | 123 Main St, Apt 4                     |
| city                   | San Francisco                          |
| country                | United States                          |
| price, amount, total   | 49.99                                  |
| url, website           | https://example.com                    |
| avatar, image_url      | https://picsum.photos/seed/123/400/300 |
| role                   | admin, user, moderator                 |
| status                 | active, pending, completed             |
| plan                   | free, pro, enterprise                  |
| created_at, updated_at | Recent datetime                        |
| is_active, verified    | true (85% bias)                        |
| is_deleted, archived   | false (90% bias)                       |
| password               | SHA-256 hash                           |
| token, api_key         | Random hex string                      |
| uuid, guid             | Valid UUID v4                          |

...and 60+ more patterns

Context-aware: a name column generates person names in a users table, company names in organizations, and product names in products.
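
One way to picture the heuristics is an ordered list of name patterns, each paired with a generator, plus a table-aware override for ambiguous columns like name. The sketch below covers only a handful of the patterns listed above. The regular expressions, value pools, and `generate_value` are hypothetical rather than SeedForge's actual rules.

```python
import random
import re

rng = random.Random(42)

# (pattern, generator) pairs, checked in order; a small hypothetical subset
PATTERNS = [
    (re.compile(r"email"), lambda: f"user{rng.randint(1, 9999)}@example.com"),
    (re.compile(r"price|amount|total"), lambda: round(rng.uniform(1, 500), 2)),
    (re.compile(r"^role$"), lambda: rng.choice(["admin", "user", "moderator"])),
    (re.compile(r"^is_active$|verified"), lambda: rng.random() < 0.85),   # true 85% of the time
    (re.compile(r"^is_deleted$|archived"), lambda: rng.random() < 0.10),  # false 90% of the time
]

# Context-aware override: the same column name means different things per table
NAME_BY_TABLE = {
    "users": ["John Smith", "Ana Lopez"],
    "organizations": ["Acme Corp", "Globex"],
    "products": ["Wireless Mouse", "Desk Lamp"],
}

def generate_value(table, column):
    """Pick a value for table.column based on the column name, then the table name."""
    if column == "name" and table in NAME_BY_TABLE:
        return rng.choice(NAME_BY_TABLE[table])
    for pattern, make in PATTERNS:
        if pattern.search(column):
            return make()
    return None  # no heuristic matched; a real tool would fall back to the column type

print(generate_value("organizations", "name"))
print(generate_value("orders", "total"))
```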

Configuration

.seedforge.yaml (auto-created by seedforge connect):

db_url: postgresql://user:pass@localhost:5432/mydb
default_rows: 100
default_schema: public
seed: 42  # optional, for deterministic generation
exclude_tables:
  - _prisma_migrations
  - django_migrations

Supported Databases

  • PostgreSQL (psycopg2)
  • MySQL / MariaDB (PyMySQL)
  • SQLite (built-in)

Supported AI Providers

  • Anthropic
  • OpenAI
  • Google Gemini
  • Groq

Data Privacy

Your data never leaves your machine. SeedForge runs entirely locally — it connects directly to your database, generates data in memory, and inserts it. No cloud, no telemetry, no data collection.

When using AI mode, only schema metadata (table and column names) is sent to the AI provider — never your actual data.

License

MIT

Contributing

Issues and PRs welcome at github.com/silkhorizonstudios/seedforge.
