Skip to main content

Declarative SQLite test data generation toolkit

Project description

๐ŸŒฑ sqlseed

English | ไธญๆ–‡

Declarative SQLite Test Data Generation Toolkit

One line of code, tens of thousands of rows. Zero-config smart generation, AI-powered precision tuning.

CI Python 3.10+ License: AGPL v3 Code style: ruff Type: mypy


import sqlseed

# Just one line. Auto-infers schema, auto-selects strategy, auto-optimizes writes.
result = sqlseed.fill("test.db", table="users", count=100_000)
print(result)
# โ†’ GenerationResult(table=users, count=100000, elapsed=2.34s, speed=42735 rows/s)

๐Ÿ’ก Why sqlseed?

In development and testing workflows, we often need to populate SQLite databases with large volumes of realistic test data. Traditional approaches either require writing verbose data generation scripts or maintaining hard-to-scale SQL fixtures. sqlseed solves this with a declarative approach:

Feature sqlseed Hand-written Scripts SQL Fixtures
Zero-config smart generation โœ… โŒ โŒ
Automatic FK maintenance โœ… Manual Manual
100K+ rows โœ… Streaming โš ๏ธ OOM โŒ
Column semantic inference โœ… 9-level strategy โŒ โŒ
Reproducible generation โœ… seed โš ๏ธ Manual โœ…
AI-powered tuning โœ… LLM โŒ โŒ
Config reuse โœ… YAML โŒ โŒ

โœจ Core Features

๐Ÿš€ Zero-Config Smart Generation

Auto-infers database schema and selects the best generator for each column via a 9-level strategy chain. Column named email? Generates email addresses. Column named *_at? Generates timestamps. No configuration needed.

๐ŸŽฏ Declarative Fine-Grained Control

Precisely control each column's data generation strategy, constraints, and null ratio via Python API or YAML/JSON configuration.

๐Ÿ”— Automatic FK Ordering

Topological sort auto-detects table dependencies. SharedPool cross-table value sharing maintains referential integrity with zero configuration.

๐ŸŒŠ Streaming Memory Safety

DataStream yields batches via Iterator[list[dict]]. 1 million rows use the same memory as 1,000 rows.

๐Ÿงฎ Expression Engine & Constraint Solving

Supports derived column computation (last_eight = card_number[-8:]), unique constraint backtracking, and timeout protection against infinite loops.

๐Ÿค– AI First-Class Citizen

sqlseed-ai plugin uses LLM to analyze schema semantics, auto-generates YAML config suggestions with self-correction loop.

๐Ÿงฉ 11 Lifecycle Hooks

pluggy-based plugin architecture covering every stage from provider registration to batch insertion.

๐Ÿ“Š 3-Tier PRAGMA Optimization

Intelligently switches between LIGHT / MODERATE / AGGRESSIVE write strategies based on data volume for maximum throughput.


๐Ÿ“ฆ Installation

Basic

pip install sqlseed

Choose Data Engine

# Recommended: Mimesis (high performance, great locale support)
pip install sqlseed[mimesis]

# Optional: Faker (rich ecosystem)
pip install sqlseed[faker]

# Install all
pip install sqlseed[all]

Optional Plugins

# AI analysis plugin (requires openai SDK)
pip install sqlseed-ai

# MCP server (requires mcp SDK, lets AI assistants operate sqlseed)
pip install mcp-server-sqlseed

# MCP server + AI support (all-in-one)
pip install mcp-server-sqlseed[ai]

Docs Build (Developers)

pip install sqlseed[docs]   # mkdocs-material + mkdocstrings
๐Ÿ“‹ Full Dev Environment Setup
git clone https://github.com/sunbos/sqlseed.git
cd sqlseed

# Install core + all providers + dev dependencies
pip install -e ".[dev,all]"

# Optional plugins
pip install -e "./plugins/sqlseed-ai"
pip install -e "./plugins/mcp-server-sqlseed"

# Verify installation
pytest
ruff check src/ tests/
mypy src/sqlseed/

๐Ÿš€ Quick Start

Get Started in 30 Seconds

Suppose you have a SQLite database app.db with a users table:

CREATE TABLE users (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL,
    email TEXT,
    age INTEGER,
    phone TEXT,
    created_at TEXT,
    is_active INTEGER DEFAULT 1,
    balance REAL
);

One line of code fills 10,000 rows of high-quality test data:

import sqlseed

result = sqlseed.fill("app.db", table="users", count=10_000)
print(result)
# โ†’ GenerationResult(table=users, count=10000, elapsed=0.52s, speed=19230 rows/s)

sqlseed automatically:

  • โœ… Skips id (autoincrement PK)
  • โœ… Skips is_active (has default value)
  • โœ… name โ†’ generates real names
  • โœ… email โ†’ generates email addresses
  • โœ… age โ†’ generates integers 18โ€“100
  • โœ… phone โ†’ generates phone numbers
  • โœ… created_at โ†’ generates datetime (matches *_at pattern)
  • โœ… balance โ†’ generates floats

Fully zero-config. Smart inference for everything.


๐Ÿ“– Tutorials

Tutorial 1: Python API โ€” Fine-Grained Control

For precise control over each column, declare generation strategies via the columns parameter:

import sqlseed

result = sqlseed.fill(
    "app.db",
    table="users",
    count=50_000,
    columns={
        # Shorthand: specify generator name directly
        "email": "email",
        "phone": "phone",

        # Full config: specify parameters
        "age": {"type": "integer", "min_value": 18, "max_value": 65},
        "balance": {"type": "float", "min_value": 0.0, "max_value": 100000.0, "precision": 2},
        "name": "name",

        # Random selection from candidate list
        "status": {"type": "choice", "choices": ["active", "inactive", "banned"]},
    },
    provider="mimesis",      # Use Mimesis engine
    locale="en_US",          # English locale
    seed=42,                 # Fixed seed for reproducibility
    clear_before=True,       # Clear table before generation
    enrich=True,             # Infer distribution from existing data
    transform="./transform_users.py",  # Custom transform per row
)
print(result)

Supported Generator Types

Generator Description Example Parameters
string Random string min_length, max_length, charset
integer Integer min_value, max_value
float Float min_value, max_value, precision
boolean Boolean โ€”
name Full name โ€”
first_name First name โ€”
last_name Last name โ€”
email Email address โ€”
phone Phone number โ€”
address Address โ€”
company Company name โ€”
url URL โ€”
ipv4 IPv4 address โ€”
uuid UUID โ€”
date Date start_year, end_year
datetime Datetime start_year, end_year
timestamp Unix timestamp โ€”
text Long text min_length, max_length
sentence Sentence โ€”
password Password length
choice Pick from list choices
json JSON string schema
pattern Regex match regex
bytes Binary data length
username Username โ€”
city City โ€”
country Country โ€”
state State/Province โ€”
zip_code Zip/Postal code โ€”
job_title Job title โ€”
country_code Country code โ€”
foreign_key FK reference ref_table, ref_column, strategy
skip Skip (use default/NULL) โ€”

Tutorial 2: Multi-Table Associations โ€” Automatic FK Integrity

Use the context manager pattern to handle cross-table data dependencies:

import sqlseed

with sqlseed.connect("app.db", provider="mimesis", locale="en_US") as db:
    # Step 1: Fill parent table first
    db.fill("users", count=10_000, seed=42)

    # Step 2: Fill child table โ€” sqlseed auto-detects FK constraints
    #         and picks random values from users.id for orders.user_id
    db.fill("orders", count=50_000, columns={
        "amount": {"type": "float", "min_value": 9.99, "max_value": 999.99, "precision": 2},
        "quantity": {"type": "integer", "min_value": 1, "max_value": 20},
        "status": {"type": "choice", "choices": ["pending", "paid", "shipped", "delivered"]},
    })

    # Step 3: View generation report
    print(db.report())
    # โ†’ Database: app.db
    # โ†’ ==================================================
    # โ†’   users: 10000 rows
    # โ†’   orders: 50000 rows

๐Ÿ’ก Tip: If two tables share a column name (e.g., account_id), even without a declared FK constraint, sqlseed automatically maintains cross-table consistency via the SharedPool implicit association mechanism.

Explicit Cross-Table Associations (ColumnAssociation)

When the target column name differs from the source (e.g., department_id โ†’ id), or there's no FK constraint but you need an association, declare it explicitly via associations:

db_path: "app.db"
provider: mimesis

tables:
  - name: departments
    count: 5
    clear_before: true
  - name: employees
    count: 20
    clear_before: true

associations:
  - column_name: department_id     # Column name in the target table
    source_table: departments      # Source table providing values
    source_column: id              # Column name in source table (defaults to column_name)
    target_tables:                 # Target tables using this association
      - employees
    strategy: shared_pool          # Association strategy

This way, even without FOREIGN KEY (department_id) REFERENCES departments(id), department_id values will come from departments.id.


Tutorial 3: YAML Config-Driven Batch Generation

For complex multi-table scenarios, use YAML configuration:

1. Generate config template

sqlseed init generate.yaml --db app.db

2. Edit config file

# generate.yaml
db_path: "app.db"
provider: mimesis
locale: en_US
optimize_pragma: true

tables:
  - name: users
    count: 100000
    clear_before: true
    seed: 42
    columns:
      - name: username
        generator: name
      - name: email
        generator: email
      - name: phone
        generator: phone
      - name: age
        generator: integer
        params:
          min_value: 18
          max_value: 65
      - name: status
        generator: choice
        params:
          choices: [0, 1, 2]
        null_ratio: 0.05       # 5% chance of NULL

  - name: orders
    count: 500000
    batch_size: 10000          # 10K rows per batch, optimizes memory
    columns:
      - name: user_id
        generator: foreign_key
        params:
          ref_table: users
          ref_column: id
          strategy: random
      - name: amount
        generator: float
        params:
          min_value: 1.0
          max_value: 9999.99
          precision: 2
      - name: created_at
        generator: datetime
        params:
          start_year: 2024

3. Execute generation

sqlseed fill --config generate.yaml

Or in Python:

results = sqlseed.fill_from_config("generate.yaml")
for r in results:
    print(r)

Tutorial 4: Derived Columns & Expression Engine

sqlseed v2.0 introduces column dependency DAG and expression engine for computing derived columns:

# Bank card info table scenario
tables:
  - name: bank_cards
    count: 10000
    columns:
      - name: card_number
        generator: pattern
        params:
          regex: "62[0-9]{17}"     # 19-digit UnionPay card number
        constraints:
          unique: true

      - name: last_eight
        derive_from: card_number       # Depends on card_number
        expression: "value[-8:]"   # Last 8 digits
        constraints:
          unique: true

      - name: last_six
        derive_from: card_number
        expression: "value[-6:]"   # Last 6 digits

      - name: account_id
        generator: pattern
        params:
          regex: "U[0-9]{10}"
        constraints:
          unique: true

How it works:

  1. sqlseed builds a column dependency DAG: card_number โ†’ last_eight, last_six
  2. Topological sort determines generation order
  3. Generates card_number first, then computes last_eight via value[-8:]
  4. If last_eight unique constraint fails, backtracks to regenerate card_number

Expression Engine Functions (21 total)

Function Usage Description
len(s) len(value) Length
int(s) int(value) To integer
str(s) str(value) To string
float(s) float(value) To float
hex(n) hex(value) To hexadecimal
oct(n) oct(value) To octal
bin(n) bin(value) To binary
abs(n) abs(value) Absolute value
min(*args) min(a, b) Minimum
max(*args) max(a, b) Maximum
upper(s) upper(value) Uppercase
lower(s) lower(value) Lowercase
strip(s) strip(value) Trim both ends
lstrip(s) lstrip(value) Trim left
rstrip(s) rstrip(value) Trim right
zfill(s, width) zfill(value, 10) Zero-fill
replace(s, old, new) replace(value, "-", "") Replace
substr(s, start, end) substr(value, 0, 8) Substring
lpad(s, width, char) lpad(value, 8, "0") Left-pad
rpad(s, width, char) rpad(value, 8, "0") Right-pad
concat(*args) concat("PRE_", value) Concatenate
Slicing value[-8:] Python slice syntax
Math value * 2 + 1 Basic arithmetic

โš ๏ธ Safety: The expression engine is based on simpleeval with 5-second timeout protection. import, exec, and file I/O are not allowed.


Tutorial 5: Transform Scripts โ€” Complex Business Logic

For complex business logic that can't be expressed declaratively, write Python transform scripts:

1. Write transform script

# transform_users.py
def transform_row(row, ctx):
    """Called for every generated row."""

    # Calculate VIP level based on age
    age = row.get("age", 0)
    if age >= 60:
        row["vip_level"] = 3
    elif age >= 40:
        row["vip_level"] = 2
    else:
        row["vip_level"] = 1

    # Normalize phone format
    phone = row.get("phone", "")
    if phone and not phone.startswith("+1"):
        row["phone"] = f"+1{phone}"

    return row

2. Use in CLI

sqlseed fill app.db --table users --count 10000 --transform transform_users.py

3. Use in YAML

tables:
  - name: users
    count: 10000
    transform: "./transform_users.py"

Tutorial 6: Preview & Debug

Preview data before generating at scale:

Python API:

rows = sqlseed.preview("app.db", table="users", count=5, seed=42)
# Also supports enrich and transform parameters
rows = sqlseed.preview("app.db", table="users", count=5, seed=42, enrich=True)
for row in rows:
    print(row)
# โ†’ {'name': 'John Smith', 'email': 'jsmith@example.com', 'age': 32, ...}
# โ†’ {'name': 'Jane Doe', 'email': 'jdoe@test.org', 'age': 28, ...}
# โ†’ ...

CLI (Rich table output):

sqlseed preview app.db --table users --count 5

# โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”“
# โ”ƒ name       โ”ƒ email                โ”ƒ age โ”ƒ created_at          โ”ƒ
# โ”กโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ฉ
# โ”‚ John Smith โ”‚ jsmith@example.com   โ”‚ 32  โ”‚ 2024-03-15 08:23:11 โ”‚
# โ”‚ ...        โ”‚ ...                  โ”‚ ... โ”‚ ...                 โ”‚
# โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

View column mapping strategy:

sqlseed inspect app.db --table users --show-mapping

# See what generation strategy sqlseed chose for each column
# โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”“
# โ”ƒ Column     โ”ƒ Type    โ”ƒ Nullable โ”ƒ Generator    โ”ƒ Params       โ”ƒ
# โ”กโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ฉ
# โ”‚ id         โ”‚ INTEGER โ”‚ โœ—        โ”‚ skip         โ”‚ {}           โ”‚
# โ”‚ name       โ”‚ TEXT    โ”‚ โœ—        โ”‚ name         โ”‚ {}           โ”‚
# โ”‚ email      โ”‚ TEXT    โ”‚ โœ“        โ”‚ email        โ”‚ {}           โ”‚
# โ”‚ age        โ”‚ INTEGER โ”‚ โœ“        โ”‚ integer      โ”‚ {min: 18...} โ”‚
# โ”‚ ...        โ”‚ ...     โ”‚ ...      โ”‚ ...          โ”‚ ...          โ”‚
# โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Tutorial 7: Snapshots & Replay

Save a successful generation config for exact replay later:

# Generate and save snapshot
sqlseed fill app.db --table users --count 10000 --seed 42 --snapshot
# โ†’ Snapshot saved: snapshots/2026-04-15_033000_users.yaml

# Replay anytime
sqlseed replay snapshots/2026-04-15_033000_users.yaml
# โ†’ GenerationResult(table=users, count=10000, elapsed=0.52s, speed=19230 rows/s)

Use cases:

  • ๐Ÿงช Reproducible test data in CI/CD
  • ๐Ÿ“‹ Consistent test environments across teams
  • ๐Ÿ”„ Quick database state reconstruction during development

Tutorial 8: AI-Powered Configuration (sqlseed-ai Plugin)

Let LLM analyze your database schema and auto-generate optimal config suggestions:

# Install AI plugin
pip install sqlseed-ai

# Set API key
export SQLSEED_AI_API_KEY="your-api-key"
export SQLSEED_AI_BASE_URL="https://your-llm-api-endpoint"

# AI analysis and config generation
sqlseed ai-suggest app.db --table bank_cards --output bank_cards.yaml

# AI suggestions with self-correction (3 rounds by default)
sqlseed ai-suggest app.db --table bank_cards --output bank_cards.yaml --verify

# Specify model (defaults to most popular free model)
sqlseed ai-suggest app.db --table bank_cards --output bank_cards.yaml --model nvidia/nemotron-3-super-120b-a12b:free

# Skip cache
sqlseed ai-suggest app.db --table bank_cards --output bank_cards.yaml --no-cache

AI Workflow:

1. Extract schema context (columns, indexes, sample data, FK, distribution)
2. Build LLM prompt with few-shot examples
3. LLM returns JSON column config suggestions
4. AiConfigRefiner auto-validates config correctness
5. If errors found (unknown generator, type mismatch, etc.), sends correction request to LLM
6. Up to 3 self-correction rounds, outputs validated YAML config

๐Ÿ’ก Environment Variables: Supports SQLSEED_AI_API_KEY, SQLSEED_AI_BASE_URL, SQLSEED_AI_MODEL. Also supports OPENAI_API_KEY / OPENAI_BASE_URL as fallback. Defaults to auto-selecting the most popular free model from OpenRouter (base_url https://openrouter.ai/api/v1). Set --model or SQLSEED_AI_MODEL to specify a model.


Tutorial 9: MCP Server Integration

Let AI assistants (Claude, Cursor, etc.) operate sqlseed directly via Model Context Protocol:

# Install MCP server
pip install mcp-server-sqlseed

# All-in-one: MCP server + AI support
pip install mcp-server-sqlseed[ai]

# Manual start (usually managed by MCP client)
python -m mcp_server_sqlseed

Configure MCP client (Claude Desktop example):

{
  "mcpServers": {
    "sqlseed": {
      "command": "mcp-server-sqlseed"
    }
  }
}

MCP Capabilities:

Type Name Description
๐Ÿ“– Resource sqlseed://schema/{db_path}/{table_name} Get table schema as JSON
๐Ÿ” Tool sqlseed_inspect_schema Inspect schema (columns, FK, indexes, samples, schema_hash)
๐Ÿค– Tool sqlseed_generate_yaml AI-driven YAML config generation with self-correction. Supports api_key/base_url/model overrides
โšก Tool sqlseed_execute_fill Execute data generation (supports YAML config string, includes enrich option)

This means you can tell your AI assistant:

"Analyze the structure of the bank_cards table in app.db, generate a YAML config, then fill 5000 rows."

The AI assistant will call sqlseed_inspect_schema โ†’ sqlseed_generate_yaml โ†’ sqlseed_execute_fill in sequence, without you writing any code.


Tutorial 10: Custom Provider Plugin

You can create your own data generation provider:

# my_provider.py
from __future__ import annotations
from typing import Any

from sqlseed.generators import UnknownGeneratorError

class MyCustomProvider:
    """Just implement the DataProvider Protocol. No base class required."""

    def __init__(self) -> None:
        self._locale: str = "en_US"

    @property
    def name(self) -> str:
        return "my_custom"

    def set_locale(self, locale: str) -> None:
        self._locale = locale

    def set_seed(self, seed: int) -> None:
        ...

    def generate(self, type_name: str, **params: Any) -> Any:
        if type_name == "string":
            return "custom_string"
        if type_name == "email":
            return "user@example.com"
        raise UnknownGeneratorError(type_name)

    # ... handle generator names you want to support
    # Full Protocol: src/sqlseed/generators/_protocol.py

To reuse the built-in generator name dispatch logic instead of hand-writing generate() routing, inherit BaseProvider and override selectively.

Registration method 1: via pyproject.toml entry-point (recommended)

[project.entry-points."sqlseed"]
my_custom = "my_provider:MyCustomProvider"

Registration method 2: via plugin hook

from sqlseed.plugins.hookspecs import hookimpl

class MyPlugin:
    @hookimpl
    def sqlseed_register_providers(self, registry):
        from my_provider import MyCustomProvider
        registry.register(MyCustomProvider())

๐Ÿ–ฅ๏ธ CLI Quick Reference

# โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
# ๐Ÿ“‹ Data Generation
# โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•

# Fill data (--count required when not using --config)
sqlseed fill app.db --table users --count 10000

# Full parameters
sqlseed fill app.db -t users -n 100000 \
    --provider mimesis \
    --locale en_US \
    --seed 42 \
    --batch-size 10000 \
    --clear \
    --enrich \
    --snapshot

# YAML config-driven (count from config file)
sqlseed fill --config generate.yaml

# Transform script
sqlseed fill app.db -t users -n 10000 --transform transform.py

# Enable debug logging
SQLSEED_LOG_LEVEL=DEBUG sqlseed fill app.db -t users -n 10

# โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
# ๐Ÿ” Inspect & Preview
# โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•

# Preview data (no write)
sqlseed preview app.db --table users --count 5

# List all tables
sqlseed inspect app.db

# View column mapping strategy
sqlseed inspect app.db --table users --show-mapping

# โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
# ๐Ÿ“ธ Snapshots & Replay
# โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•

# Generate config template
sqlseed init generate.yaml --db app.db

# Replay snapshot
sqlseed replay snapshots/2026-04-15_users.yaml

# โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
# ๐Ÿค– AI Features
# โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•

# AI suggestions (requires sqlseed-ai)
sqlseed ai-suggest app.db -t users -o users.yaml
sqlseed ai-suggest app.db -t users -o users.yaml --verify

# Specify API config
sqlseed ai-suggest app.db -t users -o users.yaml --api-key sk-xxx --base-url https://api.openai.com/v1

# Control self-correction
sqlseed ai-suggest app.db -t users -o users.yaml --max-retries 0   # Disable
sqlseed ai-suggest app.db -t users -o users.yaml --no-verify       # Skip verification

# Skip cache
sqlseed ai-suggest app.db -t users -o users.yaml --no-cache

๐Ÿง  9-Level Smart Column Mapping

One of sqlseed's core highlights is the ColumnMapper's 9-level strategy chain. Each column is matched by priority:

Level 1 โ”‚ Autoincrement PK    PK + AUTOINCREMENT / INTEGER โ†’ skip
        โ–ผ
Level 2 โ”‚ User config         columns={"email": "email"} highest priority
        โ–ผ
Level 3 โ”‚ Custom exact match  Rules registered via plugin hooks
        โ–ผ
Level 4 โ”‚ Built-in exact      74 rules: emailโ†’email, phoneโ†’phone, ageโ†’integer...
        โ–ผ
Level 5 โ”‚ DEFAULT check       Has default โ†’ skip / __enrich__ (when enrich=True)
        โ–ผ
Level 6 โ”‚ Custom pattern      Regex rules registered via plugin hooks
        โ–ผ
Level 7 โ”‚ Built-in pattern    25 regexes: *_atโ†’datetime, *_idโ†’foreign_key, is_*โ†’boolean...
        โ–ผ
Level 8 โ”‚ NULLABLE fallback   Nullable โ†’ skip / __enrich__
        โ–ผ
Level 9 โ”‚ Type-faithful       VARCHAR(32)โ†’max 32 chars, INT8โ†’0~255, BLOB(1024)โ†’1024 bytes

What this means:

  • Column user_email โ†’ Level 7 pattern *_email โ†’ email generator โœ…
  • Column is_verified โ†’ Level 7 pattern is_* โ†’ boolean generator โœ…
  • Column type VARCHAR(20) โ†’ Level 9 type fallback โ†’ max 20-char string โœ…
  • Column with DEFAULT 1 โ†’ Level 5 โ†’ skip generation โœ…
  • Column gender with DEFAULT 'male' โ†’ Level 4 exact match โ†’ choice generator (exact match takes priority over DEFAULT) โœ…

๐Ÿงฉ Plugin System

sqlseed provides 11 hook points via pluggy, covering the full data generation lifecycle:

Hook firstresult Trigger
sqlseed_register_providers
Register custom data providers
sqlseed_register_column_mappers
Register custom column mapping rules
sqlseed_ai_analyze_table โœ“ AI analyzes table schema (returns column config)
sqlseed_pre_generate_templates โœ“ AI pre-computes candidate value pools
sqlseed_before_generate
Before data generation loop
sqlseed_after_generate
After data generation completes
sqlseed_transform_row
Per-row transform (hot path, mind performance)
sqlseed_transform_batch
Per-batch transform (supports chaining)
sqlseed_before_insert
Before each batch write to DB
sqlseed_after_insert
After each batch write to DB
sqlseed_shared_pool_loaded
After SharedPool registration (pool readable)

๐Ÿ—๏ธ Project Architecture

src/sqlseed/
โ”œโ”€โ”€ __init__.py              # Public API (fill, connect, fill_from_config, preview)
โ”œโ”€โ”€ core/                    # ===== Core Orchestration =====
โ”‚   โ”œโ”€โ”€ orchestrator.py      # DataOrchestrator main engine
โ”‚   โ”œโ”€โ”€ mapper.py            # ColumnMapper 9-level strategy chain
โ”‚   โ”œโ”€โ”€ schema.py            # SchemaInferrer โ€” columns, indexes, distribution
โ”‚   โ”œโ”€โ”€ relation.py          # RelationResolver + SharedPool โ€” FK & cross-table sharing
โ”‚   โ”œโ”€โ”€ column_dag.py        # ColumnDAG โ€” column dependency graph + topological sort
โ”‚   โ”œโ”€โ”€ expression.py        # ExpressionEngine โ€” safe expressions (simpleeval + timeout)
โ”‚   โ”œโ”€โ”€ constraints.py       # ConstraintSolver โ€” unique backtracking
โ”‚   โ”œโ”€โ”€ transform.py         # TransformLoader โ€” dynamic user script loading
โ”‚   โ””โ”€โ”€ result.py            # GenerationResult dataclass
โ”œโ”€โ”€ generators/              # ===== Generator Layer =====
โ”‚   โ”œโ”€โ”€ _protocol.py         # DataProvider Protocol + UnknownGeneratorError
โ”‚   โ”œโ”€โ”€ registry.py          # ProviderRegistry (entry-point auto-discovery)
โ”‚   โ”œโ”€โ”€ base_provider.py     # Built-in base generators (zero dependencies)
โ”‚   โ”œโ”€โ”€ faker_provider.py    # Faker adapter
โ”‚   โ”œโ”€โ”€ mimesis_provider.py  # Mimesis adapter
โ”‚   โ””โ”€โ”€ stream.py            # DataStream streaming + constraint backtracking
โ”œโ”€โ”€ database/                # ===== Database Layer =====
โ”‚   โ”œโ”€โ”€ _protocol.py         # DatabaseAdapter Protocol (ColumnInfo, ForeignKeyInfo, IndexInfo)
โ”‚   โ”œโ”€โ”€ sqlite_utils_adapter.py   # Default adapter
โ”‚   โ”œโ”€โ”€ raw_sqlite_adapter.py     # sqlite3 fallback adapter
โ”‚   โ””โ”€โ”€ optimizer.py         # PragmaOptimizer 3-tier optimization
โ”œโ”€โ”€ plugins/                 # ===== Plugin Layer =====
โ”‚   โ”œโ”€โ”€ hookspecs.py         # 11 pluggy hook definitions
โ”‚   โ””โ”€โ”€ manager.py           # PluginManager
โ”œโ”€โ”€ config/                  # ===== Config Management =====
โ”‚   โ”œโ”€โ”€ models.py            # Pydantic models (GeneratorConfig/TableConfig/ColumnConfig)
โ”‚   โ”œโ”€โ”€ loader.py            # YAML/JSON load & save
โ”‚   โ””โ”€โ”€ snapshot.py          # Snapshot save & replay
โ”œโ”€โ”€ cli/                     # ===== CLI =====
โ”‚   โ””โ”€โ”€ main.py              # click commands (fill/preview/inspect/init/replay/ai-suggest)
โ””โ”€โ”€ _utils/                  # ===== Internal Utilities =====
    โ”œโ”€โ”€ sql_safe.py          # quote_identifier โ€” SQL injection protection
    โ”œโ”€โ”€ schema_helpers.py    # AUTOINCREMENT detection
    โ”œโ”€โ”€ metrics.py           # MetricsCollector performance metrics
    โ”œโ”€โ”€ progress.py          # Rich progress bar
    โ””โ”€โ”€ logger.py            # structlog logging

plugins/
โ”œโ”€โ”€ sqlseed-ai/              # AI plugin โ€” LLM-driven smart configuration
โ”‚   โ””โ”€โ”€ src/sqlseed_ai/      # SchemaAnalyzer, AiConfigRefiner, few-shot examples...
โ””โ”€โ”€ mcp-server-sqlseed/      # MCP server โ€” AI assistant integration
    โ””โ”€โ”€ src/mcp_server_sqlseed/   # FastMCP tools (sqlseed_inspect_schema/sqlseed_generate_yaml/sqlseed_execute_fill)

๐Ÿ› ๏ธ Development

# Run tests (with coverage)
pytest

# Lint
ruff check src/ tests/

# Auto-fix
ruff check --fix src/ tests/

# Type check
mypy src/sqlseed/

Tests cover all core modules, with path structure mirroring src/: test_core/, test_database/, test_generators/, test_plugins/, test_config/, test_utils/.

Dependencies

Package Core Dependencies Description
sqlseed sqlite-utils, pydantic, pluggy, structlog, pyyaml, click, rich, typing_extensions, simpleeval, rstr rstr used for pattern generator regex matching
sqlseed[faker] + faker>=30.0 Faker data engine
sqlseed[mimesis] + mimesis>=18.0 Mimesis data engine (recommended)
sqlseed[docs] + mkdocs-material, mkdocstrings Documentation build
sqlseed-ai sqlseed, openai>=1.0 AI plugin, auto-registered via entry-point
mcp-server-sqlseed sqlseed, mcp>=1.0 MCP server, standalone CLI tool
mcp-server-sqlseed[ai] + sqlseed-ai MCP server with AI support

๐Ÿ“„ License

AGPL-3.0-or-later


๐ŸŒฑ sqlseed โ€” Stop writing fixtures. Start generating data.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sqlseed-0.1.13.tar.gz (77.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sqlseed-0.1.13-py3-none-any.whl (73.2 kB view details)

Uploaded Python 3

File details

Details for the file sqlseed-0.1.13.tar.gz.

File metadata

  • Download URL: sqlseed-0.1.13.tar.gz
  • Upload date:
  • Size: 77.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for sqlseed-0.1.13.tar.gz
Algorithm Hash digest
SHA256 5725624c09c3ef0636f1fc814b4912054ea20e0b099fcf9fcdd284c0f12fd9d6
MD5 a493454a941fb0d9f5ead9cebb8d0e84
BLAKE2b-256 79a360f0cb0ac83b4eda45ef563f70ec72cf46dd525173ac9a037bd4b21e5060

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlseed-0.1.13.tar.gz:

Publisher: publish.yml on sunbos/sqlseed

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sqlseed-0.1.13-py3-none-any.whl.

File metadata

  • Download URL: sqlseed-0.1.13-py3-none-any.whl
  • Upload date:
  • Size: 73.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for sqlseed-0.1.13-py3-none-any.whl
Algorithm Hash digest
SHA256 86bb819cb374ad2fa96b0571323017ad1a57998a2e40767e540aef3e4d71678a
MD5 f5ed90632bfdca6a7fd593ca37c53379
BLAKE2b-256 860260c2a902cdafc7e8bc63c55f196b280bee29537b4938be1fb42c7cbc84c1

See more details on using hashes here.

Provenance

The following attestation bundles were made for sqlseed-0.1.13-py3-none-any.whl:

Publisher: publish.yml on sunbos/sqlseed

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page