🌱 sqlseed
Declarative SQLite Test Data Generation Toolkit
One line of code, tens of thousands of rows. Zero-config smart generation, AI-powered precision tuning.
import sqlseed
# Just one line. Auto-infers schema, auto-selects strategy, auto-optimizes writes.
result = sqlseed.fill("test.db", table="users", count=100_000)
print(result)
# → GenerationResult(table=users, count=100000, elapsed=2.34s, speed=42735 rows/s)
💡 Why sqlseed?
In development and testing workflows, we often need to populate SQLite databases with large volumes of realistic test data. Traditional approaches either require writing verbose data generation scripts or maintaining hard-to-scale SQL fixtures. sqlseed solves this with a declarative approach:
| Feature | sqlseed | Hand-written Scripts | SQL Fixtures |
|---|---|---|---|
| Zero-config smart generation | ✅ | ❌ | ❌ |
| Automatic FK maintenance | ✅ | Manual | Manual |
| 100K+ rows | ✅ Streaming | ⚠️ OOM | ❌ |
| Column semantic inference | ✅ 9-level strategy | ❌ | ❌ |
| Reproducible generation | ✅ seed | ⚠️ Manual | ✅ |
| AI-powered tuning | ✅ LLM | ❌ | ❌ |
| Config reuse | ✅ YAML | ❌ | ❌ |
✨ Core Features
- Zero-Config Smart Generation: Auto-infers the database schema and selects the best generator for each column via a 9-level strategy chain.
- Declarative Fine-Grained Control: Precisely control each column's data generation strategy, constraints, and null ratio via the Python API or YAML/JSON configuration.
- Automatic FK Ordering: A topological sort auto-detects table dependencies. SharedPool cross-table value sharing maintains referential integrity with zero configuration.
- Streaming Memory Safety
- Expression Engine & Constraint Solving: Supports derived column computation.
- AI First-Class Citizen
- 11 Lifecycle Hooks: pluggy-based plugin architecture covering every stage from provider registration to batch insertion.
- 3-Tier PRAGMA Optimization: Switches between LIGHT / MODERATE / AGGRESSIVE write strategies based on data volume for maximum throughput.
📦 Installation
Basic
pip install sqlseed
Choose Data Engine
# Recommended: Mimesis (high performance, great locale support)
pip install sqlseed[mimesis]
# Optional: Faker (rich ecosystem)
pip install sqlseed[faker]
# Install all
pip install sqlseed[all]
Optional Plugins
# AI analysis plugin (requires openai SDK)
pip install sqlseed-ai
# MCP server (requires mcp SDK, lets AI assistants operate sqlseed)
pip install mcp-server-sqlseed
# MCP server + AI support (all-in-one)
pip install mcp-server-sqlseed[ai]
Docs Build (Developers)
pip install sqlseed[docs] # mkdocs-material + mkdocstrings
Full Dev Environment Setup
git clone https://github.com/sunbos/sqlseed.git
cd sqlseed
# Install core + all providers + dev dependencies
pip install -e ".[dev,all]"
# Optional plugins
pip install -e "./plugins/sqlseed-ai"
pip install -e "./plugins/mcp-server-sqlseed"
# Verify installation
pytest
ruff check src/ tests/
mypy src/sqlseed/
Quick Start
Get Started in 30 Seconds
Suppose you have a SQLite database app.db with a users table:
CREATE TABLE users (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL,
email TEXT,
age INTEGER,
phone TEXT,
created_at TEXT,
is_active INTEGER DEFAULT 1,
balance REAL
);
One line of code fills 10,000 rows of high-quality test data:
import sqlseed
result = sqlseed.fill("app.db", table="users", count=10_000)
print(result)
# → GenerationResult(table=users, count=10000, elapsed=0.52s, speed=19230 rows/s)
sqlseed automatically:
- ✅ Skips id (autoincrement PK)
- ✅ Skips is_active (has a default value)
- ✅ name → generates real names
- ✅ email → generates email addresses
- ✅ age → generates integers from 18 to 100
- ✅ phone → generates phone numbers
- ✅ created_at → generates datetimes (matches the *_at pattern)
- ✅ balance → generates floats
Fully zero-config. Smart inference for everything.
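To get a feel for what there is to infer from, the schema above can be inspected with the standard-library sqlite3 module. The sketch below only shows the kind of metadata SQLite exposes (type, NOT NULL flag, default, primary-key flag); it is not sqlseed's own inference code, and the helper name `describe_columns` is made up for this illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL,
    email TEXT,
    age INTEGER,
    phone TEXT,
    created_at TEXT,
    is_active INTEGER DEFAULT 1,
    balance REAL
);
"""


def describe_columns(conn: sqlite3.Connection, table: str) -> list[dict]:
    """Return one dict per column via PRAGMA table_info (hypothetical helper)."""
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return [
        {"name": r[1], "type": r[2], "notnull": bool(r[3]), "default": r[4], "pk": bool(r[5])}
        for r in rows
    ]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
columns = describe_columns(conn, "users")

# Columns a generator could leave to SQLite: the primary key and anything with a default.
skippable = [c["name"] for c in columns if c["pk"] or c["default"] is not None]
print(skippable)
```

The two names it prints are the same two columns the list above says are skipped.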
Tutorials
Tutorial 1: Python API - Fine-Grained Control
For precise control over each column, declare generation strategies via the columns parameter:
import sqlseed
result = sqlseed.fill(
    "app.db",
    table="users",
    count=50_000,
    columns={
        # Shorthand: specify generator name directly
        "email": "email",
        "phone": "phone",
        # Full config: specify parameters
        "age": {"type": "integer", "min_value": 18, "max_value": 65},
        "balance": {"type": "float", "min_value": 0.0, "max_value": 100000.0, "precision": 2},
        "name": "name",
        # Random selection from candidate list
        "status": {"type": "choice", "choices": ["active", "inactive", "banned"]},
    },
    provider="mimesis",                # Use Mimesis engine
    locale="en_US",                    # English locale
    seed=42,                           # Fixed seed for reproducibility
    clear_before=True,                 # Clear table before generation
    enrich=True,                       # Infer distribution from existing data
    transform="./transform_users.py",  # Custom transform per row
)
print(result)
Supported Generator Types
| Generator | Description | Example Parameters |
|---|---|---|
| string | Random string | min_length, max_length, charset |
| integer | Integer | min_value, max_value |
| float | Float | min_value, max_value, precision |
| boolean | Boolean | - |
| name | Full name | - |
| first_name | First name | - |
| last_name | Last name | - |
| email | Email address | - |
| phone | Phone number | - |
| address | Address | - |
| company | Company name | - |
| url | URL | - |
| ipv4 | IPv4 address | - |
| uuid | UUID | - |
| date | Date | start_year, end_year |
| datetime | Datetime | start_year, end_year |
| timestamp | Unix timestamp | - |
| text | Long text | min_length, max_length |
| sentence | Sentence | - |
| password | Password | length |
| choice | Pick from list | choices |
| json | JSON string | schema |
| pattern | Regex match | regex |
| bytes | Binary data | length |
| username | Username | - |
| city | City | - |
| country | Country | - |
| state | State/Province | - |
| zip_code | Zip/Postal code | - |
| job_title | Job title | - |
| country_code | Country code | - |
| foreign_key | FK reference | ref_table, ref_column, strategy |
| skip | Skip (use default/NULL) | - |
Tutorial 2: Multi-Table Associations - Automatic FK Integrity
Use the context manager pattern to handle cross-table data dependencies:
import sqlseed
with sqlseed.connect("app.db", provider="mimesis", locale="en_US") as db:
    # Step 1: Fill parent table first
    db.fill("users", count=10_000, seed=42)
    # Step 2: Fill child table; sqlseed auto-detects FK constraints
    # and picks random values from users.id for orders.user_id
    db.fill("orders", count=50_000, columns={
        "amount": {"type": "float", "min_value": 9.99, "max_value": 999.99, "precision": 2},
        "quantity": {"type": "integer", "min_value": 1, "max_value": 20},
        "status": {"type": "choice", "choices": ["pending", "paid", "shipped", "delivered"]},
    })
    # Step 3: View generation report
    print(db.report())
    # → Database: app.db
    # → ==================================================
    # → users: 10000 rows
    # → orders: 50000 rows
💡 Tip: If two tables share a column name (e.g., account_id), sqlseed automatically maintains cross-table consistency via the SharedPool implicit association mechanism, even without a declared FK constraint.
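The "parent before child" ordering can be pictured with a small standard-library sketch. It reads declared foreign keys with PRAGMA foreign_key_list and orders the tables with graphlib. This is an illustration of the idea under the assumption of a simple users/orders schema, not sqlseed's actual resolver; `fill_order` is a hypothetical helper name.

```python
import sqlite3
from graphlib import TopologicalSorter

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    amount REAL
);
""")


def fill_order(conn: sqlite3.Connection) -> list[str]:
    """Order tables so every parent comes before its children (hypothetical helper)."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    )]
    graph: dict[str, set[str]] = {t: set() for t in tables}
    for table in tables:
        # PRAGMA foreign_key_list rows: (id, seq, parent_table, from_col, to_col, ...)
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            if fk[2] != table:  # ignore self-references, which would form a cycle
                graph[table].add(fk[2])
    # TopologicalSorter takes node -> predecessors and yields predecessors first.
    return list(TopologicalSorter(graph).static_order())


print(fill_order(conn))
```

With this schema the parent table users is listed before orders, which is the order used in Steps 1 and 2 above.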
Explicit Cross-Table Associations (ColumnAssociation)
When the target column name differs from the source (e.g., department_id → id), or there's no FK constraint but you need an association, declare it explicitly via associations:
db_path: "app.db"
provider: mimesis
tables:
  - name: departments
    count: 5
    clear_before: true
  - name: employees
    count: 20
    clear_before: true
associations:
  - column_name: department_id    # Column name in the target table
    source_table: departments     # Source table providing values
    source_column: id             # Column name in source table (defaults to column_name)
    target_tables:                # Target tables using this association
      - employees
    strategy: shared_pool         # Association strategy
This way, even without FOREIGN KEY (department_id) REFERENCES departments(id), department_id values will come from departments.id.
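The effect of such an association can be reproduced by hand in a few lines. The sketch below assumes a departments/employees schema like the one in the config and uses only sqlite3 and random; the `pool` list is a stand-in for the shared-pool idea and is not sqlseed's SharedPool class.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department_id INTEGER);
""")
conn.executemany(
    "INSERT INTO departments (name) VALUES (?)",
    [(f"dept-{i}",) for i in range(5)],
)

# "Pool" of source values: every departments.id currently in the database.
pool = [row[0] for row in conn.execute("SELECT id FROM departments")]

rng = random.Random(42)  # fixed seed so the sketch is repeatable
employees = [(f"emp-{i}", rng.choice(pool)) for i in range(20)]
conn.executemany("INSERT INTO employees (name, department_id) VALUES (?, ?)", employees)

# Count employees whose department_id does not exist in departments.
orphans = conn.execute("""
    SELECT COUNT(*) FROM employees
    WHERE department_id NOT IN (SELECT id FROM departments)
""").fetchone()[0]
print(orphans)
```

Because every value is drawn from the pool, no employee row can reference a missing department, even though the table declares no FOREIGN KEY.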
Tutorial 3: YAML Config-Driven Batch Generation
For complex multi-table scenarios, use YAML configuration:
1. Generate config template
sqlseed init generate.yaml --db app.db
2. Edit config file
# generate.yaml
db_path: "app.db"
provider: mimesis
locale: en_US
optimize_pragma: true
tables:
  - name: users
    count: 100000
    clear_before: true
    seed: 42
    columns:
      - name: username
        generator: name
      - name: email
        generator: email
      - name: phone
        generator: phone
      - name: age
        generator: integer
        params:
          min_value: 18
          max_value: 65
      - name: status
        generator: choice
        params:
          choices: [0, 1, 2]
        null_ratio: 0.05      # 5% chance of NULL
  - name: orders
    count: 500000
    batch_size: 10000         # 10K rows per batch, optimizes memory
    columns:
      - name: user_id
        generator: foreign_key
        params:
          ref_table: users
          ref_column: id
          strategy: random
      - name: amount
        generator: float
        params:
          min_value: 1.0
          max_value: 9999.99
          precision: 2
      - name: created_at
        generator: datetime
        params:
          start_year: 2024
3. Execute generation
sqlseed fill --config generate.yaml
Or in Python:
import sqlseed
results = sqlseed.fill_from_config("generate.yaml")
for r in results:
    print(r)
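The batch_size setting in the config is easiest to understand as "generate lazily, write in chunks". The following sketch shows that pattern with a generator and sqlite3's executemany. It is a simplified illustration, not sqlseed's writer; `generate_rows` and `insert_in_batches` are hypothetical names, and the orders table here has only two data columns.

```python
import itertools
import random
import sqlite3
from collections.abc import Iterator


def generate_rows(count: int, seed: int) -> Iterator[tuple[int, float]]:
    """Yield rows lazily so the full data set never sits in memory."""
    rng = random.Random(seed)
    for _ in range(count):
        yield rng.randint(1, 1000), round(rng.uniform(1.0, 9999.99), 2)


def insert_in_batches(conn: sqlite3.Connection, rows: Iterator[tuple], batch_size: int) -> int:
    """Write rows batch by batch and return the number of rows inserted."""
    total = 0
    while True:
        batch = list(itertools.islice(rows, batch_size))
        if not batch:
            break
        with conn:  # one transaction per batch
            conn.executemany("INSERT INTO orders (user_id, amount) VALUES (?, ?)", batch)
        total += len(batch)
    return total


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL)")
inserted = insert_in_batches(conn, generate_rows(25_000, seed=42), batch_size=10_000)
print(inserted)
```

Only one batch of rows is held in memory at a time, so peak memory depends on batch_size rather than on the total row count.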
Tutorial 4: Derived Columns & Expression Engine
sqlseed v2.0 introduces a column dependency DAG and an expression engine for computing derived columns:
# Bank card info table scenario
tables:
  - name: bank_cards
    count: 10000
    columns:
      - name: card_number
        generator: pattern
        params:
          regex: "62[0-9]{17}"      # 19-digit UnionPay card number
        constraints:
          unique: true
      - name: last_eight
        derive_from: card_number    # Depends on card_number
        expression: "value[-8:]"    # Last 8 digits
        constraints:
          unique: true
      - name: last_six
        derive_from: card_number
        expression: "value[-6:]"    # Last 6 digits
      - name: account_id
        generator: pattern
        params:
          regex: "U[0-9]{10}"
        constraints:
          unique: true
How it works:
- sqlseed builds a column dependency DAG: card_number → last_eight, last_six
- A topological sort determines the generation order
- It generates card_number first, then computes last_eight via value[-8:]
- If the last_eight unique constraint fails, it backtracks and regenerates card_number
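The generate, derive, and backtrack steps can be sketched in plain Python. This is a minimal illustration of the idea for the bank_cards example, assuming a simple retry loop as the backtracking strategy; it does not reproduce sqlseed's ConstraintSolver, and `generate_cards` and `max_retries` are names invented for the sketch.

```python
import random


def generate_cards(count: int, seed: int, max_retries: int = 100) -> list[dict[str, str]]:
    """Generate card rows where card_number and last_eight are both unique."""
    rng = random.Random(seed)
    seen_numbers: set[str] = set()
    seen_last_eight: set[str] = set()
    rows: list[dict[str, str]] = []
    for _ in range(count):
        for _attempt in range(max_retries):
            # Source column: "62" followed by 17 random digits (19 digits total).
            card_number = "62" + "".join(rng.choice("0123456789") for _ in range(17))
            # Derived columns, computed only after the source value exists.
            last_eight = card_number[-8:]
            last_six = card_number[-6:]
            if card_number in seen_numbers or last_eight in seen_last_eight:
                continue  # constraint violated: go back and regenerate the source
            seen_numbers.add(card_number)
            seen_last_eight.add(last_eight)
            rows.append({
                "card_number": card_number,
                "last_eight": last_eight,
                "last_six": last_six,
            })
            break
        else:
            raise RuntimeError("could not satisfy unique constraints")
    return rows


cards = generate_cards(1_000, seed=42)
print(len(cards), len({c["last_eight"] for c in cards}))
```

The key point is that a failed check on the derived column discards the source value as well, since the derived value cannot change on its own.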
Expression Engine Functions (21 total)
| Function | Usage | Description |
|---|---|---|
| len(s) | len(value) | Length |
| int(s) | int(value) | To integer |
| str(s) | str(value) | To string |
| float(s) | float(value) | To float |
| hex(n) | hex(value) | To hexadecimal |
| oct(n) | oct(value) | To octal |
| bin(n) | bin(value) | To binary |
| abs(n) | abs(value) | Absolute value |
| min(*args) | min(a, b) | Minimum |
| max(*args) | max(a, b) | Maximum |
| upper(s) | upper(value) | Uppercase |
| lower(s) | lower(value) | Lowercase |
| strip(s) | strip(value) | Trim both ends |
| lstrip(s) | lstrip(value) | Trim left |
| rstrip(s) | rstrip(value) | Trim right |
| zfill(s, width) | zfill(value, 10) | Zero-fill |
| replace(s, old, new) | replace(value, "-", "") | Replace |
| substr(s, start, end) | substr(value, 0, 8) | Substring |
| lpad(s, width, char) | lpad(value, 8, "0") | Left-pad |
| rpad(s, width, char) | rpad(value, 8, "0") | Right-pad |
| concat(*args) | concat("PRE_", value) | Concatenate |
| Slicing | value[-8:] | Python slice syntax |
| Math | value * 2 + 1 | Basic arithmetic |
⚠️ Safety: The expression engine is based on simpleeval with 5-second timeout protection. import, exec, and file I/O are not allowed.
Tutorial 5: Transform Scripts - Complex Business Logic
For complex business logic that can't be expressed declaratively, write Python transform scripts:
1. Write transform script
# transform_users.py
def transform_row(row, ctx):
    """Called for every generated row."""
    # Calculate VIP level based on age
    age = row.get("age", 0)
    if age >= 60:
        row["vip_level"] = 3
    elif age >= 40:
        row["vip_level"] = 2
    else:
        row["vip_level"] = 1
    # Normalize phone format
    phone = row.get("phone", "")
    if phone and not phone.startswith("+1"):
        row["phone"] = f"+1{phone}"
    return row
2. Use in CLI
sqlseed fill app.db --table users --count 10000 --transform transform_users.py
3. Use in YAML
tables:
  - name: users
    count: 10000
    transform: "./transform_users.py"
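Since a transform script is ordinary Python, it can be checked on a few hand-written rows before it is wired into a large fill. The sketch below repeats the transform_row logic from step 1 so that it runs on its own. Passing None for ctx is an assumption that only holds because this particular function never reads ctx.

```python
def transform_row(row, ctx):
    """Same logic as transform_users.py in step 1."""
    age = row.get("age", 0)
    if age >= 60:
        row["vip_level"] = 3
    elif age >= 40:
        row["vip_level"] = 2
    else:
        row["vip_level"] = 1
    phone = row.get("phone", "")
    if phone and not phone.startswith("+1"):
        row["phone"] = f"+1{phone}"
    return row


# Hand-written sample rows covering each branch of the function.
samples = [
    {"age": 72, "phone": "5550100"},    # oldest tier, phone needs a prefix
    {"age": 45, "phone": "+15550101"},  # middle tier, phone already prefixed
    {"age": 19, "phone": ""},           # lowest tier, empty phone left alone
]
results = [transform_row(dict(s), None) for s in samples]
for r in results:
    print(r)
```

Each sample is copied with dict() before the call because transform_row mutates the row it receives.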
Tutorial 6: Preview & Debug
Preview data before generating at scale:
Python API:
rows = sqlseed.preview("app.db", table="users", count=5, seed=42)
# Also supports enrich and transform parameters
rows = sqlseed.preview("app.db", table="users", count=5, seed=42, enrich=True)
for row in rows:
    print(row)
# → {'name': 'John Smith', 'email': 'jsmith@example.com', 'age': 32, ...}
# → {'name': 'Jane Doe', 'email': 'jdoe@test.org', 'age': 28, ...}
# → ...
CLI (Rich table output):
sqlseed preview app.db --table users --count 5
# ┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
# ┃ name       ┃ email              ┃ age ┃ created_at          ┃
# ┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
# │ John Smith │ jsmith@example.com │ 32  │ 2024-03-15 08:23:11 │
# │ ...        │ ...                │ ... │ ...                 │
# └────────────┴────────────────────┴─────┴─────────────────────┘
View column mapping strategy:
sqlseed inspect app.db --table users --show-mapping
# See what generation strategy sqlseed chose for each column
# ┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
# ┃ Column ┃ Type    ┃ Nullable ┃ Generator ┃ Params       ┃
# ┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
# │ id     │ INTEGER │ no       │ skip      │ {}           │
# │ name   │ TEXT    │ no       │ name      │ {}           │
# │ email  │ TEXT    │ yes      │ email     │ {}           │
# │ age    │ INTEGER │ yes      │ integer   │ {min: 18...} │
# │ ...    │ ...     │ ...      │ ...       │ ...          │
# └────────┴─────────┴──────────┴───────────┴──────────────┘
Tutorial 7: Snapshots & Replay
Save a successful generation config for exact replay later:
# Generate and save snapshot
sqlseed fill app.db --table users --count 10000 --seed 42 --snapshot
# → Snapshot saved: snapshots/2026-04-15_033000_users.yaml
# Replay anytime
sqlseed replay snapshots/2026-04-15_033000_users.yaml
# → GenerationResult(table=users, count=10000, elapsed=0.52s, speed=19230 rows/s)
Use cases:
- Reproducible test data in CI/CD
- Consistent test environments across teams
- Quick database state reconstruction during development
Tutorial 8: AI-Powered Configuration (sqlseed-ai Plugin)
Let an LLM analyze your database schema and generate config suggestions:
# Install AI plugin
pip install sqlseed-ai
# Set API key
export SQLSEED_AI_API_KEY="your-api-key"
export SQLSEED_AI_BASE_URL="https://your-llm-api-endpoint"
# AI analysis and config generation
sqlseed ai-suggest app.db --table bank_cards --output bank_cards.yaml
# AI suggestions with self-correction (3 rounds by default)
sqlseed ai-suggest app.db --table bank_cards --output bank_cards.yaml --verify
# Specify model (defaults to most popular free model)
sqlseed ai-suggest app.db --table bank_cards --output bank_cards.yaml --model nvidia/nemotron-3-super-120b-a12b:free
# Skip cache
sqlseed ai-suggest app.db --table bank_cards --output bank_cards.yaml --no-cache
AI Workflow:
1. Extract schema context (columns, indexes, sample data, FK, distribution)
2. Build LLM prompt with few-shot examples
3. LLM returns JSON column config suggestions
4. AiConfigRefiner auto-validates config correctness
5. If errors found (unknown generator, type mismatch, etc.), sends correction request to LLM
6. Up to 3 self-correction rounds, outputs validated YAML config
💡 Environment Variables: Supports SQLSEED_AI_API_KEY, SQLSEED_AI_BASE_URL, and SQLSEED_AI_MODEL. Also supports OPENAI_API_KEY / OPENAI_BASE_URL as fallback. Defaults to auto-selecting the most popular free model from OpenRouter (base_url https://openrouter.ai/api/v1). Set --model or SQLSEED_AI_MODEL to specify a model.
Tutorial 9: MCP Server Integration
Let AI assistants (Claude, Cursor, etc.) operate sqlseed directly via Model Context Protocol:
# Install MCP server
pip install mcp-server-sqlseed
# All-in-one: MCP server + AI support
pip install mcp-server-sqlseed[ai]
# Manual start (usually managed by MCP client)
python -m mcp_server_sqlseed
Configure MCP client (Claude Desktop example):
{
"mcpServers": {
"sqlseed": {
"command": "mcp-server-sqlseed"
}
}
}
MCP Capabilities:
| Type | Name | Description |
|---|---|---|
| Resource | sqlseed://schema/{db_path}/{table_name} | Get table schema as JSON |
| Tool | sqlseed_inspect_schema | Inspect schema (columns, FK, indexes, samples, schema_hash) |
| Tool | sqlseed_generate_yaml | AI-driven YAML config generation with self-correction. Supports api_key/base_url/model overrides |
| Tool | sqlseed_execute_fill | Execute data generation (supports YAML config string, includes enrich option) |
This means you can tell your AI assistant:
"Analyze the structure of the bank_cards table in app.db, generate a YAML config, then fill 5000 rows."
The AI assistant will call sqlseed_inspect_schema → sqlseed_generate_yaml → sqlseed_execute_fill in sequence, without you writing any code.
Tutorial 10: Custom Provider Plugin
You can create your own data generation provider:
# my_provider.py
from __future__ import annotations

from typing import Any

from sqlseed.generators import UnknownGeneratorError


class MyCustomProvider:
    """Just implement the DataProvider Protocol. No base class required."""

    def __init__(self) -> None:
        self._locale: str = "en_US"

    @property
    def name(self) -> str:
        return "my_custom"

    def set_locale(self, locale: str) -> None:
        self._locale = locale

    def set_seed(self, seed: int) -> None:
        ...

    def generate(self, type_name: str, **params: Any) -> Any:
        # ... handle generator names you want to support
        if type_name == "string":
            return "custom_string"
        if type_name == "email":
            return "user@example.com"
        raise UnknownGeneratorError(type_name)

# Full Protocol: src/sqlseed/generators/_protocol.py
To reuse the built-in generator name dispatch logic instead of hand-writing generate() routing, inherit from BaseProvider and override selectively.
Registration method 1: via pyproject.toml entry-point (recommended)
[project.entry-points."sqlseed"]
my_custom = "my_provider:MyCustomProvider"
Registration method 2: via plugin hook
from sqlseed.plugins.hookspecs import hookimpl


class MyPlugin:
    @hookimpl
    def sqlseed_register_providers(self, registry):
        from my_provider import MyCustomProvider
        registry.register(MyCustomProvider())
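Before registering a provider, it can help to confirm that it has the members used in the example above. The sketch below does that with a runtime-checkable typing.Protocol. `ProviderLike` is a hypothetical stand-in written for this illustration and may not match the real DataProvider Protocol in src/sqlseed/generators/_protocol.py; LookupError is used in place of UnknownGeneratorError so that the snippet runs without sqlseed installed.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class ProviderLike(Protocol):
    """Hypothetical stand-in listing only the members shown in the example."""

    @property
    def name(self) -> str: ...
    def set_locale(self, locale: str) -> None: ...
    def set_seed(self, seed: int) -> None: ...
    def generate(self, type_name: str, **params: Any) -> Any: ...


class ConstantProvider:
    """Toy provider that returns fixed values; no base class involved."""

    def __init__(self) -> None:
        self._locale = "en_US"

    @property
    def name(self) -> str:
        return "constant"

    def set_locale(self, locale: str) -> None:
        self._locale = locale

    def set_seed(self, seed: int) -> None:
        pass  # nothing random to seed in this toy provider

    def generate(self, type_name: str, **params: Any) -> Any:
        if type_name == "email":
            return "user@example.com"
        raise LookupError(type_name)  # stand-in for UnknownGeneratorError


provider = ConstantProvider()
# isinstance only checks that the members exist, not their signatures.
print(isinstance(provider, ProviderLike), provider.generate("email"))
```

This is structural typing: ConstantProvider never mentions ProviderLike, yet it passes the check because it has the required members.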
🖥️ CLI Quick Reference
# =======================================
# Data Generation
# =======================================
# Fill data (--count required when not using --config)
sqlseed fill app.db --table users --count 10000
# Full parameters
sqlseed fill app.db -t users -n 100000 \
    --provider mimesis \
    --locale en_US \
    --seed 42 \
    --batch-size 10000 \
    --clear \
    --enrich \
    --snapshot
# YAML config-driven (count from config file)
sqlseed fill --config generate.yaml
# Transform script
sqlseed fill app.db -t users -n 10000 --transform transform.py
# Enable debug logging
SQLSEED_LOG_LEVEL=DEBUG sqlseed fill app.db -t users -n 10
# =======================================
# Inspect & Preview
# =======================================
# Preview data (no write)
sqlseed preview app.db --table users --count 5
# List all tables
sqlseed inspect app.db
# View column mapping strategy
sqlseed inspect app.db --table users --show-mapping
# =======================================
# Snapshots & Replay
# =======================================
# Generate config template
sqlseed init generate.yaml --db app.db
# Replay snapshot
sqlseed replay snapshots/2026-04-15_users.yaml
# =======================================
# AI Features
# =======================================
# AI suggestions (requires sqlseed-ai)
sqlseed ai-suggest app.db -t users -o users.yaml
sqlseed ai-suggest app.db -t users -o users.yaml --verify
# Specify API config
sqlseed ai-suggest app.db -t users -o users.yaml --api-key sk-xxx --base-url https://api.openai.com/v1
# Control self-correction
sqlseed ai-suggest app.db -t users -o users.yaml --max-retries 0 # Disable
sqlseed ai-suggest app.db -t users -o users.yaml --no-verify # Skip verification
# Skip cache
sqlseed ai-suggest app.db -t users -o users.yaml --no-cache
🧠 9-Level Smart Column Mapping
One of sqlseed's core highlights is the ColumnMapper's 9-level strategy chain. Each column is matched by priority:
Level 1 | Autoincrement PK     PK + AUTOINCREMENT / INTEGER → skip
   ▼
Level 2 | User config          columns={"email": "email"} highest priority
   ▼
Level 3 | Custom exact match   Rules registered via plugin hooks
   ▼
Level 4 | Built-in exact       74 rules: email→email, phone→phone, age→integer...
   ▼
Level 5 | DEFAULT check        Has default → skip / __enrich__ (when enrich=True)
   ▼
Level 6 | Custom pattern       Regex rules registered via plugin hooks
   ▼
Level 7 | Built-in pattern     25 regexes: *_at→datetime, *_id→foreign_key, is_*→boolean...
   ▼
Level 8 | NULLABLE fallback    Nullable → skip / __enrich__
   ▼
Level 9 | Type-faithful        VARCHAR(32)→max 32 chars, INT8→0~255, BLOB(1024)→1024 bytes
What this means:
- Column user_email → Level 7 pattern *_email → email generator ✅
- Column is_verified → Level 7 pattern is_* → boolean generator ✅
- Column type VARCHAR(20) → Level 9 type fallback → max 20-char string ✅
- Column with DEFAULT 1 → Level 5 → skip generation ✅
- Column gender with DEFAULT 'male' → Level 4 exact match → choice generator (exact match takes priority over DEFAULT) ✅
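A first-match-wins chain like this can be sketched as an ordered list of small resolver functions. The example below models only five of the nine levels (1, 4, 5, 7 and 9) with a handful of made-up rules, and uses shell-style patterns via fnmatch where the text above mentions regexes. The `Column` dataclass, the rule tables, and every function name are assumptions for illustration, not sqlseed's ColumnMapper.

```python
import fnmatch
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Column:
    name: str
    sql_type: str = "TEXT"
    default: Optional[str] = None
    autoincrement_pk: bool = False


# Tiny illustrative rule tables (the real ones are described as 74 and 25 rules).
EXACT = {"email": "email", "phone": "phone", "age": "integer", "gender": "choice"}
PATTERNS = [("*_at", "datetime"), ("*_id", "foreign_key"),
            ("is_*", "boolean"), ("*_email", "email")]


def by_autoincrement(col: Column) -> Optional[str]:
    return "skip" if col.autoincrement_pk else None


def by_exact(col: Column) -> Optional[str]:
    return EXACT.get(col.name)


def by_default(col: Column) -> Optional[str]:
    return "skip" if col.default is not None else None


def by_pattern(col: Column) -> Optional[str]:
    for pattern, generator in PATTERNS:
        if fnmatch.fnmatchcase(col.name, pattern):
            return generator
    return None


def by_type(col: Column) -> Optional[str]:
    return "integer" if "INT" in col.sql_type.upper() else "string"


# Order matters: exact-name rules run before the DEFAULT check, as in the last bullet.
CHAIN: list[Callable[[Column], Optional[str]]] = [
    by_autoincrement, by_exact, by_default, by_pattern, by_type,
]


def resolve(col: Column) -> str:
    """Return the generator chosen by the first level that matches."""
    for step in CHAIN:
        generator = step(col)
        if generator is not None:
            return generator
    raise LookupError(col.name)


for col in [Column("user_email"), Column("is_verified"),
            Column("gender", default="'male'"), Column("flag", "INTEGER", default="1")]:
    print(col.name, "->", resolve(col))
```

Keeping each level as its own function makes the priority explicit in one place, the CHAIN list, so inserting a custom level amounts to adding one entry at the right position.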
🧩 Plugin System
sqlseed provides 11 hook points via pluggy, covering the full data generation lifecycle:
| Hook | firstresult | Trigger |
|---|---|---|
| sqlseed_register_providers | | Register custom data providers |
| sqlseed_register_column_mappers | | Register custom column mapping rules |
| sqlseed_ai_analyze_table | ✅ | AI analyzes table schema (returns column config) |
| sqlseed_pre_generate_templates | ✅ | AI pre-computes candidate value pools |
| sqlseed_before_generate | | Before data generation loop |
| sqlseed_after_generate | | After data generation completes |
| sqlseed_transform_row | | Per-row transform (hot path, mind performance) |
| sqlseed_transform_batch | | Per-batch transform (supports chaining) |
| sqlseed_before_insert | | Before each batch write to DB |
| sqlseed_after_insert | | After each batch write to DB |
| sqlseed_shared_pool_loaded | | After SharedPool registration (pool readable) |
🏗️ Project Architecture
src/sqlseed/
├── __init__.py                  # Public API (fill, connect, fill_from_config, preview)
├── core/                        # ===== Core Orchestration =====
│   ├── orchestrator.py          # DataOrchestrator main engine
│   ├── mapper.py                # ColumnMapper 9-level strategy chain
│   ├── schema.py                # SchemaInferrer - columns, indexes, distribution
│   ├── relation.py              # RelationResolver + SharedPool - FK & cross-table sharing
│   ├── column_dag.py            # ColumnDAG - column dependency graph + topological sort
│   ├── expression.py            # ExpressionEngine - safe expressions (simpleeval + timeout)
│   ├── constraints.py           # ConstraintSolver - unique backtracking
│   ├── transform.py             # TransformLoader - dynamic user script loading
│   └── result.py                # GenerationResult dataclass
├── generators/                  # ===== Generator Layer =====
│   ├── _protocol.py             # DataProvider Protocol + UnknownGeneratorError
│   ├── registry.py              # ProviderRegistry (entry-point auto-discovery)
│   ├── base_provider.py         # Built-in base generators (zero dependencies)
│   ├── faker_provider.py        # Faker adapter
│   ├── mimesis_provider.py      # Mimesis adapter
│   └── stream.py                # DataStream streaming + constraint backtracking
├── database/                    # ===== Database Layer =====
│   ├── _protocol.py             # DatabaseAdapter Protocol (ColumnInfo, ForeignKeyInfo, IndexInfo)
│   ├── sqlite_utils_adapter.py  # Default adapter
│   ├── raw_sqlite_adapter.py    # sqlite3 fallback adapter
│   └── optimizer.py             # PragmaOptimizer 3-tier optimization
├── plugins/                     # ===== Plugin Layer =====
│   ├── hookspecs.py             # 11 pluggy hook definitions
│   └── manager.py               # PluginManager
├── config/                      # ===== Config Management =====
│   ├── models.py                # Pydantic models (GeneratorConfig/TableConfig/ColumnConfig)
│   ├── loader.py                # YAML/JSON load & save
│   └── snapshot.py              # Snapshot save & replay
├── cli/                         # ===== CLI =====
│   └── main.py                  # click commands (fill/preview/inspect/init/replay/ai-suggest)
└── _utils/                      # ===== Internal Utilities =====
    ├── sql_safe.py              # quote_identifier - SQL injection protection
    ├── schema_helpers.py        # AUTOINCREMENT detection
    ├── metrics.py               # MetricsCollector performance metrics
    ├── progress.py              # Rich progress bar
    └── logger.py                # structlog logging
plugins/
├── sqlseed-ai/                  # AI plugin - LLM-driven smart configuration
│   └── src/sqlseed_ai/          # SchemaAnalyzer, AiConfigRefiner, few-shot examples...
└── mcp-server-sqlseed/          # MCP server - AI assistant integration
    └── src/mcp_server_sqlseed/  # FastMCP tools (sqlseed_inspect_schema/sqlseed_generate_yaml/sqlseed_execute_fill)
🛠️ Development
# Run tests (with coverage)
pytest
# Lint
ruff check src/ tests/
# Auto-fix
ruff check --fix src/ tests/
# Type check
mypy src/sqlseed/
Tests cover all core modules, with path structure mirroring src/: test_core/, test_database/, test_generators/, test_plugins/, test_config/, test_utils/.
Dependencies
| Package | Core Dependencies | Description |
|---|---|---|
| sqlseed | sqlite-utils, pydantic, pluggy, structlog, pyyaml, click, rich, typing_extensions, simpleeval, rstr | rstr used for pattern generator regex matching |
| sqlseed[faker] | + faker>=30.0 | Faker data engine |
| sqlseed[mimesis] | + mimesis>=18.0 | Mimesis data engine (recommended) |
| sqlseed[docs] | + mkdocs-material, mkdocstrings | Documentation build |
| sqlseed-ai | sqlseed, openai>=1.0 | AI plugin, auto-registered via entry-point |
| mcp-server-sqlseed | sqlseed, mcp>=1.0 | MCP server, standalone CLI tool |
| mcp-server-sqlseed[ai] | + sqlseed-ai | MCP server with AI support |
License
🌱 sqlseed - Stop writing fixtures. Start generating data.