Schema-aware seed data generation for PostgreSQL

Maintainers

FraiseQL

fraiseql-data

Schema-aware seed data generation for PostgreSQL with Trinity pattern support.

Overview

fraiseql-data generates realistic test data for PostgreSQL databases by:

Introspecting your schema to understand tables, columns, and relationships
Respecting foreign key constraints with automatic dependency resolution
Supporting Trinity pattern (pk_*, id, identifier) for FraiseQL compatibility
Generating realistic data using Faker for domain-appropriate values
Correlating related columns (address, person, geo) for coherent rows
Handling complex scenarios like self-referencing tables, UNIQUE and CHECK constraints

Installation

# Using uv (recommended)
uv add fraiseql-data

# Or using pip
pip install fraiseql-data

Requirements:

Python 3.12+
PostgreSQL 14+
psycopg 3.1+

Quick Start

from psycopg import connect
from fraiseql_data import SeedBuilder

# Connect to database
conn = connect("postgresql://user:pass@localhost/mydb")

# Build seed plan (with seed common baseline)
builder = SeedBuilder(
    conn,
    schema="public",
    seed_common="db/seed_common.yaml"  # Optional but recommended
)
seeds = (
    builder
    .add("tb_manufacturer", count=10)
    .add("tb_model", count=50)
    .add("tb_variant", count=200)
    .execute()
)

# Access generated data
for manufacturer in seeds.tb_manufacturer:
    print(f"Created: {manufacturer.name} ({manufacturer.identifier})")

Features

Automatic Dependency Resolution

fraiseql-data automatically handles foreign key dependencies:

builder = SeedBuilder(conn, "public")

# No need to specify order - dependencies auto-resolved
seeds = (
    builder
    .add("tb_variant", count=100)      # Depends on tb_model
    .add("tb_model", count=20)         # Depends on tb_manufacturer
    .add("tb_manufacturer", count=5)   # No dependencies
    .execute()
)

# Inserts in correct order: manufacturer -> model -> variant
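
The ordering step can be pictured as a topological sort over the foreign-key graph. The sketch below uses Python's standard graphlib module and is not fraiseql-data's own code; the fk_deps mapping is a hand-written stand-in (an assumption) for what schema introspection would discover.

```python
from graphlib import TopologicalSorter

# Hypothetical FK map: table -> set of tables it references (its parents).
fk_deps = {
    "tb_variant": {"tb_model"},
    "tb_model": {"tb_manufacturer"},
    "tb_manufacturer": set(),
}


def insertion_order(deps):
    """Return tables ordered so that every parent comes before its children."""
    return list(TopologicalSorter(deps).static_order())


order = insertion_order(fk_deps)
print(order)
```

Because each table here has a single parent chain, only one valid order exists; with independent tables, several orders could be equally correct.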

Auto-Dependency Generation

Automatically generate parent dependencies without manual specification:

# Auto-generate all FK dependencies (1 row each by default)
seeds = builder.add("tb_allocation", count=20, auto_deps=True).execute()

# Specify explicit counts per dependency
seeds = builder.add(
    "tb_allocation",
    count=100,
    auto_deps={
        "tb_organization": 3,
        "tb_machine": 10,
    }
).execute()

# With overrides on auto-generated dependencies
seeds = builder.add(
    "tb_allocation",
    count=50,
    auto_deps={
        "tb_organization": {
            "count": 2,
            "overrides": {"org_type": "nonprofit"},
        }
    }
).execute()
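
One way to think about auto_deps is as a recursive walk up the foreign keys that records each parent table once. The following is a rough sketch under that assumption, not the library's resolver; expand_auto_deps and the fk_map contents (including the tb_machine -> tb_organization link) are hypothetical.

```python
def expand_auto_deps(table, fk_map, counts=None):
    """Return {parent_table: row_count} for every ancestor of `table`."""
    counts = counts or {}
    plan = {}

    def visit(current):
        for parent in fk_map.get(current, ()):
            if parent not in plan:  # dedupe tables reached via several paths
                plan[parent] = counts.get(parent, 1)  # 1 row unless specified
                visit(parent)

    visit(table)
    return plan


# Hypothetical schema: allocation references organization and machine,
# and machine itself references organization.
fk_map = {
    "tb_allocation": ["tb_organization", "tb_machine"],
    "tb_machine": ["tb_organization"],
}

default_plan = expand_auto_deps("tb_allocation", fk_map)
custom_plan = expand_auto_deps(
    "tb_allocation", fk_map, {"tb_organization": 3, "tb_machine": 10}
)
print(default_plan, custom_plan)
```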

Trinity Pattern Support

The three Trinity columns (an integer pk_* primary key, a UUID id, and a human-readable identifier) are filled in automatically:

seeds = builder.add("tb_manufacturer", count=10).execute()

for mfr in seeds.tb_manufacturer:
    print(f"PK: {mfr.pk_manufacturer}")     # 1, 2, 3, ...
    print(f"ID: {mfr.id}")                  # UUID v4 with pattern
    print(f"Identifier: {mfr.identifier}")  # MANUFACTURER-001, ...
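
To make the shape of a Trinity row concrete, here is a small standalone sketch. It only mirrors the output shown above: trinity_row is a hypothetical helper, the identifier format is inferred from the MANUFACTURER-001 example, and a plain uuid4 is used because the "pattern" applied to the UUID is not described here.

```python
import uuid


def trinity_row(table, instance):
    """Build the three Trinity columns for the given 1-based row number."""
    entity = table.removeprefix("tb_")  # "tb_manufacturer" -> "manufacturer"
    return {
        f"pk_{entity}": instance,                          # sequential integer key
        "id": uuid.uuid4(),                                # assumption: plain UUID v4
        "identifier": f"{entity.upper()}-{instance:03d}",  # human-readable key
    }


rows = [trinity_row("tb_manufacturer", i) for i in range(1, 4)]
for row in rows:
    print(row)
```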

Realistic Data Generation

Uses Faker for domain-appropriate data:

# Faker automatically detects common column names:
# - email -> realistic email addresses
# - name, first_name, last_name -> person names
# - company, company_name -> company names
# - phone, phone_number -> phone numbers
# - address, street -> addresses

seeds = builder.add("tb_user", count=10).execute()
# email: "john.doe@example.com" (not "column_1_value")

Numeric columns with precision and scale (numeric(p,s)) generate values within bounds:

# numeric(10,2) -> values up to 99,999,999.99
# numeric(5,3)  -> values up to 99.999
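
The bounds follow from the type definition: numeric(p,s) holds p digits in total, s of them after the decimal point, so the largest value is (10^p - 1) / 10^s. A minimal sketch of that arithmetic, with numeric_max and random_numeric as illustrative names rather than library functions:

```python
import random
from decimal import Decimal


def numeric_max(precision, scale):
    """Largest value representable by numeric(precision, scale)."""
    return Decimal(10**precision - 1).scaleb(-scale)


def random_numeric(precision, scale, rng):
    """Pick a non-negative value that fits numeric(precision, scale)."""
    unscaled = rng.randint(0, 10**precision - 1)
    return Decimal(unscaled).scaleb(-scale)


rng = random.Random(0)
sample = random_numeric(5, 3, rng)
print(numeric_max(10, 2), numeric_max(5, 3), sample)
```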

Correlated Column Groups

Semantically related columns are automatically detected and generated together for coherent rows:

# Address columns auto-detected and correlated
builder.add("tb_address", count=100)
# -> country/city/state/postal_code are coherent per row
# -> French address gets French city and 5-digit postal code

# Person columns auto-detected
builder.add("tb_user", count=50)
# -> first_name/last_name/email are coherent
# -> email derived as first.last@domain

# Override-aware coherence
builder.add("tb_address", count=100, overrides={"country": "France"})
# -> city, state, postal_code are all French

Built-in groups (each activates when at least two of its fields are present in the table):

Group	Fields	Behavior
address	country, state, city, postal_code, street, address, zip/zipcode/zip_code	Locale-coherent components
person	first_name, last_name, name, email	Name pair with derived email
geo	latitude, longitude, lat, lng, lon	Coherent lat/lng pair, locale-biased when address group is active
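
The activation rule can be sketched as a set intersection between each group's fields and the table's columns. This is an illustration of the rule stated above, assuming nothing about how the library implements it; BUILTIN_GROUPS and active_groups are hypothetical names, and the field sets are copied from the table.

```python
BUILTIN_GROUPS = {
    "address": frozenset({
        "country", "state", "city", "postal_code", "street",
        "address", "zip", "zipcode", "zip_code",
    }),
    "person": frozenset({"first_name", "last_name", "name", "email"}),
    "geo": frozenset({"latitude", "longitude", "lat", "lng", "lon"}),
}


def active_groups(table_columns, min_matches=2):
    """Return the groups with at least `min_matches` fields in the table."""
    columns = set(table_columns)
    return [
        name
        for name, fields in BUILTIN_GROUPS.items()
        if len(fields & columns) >= min_matches
    ]


print(active_groups(["pk_user", "first_name", "last_name", "email", "city"]))
print(active_groups(["country", "city", "lat", "lng"]))
```

A lone email column, for example, would not activate the person group and would fall back to per-column generation.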

Custom groups for domain-specific correlation:

import random

from fraiseql_data import ColumnGroup

def product_gen(context):
    category = context.get("category") or random.choice(["Electronics", "Clothing"])
    prefix = {"Electronics": "EL", "Clothing": "CL"}[category]
    return {"category": category, "sku": f"{prefix}-{random.randint(1000, 9999)}"}

builder.add("tb_product", count=200, groups=[
    ColumnGroup("product", frozenset({"category", "sku"}), product_gen)
])

# Disable auto-detection entirely
builder.add("tb_address", count=100, groups=[])

Generator context keys:

The context dict passed to your generator function includes:

Key	Type	Description
`_instance`	`int`	1-based row counter (1, 2, ..., N)
`_table_columns`	`frozenset[str]`	All column names of the table being seeded
(column overrides)	`Any`	Override values for columns in this group
(upstream group outputs)	`Any`	Values from earlier groups in the pipeline

def smart_gen(context):
    row_num = context["_instance"]
    has_notes = "notes" in context["_table_columns"]
    return {
        "label": f"Item #{row_num}",
        "description": "See notes" if has_notes else "N/A",
    }
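
Since a generator is a plain function of the context dict, it can be exercised without the library by building that dict by hand. The harness below is a sketch for unit-testing generators; make_context is a hypothetical helper, and only the two documented keys (_instance, _table_columns) are assumed.

```python
def smart_gen(context):
    row_num = context["_instance"]
    has_notes = "notes" in context["_table_columns"]
    return {
        "label": f"Item #{row_num}",
        "description": "See notes" if has_notes else "N/A",
    }


def make_context(instance, table_columns, **extra):
    """Build a context dict like the one described in the table above."""
    return {
        "_instance": instance,
        "_table_columns": frozenset(table_columns),
        **extra,  # overrides or upstream group outputs, if any
    }


with_notes = [
    smart_gen(make_context(i, {"label", "description", "notes"}))
    for i in range(1, 4)
]
without_notes = smart_gen(make_context(1, {"label", "description"}))
print(with_notes, without_notes)
```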

Custom Overrides

Override auto-generation for specific columns:

import random

seeds = (
    builder
    .add("tb_product", count=50, overrides={
        "price": lambda: round(random.uniform(10.0, 500.0), 2),
        "status": "active",  # Static value for all rows
        "created_at": lambda i: f"2024-{i:02d}-01",  # Uses instance number
    })
    .execute()
)
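
The three override forms above (static value, zero-argument callable, callable taking the instance number) could be distinguished by inspecting the callable's signature. This is one possible dispatch, offered as a sketch; resolve_override is a hypothetical function and the library may decide differently.

```python
import inspect


def resolve_override(value, instance):
    """Turn an override spec into a concrete value for one row."""
    if not callable(value):
        return value  # static value, same for every row
    n_params = len(inspect.signature(value).parameters)
    # One-argument callables receive the 1-based instance number.
    return value(instance) if n_params >= 1 else value()


static = resolve_override("active", 3)
no_arg = resolve_override(lambda: 7, 3)
with_instance = resolve_override(lambda i: f"2024-{i:02d}-01", 3)
print(static, no_arg, with_instance)
```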

Override priority: Overrides take precedence over both automatic FK resolution and column group generation. This enables cross-builder seeding where parent data already exists:

# Parent data already in database from a previous builder/migration
builder.add("tb_product", count=50, overrides={
    "fk_organization": 42,  # Use existing org, skip FK auto-resolution
})

When all FK columns pointing to a dependency table are overridden, that table can be omitted from the seed plan entirely.

Self-Referencing Tables

Support for hierarchical data structures:

seeds = builder.add("tb_category", count=20).execute()

# First category has NULL parent, others pick random parent
categories = seeds.tb_category
assert categories[0].parent_category is None  # Root category
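
A simple way to build such a hierarchy is to leave the first row as the root and let every later row pick a parent among rows already created. The sketch below assumes that restriction to earlier rows (which keeps the tree free of cycles); the source only says that the other rows pick a random parent. assign_parents is a hypothetical helper.

```python
import random


def assign_parents(count, rng):
    """Return a parent pk (or None for the root) for rows 1..count."""
    parents = []
    for pk in range(1, count + 1):
        if pk == 1:
            parents.append(None)  # root row has no parent
        else:
            parents.append(rng.randint(1, pk - 1))  # any earlier row
    return parents


parents = assign_parents(20, random.Random(42))
print(parents)
```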

UNIQUE Constraint Handling

Automatic collision detection and retry:

seeds = builder.add("tb_user", count=100).execute()

# Guaranteed unique emails and usernames (max 10 retry attempts)
emails = [u.email for u in seeds.tb_user]
assert len(emails) == len(set(emails))  # No duplicates!

For group-generated UNIQUE columns (e.g., email), the entire group is regenerated on collision to preserve coherence. After half of the retry attempts, an email suffix fallback activates (first.last42@domain).
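
The retry behaviour described above can be sketched as a loop that regenerates the name on each attempt and switches to a numeric suffix once half of the attempts are used. This is an illustration under those stated rules, not the library's code; unique_email, make_name, and the example.com domain are assumptions.

```python
import random


def unique_email(make_name, seen, rng, max_attempts=10, domain="example.com"):
    """Return an email not in `seen`, regenerating the name on each collision."""
    for attempt in range(max_attempts):
        first, last = make_name()  # regenerate the whole group
        local = f"{first}.{last}".lower()
        if attempt >= max_attempts // 2:
            local += str(rng.randint(1, 99))  # suffix fallback, e.g. first.last42
        email = f"{local}@{domain}"
        if email not in seen:
            seen.add(email)
            return email
    raise RuntimeError(f"no unique email after {max_attempts} attempts")


def always_john():
    """Worst-case name source that always collides."""
    return ("John", "Doe")


rng = random.Random(0)
seen = set()
first_email = unique_email(always_john, seen, rng)
second_email = unique_email(always_john, seen, rng)
print(first_email, second_email)
```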

CHECK Constraint Auto-Satisfaction

Automatically generate valid data for CHECK constraints:

# status TEXT NOT NULL CHECK (status IN ('active', 'pending', 'archived'))
# price NUMERIC CHECK (price > 0 AND price < 10000)

# No overrides needed - constraints automatically satisfied!
seeds = builder.add("tb_product", count=100).execute()

Supported: enum values (IN), range constraints (>, <, >=, <=), BETWEEN.
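
To give a feel for what satisfying a CHECK constraint involves, the sketch below extracts allowed values and numeric bounds from constraint text written as in the comments above. It is a simplified illustration with hypothetical helper names; PostgreSQL may store a normalized form of the expression, which a real parser would need to handle.

```python
import re


def parse_in_values(check_sql):
    """Extract the quoted values from a `col IN ('a', 'b')` expression."""
    match = re.search(r"\bIN\s*\(([^)]*)\)", check_sql, re.IGNORECASE)
    if not match:
        return []
    return re.findall(r"'([^']*)'", match.group(1))


def parse_bounds(check_sql, column):
    """Map each comparison operator on `column` to its numeric bound."""
    pattern = rf"\b{re.escape(column)}\s*(>=|<=|>|<)\s*(-?\d+(?:\.\d+)?)"
    return {op: float(num) for op, num in re.findall(pattern, check_sql)}


statuses = parse_in_values("status IN ('active', 'pending', 'archived')")
price_bounds = parse_bounds("price > 0 AND price < 10000", "price")
print(statuses, price_bounds)
```

A generator would then choose from statuses, or draw a number strictly between the two bounds.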

Batch Operations

Fluent API for multi-table seeding with conditional operations:

with builder.batch() as batch:
    batch.add("tb_manufacturer", count=10)
    batch.add("tb_model", count=50)
    batch.when(include_demo_data).add("tb_demo_product", count=100)  # include_demo_data: a flag defined by the caller

Data Export / Import

# Export
json_str = seeds.to_json()
seeds.to_csv("tb_manufacturer", "manufacturers.csv")

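# Import (assumption: Seeds is importable from fraiseql_data, like SeedBuilder)
from fraiseql_data import Seeds

imported = Seeds.from_json(file_path="seeds.json")
imported = Seeds.from_csv("tb_manufacturer", "manufacturers.csv")
result = builder.insert_seeds(imported)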

Staging Backend (In-Memory Testing)

Generate seed data without a database connection:

from fraiseql_data import SeedBuilder
from fraiseql_data.models import TableInfo, ColumnInfo

builder = SeedBuilder(conn=None, schema="test", backend="staging")

table_info = TableInfo(
    name="tb_product",
    columns=[
        ColumnInfo(name="pk_product", pg_type="integer", is_nullable=False, is_primary_key=True),
        ColumnInfo(name="name", pg_type="text", is_nullable=False),
        ColumnInfo(name="price", pg_type="numeric", is_nullable=True),
    ],
)
builder.set_table_schema("tb_product", table_info)

seeds = builder.add("tb_product", count=100).execute()

Seed Common Baseline

Define a baseline layer that all test data builds upon. It is optional but recommended, and it eliminates UUID collisions:

builder = SeedBuilder(
    conn, schema="public",
    seed_common="db/seed_common.yaml"
)

Instance range separation:

1 - 1,000: Seed common (reserved baseline)
1,001 - 999,999: Test data (generated per test run)
1,000,000+: Runtime generated
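
The ranges above amount to a simple lookup on the instance number. A minimal sketch, with classify_instance and the returned labels chosen here for illustration:

```python
def classify_instance(instance):
    """Name the range that a 1-based instance number falls into."""
    if instance < 1:
        raise ValueError("instance numbers start at 1")
    if instance <= 1_000:
        return "seed_common"  # reserved baseline
    if instance <= 999_999:
        return "test_data"    # generated per test run
    return "runtime"          # generated at runtime


for n in (1, 1_000, 1_001, 999_999, 1_000_000):
    print(n, classify_instance(n))
```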

Supports YAML, JSON, and environment-specific baselines (seed_common.dev.yaml, seed_common.staging.yaml).

Warning behavior: When seed_common is omitted, a warning is logged once per process. Pass validate_seed_common=False to suppress.

pytest Integration

from fraiseql_data import seed_data

@seed_data("tb_manufacturer", count=5)
@seed_data("tb_model", count=20)
def test_models(seeds):
    assert len(seeds.tb_manufacturer) == 5
    assert len(seeds.tb_model) == 20

API Reference

For complete API documentation, see API.md.

Quick reference:

SeedBuilder - Main API for seed generation
ColumnGroup - Define custom correlated column groups
Seeds - Container for generated data with export/import
@seed_data - pytest decorator for test fixtures

Development

# All tests
uv run pytest

# With coverage
uv run pytest --cov=src/fraiseql_data

# Linting
uv run ruff check src/ tests/

Architecture

fraiseql-data uses a modular architecture:

Introspection: Query information_schema for tables, columns, FKs, UNIQUE constraints, CHECK constraints
Dependency Graph: Topological sort for correct insertion order
Auto-Dependency Resolver: Recursive FK traversal, DAG-based deduplication, multi-path handling
Seed Common: Baseline management with multi-format support (YAML, JSON, SQL), FK validation, environment detection
Generators: Faker, Trinity, Column Groups (address/person/geo), CHECK constraint satisfaction (extensible)
Backends: DirectBackend (bulk INSERT), StagingBackend (in-memory)
Import/Export: JSON and CSV with automatic type conversion
Batch API: Context manager with conditional operations
Decorators: pytest integration with auto-cleanup

License

MIT License - see LICENSE

Release history

Version	Released
0.1.4 (this version)	May 27, 2026
0.1.3	Mar 23, 2026
0.1.2	Mar 21, 2026
0.1.1	Mar 21, 2026

Download files

Source Distribution

fraiseql_data-0.1.4.tar.gz (123.8 kB)

Uploaded May 27, 2026 Source

Built Distribution

fraiseql_data-0.1.4-py3-none-any.whl (78.7 kB)

Uploaded May 27, 2026 Python 3

File details

Details for the file fraiseql_data-0.1.4.tar.gz.

File metadata

Download URL: fraiseql_data-0.1.4.tar.gz
Upload date: May 27, 2026
Size: 123.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for fraiseql_data-0.1.4.tar.gz
Algorithm	Hash digest
SHA256	`531c0d893d341e79d938860b32f2d0b919fd8f08c441dc07b0f30d3edd08db0f`
MD5	`bf48a1f67f7e277f3559fcce21ec1919`
BLAKE2b-256	`b46660b549ca392b7963d853a8f559f7ab0e21855090b0e7b23e39bd33df663c`

Provenance

The following attestation bundles were made for fraiseql_data-0.1.4.tar.gz:

Publisher: deploy.yml on fraiseql/fraiseql-seed

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: fraiseql_data-0.1.4.tar.gz
- Subject digest: 531c0d893d341e79d938860b32f2d0b919fd8f08c441dc07b0f30d3edd08db0f
- Sigstore transparency entry: 1642397443
- Sigstore integration time: May 27, 2026
Source repository:
- Permalink: fraiseql/fraiseql-seed@5177a58569e2f25311f41eecc50070f4e58c9e03
- Branch / Tag: refs/heads/main
- Owner: https://github.com/fraiseql
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: deploy.yml@5177a58569e2f25311f41eecc50070f4e58c9e03
- Trigger Event: push

File details

Details for the file fraiseql_data-0.1.4-py3-none-any.whl.

File metadata

Download URL: fraiseql_data-0.1.4-py3-none-any.whl
Upload date: May 27, 2026
Size: 78.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for fraiseql_data-0.1.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`15e08a08ddfa259585c193a039641dc65f2796d412bfd677d0c5041851d2ccec`
MD5	`f1f97550c2667ed2c8f8ddfd08fe0ec7`
BLAKE2b-256	`f20b690a3e6aebbcbc7062d73a39d9a7c7167a55aefa378f08c9c7681ebf69d2`

Provenance

The following attestation bundles were made for fraiseql_data-0.1.4-py3-none-any.whl:

Publisher: deploy.yml on fraiseql/fraiseql-seed

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: fraiseql_data-0.1.4-py3-none-any.whl
- Subject digest: 15e08a08ddfa259585c193a039641dc65f2796d412bfd677d0c5041851d2ccec
- Sigstore transparency entry: 1642397630
- Sigstore integration time: May 27, 2026
Source repository:
- Permalink: fraiseql/fraiseql-seed@5177a58569e2f25311f41eecc50070f4e58c9e03
- Branch / Tag: refs/heads/main
- Owner: https://github.com/fraiseql
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: deploy.yml@5177a58569e2f25311f41eecc50070f4e58c9e03
- Trigger Event: push
