Next-generation synthetic data generation with LLM-augmented pipelines, diffusion models, and evaluation-first design

SynthForge

SynthForge combines statistical generative models (Gaussian Copula, CTGAN, TVAE, Diffusion) with LLM-powered intelligence (schema enrichment, PII/MNPI detection, semantic validation) to produce high-fidelity synthetic tabular data from small production samples.
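To make the statistical side concrete, here is a minimal sketch of the idea behind a Gaussian copula for two numeric columns, using only the Python standard library. It is not SynthForge's implementation: every function name below is hypothetical, and a real synthesizer would also handle many columns, categorical data, and missing values.

```python
import math
import random
from statistics import NormalDist, mean

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def normal_scores(values):
    """Replace each value by the standard-normal quantile of its rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order):
        # (rank + 0.5) / n stays strictly inside (0, 1), so inv_cdf is finite.
        scores[i] = STD_NORMAL.inv_cdf((rank + 0.5) / n)
    return scores


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def empirical_quantile(sorted_values, u):
    """Invert the empirical CDF: pick the observed value at probability u."""
    idx = min(int(u * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[idx]


def sample_gaussian_copula(xs, ys, num_rows, seed=0):
    """Hypothetical helper: fit a 2-column Gaussian copula and sample from it."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two columns of equal length with at least 2 rows")
    rng = random.Random(seed)
    # 1. Fit: estimate the correlation of the columns in normal-score space.
    rho = pearson(normal_scores(xs), normal_scores(ys))
    rho = max(-1.0, min(1.0, rho))  # guard against floating-point overshoot
    noise_scale = math.sqrt(1.0 - rho * rho)
    sorted_x, sorted_y = sorted(xs), sorted(ys)
    rows = []
    for _ in range(num_rows):
        # 2. Sample a correlated pair of standard normals.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + noise_scale * rng.gauss(0.0, 1.0)
        # 3. Map back to each column's own marginal distribution.
        x = empirical_quantile(sorted_x, STD_NORMAL.cdf(z1))
        y = empirical_quantile(sorted_y, STD_NORMAL.cdf(z2))
        rows.append((x, y))
    return rows
```

Because this sketch inverts the empirical CDF, it can only emit values that already occur in the sample. That keeps the code short, but it is a simplification and offers no privacy protection on its own.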

Quick Start

import pandas as pd
from synthforge import SynthForge

# Load a sample from production (e.g., 2500 rows from Redshift)
df = pd.read_csv("production_sample.csv")

# One-line generation
forge = SynthForge()
synthetic_df = forge.fit_generate(df, num_rows=100_000)

# With LLM enrichment (auto-detects PII, infers semantics)
forge = SynthForge(llm_provider="anthropic", llm_model="claude-sonnet-4-20250514")
forge.profile(df)                    # Schema enrichment + PII detection
forge.fit(df)                        # Train synthesizer
synthetic_df = forge.generate(100_000)  # Bulk generate
report = forge.evaluate(df, synthetic_df)  # Quality report

Key Features

  • Intelligent Schema Detection: LLM-powered column semantic inference beyond statistical type detection
  • PII/MNPI Detection: Presidio + LLM augmentation for catching non-obvious sensitive data
  • Multiple Synthesizers: Gaussian Copula (fast), CTGAN/TVAE (balanced), TabSyn (highest quality)
  • Data-Type Strategies: Specialized pipelines for categorical, numerical, time-series, and mixed-type tables
  • Evaluation-First: Built-in quality reports with statistical fidelity, ML utility, and privacy metrics
  • Configurable Scale: From 1K to 10M+ rows with batch generation and optional GPU acceleration
  • LLM-Agnostic: Works with Anthropic (Claude), OpenAI, Ollama, vLLM, or any LiteLLM-supported provider
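The batch generation mentioned under Configurable Scale can be pictured as splitting a large row count into fixed-size chunks so that only one chunk is in memory at a time. The following is a sketch of that pattern, not SynthForge's API: `batch_sizes`, `generate_in_batches`, and the `sample_fn` callback are hypothetical names introduced for illustration.

```python
from typing import Callable, Iterator, List


def batch_sizes(num_rows: int, batch_size: int) -> Iterator[int]:
    """Yield chunk sizes that add up to num_rows, each at most batch_size."""
    if num_rows < 0 or batch_size <= 0:
        raise ValueError("num_rows must be >= 0 and batch_size must be > 0")
    full_batches, remainder = divmod(num_rows, batch_size)
    for _ in range(full_batches):
        yield batch_size
    if remainder:
        yield remainder


def generate_in_batches(
    sample_fn: Callable[[int], List[dict]],
    num_rows: int,
    batch_size: int = 10_000,
) -> Iterator[List[dict]]:
    """Call sample_fn once per chunk and yield each chunk lazily.

    sample_fn stands in for whatever produces n synthetic rows; it is an
    assumption of this sketch, not a documented SynthForge hook.
    """
    for size in batch_sizes(num_rows, batch_size):
        yield sample_fn(size)


def fake_sampler(n: int) -> List[dict]:
    """Stand-in sampler that returns n placeholder rows."""
    return [{"id": i} for i in range(n)]


chunks = list(generate_in_batches(fake_sampler, num_rows=25_000, batch_size=10_000))
print([len(chunk) for chunk in chunks])  # [10000, 10000, 5000]
```

Yielding chunks lazily lets a caller write each one to disk or a warehouse before the next is produced, which is what keeps memory bounded at large row counts.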

Installation

pip install synthforge                  # Core (Gaussian Copula only)
pip install "synthforge[gan]"           # + CTGAN/TVAE
pip install "synthforge[llm]"           # + LLM enrichment
pip install "synthforge[evaluation]"    # + quality reports
pip install "synthforge[all]"           # Everything

Architecture

Production Sample (DataFrame/CSV)
        │
        ▼
┌─────────────────────────────────┐
│  1. PROFILE (LLM-augmented)     │
│  • Auto-detect metadata         │
│  • Semantic column inference    │
│  • PII / MNPI detection         │
│  • Business rule extraction     │
│  • Synthesizer recommendation   │
└─────────────┬───────────────────┘
              ▼
┌─────────────────────────────────┐
│  2. FIT (Statistical/Neural)    │
│  • Reversible data transforms   │
│  • Constraint-aware training    │
│  • Auto-select or user-pick:    │
│    GaussianCopula / CTGAN /     │
│    TVAE / TabSyn / Diffusion    │
└─────────────┬───────────────────┘
              ▼
┌─────────────────────────────────┐
│  3. GENERATE (Batch)            │
│  • Configurable row count       │
│  • Batch chunking for scale     │
│  • Constraint enforcement       │
│  • PII replacement (Faker)      │
└─────────────┬───────────────────┘
              ▼
┌─────────────────────────────────┐
│  4. EVALUATE (5-layer pipeline) │
│  • Diagnostic checks            │
│  • Statistical fidelity         │
│  • ML utility (TSTR)            │
│  • Privacy (MIA, Anonymeter)    │
│  • LLM semantic validation      │
└─────────────────────────────────┘
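As an illustration of what a statistical fidelity check in stage 4 might compute, the sketch below implements the two-sample Kolmogorov-Smirnov statistic for one numeric column. This README does not say which metrics SynthForge actually uses, so the choice of metric and the name `ks_statistic` are assumptions made for the example.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(real: Sequence[float], synthetic: Sequence[float]) -> float:
    """Largest gap between the empirical CDFs of two samples.

    0.0 means the empirical distributions coincide; 1.0 means the samples
    do not overlap at all.
    """
    if not real or not synthetic:
        raise ValueError("both samples must be non-empty")
    r, s = sorted(real), sorted(synthetic)
    gap = 0.0
    # Both empirical CDFs are step functions that only change at observed
    # values, so checking every observed value finds the maximum gap.
    for v in r + s:
        cdf_real = bisect_right(r, v) / len(r)
        cdf_synth = bisect_right(s, v) / len(s)
        gap = max(gap, abs(cdf_real - cdf_synth))
    return gap


print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
```

A per-column score like this says nothing about relationships between columns, which is one reason a fidelity layer would normally be combined with the other checks in the pipeline.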

License

Proprietary. All rights reserved.
