Next-generation synthetic data generation with LLM-augmented pipelines, diffusion models, and evaluation-first design
SynthForge
Next-generation synthetic data generation with LLM-augmented pipelines.
SynthForge combines statistical generative models (Gaussian Copula, CTGAN, TVAE, Diffusion) with LLM-powered intelligence (schema enrichment, PII/MNPI detection, semantic validation) to produce high-fidelity synthetic tabular data from small production samples.
Quick Start
import pandas as pd
from synthforge import SynthForge
# Load a sample from production (e.g., 2500 rows from Redshift)
df = pd.read_csv("production_sample.csv")
# One-line generation
forge = SynthForge()
synthetic_df = forge.fit_generate(df, num_rows=100_000)
# With LLM enrichment (auto-detects PII, infers semantics)
forge = SynthForge(llm_provider="anthropic", llm_model="claude-sonnet-4-20250514")
forge.profile(df) # Schema enrichment + PII detection
forge.fit(df) # Train synthesizer
synthetic_df = forge.generate(100_000) # Bulk generate
report = forge.evaluate(df, synthetic_df) # Quality report
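The `generate(100_000)` call above is described as bulk generation. How SynthForge splits that work internally is not documented here. As a rough sketch of the general idea only, a requested row count can be divided into fixed-size batches as shown below. `plan_batches` and the batch size of 30,000 are hypothetical, chosen for illustration, and are not part of the SynthForge API.

```python
from typing import Iterator


def plan_batches(num_rows: int, batch_size: int) -> Iterator[int]:
    """Yield batch sizes that sum to num_rows, each at most batch_size.

    Hypothetical helper for illustration; not a SynthForge function.
    """
    if num_rows < 0 or batch_size <= 0:
        raise ValueError("num_rows must be >= 0 and batch_size must be > 0")
    remaining = num_rows
    while remaining > 0:
        size = min(batch_size, remaining)
        yield size
        remaining -= size


batches = list(plan_batches(100_000, 30_000))
print(batches)  # [30000, 30000, 30000, 10000]
```

Generating batch by batch keeps peak memory bounded by the batch size rather than by the total row count.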
Key Features
- Intelligent Schema Detection: LLM-powered column semantic inference beyond statistical type detection
- PII/MNPI Detection: Presidio + LLM augmentation for catching non-obvious sensitive data
- Multiple Synthesizers: Gaussian Copula (fast), CTGAN/TVAE (balanced), TabSyn (highest quality)
- Data-Type Strategies: Specialized pipelines for categorical, numerical, time-series, and mixed-type tables
- Evaluation-First: Built-in quality reports with statistical fidelity, ML utility, and privacy metrics
- Configurable Scale: From 1K to 10M+ rows with batch generation and optional GPU acceleration
- LLM-Agnostic: Works with Claude, OpenAI, Ollama, vLLM, or any LiteLLM-supported provider
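To make the "Gaussian Copula (fast)" option in the list above more concrete, here is a minimal two-column sketch of the general technique using only the standard library. It is an illustration under simplifying assumptions (two numeric columns, nearest-rank quantiles, ties broken arbitrarily), not SynthForge's implementation. All function names and the toy `ages`/`incomes` data are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def normal_scores(values):
    """Map values to standard-normal scores via their empirical ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order):
        # (rank + 0.5) / n keeps the probability strictly inside (0, 1).
        scores[i] = STD_NORMAL.inv_cdf((rank + 0.5) / n)
    return scores


def empirical_quantile(sorted_values, u):
    """Return the observed value at probability u (nearest-rank lookup)."""
    idx = min(int(u * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[idx]


def sample_gaussian_copula(x, y, num_rows, seed=0):
    """Sample (x, y) pairs that reuse each marginal and the rank dependence."""
    rng = random.Random(seed)
    rho = pearson(normal_scores(x), normal_scores(y))
    # Guard against rho drifting past +/-1 through floating-point error.
    noise_scale = math.sqrt(max(0.0, 1.0 - rho * rho))
    xs, ys = sorted(x), sorted(y)
    rows = []
    for _ in range(num_rows):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + noise_scale * rng.gauss(0.0, 1.0)
        rows.append((empirical_quantile(xs, STD_NORMAL.cdf(z1)),
                     empirical_quantile(ys, STD_NORMAL.cdf(z2))))
    return rows


# Toy "production sample": income rises with age, plus noise.
data_rng = random.Random(42)
ages = [data_rng.uniform(20, 65) for _ in range(500)]
incomes = [1_000 * a + data_rng.gauss(0, 5_000) for a in ages]

synthetic = sample_gaussian_copula(ages, incomes, num_rows=2_000)
```

Because this sketch maps back through empirical quantiles, every synthetic value is one that occurred in the sample. A production synthesizer would typically fit smooth marginal distributions instead, which matters for both realism and privacy.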
Installation
pip install synthforge # Core (Gaussian Copula only)
pip install "synthforge[gan]" # + CTGAN/TVAE
pip install "synthforge[llm]" # + LLM enrichment
pip install "synthforge[evaluation]" # + quality reports
pip install "synthforge[all]" # Everything
Architecture
Production Sample (DataFrame/CSV)
              │
              ▼
┌─────────────────────────────────┐
│  1. PROFILE (LLM-augmented)     │
│    • Auto-detect metadata       │
│    • Semantic column inference  │
│    • PII / MNPI detection       │
│    • Business rule extraction   │
│    • Synthesizer recommendation │
└─────────────┬───────────────────┘
              ▼
┌─────────────────────────────────┐
│  2. FIT (Statistical/Neural)    │
│    • Reversible data transforms │
│    • Constraint-aware training  │
│    • Auto-select or user-pick:  │
│      GaussianCopula / CTGAN /   │
│      TVAE / TabSyn / Diffusion  │
└─────────────┬───────────────────┘
              ▼
┌─────────────────────────────────┐
│  3. GENERATE (Batch)            │
│    • Configurable row count     │
│    • Batch chunking for scale   │
│    • Constraint enforcement     │
│    • PII replacement (Faker)    │
└─────────────┬───────────────────┘
              ▼
┌─────────────────────────────────┐
│  4. EVALUATE (5-layer pipeline) │
│    • Diagnostic checks          │
│    • Statistical fidelity       │
│    • ML utility (TSTR)          │
│    • Privacy (MIA, Anonymeter)  │
│    • LLM semantic validation    │
└─────────────────────────────────┘
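As an illustration of what a "statistical fidelity" check in the EVALUATE stage can look like, the sketch below computes the two-sample Kolmogorov-Smirnov statistic for one numeric column: the largest gap between the empirical CDFs of the real and synthetic values (0 means identical distributions, 1 means no overlap). Which metrics SynthForge actually reports is not stated here; `ks_statistic` is a hypothetical helper, not part of its API.

```python
from bisect import bisect_right


def ks_statistic(real, synthetic):
    """Two-sample KS statistic: max gap between the two empirical CDFs."""
    if not real or not synthetic:
        raise ValueError("both samples must be non-empty")
    real_sorted, synth_sorted = sorted(real), sorted(synthetic)
    max_gap = 0.0
    # Both CDFs are step functions, so the largest gap occurs at a data point.
    for v in set(real_sorted) | set(synth_sorted):
        cdf_real = bisect_right(real_sorted, v) / len(real_sorted)
        cdf_synth = bisect_right(synth_sorted, v) / len(synth_sorted)
        max_gap = max(max_gap, abs(cdf_real - cdf_synth))
    return max_gap


print(ks_statistic([1, 2, 3, 4], [1, 2, 3, 4]))  # 0.0
print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
print(ks_statistic([1, 2, 3], [4, 5, 6]))        # 1.0
```

A per-column fidelity score is often reported as `1 - ks_statistic(...)` so that higher is better.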
License
Proprietary. All rights reserved.
File details
Details for the file synthforge-0.1.0.tar.gz.
File metadata
- Download URL: synthforge-0.1.0.tar.gz
- Upload date:
- Size: 58.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d32bc7968155ca2b3ffcc354d42b602837b7b0af831916c08ba28c2008a8b83c |
| MD5 | 3eec58e09ede216acdc91b222c6f3e66 |
| BLAKE2b-256 | 1fbcf2276c4776400c308f2e8e4c51506fbcdd89bb8c4bcfea33051e58232a11 |
File details
Details for the file synthforge-0.1.0-py3-none-any.whl.
File metadata
- Download URL: synthforge-0.1.0-py3-none-any.whl
- Upload date:
- Size: 59.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 832cab7229bd20479b62f92fe1a4b1b428e694b5114b44c94dfae809bff6e41f |
| MD5 | a8e9f3ebe4117d657a230dfa4f75f168 |
| BLAKE2b-256 | f1819c459ce3e1bf72ca41a683204573c709e110d4cfcea02de0ae3b52c96f2f |