Synthetic data generation for test fixtures and demos: a single brain primitive made portable. Built on GIGI's DREAM primitive (https://davisgeometric.com).


gigi-dream

Synthetic data generation for test fixtures, dev environments, and privacy-aware demos. Statistically faithful records that aren't real records.

```python
from gigi_dream import dream

real_customers = [
    {"age": 30, "country": "US", "salary": 75000},
    {"age": 45, "country": "CA", "salary": 95000},
    {"age": 28, "country": "US", "salary": 68000},
    # ... 100 more ...
]

result = dream(real_customers, n_samples=1000, temperature=1.0, seed=42)
print(result.records[0])
# {'age': 32.7, 'country': 'US', 'salary': 73210.3}
```

```console
$ gigi-dream customers.csv -n 1000 -o test_customers.csv
  source:      customers.csv
  output:      test_customers.csv
  backend:     local
  temperature: 1.0
  n_samples:   1000
  columns:     5
```

What it's for

Anywhere you need data that looks like your real data but isn't your real data:

Test fixtures — populate test databases with records that exercise edge cases
Dev environments — stop hand-rolling fake data; learn it from prod
Staging — anonymized demos with statistically faithful behavior
ML augmentation — extra training records sampled from the empirical density
Privacy-conscious onboarding — let new hires explore data shape without seeing real PII

gigi-dream is intentionally narrow: per-column distribution sampling, nothing else. Other "DREAM" features (multivariate, correlated, anisotropic, fiber-bundle native) live in the GIGI engine — gigi-dream exposes one specific brain primitive as the smallest possible installable tool.

Install

```shell
pip install gigi-dream
```

Optional: install with GIGI backend (requires requests):

```shell
pip install "gigi-dream[gigi]"
```

Optional: install with Parquet support (requires pandas + pyarrow):

```shell
pip install "gigi-dream[parquet]"
```

Quick start

Library

```python
from gigi_dream import dream

# Learn the distribution from real data
real = [
    {"age": 30, "country": "US", "salary": 75000},
    {"age": 45, "country": "CA", "salary": 95000},
    {"age": 28, "country": "US", "salary": 68000},
    {"age": 51, "country": "UK", "salary": 110000},
    # ... more records ...
]

# Generate 1000 synthetic records at temperature 1.0 (faithful)
result = dream(real, n_samples=1000, temperature=1.0, seed=42)

# Inspect what was learned
for col in result.columns:
    if col.kind == "numeric":
        print(f"  {col.name}: numeric  mean={col.mean:.1f} sigma={col.sigma:.1f}")
    else:
        print(f"  {col.name}: categorical {len(col.values)} values")

# Use the synthetic records anywhere you'd use real ones
for r in result.records[:5]:
    print(r)
```
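Numeric columns are sampled from a Gaussian, so values come back as floats (like the `32.7` age in the first example) and, especially at higher temperatures, can land outside the real range, e.g. a negative age. If a fixture needs whole numbers or bounded values, a small post-processing pass can clean the records. The `postprocess` helper here is a hypothetical sketch, not part of gigi-dream; it assumes synthetic records are plain dicts, as in the examples above.

```python
def postprocess(records, bounds, integer_fields=()):
    """Clamp numeric fields to (low, high) and round selected fields to int.

    Hypothetical helper, not part of gigi-dream.
    bounds: mapping of field name -> (low, high); unlisted fields are untouched.
    integer_fields: names of fields rounded to the nearest int after clamping.
    """
    cleaned = []
    for rec in records:
        out = dict(rec)  # copy so the caller's records are not mutated
        for name, (low, high) in bounds.items():
            if name in out:
                out[name] = min(max(out[name], low), high)
        for name in integer_fields:
            if name in out:
                out[name] = int(round(out[name]))
        cleaned.append(out)
    return cleaned


# Example: keep synthetic ages within the observed range and make them whole numbers
real = [{"age": 30}, {"age": 45}, {"age": 28}, {"age": 51}]
synthetic = [{"age": 32.7, "country": "US"}, {"age": -4.1, "country": "CA"}]
ages = [r["age"] for r in real]
fixed = postprocess(synthetic, bounds={"age": (min(ages), max(ages))}, integer_fields=("age",))
print(fixed)  # [{'age': 33, 'country': 'US'}, {'age': 28, 'country': 'CA'}]
```

Clamping to the observed range trades a little fidelity in the tails for values that pass schema checks; whether that trade is acceptable depends on what the fixture is testing.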

CLI

```shell
# Generate 1000 synthetic CSV records
gigi-dream customers.csv -n 1000 -o test_customers.csv

# Higher temperature = wider spread, more novel records
gigi-dream customers.csv -n 1000 -T 3.0 -o exotic_customers.csv

# Output to stdout for piping into other tools
gigi-dream customers.csv -n 100 | head

# Output JSON instead of CSV
gigi-dream customers.csv -n 100 --format json -o synth.json

# Reproducible: the same seed gives the same output
gigi-dream customers.csv -n 100 --seed 42 -o snapshot.csv

# Just inspect the column distributions, don't sample
gigi-dream customers.csv --inspect
```

Supported input formats: .csv, .json, .jsonl / .ndjson, and .parquet (with the [parquet] extra). The same formats are supported for output.
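The CLI reads these formats itself. When calling `dream()` from Python you pass a list of dicts, so a loader is handy. The sketch below uses only the standard library; `load_records` is a hypothetical helper, not gigi-dream's own reader, and it assumes a `.json` file holds a top-level array of objects. CSV cells arrive as strings, so it converts numeric-looking cells to `int`/`float`; otherwise a column like `age` would presumably be treated as a string (categorical) column.

```python
import csv
import json
from pathlib import Path


def _coerce(value: str):
    """Turn numeric-looking CSV strings into int/float; leave everything else as str."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def load_records(path):
    """Hypothetical helper: read .csv, .json, or .jsonl/.ndjson into a list of dicts."""
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix == ".csv":
        with p.open(newline="", encoding="utf-8") as f:
            return [{k: _coerce(v) for k, v in row.items()} for row in csv.DictReader(f)]
    if suffix == ".json":
        # Assumes a top-level JSON array of objects
        return json.loads(p.read_text(encoding="utf-8"))
    if suffix in (".jsonl", ".ndjson"):
        with p.open(encoding="utf-8") as f:
            # One JSON object per non-blank line
            return [json.loads(line) for line in f if line.strip()]
    raise ValueError(f"unsupported format: {suffix}")
```

Parquet is left out here because it needs the optional pandas/pyarrow dependencies.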

Tuning

| Parameter | Default | Effect |
|---|---|---|
| `--num` / `-n` | 100 | Number of synthetic records |
| `--temperature` / `-T` | 1.0 | 1.0 = faithful; > 1.0 = wider; < 1.0 = tighter |
| `--seed` | none | Random seed for reproducible output |

Temperature notes:

T = 1.0: synthetic numeric columns roughly match the real variance and range
T = 2.0–4.0: DREAM mode; ~1.4–2× wider spread; "novel-but-plausible" records
T = 0.3–0.7: tight samples near the mode; useful for "typical case" demos
T = 0: every numeric value equals its column mean (degenerate)
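The spread figures above follow from the numeric sampling rule μ + √T × σ × N(0,1) described under How it works: the standard deviation scales by √T and the variance by T. A quick check of the multipliers for the temperatures listed (this covers numeric columns only; how temperature affects categorical columns is not specified here):

```python
import math


def spread_multiplier(temperature: float) -> float:
    """Standard-deviation multiplier implied by mu + sqrt(T) * sigma * N(0, 1).

    math.sqrt raises ValueError for negative temperatures.
    """
    return math.sqrt(temperature)


for t in (0.0, 0.3, 0.7, 1.0, 2.0, 4.0):
    print(f"T = {t:<3}  sigma x {spread_multiplier(t):.2f}  variance x {t:g}")
```

T = 2.0 gives about 1.41× the real σ and T = 4.0 exactly 2×, while the 0.3–0.7 band narrows the spread to roughly 0.55–0.84 of the real σ.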

How it works (v0)

gigi-dream fits an independent per-column model to your input:

Numeric columns → diagonal Gaussian with Welford-streamed mean and variance. Sample: μ + √T × σ × N(0,1).
Categorical / string / boolean columns → empirical frequency distribution. Sample: weighted choice from observed values.
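A minimal sketch of that per-column model, written with the standard library (`random`, `math`) rather than NumPy so it runs anywhere. It illustrates the technique described above (Welford's online mean/variance plus √T-scaled Gaussian sampling, and weighted choice for everything else); it is not gigi-dream's LocalBackend. Using the population variance (divide by n), taking column names from the first record, and leaving categorical sampling unaffected by temperature are all assumptions.

```python
import math
import random
from collections import Counter


class Welford:
    """Streaming mean and variance via Welford's online algorithm."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def sigma(self) -> float:
        # Population standard deviation (divide by n); an assumption in this sketch
        return math.sqrt(self.m2 / self.n) if self.n else 0.0


def is_numeric(v) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def fit(records):
    """Fit an independent model per column: Gaussian for numeric, frequencies otherwise."""
    model = {}
    for name in records[0]:  # assumption: the first record lists every column
        values = [r[name] for r in records if r.get(name) is not None]
        if not values:
            continue  # nothing to learn from an all-missing column
        if all(is_numeric(v) for v in values):
            w = Welford()
            for v in values:
                w.update(float(v))
            model[name] = ("numeric", w.mean, w.sigma)
        else:
            model[name] = ("categorical", Counter(values))
    return model


def sample(model, n_samples, temperature=1.0, seed=None):
    """Draw records column by column; columns never influence each other."""
    rng = random.Random(seed)
    scale = math.sqrt(temperature)
    records = []
    for _ in range(n_samples):
        rec = {}
        for name, params in model.items():
            if params[0] == "numeric":
                _, mu, sigma = params
                # mu + sqrt(T) * sigma * N(0, 1)
                rec[name] = mu + scale * sigma * rng.gauss(0.0, 1.0)
            else:
                counts = params[1]
                # Weighted choice from observed values (temperature not applied: assumption)
                rec[name] = rng.choices(list(counts), weights=list(counts.values()))[0]
        records.append(rec)
    return records


real = [
    {"age": 30, "country": "US", "salary": 75000},
    {"age": 45, "country": "CA", "salary": 95000},
    {"age": 28, "country": "US", "salary": 68000},
    {"age": 51, "country": "UK", "salary": 110000},
]
model = fit(real)
for rec in sample(model, n_samples=3, temperature=1.0, seed=42):
    print(rec)
```

Welford's update keeps the fit single-pass and numerically stable, which is what makes a streamed fit over a large file practical.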

Each column is sampled independently. Correlations between columns are NOT preserved in v0. If your data has strong inter-column structure (e.g., income correlates with age), use GigiBackend instead — GIGI's /brain/dream endpoint uses the engine's full Kähler-aware fit including the L13.3 diagonal-Gaussian variant of the brain primitives.
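To see what "not preserved" means in practice, the sketch below builds a hypothetical dataset in which salary rises with age, fits each column independently (mean and standard deviation only, mirroring the numeric model described above), and compares Pearson correlations. The data, the `pearson` function, and the `independent_gaussian` helper are illustrative, not gigi-dream output.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def independent_gaussian(values, n, rng):
    """Fit one column on its own and resample it, ignoring every other column."""
    mu, sigma = statistics.fmean(values), statistics.pstdev(values)
    return [rng.gauss(mu, sigma) for _ in range(n)]


rng = random.Random(0)
# Hypothetical "real" data: salary rises with age, plus noise
ages = [rng.uniform(22, 65) for _ in range(5000)]
salaries = [30000 + 1500 * a + rng.gauss(0, 5000) for a in ages]

synth_ages = independent_gaussian(ages, 5000, rng)
synth_salaries = independent_gaussian(salaries, 5000, rng)

real_r = pearson(ages, salaries)
synth_r = pearson(synth_ages, synth_salaries)
print(f"real r = {real_r:.2f}, synthetic r = {synth_r:.2f}")
```

The real correlation comes out well above 0.9, while the synthetic one is close to zero: each column's marginal distribution is kept, but the age/salary relationship is gone.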

Two backends

LocalBackend (default) — pure-numpy, no infrastructure required. Use this 99% of the time.

```python
from gigi_dream import LocalBackend, dream
result = dream(real_records, backend=LocalBackend())
```

GigiBackend — calls a running GIGI instance's /brain/dream endpoint. Higher-fidelity sampling for anisotropic, correlated, or multivariate data. Useful when your data is already in a GIGI bundle.

```python
from gigi_dream import GigiBackend, dream

backend = GigiBackend(
    url="http://localhost:3142",
    api_key="dev-local",
    bundle="customers",
    fields=["age", "salary"],
)
result = dream(n_samples=1000, backend=backend)
```

What gigi-dream isn't

Not a differential-privacy tool. It provides statistical faithfulness, not formal DP guarantees. If you need ε-differential privacy, use a DP-specific library (e.g., diffprivlib, tumult-analytics).
Not a relational data generator. Single tables only; no FK constraints, no schema relationships. (DHOOM supports nested bundles natively, so a future version could.)
Not a model-based synthesizer. No GANs, no diffusion. The "model" is the per-column Welford fit. That's intentional — small, fast, transparent.
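One practical consequence of the per-column fit: categorical and string columns are sampled from observed values, so a column of unique identifiers (emails, names, account numbers) would be re-emitted verbatim. A hypothetical pre-flight check, not part of gigi-dream, that flags non-numeric columns whose values are mostly unique before you generate data from them:

```python
def verbatim_risk(records, threshold=0.9):
    """Return names of non-numeric columns whose unique-value ratio >= threshold.

    Hypothetical helper. Empirical-frequency sampling re-emits observed values,
    so near-unique text columns would copy real values into the synthetic data.
    """
    flagged = []
    for name in records[0]:  # assumes the first record lists every column
        values = [r.get(name) for r in records]
        if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
            continue  # numeric columns are redrawn from a Gaussian, not copied
        if len(set(values)) / len(values) >= threshold:
            flagged.append(name)
    return flagged


rows = [
    {"email": "a@example.com", "country": "US", "age": 30},
    {"email": "b@example.com", "country": "US", "age": 45},
    {"email": "c@example.com", "country": "CA", "age": 28},
    {"email": "d@example.com", "country": "US", "age": 51},
]
print(verbatim_risk(rows))  # ['email']
```

Dropping flagged columns before calling `dream()` avoids copying them, but it is still a heuristic, not a formal privacy guarantee.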

License

MIT. Free for any use, commercial or otherwise. See LICENSE.

Related

GIGI — the fiber-bundle database engine; gigi-dream's GigiBackend calls it. DREAM is one of twelve brain primitives.
EpisodeKit — change-point detection using GIGI's EPISODIC primitive. Sibling project.
gigi-mind — VS Code extension exposing all twelve brain primitives. Sibling project.

Status

v0.1.0: stable for the documented surface (CSV/JSON/JSONL + LocalBackend + CLI + GigiBackend skeleton). The API may change during 0.x and is expected to stabilize at 1.0.

Release history

This version

0.1.0

May 26, 2026

Download files


Source Distribution

gigi_dream-0.1.0.tar.gz (33.3 kB)

Uploaded May 26, 2026 Source

Built Distribution


gigi_dream-0.1.0-py3-none-any.whl (16.8 kB)

Uploaded May 26, 2026 Python 3

File details

Details for the file gigi_dream-0.1.0.tar.gz.

File metadata

Download URL: gigi_dream-0.1.0.tar.gz
Upload date: May 26, 2026
Size: 33.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for gigi_dream-0.1.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | `a8ac4ea9f10106006810bf2e12a20646ca269b33902ec46385c24b174bfc6d94` |
| MD5 | `f2d6adba8bc31b2bf2a2c19de597ad1c` |
| BLAKE2b-256 | `0604038cd2fcb569477cfd26b4cbee31bbc44e8407b26d908441a30a333cedae` |


File details

Details for the file gigi_dream-0.1.0-py3-none-any.whl.

File metadata

Download URL: gigi_dream-0.1.0-py3-none-any.whl
Upload date: May 26, 2026
Size: 16.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for gigi_dream-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | `75220fe38ffc6d85b124dd2380ce21173b419a86650279d5da717319c6b70417` |
| MD5 | `c058d9618964d770e9fe52235cf986c3` |
| BLAKE2b-256 | `a9661384ac57632db3f688eb83b3fceb4b901055c8d8a7d744f9e2c3850f7fc0` |

