
AI-Powered Synthetic Data Engine - Generate realistic multi-table datasets from natural language

Project description

🧠 Misata

The Intelligent Synthetic Data Engine


Stop writing fake data scripts.
Generate production-grade datasets from natural language.



🚀 Why Misata?

Misata isn't just a random data generator. It models your schema's relationships, business rules, and constraints, so generated values stay consistent across columns and tables. Whether you need 50 rows for unit tests or 10 million rows for load testing, Misata produces statistically realistic data that looks and behaves like production data.

Feature                    Faker   SDV   Misata
Natural Language Input     ❌      ❌    ✅
Auto Schema Generation     ❌      ❌    ✅
Relational Integrity       ❌      ✅    ✅
Business Constraints       ❌      ❌    ✅
No Training Data Needed    ✅      ❌    ✅
Streaming (10M+ rows)      ❌      ❌    ✅

⚡ Quick Start

1. Install

pip install misata

2. Generate

Describe what you need in plain English. Misata handles the rest.

# Basic generation (Rule-based, instant)
misata generate --story "A SaaS platform with 50K users, monthly subscriptions, and a 20% churn rate in Q3"

# Intelligent generation (LLM-powered)
export GROQ_API_KEY=gsk_...
misata generate --story "E-commerce store with seasonal trends and customer segments" --use-llm

3. Result

Misata creates a relational schema, generates the data, and saves it to ./generated_data.

📋 Schema: SaaS_Platform
   Tables: 4 (users, subscriptions, payments, events)
   Relationships: 3
   Events: 1 (Churn Spike Q3)

🚀 Performance: 385,000 rows/second
💾 Data saved to: ./generated_data
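The `Events: 1 (Churn Spike Q3)` line shows that the story's "20% churn rate in Q3" became a scheduled event. How Misata applies events internally isn't documented here, so the following is only a hypothetical sketch of the idea: pick a fixed fraction of users and give each a churn date inside Q3. The function name `apply_churn_event`, the year 2024, and the uniform spread of dates are all assumptions.

```python
import random
from datetime import date, timedelta


def apply_churn_event(user_ids, rate, start, end, seed=0):
    """Hypothetical sketch: mark round(rate * N) users as churned and give
    each a churn date drawn uniformly between start and end (inclusive)."""
    rng = random.Random(seed)
    n_churned = round(rate * len(user_ids))
    churned = rng.sample(user_ids, n_churned)  # sample without replacement
    span_days = (end - start).days
    return {uid: start + timedelta(days=rng.randint(0, span_days)) for uid in churned}


users = list(range(1, 1001))
# Assumed window: Q3 2024 (July 1 to September 30)
churn = apply_churn_event(users, rate=0.20, start=date(2024, 7, 1), end=date(2024, 9, 30))
print(len(churn))  # 200
```

Fixing the seed keeps the churned cohort reproducible across test runs.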

🔥 New in v0.5.2: The Realism Engine

Columns are no longer generated in isolation: each value is derived from the columns it depends on, so the data is mathematically consistent rather than randomly independent.

What makes this different from Faker?

                 Faker/Random              Misata v0.5.2
────────────────────────────────────────────────────────────────────────
order.total      $847.23 (random)          $847.23 = $798.50 + $29.99 + $18.74
product.cost     $96.00 (> price!)         $41.20 (43% of price $95.81)
line_total       $3,291.00 (random)        $3,291.00 = 5 × $662.00 − $19.00
user.email       luke.ri@wanadoo.co.uk     emma.chen@gmail.com (from name)
rating           137 (out of range)        4 ★ (J-curve weighted)
categories       "Hypothyroidism"          "Electronics"
delivered_at     2021-01-03 (before order) 2024-03-15 (7 days after order)
────────────────────────────────────────────────────────────────────────
Row counts       100 in every table        15 categories, 500 order_items
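The right-hand column follows from deriving dependent columns from their inputs instead of sampling each one independently. The sketch below illustrates that idea in plain Python; it is not Misata's code. `line_total` is computed from quantity, unit price and discount, `total` from subtotal, shipping and tax, `cost` as a fraction of price, `delivered_at` as an offset after `ordered_at`, and the rating is drawn from a J-shaped distribution. Every field name, range, tax rate and weight here is an assumption chosen for illustration.

```python
import random
from datetime import datetime, timedelta

# Assumed J-curve: mostly 5s and 4s, a smaller bump at 1, few 2s and 3s.
RATING_WEIGHTS = {1: 0.10, 2: 0.05, 3: 0.10, 4: 0.25, 5: 0.50}


def make_order(rng: random.Random) -> dict:
    items = []
    for _ in range(rng.randint(1, 4)):
        price = round(rng.uniform(5, 700), 2)
        qty = rng.randint(1, 5)
        discount = round(rng.uniform(0, 0.1) * price * qty, 2)
        items.append({
            "unit_price": price,
            # cost is derived from price (30-70%), so it can never exceed it
            "cost": round(price * rng.uniform(0.3, 0.7), 2),
            "quantity": qty,
            "discount": discount,
            # line_total = quantity × unit_price − discount
            "line_total": round(qty * price - discount, 2),
        })
    subtotal = round(sum(i["line_total"] for i in items), 2)
    shipping = round(rng.uniform(0, 30), 2)
    tax = round(subtotal * 0.08, 2)  # assumed flat 8% tax
    ordered_at = datetime(2024, 1, 1) + timedelta(minutes=rng.randint(0, 525_600))
    return {
        "items": items,
        "subtotal": subtotal,
        "shipping": shipping,
        "tax": tax,
        # order total = subtotal + shipping + tax
        "total": round(subtotal + shipping + tax, 2),
        "ordered_at": ordered_at,
        # delivery always lands 2 to 10 days after the order
        "delivered_at": ordered_at + timedelta(days=rng.randint(2, 10)),
        "rating": rng.choices(list(RATING_WEIGHTS), weights=list(RATING_WEIGHTS.values()))[0],
    }


order = make_order(random.Random(42))
```

Because each dependent value is computed rather than sampled, invariants like "cost < price" and "delivered after ordered" hold for every row by construction.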

Smart Row Proportions

Misata analyzes your foreign-key (FK) graph and sizes each table according to its role:

misata generate --db-url sqlite:///shop.db --smart --rows 100

# categories:    15   (reference: fewer rows, no duplicates)
# users:        100   (entities: your base count)
# products:     250   (entities with variety)
# orders:       250   (transactions: more than users)
# order_items:  500   (line items: most rows)
# reviews:      150   (activity: subset of orders)
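The heuristic behind `--smart` isn't documented here, so this is a hypothetical sketch of the general approach: walk the FK graph in dependency order (parents before children, which is also the order rows must be inserted in) and scale each table from the base `--rows` value by a role multiplier. The schema of `shop.db`, the role labels and the multipliers are assumptions reverse-engineered from the example counts above.

```python
from graphlib import TopologicalSorter

# Assumed schema: table -> set of tables it references via foreign keys.
FK_GRAPH = {
    "categories": set(),
    "users": set(),
    "products": {"categories"},
    "orders": {"users"},
    "order_items": {"orders", "products"},
    "reviews": {"users", "products", "orders"},
}

# Assumed role per table, plus a multiplier relative to the base row count.
ROLES = {
    "categories": "reference",
    "users": "entity",
    "products": "catalog",
    "orders": "transaction",
    "order_items": "line_item",
    "reviews": "activity",
}
MULTIPLIERS = {"entity": 1.0, "catalog": 2.5, "transaction": 2.5, "line_item": 5.0, "activity": 1.5}
REFERENCE_ROWS = 15  # lookup tables stay small regardless of --rows


def plan_rows(base_rows: int) -> dict[str, int]:
    plan = {}
    # static_order() yields each table only after all the tables it references
    for table in TopologicalSorter(FK_GRAPH).static_order():
        role = ROLES[table]
        if role == "reference":
            plan[table] = REFERENCE_ROWS
        else:
            plan[table] = round(base_rows * MULTIPLIERS[role])
    return plan


print(plan_rows(100))
```

Keeping the dict in topological order means a seeder can iterate over it directly and never insert a child row before its parent exists.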

Seed Any Existing Database

# PostgreSQL, MySQL, SQLite: just point and seed
misata generate \
  --db-url postgresql://user:pass@localhost:5432/mydb \
  --smart --rows 10000 --db-truncate
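Seeding an existing database starts with discovering its foreign keys. As a hypothetical illustration (not Misata's code), SQLite exposes them through `PRAGMA foreign_key_list`, and the sketch below builds a table-to-parents map from a live connection. PostgreSQL and MySQL need their own catalog queries (or SQLAlchemy's inspector), which this sketch does not cover; the two-table schema is made up for the example.

```python
import sqlite3


def foreign_key_graph(conn: sqlite3.Connection) -> dict[str, set[str]]:
    """Map each user table to the set of tables it references."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    graph = {}
    for table in tables:
        # Each row is (id, seq, table, from, to, on_update, on_delete, match);
        # index 2 holds the referenced (parent) table.
        rows = conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall()
        graph[table] = {row[2] for row in rows}
    return graph


conn = sqlite3.connect(":memory:")
# Made-up example schema: orders.user_id references users.id
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(id),
        total REAL
    );
""")
print(foreign_key_graph(conn))  # {'orders': {'users'}, 'users': set()}
```

The resulting map has the same shape a topological sort expects, so it can drive both truncation (children first) and insertion (parents first).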

💻 Python API

Seamlessly integrate Misata into your test suites and CI/CD pipelines.

Standard Generation

from misata import DataSimulator
from misata.llm_parser import LLMSchemaGenerator

# 1. Design schema with AI
llm = LLMSchemaGenerator(provider="groq")
config = llm.generate_from_story(
    "Healthcare app with patients, doctors, and appointments"
)

# 2. Generate data
simulator = DataSimulator(config)
for table_name, df in simulator.generate_all():
    print(f"Generated {len(df)} rows for {table_name}")
    df.to_csv(f"{table_name}.csv", index=False)
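When generated tables feed a test suite, it's cheap to assert that relationships actually hold. The helper below is a hypothetical, library-agnostic check (not part of Misata's API) that every foreign-key value in a child table exists in the parent's key column. It works on lists of row dicts, so you could pass it `df.to_dict("records")`; if you read the CSV files back instead, make sure key types match (the `csv` module yields strings). The table and column names are made up.

```python
def orphaned_keys(child_rows, fk_column, parent_rows, pk_column="id"):
    """Return the sorted FK values in child_rows with no matching parent key.
    None is treated as a nullable foreign key and ignored."""
    parent_keys = {row[pk_column] for row in parent_rows}
    child_keys = {row[fk_column] for row in child_rows if row[fk_column] is not None}
    return sorted(child_keys - parent_keys)


# Made-up example rows
patients = [{"id": 1}, {"id": 2}]
appointments = [
    {"id": 10, "patient_id": 1},
    {"id": 11, "patient_id": 2},
    {"id": 12, "patient_id": 3},  # dangling reference
]
print(orphaned_keys(appointments, "patient_id", patients))  # [3]
```

In a test you would assert the result is empty, e.g. `assert not orphaned_keys(appointments, "patient_id", patients)`.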

SQLAlchemy Seeding (Powerful!)

Directly seed your SQLAlchemy models without writing factories.

from misata import seed_from_sqlalchemy_models
from myapp.models import Base, engine

# Automatically analyzes your models and foreign keys
report = seed_from_sqlalchemy_models(
    engine, 
    Base, 
    default_rows=10_000, 
    create=True, 
    smart_mode=True  # Infers realistic values from column names
)

print(f"Seeded {report.total_rows} rows in {report.duration_seconds}s")
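`smart_mode=True` is described as inferring realistic values from column names. How Misata does this isn't documented here; as a rough, hypothetical sketch of the idea, a generator can match column-name patterns to value factories and fall back to a generic value when nothing matches. The name lists, regex patterns, ranges and the `infer_generator` helper are all assumptions.

```python
import random
import re
from datetime import datetime, timedelta

# Assumed sample vocabularies for illustration only
FIRST = ["emma", "liam", "ava", "noah"]
LAST = ["chen", "garcia", "smith", "patel"]
DOMAINS = ["gmail.com", "outlook.com", "example.org"]

# Ordered (pattern, factory) pairs: the first regex that matches wins.
RULES = [
    (r"email", lambda rng: f"{rng.choice(FIRST)}.{rng.choice(LAST)}@{rng.choice(DOMAINS)}"),
    (r"(price|amount|total)", lambda rng: round(rng.uniform(1, 500), 2)),
    (r"(quantity|qty)", lambda rng: rng.randint(1, 10)),
    (r"(_at|_date)$", lambda rng: datetime(2024, 1, 1) + timedelta(days=rng.randint(0, 365))),
    (r"^(is_|has_)", lambda rng: rng.random() < 0.5),
]


def infer_generator(column_name):
    """Return a factory rng -> value chosen from the column name (hypothetical)."""
    for pattern, factory in RULES:
        if re.search(pattern, column_name.lower()):
            return factory
    # Fallback: an opaque token for columns we can't interpret
    return lambda rng: f"value_{rng.randint(0, 9999)}"


rng = random.Random(7)
columns = ["email", "unit_price", "created_at", "is_active", "notes"]
row = {col: infer_generator(col)(rng) for col in columns}
print(row)
```

Rule order matters: more specific patterns should come first so a column like `total_price` is not captured by a looser rule.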

🎯 Business Constraints

Define complex rules that simple random generators can't handle.

from misata import Constraint, Table

timesheets = Table(
    name="timesheets",
    row_count=10000,
    constraints=[
        Constraint(
            name="max_daily_hours",
            type="sum_limit",
            group_by=["employee_id", "date"],
            column="hours",
            value=8.0,
            action="redistribute"  # Automatically fixes violations
        )
    ]
)
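The `sum_limit` constraint above caps the total of `hours` per `(employee_id, date)` group at 8.0, and `action="redistribute"` repairs violations. Misata's exact repair strategy isn't specified here. One plausible reading, sketched below as an assumption, is to scale every row in an over-limit group down proportionally so the group sums to the limit while each row keeps its relative share. The function name and the row-dict format are hypothetical.

```python
from collections import defaultdict


def redistribute_sum_limit(rows, group_by, column, limit):
    """Hypothetical repair: rescale each over-limit group so its total equals
    `limit`. Groups at or under the limit are left unchanged. Mutates rows."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[key] for key in group_by)].append(row)
    for members in groups.values():
        total = sum(r[column] for r in members)
        if total > limit:
            for r in members:
                # keep each row's share of the group, rescaled to the cap
                r[column] = r[column] * limit / total
    return rows


timesheets = [
    {"employee_id": 1, "date": "2024-03-04", "hours": 6.0},
    {"employee_id": 1, "date": "2024-03-04", "hours": 4.0},  # group total 10.0 > 8.0
    {"employee_id": 2, "date": "2024-03-04", "hours": 7.5},  # already within limit
]
redistribute_sum_limit(timesheets, ["employee_id", "date"], "hours", 8.0)
print([r["hours"] for r in timesheets])  # [4.8, 3.2, 7.5]
```

Proportional scaling preserves the shape of each employee's day; an alternative design would clip the last entries instead, which keeps early values exact but distorts the distribution.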

🔌 Providers

Misata supports multiple LLM providers for schema generation.

Provider   Env Var          Tier   Best For
Groq       GROQ_API_KEY     Free   Speed (Recommended)
OpenAI     OPENAI_API_KEY   Paid   Quality
Ollama     None             Free   Privacy (Local)
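The table implies a simple configuration rule: hosted providers need their API key in the environment, while Ollama runs locally without one. The helper below is a hypothetical sketch of that selection logic; Misata's own defaulting behaviour isn't documented here. The `pick_provider` name, the lowercase `"openai"` and `"ollama"` identifiers, and the preference order (Groq first, mirroring the "Recommended" note) are assumptions.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Assumed preference order: (provider, required env var); None means no key needed.
PROVIDERS = [
    ("groq", "GROQ_API_KEY"),
    ("openai", "OPENAI_API_KEY"),
    ("ollama", None),
]


def pick_provider(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the first provider whose API key is set (non-empty), else local Ollama."""
    env = os.environ if env is None else env
    for name, var in PROVIDERS:
        if var is None or env.get(var):
            return name
    raise RuntimeError("no provider available")  # unreachable while ollama is listed


print(pick_provider({"OPENAI_API_KEY": "sk-..."}))  # openai
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the function easy to test.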

๐Ÿข Enterprise

Building a platform? Misata Studio is our commercial offering for teams.

  • ๐Ÿ–ฅ๏ธ Visual Schema Editor: Drag-and-drop schema design.
  • ๐Ÿ”’ Privacy Filters: PII scanning and masking.
  • ๐Ÿ“ฆ One-Click Deploy: Docker & Kubernetes ready.
  • ๐Ÿค Support: Dedicated support and custom integration.

Contact Sales for a demo.


Built with โค๏ธ by Muhammed Rasin



Download files


Source Distribution

misata-0.5.2.tar.gz (186.6 kB)

Uploaded Source

Built Distribution


misata-0.5.2-py3-none-any.whl (187.0 kB)

Uploaded Python 3

File details

Details for the file misata-0.5.2.tar.gz.

File metadata

  • Download URL: misata-0.5.2.tar.gz
  • Upload date:
  • Size: 186.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for misata-0.5.2.tar.gz
Algorithm Hash digest
SHA256 a1d31d2bd154f5d1b41a07f0cffe04a50555bae0ae91c3487aefef88a8b7a007
MD5 51b3263cda76aa52ca89789bccb6e1e5
BLAKE2b-256 9692e68c3de16d1f5102044df5fa36fe7ac3d342b8bdb7211641f09dbde8d503


File details

Details for the file misata-0.5.2-py3-none-any.whl.

File metadata

  • Download URL: misata-0.5.2-py3-none-any.whl
  • Upload date:
  • Size: 187.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for misata-0.5.2-py3-none-any.whl
Algorithm Hash digest
SHA256 2c3e1bfebb9dbebefc6f1aeb1171f91200bea5703b14ee5ac02b44e2e2eecc0d
MD5 39441474bb8003d30e05507afb32d093
BLAKE2b-256 119fd78e62b1b8c7743b2f813fcf6d3f1da64230a3cf4483db896658b33d5982

