AI-Powered Synthetic Data Engine - Generate realistic multi-table datasets from natural language

🧠 Misata

The Intelligent Synthetic Data Engine

Stop writing fake data scripts.
Generate production-grade datasets from natural language.

🚀 Why Misata?

Misata isn't just a random data generator. It's an intelligent engine that understands your business logic, relationships, and constraints. Whether you need 50 rows for unit tests or 10 million rows for load testing, Misata delivers statistically realistic data that looks and behaves like the real thing.

Feature comparison across Faker, SDV, and Misata:

  • Natural Language Input
  • Auto Schema Generation
  • Relational Integrity
  • Business Constraints
  • No Training Data Needed
  • Streaming (10M+ rows)

⚡ Quick Start

1. Install

pip install misata

2. Generate

Describe what you need in plain English. Misata handles the rest.

# Basic generation (Rule-based, instant)
misata generate --story "A SaaS platform with 50K users, monthly subscriptions, and a 20% churn rate in Q3"

# Intelligent generation (LLM-powered)
export GROQ_API_KEY=gsk_...
misata generate --story "E-commerce store with seasonal trends and customer segments" --use-llm

3. Result

Misata creates a relational schema, generates the data, and saves it to ./generated_data.

📋 Schema: SaaS_Platform
   Tables: 4 (users, subscriptions, payments, events)
   Relationships: 3
   Events: 1 (Churn Spike Q3)

🚀 Performance: 385,000 rows/second
💾 Data saved to: ./generated_data
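
Once the files are on disk, it can be worth checking referential integrity yourself before wiring the data into tests. The sketch below is not part of Misata. It assumes the output directory contains one CSV file per table, and the file names (users.csv, subscriptions.csv) and column names (id, user_id) are hypothetical, so adjust them to the schema you actually get.

```python
import csv
from pathlib import Path


def read_column(csv_path, column):
    """Return every value of `column` from a CSV file as a list of strings."""
    with open(csv_path, newline="") as handle:
        return [row[column] for row in csv.DictReader(handle)]


def find_orphans(child_values, parent_values):
    """Return the distinct child foreign-key values with no matching parent key."""
    parent_keys = set(parent_values)
    return sorted({value for value in child_values if value not in parent_keys})


if __name__ == "__main__":
    out_dir = Path("./generated_data")
    # Hypothetical file and column names; they are assumptions, not Misata output.
    users = out_dir / "users.csv"
    subscriptions = out_dir / "subscriptions.csv"
    if users.exists() and subscriptions.exists():
        orphans = find_orphans(
            read_column(subscriptions, "user_id"),
            read_column(users, "id"),
        )
        print(f"Orphaned subscriptions.user_id values: {len(orphans)}")
```

An empty result means every child row points at an existing parent row, which is what "relational integrity" should guarantee.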

🔥 New in v0.5.0

🔄 Schema Introspection & Seeding

Already have a database? Misata can reverse-engineer your schema and seed it with realistic data.

# 1. Introspect your existing DB
misata schema --db-url postgresql://user:pass@localhost:5432/mydb --output schema.yaml

# 2. Seed it with 100K rows of realistic data
misata generate --config schema.yaml --db-url postgresql://... --db-truncate

📈 Reverse Engineering from Charts

Describe a chart, and Misata generates the underlying data to match it.

misata graph "Monthly revenue growing from $10k to $1M over 2 years, with a dip in August"
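
To make the idea concrete, here is a minimal sketch of one way such a series could be built by hand. It is not Misata's algorithm: it assumes geometric (compound) monthly growth, a series that starts in January, and a hypothetical dip_factor of 0.8 applied to each August.

```python
def monthly_revenue(start, end, months, dip_month_indexes, dip_factor=0.8):
    """Geometric growth from `start` to `end`, scaled down in the dip months."""
    if months < 2:
        raise ValueError("need at least two months")
    # Constant month-over-month growth rate that reaches `end` in the last month.
    growth = (end / start) ** (1 / (months - 1))
    series = []
    for i in range(months):
        value = start * growth ** i
        if i in dip_month_indexes:
            value *= dip_factor  # assumed size of the dip
        series.append(round(value, 2))
    return series


# 24 months starting in January, so August falls at indexes 7 and 19.
revenue = monthly_revenue(10_000, 1_000_000, 24, dip_month_indexes={7, 19})
```

The first and last points match the description, and each August value falls below the month before it.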

💻 Python API

Seamlessly integrate Misata into your test suites and CI/CD pipelines.

Standard Generation

from misata import DataSimulator
from misata.llm_parser import LLMSchemaGenerator

# 1. Design schema with AI
llm = LLMSchemaGenerator(provider="groq")
config = llm.generate_from_story(
    "Healthcare app with patients, doctors, and appointments"
)

# 2. Generate data
simulator = DataSimulator(config)
for table_name, df in simulator.generate_all():
    print(f"Generated {len(df)} rows for {table_name}")
    df.to_csv(f"{table_name}.csv", index=False)

SQLAlchemy Seeding (Powerful!)

Directly seed your SQLAlchemy models without writing factories.

from misata import seed_from_sqlalchemy_models
from myapp.models import Base, engine

# Automatically analyzes your models and foreign keys
report = seed_from_sqlalchemy_models(
    engine,
    Base,
    default_rows=10_000,
    create=True,
    smart_mode=True,  # Infers realistic values from column names
)

print(f"Seeded {report.total_rows} rows in {report.duration_seconds}s")

🎯 Business Constraints

Define complex rules that simple random generators can't handle.

from misata import Constraint, Table

timesheets = Table(
    name="timesheets",
    row_count=10000,
    constraints=[
        Constraint(
            name="max_daily_hours",
            type="sum_limit",
            group_by=["employee_id", "date"],
            column="hours",
            value=8.0,
            action="redistribute"  # Automatically fixes violations
        )
    ]
)
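
For intuition, the sketch below shows one possible reading of a sum limit with a "redistribute" action: group the rows, and wherever a group's total exceeds the limit, scale its values down proportionally. This is an assumption about the semantics, not Misata's implementation, and cap_group_sums is a hypothetical helper.

```python
from collections import defaultdict


def cap_group_sums(rows, group_by, column, limit):
    """Scale `column` down proportionally in every group whose sum exceeds `limit`."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[key] for key in group_by)] += row[column]

    fixed = []
    for row in rows:
        total = totals[tuple(row[key] for key in group_by)]
        # Groups at or under the limit are left untouched.
        value = row[column] * limit / total if total > limit else row[column]
        fixed.append({**row, column: value})
    return fixed


rows = [
    {"employee_id": 1, "date": "2024-01-02", "hours": 6.0},
    {"employee_id": 1, "date": "2024-01-02", "hours": 6.0},
    {"employee_id": 2, "date": "2024-01-02", "hours": 5.0},
]
capped = cap_group_sums(rows, ["employee_id", "date"], "hours", 8.0)
```

Employee 1 logged 12 hours on one day, so both entries shrink until they sum to the 8-hour limit. Employee 2 is under the limit and is unchanged.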

🔌 Providers

Misata supports multiple LLM providers for schema generation.

Provider  Env Var         Tier  Best For
Groq      GROQ_API_KEY    Free  Speed (Recommended)
OpenAI    OPENAI_API_KEY  Paid  Quality
Ollama    None            Free  Privacy (Local)
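
If the same code runs on machines with different credentials, a small helper can choose a provider from whichever key is present. The sketch below is hypothetical glue code, not part of Misata. Only "groq" appears as a provider name in the Python API example, so the "openai" and "ollama" strings are assumptions to verify against the library.

```python
import os


def pick_provider(env=None):
    """Return a provider name based on which API key is set in `env`."""
    env = os.environ if env is None else env
    if env.get("GROQ_API_KEY"):
        return "groq"
    if env.get("OPENAI_API_KEY"):
        return "openai"  # assumed identifier
    # Ollama needs no API key according to the table, so it is the fallback.
    return "ollama"  # assumed identifier
```

The result could then be passed as the provider argument when constructing the schema generator.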

🏢 Enterprise

Building a platform? Misata Studio is our commercial offering for teams.

  • 🖥️ Visual Schema Editor: Drag-and-drop schema design.
  • 🔒 Privacy Filters: PII scanning and masking.
  • 📦 One-Click Deploy: Docker & Kubernetes ready.
  • 🤝 Support: Dedicated support and custom integration.

Contact Sales for a demo.


Built with ❤️ by Muhammed Rasin
