Synthetic Mock Data 🔮

Create data out of nothing. Prompt LLMs for Tabular Data.

Installation

The latest release of mostlyai-mock can be installed via pip:

pip install -U mostlyai-mock

Note: An API key for an LLM endpoint that supports structured responses is required. We recommend setting the key as an environment variable (e.g. OPENAI_API_KEY, GEMINI_API_KEY). Alternatively, pass the key to every call to the library via the api_key parameter.
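As a minimal sketch of the note above, the snippet below looks up a key in the environment before calling the library. The helper name find_api_key is hypothetical and not part of mostlyai-mock; only the environment variable names and the api_key parameter come from the note.

```python
import os

# Hypothetical helper (not part of mostlyai-mock): return the first API key
# found among the given environment variables, or None if none is set.
def find_api_key(env_names=("OPENAI_API_KEY", "GEMINI_API_KEY")):
    for name in env_names:
        value = os.environ.get(name)
        if value:
            return value
    return None

api_key = find_api_key()
if api_key is None:
    print("No API key found in the environment; pass api_key=... explicitly.")
# If a key is set in the environment, no extra argument should be needed.
# Otherwise the key can be passed on each call, for example:
#   mock.sample(tables=tables, sample_size=10, model="openai/gpt-4.1-nano", api_key="...")
```

Checking up front gives a clearer message than a failed request to the LLM endpoint would.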

Quick Start

Single Table

from mostlyai import mock

tables = {
    "guests": {
        "description": "Guests of an Alpine ski hotel in Austria",
        "columns": {
            "nationality": {"prompt": "2-letter code for the nationality", "dtype": "string"},
            "name": {"prompt": "first name and last name of the guest", "dtype": "string"},
            "gender": {"dtype": "category", "values": ["male", "female"]},
            "age": {"prompt": "age in years; min: 18, max: 80; avg: 25", "dtype": "integer"},
            "date_of_birth": {"prompt": "date of birth", "dtype": "date"},
            "checkin_time": {"prompt": "the check in timestamp of the guest; may 2025", "dtype": "datetime"},
            "is_vip": {"prompt": "is the guest a VIP", "dtype": "boolean"},
            "price_per_night": {"prompt": "price paid per night, in EUR", "dtype": "float"},
            "room_number": {"prompt": "room number", "dtype": "integer", "values": [101, 102, 103, 201, 202, 203, 204]}
        },
    }
}
df = mock.sample(tables=tables, sample_size=10, model="openai/gpt-4.1-nano")
print(df)
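
Because every call goes to an LLM endpoint, it can help to sanity-check a table definition locally first. The following is a sketch only: check_table_config is a hypothetical helper, the dtype set lists just the dtypes that appear in the example above (the library may accept others), and the two rules it enforces are assumptions of this sketch rather than documented library behavior.

```python
# Dtypes that appear in the single-table example; not an authoritative list.
SEEN_DTYPES = {"string", "category", "integer", "float", "boolean", "date", "datetime"}

def check_table_config(tables):
    """Return a list of human-readable problems found in a tables dict."""
    problems = []
    for table_name, table in tables.items():
        for col_name, col in table.get("columns", {}).items():
            dtype = col.get("dtype")
            if dtype not in SEEN_DTYPES:
                problems.append(f"{table_name}.{col_name}: unexpected dtype {dtype!r}")
            # Assumption of this sketch: a category column should list its values.
            if dtype == "category" and not col.get("values"):
                problems.append(f"{table_name}.{col_name}: category column without values")
            # Assumption of this sketch: a column needs a prompt or a value list.
            if "prompt" not in col and "values" not in col:
                problems.append(f"{table_name}.{col_name}: neither prompt nor values given")
    return problems

guests_tables = {
    "guests": {
        "description": "Guests of an Alpine ski hotel in Austria",
        "columns": {
            "name": {"prompt": "first name and last name of the guest", "dtype": "string"},
            "gender": {"dtype": "category", "values": ["male", "female"]},
            "age": {"prompt": "age in years; min: 18, max: 80; avg: 25", "dtype": "integer"},
        },
    }
}
print(check_table_config(guests_tables))
```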

Multiple Tables

from mostlyai import mock

tables = {
    "customers": {
        "description": "Customers of a hardware store",
        "columns": {
            "customer_id": {"prompt": "the unique id of the customer", "dtype": "integer"},
            "name": {"prompt": "first name and last name of the customer", "dtype": "string"},
        },
        "primary_key": "customer_id",
    },
    "orders": {
        "description": "Orders of a Customer",
        "columns": {
            "customer_id": {"prompt": "the customer id for that order", "dtype": "integer"},
            "order_id": {"prompt": "the unique id of the order", "dtype": "string"},
            "text": {"prompt": "order text description", "dtype": "string"},
            "amount": {"prompt": "order amount in USD", "dtype": "float"},
        },
        "primary_key": "order_id",
        "foreign_keys": [
            {
                "column": "customer_id",
                "referenced_table": "customers",
                "description": "each customer has anywhere between 1 and 3 orders",
            }
        ],
    },
    "items": {
        "description": "Items in an Order",
        "columns": {
            "item_id": {"prompt": "the unique id of the item", "dtype": "string"},
            "order_id": {"prompt": "the order id for that item", "dtype": "string"},
            "name": {"prompt": "the name of the item", "dtype": "string"},
            "price": {"prompt": "the price of the item in USD", "dtype": "float"},
        },
        "foreign_keys": [
            {
                "column": "order_id",
                "referenced_table": "orders",
                "description": "each order has between 2 and 5 items",
            }
        ],
    },
}
data = mock.sample(tables=tables, sample_size=2, model="openai/gpt-4.1")
df_customers = data["customers"]
df_orders = data["orders"]
df_items = data["items"]
print(df_customers)
print(df_orders)
print(df_items)
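
The foreign_keys entries above link each child table to a parent table. As a sketch, the snippet below checks those links in a schema with the same shape and derives a parent-first table order using the standard library. Both helpers are hypothetical, and the ordering only illustrates the dependency structure; it is not a claim about how mostlyai-mock generates tables internally.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def check_foreign_keys(tables):
    """Return problems with foreign key definitions in a tables dict."""
    problems = []
    for name, table in tables.items():
        for fk in table.get("foreign_keys", []):
            parent = tables.get(fk["referenced_table"])
            if parent is None:
                problems.append(f"{name}: unknown table {fk['referenced_table']!r}")
            elif "primary_key" not in parent:
                problems.append(f"{name}: {fk['referenced_table']!r} has no primary_key")
            if fk["column"] not in table.get("columns", {}):
                problems.append(f"{name}: foreign key column {fk['column']!r} is not defined")
    return problems

def parent_first_order(tables):
    """Order table names so that referenced tables come before referencing ones."""
    graph = {
        name: {fk["referenced_table"] for fk in table.get("foreign_keys", [])}
        for name, table in tables.items()
    }
    return list(TopologicalSorter(graph).static_order())

# Trimmed schema with the same keys as the example above (prompts omitted).
schema = {
    "items": {
        "columns": {"item_id": {}, "order_id": {}},
        "foreign_keys": [{"column": "order_id", "referenced_table": "orders"}],
    },
    "orders": {
        "columns": {"order_id": {}, "customer_id": {}},
        "primary_key": "order_id",
        "foreign_keys": [{"column": "customer_id", "referenced_table": "customers"}],
    },
    "customers": {"columns": {"customer_id": {}}, "primary_key": "customer_id"},
}
print(check_foreign_keys(schema))
print(parent_first_order(schema))
```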
