Synthetic Mock Data

Create data out of nothing. Prompt LLMs for Tabular Data.

Installation

The latest release of mostlyai-mock can be installed via pip:

pip install -U mostlyai-mock

Note: An API key for an LLM endpoint that supports structured responses is required. It is recommended to set the key as an environment variable (e.g. OPENAI_API_KEY or GEMINI_API_KEY). Alternatively, the key must be passed to every call to the library itself via the api_key parameter.
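
The two options in the note above can be sketched as a small lookup. This is only an illustration: the helper name `resolve_api_key` and the tuple `ENV_VAR_NAMES` are hypothetical and not part of mostlyai-mock; only the variable names OPENAI_API_KEY and GEMINI_API_KEY and the api_key parameter come from the note.

```python
import os

# Hypothetical helper, not part of mostlyai-mock.
# The variable names are the two examples given in the note above.
ENV_VAR_NAMES = ("OPENAI_API_KEY", "GEMINI_API_KEY")


def resolve_api_key(explicit_key=None, env_var_names=ENV_VAR_NAMES):
    """Return the explicit key if given, else the first non-empty
    environment variable from env_var_names, else None."""
    if explicit_key:
        return explicit_key
    for name in env_var_names:
        value = os.environ.get(name)
        if value:
            return value
    return None
```

If the helper returns None, neither option is set up. Otherwise the result could be handed over explicitly, for example `mock.sample(..., api_key=resolve_api_key())`.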

Quick Start

Single Table

from mostlyai import mock

tables = {
    "guests": {
        "description": "Guests of an Alpine ski hotel in Austria",
        "columns": {
            "nationality": {"prompt": "2-letter code for the nationality", "dtype": "string"},
            "name": {"prompt": "first name and last name of the guest", "dtype": "string"},
            "gender": {"dtype": "category", "values": ["male", "female"]},
            "age": {"prompt": "age in years; min: 18, max: 80; avg: 25", "dtype": "integer"},
            "date_of_birth": {"prompt": "date of birth", "dtype": "date"},
            "checkin_time": {"prompt": "the check in timestamp of the guest; may 2025", "dtype": "datetime"},
            "is_vip": {"prompt": "is the guest a VIP", "dtype": "boolean"},
            "price_per_night": {"prompt": "price paid per night, in EUR", "dtype": "float"},
            "room_number": {"prompt": "room number", "dtype": "integer", "values": [101, 102, 103, 201, 202, 203, 204]}
        },
    }
}
# request 10 rows; with a single table the result is used directly as one data frame
df = mock.sample(tables=tables, sample_size=10, model="openai/gpt-4.1-nano")
print(df)
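
Since the table specification is a plain nested dictionary, a typo in a dtype only surfaces once the LLM is called. A minimal sketch of a pre-flight check follows. The function `check_table_spec` is hypothetical and not part of mostlyai-mock, and `EXAMPLE_DTYPES` lists only the dtype names that appear in the example above; the library may well accept others.

```python
# Dtype names seen in the example above; an assumption, not an exhaustive list.
EXAMPLE_DTYPES = {"string", "category", "integer", "float", "boolean", "date", "datetime"}


def check_table_spec(tables):
    """Return a list of human-readable problems found in a table specification."""
    problems = []
    for table_name, table in tables.items():
        columns = table.get("columns", {})
        if not columns:
            problems.append(f"{table_name}: no columns defined")
        for column_name, column in columns.items():
            where = f"{table_name}.{column_name}"
            dtype = column.get("dtype")
            if dtype not in EXAMPLE_DTYPES:
                problems.append(f"{where}: unexpected dtype {dtype!r}")
            values = column.get("values")
            if values is not None and not isinstance(values, list):
                problems.append(f"{where}: 'values' should be a list")
    return problems


# A draft spec with a misspelled dtype, used only to exercise the check.
draft = {
    "guests": {
        "columns": {
            "age": {"prompt": "age in years", "dtype": "integer"},
            "gender": {"dtype": "categry", "values": ["male", "female"]},
        }
    }
}
print(check_table_spec(draft))
```

An empty list means the specification passed these particular checks, which are conventions of this sketch rather than rules documented by the library.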

Multiple Tables

from mostlyai import mock

tables = {
    "guests": {
        "description": "Guests of an Alpine ski hotel in Austria",
        "columns": {
            "id": {"prompt": "the unique id of the guest", "dtype": "integer"},
            "name": {"prompt": "first name and last name of the guest", "dtype": "string"},
        },
        "primary_key": "id",
    },
    "purchases": {
        "description": "Purchases of a Guest during their stay",
        "columns": {
            "guest_id": {"prompt": "the guest id for that purchase", "dtype": "integer"},
            "purchase_id": {"prompt": "the unique id of the purchase", "dtype": "string"},
            "text": {"prompt": "purchase text description", "dtype": "string"},
            "amount": {"prompt": "purchase amount in EUR", "dtype": "float"},
        },
        "foreign_keys": [
            {
                "column": "guest_id",
                "referenced_table": "guests",
                "description": "each guest has anywhere between 1 and 10 purchases",
            }
        ],
    },
}
# with several tables, the result is indexed by table name
data = mock.sample(tables=tables, sample_size=5, model="openai/gpt-4.1-nano")
df_guests = data["guests"]
df_purchases = data["purchases"]
print(df_guests)
print(df_purchases)
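
Because the rows are produced by an LLM, it can be worth confirming that the foreign key relationship holds in the generated data. The sketch below uses two hypothetical helpers, `orphaned_keys` and `children_per_parent`, which are not part of mostlyai-mock. They work on any iterable of key values, so columns such as `df_purchases["guest_id"]` and `df_guests["id"]` should be usable in place of the stand-in lists.

```python
from collections import Counter


def orphaned_keys(child_keys, parent_keys):
    """Return the child key values that have no matching parent key."""
    parents = set(parent_keys)
    return {key for key in child_keys if key not in parents}


def children_per_parent(child_keys, parent_keys):
    """Count child rows per parent key, including parents with zero children."""
    counts = Counter(child_keys)
    return {key: counts.get(key, 0) for key in parent_keys}


# Stand-in values for illustration only, not output of the library.
guest_ids = [1, 2, 3]
purchase_guest_ids = [1, 1, 2, 4]

print(orphaned_keys(purchase_guest_ids, guest_ids))        # {4}
print(children_per_parent(purchase_guest_ids, guest_ids))  # {1: 2, 2: 1, 3: 0}
```

An empty set from `orphaned_keys` means every purchase points at an existing guest, and the counts can be compared against the range stated in the foreign key description (here, 1 to 10 purchases per guest).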
