Synthetic Mock Data

Create data out of nothing. Prompt LLMs for Tabular Data.

Installation

The latest release of mostlyai-mock can be installed via pip:

pip install -U mostlyai-mock

Note: An API key for an LLM endpoint that supports structured responses is required. It is recommended to set the key as an environment variable (e.g. OPENAI_API_KEY or GEMINI_API_KEY). Alternatively, the key must be passed to every call to the library via the api_key parameter.
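The lookup order described in the note can be illustrated with a small, hypothetical helper: an explicitly passed key wins, otherwise a provider-specific environment variable is used. The mapping from the provider prefix of the model string (the "openai" in "openai/gpt-4.1-nano") to a variable name such as OPENAI_API_KEY is an assumption for illustration only; resolve_api_key and PROVIDER_ENV_VARS are not part of mostlyai-mock and this is not the library's actual lookup logic.

```python
import os
from typing import Optional

# Hypothetical mapping from the provider prefix of a model string
# (the part before "/") to the environment variable holding its key.
# Illustrative assumption only; not mostlyai-mock's own logic.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
}


def resolve_api_key(model: str, api_key: Optional[str] = None) -> str:
    """Return an explicit api_key if given, else the provider's env variable."""
    if api_key:
        return api_key
    provider = model.split("/", 1)[0]
    env_var = PROVIDER_ENV_VARS.get(provider)
    if env_var is None:
        raise ValueError(f"no known environment variable for provider {provider!r}")
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"set {env_var} or pass api_key explicitly")
    return key


if __name__ == "__main__":
    os.environ["OPENAI_API_KEY"] = "sk-example"  # placeholder, not a real key
    print(resolve_api_key("openai/gpt-4.1-nano"))              # sk-example
    print(resolve_api_key("openai/gpt-4.1-nano", "sk-other"))  # sk-other
```

Keeping keys in the environment rather than in source code avoids committing secrets alongside the table definitions.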

Quick Start

Single Table

from mostlyai import mock

tables = {
    "guests": {
        "description": "Guests of an Alpine ski hotel in Austria",
        "columns": {
            "nationality": {"prompt": "2-letter code for the nationality", "dtype": "string"},
            "name": {"prompt": "first name and last name of the guest", "dtype": "string"},
            "gender": {"prompt": "gender of the guest; male or female", "dtype": "string"},
            "age": {"prompt": "age in years; min: 18, max: 80; avg: 25", "dtype": "integer"},
            "date_of_birth": {"prompt": "date of birth", "dtype": "date"},
            "checkin_time": {"prompt": "the check in timestamp of the guest; may 2025", "dtype": "datetime"},
            "is_vip": {"prompt": "is the guest a VIP", "dtype": "boolean"},
            "price_per_night": {"prompt": "price paid per night, in EUR", "dtype": "float"},
        },
    }
}
df = mock.sample(tables=tables, sample_size=10, model="openai/gpt-4.1-nano")
print(df)
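Each column in the specification above is a dictionary with a prompt and a dtype. Before spending tokens on an LLM call, a spec can be sanity-checked locally. The sketch below is a hypothetical helper (validate_tables is not part of mostlyai-mock); it also covers the primary_key and foreign_keys entries used in the multi-table example. The set of allowed dtypes is simply the set used in this README's examples, and the library may accept others.

```python
from typing import Any

# dtypes used in the examples in this README; the library may accept others.
KNOWN_DTYPES = {"string", "integer", "float", "boolean", "date", "datetime"}


def validate_tables(tables: dict[str, dict[str, Any]]) -> list[str]:
    """Return a list of problems found in a tables spec (empty if none)."""
    problems: list[str] = []
    for table_name, table in tables.items():
        columns = table.get("columns", {})
        if not columns:
            problems.append(f"{table_name}: no columns defined")
        # Every column needs a non-empty prompt and a recognised dtype.
        for col_name, col in columns.items():
            if not col.get("prompt"):
                problems.append(f"{table_name}.{col_name}: missing prompt")
            dtype = col.get("dtype")
            if dtype not in KNOWN_DTYPES:
                problems.append(f"{table_name}.{col_name}: unexpected dtype {dtype!r}")
        # A primary key must name one of the table's own columns.
        pk = table.get("primary_key")
        if pk is not None and pk not in columns:
            problems.append(f"{table_name}: primary_key {pk!r} is not a column")
        # A foreign key must name a local column and an existing table.
        for fk in table.get("foreign_keys", []):
            if fk.get("column") not in columns:
                problems.append(
                    f"{table_name}: foreign key column {fk.get('column')!r} is not a column"
                )
            if fk.get("referenced_table") not in tables:
                problems.append(
                    f"{table_name}: referenced_table {fk.get('referenced_table')!r} not found"
                )
    return problems
```

For example, assert validate_tables(tables) == [] can run right before mock.sample(...), so a typo such as "int" instead of "integer" surfaces before any request is sent.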

Multiple Tables

from mostlyai import mock

tables = {
    "guests": {
        "description": "Guests of an Alpine ski hotel in Austria",
        "columns": {
            "id": {"prompt": "the unique id of the guest", "dtype": "integer"},
            "name": {"prompt": "first name and last name of the guest", "dtype": "string"},
        },
        "primary_key": "id",
    },
    "purchases": {
        "description": "Purchases of a Guest during their stay",
        "columns": {
            "guest_id": {"prompt": "the guest id for that purchase", "dtype": "integer"},
            "purchase_id": {"prompt": "the unique id of the purchase", "dtype": "string"},
            "text": {"prompt": "purchase text description", "dtype": "string"},
            "amount": {"prompt": "purchase amount in EUR", "dtype": "float"},
        },
        "foreign_keys": [
            {
                "column": "guest_id",
                "referenced_table": "guests",
                "description": "each guest has anywhere between 1 and 10 purchases",
            }
        ],
    },
}
data = mock.sample(tables=tables, sample_size=5, model="openai/gpt-4.1-nano")
# with multiple tables, the result is keyed by table name
df_guests = data["guests"]
df_purchases = data["purchases"]
print(df_guests)
print(df_purchases)
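The cardinality in the foreign key description ("between 1 and 10 purchases") is a natural-language hint in the prompt rather than a hard constraint, so it can be worth checking what the model actually returned. A minimal sketch, using a hypothetical helper check_foreign_key that accepts any iterables of key values; pandas Series such as df_guests["id"] and df_purchases["guest_id"] work as well as plain lists.

```python
from collections import Counter
from collections.abc import Hashable, Iterable


def check_foreign_key(
    parent_ids: Iterable[Hashable],
    child_fks: Iterable[Hashable],
    min_children: int = 1,
    max_children: int = 10,
) -> list[str]:
    """Report orphaned child rows and parents whose child count is out of range."""
    parents = set(parent_ids)
    counts = Counter(child_fks)
    problems = []
    # Child rows pointing at a parent that does not exist.
    for fk in sorted(set(counts) - parents, key=str):
        problems.append(f"orphaned foreign key {fk!r}")
    # Parents with too few or too many children (no children counts as 0).
    for pid in sorted(parents, key=str):
        n = counts.get(pid, 0)
        if not min_children <= n <= max_children:
            problems.append(f"parent {pid!r} has {n} children")
    return problems
```

Applied to the example above, check_foreign_key(df_guests["id"], df_purchases["guest_id"]) returns an empty list when every purchase belongs to a known guest and each guest has between 1 and 10 purchases.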

Download files

Source Distribution

mostlyai_mock-0.0.1.tar.gz (11.2 kB)

Built Distribution

mostlyai_mock-0.0.1-py3-none-any.whl (13.4 kB)

File details

Details for the file mostlyai_mock-0.0.1.tar.gz.

File metadata

  • Download URL: mostlyai_mock-0.0.1.tar.gz
  • Size: 11.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for mostlyai_mock-0.0.1.tar.gz
Algorithm    Hash digest
SHA256       5d05f64122adb495278d445e6f523276b23994cfdb9c5c2c61a0f6a8e673efec
MD5          e3df91ffae491707f042be64565ebe9f
BLAKE2b-256  bc9a56cc5c315072a12e5ffbd2942f31253bf16639932c75791d6dde3a0ad44d

File details

Details for the file mostlyai_mock-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: mostlyai_mock-0.0.1-py3-none-any.whl
  • Size: 13.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for mostlyai_mock-0.0.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       ff98f58f5882e8aa111336d953587042ec84e15b3fea2651f3223bc20abe5022
MD5          5f192098848bb6f04ec0750835e25d85
BLAKE2b-256  84e01866f1e36a72a602b056554e7c8f7119c15ca703748b55271b60eb4e6fec
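The published digests can be used to check a downloaded file before installing it. A minimal sketch with Python's standard hashlib module; the file name and expected SHA256 value are taken from the listing above, and the file is assumed to sit in the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read fixed-size chunks until f.read returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    """Return True if the file's SHA256 matches the published digest."""
    return sha256_of(path) == expected_sha256.lower()


if __name__ == "__main__":
    wheel = Path("mostlyai_mock-0.0.1-py3-none-any.whl")
    expected = "ff98f58f5882e8aa111336d953587042ec84e15b3fea2651f3223bc20abe5022"
    if wheel.exists():
        print("OK" if verify(wheel, expected) else "MISMATCH")
    else:
        print(f"{wheel} not found")
```

Reading in chunks keeps memory use constant regardless of file size. pip can also enforce digests at install time through its hash-checking mode (--require-hashes).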
