Synthetic Mock Data
Create data out of nothing. Prompt LLMs for Tabular Data.
Installation
The latest release of mostlyai-mock can be installed via pip:
pip install -U mostlyai-mock
Note: An API key for an LLM endpoint that supports structured responses is required. It is recommended to set this key as an environment variable (e.g. OPENAI_API_KEY, GEMINI_API_KEY). Alternatively, pass the key to every call to the library itself via the api_key parameter.
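One way to follow the note above is to confirm that a key is available before making any calls. The sketch below is only an illustration: `resolve_api_key` is a hypothetical helper, not part of mostlyai-mock, and the list of variable names simply mirrors the examples given in the note.

```python
import os

# Variable names taken from the examples in the note above; extend as needed.
CANDIDATE_ENV_VARS = ("OPENAI_API_KEY", "GEMINI_API_KEY")


def resolve_api_key(explicit_key=None, env=None):
    """Hypothetical helper: prefer an explicit key, else the first one set in the environment."""
    env = os.environ if env is None else env
    if explicit_key:
        return explicit_key
    for name in CANDIDATE_ENV_VARS:
        value = env.get(name)
        if value:
            return value
    raise RuntimeError(
        "No API key found; set one of "
        + ", ".join(CANDIDATE_ENV_VARS)
        + " or pass api_key explicitly."
    )
```

The resolved value could then be handed to the library through the api_key parameter mentioned above.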
Quick Start
Single Table
from mostlyai import mock

tables = {
    "guests": {
        "description": "Guests of an Alpine ski hotel in Austria",
        "columns": {
            "nationality": {"prompt": "2-letter code for the nationality", "dtype": "string"},
            "name": {"prompt": "first name and last name of the guest", "dtype": "string"},
            "gender": {"prompt": "gender of the guest; male or female", "dtype": "string"},
            "age": {"prompt": "age in years; min: 18, max: 80; avg: 25", "dtype": "integer"},
            "date_of_birth": {"prompt": "date of birth", "dtype": "date"},
            "checkin_time": {"prompt": "the check in timestamp of the guest; may 2025", "dtype": "datetime"},
            "is_vip": {"prompt": "is the guest a VIP", "dtype": "boolean"},
            "price_per_night": {"prompt": "price paid per night, in EUR", "dtype": "float"},
        },
    }
}
df = mock.sample(tables=tables, sample_size=10, model="openai/gpt-4.1-nano")
print(df)
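Because every call goes to an LLM endpoint, it can be worth catching typos in the table definition first. The following is a minimal sketch of such a pre-flight check, assuming the config layout shown above. `check_table_config` and `EXAMPLE_DTYPES` are hypothetical names, and the dtype set only lists the values that appear in this example; the library may accept others.

```python
# Dtypes that appear in the example above; the library may accept others.
EXAMPLE_DTYPES = {"string", "integer", "float", "boolean", "date", "datetime"}


def check_table_config(tables):
    """Hypothetical pre-flight check: return a list of problems found in a table config."""
    problems = []
    for table_name, table in tables.items():
        columns = table.get("columns", {})
        if not columns:
            problems.append(f"{table_name}: no columns defined")
        for column_name, spec in columns.items():
            if not spec.get("prompt"):
                problems.append(f"{table_name}.{column_name}: missing prompt")
            dtype = spec.get("dtype")
            if dtype not in EXAMPLE_DTYPES:
                problems.append(f"{table_name}.{column_name}: unexpected dtype {dtype!r}")
    return problems


tables = {
    "guests": {
        "description": "Guests of an Alpine ski hotel in Austria",
        "columns": {
            "age": {"prompt": "age in years", "dtype": "integer"},
            "is_vip": {"prompt": "is the guest a VIP", "dtype": "bool"},  # deliberate typo
        },
    }
}
print(check_table_config(tables))
```

An empty list means that no problems were found by this particular check; it says nothing about whether the endpoint will accept the request.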
Multiple Tables
from mostlyai import mock

tables = {
    "guests": {
        "description": "Guests of an Alpine ski hotel in Austria",
        "columns": {
            "id": {"prompt": "the unique id of the guest", "dtype": "integer"},
            "name": {"prompt": "first name and last name of the guest", "dtype": "string"},
        },
        "primary_key": "id",
    },
    "purchases": {
        "description": "Purchases of a Guest during their stay",
        "columns": {
            "guest_id": {"prompt": "the guest id for that purchase", "dtype": "integer"},
            "purchase_id": {"prompt": "the unique id of the purchase", "dtype": "string"},
            "text": {"prompt": "purchase text description", "dtype": "string"},
            "amount": {"prompt": "purchase amount in EUR", "dtype": "float"},
        },
        "foreign_keys": [
            {
                "column": "guest_id",
                "referenced_table": "guests",
                "description": "each guest has anywhere between 1 and 10 purchases",
            }
        ],
    },
}
data = mock.sample(tables=tables, sample_size=5, model="openai/gpt-4.1-nano")
df_guests = data["guests"]
df_purchases = data["purchases"]
print(df_guests)
print(df_purchases)
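In the multi-table example, each foreign key names a column in the child table and a referenced table that declares a primary_key. The sketch below checks a config for that pattern before any sampling happens. It is an assumption drawn from the example, not a documented rule of the library, and `check_foreign_keys` is a hypothetical helper.

```python
def check_foreign_keys(tables):
    """Hypothetical check: each foreign key uses a defined column and points at a table with a primary key."""
    problems = []
    for table_name, table in tables.items():
        for fk in table.get("foreign_keys", []):
            column = fk.get("column")
            target = fk.get("referenced_table")
            if column not in table.get("columns", {}):
                problems.append(f"{table_name}: foreign key column {column!r} is not defined")
            if target not in tables:
                problems.append(f"{table_name}: referenced table {target!r} does not exist")
            elif "primary_key" not in tables[target]:
                problems.append(f"{target}: referenced table has no primary_key")
    return problems


# A config with a misspelled table name in the foreign key.
broken = {
    "guests": {
        "columns": {"id": {"prompt": "the unique id of the guest", "dtype": "integer"}},
        "primary_key": "id",
    },
    "purchases": {
        "columns": {"guest_id": {"prompt": "the guest id for that purchase", "dtype": "integer"}},
        "foreign_keys": [{"column": "guest_id", "referenced_table": "guest"}],
    },
}
print(check_foreign_keys(broken))
```

Running the same function on the `tables` dictionary from the example above should return an empty list, since `guests` exists and declares `id` as its primary key.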
File details
Details for the file mostlyai_mock-0.0.1.tar.gz.
File metadata
- Download URL: mostlyai_mock-0.0.1.tar.gz
- Upload date:
- Size: 11.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5d05f64122adb495278d445e6f523276b23994cfdb9c5c2c61a0f6a8e673efec |
| MD5 | e3df91ffae491707f042be64565ebe9f |
| BLAKE2b-256 | bc9a56cc5c315072a12e5ffbd2942f31253bf16639932c75791d6dde3a0ad44d |
File details
Details for the file mostlyai_mock-0.0.1-py3-none-any.whl.
File metadata
- Download URL: mostlyai_mock-0.0.1-py3-none-any.whl
- Upload date:
- Size: 13.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ff98f58f5882e8aa111336d953587042ec84e15b3fea2651f3223bc20abe5022 |
| MD5 | 5f192098848bb6f04ec0750835e25d85 |
| BLAKE2b-256 | 84e01866f1e36a72a602b056554e7c8f7119c15ca703748b55271b60eb4e6fec |