Synthetic Mock Data
Create data out of nothing. Prompt LLMs for Tabular Data.
Installation
The latest release of mostlyai-mock can be installed via pip:
pip install -U mostlyai-mock
Note: An API key for an LLM endpoint that supports structured responses is required. It is recommended to set the key as an environment variable (e.g. OPENAI_API_KEY, GEMINI_API_KEY). Alternatively, pass the key to every call to the library via the api_key parameter.
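The two options in the note can be sketched as follows. `resolve_api_key` is a hypothetical helper written for this illustration and is not part of mostlyai-mock; only the `api_key` parameter and the environment variable names come from the note above.

```python
import os


def resolve_api_key(env_var, explicit_key=None):
    """Return the explicit key if given, else the value of env_var.

    Hypothetical helper: fails early with a clear message instead of
    letting a later LLM call fail on a missing key.
    """
    if explicit_key:
        return explicit_key
    value = os.environ.get(env_var)
    if not value:
        raise RuntimeError(f"No API key found: set {env_var} or pass one explicitly")
    return value


if __name__ == "__main__":
    try:
        key = resolve_api_key("OPENAI_API_KEY")
        print("API key found")
        # The key can then be passed on each call via api_key, e.g.:
        # df = mock.sample(tables=tables, sample_size=10,
        #                  model="openai/gpt-4.1-nano", api_key=key)
    except RuntimeError as err:
        print(err)
```

When the environment variable is already set, passing `api_key` should not be necessary, according to the note; the explicit form is mainly useful when keys are loaded from a secrets store at runtime.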
Quick Start
Single Table
from mostlyai import mock

tables = {
    "guests": {
        "description": "Guests of an Alpine ski hotel in Austria",
        "columns": {
            "nationality": {"prompt": "2-letter code for the nationality", "dtype": "string"},
            "name": {"prompt": "first name and last name of the guest", "dtype": "string"},
            "gender": {"dtype": "category", "values": ["male", "female"]},
            "age": {"prompt": "age in years; min: 18, max: 80; avg: 25", "dtype": "integer"},
            "date_of_birth": {"prompt": "date of birth", "dtype": "date"},
            "checkin_time": {"prompt": "the check in timestamp of the guest; may 2025", "dtype": "datetime"},
            "is_vip": {"prompt": "is the guest a VIP", "dtype": "boolean"},
            "price_per_night": {"prompt": "price paid per night, in EUR", "dtype": "float"},
            "room_number": {"prompt": "room number", "dtype": "integer", "values": [101, 102, 103, 201, 202, 203, 204]},
        },
    }
}
df = mock.sample(tables=tables, sample_size=10, model="openai/gpt-4.1-nano")
print(df)
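Because every call goes to an LLM endpoint, it can help to catch typos in the table definition before sampling. The following is a minimal sketch of such a check. `lint_table_spec` and `KNOWN_DTYPES` are hypothetical names, not part of mostlyai-mock, and the dtype list only covers the values used in the example above, so the library may accept others. The rule that a category column should list its values is likewise an assumption based on the example.

```python
# dtypes that appear in the Single Table example; the library may accept more
KNOWN_DTYPES = {"string", "category", "integer", "float", "boolean", "date", "datetime"}


def lint_table_spec(tables):
    """Return a list of human-readable problems found in a tables dict."""
    problems = []
    for table_name, table in tables.items():
        columns = table.get("columns", {})
        if not columns:
            problems.append(f"{table_name}: no columns defined")
        for col_name, col in columns.items():
            dtype = col.get("dtype")
            if dtype not in KNOWN_DTYPES:
                problems.append(f"{table_name}.{col_name}: unexpected dtype {dtype!r}")
            # Assumption: a category column is only useful with explicit values
            if dtype == "category" and not col.get("values"):
                problems.append(f"{table_name}.{col_name}: category column without values")
    return problems


if __name__ == "__main__":
    spec = {
        "guests": {
            "columns": {
                "age": {"prompt": "age in years", "dtype": "int"},  # typo: should be "integer"
                "gender": {"dtype": "category", "values": ["male", "female"]},
            }
        }
    }
    for problem in lint_table_spec(spec):
        print(problem)
```

An empty list means the sketch found nothing to flag; it does not guarantee that the library will accept the definition.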
Multiple Tables
from mostlyai import mock

tables = {
    "customers": {
        "description": "Customers of a hardware store",
        "columns": {
            "customer_id": {"prompt": "the unique id of the customer", "dtype": "integer"},
            "name": {"prompt": "first name and last name of the customer", "dtype": "string"},
        },
        "primary_key": "customer_id",
    },
    "orders": {
        "description": "Orders of a Customer",
        "columns": {
            "customer_id": {"prompt": "the customer id for that order", "dtype": "integer"},
            "order_id": {"prompt": "the unique id of the order", "dtype": "string"},
            "text": {"prompt": "order text description", "dtype": "string"},
            "amount": {"prompt": "order amount in USD", "dtype": "float"},
        },
        "primary_key": "order_id",
        "foreign_keys": [
            {
                "column": "customer_id",
                "referenced_table": "customers",
                "description": "each customer has anywhere between 1 and 3 orders",
            }
        ],
    },
    "items": {
        "description": "Items in an Order",
        "columns": {
            "item_id": {"prompt": "the unique id of the item", "dtype": "string"},
            "order_id": {"prompt": "the order id for that item", "dtype": "string"},
            "name": {"prompt": "the name of the item", "dtype": "string"},
            "price": {"prompt": "the price of the item in USD", "dtype": "float"},
        },
        "foreign_keys": [
            {
                "column": "order_id",
                "referenced_table": "orders",
                "description": "each order has between 2 and 5 items",
            }
        ],
    },
}
data = mock.sample(tables=tables, sample_size=2, model="openai/gpt-4.1")
df_customers = data["customers"]
df_orders = data["orders"]
df_items = data["items"]
print(df_customers)
print(df_orders)
print(df_items)
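Since the foreign-key relationships are described to the LLM in plain text, it may be worth checking the generated keys afterwards. The sketch below is one way to do that with plain iterables of key values. `orphaned_keys` and `children_per_parent` are hypothetical helpers, not part of mostlyai-mock, and the demo uses hand-written lists rather than real library output.

```python
from collections import Counter


def orphaned_keys(parent_keys, child_keys):
    """Return the sorted child key values that have no matching parent key."""
    parents = set(parent_keys)
    return sorted({key for key in child_keys if key not in parents})


def children_per_parent(parent_keys, child_keys):
    """Map each parent key to the number of child rows that reference it."""
    counts = Counter(child_keys)
    return {key: counts.get(key, 0) for key in parent_keys}


if __name__ == "__main__":
    # Hand-written stand-ins for the customer_id columns of customers and orders
    customer_ids = [1, 2]
    order_customer_ids = [1, 1, 2, 3]

    print(orphaned_keys(customer_ids, order_customer_ids))        # [3]
    print(children_per_parent(customer_ids, order_customer_ids))  # {1: 2, 2: 1}
```

Assuming the returned tables behave like pandas DataFrames (the example above does not state this), the same check could be written as `orphaned_keys(df_customers["customer_id"], df_orders["customer_id"])`, and the per-parent counts can be compared against the range given in the foreign key description.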