A synthetic data generation library with 700+ built-in data fields across 22 categories.

These details have not been verified by PyPI

Project links

Project description

🎲 Iki Data Generator

Generate realistic, diverse synthetic data with 700+ built-in fields across 22 categories. Perfect for testing, development, and prototyping — without the legal baggage of real data.

What Is This?

Iki Data Generator is a Python library that creates synthetic datasets on demand. Instead of wrestling with dummy data or copy-pasting fake records, you define a schema (which fields you want), call .many(n) to generate n records, and export them to CSV, JSON, SQL, Excel, Parquet, or 10+ other formats. That's it.

It's built for developers who need:

Test data for unit/integration tests
Demo data for presentations or prototypes
Mock databases for local development
Privacy-friendly datasets with realistic properties but zero personal info
Performance testing with large datasets

Why Use Iki Data Generator?

✅ You Get

Benefit	What It Means
700+ Fields	First name, email, credit card, medical codes, stock prices, cryptocurrencies, ML metrics, etc.
22 Categories	Personal, Finance, Commerce, Healthcare, Location, Education, Legal, AI/ML, and more
Easy Schema	Simple string shortcuts or full control with dicts
Flexible Export	CSV, JSON, SQL, Parquet, DuckDB, Excel, XML, TSV, Firebase, more
Zero Dependencies on Real Data	No need to anonymize or worry about PII
Blazing Fast	Generates thousands of records instantly
Extensible	Add custom providers for domain-specific fields

❌ You Don't Get

No real person's data
No need for data anonymization lawyers
No internet calls to fake APIs
No massive CSV files to download and commit

Installation

From PyPI (recommended)

pip install iki-data-generator

From Source

git clone https://github.com/ikidevz/IkiDataGenerator.git
cd Iki-Data-Generator
pip install -e .

Requirements

Python ≥ 3.10
Dependencies: duckdb, pandas, pyarrow, numpy, openpyxl, bcrypt, and a few others (installed automatically)

Quick Start (60 Seconds)

The Simplest Example

from ikidatagen import IkiDataGenerator

# Define what fields you want
schema = ["first_name", "last_name", "email_address", "gender_binary"]

# Generate 100 records
data = IkiDataGenerator(schema).many(100).export("users")

Result: You now have output/users.csv and output/users.json with 100 realistic user records.

A More Realistic Example

from ikidatagen import IkiDataGenerator

schema = [
    {
        "label": "User ID",
        "key_label": "row_number",
        "options": {"blank_percentage": 0}  # No blanks for ID
    },
    "first_name",
    "last_name",
    "email_address",
    {
        "label": "Account Created",
        "key_label": "current_timestamp",
        "options": {"blank_percentage": 5}  # 5% will be blank
    },
    {
        "label": "IP Address",
        "key_label": "ip_address_v4",
        "options": {"blank_percentage": 25}  # 25% will be blank
    },
    {
        "label": "Full Profile",
        "key_label": "template",
        "options": {
            "template": "{{first_name}} {{last_name}} ({{email_address}})"
        }
    },
]

# Generate 500 records and save to both CSV and JSON
IkiDataGenerator(schema).many(500).export("users", formats=["csv", "json"])

Result: output/users.csv and output/users.json with 500 complete user records, ready to use.

Schema Definition

The schema is the heart of Iki Data Generator. It tells the library what fields to generate.

Schema Entry Types

1. Simple String (Shorthand)

schema = ["first_name", "last_name", "email_address"]
# Generates fields with default settings, no options

2. Full Control (Dict)

schema = [
    {
        "key_label": "email_address",      # Required: which provider to use
        "label": "Email",                  # Optional: output column name (defaults to key_label)
        "group": "personal",               # Optional: provider category (auto-resolved if omitted)
        "options": {"blank_percentage": 10}  # Optional: provider-specific config
    }
]

Key Parameters

Parameter	Required?	Description
key_label	✅ Yes	The provider name (e.g., `first_name`, `credit_card_number`)
label	❌ No	How to name the output column (defaults to `key_label`)
group	❌ No	Provider category (auto-resolved from registry; override if needed)
options	❌ No	Provider-specific settings (e.g., `blank_percentage`, `template`)

Available Options (Common)

Option	Type	Example	Effect
blank_percentage	int/float	`10`	Percentage of records where field is empty
template	str	`"{{first_name}} {{last_name}}"`	Combine fields with template syntax
pattern	str	`"[A-Z]{3}-\d{4}"`	Regex pattern (for `regular_expression` provider)
min, max	int/float	`min=0, max=100`	Range for numeric fields

22 Data Categories

Iki Data Generator organizes 700+ fields into 22 categories. Here's a quick overview:

🧑 Personal (Name, Gender, Passport, etc.)

first_name, last_name, middle_name, gender_binary, gender_spectrum, title, passport_number, ssn, nationality, etc.

💰 Finance (Credit Cards, Banking, Currency)

credit_card_number, credit_card_type, iban, bban, currency, currency_code, money, salary_range, stock_market, etc.

🛍️ Commerce (Products, Orders, Pricing)

product_name, product_category, product_price, barcode_ean13, order_status, payment_method, invoice_number, delivery_status, coupon_code, etc.

📧 Communication (Email, Phone, Social)

email_address, phone_number, username, social_media_handle, chat_message, contact_name, etc.

🏗️ Construction (Building, Materials, Codes)

construction_code, building_type, material_type, foundation_type, roof_type, door_type, etc.

💻 Tech/IT (Programming, Frameworks, Version)

programming_language, software_framework, version_number, log_level, http_status_code, file_extension, mime_type, etc.

🏥 Healthcare (Diseases, Medications, Medical Codes)

disease_name, symptom_name, medication_name, blood_type, vaccination_status, ICD10_diagnosis, ICD9_diagnosis, HCPCS_code, etc.

🌍 Location (Countries, Cities, Addresses)

country, state, city, street_address, postal_code, latitude, longitude, timezone, airport_code, etc.

📚 Education (Schools, Courses, Subjects)

university_name, degree, major, course_name, subject, educational_attainment, etc.

⚖️ Legal (Laws, Contracts, Jurisdictions)

legal_entity_type, contract_type, jurisdiction, court_type, legal_case_status, legal_term, etc.

🎬 Entertainment (Movies, Books, Games)

movie_title, movie_genre, book_title, author_name, video_game_title, music_genre, song_title, etc.

🌿 Nature (Plants, Animals, Weather)

plant_name, animal_name, tree_name, flower_name, weather_condition, season, biome, etc.

🚗 Automotive (Cars, VINs, Fuel)

car_make, car_model, car_vin, vehicle_type, license_plate, engine_type, fuel_type, transmission_type, etc.

💱 Cryptocurrency (Coins, Blockchain, Wallets)

crypto_currency, crypto_address, crypto_transaction_id, blockchain_type, smart_contract_language, etc.

🎮 Gaming (Characters, Items, Guilds)

character_class, character_race, game_genre, npc_name, item_type, quest_name, guild_name, etc.

🎵 Music (Artists, Albums, Genres)

artist_name, album_name, song_title, music_genre, instrument_name, music_production_software, etc.

📱 Marketing/Media (Campaigns, Analytics, Content)

campaign_name, social_media_platform, marketing_channel, content_type, target_audience, analytics_metric, etc.

🌐 Political (Countries, Parties, Elections)

political_party, political_ideology, election_type, government_structure, diplomatic_title, etc.

📝 Advanced (Templates, Regex, Lambdas)

template (combine fields), regular_expression (match patterns), lambda (custom Python), json_array, url, digit_sequence, character_sequence, etc.

🤖 AI/ML (Models, Metrics, Training)

model_type, model_framework, model_task, model_version, model_latency, model_confidence, cpu_utilization, gpu_utilization, data_drift_score, inference_result, etc.

✨ Basic (Utilities, Random, Generators)

row_number, blank, boolean, number, datetime, color, emoji, password, password_hash, isbn, ulid, sentiment, words, sentences, paragraphs, etc.

🎲 Miscellaneous (Random & Fun)

dice_roll, coin_flip, rating, frequency, priority_level, dimension, duration, height, weight, temperature, etc.

Export Formats

Generate data in your preferred format:

# Export to multiple formats at once
IkiDataGenerator(schema).many(1000).export("dataset", formats=[
    "csv",      # Comma-separated values
    "json",     # JSON array of objects
    "sql",      # SQL INSERT statements
    "parquet",  # Apache Parquet (columnar)
    "excel",    # Excel workbook (.xlsx)
    "duckdb",   # DuckDB database
    # "tsv", "xml", "cql", "firebase", "dbunit" also supported
])

Format	File Extension	Best For	Notes
CSV	`.csv`	Spreadsheets, import tools	Universal format
JSON	`.json`	APIs, JavaScript, NoSQL	Pretty-printed with indent=2
SQL	`.sql`	Databases	INSERT statements (specify table name)
TSV	`.tsv`	Tab-delimited data	Alternative to CSV
Excel	`.xlsx`	Business reports	Native Excel format
Parquet	`.parquet`	Big Data, Pandas, BI tools	Efficient columnar storage
DuckDB	`.duckdb`	Analytics, SQL queries	Embedded database
XML	`.xml`	Legacy systems, config	Structured XML export
Firestore	`.json`	Firebase/Firestore	Firebase-ready format
DBUnit	`.xml`	Testing frameworks	DBUnit test data format
CQL	`.cql`	Cassandra databases	CQL INSERT statements

API Reference

`IkiDataGenerator(schema)`

Initialize the generator with a schema.

Parameters:

schema (list): List of field names (strings) or field configs (dicts)

Returns: IkiDataGenerator instance

gen = IkiDataGenerator(["first_name", "email_address"])

`.many(n)`

Generate n records.

Parameters:

n (int): Number of records to generate

Returns: BaseGenerator instance

records = gen.many(100)

`.export(name, formats=None)`

Export records to file(s).

Parameters:

name (str): Output filename (without extension)
formats (list, optional): File formats to export. Defaults to ["csv", "json"]

Returns: None (files saved to output/ folder)

gen.many(100).export("users", formats=["csv", "json", "sql"])
# Creates: output/users.csv, output/users.json, output/users.sql

`KEY_LABEL_REGISTRY`

Global dictionary mapping all 700+ field names to their categories.

from ikidatagen import KEY_LABEL_REGISTRY

print(KEY_LABEL_REGISTRY["email_address"])  # → "personal"
print(KEY_LABEL_REGISTRY["credit_card_number"])  # → "commerce"

`ProviderFactory`

Advanced: dynamically load providers.

from ikidatagen import ProviderFactory

provider = ProviderFactory.create("email_address")
email = provider.generate()

Advanced Examples

Example 1: E-Commerce Dataset with Templates

from ikidatagen import IkiDataGenerator

schema = [
    {"key_label": "row_number", "label": "Order ID"},
    {"key_label": "current_timestamp", "label": "Created At"},
    {"key_label": "customer_name", "label": "Customer"},
    {"key_label": "email_address", "label": "Email"},
    {"key_label": "product_name", "label": "Product"},
    {"key_label": "product_price", "label": "Price"},
    {
        "key_label": "template",
        "label": "Description",
        "options": {
            "template": "Order for {{product_name}} by {{customer_name}} ({{email_address}})"
        }
    },
    {"key_label": "order_status", "label": "Status"},
    {
        "key_label": "ip_address_v4",
        "label": "IP",
        "options": {"blank_percentage": 20}  # 20% missing
    }
]

IkiDataGenerator(schema).many(1000).export("orders", formats=["csv", "json"])

Example 2: Healthcare Records

from ikidatagen import IkiDataGenerator

schema = [
    "row_number",
    "first_name",
    "last_name",
    "date_of_birth",
    "blood_type",
    "disease_name",
    "medication_name",
    "icd10_diagnosis",
    {
        "key_label": "current_timestamp",
        "label": "last_visit",
        "options": {"blank_percentage": 10}
    },
]

IkiDataGenerator(schema).many(500).export("patients", formats=["csv", "json", "sql"])

Example 3: Test Data with Blanks and Validation

from ikidatagen import IkiDataGenerator

schema = [
    {"key_label": "username", "options": {"blank_percentage": 0}},      # No blanks
    {"key_label": "email_address", "options": {"blank_percentage": 0}},  # No blanks
    {
        "key_label": "phone_number",
        "options": {"blank_percentage": 30}  # 30% missing phones
    },
    {
        "key_label": "address_line_1",
        "options": {"blank_percentage": 5}
    },
]

data = IkiDataGenerator(schema).many(10000).export("test_users", formats=["json"])

Example 4: AI/ML Metrics Dataset

from ikidatagen import IkiDataGenerator

schema = [
    "row_number",
    "model_type",
    "model_framework",
    "model_task",
    "model_version",
    "model_latency",
    "model_confidence",
    "cpu_utilization",
    "gpu_utilization",
    "memory_footprint",
    "inference_result",
    "inference_endpoint",
    "current_timestamp",
]

IkiDataGenerator(schema).many(5000).export("ml_metrics", formats=["parquet", "json"])

Example 5: Custom Schema with Explicit Groups

from ikidatagen import IkiDataGenerator

schema = [
    {"key_label": "username", "group": "personal"},
    {"key_label": "email_address", "group": "personal"},
    {"key_label": "product_name", "group": "commerce"},
    {"key_label": "currency", "group": "commerce"},
    {
        "key_label": "regular_expression",
        "group": "advanced",
        "label": "Custom Pattern",
        "options": {"pattern": "[A-Z]{2}[0-9]{4}"}
    }
]

IkiDataGenerator(schema).many(100).export("mixed_data")

Configuration & Options

Blank Percentage

Control how many records have empty values for a field:

{
    "key_label": "phone_number",
    "options": {"blank_percentage": 25}  # 25% of records will have empty phone
}

Templates

Combine fields with {{field_name}} syntax:

{
    "key_label": "template",
    "options": {
        "template": "Full Name: {{first_name}} {{last_name}}, Email: {{email_address}}"
    }
}

Regular Expressions

Generate data matching a pattern:

{
    "key_label": "regular_expression",
    "options": {
        "pattern": "[A-Z]{3}-[0-9]{5}"  # Generates: ABC-12345
    }
}

Custom List

Pick from a list of values:

{
    "key_label": "custom_list",
    "options": {
        "values": ["Active", "Inactive", "Pending"]
    }
}

Number Range

Generate numbers within a range:

{
    "key_label": "number",
    "options": {
        "min": 0,
        "max": 100
    }
}

📚 Examples & Demonstrations

The examples/ folder contains 45+ ready-to-run scripts demonstrating all features and providers across 22 categories.

Quick Start Examples

# Run the absolute simplest example
python examples/00_quick_start.py

# Explore basic fields
python examples/01_basic_fields.py

# Test all export formats
python examples/02_export_formats.py

Run Examples by Category

Personal & Identity (3 examples)

python examples/10_personal_data.py          # Names, gender, dates
python examples/11_contact_info.py           # Email, phone, social
python examples/12_identity_documents.py     # Passports, SSN, IDs

E-Commerce & Shopping (4 examples)

python examples/20_ecommerce_shop.py         # Products with pricing
python examples/21_shopping_cart.py          # Complete orders
python examples/22_inventory_management.py   # Stock & inventory
python examples/23_payment_processing.py     # Payments & invoices

Finance & Banking (5 examples)

python examples/30_bank_accounts.py          # Bank accounts
python examples/31_credit_cards.py           # Credit card data
python examples/32_transactions.py           # Transfers & withdrawals
python examples/33_investment_portfolio.py   # Stocks & investments
python examples/34_crypto_blockchain.py      # Cryptocurrency wallets

Healthcare & Medical (3 examples)

python examples/40_patient_records.py        # Patient demographics
python examples/41_medical_diagnosis.py      # Diagnoses & ICD codes
python examples/42_medications.py            # Prescriptions & dosages

Location & Geography (2 examples)

python examples/50_addresses.py              # Addresses & coordinates
python examples/51_international_locations.py # Countries & cities

Education (1 example)

python examples/60_student_records.py        # Students & enrollment

Automotive (1 example)

python examples/70_car_inventory.py          # Cars, models, pricing

Entertainment & Gaming (2 examples)

python examples/90_gaming_players.py         # Gaming characters & guilds
python examples/92_entertainment.py          # Movies, books, music

Tech & Programming (2 examples)

python examples/100_programming_data.py      # Languages & frameworks
python examples/110_ml_models.py             # ML model metadata
python examples/111_ml_metrics.py            # Model performance metrics

Advanced Features (3 examples)

python examples/200_templates.py             # Combining fields with {{field}}
python examples/201_regex_patterns.py        # Custom regex patterns
python examples/203_blank_percentages.py     # Missing data simulation

Real-World Scenarios (7 complete systems)

python examples/300_saas_users.py            # SaaS with subscriptions
python examples/301_social_network.py        # Social media platform
python examples/302_analytics_events.py      # Event tracking (5000 events)
python examples/303_ecommerce_platform.py    # Complete e-commerce
python examples/304_travel_booking_system.py # Flights, hotels, bookings
python examples/308_hospital_system.py       # Hospital management
python examples/309_school_system.py         # University system

Batch & Large Datasets (3 examples)

python examples/400_mixed_categories.py      # Multiple categories mixed
python examples/401_batch_processing.py      # Batch generate 4 datasets
python examples/403_large_dataset.py         # Generate 1M+ records

Specialized Use Cases (4 examples)

python examples/500_test_data_unit_tests.py  # Unit test fixtures
python examples/501_load_testing_data.py     # Load testing (100K events)
python examples/502_demo_data.py             # Demo/presentation data
python examples/504_api_response_mocking.py  # Mock API responses

Complete Feature Showcase

python examples/999_showcase_all_features.py # All 12 features in one!

Run All Examples at Once

To generate data from all 45+ examples in one command:

# Generate everything
for file in examples/[0-9]*.py; do
    echo "Running $file..."
    python "$file"
done

Or on Windows (PowerShell):

Get-ChildItem examples\*.py -Filter "[0-9]*" | ForEach-Object {
    Write-Host "Running $($_.Name)..."
    python $_.FullName
}

View Generated Output

All examples save data to the output/ folder:

output/
├── quick_start.csv
├── quick_start.json
├── personal_data.csv
├── ecommerce_products.parquet
├── medical_diagnosis.json
├── ml_metrics.parquet
├── large_dataset.parquet
└── ... (40+ more files)

Learning Path

Beginner: Start with simple examples and work up

00_quick_start → 01_basic_fields → 02_export_formats → 10_personal_data → 20_ecommerce_shop

Intermediate: Explore categories and features

30_bank_accounts → 40_patient_records → 50_addresses → 200_templates → 201_regex_patterns

Advanced: Complex real-world systems and large datasets

300_saas_users → 303_ecommerce_platform → 308_hospital_system → 400_mixed_categories → 403_large_dataset

Examples Summary

Category	Examples	Records	Topics
Getting Started	3	50-100	Basics, fields, formats
Personal	3	50-300	Names, IDs, documents
Commerce	4	300-2000	Products, orders, inventory
Finance	5	200-1000	Banking, cards, stocks, crypto
Healthcare	3	300-500	Patients, diagnoses, meds
Location	2	500	Addresses, coordinates, countries
Education	1	400	Students, courses, degrees
Automotive	1	600	Cars, models, registration
Entertainment	2	500-2000	Gaming, movies, books, music
Tech	2	200-5000	Languages, frameworks, ML
Advanced	3	50-500	Templates, regex, blanks
Real-World	7	300-5000	Complete systems
Batch/Large	3	100K-1M	Performance, scale
Specialized	4	50-100K	Testing, mocking, load tests
Total	45+	50 to 1M+	All features

Modify Examples for Your Needs

All examples are templates—feel free to copy and modify:

# Copy an example as a starting point
cp examples/20_ecommerce_shop.py my_custom_dataset.py

# Edit and run your custom version
python my_custom_dataset.py

Example Structure

Every example follows this simple pattern:

from ikidatagen import IkiDataGenerator

# 1. Define schema
schema = [
    "first_name",
    "last_name",
    "email_address",
    # ... more fields
]

# 2. Generate data
IkiDataGenerator(schema).many(100).export("my_data", formats=["csv", "json"])

# 3. Check output/ folder

Project Structure

Iki-Data-Generator/
├── examples/                    # 45+ example scripts
│   ├── README.md                # Examples guide
│   ├── 00_quick_start.py        # Simplest example
│   ├── 01_basic_fields.py
│   ├── 20_ecommerce_shop.py
│   ├── 300_saas_users.py
│   ├── 403_large_dataset.py
│   └── 999_showcase_all_features.py  # All features!
├── src/ikidatagen/              # Main package
│   ├── __init__.py              # Public API
│   ├── core.py                  # Main IkiDataGenerator class
│   ├── base_generator.py        # Data generation logic
│   ├── exporters.py             # Export to CSV, JSON, SQL, etc.
│   ├── provider_factory.py      # Dynamic provider loading
│   ├── schema_registry.py       # Maps field names to categories
│   ├── payload.py               # Data payload handling
│   ├── dataset_manager.py       # Dataset management
│   ├── external_datasets/       # External data files
│   │   ├── csv/                 # 30+ CSV files (countries, airlines, etc.)
│   │   └── json/                # 25+ JSON files (advanced data)
│   └── providers/               # Data providers (700+ fields)
│       ├── advanced/            # Template, Regex, Lambda, etc.
│       ├── ai/                  # ML/AI metrics
│       ├── basic/               # Names, dates, colors, etc.
│       ├── car/                 # Vehicle data
│       ├── commerce/            # Products, orders, payments
│       ├── communication/       # Email, phone, social
│       ├── construction/        # Building codes, materials
│       ├── crypto/              # Cryptocurrency data
│       ├── education/           # Schools, degrees, subjects
│       ├── finance/             # Credit cards, banking
│       ├── gaming/              # Characters, items, guilds
│       ├── health/              # Medical codes, symptoms
│       ├── it/                  # Programming, frameworks
│       ├── legal/               # Laws, contracts
│       ├── location/            # Countries, cities, addresses
│       ├── marketing/           # Campaigns, channels
│       ├── misc/                # Miscellaneous data
│       ├── music/               # Artists, albums, genres
│       ├── nature/              # Plants, animals, weather
│       ├── personal/            # Names, gender, documents
│       ├── political/           # Parties, elections
│       ├── products/            # Product categories
│       ├── sports/              # Athletes, teams, leagues
│       └── travel/              # Airlines, hotels, destinations
├── output/                      # Generated data (CSV, JSON, etc.)
├── main.py                      # Example usage
├── pyproject.toml               # Package metadata
├── requirements.txt             # Dependencies
└── README.md                    # This file

How It Works (Behind the Scenes)

Schema Parsing: You provide a list of fields (strings or dicts)
Provider Resolution: Each field name is looked up in KEY_LABEL_REGISTRY to find its category
Dynamic Loading: The appropriate provider class is loaded from providers/{category}/{field}.py
Generation: Each provider generates realistic data for n records
Template Processing: Template fields combine other fields using {{field}} syntax
Blank Handling: Records marked for blanks are cleared based on blank_percentage
Export: Data is serialized to your chosen format(s) and saved to output/

Common Issues & Solutions

❌ "Unknown key_label 'xxx'"

Problem: You used a field name that doesn't exist.

Solution: Check KEY_LABEL_REGISTRY or review the 22 categories above. Did you spell it correctly? (Use underscores, lowercase.)

# ❌ Wrong
schema = ["firstName"]  # camelCase? No!

# ✅ Correct
schema = ["first_name"]  # snake_case? Yes!

❌ "No data to export"

Problem: .many(0) or empty schema.

Solution: Generate at least 1 record.

# ❌ Wrong
IkiDataGenerator(schema).many(0).export("data")

# ✅ Correct
IkiDataGenerator(schema).many(100).export("data")

❌ Export folder not found

Problem: output/ directory doesn't exist.

Solution: The library creates it automatically. Make sure you have write permissions.

❌ Template field not rendering

Problem: {{field_name}} not being replaced.

Solution: Ensure the referenced field exists in your schema and the spelling matches exactly.

# ❌ Wrong
{
    "key_label": "template",
    "options": {"template": "Name: {{first_name}} {{FirstName}}"}  # FirstName ≠ first_name
}

# ✅ Correct
{
    "key_label": "template",
    "options": {"template": "Name: {{first_name}} {{last_name}}"}
}

Performance Tips

Generating Large Datasets

Use Parquet or DuckDB formats for large datasets (smaller file sizes, faster I/O)
DuckDB is perfect for immediate querying: import duckdb; duckdb.sql("SELECT * FROM 'data.duckdb'")
For 1M+ records, generate in batches to manage memory

# ✅ Generate in chunks
for i in range(10):
    IkiDataGenerator(schema).many(100_000).export(f"chunk_{i}")

Field Selection

Only include fields you need (reduces generation time)
Skip fields with expensive generation (e.g., password_hash)

Export Efficiency

# ✅ Smart exports
IkiDataGenerator(schema).many(1_000_000).export("big_data", formats=["parquet"])

# ❌ Avoid exporting to many formats at once
# IkiDataGenerator(schema).many(1_000_000).export("data", formats=["csv", "json", "sql", "excel"])

Contributing

Have ideas? Want to add new providers or categories? Open a PR!

New Provider: Add a file to src/ikidatagen/providers/{category}/{field_name}.py
New Category: Create a folder in providers/ and add your providers
Update Registry: Edit schema_registry.py to register new fields
Tests: Add tests for new providers

License

MIT License — use it freely in personal and commercial projects.

Links & Resources

GitHub: github.com/ikidevz/IkiDataGenerator
Issues: Report bugs or request features
PyPI: pypi.org/project/iki-data-generator

FAQ

Q: Can I use this data for production?

A: This is synthetic data—perfect for development, testing, and demos. For production, consider anonymizing real data or using this as a base.

Q: Can I extend it with custom fields?

A: Yes! Create a custom provider class in providers/{your_category}/ and register it in KEY_LABEL_REGISTRY.

Q: What's the difference between `blank_percentage` and `nullable`?

A: We use blank_percentage (0–100) to control how many records have empty values for a field.

Q: How do I query generated data?

A: Export to DuckDB, then query with SQL:

import duckdb
results = duckdb.sql("SELECT * FROM 'output/users.duckdb' WHERE age > 25").fetchall()

Q: Can I regenerate the exact same data?

A: Not yet. Each run generates different data. (Seed support is planned for future releases.)

Q: What if I need a field that doesn't exist?

A: Use the lambda provider for custom logic:

{
    "key_label": "lambda",
    "options": {
        "function": lambda: f"CUSTOM_{random.randint(1000, 9999)}"
    }
}

Roadmap

🔄 Seed support for reproducible datasets
🔗 Foreign key support for relational data
📊 Better performance for 100M+ records
🤖 AI-powered schema suggestions
🎨 GUI for schema builder
📈 Dataset profiling and statistics

Thanks

Built with ❤️ for developers who hate dummy data.

Happy generating! 🎲

Last updated: June 2026

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.1.8

Jun 19, 2026

0.1.7

Jun 18, 2026

0.1.6

Jun 18, 2026

0.1.5

Jun 18, 2026

0.1.4

Jun 18, 2026

0.1.3

Jun 18, 2026

0.1.2

Jun 18, 2026

0.1.1

Jun 18, 2026

0.1.0

Jun 18, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

iki_data_generator-0.1.8.tar.gz (18.2 MB view details)

Uploaded Jun 19, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

iki_data_generator-0.1.8-py3-none-any.whl (18.6 MB view details)

Uploaded Jun 19, 2026 Python 3

File details

Details for the file iki_data_generator-0.1.8.tar.gz.

File metadata

Download URL: iki_data_generator-0.1.8.tar.gz
Upload date: Jun 19, 2026
Size: 18.2 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for iki_data_generator-0.1.8.tar.gz
Algorithm	Hash digest
SHA256	`83f7fd1b5c1438a28d589b0147401bf78d9fc75392a101262a33ed4a491dcaa0`
MD5	`7fa5aa841408fa137dbb805e27b1ce8e`
BLAKE2b-256	`d547c0e01ebd3b5ece0e1eaf7fe1efb41c96367414f0759fc5d0b3111ef7f7d6`

See more details on using hashes here.

File details

Details for the file iki_data_generator-0.1.8-py3-none-any.whl.

File metadata

Download URL: iki_data_generator-0.1.8-py3-none-any.whl
Upload date: Jun 19, 2026
Size: 18.6 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for iki_data_generator-0.1.8-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b8d24fdde829c5c9668a40f850201124a5e9fe7da5c322d03fa610ad817ad980`
MD5	`7784a4788d24ccf4b84ed0c42cc14189`
BLAKE2b-256	`4ada7db9a8d3f1ab8d51eb2b8340292590a4bb78d0119c255b52bb56c0f3e91c`

See more details on using hashes here.

iki-data-generator 0.1.8

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

🎲 Iki Data Generator

What Is This?

Why Use Iki Data Generator?

✅ You Get

❌ You Don't Get

Installation

From PyPI (recommended)

From Source

Requirements

Quick Start (60 Seconds)

The Simplest Example

A More Realistic Example

Schema Definition

Schema Entry Types

1. Simple String (Shorthand)

2. Full Control (Dict)

Key Parameters

Available Options (Common)

22 Data Categories

🧑 Personal (Name, Gender, Passport, etc.)

💰 Finance (Credit Cards, Banking, Currency)

🛍️ Commerce (Products, Orders, Pricing)

📧 Communication (Email, Phone, Social)

🏗️ Construction (Building, Materials, Codes)

💻 Tech/IT (Programming, Frameworks, Version)

🏥 Healthcare (Diseases, Medications, Medical Codes)

🌍 Location (Countries, Cities, Addresses)

📚 Education (Schools, Courses, Subjects)

⚖️ Legal (Laws, Contracts, Jurisdictions)

🎬 Entertainment (Movies, Books, Games)

🌿 Nature (Plants, Animals, Weather)

🚗 Automotive (Cars, VINs, Fuel)

💱 Cryptocurrency (Coins, Blockchain, Wallets)

🎮 Gaming (Characters, Items, Guilds)

🎵 Music (Artists, Albums, Genres)

📱 Marketing/Media (Campaigns, Analytics, Content)

🌐 Political (Countries, Parties, Elections)

📝 Advanced (Templates, Regex, Lambdas)

🤖 AI/ML (Models, Metrics, Training)

✨ Basic (Utilities, Random, Generators)

🎲 Miscellaneous (Random & Fun)

Export Formats

API Reference

IkiDataGenerator(schema)

.many(n)

.export(name, formats=None)

KEY_LABEL_REGISTRY

ProviderFactory

Advanced Examples

Example 1: E-Commerce Dataset with Templates

Example 2: Healthcare Records

Example 3: Test Data with Blanks and Validation

Example 4: AI/ML Metrics Dataset

Example 5: Custom Schema with Explicit Groups

Configuration & Options

Blank Percentage

Templates

Regular Expressions

Custom List

Number Range

📚 Examples & Demonstrations

Quick Start Examples

Run Examples by Category

Personal & Identity (3 examples)

E-Commerce & Shopping (4 examples)

Finance & Banking (5 examples)

Healthcare & Medical (3 examples)

Location & Geography (2 examples)

Education (1 example)

Automotive (1 example)

Entertainment & Gaming (2 examples)

`IkiDataGenerator(schema)`

`.many(n)`

`.export(name, formats=None)`

`KEY_LABEL_REGISTRY`

`ProviderFactory`

Q: What's the difference between `blank_percentage` and `nullable`?