A synthetic data generation library with 700+ built-in data fields across 22 categories.
🎲 Iki Data Generator
Generate realistic, diverse synthetic data with 700+ built-in fields across 22 categories. Perfect for testing, development, and prototyping, without the legal baggage of real data.
What Is This?
Iki Data Generator is a Python library that creates synthetic datasets on demand. Instead of wrestling with dummy data or copy-pasting fake records, you define a schema (which fields you want), call .many(n) to generate n records, and export them to CSV, JSON, SQL, Excel, Parquet, or any of the other formats listed under Export Formats. That's it.
It's built for developers who need:
- Test data for unit/integration tests
- Demo data for presentations or prototypes
- Mock databases for local development
- Privacy-friendly datasets with realistic properties but zero personal info
- Performance testing with large datasets
Why Use Iki Data Generator?
You Get
| Benefit | What It Means |
|---|---|
| 700+ Fields | First name, email, credit card, medical codes, stock prices, cryptocurrencies, ML metrics, etc. |
| 22 Categories | Personal, Finance, Commerce, Healthcare, Location, Education, Legal, AI/ML, and more |
| Easy Schema | Simple string shortcuts or full control with dicts |
| Flexible Export | CSV, JSON, SQL, Parquet, DuckDB, Excel, XML, TSV, Firebase, more |
| Zero Dependencies on Real Data | No need to anonymize or worry about PII |
| Blazing Fast | Generates thousands of records instantly |
| Extensible | Add custom providers for domain-specific fields |
You Don't Get
- No real person's data
- No need for data anonymization lawyers
- No internet calls to fake APIs
- No massive CSV files to download and commit
Installation
From PyPI (recommended)
pip install iki-data-generator
From Source
git clone https://github.com/ikidevz/IkiDataGenerator.git
cd IkiDataGenerator
pip install -e .
Requirements
- Python ≥ 3.10
- Dependencies: duckdb, pandas, pyarrow, numpy, openpyxl, bcrypt, and a few others (installed automatically)
Quick Start (60 Seconds)
The Simplest Example
from ikidatagen import IkiDataGenerator
# Define what fields you want
schema = ["first_name", "last_name", "email_address", "gender_binary"]
# Generate 100 records
IkiDataGenerator(schema).many(100).export("users")  # export() returns None; files are written to output/
Result: You now have output/users.csv and output/users.json with 100 realistic user records.
A More Realistic Example
from ikidatagen import IkiDataGenerator
schema = [
    {
        "label": "User ID",
        "key_label": "row_number",
        "options": {"blank_percentage": 0}  # No blanks for ID
    },
    "first_name",
    "last_name",
    "email_address",
    {
        "label": "Account Created",
        "key_label": "current_timestamp",
        "options": {"blank_percentage": 5}  # 5% will be blank
    },
    {
        "label": "IP Address",
        "key_label": "ip_address_v4",
        "options": {"blank_percentage": 25}  # 25% will be blank
    },
    {
        "label": "Full Profile",
        "key_label": "template",
        "options": {
            "template": "{{first_name}} {{last_name}} ({{email_address}})"
        }
    },
]
# Generate 500 records and save to both CSV and JSON
IkiDataGenerator(schema).many(500).export("users", formats=["csv", "json"])
Result: output/users.csv and output/users.json with 500 complete user records, ready to use.
Schema Definition
The schema is the heart of Iki Data Generator. It tells the library what fields to generate.
Schema Entry Types
1. Simple String (Shorthand)
schema = ["first_name", "last_name", "email_address"]
# Generates fields with default settings, no options
2. Full Control (Dict)
schema = [
    {
        "key_label": "email_address",        # Required: which provider to use
        "label": "Email",                    # Optional: output column name (defaults to key_label)
        "group": "personal",                 # Optional: provider category (auto-resolved if omitted)
        "options": {"blank_percentage": 10}  # Optional: provider-specific config
    }
]
Key Parameters
| Parameter | Required? | Description |
|---|---|---|
| key_label | Yes | The provider name (e.g., first_name, credit_card_number) |
| label | No | How to name the output column (defaults to key_label) |
| group | No | Provider category (auto-resolved from registry; override if needed) |
| options | No | Provider-specific settings (e.g., blank_percentage, template) |
Available Options (Common)
| Option | Type | Example | Effect |
|---|---|---|---|
| blank_percentage | int/float | 10 | Percentage of records where field is empty |
| template | str | "{{first_name}} {{last_name}}" | Combine fields with template syntax |
| pattern | str | "[A-Z]{3}-\d{4}" | Regex pattern (for regular_expression provider) |
| min, max | int/float | min=0, max=100 | Range for numeric fields |
22 Data Categories
Iki Data Generator organizes 700+ fields into 22 categories. Here's a quick overview:
Personal (Name, Gender, Passport, etc.)
first_name, last_name, middle_name, gender_binary, gender_spectrum, title, passport_number, ssn, nationality, etc.
Finance (Credit Cards, Banking, Currency)
credit_card_number, credit_card_type, iban, bban, currency, currency_code, money, salary_range, stock_market, etc.
Commerce (Products, Orders, Pricing)
product_name, product_category, product_price, barcode_ean13, order_status, payment_method, invoice_number, delivery_status, coupon_code, etc.
Communication (Email, Phone, Social)
email_address, phone_number, username, social_media_handle, chat_message, contact_name, etc.
Construction (Building, Materials, Codes)
construction_code, building_type, material_type, foundation_type, roof_type, door_type, etc.
Tech/IT (Programming, Frameworks, Version)
programming_language, software_framework, version_number, log_level, http_status_code, file_extension, mime_type, etc.
Healthcare (Diseases, Medications, Medical Codes)
disease_name, symptom_name, medication_name, blood_type, vaccination_status, ICD10_diagnosis, ICD9_diagnosis, HCPCS_code, etc.
Location (Countries, Cities, Addresses)
country, state, city, street_address, postal_code, latitude, longitude, timezone, airport_code, etc.
Education (Schools, Courses, Subjects)
university_name, degree, major, course_name, subject, educational_attainment, etc.
Legal (Laws, Contracts, Jurisdictions)
legal_entity_type, contract_type, jurisdiction, court_type, legal_case_status, legal_term, etc.
Entertainment (Movies, Books, Games)
movie_title, movie_genre, book_title, author_name, video_game_title, music_genre, song_title, etc.
Nature (Plants, Animals, Weather)
plant_name, animal_name, tree_name, flower_name, weather_condition, season, biome, etc.
Automotive (Cars, VINs, Fuel)
car_make, car_model, car_vin, vehicle_type, license_plate, engine_type, fuel_type, transmission_type, etc.
Cryptocurrency (Coins, Blockchain, Wallets)
crypto_currency, crypto_address, crypto_transaction_id, blockchain_type, smart_contract_language, etc.
Gaming (Characters, Items, Guilds)
character_class, character_race, game_genre, npc_name, item_type, quest_name, guild_name, etc.
Music (Artists, Albums, Genres)
artist_name, album_name, song_title, music_genre, instrument_name, music_production_software, etc.
Marketing/Media (Campaigns, Analytics, Content)
campaign_name, social_media_platform, marketing_channel, content_type, target_audience, analytics_metric, etc.
Political (Countries, Parties, Elections)
political_party, political_ideology, election_type, government_structure, diplomatic_title, etc.
Advanced (Templates, Regex, Lambdas)
template (combine fields), regular_expression (match patterns), lambda (custom Python), json_array, url, digit_sequence, character_sequence, etc.
AI/ML (Models, Metrics, Training)
model_type, model_framework, model_task, model_version, model_latency, model_confidence, cpu_utilization, gpu_utilization, data_drift_score, inference_result, etc.
Basic (Utilities, Random, Generators)
row_number, blank, boolean, number, datetime, color, emoji, password, password_hash, isbn, ulid, sentiment, words, sentences, paragraphs, etc.
Miscellaneous (Random & Fun)
dice_roll, coin_flip, rating, frequency, priority_level, dimension, duration, height, weight, temperature, etc.
Export Formats
Generate data in your preferred format:
# Export to multiple formats at once
IkiDataGenerator(schema).many(1000).export("dataset", formats=[
    "csv",      # Comma-separated values
    "json",     # JSON array of objects
    "sql",      # SQL INSERT statements
    "parquet",  # Apache Parquet (columnar)
    "excel",    # Excel workbook (.xlsx)
    "duckdb",   # DuckDB database
    # "tsv", "xml", "cql", "firebase", "dbunit" also supported
])
| Format | File Extension | Best For | Notes |
|---|---|---|---|
| CSV | .csv | Spreadsheets, import tools | Universal format |
| JSON | .json | APIs, JavaScript, NoSQL | Pretty-printed with indent=2 |
| SQL | .sql | Databases | INSERT statements (specify table name) |
| TSV | .tsv | Tab-delimited data | Alternative to CSV |
| Excel | .xlsx | Business reports | Native Excel format |
| Parquet | .parquet | Big Data, Pandas, BI tools | Efficient columnar storage |
| DuckDB | .duckdb | Analytics, SQL queries | Embedded database |
| XML | .xml | Legacy systems, config | Structured XML export |
| Firestore | .json | Firebase/Firestore | Firebase-ready format |
| DBUnit | .xml | Testing frameworks | DBUnit test data format |
| CQL | .cql | Cassandra databases | CQL INSERT statements |
API Reference
IkiDataGenerator(schema)
Initialize the generator with a schema.
Parameters:
- schema (list): List of field names (strings) or field configs (dicts)
Returns: IkiDataGenerator instance
gen = IkiDataGenerator(["first_name", "email_address"])
.many(n)
Generate n records.
Parameters:
- n (int): Number of records to generate
Returns: BaseGenerator instance
records = gen.many(100)
.export(name, formats=None)
Export records to file(s).
Parameters:
- name (str): Output filename (without extension)
- formats (list, optional): File formats to export. Defaults to ["csv", "json"]
Returns: None (files saved to output/ folder)
gen.many(100).export("users", formats=["csv", "json", "sql"])
# Creates: output/users.csv, output/users.json, output/users.sql
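Because export() returns None, a script or test that needs to know which files were written has to work out the paths itself. The sketch below is one way to do that, assuming each file is named output/<name><extension> as in the comment above. expected_paths and EXTENSIONS are hypothetical names, not part of the library, and the extension map is copied from the Export Formats table.

```python
from pathlib import Path

# Extensions taken from the Export Formats table. "firebase" and "dbunit"
# are left out because their exact output file names are not documented here.
EXTENSIONS = {
    "csv": ".csv", "json": ".json", "sql": ".sql", "tsv": ".tsv",
    "excel": ".xlsx", "parquet": ".parquet", "duckdb": ".duckdb",
    "xml": ".xml", "cql": ".cql",
}


def expected_paths(name, formats=("csv", "json"), out_dir="output"):
    """Return the paths an export is expected to create (assumed naming)."""
    return [Path(out_dir) / f"{name}{EXTENSIONS[fmt]}" for fmt in formats]


paths = expected_paths("users", ["csv", "json", "sql"])
# After running the export, anything listed here was not written.
missing = [p for p in paths if not p.exists()]
```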
KEY_LABEL_REGISTRY
Global dictionary mapping all 700+ field names to their categories.
from ikidatagen import KEY_LABEL_REGISTRY
print(KEY_LABEL_REGISTRY["email_address"]) # → "personal"
print(KEY_LABEL_REGISTRY["credit_card_number"]) # → "commerce"
ProviderFactory
Advanced: dynamically load providers.
from ikidatagen import ProviderFactory
provider = ProviderFactory.create("email_address")
email = provider.generate()
Advanced Examples
Example 1: E-Commerce Dataset with Templates
from ikidatagen import IkiDataGenerator
schema = [
{"key_label": "row_number", "label": "Order ID"},
{"key_label": "current_timestamp", "label": "Created At"},
{"key_label": "customer_name", "label": "Customer"},
{"key_label": "email_address", "label": "Email"},
{"key_label": "product_name", "label": "Product"},
{"key_label": "product_price", "label": "Price"},
{
"key_label": "template",
"label": "Description",
"options": {
"template": "Order for {{product_name}} by {{customer_name}} ({{email_address}})"
}
},
{"key_label": "order_status", "label": "Status"},
{
"key_label": "ip_address_v4",
"label": "IP",
"options": {"blank_percentage": 20} # 20% missing
}
]
IkiDataGenerator(schema).many(1000).export("orders", formats=["csv", "json"])
Example 2: Healthcare Records
from ikidatagen import IkiDataGenerator
schema = [
"row_number",
"first_name",
"last_name",
"date_of_birth",
"blood_type",
"disease_name",
"medication_name",
"icd10_diagnosis",
{
"key_label": "current_timestamp",
"label": "last_visit",
"options": {"blank_percentage": 10}
},
]
IkiDataGenerator(schema).many(500).export("patients", formats=["csv", "json", "sql"])
Example 3: Test Data with Blanks and Validation
from ikidatagen import IkiDataGenerator
schema = [
    {"key_label": "username", "options": {"blank_percentage": 0}},       # No blanks
    {"key_label": "email_address", "options": {"blank_percentage": 0}},  # No blanks
    {
        "key_label": "phone_number",
        "options": {"blank_percentage": 30}  # 30% missing phones
    },
    {
        "key_label": "address_line_1",
        "options": {"blank_percentage": 5}
    },
]
IkiDataGenerator(schema).many(10000).export("test_users", formats=["json"])  # export() returns None
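To validate an export like this, one option is to load the JSON file and measure the blank rate per column. The sketch below assumes that a blank is written as either null or an empty string, which this README does not specify. is_blank, blank_rate, and load_records are hypothetical helpers, and a small in-memory sample stands in for the exported file.

```python
import json
from pathlib import Path


def is_blank(value):
    # Assumption: a blank is exported as null or as an empty string.
    return value is None or value == ""


def blank_rate(records, column):
    """Fraction of records whose `column` is blank (0.0 for no records)."""
    if not records:
        return 0.0
    return sum(is_blank(r.get(column)) for r in records) / len(records)


def load_records(path="output/test_users.json"):
    """Load an exported JSON array of objects."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Small in-memory sample standing in for load_records().
sample = [
    {"username": "a", "phone_number": ""},
    {"username": "b", "phone_number": "555-0100"},
    {"username": "c", "phone_number": None},
    {"username": "d", "phone_number": "555-0101"},
]
assert blank_rate(sample, "username") == 0.0  # required field: no blanks allowed
phone_rate = blank_rate(sample, "phone_number")
```

On real output, a check such as abs(blank_rate(records, "phone_number") - 0.30) < 0.05 allows for random variation around the configured percentage.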
Example 4: AI/ML Metrics Dataset
from ikidatagen import IkiDataGenerator
schema = [
"row_number",
"model_type",
"model_framework",
"model_task",
"model_version",
"model_latency",
"model_confidence",
"cpu_utilization",
"gpu_utilization",
"memory_footprint",
"inference_result",
"inference_endpoint",
"current_timestamp",
]
IkiDataGenerator(schema).many(5000).export("ml_metrics", formats=["parquet", "json"])
Example 5: Custom Schema with Explicit Groups
from ikidatagen import IkiDataGenerator
schema = [
{"key_label": "username", "group": "personal"},
{"key_label": "email_address", "group": "personal"},
{"key_label": "product_name", "group": "commerce"},
{"key_label": "currency", "group": "commerce"},
{
"key_label": "regular_expression",
"group": "advanced",
"label": "Custom Pattern",
"options": {"pattern": "[A-Z]{2}[0-9]{4}"}
}
]
IkiDataGenerator(schema).many(100).export("mixed_data")
Configuration & Options
Blank Percentage
Control how many records have empty values for a field:
{
"key_label": "phone_number",
"options": {"blank_percentage": 25} # 25% of records will have empty phone
}
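To make the effect of blank_percentage concrete, here is a small stand-alone simulation. It is only a sketch of the idea: apply_blank_percentage is a hypothetical function, it blanks each value independently (so the realised share varies slightly), and it uses an empty string as the blank marker. The library may do any of these differently.

```python
import random


def apply_blank_percentage(values, blank_percentage, rng=None):
    """Blank out roughly `blank_percentage` percent of `values`.

    Illustration only; not the library's implementation.
    """
    if not 0 <= blank_percentage <= 100:
        raise ValueError("blank_percentage must be between 0 and 100")
    rng = rng or random.Random()
    # rng.random() is in [0, 1), so 0 never blanks and 100 always blanks.
    return ["" if rng.random() * 100 < blank_percentage else v for v in values]


phones = [f"555-{n:04d}" for n in range(10_000)]
blanked = apply_blank_percentage(phones, 25, random.Random(0))
share = blanked.count("") / len(blanked)  # close to 0.25
```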
Templates
Combine fields with {{field_name}} syntax:
{
"key_label": "template",
"options": {
"template": "Full Name: {{first_name}} {{last_name}}, Email: {{email_address}}"
}
}
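The substitution itself can be pictured with a few lines of plain Python. This is a sketch of the {{field_name}} idea rather than the library's template engine: render_template is a hypothetical function, and leaving unknown placeholders untouched is a choice made here so that typos are easy to spot.

```python
import re

# Matches {{field_name}}, allowing optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_template(template, record):
    """Replace each {{field}} with the matching value from `record`."""
    def substitute(match):
        field = match.group(1)
        # Unknown fields are left as-is instead of raising.
        return str(record[field]) if field in record else match.group(0)

    return PLACEHOLDER.sub(substitute, template)


record = {
    "first_name": "Ada",
    "last_name": "Lovelace",
    "email_address": "ada@example.com",
}
rendered = render_template(
    "Full Name: {{first_name}} {{last_name}}, Email: {{email_address}}", record
)
```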
Regular Expressions
Generate data matching a pattern:
{
"key_label": "regular_expression",
"options": {
"pattern": "[A-Z]{3}-[0-9]{5}" # Generates: ABC-12345
}
}
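A quick way to confirm that generated values really follow the pattern is to check them with re.fullmatch from the standard library. The sketch below uses hand-written sample values, and invalid_values is a hypothetical helper.

```python
import re

PATTERN = re.compile(r"[A-Z]{3}-[0-9]{5}")


def invalid_values(values, pattern=PATTERN):
    """Return the values that do not match the whole pattern."""
    return [v for v in values if not pattern.fullmatch(v)]


# Hand-written samples; real values would come from an exported file.
samples = ["ABC-12345", "XYZ-00001", "abc-12345", "ABC-123456"]
bad = invalid_values(samples)
```

fullmatch is used instead of match so that values with trailing characters, such as "ABC-123456", are rejected.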
Custom List
Pick from a list of values:
{
"key_label": "custom_list",
"options": {
"values": ["Active", "Inactive", "Pending"]
}
}
Number Range
Generate numbers within a range:
{
"key_label": "number",
"options": {
"min": 0,
"max": 100
}
}
Examples & Demonstrations
The examples/ folder contains 45+ ready-to-run scripts demonstrating all features and providers across 22 categories.
Quick Start Examples
# Run the absolute simplest example
python examples/00_quick_start.py
# Explore basic fields
python examples/01_basic_fields.py
# Test all export formats
python examples/02_export_formats.py
Run Examples by Category
Personal & Identity (3 examples)
python examples/10_personal_data.py # Names, gender, dates
python examples/11_contact_info.py # Email, phone, social
python examples/12_identity_documents.py # Passports, SSN, IDs
E-Commerce & Shopping (4 examples)
python examples/20_ecommerce_shop.py # Products with pricing
python examples/21_shopping_cart.py # Complete orders
python examples/22_inventory_management.py # Stock & inventory
python examples/23_payment_processing.py # Payments & invoices
Finance & Banking (5 examples)
python examples/30_bank_accounts.py # Bank accounts
python examples/31_credit_cards.py # Credit card data
python examples/32_transactions.py # Transfers & withdrawals
python examples/33_investment_portfolio.py # Stocks & investments
python examples/34_crypto_blockchain.py # Cryptocurrency wallets
Healthcare & Medical (3 examples)
python examples/40_patient_records.py # Patient demographics
python examples/41_medical_diagnosis.py # Diagnoses & ICD codes
python examples/42_medications.py # Prescriptions & dosages
Location & Geography (2 examples)
python examples/50_addresses.py # Addresses & coordinates
python examples/51_international_locations.py # Countries & cities
Education (1 example)
python examples/60_student_records.py # Students & enrollment
Automotive (1 example)
python examples/70_car_inventory.py # Cars, models, pricing
Entertainment & Gaming (2 examples)
python examples/90_gaming_players.py # Gaming characters & guilds
python examples/92_entertainment.py # Movies, books, music
Tech & Programming (3 examples)
python examples/100_programming_data.py # Languages & frameworks
python examples/110_ml_models.py # ML model metadata
python examples/111_ml_metrics.py # Model performance metrics
Advanced Features (3 examples)
python examples/200_templates.py # Combining fields with {{field}}
python examples/201_regex_patterns.py # Custom regex patterns
python examples/203_blank_percentages.py # Missing data simulation
Real-World Scenarios (7 complete systems)
python examples/300_saas_users.py # SaaS with subscriptions
python examples/301_social_network.py # Social media platform
python examples/302_analytics_events.py # Event tracking (5000 events)
python examples/303_ecommerce_platform.py # Complete e-commerce
python examples/304_travel_booking_system.py # Flights, hotels, bookings
python examples/308_hospital_system.py # Hospital management
python examples/309_school_system.py # University system
Batch & Large Datasets (3 examples)
python examples/400_mixed_categories.py # Multiple categories mixed
python examples/401_batch_processing.py # Batch generate 4 datasets
python examples/403_large_dataset.py # Generate 1M+ records
Specialized Use Cases (4 examples)
python examples/500_test_data_unit_tests.py # Unit test fixtures
python examples/501_load_testing_data.py # Load testing (100K events)
python examples/502_demo_data.py # Demo/presentation data
python examples/504_api_response_mocking.py # Mock API responses
Complete Feature Showcase
python examples/999_showcase_all_features.py # All 12 features in one!
Run All Examples at Once
To generate data from all 45+ examples in one command:
# Generate everything
for file in examples/[0-9]*.py; do
    echo "Running $file..."
    python "$file"
done
Or on Windows (PowerShell):
Get-ChildItem examples\[0-9]*.py | ForEach-Object {
    Write-Host "Running $($_.Name)..."
    python $_.FullName
}
View Generated Output
All examples save data to the output/ folder:
output/
├── quick_start.csv
├── quick_start.json
├── personal_data.csv
├── ecommerce_products.parquet
├── medical_diagnosis.json
├── ml_metrics.parquet
├── large_dataset.parquet
└── ... (40+ more files)
Learning Path
Beginner: Start with simple examples and work up
00_quick_start → 01_basic_fields → 02_export_formats → 10_personal_data → 20_ecommerce_shop
Intermediate: Explore categories and features
30_bank_accounts → 40_patient_records → 50_addresses → 200_templates → 201_regex_patterns
Advanced: Complex real-world systems and large datasets
300_saas_users → 303_ecommerce_platform → 308_hospital_system → 400_mixed_categories → 403_large_dataset
Examples Summary
| Category | Examples | Records | Topics |
|---|---|---|---|
| Getting Started | 3 | 50-100 | Basics, fields, formats |
| Personal | 3 | 50-300 | Names, IDs, documents |
| Commerce | 4 | 300-2000 | Products, orders, inventory |
| Finance | 5 | 200-1000 | Banking, cards, stocks, crypto |
| Healthcare | 3 | 300-500 | Patients, diagnoses, meds |
| Location | 2 | 500 | Addresses, coordinates, countries |
| Education | 1 | 400 | Students, courses, degrees |
| Automotive | 1 | 600 | Cars, models, registration |
| Entertainment | 2 | 500-2000 | Gaming, movies, books, music |
| Tech | 3 | 200-5000 | Languages, frameworks, ML |
| Advanced | 3 | 50-500 | Templates, regex, blanks |
| Real-World | 7 | 300-5000 | Complete systems |
| Batch/Large | 3 | 100K-1M | Performance, scale |
| Specialized | 4 | 50-100K | Testing, mocking, load tests |
| Total | 45+ | 50 to 1M+ | All features |
Modify Examples for Your Needs
All examples are templates; feel free to copy and modify:
# Copy an example as a starting point
cp examples/20_ecommerce_shop.py my_custom_dataset.py
# Edit and run your custom version
python my_custom_dataset.py
Example Structure
Every example follows this simple pattern:
from ikidatagen import IkiDataGenerator
# 1. Define schema
schema = [
"first_name",
"last_name",
"email_address",
# ... more fields
]
# 2. Generate data
IkiDataGenerator(schema).many(100).export("my_data", formats=["csv", "json"])
# 3. Check output/ folder
Project Structure
Iki-Data-Generator/
├── examples/                          # 45+ example scripts
│   ├── README.md                      # Examples guide
│   ├── 00_quick_start.py              # Simplest example
│   ├── 01_basic_fields.py
│   ├── 20_ecommerce_shop.py
│   ├── 300_saas_users.py
│   ├── 403_large_dataset.py
│   └── 999_showcase_all_features.py   # All features!
├── src/ikidatagen/                    # Main package
│   ├── __init__.py                    # Public API
│   ├── core.py                        # Main IkiDataGenerator class
│   ├── base_generator.py              # Data generation logic
│   ├── exporters.py                   # Export to CSV, JSON, SQL, etc.
│   ├── provider_factory.py            # Dynamic provider loading
│   ├── schema_registry.py             # Maps field names to categories
│   ├── payload.py                     # Data payload handling
│   ├── dataset_manager.py             # Dataset management
│   ├── external_datasets/             # External data files
│   │   ├── csv/                       # 30+ CSV files (countries, airlines, etc.)
│   │   └── json/                      # 25+ JSON files (advanced data)
│   └── providers/                     # Data providers (700+ fields)
│       ├── advanced/                  # Template, Regex, Lambda, etc.
│       ├── ai/                        # ML/AI metrics
│       ├── basic/                     # Names, dates, colors, etc.
│       ├── car/                       # Vehicle data
│       ├── commerce/                  # Products, orders, payments
│       ├── communication/             # Email, phone, social
│       ├── construction/              # Building codes, materials
│       ├── crypto/                    # Cryptocurrency data
│       ├── education/                 # Schools, degrees, subjects
│       ├── finance/                   # Credit cards, banking
│       ├── gaming/                    # Characters, items, guilds
│       ├── health/                    # Medical codes, symptoms
│       ├── it/                        # Programming, frameworks
│       ├── legal/                     # Laws, contracts
│       ├── location/                  # Countries, cities, addresses
│       ├── marketing/                 # Campaigns, channels
│       ├── misc/                      # Miscellaneous data
│       ├── music/                     # Artists, albums, genres
│       ├── nature/                    # Plants, animals, weather
│       ├── personal/                  # Names, gender, documents
│       ├── political/                 # Parties, elections
│       ├── products/                  # Product categories
│       ├── sports/                    # Athletes, teams, leagues
│       └── travel/                    # Airlines, hotels, destinations
├── output/                            # Generated data (CSV, JSON, etc.)
├── main.py                            # Example usage
├── pyproject.toml                     # Package metadata
├── requirements.txt                   # Dependencies
└── README.md                          # This file
How It Works (Behind the Scenes)
- Schema Parsing: You provide a list of fields (strings or dicts)
- Provider Resolution: Each field name is looked up in KEY_LABEL_REGISTRY to find its category
- Dynamic Loading: The appropriate provider class is loaded from providers/{category}/{field}.py
- Generation: Each provider generates realistic data for n records
- Template Processing: Template fields combine other fields using {{field}} syntax
- Blank Handling: Records marked for blanks are cleared based on blank_percentage
- Export: Data is serialized to your chosen format(s) and saved to output/
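The steps above can be sketched as a toy pipeline. Everything in this block is a stand-in written for illustration: TOY_REGISTRY, TOY_PROVIDERS, generate, and to_csv are hypothetical, the two providers just pick from short lists, and the CSV is written to a string rather than to output/. The real library's internals may be organised differently.

```python
import csv
import io
import random
import re

# Toy stand-ins for KEY_LABEL_REGISTRY and the provider classes.
TOY_REGISTRY = {"first_name": "personal", "city": "location"}
TOY_PROVIDERS = {
    ("personal", "first_name"): lambda rng: rng.choice(["Ada", "Grace", "Linus"]),
    ("location", "city"): lambda rng: rng.choice(["Oslo", "Lima", "Kyoto"]),
}
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def generate(schema, n, seed=0):
    rng = random.Random(seed)
    # Schema parsing: string shorthand becomes a dict entry.
    entries = [{"key_label": e} if isinstance(e, str) else e for e in schema]
    rows = []
    for _ in range(n):
        # Provider resolution, loading, and generation (keyed by key_label).
        values = {}
        for e in entries:
            key = e["key_label"]
            if key == "template":
                continue
            if key not in TOY_REGISTRY:
                raise KeyError(f"Unknown key_label '{key}'")
            values[key] = TOY_PROVIDERS[(TOY_REGISTRY[key], key)](rng)
        row = {}
        for e in entries:
            key, options = e["key_label"], e.get("options", {})
            if key == "template":
                # Template processing: fill {{field}} from generated values.
                value = PLACEHOLDER.sub(
                    lambda m: str(values.get(m.group(1), m.group(0))),
                    options["template"],
                )
            else:
                value = values[key]
            # Blank handling: clear the value with the configured probability.
            if rng.random() * 100 < options.get("blank_percentage", 0):
                value = ""
            row[e.get("label", key)] = value
        rows.append(row)
    return rows


def to_csv(rows):
    # Export: serialise the rows (to a string here instead of a file).
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


schema = [
    "first_name",
    {"key_label": "city", "label": "City"},
    {"key_label": "template", "label": "Summary",
     "options": {"template": "{{first_name}} from {{city}}"}},
]
rows = generate(schema, 3)
csv_text = to_csv(rows)
```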
Common Issues & Solutions
"Unknown key_label 'xxx'"
Problem: You used a field name that doesn't exist.
Solution: Check KEY_LABEL_REGISTRY or review the 22 categories above. Did you spell it correctly? (Use underscores, lowercase.)
# Wrong
schema = ["firstName"] # camelCase? No!
# Correct
schema = ["first_name"] # snake_case? Yes!
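When a name is rejected, the standard library's difflib can suggest the closest valid names. The sketch below works on a toy list of field names; with the library installed, the keys of KEY_LABEL_REGISTRY could be passed instead. suggest is a hypothetical helper, not a library function.

```python
import difflib
import re

# Toy subset of field names used only for this illustration.
KNOWN_FIELDS = ["first_name", "last_name", "email_address", "phone_number"]


def suggest(key_label, known=KNOWN_FIELDS):
    """Return up to three known field names close to `key_label`."""
    # Convert camelCase to snake_case first: "firstName" -> "first_name".
    candidate = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", key_label).lower()
    return difflib.get_close_matches(candidate, known, n=3, cutoff=0.6)


camel_case_fix = suggest("firstName")
typo_fix = suggest("emial_address")
```

The first element of each result is the best match, so suggest("firstName")[0] is a reasonable "did you mean" candidate.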
"No data to export"
Problem: .many(0) or empty schema.
Solution: Generate at least 1 record and make sure the schema lists at least one field.
# Wrong
IkiDataGenerator(schema).many(0).export("data")
# Correct
IkiDataGenerator(schema).many(100).export("data")
Export folder not found
Problem: output/ directory doesn't exist.
Solution: The library creates it automatically. Make sure you have write permissions.
Template field not rendering
Problem: {{field_name}} not being replaced.
Solution: Ensure the referenced field exists in your schema and the spelling matches exactly.
# Wrong
{
    "key_label": "template",
    "options": {"template": "Name: {{first_name}} {{FirstName}}"}  # FirstName ≠ first_name
}
# Correct
{
    "key_label": "template",
    "options": {"template": "Name: {{first_name}} {{last_name}}"}
}
Performance Tips
Generating Large Datasets
- Use Parquet or DuckDB formats for large datasets (smaller file sizes, faster I/O)
- DuckDB is perfect for immediate querying: import duckdb; con = duckdb.connect("output/data.duckdb"); con.sql("SHOW TABLES") lists the tables you can then query
- For 1M+ records, generate in batches to manage memory
# Generate in chunks
for i in range(10):
    IkiDataGenerator(schema).many(100_000).export(f"chunk_{i}")
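When the total is not a clean multiple of the chunk size, the chunk sizes have to be worked out first. A minimal sketch, with batch_sizes as a hypothetical helper; the commented loop shows how it could drive the same .many(...).export(...) calls used above.

```python
def batch_sizes(total, batch_size):
    """Yield chunk sizes that sum to `total`, each at most `batch_size`."""
    if total < 0 or batch_size <= 0:
        raise ValueError("total must be >= 0 and batch_size must be > 0")
    full, remainder = divmod(total, batch_size)
    yield from [batch_size] * full
    if remainder:
        yield remainder  # final, smaller chunk


sizes = list(batch_sizes(1_050_000, 100_000))  # ten full chunks plus one of 50_000

# With the library installed:
# for i, size in enumerate(batch_sizes(1_050_000, 100_000)):
#     IkiDataGenerator(schema).many(size).export(f"chunk_{i}", formats=["parquet"])
```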
Field Selection
- Only include fields you need (reduces generation time)
- Skip fields with expensive generation (e.g., password_hash)
Export Efficiency
# Smart exports
IkiDataGenerator(schema).many(1_000_000).export("big_data", formats=["parquet"])
# Avoid exporting to many formats at once
# IkiDataGenerator(schema).many(1_000_000).export("data", formats=["csv", "json", "sql", "excel"])
Contributing
Have ideas? Want to add new providers or categories? Open a PR!
- New Provider: Add a file to src/ikidatagen/providers/{category}/{field_name}.py
- New Category: Create a folder in providers/ and add your providers
- Update Registry: Edit schema_registry.py to register new fields
- Tests: Add tests for new providers
License
MIT License: use it freely in personal and commercial projects.
Links & Resources
- GitHub: github.com/ikidevz/IkiDataGenerator
- Issues: Report bugs or request features
- PyPI: pypi.org/project/iki-data-generator
FAQ
Q: Can I use this data for production?
A: This is synthetic data, well suited to development, testing, and demos. For production, consider anonymizing real data or using this as a base.
Q: Can I extend it with custom fields?
A: Yes! Create a custom provider class in providers/{your_category}/ and register it in KEY_LABEL_REGISTRY.
Q: What's the difference between blank_percentage and nullable?
A: The library uses blank_percentage (0 to 100) rather than a nullable flag; it sets the percentage of records that have an empty value for a field.
Q: How do I query generated data?
A: Export to DuckDB, then query with SQL:
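import duckdb
con = duckdb.connect("output/users.duckdb", read_only=True)  # open the exported database file
print(con.sql("SHOW TABLES").fetchall())  # find the table name the export created
# results = con.sql("SELECT * FROM users WHERE age > 25").fetchall()  # assumes a table "users" with an "age" column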
Q: Can I regenerate the exact same data?
A: Not yet. Each run generates different data. (Seed support is planned for future releases.)
Q: What if I need a field that doesn't exist?
A: Use the lambda provider for custom logic:
import random  # needed by the lambda below

{
    "key_label": "lambda",
    "options": {
        "function": lambda: f"CUSTOM_{random.randint(1000, 9999)}"
    }
}
Roadmap
- Seed support for reproducible datasets
- Foreign key support for relational data
- Better performance for 100M+ records
- AI-powered schema suggestions
- GUI for schema builder
- Dataset profiling and statistics
Thanks
Built with ❤️ for developers who hate dummy data.
Happy generating! 🎲
Last updated: June 2026