Datacrafter — AI-based, schema-driven synthetic data generator with a plugin architecture.
Datacrafter
AI-powered, schema-driven synthetic data generation platform.
Design datasets using YAML or generate them using natural language, and produce realistic data in CSV / JSON / JSONL / XML / Parquet formats.
✨ Key Highlights
- Schema-driven generation using YAML
- AI-powered schema creation from natural language prompts
- Formula engine for dynamic, cross-field computations
- Deterministic output using seed control
- Multiple formats: CSV, JSON, JSONL, XML, Parquet
- Plugin architecture for extensibility
- CLI + Python API
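To illustrate what seed control means in practice, here is a minimal sketch using only the Python standard library. It does not use Datacrafter's API; `generate_rows` and its value ranges are hypothetical and only show the general idea that a fixed seed reproduces the same rows.

```python
import random


def generate_rows(seed: int, n: int) -> list[dict]:
    """Generate n rows from a private RNG so the same seed gives the same rows."""
    rng = random.Random(seed)  # isolated generator; global random state is untouched
    return [
        {"price": round(rng.uniform(1, 100), 2), "quantity": rng.randint(1, 10)}
        for _ in range(n)
    ]


first = generate_rows(seed=42, n=5)
second = generate_rows(seed=42, n=5)
print(first == second)  # True: identical seed, identical rows
```

Keeping the generator local to the function, rather than calling `random.seed`, means unrelated code that uses the global generator cannot disturb the sequence.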
🤖 AI Schema Generation
Generate dataset schemas directly from prompts and save to a file:
datacrafter ai --prompt "
Generate a banking transactions dataset with:
transaction_id (uuid),
account_number (integer between 100000000000 and 999999999999),
transaction_type (categorical: Debit, Credit, Withdrawal, Deposit, Transfer),
amount (float between 50 and 50000),
currency (categorical: USD, EUR, GBP, INR),
timestamp (datetime between 2023-01-01 and 2025-12-31 in '%Y-%m-%d %H:%M:%S'),
merchant_name (categorical: Amazon, Walmart, Starbucks, Uber, Apple Store, Shell Fuel Station, Best Buy, ATM Withdrawal, Bank Transfer).
Output format: xml.
Return ONLY valid Datacrafter YAML schema.
" --out examples/banking_transactions.yaml
✔ The AI-generated schema is saved to the file passed to --out.
🧮 Formula Engine
Create dynamic fields using expressions:
total_price:
  type: formula
  expr: "price * quantity"
Supports:
- Arithmetic operations
- Comparisons and boolean logic
- Ternary expressions
- String concatenation
- Cross-field access
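The capabilities listed above can be pictured with a small expression evaluator. This is a sketch under stated assumptions, not Datacrafter's formula engine: `evaluate` is a hypothetical helper, it borrows Python's own expression syntax (including `a if cond else b` for ternaries, which may differ from Datacrafter's syntax), and it walks the parsed tree instead of calling `eval` so that only whitelisted operations run.

```python
import ast
import operator

# Operators this sketch allows; anything else is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_CMP_OPS = {
    ast.Gt: operator.gt,
    ast.Lt: operator.lt,
    ast.GtE: operator.ge,
    ast.LtE: operator.le,
    ast.Eq: operator.eq,
    ast.NotEq: operator.ne,
}


def evaluate(expr: str, row: dict):
    """Evaluate a small expression against the fields of one row."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name):
            return row[node.id]  # cross-field access by field name
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if (
            isinstance(node, ast.Compare)
            and len(node.ops) == 1
            and type(node.ops[0]) in _CMP_OPS
        ):
            return _CMP_OPS[type(node.ops[0])](
                walk(node.left), walk(node.comparators[0])
            )
        if isinstance(node, ast.BoolOp):
            # Simplification: always returns a bool, unlike Python's and/or.
            values = [walk(v) for v in node.values]
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.IfExp):
            return walk(node.body) if walk(node.test) else walk(node.orelse)
        raise ValueError(f"unsupported expression element: {type(node).__name__}")

    return walk(ast.parse(expr, mode="eval"))


row = {"price": 2.5, "quantity": 4, "first": "Ada", "last": "Lovelace"}
print(evaluate("price * quantity", row))                       # 10.0
print(evaluate("'bulk' if quantity >= 4 else 'single'", row))  # bulk
print(evaluate("first + ' ' + last", row))                     # Ada Lovelace
```

Function calls and attribute access are not in the whitelist, so an expression such as `__import__('os')` raises `ValueError` instead of executing.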
📦 Installation
pip install datacrafter-ai
Requirements: Python 3.9+
🚀 Quickstart
1. Create Schema
version: 1
rows: 10
fields:
  price:
    type: float
  quantity:
    type: integer
  total:
    type: formula
    expr: "price * quantity"
output:
  format: csv
  path: ./output/data.csv
2. Generate Data
datacrafter generate --schema schema.yaml
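For readers who want to see what the quickstart schema describes, the following standard-library sketch produces a comparable CSV. It does not call Datacrafter: the schema is rewritten by hand as a dict, `build_rows` and `to_csv` are hypothetical helpers, the numeric ranges are assumptions (the schema above does not state any), and only the single `price * quantity` formula is handled.

```python
import csv
import io
import random

# Mirrors the quickstart schema as a plain dict (the YAML file is not parsed here).
schema = {
    "rows": 10,
    "fields": {
        "price": {"type": "float"},
        "quantity": {"type": "integer"},
        "total": {"type": "formula", "expr": "price * quantity"},
    },
}


def build_rows(schema: dict, seed: int = 0) -> list[dict]:
    """Build rows field by field, in schema order, so formulas see earlier fields."""
    rng = random.Random(seed)
    rows = []
    for _ in range(schema["rows"]):
        row = {}
        for name, spec in schema["fields"].items():
            if spec["type"] == "float":
                row[name] = round(rng.uniform(0, 100), 2)  # assumed range
            elif spec["type"] == "integer":
                row[name] = rng.randint(1, 10)  # assumed range
            elif spec["type"] == "formula":
                # Only the quickstart formula is handled in this sketch.
                row[name] = round(row["price"] * row["quantity"], 2)
        rows.append(row)
    return rows


def to_csv(rows: list[dict]) -> str:
    """Serialize rows to CSV text with a header taken from the first row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(build_rows(schema)))
```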
🧩 Built-in Capabilities
Providers
- uuid, id.incremental
- integer, float, boolean
- person.*, text.*, string.regex
- datetime, categorical, geo.country
- formula
Features
- Unique constraints
- Null handling
- Regex validation
- Distributions
- Templating & dependencies
- Cross-field computation
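Two of the features above, unique constraints and null handling, can be sketched generically as follows. The helpers `unique_values` and `with_nulls` are hypothetical and say nothing about how Datacrafter implements these features; they show one common approach (retry on duplicates, then blank out values at a given rate).

```python
import random


def unique_values(draw, count: int, max_attempts: int = 10_000) -> list:
    """Collect `count` distinct values from draw(), retrying on duplicates."""
    seen, out, attempts = set(), [], 0
    while len(out) < count:
        attempts += 1
        if attempts > max_attempts:
            raise RuntimeError("could not satisfy the unique constraint")
        value = draw()
        if value not in seen:
            seen.add(value)
            out.append(value)
    return out


def with_nulls(values: list, null_rate: float, rng: random.Random) -> list:
    """Replace each value with None with probability `null_rate`."""
    return [None if rng.random() < null_rate else v for v in values]


rng = random.Random(1)
ids = unique_values(lambda: rng.randint(1000, 9999), count=5)
print(len(set(ids)) == len(ids))  # True: no duplicates
print(with_nulls(ids, null_rate=0.2, rng=rng))
```

The attempt limit matters: asking for more distinct values than the value space contains would otherwise loop forever.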
🖥️ CLI Commands
datacrafter generate --schema schema.yaml
datacrafter validate --schema schema.yaml
datacrafter list providers
datacrafter list writers
datacrafter init --template minimal
datacrafter ai --prompt "..." --out schema.yaml
🔌 Extensibility
Datacrafter supports plugins for:
- Custom providers
- Custom writers
No core modification required.
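As a rough picture of how a plugin mechanism of this kind can avoid core changes, here is a registry sketch. Everything in it is hypothetical: `PROVIDERS`, `register_provider`, and the `dice.d6` provider are illustration names, not Datacrafter's plugin interface, which is not documented in this README.

```python
import random
from typing import Callable, Dict

# Hypothetical registry mapping a provider name to a value-producing function.
PROVIDERS: Dict[str, Callable[[random.Random], object]] = {}


def register_provider(name: str):
    """Decorator that adds a function to the registry under `name`."""

    def decorator(func):
        PROVIDERS[name] = func
        return func

    return decorator


@register_provider("dice.d6")
def d6(rng: random.Random) -> int:
    """Example custom provider: a six-sided die roll."""
    return rng.randint(1, 6)


rng = random.Random(7)
print(PROVIDERS["dice.d6"](rng))  # an integer from 1 to 6
```

The core only ever looks names up in the registry, so a new provider is added by importing a module that registers it.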
⚙️ AI Configuration
Datacrafter’s AI features support multiple LLM providers and require API credentials.
1. Create a .env file
Copy the example configuration:
cp .env.example .env
2. Configure your provider and model
Edit .env and choose one provider:
# Choose one provider: openrouter / openai / gemini / groq / deepseek
LLM_PROVIDER=openai
# Choose the model supported by the provider
LLM_MODEL=gpt-4
3. Add the corresponding API key
Provide ONLY the API key for your selected provider:
OPENAI_API_KEY=your_api_key_here
Examples for other providers:
OPENROUTER_API_KEY=your_key
GEMINI_API_KEY=your_key
GROQ_API_KEY=your_key
DEEPSEEK_API_KEY=your_key
4. Run AI schema generation
datacrafter ai --prompt "..." --out schema.yaml
⚠️ Important:
- AI features will NOT work without valid API credentials
- Only one provider needs to be configured
- Ensure the selected model is supported by the chosen provider
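The checklist above can be verified before running the AI command with a short script. This is a sketch, not part of Datacrafter: `check_ai_config` is a hypothetical helper, the variable names are the ones shown in this section, and the script inspects the process environment only (it does not read the .env file itself, so export the values or load them first).

```python
import os

# Provider names and API key variables as listed in this section.
API_KEY_VARS = {
    "openrouter": "OPENROUTER_API_KEY",
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "groq": "GROQ_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
}


def check_ai_config(env) -> list[str]:
    """Return a list of configuration problems; an empty list means it looks complete."""
    problems = []
    provider = env.get("LLM_PROVIDER", "").strip().lower()
    if provider not in API_KEY_VARS:
        problems.append(f"LLM_PROVIDER must be one of: {', '.join(API_KEY_VARS)}")
    elif not env.get(API_KEY_VARS[provider]):
        problems.append(f"{API_KEY_VARS[provider]} is not set")
    if not env.get("LLM_MODEL"):
        problems.append("LLM_MODEL is not set")
    return problems


for problem in check_ai_config(os.environ):
    print(problem)
```

The check cannot tell whether the chosen model is actually offered by the provider; that still has to be confirmed against the provider's own model list.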
📦 Development
python -m build
twine check dist/*
twine upload dist/*
🔒 License
MIT © 2026 Mahalakshmi Shanmuga Sundaram
🏢 About
Datacrafter is developed and maintained by DHS Tech Services.
File details
Details for the file datacrafter_ai-1.0.1.tar.gz.
File metadata
- Download URL: datacrafter_ai-1.0.1.tar.gz
- Upload date:
- Size: 29.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2e1ae04ea1e52c12aa824ddd3687531a22bfa9102c7d94311dbecfbc5c7ae64e |
| MD5 | 2ef55dc3e56d0fcdb6af640e00a08003 |
| BLAKE2b-256 | fd74beb33365085f7760d5154128c6359f74aec7cadb9bedbc0ebe4b708df964 |
File details
Details for the file datacrafter_ai-1.0.1-py3-none-any.whl.
File metadata
- Download URL: datacrafter_ai-1.0.1-py3-none-any.whl
- Upload date:
- Size: 39.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c5a4f4fe2df6b50d9982f3fde386bedb37ba964a1f07adb18b6e560398a6f9da |
| MD5 | 14937edb680ec4d2dd1b0ca1e2e7d291 |
| BLAKE2b-256 | c304e7975f2c8bd9d7ec6a0c09057834997064f84e23474b848c7868b0ba5145 |