Krita (कृत): Create synthetic datasets using LLMs from schemas
Project description
Krita
Generate synthetic datasets using LLMs from schemas. Upload to Hugging Face.
Quick Start
pip install krita
krita generate schema.yaml --output dataset.json
from krita import SyntheticDataGenerator, DataSchema, FieldType
schema = DataSchema(
name="reviews",
num_samples=100,
fields=[
{"name": "product", "type": FieldType.TITLE, "required": True},
{"name": "rating", "type": FieldType.NUMBER, "constraints": {"min": 1, "max": 5}},
{"name": "review", "type": FieldType.REVIEW, "required": True}
]
)
generator = SyntheticDataGenerator(llm_provider="openai")
data = generator.generate(schema)
Features
- Schema-driven: Define data structure with types, constraints, examples
- Multiple LLMs: OpenAI, Anthropic, custom OpenAI-compatible endpoints
- Custom endpoints: Ollama, vLLM, enterprise deployments
- Validation: Ensures data matches schema
- Hugging Face: Direct upload with metadata
- Multiple formats: JSON, CSV, Parquet output
Custom Endpoints
Use any OpenAI-compatible API:
generator = SyntheticDataGenerator(
llm_provider="openai",
base_url="https://your-api.com/v1", # Your endpoint
llm_model="your-model",
api_key="your-key"
)
Examples:
- Ollama:
base_url="http://localhost:11434/v1" - vLLM:
base_url="https://your-vllm.com/v1" - Enterprise:
base_url="https://internal-ai.company.com/v1"
Schema Format
name: "user_profiles"
description: "User profile data"
num_samples: 500
fields:
- name: "name"
type: "name"
required: true
- name: "email"
type: "email"
required: true
- name: "age"
type: "number"
constraints: {min: 18, max: 80}
Field Types
Built-in: text, name, email, phone, address, date, number, boolean, uuid, category, url, json, title, description, review
Custom: Define domain-specific types:
fields:
- name: "diagnosis"
type: "icd_code" # Custom type
custom_type_definition: "ICD-10 diagnosis with code and description"
examples: ["E11.9 - Type 2 diabetes mellitus"]
CLI
krita init-schema schema.yaml # Create template
krita generate schema.yaml # Generate data
krita upload data.json user/dataset # Upload to HF
Configuration
export OPENAI_API_KEY="your-key"
export ANTHROPIC_API_KEY="your-key"
export HF_TOKEN="your-token"
License
MIT
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file krita-0.1.5.tar.gz.
File metadata
- Download URL: krita-0.1.5.tar.gz
- Upload date:
- Size: 16.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
e619a522d37cdbf87a5425d6c6974119f66e2cffb0525a66e68e9018118adf29
|
|
| MD5 |
27a5a19c8d3a2e54c298756a3222d047
|
|
| BLAKE2b-256 |
e88d32bf7aa240e88c755b6f1225bae7964db4968bc649314d10e3d68414057e
|
File details
Details for the file krita-0.1.5-py3-none-any.whl.
File metadata
- Download URL: krita-0.1.5-py3-none-any.whl
- Upload date:
- Size: 13.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
85398611e4eb8667156587a198e6d7cff6feccb5d294c8305c2a3e8a2faa94ad
|
|
| MD5 |
2fa8d11b0108184f2797764f52f258d1
|
|
| BLAKE2b-256 |
c7e9b5ec658c7fa1ac75bd4a19405193adc799ee5a041d59b9871403c5e8d9f8
|