Krita (कृत): Create synthetic datasets using LLMs from schemas

Generate synthetic datasets from schemas using LLMs, then upload them to Hugging Face.

Quick Start

pip install krita
krita generate schema.yaml --output dataset.json

from krita import SyntheticDataGenerator, DataSchema, FieldType

schema = DataSchema(
    name="reviews",
    num_samples=100,
    fields=[
        {"name": "product", "type": FieldType.TITLE, "required": True},
        {"name": "rating", "type": FieldType.NUMBER, "constraints": {"min": 1, "max": 5}},
        {"name": "review", "type": FieldType.REVIEW, "required": True}
    ]
)

generator = SyntheticDataGenerator(llm_provider="openai")
data = generator.generate(schema)
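
Once records have been generated, they usually need to be written out. The sketch below assumes `data` is a list of plain dictionaries, which this README does not confirm, and uses hand-written sample records in its place. The helper names `to_json_text` and `to_csv_text` are hypothetical and are not part of Krita.

```python
import csv
import io
import json


def to_json_text(records):
    """Serialize a list of dict records as pretty-printed JSON."""
    return json.dumps(records, indent=2, ensure_ascii=False)


def to_csv_text(records):
    """Serialize a list of dict records as CSV, keeping first-seen column order."""
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)  # keys missing from a record become empty cells
    return buffer.getvalue()


# Stand-in for generator.generate(schema); the real return type is an assumption.
records = [
    {"product": "Desk lamp", "rating": 4, "review": "Bright and sturdy."},
    {"product": "Travel mug", "rating": 2, "review": "Leaks when tilted."},
]

json_text = to_json_text(records)
csv_text = to_csv_text(records)
```

For everyday use, the `--output` flag of `krita generate` shown above is probably the simpler route; this sketch only illustrates what the JSON and CSV shapes could look like.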

Features

  • Schema-driven: Define data structure with types, constraints, examples
  • Multiple LLMs: OpenAI, Anthropic, custom OpenAI-compatible endpoints
  • Custom endpoints: Ollama, vLLM, enterprise deployments
  • Validation: Checks that generated data matches the schema
  • Hugging Face: Direct upload with metadata
  • Multiple formats: JSON, CSV, Parquet output
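
To make the validation idea concrete, here is a minimal sketch of checking one record against a list of field definitions. It is not Krita's implementation; the function `validate_record` is hypothetical, and it only covers `required` and numeric `min`/`max` constraints, using the field dictionaries from Quick Start with type names written as plain strings.

```python
def validate_record(record, fields):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field in fields:
        name = field["name"]
        value = record.get(name)
        if value is None:
            if field.get("required", False):
                problems.append(f"{name}: missing required field")
            continue
        if field.get("type") == "number":
            # bool is a subclass of int in Python, so reject it explicitly
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"{name}: expected a number")
                continue
            constraints = field.get("constraints", {})
            if "min" in constraints and value < constraints["min"]:
                problems.append(f"{name}: {value} is below min {constraints['min']}")
            if "max" in constraints and value > constraints["max"]:
                problems.append(f"{name}: {value} is above max {constraints['max']}")
    return problems


fields = [
    {"name": "product", "type": "title", "required": True},
    {"name": "rating", "type": "number", "constraints": {"min": 1, "max": 5}},
    {"name": "review", "type": "review", "required": True},
]

good = {"product": "Desk lamp", "rating": 4, "review": "Bright and sturdy."}
bad = {"product": "Desk lamp", "rating": 9}  # rating out of range, review missing

good_problems = validate_record(good, fields)
bad_problems = validate_record(bad, fields)
```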

Custom Endpoints

Use any OpenAI-compatible API:

generator = SyntheticDataGenerator(
    llm_provider="openai",
    base_url="https://your-api.com/v1",  # Your endpoint
    llm_model="your-model",
    api_key="your-key"
)

Examples:

  • Ollama: base_url="http://localhost:11434/v1"
  • vLLM: base_url="https://your-vllm.com/v1"
  • Enterprise: base_url="https://internal-ai.company.com/v1"
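
If you switch between these backends often, the constructor arguments can be assembled from a small lookup table. This is a sketch only: `ENDPOINT_EXAMPLES` and `generator_kwargs` are hypothetical names, and the URLs are the placeholder examples listed above, not real services.

```python
# Example base URLs copied from the list above; replace with your own.
ENDPOINT_EXAMPLES = {
    "ollama": "http://localhost:11434/v1",
    "vllm": "https://your-vllm.com/v1",
    "enterprise": "https://internal-ai.company.com/v1",
}


def generator_kwargs(backend, model, api_key):
    """Build keyword arguments for SyntheticDataGenerator for a named backend."""
    if backend not in ENDPOINT_EXAMPLES:
        raise ValueError(f"unknown backend: {backend!r}")
    return {
        "llm_provider": "openai",  # OpenAI-compatible endpoints use this provider
        "base_url": ENDPOINT_EXAMPLES[backend],
        "llm_model": model,
        "api_key": api_key,
    }


kwargs = generator_kwargs("ollama", "your-model", "your-key")
# With krita installed, this could then be passed along as:
# generator = SyntheticDataGenerator(**kwargs)
```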

Schema Format

name: "user_profiles"
description: "User profile data"
num_samples: 500
fields:
  - name: "name"
    type: "name"
    required: true
  - name: "email"
    type: "email"
    required: true
  - name: "age"
    type: "number"
    constraints: {min: 18, max: 80}
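
Before spending LLM calls on a schema, it can help to sanity-check its structure. The sketch below works on the dictionary that a YAML loader would produce for the file above (written out by hand here so the example needs no third-party package). `check_schema` is a hypothetical helper, and treating `name`, `num_samples`, and `fields` as mandatory is an assumption; Krita's own rules may differ.

```python
# Built-in type names as listed under Field Types in this README.
BUILT_IN_TYPES = {
    "text", "name", "email", "phone", "address", "date", "number", "boolean",
    "uuid", "category", "url", "json", "title", "description", "review",
}


def check_schema(schema):
    """Return a list of structural problems found in a schema dictionary."""
    errors = []
    for key in ("name", "num_samples", "fields"):  # assumed mandatory keys
        if key not in schema:
            errors.append(f"missing top-level key: {key}")
    for index, field in enumerate(schema.get("fields", [])):
        label = field.get("name", f"fields[{index}]")
        if "name" not in field:
            errors.append(f"{label}: missing name")
        field_type = field.get("type")
        if field_type is None:
            errors.append(f"{label}: missing type")
        elif field_type not in BUILT_IN_TYPES and "custom_type_definition" not in field:
            errors.append(f"{label}: unknown type {field_type!r} without custom_type_definition")
    return errors


# Dictionary equivalent of the YAML schema above.
schema = {
    "name": "user_profiles",
    "description": "User profile data",
    "num_samples": 500,
    "fields": [
        {"name": "name", "type": "name", "required": True},
        {"name": "email", "type": "email", "required": True},
        {"name": "age", "type": "number", "constraints": {"min": 18, "max": 80}},
    ],
}

errors = check_schema(schema)
```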

Field Types

Built-in: text, name, email, phone, address, date, number, boolean, uuid, category, url, json, title, description, review

Custom: Define domain-specific types:

fields:
  - name: "diagnosis"
    type: "icd_code"  # Custom type
    custom_type_definition: "ICD-10 diagnosis with code and description"
    examples: ["E11.9 - Type 2 diabetes mellitus"]
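
Because a custom type is described in free text, generated values may drift from the intended shape, so a lightweight check on the output can be worthwhile. The sketch below tests whether a value resembles the example given above ("code - description"). The pattern is a deliberately simplified assumption, not a full ICD-10 validator, and `looks_like_icd_entry` is a hypothetical helper that is not part of Krita.

```python
import re

# Simplified shape: a letter, two digits, an optional ".xxxx" suffix,
# then " - " and a non-empty description.
ICD_LIKE = re.compile(r"^[A-Z][0-9]{2}(\.[0-9A-Z]{1,4})? - \S.*$")


def looks_like_icd_entry(value):
    """Return True if value resembles 'E11.9 - Type 2 diabetes mellitus'."""
    return isinstance(value, str) and bool(ICD_LIKE.match(value))


samples = [
    "E11.9 - Type 2 diabetes mellitus",
    "I10 - Essential (primary) hypertension",
    "diabetes",
]
results = [looks_like_icd_entry(sample) for sample in samples]
```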

CLI

krita init-schema schema.yaml        # Create template
krita generate schema.yaml           # Generate data
krita upload data.json user/dataset  # Upload to Hugging Face

Configuration

export OPENAI_API_KEY="your-key"
export ANTHROPIC_API_KEY="your-key"
export HF_TOKEN="your-token"
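
A quick pre-flight check can catch a missing variable before a long generation run starts. The sketch below maps tasks to the variable names shown above. `REQUIRED_VARS` and `missing_variables` are hypothetical names, and the assumption that each task needs exactly one variable is mine, not the README's.

```python
import os

# Which environment variable each task is assumed to need.
REQUIRED_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "upload": "HF_TOKEN",
}


def missing_variables(tasks, env=None):
    """Return the names of variables that are unset or empty for the given tasks."""
    env = os.environ if env is None else env
    return [REQUIRED_VARS[task] for task in tasks if not env.get(REQUIRED_VARS[task])]


# Example with an explicit mapping instead of the real environment.
example_env = {"OPENAI_API_KEY": "your-key"}
missing = missing_variables(["openai", "upload"], env=example_env)
```

Calling `missing_variables(["openai", "upload"])` without the `env` argument checks the real process environment instead.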

License

MIT
