Krita (कृत): Create synthetic datasets using LLMs from schemas

These details have not been verified by PyPI

Project links

Project description

Krita (कृत)

Sanskrit: "made, created, formed" - the root of "Sanskrit" itself

Generate synthetic datasets using LLMs from schemas and upload to Hugging Face.

Quick Start

pip install krita

# Create a schema
krita init-schema schema.yaml

# Generate data
krita generate schema.yaml --output dataset.json

# Upload to Hugging Face
krita upload dataset.json username/my-dataset

Features

Schema-driven generation: Define your data structure with field types, constraints, and examples
Multiple LLM providers: OpenAI GPT, Anthropic Claude, and custom OpenAI-compatible endpoints
Custom endpoint support: Use any OpenAI-compatible API endpoint
Automatic validation: Ensures generated data matches your schema
Hugging Face integration: Direct upload to Hugging Face Hub with metadata
Multiple formats: JSON, JSONL, CSV, Parquet output
CLI and Python API: Use from command line or integrate into your code

Installation

pip install krita

Python API

Note: Install as krita, import as synthetica

from synthetica import SyntheticDataGenerator, DataSchema, FieldType

# Define schema
schema = DataSchema(
    name="customer_reviews",
    description="Product reviews dataset",
    num_samples=1000,
    fields=[
        {"name": "product", "type": FieldType.TITLE, "required": True},
        {"name": "rating", "type": FieldType.NUMBER, "constraints": {"min": 1, "max": 5}},
        {"name": "review", "type": FieldType.REVIEW, "required": True},
        {"name": "reviewer", "type": FieldType.NAME, "required": True}
    ]
)

# Generate data
generator = SyntheticDataGenerator(llm_provider="openai")
data = generator.generate(schema)

# Upload to Hugging Face
from synthetica import HuggingFaceUploader
uploader = HuggingFaceUploader()
uploader.upload_dataset(data, "username/customer-reviews")

Custom AI Endpoints

Use any OpenAI-compatible endpoint (Ollama, vLLM, custom deployments):

from synthetica.paypal_llm import CustomOpenAIProvider
from synthetica.generator import SyntheticDataGenerator

# Create custom provider
class CustomGenerator(SyntheticDataGenerator):
    def __init__(self, endpoint_url, model_name, **kwargs):
        self.llm = CustomOpenAIProvider(
            endpoint_url=endpoint_url,
            model=model_name,
            api_key=kwargs.get('api_key'),
            verify_ssl=kwargs.get('verify_ssl', True)
        )
        self.batch_size = kwargs.get('batch_size', 10)
        self.max_retries = kwargs.get('max_retries', 3)

# Use your custom endpoint
generator = CustomGenerator(
    endpoint_url="https://your-api.com/v1/chat/completions",
    model_name="your-model-name",
    verify_ssl=False  # For internal endpoints
)

data = generator.generate(schema)

Using Custom Types

from synthetica import SyntheticDataGenerator, DataSchema, FieldSchema, FieldType

# Define schema with custom types
schema = DataSchema(
    name="healthcare_records",
    description="Patient healthcare records",
    num_samples=50,
    fields=[
        FieldSchema(name="patient_id", type=FieldType.UUID, required=True),
        FieldSchema(name="name", type=FieldType.NAME, required=True),
        FieldSchema(
            name="diagnosis",
            type="icd_diagnosis",  # Custom type
            description="Primary diagnosis",
            custom_type_definition="ICD-10 diagnosis with code and description",
            examples=["E11.9 - Type 2 diabetes mellitus"],
            required=True
        ),
        FieldSchema(
            name="medication",
            type=FieldType.CUSTOM,  # Using CUSTOM enum
            description="Current medication",
            custom_type_definition="Medication name, dosage, and frequency",
            examples=["Metformin 500mg twice daily"],
            required=False
        )
    ]
)

# Generate data with custom types
generator = SyntheticDataGenerator(llm_provider="openai")
data = generator.generate(schema)

Schema Format

name: "user_profiles"
description: "User profile data"
num_samples: 500
context: "Generate diverse, realistic user profiles"
fields:
  - name: "id"
    type: "uuid"
    required: true
  - name: "name"
    type: "name"
    required: true
    examples: ["John Doe", "Jane Smith"]
  - name: "email"
    type: "email"
    required: true
  - name: "age"
    type: "number"
    constraints:
      min: 18
      max: 80
  - name: "bio"
    type: "description"
    required: false

Supported Field Types

Built-in Types

text, name, email, phone, address
date, number, boolean, uuid
category, url, json
title, description, review

Custom Types

Define your own field types for specialized domains:

fields:
  - name: "medical_diagnosis"
    type: "icd_diagnosis"  # Custom type name
    description: "Medical diagnosis"
    custom_type_definition: "ICD-10 diagnosis code with description (e.g., 'E11.9 - Type 2 diabetes')"
    examples:
      - "I10 - Essential hypertension"
      - "E78.5 - Hyperlipidemia"

  - name: "certification"
    type: "custom"  # Use 'custom' enum value
    description: "Professional certification"
    custom_type_definition: "Professional certification with issuing body and expiration date"
    examples:
      - "AWS Solutions Architect - Valid until 2025-12-31"

CLI Commands

# Initialize schema
krita init-schema schema.yaml

# Generate data
krita generate schema.yaml --provider openai --output data.json

# Upload to Hugging Face
krita upload data.json username/dataset-name --description "My dataset"

# List providers
krita list-providers

Configuration

Set environment variables:

export OPENAI_API_KEY="your-key"
export ANTHROPIC_API_KEY="your-key"
export HF_TOKEN="your-token"

Custom Endpoint Examples

Ollama (local):

generator = CustomGenerator(
    endpoint_url="http://localhost:11434/v1/chat/completions",
    model_name="llama3.1"
)

vLLM deployment:

generator = CustomGenerator(
    endpoint_url="https://your-vllm-server.com/v1/chat/completions",
    model_name="meta-llama/Llama-3.1-8B-Instruct",
    api_key="your-api-key"
)

Internal enterprise endpoint:

generator = CustomGenerator(
    endpoint_url="https://internal-ai.company.com/v1/chat/completions",
    model_name="company-model-v1",
    verify_ssl=False
)

License

MIT

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.1.6

Sep 23, 2025

0.1.5

Sep 22, 2025

0.1.4

Sep 22, 2025

0.1.3

Sep 22, 2025

0.1.2

Sep 22, 2025

This version

0.1.1

Sep 22, 2025

0.1.0

Sep 22, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

krita-0.1.1.tar.gz (16.6 kB view details)

Uploaded Sep 22, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

krita-0.1.1-py3-none-any.whl (15.9 kB view details)

Uploaded Sep 22, 2025 Python 3

File details

Details for the file krita-0.1.1.tar.gz.

File metadata

Download URL: krita-0.1.1.tar.gz
Upload date: Sep 22, 2025
Size: 16.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for krita-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`d92590107b41227f9d9a981645baf6fbc71bfaf2d1bcf7ab621b927db1e8d59a`
MD5	`01928a6e599bf698ab3a7788f5537453`
BLAKE2b-256	`bdfbd28ee68bdeac87d5c61b2bf992eb2a2d39921f228ec0187f0e943ff3c68b`

See more details on using hashes here.

File details

Details for the file krita-0.1.1-py3-none-any.whl.

File metadata

Download URL: krita-0.1.1-py3-none-any.whl
Upload date: Sep 22, 2025
Size: 15.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for krita-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`521efa540bbfe260b896267af4259447c6b1faadd846036e8d38d9cbaeae95b7`
MD5	`668a754c15924ccff2b48aab0cbfef83`
BLAKE2b-256	`9331c77731a76219c65dcb58b512e6c5e2d274483571646a27db3c0129b0bcb1`

See more details on using hashes here.

krita 0.1.1

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Krita (कृत)

Quick Start

Features

Installation

Python API

Custom AI Endpoints

Using Custom Types

Schema Format

Supported Field Types

Built-in Types

Custom Types

CLI Commands

Configuration

Custom Endpoint Examples

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes